mech->content match regex howto

Hi,

I am trying to understand WWW::Mechanize.

I understand that the downloaded content is returned by content().
Why am I not able to use a regex on it in scalar form?

------code------

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get("http://checkip.dyndns.org");
my $last_page = $mech->content(); # last page fetched

# this works if I store content in an array @last_page
# for ( @last_page ) {
#     if ( m/([\d+.]+)/ ) {
#         print "$1\n";
#     }
# }

# ( my $ip ) = grep/(\d+\.)/, $last_page;

( my $ip = $last_page ) =~ m/([\d+\.]+)/;
print "$ip\n";

------end------

my $ip gets the whole source page as its value.

--
Got it while writing out this post :)
--

Now the question becomes what is the difference between these two?

( my $ip = $last_page ) =~ m/([\d+\.]+)/;

( my $ip ) = ( $last_page ) =~ m/([\d+\.]+)/;

I think the above one is "wrong syntax" for using list context?

Also how can I make grep work?

( my $ip ) = grep/(\d+\.)/, $last_page;

raphael.japh [ Mon, 01 March 2010 11:43 ] [ ID #2033726 ]

Re: mech->content match regex howto

Try:

my ( $ip ) = $last_page =~ m/([\d\.]+)/;

This will capture the first one. To get more than one:

my @ips = $last_page =~ m/([\d\.]+)/g;


grep() works on lists. See `perldoc -f grep` for details.
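
A minimal sketch of the two forms above. It assumes a hypothetical page body in $last_page rather than a live fetch, and the addresses are documentation-range placeholders, not real output from the site:

```perl
use strict;
use warnings;

# Hypothetical page body standing in for $mech->content().
my $last_page = '<body>Primary: 203.0.113.7, fallback: 198.51.100.23</body>';

# Parentheses around the variable give list context,
# so the match returns its captures instead of a success flag.
my ( $ip ) = $last_page =~ m/([\d.]+)/;

# With /g, every capture found in the string comes back as a list.
my @ips = $last_page =~ m/([\d.]+)/g;

print "first: $ip\n";
print "all:   @ips\n";
```

With this sample string, $ip holds the first address and @ips holds both.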


--
Just my 0.00000002 million dollars worth,
Shawn

Programming is as much about organization and communication
as it is about coding.

I like Perl; it's the only language where you can bless your
thingy.

Eliminate software piracy: use only FLOSS.

--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Shawn H Corey [ Mon, 01 March 2010 13:36 ] [ ID #2033729 ]

Re: mech->content match regex howto

Thanks!

grep() works on lists. How foolish of me; I knew that but didn't recall
it.
Those, I think, are the perils of being new to programming.

I like Perl too; it's the only language where you can bless your
thingy. It is the first programming language that I am learning.

I picked it up because it looked like the shell scripting I use daily.
But Perl is so much better, even if you only know the basics. It leaves shell
scripting way behind.

raphael.japh [ Mon, 01 March 2010 14:28 ] [ ID #2033733 ]

Re: mech->content match regex howto

raphael() wrote:
> Hi,

Hello,

> I am trying to understand WWW::Mechanize

Did you also look at these pages:

http://search.cpan.org/~petdance/WWW-Mechanize-1.60/lib/WWW/Mechanize/Examples.pod
http://search.cpan.org/~petdance/WWW-Mechanize-1.60/lib/WWW/Mechanize/FAQ.pod
http://search.cpan.org/~petdance/WWW-Mechanize-1.60/lib/WWW/Mechanize/Cookbook.pod


> I understand that the downloaded content is stored in content().
> Why am I not able to use a regex on it in scalar form?
>
> ------code------
>
> use strict;
> use warnings;
> use WWW::Mechanize;
>
> my $mech = WWW::Mechanize->new();
> $mech->get("http://checkip.dyndns.org");
> my $last_page = $mech->content(); # last page fetched
>
> # this works if I store content in an array @last_page
> # for ( @last_page ) {
> #     if ( m/([\d+.]+)/ ) {
> #         print "$1\n";
> #     }
> # }

$mech->content() returns a scalar value, so @last_page holds a single
element and the loop is the same as saying:

if ( $last_page[ 0 ] =~ m/([\d+.]+)/ ) {
    print "$1\n";
}


> # ( my $ip ) = grep/(\d+\.)/, $last_page;

grep() returns the list items that match the expression /(\d+\.)/. The
regular expression is only used to decide which items to return; it
has no effect on the content of those items. If you want to change the
contents of the list, you have to use map() instead.
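
To make the difference concrete, here is a small sketch. The two lines in @lines are made-up sample data, not the real page:

```perl
use strict;
use warnings;

# Hypothetical lines; only the second one contains an address.
my @lines = ( 'Current IP Check', 'Current IP Address: 203.0.113.7' );

# grep returns the matching elements unchanged.
my @kept = grep { /([\d.]+)/ } @lines;

# map returns whatever the block evaluates to; a match in list
# context yields its captures, or an empty list when it fails.
my @captured = map { /([\d.]+)/ } @lines;

print "grep: $kept[0]\n";
print "map:  $captured[0]\n";
```

Here grep hands back the whole second line, while map hands back only the captured address.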


> ( my $ip = $last_page ) =~ m/([\d+\.]+)/;
> print "$ip\n";
>
> ------end------
>
> my $ip gets the whole source page as its value.
>
> --
> Got it while writing out this post :)
> --
>
> Now the question becomes what is the difference between these two?
>
> ( my $ip = $last_page ) =~ m/([\d+\.]+)/;

That is the same as:

my $ip = $last_page;
$ip =~ m/([\d+\.]+)/;

You are not doing anything with the string stored in $1.

And BTW, '+' is not a valid IP address character.
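
A short sketch of what that statement does, using a hypothetical string in place of the fetched page. The second half shows where the copy-then-bind form is normally useful, namely with s///; the variable $host is only an illustration:

```perl
use strict;
use warnings;

my $last_page = 'Current IP Address: 203.0.113.7';  # hypothetical content

# Copy first, then match against the copy; the match result is discarded.
( my $ip = $last_page ) =~ m/([\d.]+)/;

print "ip: $ip\n";    # the whole string, not the address
print "\$1: $1\n";    # the captured text is only in $1

# The same shape is handy with s///: copy, then edit the copy in place.
( my $host = $last_page ) =~ s/^.*:\s*//;

print "host: $host\n";
```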


> ( my $ip ) = ( $last_page ) =~ m/([\d+\.]+)/;

That is equivalent to:

my $ip;
if ( $last_page =~ m/([\d+\.]+)/ ) {
    $ip = $1;
}


> I think the above one is "wrong syntax" for using list context?

No, you *have* to use list context or $ip will be assigned the result of
the match operator (true or false) and not the contents of the capturing
parentheses.
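
The two contexts side by side, as a sketch with a hypothetical string instead of the real page:

```perl
use strict;
use warnings;

my $last_page = 'Current IP Address: 203.0.113.7';  # hypothetical content

# Scalar assignment: $ok receives the success flag of the match.
my $ok = $last_page =~ m/([\d.]+)/;

# List assignment: $ip receives the first capture.
my ( $ip ) = $last_page =~ m/([\d.]+)/;

print "scalar context: $ok\n";
print "list context:   $ip\n";
```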


> Also how can I make grep work?
>
> ( my $ip ) = grep/(\d+\.)/, $last_page;

You can't; grep() doesn't work that way. What you are looking for is map():

( my $ip ) = map /([\d.]+)/, $last_page;

Or, since you are not actually using a list, use the /g global option to
the match operator:

( my $ip ) = $last_page =~ /[\d.]+/g;

Note that this will return a list of [\d.]+ strings but only the first
one will be stored in $ip and the rest will be discarded.




John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway

jwkrahn [ Mon, 01 March 2010 18:57 ] [ ID #2033736 ]

Re: mech->content match regex howto

Cool! I have to admit that is a "detailed" answer.
Also, thanks for clearing up the difference between these two:

( my $ip = $last_page ) =~ m/([\d+\.]+)/;
( my $ip ) = ( $last_page ) =~ m/([\d+\.]+)/;

Just to clear up any misunderstanding: by "the above one"
I meant ( my $ip = $last_page ) =~ m/([\d+\.]+)/;
Am I right that this is the *wrong syntax* to get list
context?

Your post was very helpful, since I didn't know that parentheses (or the
lack of them) affect which values are captured.

I always used parentheses to capture values, like this:
( my $ip ) = $last_page =~ m/([\d+\.]+)/g;

Now I *know* this works too:
( my $ip ) = $last_page =~ m/[\d+\.]+/g;

Thanks again, John.

raphael.japh [ Tue, 02 March 2010 08:29 ] [ ID #2033849 ]

Re: mech->content match regex howto

raphael() wrote:
>
> Cool! I have to admit that is a "detailed" answer.
> Also thanks for clearing out the differences between these two..
>
> ( my $ip = $last_page ) =~ m/([\d+\.]+)/;
> ( my $ip ) = ( $last_page ) =~ m/([\d+\.]+)/;
>
> Just to clear out any misunderstanding "by above one"
> I meant ( my $ip = $last_page ) =~ m/([\d+\.]+)/;
> Now am I getting this right that this is the *wrong syntax* to get list
> context.

Correct, there is no list context in that statement. The parentheses
are required because the =~ operator has higher precedence than the =
operator.


> Your post was very helpful since I didn't know about parenthesis (or lack of
> it) to capture values.
>
> I always did use parenthesis to capture values like
> ( my $ip ) = $last_page =~ m/([\d+\.]+)/g;
>
> Now I *know* this works
> ( my $ip ) = $last_page =~ m/[\d+\.]+/g;

Again, the '+' character is not a valid IP address character so it
should be removed from the character class and the '.' period character
does not need to be escaped inside a character class.

The capturing parentheses are *required* when you only want to return
*part* of a pattern:

( my $ip ) = $last_page =~ /IP: *([\d.]+)/g;

Or when you want to match a single pattern without the /g option:

( my $ip ) = $last_page =~ /([\d.]+)/;
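
A sketch of the three cases, using a made-up string that has stray numbers before the address (the text and the "IP:" label are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical content with stray numbers before the address.
my $last_page = 'Check 2 of 3. IP: 203.0.113.7';

# No capture group, /g, list context: every whole match is returned.
my @whole = $last_page =~ /[\d.]+/g;

# A capture group narrows the result to the part inside the parentheses.
my ( $ip ) = $last_page =~ /IP: *([\d.]+)/;

# No capture group and no /g: list context only reports success as (1).
my ( $flag ) = $last_page =~ /[\d.]+/;

print "whole matches: @whole\n";
print "captured part: $ip\n";
print "no parens, no /g: $flag\n";
```

With this input the first form also picks up "2" and "3.", which is why anchoring on the surrounding text and capturing only the address is the safer choice.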




John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway

jwkrahn [ Tue, 02 March 2010 09:16 ] [ ID #2033850 ]

Re: mech->content match regex howto

Yup. Got it.

( my $ip ) = $last_page =~ m/[\d.]+/g;

* Parentheses are needed only when part of the match is to be captured, or without /g.
* No + inside the character class, since there is no IP address like 192+. (Silly me.)
* The dot need not be escaped inside a character class, since it is no longer the
king (a metacharacter) there?
* John is a helpful guy.

raphael.japh [ Tue, 02 March 2010 09:55 ] [ ID #2033851 ]

Re: mech->content match regex howto

raphael() wrote:

> ( my $ip ) = $last_page =~ m/[\d.]+/g

\d matches 200+ code points (Unicode decimal digits from many scripts, not
only ASCII 0-9), so if you want to match only 0-9, use [0-9.].
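
A quick way to see the difference, as a sketch. The test character, U+0663 ARABIC-INDIC DIGIT THREE, is just one example chosen for illustration:

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE, written as an escape so the
# source file itself stays ASCII.
my $digit = "\x{0663}";

# \d accepts it because it is a Unicode decimal digit.
my $matches_d = $digit =~ /\d/ ? 1 : 0;

# [0-9] accepts only the ten ASCII digits.
my $matches_ascii = $digit =~ /[0-9]/ ? 1 : 0;

print "\\d:    $matches_d\n";
print "[0-9]: $matches_ascii\n";
```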

--
Ruud

rvtol+usenet [ Tue, 02 March 2010 23:47 ] [ ID #2033986 ]

Re: mech->content match regex howto

Thanks for the advice. [0-9.] it is.

raphael.japh [ Wed, 03 March 2010 10:23 ] [ ID #2033987 ]