REGEX Explanation

Hi list

Can anybody please explain the meaning of the following regular expression?

my $x = '12abc34bf5';
@num = split /(a|b)+/, $x;
print "NUM=@num\n";
NUM=12 b c34 b f5

Does it mean: split the string, where the separators are 'a' or 'b' (one or
more occurrences, because of the + metacharacter)?
If it matches from the left, i.e. 'a' comes before 'b' in $x, then why am I
getting 'b' as the 4th element in @num?


Changing the split statement produces the same result. Why?
my @num = split /(b|a)+/, $x;
print "NUM=@num\n";
OUTPUT:
NUM=12 b c34 b f5

Also, I need an explanation of the output, because I can't understand it:
if the split function splits the string on the separator 'a', then I
should get bc34bf5 as the second element.

But the non-capturing grouping works fine and I can understand it easily.

my $x = '12abc34bf5';
@num = split /(?:a|b)+/, $x;
print "NUM=@num\n";

NUM=12 c34 f5

It splits on either 'a' or 'b' but doesn't capture 'a' or 'b'.

Thanks & Regards in advance
Anirban.

--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Anirban Adhikary [ Fr, 08 April 2011 09:12 ] [ ID #2057793 ]

Re: REGEX Explanation

>>>>> "AA" == Anirban Adhikary <anirban.adhikary [at] gmail.com> writes:

AA> my $x = '12abc34bf5';
AA> @num = split /(a|b)+/, $x;
AA> print "NUM=@num\n";
AA> NUM=12 b c34 b f5

AA> Does it mean: split the string, where the separators are 'a' or 'b' (one or
AA> more occurrences, because of the + metacharacter)?
AA> If it matches from the left, i.e. 'a' comes before 'b' in $x, then why am I
AA> getting 'b' as the 4th element in @num?

this is somewhat tricky and covers several things. this one liner shows
the same result.

perl -le '($x) = "xabc" =~ /(a|b)+/; print $x'
b

the group matches the 'a' and then, because of the +, matches the 'b'. but
a grab with a quantifier on it only keeps the last pass through the
group, so it returns 'b'. the alternation matches both times, but each
pass overwrites the previous grab, so you get the last one.

the same thing is happening in the split. you have parens around the
delimiter so the grab will be returned in the list. the same logic
happens and 'a' is matched and then 'b' is and 'b' is the final
delimiter and that is returned.
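
a minimal sketch of what happens at each separator, assuming the same $x
as above. the outer parens are added here only to show the whole
separator next to what the inner group keeps, and @seen is just an
illustrative name:

```perl
use strict;
use warnings;

my $x = '12abc34bf5';

# $1 = the whole separator run, $2 = what the quantified group (a|b) keeps
my @seen;
while ( $x =~ /((a|b)+)/g ) {
    push @seen, [ $1, $2 ];
}

printf "separator '%s', group kept '%s'\n", @$_ for @seen;
# separator 'ab', group kept 'b'
# separator 'b', group kept 'b'
```

so the first separator is really 'ab', but only the final 'b' survives in
the grab.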

AA> changing the split statement produces same result why?
AA> my @num = split /(b|a)+/, $x;
AA> print "NUM=@num\n";
AA> OUTPUT:
AA> NUM=12 b c34 b f5

AA> Also, I need an explanation of the output, because I can't understand it:
AA> if the split function splits the string on the separator 'a', then I
AA> should get bc34bf5 as the second element.

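it splits on the whole run matched by (a|b)+, which is 'ab' for the first
separator. the parens only keep the last pass through the group, and
that is 'b' since it occurs after 'a' in the string. so 'b' is grabbed
and returned. the order inside the alternation doesn't change that.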
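
a quick check, assuming the same $x (the array names are illustrative).
at any given position only one of 'a' or 'b' can match, so swapping the
alternatives changes nothing:

```perl
use strict;
use warnings;

my $x = '12abc34bf5';

my @ab = split /(a|b)+/, $x;
my @ba = split /(b|a)+/, $x;

print "a|b: @ab\n";    # a|b: 12 b c34 b f5
print "b|a: @ba\n";    # b|a: 12 b c34 b f5
```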

AA> But the non capturing grouping works as fine and I can understand it easily.

AA> my $x = '12abc34bf5';
AA> @num = split /(?:a|b)+/, $x;
AA> print "NUM=@num\n";

since nothing is grabbed, you get the split you want. each pass through a
grabbing group can only grab 'a' OR 'b', not a string of 'a's and 'b's.
to match a run of them you can use a char class like [ab]+, and if you
want the whole run grabbed, put the + inside the parens: ([ab]+).
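
a short sketch of the char class version, again assuming the same $x
(the array names are made up for the example):

```perl
use strict;
use warnings;

my $x = '12abc34bf5';

# no parens: the separators are dropped
my @plain = split /[ab]+/, $x;

# + inside the parens: the whole run is grabbed and returned
my @captured = split /([ab]+)/, $x;

print "plain:    @plain\n";       # plain:    12 c34 f5
print "captured: @captured\n";    # captured: 12 ab c34 b f5
```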

this is how alternation works with grabbing.

uri

--
Uri Guttman ------ uri [at] stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

Uri Guttman [ Fr, 08 April 2011 09:31 ] [ ID #2057794 ]

Re: REGEX Explanation

sure......................

On Fri, Apr 8, 2011 at 1:29 PM, Uri Guttman <uri [at] stemsystems.com> wrote:
>
> please always reply to the list. resend that to the list.
>
> uri

Anirban Adhikary [ Fr, 08 April 2011 10:49 ] [ ID #2057795 ]

Re: REGEX Explanation

On 2011-04-08 09:31, Uri Guttman wrote:

> this is how alternation works with grabbing.

And in case you wonder what 'grab' means:
the Perl documentation uses 'capture'.

--
Ruud

rvtol+usenet [ Fr, 08 April 2011 10:57 ] [ ID #2057798 ]

Re: REGEX Explanation

On 11-04-08 03:12 AM, Anirban Adhikary wrote:
> my $x = '12abc34bf5';
> @num = split /(a|b)+/, $x;
> print "NUM=@num\n";
> NUM=12 b c34 b f5

`split` normally breaks a string into segments, using the regular
expression to match the separators between them. The following splits a
string into "words", that is, text segments without any white space:

#!/usr/bin/env perl

use strict;
use warnings;

use Data::Dumper;

# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;

# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;

my $string = "The quick brown fox jumped over the lazy dogs.";

my @words = split /\s+/, $string;
print '@words: ', Dumper \@words;

__END__

But if you put the regular expression in parentheses, it also returns
what matches:

#!/usr/bin/env perl

use strict;
use warnings;

use Data::Dumper;

# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;

# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;

my $string = "The quick brown fox jumped over the lazy dogs.";

my @words_and_spaces = split /(\s+)/, $string;
print '@words_and_spaces: ', Dumper \@words_and_spaces;

__END__

Because of the parentheses in your regex, split also returns what the
group captured. But since the plus is outside the parentheses, the group
is repeated and only its last repetition is kept.

To get it to capture the sequence of a's and b's, use:

@num = split /((?:a|b)+)/, $x;

To get it to not capture any matches, use the non-capture parentheses:

@num = split /(?:(?:a|b)+)/, $x;
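
One way to see the difference between capturing the whole sequence and
capturing only the last repetition is to glue the pieces back together.
This is a minimal sketch assuming the `$x` from the original post; the
array names are illustrative:

```perl
use strict;
use warnings;

my $x = '12abc34bf5';

# Plus inside the capture: the full separator is returned
my @full = split /((?:a|b)+)/, $x;

# Plus outside the capture: only the last repetition is returned
my @last = split /(a|b)+/, $x;

print join( '', @full ), "\n";    # 12abc34bf5 (same as $x)
print join( '', @last ), "\n";    # 12bc34bf5  (the 'a' is lost)
```

With the full capture nothing is lost, so the list can be joined back
into the original string.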

See:
perldoc -f split
perldoc perlretut
perldoc perlre


--
Just my 0.00000002 million dollars worth,
Shawn

Confusion is the first step of understanding.

Programming is as much about organization and communication
as it is about coding.

The secret to great software: Fail early & often.

Eliminate software piracy: use only FLOSS.

Shawn H Corey [ Fr, 08 April 2011 14:34 ] [ ID #2057801 ]

Re: REGEX Explanation


This is a nitpick, but..
On Fri, Apr 8, 2011 at 9:34 AM, Shawn H Corey <shawnhcorey [at] gmail.com> wrote:

> To get it to capture the sequence of a's and b's, use:
>
> @num = split /((?:a|b)+)/, $x;
>
> To get it to not capture any matches, use the non-capture parentheses:
>
> @num = split /(?:(?:a|b)+)/, $x;

split /[ab]+/, $x; makes this a lot less complex : )

Brian.

Brian Fraser [ Fr, 08 April 2011 15:26 ] [ ID #2057802 ]

Re: REGEX Explanation

On 04/08/2011 03:12 AM, Anirban Adhikary wrote:
> Can anybody please explain the meaning of the following regular expression
>
> my $x = '12abc34bf5';
> @num = split /(a|b)+/, $x;

YAPE::Regex::Explain is great for this:

[pdurbin@beamish ~]$ perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new("(a|b)+")->explain'
The regular expression:

(?-imsx:(a|b)+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    a                        'a'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    b                        'b'
----------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
[pdurbin@beamish ~]$

Phil

Philip Durbin [ Fr, 08 April 2011 15:44 ] [ ID #2057803 ]

Re: REGEX Explanation

On 2011-04-08 14:34, Shawn H Corey wrote:

> my @words = split /\s+/, $string;

See perldoc -f split, about why you might want to write that as

my @words = split ' ', $string;
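
A minimal sketch of the difference, assuming a string with leading white
space (the string and the array names are made up for the example):

```perl
use strict;
use warnings;

my $string = "  The quick brown fox";

# A regex separator matching at the start yields a leading empty field
my @by_regex = split /\s+/, $string;

# The special single-space string skips leading white space first
my @by_space = split ' ', $string;

print scalar(@by_regex), "\n";    # 5
print scalar(@by_space), "\n";    # 4
```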



> To get it to not capture any matches, use the non-capture parentheses:
>
> @num = split /(?:(?:a|b)+)/, $x;

Does that do anything different from

@num = split /(?:a|b)+/, $x;

?
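
A quick way to check, assuming the `$x` from the original post (the
array names are illustrative). The outer `(?: )` only groups something
that is already a single unit, so both patterns should split
identically:

```perl
use strict;
use warnings;

my $x = '12abc34bf5';

my @nested = split /(?:(?:a|b)+)/, $x;
my @plain  = split /(?:a|b)+/,     $x;

print "nested: @nested\n";    # nested: 12 c34 f5
print "plain:  @plain\n";     # plain:  12 c34 f5
```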

--
Ruud

rvtol+usenet [ Fr, 08 April 2011 15:12 ] [ ID #2057867 ]