Word boundaries

Hi,

I have a small confusion about word boundaries. A word boundary matches
anything between a non-word character and a word character, right?

Here is a small example:
$_ = "?Jack do you know the beauty of perl";
print "Enter your text:";
my $pattern = <STDIN>;
chomp $pattern;

if (/$pattern/){
    print "1.$1\n";
}

Now say I have given the pattern as (\b\W\b). What will be the result?

My understanding is: the quote (a non-word character), then ? (a non-word
character), followed by the word character "J". The result I expected is ?.

But the result is an empty string.

Could someone please explain word boundaries?


Regards,
chandan.
Chandan Kumar [ Tue, 20 July 2010 17:22 ] [ ID #2044823 ]

Re: Word boundaries

Chandan Kumar wrote:
> Hi ,

Hello,

> Small confusion about word boundaries. word boundaries matches anything
> between non-word character and word character ,right.

Correct.

> Here is small example :
> $_ = "?Jack do you know the beauty of perl"
> print "Enter your text:";
> my $pattern =<STDIN>;
> chomp $pattern;
>
> if (/$pattern/){
> print "1.$1\n";
> }
>
> Now say I have given pattern as :(\b\W\b) .what will be the result?

$1 will contain " " (a space character).

> My understanding is quotes a (nonword character) ? a (nonword
> character) followed by word charatcer "j". The result I have expected?.
>
> but result is empty string.

The question mark (?) at the beginning of the string does not have a
word character to its left so \b\W\b will not match.



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein

--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
jwkrahn [ Tue, 20 July 2010 17:43 ] [ ID #2044824 ]

Re: Word boundaries

On 20/07/2010 16:22, Chandan Kumar wrote:
> Hi ,
>
> Small confusion about word boundaries. word boundaries matches
> anything between non-word character and word character ,right.

Not quite. /\b/ matches any (zero-length) point in a string between a
word character and a non-word character, or between a word character
and the beginning or end of the string.

> Here is small example :
> $_ = "?Jack do you know the beauty of perl"
> print "Enter your text:";
> my $pattern =<STDIN>;
> chomp $pattern;
>
> if (/$pattern/){
> print "1.$1\n";
> }
>
> Now say I have given pattern as :(\b\W\b) .what will be the result?
>
> My understanding is quotes a (nonword character) ? a (nonword character)
> followed by word charatcer "j". The result I have expected ?.
>
> but result is empty string.
>
> Please can some about word boundaries?

(You are aware that /\W/ matches a single NON-word character? /\w/
matches a word character.)

You are asking for the first non-word character with a word character on
both sides. That is the space after 'Jack', bounded by 'k' and 'd'.
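A minimal sketch of the case described above. The pattern is hard-coded rather than read from STDIN (an assumption made so that the script runs unattended), and the variable names are illustrative only.

```perl
use strict;
use warnings;

# The string from the original question.
my $text = "?Jack do you know the beauty of perl";

# Hard-coded here; the original post read the pattern from STDIN.
my $pattern = '(\b\W\b)';

# Capture the first match and the offset at which the capture starts.
my ($match, $offset);
if ($text =~ /$pattern/) {
    ($match, $offset) = ($1, $-[1]);
    # Brackets make a whitespace match visible in the output.
    print "1.[$match] at offset $offset\n";
}
```

Printing $1 without brackets, as in the original script, makes a captured space look like an empty result.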

HTH,

Rob



Rob Dixon [ Tue, 20 July 2010 19:36 ] [ ID #2044826 ]

Re: Word boundaries

Rob Dixon wrote:
> On 20/07/2010 16:22, Chandan Kumar wrote:
>>
>> Small confusion about word boundaries. word boundaries matches
>> anything between non-word character and word character ,right.
>
> Not quite.

Quite.

> /\b/ matches any (zero-length) point in a string between a
> word and a non-word character,

Correct.

> or between a word character and the
> beginning or end of the string,

Incorrect. It matches *only* between \w and \W characters.




John
jwkrahn [ Tue, 20 July 2010 23:06 ] [ ID #2044827 ]

Re: Word boundaries

Chandan Kumar wrote:
>>
>> --- On Tue, 20/7/10, John W. Krahn<jwkrahn [at] shaw.ca> wrote:
>>
>>> Chandan Kumar wrote:
>>>
>>> Small confusion about word boundaries. word boundaries matches anything
>>> between non-word character and word character ,right.
>>
>> Correct.
>>
>>> Here is small example :
>>> $_ = "?Jack do you know the beauty of perl"
>>> print "Enter your text:";
>>> my $pattern =<STDIN>;
>>> chomp $pattern;
>>>
>>> if (/$pattern/){
>>> print "1.$1\n";
>>> }
>>>
>>> Now say I have given pattern as :(\b\W\b) .what will be the result?
>>
>> $1 will contain " " (a space character.)
>>
>>> My understanding is quotes a (nonword character) ? a (nonword
>>> character) followed by word charatcer "j". The result I have expected?.
>>>
>>> but result is empty string.
>>
>> The question mark (?) at the beginning of the string does not have a
>> word character to its left so \b\W\b will not match.
>
> word boundaries defined as anything between a word character and non
> word character or its nonword and word character. I think its either
> way its correct.

Correct.

> In my example starting quotes are non-word character or not.

The quotes are not a part of the string, they just delimit the string.

$ perl -le'$_ = "?Jack do you know the beauty of perl"; print'
?Jack do you know the beauty of perl

For example, you could write the same thing without using quotes at all:

$ perl -le'$_ = q x?Jack do you know the beauty of perlx; print'
?Jack do you know the beauty of perl


> I'm assuming quotes as non-word character then followed \W (non-word
> character i.e ?) ,followed by word character "j"
>
> so for \b\W\b : the result should be questiomark ( ? ) right.

Even if the quotes were part of the string, \b\W\b still wouldn't match ?
because " and ? are both non-word characters.


> The way I have understood boundaries is correct or not.Please clarify
> me.I'm confused.

\b will match at the boundary *between* a non-word character and a word
character. In other words, there has to be a non-word character on one
side of \b and a word character on the other side for it to match.
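One way to see where \b can match is to list every offset at which it succeeds. The following is a rough sketch on a shortened version of the string from the question; the shortening and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Shortened version of the string from the question.
my $text = "?Jack do";

# Collect the offset of every position where \b matches.
# Perl does not repeat a zero-length match at the same position under /g,
# so this loop terminates.
my @offsets;
while ($text =~ /\b/g) {
    push @offsets, pos($text);
}

print "boundaries at: @offsets\n";
```

Offset 0 is missing from the list because "?" is a non-word character with nothing before it. The last offset reported is the end of the string, directly after the word character "o".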



John
jwkrahn [ Tue, 20 July 2010 23:26 ] [ ID #2044828 ]

Re: Word boundaries

>>>>> "JWK" == John W Krahn <jwkrahn [at] shaw.ca> writes:

JWK> Rob Dixon wrote:
>> On 20/07/2010 16:22, Chandan Kumar wrote:
>>>
>>> Small confusion about word boundaries. word boundaries matches
>>> anything between non-word character and word character ,right.
>>
>> Not quite.

JWK> Quite.

>> /\b/ matches any (zero-length) point in a string between a
>> word and a non-word character,

JWK> Correct.

>> or between a word character and the
>> beginning or end of the string,

JWK> Incorrect. It matches *only* between \w and \W characters.

sorry to correct that, but rob is right. i knew \b worked with string
ends but i didn't realize it assumed they were non-word (\W) chars. a
little test shows this:

perl -le 'print "yes" if "a" =~ /^\b/'
yes
perl -le 'print "yes" if "!" =~ /^\b/'

the second has no output. so you can consider the ends of a string to be
\W chars for \b.

uri

--
Uri Guttman ------ uri [at] stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

Uri Guttman [ Tue, 20 July 2010 23:27 ] [ ID #2044829 ]

Re: Word boundaries

and the docs say this:

A word boundary ("\b") is a spot between two characters that has
a "\w" on one side of it and a "\W" on the other side of it (in
either order), counting the imaginary characters off the
beginning and end of the string as matching a "\W".

so that backs up my little test.
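The same check can be extended to both ends of the string and to the empty string. This is only a sketch; the test cases and names are chosen here for illustration and are not from the thread.

```perl
use strict;
use warnings;

# Each entry: [ string, pattern, description ].
my @cases = (
    [ 'a', qr/^\b/, 'word character at start' ],
    [ '!', qr/^\b/, 'non-word character at start' ],
    [ 'a', qr/\b$/, 'word character at end' ],
    [ '!', qr/\b$/, 'non-word character at end' ],
    [ '',  qr/\b/,  'empty string' ],
);

# Record yes/no for each case and print a small table.
my @results;
for my $case (@cases) {
    my ($string, $pattern, $what) = @$case;
    my $matched = $string =~ $pattern ? 'yes' : 'no';
    push @results, $matched;
    printf "%-28s %s\n", $what, $matched;
}
```

Only the two cases with a word character next to a string end report "yes", which is consistent with treating the ends of the string as \W.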

uri

Uri Guttman [ Tue, 20 July 2010 23:28 ] [ ID #2044830 ]

Re: Word boundaries

On 7/20/10 Tue Jul 20, 2010 2:06 PM, "John W. Krahn" <jwkrahn [at] shaw.ca>
scribbled:

> Rob Dixon wrote:
>> On 20/07/2010 16:22, Chandan Kumar wrote:
>>>
>>> Small confusion about word boundaries. word boundaries matches
>>> anything between non-word character and word character ,right.
>>
>> Not quite.
>
> Quite.
>
>> /\b/ matches any (zero-length) point in a string between a
>> word and a non-word character,
>
> Correct.
>
>> or between a word character and the
>> beginning or end of the string,
>
> Incorrect. It matches *only* between \w and \W characters.

Then how are we to interpret this:

% perl -e '
$x = "abc";
if( $x =~ /\ba/ ) {
  print "match\n";
}else{
  print "no match\n";
}'
match
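One way to interpret that result is to spell \b out with lookarounds: a word character on exactly one side of the position. The sketch below compares the two forms on a few strings; the helper name offsets and the sample strings are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# \b written with lookarounds: a word character on exactly one side.
# At either end of the string a lookaround for \w simply fails, which
# has the same effect as a non-word character being there.
my $boundary = qr/(?<=\w)(?!\w)|(?<!\w)(?=\w)/;

# Return, as a space-separated string, the offsets at which a
# zero-width pattern matches.
sub offsets {
    my ($string, $re) = @_;
    my @found;
    push @found, pos($string) while $string =~ /$re/g;
    return "@found";
}

for my $string ('abc', '!abc', 'abc!', '!') {
    printf "%-5s \\b: [%s]  lookarounds: [%s]\n",
        $string, offsets($string, qr/\b/), offsets($string, $boundary);
}
```

For "abc" both forms report offsets 0 and 3, the start and end of the string, so /\ba/ can match at offset 0.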



Jim Gibson [ Tue, 20 July 2010 23:29 ] [ ID #2044831 ]

Re: Word boundaries

Wagner, David --- Senior Programmer Analyst --- CFS wrote:
>> -----Original Message-----
>> From: John W. Krahn [mailto:jwkrahn [at] shaw.ca]
>> Sent: Tuesday, July 20, 2010 15:06
>> To: Perl Beginners
>> Subject: Re: Word boundaries
>>
>> Rob Dixon wrote:
>>> On 20/07/2010 16:22, Chandan Kumar wrote:
>>>>
>>>> Small confusion about word boundaries. word boundaries matches
>>>> anything between non-word character and word character ,right.
>>>
>>> Not quite.
>>
>> Quite.
>>
>>> /\b/ matches any (zero-length) point in a string between a
>>> word and a non-word character,
>>
>> Correct.
>>
>>> or between a word character and the
>>> beginning or end of the string,
>>
>> Incorrect. It matches *only* between \w and \W characters.
>>
>>
> But for the test you were doing, you could have added this:
> (\b{0,1}\W\b) which would have gotten you the ? as the output, but
> unsure that that is what you really wanted...

You can't use a modifier on a zero-width assertion, it makes no sense.

Modifiers are only used on patterns that actually match characters.



John
jwkrahn [ Tue, 20 July 2010 23:30 ] [ ID #2044832 ]

RE: Word boundaries

>-----Original Message-----
>From: John W. Krahn [mailto:jwkrahn [at] shaw.ca]
>Sent: Tuesday, July 20, 2010 15:06
>To: Perl Beginners
>Subject: Re: Word boundaries
>
>Rob Dixon wrote:
>> On 20/07/2010 16:22, Chandan Kumar wrote:
>>>
>>> Small confusion about word boundaries. word boundaries matches
>>> anything between non-word character and word character ,right.
>>
>> Not quite.
>
>Quite.
>
>> /\b/ matches any (zero-length) point in a string between a
>> word and a non-word character,
>
>Correct.
>
>> or between a word character and the
>> beginning or end of the string,
>
>Incorrect. It matches *only* between \w and \W characters.
>
>
But for the test you were doing, you could have added this:
(\b{0,1}\W\b) which would have gotten you the ? as the output, but
unsure that that is what you really wanted...

If you have any questions and/or problems, please let me know.
Thanks.

Wags ;)
David R. Wagner
Senior Programmer Analyst
FedEx Services
1.719.484.2097 Tel
1.719.484.2419 Fax
1.408.623.5963 Cell
http://Fedex.com/us


David.Wagner [ Tue, 20 July 2010 23:12 ] [ ID #2044886 ]

Re: Word boundaries

Hi,

I'm still confused about using word boundaries. After all the help given by
everyone here, I have tried another example to find out what exactly word
boundaries mean.

I'm not trying to extract any particular character, just playing with word
boundaries to understand them better.

ex: $_ = "#!chk/usr/bin/perl";

1) The output for (\b\W\b) is \
I am looking for some character which is between a word character and a
non-word character. To my understanding, I'm expecting ! as my answer, because
at the start of the string there is # (a non-word character), next ! (a
non-word character), next the word character "c".
Then the answer should be !, right? Can anyone explain?

Why does it show the result as \, which is between two word characters?

2) For the same example, can anyone explain (\b\W) and (\W\b)? Are these
valid?

By the way, I'm following this link: http://www.perl.org/books/beginning-perl/

Thanks in advance.

Best regards,
chandan.


Chandan Kumar [ Thu, 22 July 2010 06:57 ] [ ID #2044976 ]

Re: Word boundaries

On Thu, Jul 22, 2010 at 12:57 AM, Chandan Kumar <chandan_28582 [at] yahoo.com> wrote:
> ex: $_="#!chk/usr/bin/perl";
>
> 1)The output for ---- (\b\W\b) is \

There is no \ (back-slash) character in your string. :\ I assume then
that you meant '/' (forward-slash)?

> Iam looking for some character which is between a word character and
> non-word character,
> To my understanding I'm expecting ! as my asnwer. because starting of the
> string there is # (non-word character) next ! (non word character) next
> word character " c ".
> Then answer should be ! ,Right? can any one explain.
>
> Why does it showing result as \ which is between 2 word characters .

As Uri explained, a word boundary is the point between a word
character and a non-word character. # and ! are both non-word
characters, so there is no word boundary between them. The first word
boundary is between ! (non-word character) and c (word character). The
first match for the pattern is the forward slash in "k/u": there is a
word boundary between k (word character) and / (non-word character),
another between / and u (word character), and the / between those two
boundaries is itself a non-word character.
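A small sketch that runs all three patterns from the question against the same string and reports the first match of each. The variable names are illustrative assumptions, not part of the original posts.

```perl
use strict;
use warnings;

my $text = "#!chk/usr/bin/perl";

# First match of each pattern, with the offset where the capture starts.
my %first;
for my $pattern ('(\b\W\b)', '(\b\W)', '(\W\b)') {
    if ($text =~ /$pattern/) {
        $first{$pattern} = $1 . " at offset " . $-[1];
        print "$pattern => $first{$pattern}\n";
    }
}

# Every match of (\b\W\b), not just the first one.
my @all = $text =~ /(\b\W\b)/g;
print "all (\\b\\W\\b) matches: @all\n";
```

With this string, (\W\b) is the pattern whose first match is "!", since it only requires a boundary after the non-word character. (\b\W\b) requires a boundary on both sides, so it can only match the three slashes.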

> 2) For the same example can any one explain (\b\W) and (\W\b)? Is these
> are valid?

(\b\W) would match a word boundary followed by a non-word character.
An example pattern that we would expect to match is the "!" in "foo!".

(\W\b) is just the opposite, first matching the non-word character and
then matching the word boundary. An example pattern that we would
expect to match is the "(" in "(foo".

I appear to be correct, according to this rudimentary test (note that
I am still new to writing Perl so hopefully I'm not misinterpreting
the results):

use strict;
use warnings;

use Data::Dumper;

my @data = ('foo!', '(foo');
my @regexes = ('(\b\W)', '(\W\b)');

for my $r (@regexes)
{
    for my $d (@data)
    {
        # List context: $match receives the first capture group, or undef.
        my ($match) = $d =~ m/$r/;

        print "$d =~ /$r/ matches $match\n" if defined $match;
    }
}

Output:

foo! =~ /(\b\W)/ matches !
(foo =~ /(\W\b)/ matches (


--
Brandon McCaig <bamccaig [at] gmail.com>
V zrna gur orfg jvgu jung V fnl. Vg qbrfa'g nyjnlf fbhaq gung jnl.
Castopulence Software <http://www.castopulence.org/> <bamccaig [at] castopulence.org>

Brandon McCaig [ Thu, 22 July 2010 19:53 ] [ ID #2044992 ]