regular expression negate a word (not character)

Is somebody here a regular expression guru? How do you negate a word,
so that you can grep for all lines that contain

tire

but not

snow tire

or

snowtire

so for example, it will grep for

winter tire
tire
retire
tired

but will not grep for

snow tire
snow tire
some snowtires

need to do it in one regular expression
Summercoolness [ Sat, 26 January 2008 02:16 ] [ ID #1916980 ]

Re: regular expression negate a word (not character)

On Jan 25, 5:16 pm, Summercool <Summercooln... [at] gmail.com> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire

I can think of something like

/[^s][^n][^o][^w]\s*tire/i

but what if the word is not "snow" but some 20-character word? Do we
then need to repeat this 20 times to negate it? Is there a shorter way?
Summercoolness [ Sat, 26 January 2008 03:15 ] [ ID #1916983 ]

Re: regular expression negate a word (not character)

On Jan 25, 8:16 pm, Summercool <Summercooln... [at] gmail.com> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
...
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression

You might be looking for a negative lookahead assertion. Look that up
in a handy source. The syntax is approximately


(?!foo) --> will match at any place "between" chars that is not
immediately followed by "foo".

Now, you have to add in the "bar" afterwards. But remember that
(?!foo) takes up zero width. And be careful about /.*/ matching
anything, including zero chars.
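
A minimal sketch of the hint in Python, whose `re` module uses the same `(?!...)` syntax. The sample strings and the name `not_foo_word` are made up for illustration. It shows that the assertion consumes nothing, and that a lookahead placed right in front of "tire" looks at "tire" itself rather than at the text before it:

```python
import re

# (?!foo) succeeds only where the text AHEAD does not start with "foo".
# It consumes nothing, so \w+ begins at the very position it checked.
not_foo_word = re.compile(r"\b(?!foo)\w+")
words = not_foo_word.findall("foo food bar barfoo")
print(words)  # ['bar', 'barfoo']

# Placed directly in front of "tire", a lookahead inspects "tire" itself,
# not the text before it, so it cannot keep "snowtire" out.
m = re.search(r"(?!snow)tire", "snowtire")
print(m.start())  # 4
```

So a lookahead on its own only helps if it is checked at the place where the unwanted word would begin, not where "tire" begins.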

I would say more but this looks sorta like a homework assignment to
me. So this is a "hint" post. I or someone else could do a "solution"
post later, but just having the phrase "negative lookahead assertion"
to look up on the Web or in a book index will probably answer all your
questions.

Note that what you *really* want is a "negative lookbehind assertion",
to put right in front of the "tire" in your example (or my "bar"), but
I think those won't be working until Perl 6.
paulaireilly [ Sat, 26 January 2008 03:42 ] [ ID #1916985 ]

Re: regular expression negate a word (not character)

Quoth paulaireilly <paulaireilly [at] gmail.com>:
>
> You might be looking for a negative lookahead assertion. Look
<snip>
>
> Note that what you *really* want is a "negative lookbehind assertion",
> to put right in front of the "tire" in your example (or my "bar"), but
> I think those won't be working until Perl 6.

No, they work perfectly well in Perl 5, at least for fixed-length
strings. Syntax is (?<= ) and (?<! ). In 5.10 you can get
variable-length positive (but not negative) lookbehind at the start of
the match using \K.
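
For illustration only, here is how the fixed-length restriction can be worked around in Python, whose `re` module also insists on fixed-width lookbehind. This is a sketch under one loud assumption: the gap between the two words is at most `MAX_GAP` plain spaces (`MAX_GAP`, `tire_not_after_snow` and `found` are hypothetical names, not from the thread). It chains one `(?<! )` assertion per allowed gap width:

```python
import re

MAX_GAP = 3  # assumption: never more than this many spaces between the words

# One fixed-width negative lookbehind per gap size: "snow", "snow ", "snow  ", ...
lookbehinds = "".join("(?<!snow%s)" % (" " * n) for n in range(MAX_GAP + 1))
tire_not_after_snow = re.compile(lookbehinds + "tire", re.IGNORECASE)

def found(text):
    """True if some "tire" in text is not preceded by "snow" plus 0..MAX_GAP spaces."""
    return tire_not_after_snow.search(text) is not None

print(found("winter tire"), found("snow tire"), found("snowtire"))
```

A gap wider than `MAX_GAP`, or one made of tabs, slips through, so this is a stopgap rather than a true variable-length lookbehind.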

Ben
Ben Morrow [ Sat, 26 January 2008 04:27 ] [ ID #1916987 ]

Re: regular expression negate a word (not character)

[newsgroups line fixed, f'ups set to clpm]

Quoth Summercool <Summercoolness [at] gmail.com>:
> On Jan 25, 5:16 pm, Summercool <Summercooln... [at] gmail.com> wrote:
> > somebody who is a regular expression guru... how do you negate a word
> > and grep for all words that is
> >
> > tire
> >
> > but not
> >
> > snow tire
> >
> > or
> >
> > snowtire
>
> i could think of something like
>
> /[^s][^n][^o][^w]\s*tire/i
>
> but what if it is not snow but some 20 character-word, then do we need
> to do it 20 times to negate it? any shorter way?

This is no good, since 'snoo tire' fails to match even though you want
it to. You need something more like

/ (?: [^s]... | [^n].. | [^o]. | [^w] | ^ ) \s* tire /ix

but that gets *really* tedious for long strings, unless you generate it.
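
Generating such a pattern could look roughly like the Python sketch below. It is not the expression above verbatim: the helper name `not_preceded_by` is made up, each alternative is anchored to the end of the preceding text, a short prefix at the start of the string is allowed via `(?:^|...)`, and whitespace is excluded from the last character class so that a space cannot stand in for the final letter.

```python
import re

def not_preceded_by(word, target):
    """Build a pattern for `target` that is not preceded by `word` plus optional whitespace."""
    alternatives = ["^"]  # target at the very start, after optional whitespace
    for i, ch in enumerate(word):
        rest = re.escape(word[i + 1:])
        if rest:
            # The text differs from `word` at position i, then agrees with its tail.
            alternatives.append("(?:^|[^%s])%s" % (re.escape(ch), rest))
        else:
            # The last letter differs; \s is excluded so a space is not accepted here.
            alternatives.append("[^%s\\s]" % re.escape(ch))
    return "(?:%s)\\s*%s" % ("|".join(alternatives), re.escape(target))

pattern = re.compile(not_preceded_by("snow", "tire"), re.IGNORECASE)
print(pattern.pattern)
```

The generated expression grows linearly with the length of the negated word, which is the tedium mentioned above moved into a loop.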

Ben
Ben Morrow [ Sat, 26 January 2008 04:37 ] [ ID #1916988 ]

Re: regular expression negate a word (not character)

"Summercool" <Summercoolness [at] gmail.com> wrote in message
news:27249159-9ff3-4887-acb7-99cf0d2582a8 [at] n20g2000hsh.googlegroups.com...
>
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire
>
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression
>

What you want is a negative lookbehind assertion:

>>> re.search(r'(?<!snow)tire','snowtire') # no match
>>> re.search(r'(?<!snow)tire','baldtire')
<_sre.SRE_Match object at 0x00FCD608>

Unfortunately you want variable whitespace:

>>> re.search(r'(?<!snow\s*)tire','snow tire')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\dev\python\lib\re.py", line 134, in search
    return _compile(pattern, flags).search(string)
  File "C:\dev\python\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern
>>>

Python doesn't support lookbehind assertions that can vary in size. This
doesn't work either:

>>> re.search(r'(?<!snow)\s*tire','snow tire')
<_sre.SRE_Match object at 0x00F93480>
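
To see why that last attempt still matches, it helps to look at where the match lands. A small check, written for Python 3 (the session above is Python 2):

```python
import re

m = re.search(r"(?<!snow)\s*tire", "snow tire")
# \s* is allowed to match nothing, so the engine skips the space and starts
# at the "t", where the four preceding characters are "now ", not "snow".
print(m.span(), repr(m.group()))  # (5, 9) 'tire'
```

The lookbehind is only tested at the position where the overall match begins, and the optional whitespace lets that position move past the space.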

Here's some code (not heavily tested) that implements a variable lookbehind
assertion, and a function to mark matches in a string to demonstrate it:

### BEGIN CODE ###

import re

def finditerexcept(pattern, notpattern, string):
    # Try notpattern first so that an unwanted match consumes its text,
    # then drop every match that notpattern itself accounts for.
    for matchobj in re.finditer('(?:%s)|(?:%s)' % (notpattern, pattern), string):
        if not re.match(notpattern, matchobj.group()):
            yield matchobj

def markexcept(pattern, notpattern, string):
    # Return string with each accepted match wrapped in [brackets].
    substrings = []
    current = 0

    for matchobj in finditerexcept(pattern, notpattern, string):
        substrings.append(string[current:matchobj.start()])
        substrings.append('[' + matchobj.group() + ']')
        current = matchobj.end()

    substrings.append(string[current:])
    return ''.join(substrings)

### END CODE ###

>>> sample='''winter tire
... tire
... retire
... tired
... snow tire
... snow tire
... some snowtires
... '''
>>> print(markexcept('tire', r'snow\s*tire', sample))
winter [tire]
[tire]
re[tire]
[tire]d
snow tire
snow tire
some snowtires

--Mark
Mark Tolonen [ Sat, 26 January 2008 05:40 ] [ ID #1916989 ]

Re: regular expression negate a word (not character)

To add to the test cases: the regular expression must also be able to grep


snowbird tire
tired on a snow day
snow tire and regular tire
Summercoolness [ Sat, 26 January 2008 10:53 ] [ ID #1916993 ]

Re: regular expression negate a word (not character)

Summercool:
> to add to the test cases, the regular expression must be able to grep
> snow tire and regular tire

I presume there only the second tire has to be found.

This is my first try:

text = """
tire
word tire word
word retire word
word tired word
snowbird tire word
tired on a snow day word
snow tire and regular tire word
word snow tire word
word snow tire word
word some snowtires word
"""

import re

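def finder(text):
    # Capture the word (if any) directly before "tire", then reject
    # candidates whose preceding word ends with "snow".
    patt = re.compile(r"\b (\w*) \s* (tire)", re.VERBOSE)
    for mo in patt.finditer(text):
        if not mo.group(1).endswith("snow"):
            yield mo.start(2)

for start in finder(text):
    print(start)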

The (lazy) output is the starting position of each "tire" that matches:


1
11
28
43
63
73
120

Bye,
bearophile
bearophileHUGS [ Sat, 26 January 2008 11:46 ] [ ID #1916995 ]

Re: regular expression negate a word (not character)

On Jan 26, 1:16 am, Summercool <Summercooln... [at] gmail.com> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire
>
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression

Try the answer here:
http://mail.python.org/pipermail/tutor/2003-August/024902.html
Paddy [ Sat, 26 January 2008 12:34 ] [ ID #1916996 ]

Re: regular expression negate a word (not character)

Paddy:
> Try the answer here:
> http://mail.python.org/pipermail/tutor/2003-August/024902.html

But in the OP problem there can be variable-sized spaces in the
middle...

Bye,
bearophile
bearophileHUGS [ Sat, 26 January 2008 12:53 ] [ ID #1916997 ]

Re: regular expression negate a word (not character)

[A complimentary Cc of this posting was sent to
Summercool
<Summercoolness [at] gmail.com>], who wrote in article <27249159-9ff3-4887-acb7-99cf0d2582a8 [at] n20g2000hsh.googlegroups.com>:
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires

This does not describe the problem completely. What about

thisnow tire
snow; tire

etc? Anyway, one of the obvious modifications of

(^ | \b(?!snow) \w+ ) \W* tire

should work.
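
As a quick way to see which modifications might be needed, the pattern can be tried unmodified against the thread's examples. This is only a sketch in Python 3; `ilya` and `hit` are names invented here, and `re.VERBOSE` stands in for Perl's /x:

```python
import re

# The pattern as posted; re.VERBOSE lets the readable spacing stay.
ilya = re.compile(r"( ^ | \b(?!snow) \w+ ) \W* tire", re.VERBOSE)

def hit(line):
    """True if the unmodified pattern finds a match in line."""
    return ilya.search(line) is not None

for line in ["winter tire", "retire", "snow tire", "some snowtires",
             "snowbird tire", "thisnow tire", "snow; tire"]:
    print("%-16s %s" % (line, hit(line)))
```

As written, the pattern rejects every word that merely starts with "snow", so "snowbird tire" is not found, while "thisnow tire" is. Whether those two should match is exactly the part of the problem that the original description leaves open.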

Hope this helps,
Ilya
Ilya Zakharevich [ Sat, 26 January 2008 22:39 ] [ ID #1917007 ]

Re: regular expression negate a word (not character)

The code below at least passes your tests.

Hope it helps,
Greg

#! /usr/bin/perl

use warnings;
use strict;

use constant {
    MATCH    => 1,
    NO_MATCH => 0,
};

my @tests = (
    [ "winter tire"                => MATCH ],
    [ "tire"                       => MATCH ],
    [ "retire"                     => MATCH ],
    [ "tired"                      => MATCH ],
    [ "snowbird tire"              => MATCH ],
    [ "tired on a snow day"        => MATCH ],
    [ "snow tire and regular tire" => MATCH ],
    [ " tire"                      => MATCH ],
    [ "snow tire"                  => NO_MATCH ],
    [ "snow tire"                  => NO_MATCH ],
    [ "some snowtires"             => NO_MATCH ],
);

# "tire" at the start of the string, or preceded by text that differs
# from "snow" in at least one of its last four characters.
my $not_snow_tire = qr/
    ^ \s* tire |
    ([^w\s]|[^o]w|[^n]ow|[^s]now)\s*tire
/xi;

my $fail;
for (@tests) {
    my($str,$want) = @$_;
    my $got = $str =~ /$not_snow_tire/;
    my $pass = !!$want == !!$got;

    print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

    ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__

--
... all these cries of having 'abolished slavery,' of having 'preserved the
union,' of establishing a 'government by consent,' and of 'maintaining the
national honor' are all gross, shameless, transparent cheats -- so trans-
parent that they ought to deceive no one. -- Lysander Spooner, "No Treason"
gbacon [ Mon, 28 January 2008 19:53 ] [ ID #1918541 ]

Re: regular expression negate a word (not character)

Greg Bacon wrote:

> #! /usr/bin/perl
>
> use warnings;
> use strict;
>
> use constant {
> MATCH => 1,
> NO_MATCH => 0,
> };
>
> my @tests = (
> [ "winter tire", => MATCH ],
> [ "tire", => MATCH ],
> [ "retire", => MATCH ],
> [ "tired", => MATCH ],
> [ "snowbird tire", => MATCH ],
> [ "tired on a snow day", => MATCH ],
> [ "snow tire and regular tire", => MATCH ],
> [ " tire" => MATCH ],
> [ "snow tire" => NO_MATCH ],
> [ "snow tire" => NO_MATCH ],
> [ "some snowtires" => NO_MATCH ],
> );
> [...]

I negated the test, to make the regex simpler:

my $snow_tire = qr/
    snow [[:blank:]]* tire (?!.*tire)
/x;

my $fail;
for (@tests) {
    my($str,$want) = @$_;
    # Inverted test: the string passes when the "snow tire" pattern does NOT match.
    my $got = $str !~ /$snow_tire/;
    my $pass = !!$want == !!$got;

    print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

    ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__
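
For readers following the Python side of the thread, the same inverted test might be sketched as below. `[ \t]` stands in for Perl's `[[:blank:]]`, and the function name `wanted` is made up for this illustration:

```python
import re

# A "snow tire" that is not followed by any further "tire" on the line.
snow_tire = re.compile(r"snow[ \t]*tire(?!.*tire)")

def wanted(line):
    """Inverted test: accept the line unless the snow-tire pattern is found."""
    return snow_tire.search(line) is None

for line in ["winter tire", "snow tire", "snow tire and regular tire",
             "regular tire and snow tire", "no such word here"]:
    print("%-28s %s" % (line, wanted(line)))
```

Two consequences of inverting the test show up here: a line with no "tire" at all is accepted, and a line whose last "tire" is a snow tire is rejected even if an ordinary "tire" came earlier.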

--
Affijn, Ruud

"Gewoon is een tijger."
rvtol+news [ Mon, 28 January 2008 21:00 ] [ ID #1918543 ]

Re: regular expression negate a word (not character)

On Jan 25, 7:16 pm, Summercool <Summercooln... [at] gmail.com> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
>   tire
>
> but not
>
>   snow tire
>
> or
>
>   snowtire
>

Too bad pyparsing's not an option. Here's what it would look like:

data = """
Match:
> winter tire
> tire
> retire
> tired

But not match:
> snow tire
> snow tire
> some snowtires

snowbird tire
tired on a snow day
snow tire and regular tire

"""

from pyparsing import CaselessLiteral,Literal,line

# caseless wasn't really necessary but you never know
# when you'll run into a "Snow tire"
snow = CaselessLiteral("snow")
tire = Literal("tire")
tire.ignore(snow + tire)

for matchTokens,matchStart,matchEnd in tire.scanString(data):
    print(line(matchStart, data))


Prints:

> winter tire
> tire
> retire
> tired
snowbird tire
tired on a snow day
snow tire and regular tire

-- Paul
Paul McGuire [ Mon, 28 January 2008 22:37 ] [ ID #1918545 ]

Re: regular expression negate a word (not character)

In article <fnlfr0.1fk.1 [at] news.isolution.nl>,
Dr.Ruud <rvtol+news [at] isolution.nl> wrote:

: I negated the test, to make the regex simpler: [...]

Yes, your approach is simpler. I assumed from the "need it all
in one pattern" constraint that the OP is feeding the regular
expression to some other program that is looking for matches.

I dunno. Maybe it was the familiar compulsion with Perl to
attempt to cram everything into a single pattern.

Greg
--
What light is to the eyes -- what air is to the lungs -- what love is to
the heart, liberty is to the soul of man.
-- Robert Green Ingersoll
gbacon [ Tue, 29 January 2008 18:12 ] [ ID #1919476 ]

Re: regular expression negate a word (not character)

Greg Bacon wrote:
> Dr.Ruud:

>> I negated the test, to make the regex simpler: [...]
>
> Yes, your approach is simpler. I assumed from the "need it all
> in one pattern" constraint that the OP is feeding the regular
> expression to some other program that is looking for matches.

Yes, I assumed about the same, but thought it would be a nice
alternative anyways.
Happy Perling!

--
Affijn, Ruud

"Gewoon is een tijger."
rvtol+news [ Fri, 01 February 2008 11:36 ] [ ID #1922144 ]