regular expression negate a word (not character)


On 26.01.2008 02:16:31 by Summercoolness

Somebody who is a regular expression guru... how do you negate a word
and grep for all words that are

tire

but not

snow tire

or

snowtire

so for example, it will grep for

winter tire
tire
retire
tired

but will not grep for

snow tire
snow tire
some snowtires

I need to do it in one regular expression.

Re: regular expression negate a word (not character)

On 26.01.2008 03:15:36 by Summercoolness

On Jan 25, 5:16 pm, Summercool wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire

I could think of something like

/[^s][^n][^o][^w]\s*tire/i

But what if it is not "snow" but some 20-character word? Do we need
to repeat it 20 times to negate it? Is there any shorter way?
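Ben Morrow's reply below points out that "snoo tire" no longer matches. A quick check in Python's re (assumed here to treat this character-class syntax the same way Perl does) shows two further problems: the pattern demands four characters before "tire", so plain "tire" and "retire" are rejected, and `[^w]` can match the space itself, so "snow tire" is accepted by starting the match one character later.

```python
import re

# The single-character-class attempt from the post, translated to Python.
naive = re.compile(r"[^s][^n][^o][^w]\s*tire", re.IGNORECASE)

samples = ["winter tire", "tire", "retire", "snoo tire", "snow tire", "snowtire"]
results = {s: bool(naive.search(s)) for s in samples}
for s, matched in results.items():
    print(f"{s!r:14} {'match' if matched else 'no match'}")

# "tire" and "retire": fewer than four characters precede "tire".
# "snoo tire": [^o] fails on the second "o".
# "snow tire": the match starts at "now " and [^w] consumes the space.
```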

Re: regular expression negate a word (not character)

On 26.01.2008 03:42:19 by paulaireilly

On Jan 25, 8:16 pm, Summercool wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
...
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression

You might be looking for a negative lookahead assertion. Look that up
in a handy source. The syntax is approximately

(?!foo) --> will match at any place "between" chars that is not
immediately followed by "foo".

Now, you have to add in the "bar" afterwards. But remember that
(?!foo) takes up zero width. And be careful about /.*/ matching
anything, including zero chars.

I would say more, but this looks sorta like a homework assignment to
me. So this is a "hint" post. I or someone else could do a "solution"
post later, but just having the phrase "negative lookahead assertion"
to look up on the Web or in a book index will probably answer all your
questions.

Note that what you *really* want is a "negative lookbehind assertion",
to put right in front of the "tire" in your example (or my "bar"), but
I think those won't be working until Perl 6.
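The zero-width warning above is easy to trip over. In this Python sketch (Python's `(?! )` syntax is the same as Perl's), putting `(?!snow)` directly in front of `\s*tire` still matches "snow tire": the search simply starts at the space, where "snow" does not follow. Anchoring the lookahead at `^` instead gives a line filter; note that it rejects the whole line, so the later test case "snow tire and regular tire" would be rejected too.

```python
import re

# Pitfall: the lookahead is tested where the match starts, and the search
# can start at the space before "tire", where "snow" does not follow.
naive_lookahead = re.compile(r"(?!snow)\s*tire")
print(naive_lookahead.search("snow tire"))  # matches " tire"

# Line-level use: anchored at ^, reject the whole line if "snow",
# optional whitespace, "tire" appears anywhere in it.
line_filter = re.compile(r"^(?!.*snow\s*tire).*tire")

for line in ["winter tire", "tire", "retire", "tired",
             "snow tire", "some snowtires", "snow tire and regular tire"]:
    print(f"{line!r:30} {bool(line_filter.search(line))}")
```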

Re: regular expression negate a word (not character)

On 26.01.2008 04:27:31 by Ben Morrow

Quoth paulaireilly:
>
> You might be looking for a negative lookahead assertion. Look

>
> Note that what you *really* want is a "negative lookbehind assertion",
> to put right in front of the "tire" in your example (or my "bar"), but
> I think those won't be working until Perl 6.

No, they work perfectly well in Perl 5, at least for fixed-length
strings. Syntax is (?<= ) and (?<! ). Perl 5.10 also lets you do
variable-length positive (but not negative) lookbehind at the start of
the match using \K.

Ben

Re: regular expression negate a word (not character)

On 26.01.2008 04:37:53 by Ben Morrow

[newsgroups line fixed, f'ups set to clpm]

Quoth Summercool:
> On Jan 25, 5:16 pm, Summercool wrote:
> > somebody who is a regular expression guru... how do you negate a word
> > and grep for all words that is
> >
> > tire
> >
> > but not
> >
> > snow tire
> >
> > or
> >
> > snowtire
>
> i could think of something like
>
> /[^s][^n][^o][^w]\s*tire/i
>
> but what if it is not snow but some 20 character-word, then do we need
> to do it 20 times to negate it? any shorter way?

This is no good, since 'snoo tire' fails to match even though you want
it to. You need something more like

/ (?: [^s]... | [^n].. | [^o]. | [^w] | ^ ) \s* tire /ix

but that gets *really* tedious for long strings, unless you generate it.

Ben
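The alternation above can be generated rather than typed. The sketch below is a hypothetical Python helper (the name `not_preceded_by` is made up here) that builds one alternative per suffix of the excluded word, similar in shape to the pattern Greg Bacon posts later in the thread, but with `^` allowed in front of every alternative so that a partial word at the very start (e.g. "now tire") still counts as not-snow. The class for the last letter also excludes whitespace; otherwise the space in "snow tire" would satisfy `[^w]`. Assumptions: the excluded word contains no whitespace, and matching is case-sensitive.

```python
import re

def not_preceded_by(word, target):
    """Hypothetical helper: build a regex matching `target` unless the text
    before it (ignoring whitespace in between) ends with `word`.

    Assumes `word` is non-empty and contains no whitespace.
    """
    n = len(word)
    alternatives = []
    for k in range(n):
        # Keep the last k letters of `word`, and require that the character
        # in front of them differs from word[n-1-k] (or that we are at ^).
        kept = re.escape(word[n - k:])
        differs = re.escape(word[n - 1 - k])
        # For k == 0 also exclude whitespace, so the gap itself cannot
        # stand in for the "different" character.
        if k == 0:
            differs += r"\s"
        alternatives.append(r"(?:^|[^%s])%s" % (differs, kept))
    return r"(?:%s)\s*%s" % ("|".join(alternatives), re.escape(target))


pattern = re.compile(not_preceded_by("snow", "tire"))
print(pattern.pattern)

should_match = ["winter tire", "tire", "retire", "tired", "snowbird tire",
                "tired on a snow day", "snow tire and regular tire", " tire",
                "now tire"]
should_not_match = ["snow tire", "snow  tire", "some snowtires"]
for s in should_match + should_not_match:
    print(f"{s!r:30} {bool(pattern.search(s))}")
```

The generated expression grows linearly with the length of the excluded word, which is the tedium Ben mentions, just moved into code.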

Re: regular expression negate a word (not character)

On 26.01.2008 05:40:23 by Mark Tolonen

"Summercool" wrote in message
news:27249159-9ff3-4887-acb7-99cf0d2582a8@n20g2000hsh.googlegroups.com...
>
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire
>
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression
>

What you want is a negative lookbehind assertion:

>>> re.search(r'(?<![...]
>>> re.search(r'(?<![...]
<_sre.SRE_Match object at 0x00FCD608>

Unfortunately you want variable whitespace:

>>> re.search(r'(?<![...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\dev\python\lib\re.py", line 134, in search
    return _compile(pattern, flags).search(string)
  File "C:\dev\python\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern
>>>

Python doesn't support lookbehind assertions that can vary in size. This
doesn't work either:

>>> re.search(r'(?<![...]
<_sre.SRE_Match object at 0x00F93480>

Here's some code (not heavily tested) that implements a variable lookbehind
assertion, and a function to mark matches in a string to demonstrate it:

### BEGIN CODE ###

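import re

def finditerexcept(pattern, notpattern, string):
    # The alternation tries `notpattern` first, so wherever it matches,
    # the whole excluded span is consumed and never reported as `pattern`.
    for matchobj in re.finditer('(?:%s)|(?:%s)' % (notpattern, pattern), string):
        # Keep only the matches that came from `pattern`.
        if not re.match(notpattern, matchobj.group()):
            yield matchobj

def markexcept(pattern, notpattern, string):
    # Rebuild `string`, wrapping each accepted match in [brackets].
    substrings = []
    current = 0

    for matchobj in finditerexcept(pattern, notpattern, string):
        substrings.append(string[current:matchobj.start()])
        substrings.append('[' + matchobj.group() + ']')
        current = matchobj.end()

    substrings.append(string[current:])
    return ''.join(substrings)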

### END CODE ###

>>> sample='''winter tire
... tire
... retire
... tired
... snow tire
... snow tire
... some snowtires
... '''
>>> print(markexcept('tire', r'snow\s*tire', sample))
winter [tire]
[tire]
re[tire]
[tire]d
snow tire
snow tire
some snowtires

--Mark
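The fixed-width restriction described above still applies to Python's built-in re module. A short Python 3 sketch (the sample strings are illustrative) shows both halves: a fixed-width negative lookbehind compiles and rejects "snowtire", but it only inspects the four characters right before "tire", so the space in "snow tire" slips past it; putting `\s*` inside the lookbehind raises `re.error` at compile time.

```python
import re

# Fixed-width negative lookbehind: accepted by Python's re.
fixed = re.compile(r"(?<!snow)tire")
results = {s: bool(fixed.search(s)) for s in ["snowtire", "retire", "snow tire"]}
print(results)
# "snow tire" still matches: the lookbehind only sees "now ",
# the four characters immediately before "tire".

# Variable-width lookbehind (\s* inside it) is rejected when compiling.
compile_error = None
try:
    re.compile(r"(?<!snow\s*)tire")
except re.error as exc:
    compile_error = exc
print("re.error:", compile_error)
```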

Re: regular expression negate a word (not character)

On 26.01.2008 10:53:25 by Summercoolness

To add to the test cases, the regular expression must be able to grep:

snowbird tire
tired on a snow day
snow tire and regular tire

Re: regular expression negate a word (not character)

On 26.01.2008 11:46:59 by bearophileHUGS

Summercool:
> to add to the test cases, the regular expression must be able to grep
> snow tire and regular tire

I presume only the second tire has to be found there.

This is my first try:

text = """
tire
word tire word
word retire word
word tired word
snowbird tire word
tired on a snow day word
snow tire and regular tire word
word snow tire word
word snow tire word
word some snowtires word
"""

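import re

def finder(text):
    patt = re.compile(r"\b (\w*) \s* (tire)", re.VERBOSE)
    for mo in patt.finditer(text):
        # Group 1 holds the word characters just before "tire"
        # (e.g. "winter", "re", "snow"); skip it when it ends in "snow".
        if not mo.group(1).endswith("snow"):
            yield mo.start(2)

for pos in finder(text):
    print(pos)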

The (lazy) output is the starting position of each "tire" that matches:

1
11
28
43
63
73
120

Bye,
bearophile

Re: regular expression negate a word (not character)

On 26.01.2008 12:34:16 by Paddy

On Jan 26, 1:16 am, Summercool wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire
>
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression

Try the answer here:
http://mail.python.org/pipermail/tutor/2003-August/024902.html

Re: regular expression negate a word (not character)

On 26.01.2008 12:53:42 by bearophileHUGS

Paddy:
> Try the answer here:
> http://mail.python.org/pipermail/tutor/2003-August/024902.html

But in the OP's problem there can be variable-width whitespace in the
middle...

Bye,
bearophile

Re: regular expression negate a word (not character)

On 26.01.2008 22:39:02 by Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Summercool], who wrote in article <27249159-9ff3-4887-acb7-99cf0d2582a8@n20g2000hsh.googlegroups.com>:
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires

This does not describe the problem completely. What about

thisnow tire
snow; tire

etc? Anyway, one of the obvious modifications of

(^ | \b(?!snow) \w+ ) \W* tire

should work.

Hope this helps,
Ilya
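The "obvious modifications" are left open above. Running the pattern as posted through Python's re (with `re.VERBOSE`, so the spaces inside it are ignored) shows what needs modifying: `(?!snow)` rejects every word that merely starts with "snow", so "snowbird tire" fails. The second pattern is one hypothetical modification, not from the thread: the lookahead refuses a word only when "snow", optional non-word characters and "tire" follow from where the word starts. On the test cases collected in this thread it gives the requested answers.

```python
import re

# Ilya's pattern as posted (re.VERBOSE ignores the spaces inside it).
posted = re.compile(r"(^ | \b(?!snow) \w+ ) \W* tire", re.VERBOSE)

# Hypothetical modification: refuse a word only if "snow", optional
# non-word characters and "tire" follow from where the word starts.
modified = re.compile(r"(^ | \b(?!\w*snow\W*tire) \w+ ) \W* tire", re.VERBOSE)

cases = ["winter tire", "tire", "retire", "tired", "snowbird tire",
         "tired on a snow day", "snow tire and regular tire", " tire",
         "snow tire", "snow  tire", "some snowtires"]
for s in cases:
    print(f"{s!r:30} posted={bool(posted.search(s))} "
          f"modified={bool(modified.search(s))}")
```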

Re: regular expression negate a word (not character)

On 28.01.2008 19:53:42 by gbacon

The code below at least passes your tests.

Hope it helps,
Greg

#! /usr/bin/perl

use warnings;
use strict;

use constant {
    MATCH => 1,
    NO_MATCH => 0,
};

my @tests = (
    [ "winter tire", => MATCH ],
    [ "tire", => MATCH ],
    [ "retire", => MATCH ],
    [ "tired", => MATCH ],
    [ "snowbird tire", => MATCH ],
    [ "tired on a snow day", => MATCH ],
    [ "snow tire and regular tire", => MATCH ],
    [ " tire" => MATCH ],
    [ "snow tire" => NO_MATCH ],
    [ "snow tire" => NO_MATCH ],
    [ "some snowtires" => NO_MATCH ],
);

my $not_snow_tire = qr/
    ^ \s* tire |
    ([^w\s]|[^o]w|[^n]ow|[^s]now)\s*tire
/xi;

my $fail;
for (@tests) {
    my($str,$want) = @$_;
    my $got = $str =~ /$not_snow_tire/;
    my $pass = !!$want == !!$got;

    print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

    ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__

--
... all these cries of having 'abolished slavery,' of having 'preserved the
union,' of establishing a 'government by consent,' and of 'maintaining the
national honor' are all gross, shameless, transparent cheats -- so trans-
parent that they ought to deceive no one. -- Lysander Spooner, "No Treason"

Re: regular expression negate a word (not character)

On 28.01.2008 21:00:55 by rvtol+news

Greg Bacon wrote:

> #! /usr/bin/perl
>
> use warnings;
> use strict;
>
> use constant {
> MATCH => 1,
> NO_MATCH => 0,
> };
>
> my @tests = (
> [ "winter tire", => MATCH ],
> [ "tire", => MATCH ],
> [ "retire", => MATCH ],
> [ "tired", => MATCH ],
> [ "snowbird tire", => MATCH ],
> [ "tired on a snow day", => MATCH ],
> [ "snow tire and regular tire", => MATCH ],
> [ " tire" => MATCH ],
> [ "snow tire" => NO_MATCH ],
> [ "snow tire" => NO_MATCH ],
> [ "some snowtires" => NO_MATCH ],
> );
> [...]

I negated the test, to make the regex simpler:

my $snow_tire = qr/
    snow [[:blank:]]* tire (?!.*tire)
/x;

my $fail;
for (@tests) {
    my($str,$want) = @$_;
    my $got = $str !~ /$snow_tire/;
    my $pass = !!$want == !!$got;

    print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

    ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__
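Greg's follow-up assumes the OP needs a single pattern. As a hedged Python sketch (translating `[[:blank:]]` to `[ \t]` and the combined pattern are assumptions, not from the thread), the negated test above can be folded into one expression by putting it inside a negative lookahead anchored at `^` and appending `.*tire`. The `.*tire` part matters: the bare negation also accepts lines that contain no "tire" at all. Both forms still check only the last "tire" on a line, so a line such as "regular tire or snow tire" is rejected.

```python
import re

# Ruud's pattern in Python; POSIX [[:blank:]] (space or tab) becomes [ \t].
snow_tire = re.compile(r"snow[ \t]*tire(?!.*tire)")

def ruud_accepts(line):
    # The negated test from the post: accept unless a "snow tire" with
    # no later "tire" on the line is found.
    return not snow_tire.search(line)

# Hypothetical single-pattern form: the same negated test as a lookahead
# at ^, plus ".*tire" so the line must actually contain "tire".
one_pattern = re.compile(r"^(?!.*snow[ \t]*tire(?!.*tire)).*tire")

tests = {
    "winter tire": True, "tire": True, "retire": True, "tired": True,
    "snowbird tire": True, "tired on a snow day": True,
    "snow tire and regular tire": True, " tire": True,
    "snow tire": False, "snow  tire": False, "some snowtires": False,
}
for line, want in tests.items():
    got = bool(one_pattern.search(line))
    print(f"{line!r:30} {'PASS' if got == want else 'FAIL'}")

# The bare negation also accepts a line with no "tire" at all.
print(ruud_accepts("winter boots"), bool(one_pattern.search("winter boots")))
```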

--
Affijn, Ruud

"Ordinary is a tiger."

Re: regular expression negate a word (not character)

On 28.01.2008 22:37:36 by Paul McGuire

On Jan 25, 7:16 pm, Summercool wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
>   tire
>
> but not
>
>   snow tire
>
> or
>
>   snowtire
>

Too bad pyparsing's not an option. Here's what it would look like:

data = """
Match:
> winter tire
> tire
> retire
> tired

But not match:
> snow tire
> snow tire
> some snowtires

snowbird tire
tired on a snow day
snow tire and regular tire

"""

from pyparsing import CaselessLiteral, Literal, line

# caseless wasn't really necessary but you never know
# when you'll run into a "Snow tire"
snow = CaselessLiteral("snow")
tire = Literal("tire")
tire.ignore(snow + tire)

for matchTokens, matchStart, matchEnd in tire.scanString(data):
    print(line(matchStart, data))


Prints:

> winter tire
> tire
> retire
> tired
snowbird tire
tired on a snow day
snow tire and regular tire

-- Paul

Re: regular expression negate a word (not character)

On 29.01.2008 18:12:09 by gbacon

In article ,
Dr.Ruud wrote:

: I negated the test, to make the regex simpler: [...]

Yes, your approach is simpler. I assumed from the "need it all
in one pattern" constraint that the OP is feeding the regular
expression to some other program that is looking for matches.

I dunno. Maybe it was the familiar compulsion with Perl to
attempt to cram everything into a single pattern.

Greg
--
What light is to the eyes -- what air is to the lungs -- what love is to
the heart, liberty is to the soul of man.
-- Robert Green Ingersoll

Re: regular expression negate a word (not character)

On 01.02.2008 11:36:11 by rvtol+news

Greg Bacon wrote:
> Dr.Ruud:

>> I negated the test, to make the regex simpler: [...]
>
> Yes, your approach is simpler. I assumed from the "need it all
> in one pattern" constraint that the OP is feeding the regular
> expression to some other program that is looking for matches.

Yes, I assumed about the same, but thought it would be a nice
alternative anyway.
Happy Perling!

--
Affijn, Ruud