Regular Expressions Question

Dear All,

This is more of a generic question on regular expressions as my
program is working fine but I was just curious.

Say you have the following URLs:

http://www.test.com/image.gif
http://www.test.com/?src=image.gif?width=12

I want to get the type of the image, i.e. the string gif.

For the first URL the regular expression .*\.([a-z]{3}) will do the
trick while for the second one I am using .*=\([a-z]{3})?.*.

Ignoring the fact that the REs can be written better my question is:

If I put them together, that is write them as

.*\.([a-z]{3})|.*=\([a-z]{3})?.*

perl thinks that the or only applies to the characters immediately
surrounding it (in this case ) and .).

Is there a way to say here is a whole RE, here is another and match
the first or the second?

Regards,
George


--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
cityuk [ Sun, 10 April 2011 13:05 ] [ ID #2057885 ]

Re: Regular Expressions Question

cityuk wrote:
> Dear All,

Hello,

> This is more of a generic question on regular expressions as my
> program is working fine but I was just curious.
>
> Say you have the following URLs:
>
> http://www.test.com/image.gif
> http://www.test.com/?src=image.gif?width=12
>
> I want to get the type of the image, i.e. the string gif.
>
> For the first URL the regular expression .*\.([a-z]{3}) will do the
> trick while for the second one I am using .*=\([a-z]{3})?.*.
>
> Ignoring the fact that the REs can be written better my question is:
>
> If I put them together, that is write them as
>
> .*\.([a-z]{3})|.*=\([a-z]{3})?.*
>
> perl thinks that the or only applies to the characters immediately
> surrounding it (in this case ) and .).

No. The alternation applies to the complete pattern '.*\.([a-z]{3})' OR
'.*=\([a-z]{3})?.*'.
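
A minimal sketch of that precedence rule. The two alternatives are simplified patterns written for illustration, not the ones quoted above (the second of those, as quoted, has an unbalanced parenthesis and would not compile), and the sub name image_type is hypothetical.

```perl
use strict;
use warnings;

# Each side of the top-level | is a complete alternative:
#   left:  a dot and three letters at the very end of the string
#   right: an '=', a file name, a dot, three letters, then '?' or '&'
my $type_re = qr/\.([a-z]{3})\z|=[^=?&]*\.([a-z]{3})[?&]/;

# Hypothetical helper: returns the captured extension, or undef if no match.
sub image_type {
    my ($url) = @_;
    return undef unless $url =~ $type_re;
    return defined $1 ? $1 : $2;   # only the alternative that matched sets its group
}

print image_type('http://www.test.com/image.gif'), "\n";
print image_type('http://www.test.com/?src=image.gif?width=12'), "\n";
```

To confine an alternation to part of a pattern you need a group, as in a(?:b|c)d; without one it spans everything on either side.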



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein

jwkrahn [ Mon, 11 April 2011 00:03 ] [ ID #2057886 ]

Re: Regular Expressions Question

On 04/10/2011 04:05 AM, cityuk wrote:
> Is there a way to say here is a whole RE, here is another and match
> the first or the second?

Jeffrey E.F. Friedl, 2006, "Mastering Regular Expressions", 3 e.,
O'Reilly Media, ISBN 978-0-596-52812-6.

http://oreilly.com/catalog/9780596528126/


HTH,

David

David Christensen [ Mon, 11 April 2011 02:08 ] [ ID #2057923 ]

Re: Regular Expressions Question

On Sunday 10 Apr 2011 14:05:49 cityuk wrote:
> Dear All,
>
> This is more of a generic question on regular expressions as my
> program is working fine but I was just curious.
>
> Say you have the following URLs:
>
> http://www.test.com/image.gif
> http://www.test.com/?src=image.gif?width=12
>

Don't use regular expressions to parse URLs - instead use URI.pm:

http://cpan.uwinnipeg.ca/dist/URI

Regards,

Shlomi Fish

--
------------------------------------------------------------ -----
Shlomi Fish http://www.shlomifish.org/
http://www.shlomifish.org/humour/ways_to_do_it.html

Electrical Engineering studies. In the Technion. Been there. Done that. Forgot
a lot. Remember too much.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

Shlomi Fish [ Mon, 11 April 2011 07:43 ] [ ID #2057924 ]

Re: Regular Expressions Question

On 11/04/2011 06:43, Shlomi Fish wrote:
> On Sunday 10 Apr 2011 14:05:49 cityuk wrote:
>>
>> This is more of a generic question on regular expressions as my
>> program is working fine but I was just curious.
>>
>> Say you have the following URLs:
>>
>> http://www.test.com/image.gif
>> http://www.test.com/?src=image.gif?width=12
>>
>
> Don't use regular expressions to parse URLs - instead use URI.pm:
>
> http://cpan.uwinnipeg.ca/dist/URI

I agree. The program below shows a subroutine which will extract the
file type from either form of URL. It first checks to see if there is a
'src' option in the query, using this for the file name if so; otherwise
it uses the last segment of the URL path. The file type is
extracted by capturing all trailing non-dot characters from the file
name.

(I assume your second address should read
<http://www.test.com/?src=image.gif&width=12> with an ampersand instead
of a second question mark?)

HTH,

Rob


use strict;
use warnings;

use URI;

sub filetype_from_url {
    my $url  = URI->new($_[0]);
    my %form = $url->query_form;

    # Prefer the 'src' query parameter; otherwise use the last path segment
    my $file = $form{src} || ($url->path_segments)[-1];

    # In list context this returns the text after the final dot
    return $file =~ /([^.]+)\z/;
}

print filetype_from_url('http://www.test.com/image.gif'), "\n";
print filetype_from_url('http://www.test.com/?src=image.gif&width=12'), "\n";





Rob Dixon [ Mon, 11 April 2011 17:51 ] [ ID #2057927 ]

Re: Regular Expressions Question

On Apr 10, 11:03 pm, jwkr... [at] shaw.ca ("John W. Krahn") wrote:
> cityuk wrote:
> > Dear All,
>
> Hello,
>
> > This is more of a generic question on regular expressions as my
> > program is working fine but I was just curious.
>
> > Say you have the following URLs:
>
> > http://www.test.com/image.gif
> > http://www.test.com/?src=image.gif?width=12
>
> > I want to get the type of the image, i.e. the string gif.
>
> > For the first URL the regular expression .*\.([a-z]{3}) will do the
> > trick while for the second one I am using .*=\([a-z]{3})?.*.
>
> > Ignoring the fact that the REs can be written better my question is:
>
> > If I put them together, that is write them as
>
> > .*\.([a-z]{3})|.*=\([a-z]{3})?.*
>
> > perl thinks that the or only applies to the characters immediately
> > surrounding it (in this case ) and .).
>
> No. The alternation applies to the complete pattern '.*\.([a-z]{3})' OR

OK. So if I understood you correctly, given the following (actual)
URLs

http://beta.images.theglobeandmail.com/archive/01258/election_heads__1258993cl-3.jpg
http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun.com/news/decision2011/2011/04/06/300_harper_boring.jpg&size=248x186

the following pattern

^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3})&.*$

should match them both. Am I correct?

Regards,
George


> '.*=\([a-z]{3})?.*'.
>
> John
> --
> Any intelligent fool can make things bigger and
> more complex... It takes a touch of genius -
> and a lot of courage to move in the opposite
> direction. -- Albert Einstein


gkl [ Mon, 11 April 2011 16:21 ] [ ID #2057986 ]

Re: Regular Expressions Question

On Apr 11, 7:21 am, gklc... [at] googlemail.com (gkl) wrote:
> On Apr 10, 11:03 pm, jwkr... [at] shaw.ca ("John W. Krahn") wrote:
> > > This is more of a generic question on regular expressions as my
> > > program is working fine but I was just curious.
>
> > > Say you have the following URLs:
>
> > > http://www.test.com/image.gif
> > > http://www.test.com/?src=image.gif?width=12
....
>
> OK. So if I understood you correctly, given the following (actual)
> URLs
>
> http://beta.images.theglobeandmail.com/archive/01258/election_heads__...
> http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun....
>
> the following pattern
>
> ^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3})&.*$
>
> should match them both. Am I correct?
>

No, there is at least one problem. In your first
alternative, the '.*' will also match the literal '?'
which the second alternative is matching.

See: 'perldoc perlretut' for a review.

[ The URI module which was mentioned will
be a quicker solution and will work in all
cases. ]

--
Charles DeRykus


derykus [ Tue, 12 April 2011 19:28 ] [ ID #2058001 ]

Re: Regular Expressions Question

On 11/04/2011 15:21, gkl wrote:
>
> OK. So if I understood you correctly, given the following (actual)
> URLs
>
> http://beta.images.theglobeandmail.com/archive/01258/election_heads__1258993cl-3.jpg
> http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun.com/news/decision2011/2011/04/06/300_harper_boring.jpg&size=248x186
>
> the following pattern
>
> ^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3})&.*$
>
> should match them both. Am I correct?

First of all I notice that the src parameter in your second URL's query
is now an absolute URL, whereas your first post had just a file name.
Since we cannot anticipate how far and in which direction your problem
may grow, it is your responsibility to present the entirety of the
possibilities as you know them. Otherwise you will be engaging the world
in a goose chase of the wildest sort.

If you mean

/^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3})&.*$/

then you must apply the /x modifier, otherwise the spaces at the end of
the first option and at the beginning of the second form part of the
expressions.
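
A small sketch of the difference, assuming the pattern text exactly as posted. The sub name captured_type and the variable names are made up for the example; the pattern is kept in a single-quoted string so that the same text can be compiled with and without /x.

```perl
use strict;
use warnings;

# The pattern text from the question, spaces around | included.
my $spaced = '^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3})&.*$';

my $literal  = qr/$spaced/;    # the spaces must match real space characters
my $extended = qr/$spaced/x;   # the spaces are ignored

# Hypothetical helper: the captured extension, or undef if there is no match.
sub captured_type {
    my ($re, $url) = @_;
    return undef unless $url =~ $re;
    return defined $1 ? $1 : $2;
}

my @urls = (
    'http://www.test.com/image.gif',
    'http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun.com/news/decision2011/2011/04/06/300_harper_boring.jpg&size=248x186',
);

for my $url (@urls) {
    for my $case (['without /x', $literal], ['with /x', $extended]) {
        my ($label, $re) = @$case;
        my $type = captured_type($re, $url);
        printf "%-10s %s\n", $label, defined $type ? $type : 'no match';
    }
}
```

Without /x neither alternative can succeed here: the first needs a space after the end of the string, and the second needs a space before its start.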

As far as I can think,

/^\s*.*\.([a-zA-z]{3})$/

is exactly equivalent to

/\.([a-zA-z]{3})$/

which, presumably as you intend, will match the first URL and capture
'jpg'. It will fail to match the second URL.
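
One detail in these patterns that the thread does not raise: the class [a-zA-z] ends in a lowercase z, so its second range A-z also covers the ASCII punctuation that sits between Z and a ([ \ ] ^ _ and the backtick). A short check, using made-up file names and assuming [a-zA-Z] is what was intended:

```perl
use strict;
use warnings;

my $loose  = qr/\.([a-zA-z]{3})$/;   # as written in the thread
my $strict = qr/\.([a-zA-Z]{3})$/;   # letters only (assumed intent)

# 'photo.j_g' is an invented name with an underscore in the extension.
for my $name ('photo.jpg', 'photo.j_g') {
    printf "%-10s loose=%s strict=%s\n", $name,
        ($name =~ $loose  ? 'match' : 'no match'),
        ($name =~ $strict ? 'match' : 'no match');
}
```

The loose class accepts the underscore; the strict one does not.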


While the first option seemed to be considering the possibility of
irrelevant leading spaces, the second

/^\S*\?\S*\.([a-zA-z]{3})&.*$/

is insisting on a sequence of non-spaces from the beginning of the
string up to the last possible question mark. Then another sequence of
non-spaces up to the last possible dot, followed by three alphas and an
ampersand. The subsequent /.*$/ does nothing.

I suggest to you that simply

/.*\.([a-z]+)/i

will match all of the four URLs you have posted so far, and capture from
them exactly what you expect. Only you can know the full extent of your
problem, and why you refuse the advice you have been offered.
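
For reference, a short script that applies this pattern to the four URLs posted in the thread. The sub name type_of is invented for the example.

```perl
use strict;
use warnings;

# Greedy .* runs to the end of the string and backs off to the last dot
# that is followed by at least one letter; /i ignores letter case.
sub type_of {
    my ($url) = @_;
    my ($type) = $url =~ /.*\.([a-z]+)/i;
    return $type;    # undef if nothing matched
}

my @urls = (
    'http://www.test.com/image.gif',
    'http://www.test.com/?src=image.gif?width=12',
    'http://beta.images.theglobeandmail.com/archive/01258/election_heads__1258993cl-3.jpg',
    'http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun.com/news/decision2011/2011/04/06/300_harper_boring.jpg&size=248x186',
);

print type_of($_), "\n" for @urls;
```

This should print gif, gif, jpg and jpg. Note that a URL with no file extension, such as http://www.test.com/, would yield 'com', which is one reason the URI approach is the more robust one.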

I will continue to try to help you.

Rob

Rob Dixon [ Wed, 13 April 2011 00:25 ] [ ID #2058042 ]