Regex

I want some code to go through HTML pages, identify external links and
insert a small image

Any suggestions?
nugget1960 [ Thu, 05 October 2006 10:31 ] [ ID #1490383 ]

Re: Regex

On 10/05/2006 03:31 AM, Mark & Ingrid Nugent wrote:
> I want some code to go through HTML pages, identify external links and
> insert a small image
>
> Any suggestions?
>
>

Have you tried to write this yourself? It could be interesting and fun.


--
paduille.4058.mumia.w [at] earthlink.net
Posting Guidelines for comp.lang.perl.misc:
http://www.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
paduille.4058.mumia.w [ Thu, 05 October 2006 11:26 ] [ ID #1490384 ]

Re: Regex

Mark & Ingrid Nugent wrote:
> I want some code to go through HTML pages, identify external links and
> insert a small image
>
> Any suggestions?

There are multiple modules available on CPAN to assist with parsing
HTML. I recommend you head to http://search.cpan.org and search for
HTML::TokeParser
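
As a rough illustration of that suggestion, here is a minimal sketch that
walks a page with HTML::TokeParser, copies every token back out unchanged,
and appends an image after the closing </a> of each external link. The host
name, the icon markup and the sub name mark_external_links are made up for
the example, and "external" is assumed to mean an absolute http or https URL
on a different host.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical values: the site's own host name and the icon to insert.
my $own_host = 'www.example.com';
my $icon     = '<img src="/images/external.gif" alt="(external link)">';

sub mark_external_links {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot parse document";
    my $out         = '';
    my $in_external = 0;

    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        # Text tokens keep their source text at index 1; the other
        # token types keep it in the last element.
        $out .= $type eq 'T' ? $token->[1] : $token->[-1];

        if ($type eq 'S' && $token->[1] eq 'a') {
            my $href = $token->[2]{href};
            $href = '' unless defined $href;
            # Assumption: an absolute http(s) URL on another host is external.
            $in_external =
                ($href =~ m{^https?://([^/:]+)}i && lc($1) ne lc($own_host))
                ? 1 : 0;
        }
        elsif ($type eq 'E' && $token->[1] eq 'a' && $in_external) {
            $out .= " $icon";
            $in_external = 0;
        }
    }
    return $out;
}

my $page = '<p><a href="/local.html">local</a> and '
         . '<a href="http://search.cpan.org/">CPAN</a></p>';
print mark_external_links($page), "\n";
```

Because a parser is used, the link is found however its attributes are
ordered or quoted, which is where a single regex tends to break.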

Paul Lalli
Paul Lalli [ Thu, 05 October 2006 14:18 ] [ ID #1490385 ]

Re: REGEX

On 10/05/2006 05:39 AM, Mark & Ingrid Nugent wrote:
> Thanks for responding to my post. I have spent some time today trying to get my head around some Perl code. It doesn't seem to conform to the information on the web. For example, in the first section, what does the s{ mean? I was expecting this kind of format: $v =~ s///.
>
> Also, what is the gsi at the end?
>
> # document links: require target="_blank", append file size and icon
> $v =~ s{(<a\s+[^>]*?href="([^""]+\.(avi|bmp|csv|dat|doc|dot|eps|exe|gif|jpg|mov|mp3|mpg|pdf|pps|ppt|pub|rtf|tif|txt|xls|zip))")([^>]*>.*?</a>)}{$1 target="_blank"$4 <small class="note">new window <!--#fsize virtual="$2" --> <!--#include virtual="/ssi/file/icon/$3.inc" --></small>}gsi;
>
>
> # TRIM document links: require target="_blank", append file size and icon
> $v =~ s{(<a\s+[^>]*?href="([^""]+\.(trf|tr5))")([^>]*>.*?</a>)}{$1 target="_blank"$4 <small class="note">new TRIM window <!--#include virtual="/ssi/file/icon/$3.inc" --></small>}gsi;
>
>
> # if it already had a target, that was all that is required
> $v =~ s{\s*target="_blank"(\s*target=".*?")}{$1}gsi;
>
>
> # remove file size for external documents
> $v =~ s|<!--#fsize\s*virtual="http://.*?"\s*-->\s*||gs;
>
>
> Mark Nugent
>
>

(Re-directed to alt.perl)

Keep conversations in the group.

s{}{} is the same as s/// except that the delimiter characters are
different.
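
A small sketch of why the brace form is handy for HTML and URLs: with / as
the delimiter every literal slash in the pattern has to be escaped, with
braces it does not. The sample string and variable names are invented for
the illustration.

```perl
use strict;
use warnings;

my $slashes = 'see <a href="http://example.com/a/b">here</a>';
my $braces  = $slashes;

# With / as the delimiter, each literal slash in the pattern is escaped.
$slashes =~ s/http:\/\/example\.com\/a\/b/LINK/;

# With braces as delimiters, the slashes can be written as they are.
$braces =~ s{http://example\.com/a/b}{LINK};

print "$slashes\n";
print "both forms give the same result\n" if $slashes eq $braces;
```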

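The /gsi options mean: g (global) replaces every match instead of only
the first, s (single-line) lets . match newlines as well, and i makes
the match insensitive to case.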
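
To see what each option contributes, the sketch below runs one substitution
on the same made-up string three times: with all three options, without /s,
and without /g. The pattern is a simplified stand-in, not the one from the
quoted code.

```perl
use strict;
use warnings;

# Two links: the first is upper case and has a newline inside it.
my $text = "<A HREF=\"x\">one\ntwo</A> <a href=\"y\">three</a>";

# /i lets 'a' match 'A', /s lets '.' cross the newline,
# /g replaces every match rather than only the first.
(my $all = $text) =~ s{<a\s[^>]*>.*?</a>}{[link]}gsi;

# Without /s the first link is skipped: '.' stops at the newline.
(my $no_s = $text) =~ s{<a\s[^>]*>.*?</a>}{[link]}gi;

# Without /g only the first match is replaced.
(my $no_g = $text) =~ s{<a\s[^>]*>.*?</a>}{[link]}si;

print "gsi: $all\n";
print "gi:  $no_s\n";
print "si:  $no_g\n";
```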

Read the perl documentation:
perldoc perlrequick
perldoc perlre


--
paduille.4058.mumia.w [at] earthlink.net
Posting Guidelines for comp.lang.perl.misc:
http://www.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
paduille.4058.mumia.w [ Thu, 05 October 2006 16:05 ] [ ID #1490386 ]

Re: Regex

"Mark & Ingrid Nugent" <nugget1960 [at] optusnet.com.au> writes:

> I want some code to go through HTML pages, identify external links and
> insert a small image
>
> Any suggestions?

HTML::Parser is a good start.

sherm--

--
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net
Sherm Pendley [ Thu, 05 October 2006 16:53 ] [ ID #1490387 ]