HTML::Parser

I have been using HTML::Parser for a few years now and would like to resolve
the issue of processing malformed HTML, that is, HTML with missing start and
end tags. In particular, I'm running into many web pages which are missing
the closing </A> and </SCRIPT> tags.

Is there an easy way for HTML::Parser to insert implied tags, as is now done
by HTML::TreeBuilder and HTML::Element?



Gil.Vidals [at] PositionResearch.com
Position Research, Inc.
Search engine results by research
tel: (760) 480-8291 fax: (760) 480-8271
www.PositionResearch.com <http://www.positionresearch.com/>

gil.vidals [ Tue, 21 March 2006 01:07 ] [ ID #1239818 ]

Re: HTML::Parser

"Gil Vidals" <gil.vidals [at] positionresearch.com> writes:

> I have been using HTML::Parser for a few years now and would like to resolve
> the issue of processing malformed HTML, that is, HTML with missing start and
> end tags. In particular, I'm running into many web pages which are missing
> the closing </A> and </SCRIPT> tags.
>
> Is there an easy way for HTML::Parser to insert implied tags, as is now done
> by HTML::TreeBuilder and HTML::Element?

No, but if you can come up with simple rules for when the missing tags
should be inserted then writing a wrapper should be easy enough :)
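
As a rough illustration of such a wrapper, here is a minimal sketch that assumes one simple rule: anchors do not nest, so an <A> that is still open when the next <A> starts, or when the document ends, gets an implied </A>. The helper name close_open_anchors is hypothetical, and the rule is an assumption for this example, not something HTML::Parser provides.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: returns the HTML with implied </a> tags inserted.
# Assumed rule: an <a> still open at the next <a>, or at end of input,
# is closed automatically.
sub close_open_anchors {
    my ($html) = @_;
    my $out  = '';
    my $open = 0;    # true while an <a> has been seen without its </a>

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ sub {
            my ($tag, $text) = @_;
            if ($tag eq 'a') {
                $out .= '</a>' if $open;    # close the previous anchor first
                $open = 1;
            }
            $out .= $text;
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            $open = 0 if $tag eq 'a';
            $out .= $text;
        }, 'tagname, text' ],
        # Everything else (text, comments, declarations) is copied through.
        default_h => [ sub { $out .= shift }, 'text' ],
    );

    $p->parse($html);
    $p->eof;
    $out .= '</a>' if $open;    # anchor still open at end of document
    return $out;
}

print close_open_anchors('<a href="x">one<a href="y">two'), "\n";
```

Real pages will probably need more rules than this one (for example, closing an open anchor at block-level tags), but they can be added to the start handler in the same way.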

Missing </SCRIPT> can be a challenge because HTML::Parser will just
report the rest of the document as text. You would have to find a
suitable place to restart parsing within this text and then perhaps
start off a new HTML::Parser instance there.
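
The restart idea could be sketched roughly as follows. The heuristic for the "suitable place" is purely an assumption for this example: the first occurrence of markup such as </head>, <body>, <table>, <div> or <p> inside the runaway script text is taken as the point where the <SCRIPT> element should have ended. The function name parse_with_restart and the flat event list are hypothetical as well.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Assumed heuristic: these tags are unlikely to appear inside real script
# code, so their first occurrence marks where parsing should restart.
my $RESTART = qr{<(?:/head|body|table|div|p)\b}i;

# Hypothetical helper: appends simple "type:value" strings to @$events.
sub parse_with_restart {
    my ($html, $events) = @_;
    my $pending;    # script text since the last <script>; undef outside one

    my $p = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,
        start_h       => [ sub {
            my ($tag) = @_;
            push @$events, "start:$tag";
            $pending = '' if $tag eq 'script';
        }, 'tagname' ],
        end_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'script' && defined $pending) {
                push @$events, "script:$pending";
                undef $pending;
            }
            push @$events, "end:$tag";
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            if (defined $pending) { $pending .= $text }
            else                  { push @$events, "text:$text" }
        }, 'text' ],
    );
    $p->parse($html);
    $p->eof;

    return unless defined $pending;    # every <script> was closed properly

    # Unterminated <script>: the rest of the document arrived as text.
    if ($pending =~ $RESTART) {
        my $cut = $-[0];               # offset where the match starts
        push @$events, 'script:' . substr($pending, 0, $cut),
                       'end:script(implied)';
        # Restart on the remainder with a fresh parser instance.
        parse_with_restart(substr($pending, $cut), $events);
    }
    else {
        push @$events, "script:$pending", 'end:script(implied)';
    }
    return;
}

my @events;
parse_with_restart('<p>a</p><script>var x = 1;<div>after</div>', \@events);
print "$_\n" for @events;
```

A script that legitimately contains one of these tags in a string (for example inside document.write) would be cut short by this heuristic, so the pattern has to be tuned to the pages being processed.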

I would just use HTML::TreeBuilder :)
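
For comparison, a minimal sketch of the HTML::TreeBuilder route, which builds a tree and supplies implied tags itself. The sample input is made up for illustration; how the two anchors end up arranged in the tree depends on TreeBuilder's own rules, so the example only lists them and prints the serialized result.

```perl
use strict;
use warnings;
use HTML::TreeBuilder ();

# Made-up sample input with unclosed <a> elements.
my $html = '<a href="x">one<a href="y">two';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# List every anchor element the tree ended up with.
my @anchors = $tree->look_down(_tag => 'a');
print scalar(@anchors), " anchor(s) found\n";
print '  href=', ($_->attr('href') // ''), "\n" for @anchors;

# Serialize the repaired document, end tags included.
my $repaired = $tree->as_HTML;
print $repaired, "\n";

$tree->delete;    # free the tree explicitly when done
```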

Regards,
Gisle
gisle [ Tue, 21 March 2006 12:28 ] [ ID #1239819 ]