Encoding/decoding problems in TreeBuilder

I'm running web pages through HTML::TreeBuilder, and certain characters,
namely <, >, ', and &, are being encoded as entities. For example, I ran
the following text through TreeBuilder:

The dog's collar 4 > 2 and 2 < 4 & amper

And the output is:

The dog&#39;s collar 4 &gt; and 2 &lt; 4 &amp; amper
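
For anyone who wants to reproduce this, here is a minimal sketch. It assumes the output is produced with as_HTML called with no arguments (I did not say above which method was used), and the variable names are only for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Parse the raw text from the example, with no markup around it.
my $builder = HTML::TreeBuilder->new;
$builder->parse("The dog's collar 4 > 2 and 2 < 4 & amper");
$builder->eof;

# With no arguments, as_HTML entity-encodes the characters it treats as unsafe.
my $html = $builder->as_HTML;
print $html, "\n";

# Free the tree explicitly; older HTML::Tree releases need this.
$builder->delete;
```

The printed document is wrapped in the html, head and body elements that TreeBuilder adds implicitly, so the text of interest appears inside the body.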

My question, then: is this the expected and acceptable behavior for
TreeBuilder? According to the W3C HTML 4 specification,
http://www.w3.org/TR/html4/charset.html, a user agent (browser) will
translate character sets, but TreeBuilder isn't purporting to be a
user agent the way LWP is. Or is it?
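
To narrow down where the encoding happens, the sketch below compares three ways of getting the content back out of the tree. It rests on the assumption that the entities are introduced on output rather than during parsing. The optional first argument to as_HTML is the set of characters to encode, and the sample markup and variable names are made up for this test.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Build a tree from a small fragment that already uses entities.
my $tree = HTML::TreeBuilder->new_from_content(
    "<p>The dog's collar 4 &gt; 2 and 2 &lt; 4 &amp; amper</p>"
);
my $p = $tree->look_down(_tag => 'p');

my $plain   = $p->as_text;         # decoded characters, no markup
my $default = $p->as_HTML;         # every "unsafe" character is encoded
my $minimal = $p->as_HTML('<>&');  # only <, > and & are encoded

print "$plain\n$default\n$minimal\n";

# Free the tree explicitly; older HTML::Tree releases need this.
$tree->delete;
```

If as_text returns the literal characters while as_HTML returns entities, the tree itself holds decoded text and the encoding is a property of the serializer, not of the parse.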

I'm confused and need clarification.

I am using TreeBuilder 3.13 and I have HTML::Tree 3.20 installed on
Perl 5.8.7.


--
Gil.Vidals [at] PositionResearch.com
Position Research, Inc.
Search engine results by research
tel: (760) 480-8291 fax: (760) 480-8271
www.PositionResearch.com
gil.vidals [ Tue, 13 June 2006 21:48 ] [ ID #1354053 ]