Angle Brackets remain when tags removed using HTML::TreeBuilder

Hello -
Hopefully, this is an easy one.

I have some ugly HTML like this:

<span style="font-weight: bold;"><font
style="font-family: Arial;"
face=Arial>MMCM4</font></span>

I am trying to get rid of the <font> tags using HTML::TreeBuilder.

Here is my script:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $filename = "test.htm";
open OUT, ">", "output.txt" or die "Can't open output.txt: $!";

my $root = HTML::TreeBuilder->new;
$root->ignore_text(0);
$root->ignore_ignorable_whitespace(0);
$root->no_space_compacting(1);
$root->parse_file($filename);

my @fonts = $root->look_down('_tag', 'font');

foreach my $font (@fonts) {
$font->tag(undef);
$font->attr('face',undef);
$font->attr('style',undef);
}
print OUT $root->as_HTML("","",{});

$root->delete();

And here is what the output looks like:

<span style="font-weight: bold;"><>MMCM4</></span>

The problem is that although the font tags/attributes themselves are
removed, the angle bracket pairs <> and </>
are left behind. This causes the starting <> to be rendered in the
browser.

I've tried using $font->detach and $font->delete, but these methods also
delete the text content which must
be preserved.

It seems there must be something obvious I am missing.

Thanks
Dave
DMcGovern [ Tue, 13 June 2006 20:50 ] [ ID #1354051 ]

RE: Angle Brackets remain when tags removed using HTML::TreeBuilder



> -----Original Message-----
> From: DMcGovern [at] sungardfutures.com
> [mailto:DMcGovern [at] sungardfutures.com]
> Sent: Tuesday, June 13, 2006 1:51 PM
> To: libwww [at] perl.org
> Subject: Angle Brackets remain when tags removed using
> HTML::TreeBuilder
>
> Hello -
> Hopefully, this is an easy one.
>
> I have some ugly HTML like this:
>
> <span style="font-weight: bold;"><font
> style="font-family: Arial;"
> face=Arial>MMCM4</font></span>
>
> I am trying to get rid of the <font> tags using HTML::TreeBuilder.
>
> Here is my script:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use HTML::TreeBuilder;
>
> my $filename = "test.htm";
> open OUT, ">", "output.txt" or die "Can't open output.txt: $!";
>
> my $root = HTML::TreeBuilder->new;
> $root->ignore_text(0);
> $root->ignore_ignorable_whitespace(0);
> $root->no_space_compacting(1);
> $root->parse_file($filename);
>
> my @fonts = $root->look_down('_tag', 'font');
>
> foreach my $font (@fonts) {
> $font->tag(undef);
> $font->attr('face',undef);
> $font->attr('style',undef);
> }

try

foreach my $font (@fonts) { $font->replace_with_content->delete; }

That's untested, but I think it will do what you want.
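
For anyone who wants to try the suggestion end to end, here is a minimal sketch. It assumes the likely cause of the stray brackets is that tag(undef) only blanks the element's name while the node itself stays in the tree, so as_HTML still writes its delimiters. replace_with_content instead splices the element's children into its parent and returns the detached element, which can then be deleted. The inline $html string is a stand-in for test.htm and is not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample input standing in for test.htm (an assumption for this sketch).
my $html = '<span style="font-weight: bold;"><font style="font-family: Arial;" '
         . 'face="Arial">MMCM4</font></span>';

my $root = HTML::TreeBuilder->new;
$root->no_space_compacting(1);
$root->parse($html);
$root->eof;    # tell the parser the document is complete

# Move each <font> element's children up into its parent, then free the
# detached, now-empty <font> node.
foreach my $font ($root->look_down('_tag', 'font')) {
    $font->replace_with_content->delete;
}

my $out = $root->as_HTML("", "", {});
print $out, "\n";

$root->delete;
```

The text MMCM4 should remain inside the span, with neither font tags nor empty <> pairs around it.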

Forrest Cahoon
not speaking for merrill corporation
Forrest.Cahoon [ Tue, 13 June 2006 21:00 ] [ ID #1354052 ]