completely rewind an HTML::TokeParser (PullParser)

I want to parse some HTML to count the text characters, then rewind to
the beginning of the parser to step through the tags and excerpt up to
an argument-supplied character count or percentage of the *text*
characters and, keeping track of which tags are open, automatically add
the close tags.

To be used in a TT2 filter like "truncate" but for HTML instead of
plain text. So that...

[% html = "<p><b>this <i>is something to truncate</i></b></p>" %]
[% html | truncate_html(10) %]

Would output
<p><b>this <i>is so...</i></b></p>
^^^^^^12345^^^67890

Anyway, the trouble I'm trying to address is rewinding the parser for
the second walkthrough once I've counted text characters.

I want to do this: 1 while $p->unget_token()

But it doesn't work because HTML::PullParser->unget_token returns the
parser object itself, which is always true, so the loop never ends.

To make a long story short, and it's not too late for that -- would
changing the return be reasonable or would it break code?

sub unget_token
{
    my $self = shift;
    unshift @{$self->{pullparser_accum}}, @_;
    # $self; <-- change, don't return $self anymore
}

If not, does anyone have a smart idea for how to rewind it while
respecting the interface (i.e., without poking at $self->{pullparser_accum})?

Thanks!
-Ashley
Ashley [ Wed, 21 December 2005 20:23 ] [ ID #1112419 ]

Re: completely rewind an HTML::TokeParser (PullParser)

D'oh, sorry, just realized that proposed sub change is useless. So,
please ignore that part. Still looking for ideas to make the rewind
work though.

apv [ Wed, 21 December 2005 20:34 ] [ ID #1112420 ]

Re: completely rewind an HTML::TokeParser (PullParser)

apv <apv [at] sedition.com> writes:

> D'oh, sorry, just realized that proposed sub change is useless. So,
> please ignore that part. Still looking for ideas to make the rewind
> work though.

Why not just create a new parser object for the same string?

--Gisle
gisle [ Wed, 21 December 2005 22:24 ] [ ID #1112421 ]

Re: completely rewind an HTML::TokeParser (PullParser)

Yeah, I think this is probably the right solution after playing some
more myself. I think I was suffering from premature optimization,
trying to avoid creating another object.

Thanks!
-Ashley

On Wednesday, December 21, 2005, at 01:24 PM, Gisle Aas wrote:

> apv <apv [at] sedition.com> writes:
>
>> D'oh, sorry, just realized that proposed sub change is useless. So,
>> please ignore that part. Still looking for ideas to make the rewind
>> work though.
>
> Why not just create a new parser object for the same string?
>
> --Gisle
>
>
apv [ Wed, 21 December 2005 22:30 ] [ ID #1112423 ]
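
For anyone landing on this thread later, here is a rough sketch of the approach the thread settled on: one HTML::TokeParser to count the text characters, then a second, fresh parser over the same string for the truncating pass, with no rewinding. The names count_text_chars and truncate_html, the %void list, and the "..." marker are illustrative choices, not anything from the posts or from HTML::TokeParser itself. It assumes reasonably well-nested markup and counts raw text, so an entity like &amp; counts as five characters.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Elements that never get a close tag (illustrative, not exhaustive).
my %void = map { $_ => 1 } qw(br hr img input meta link);

# Pass 1: count the characters in text tokens only.
sub count_text_chars {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";
    my $count = 0;
    while (my $t = $p->get_token) {
        $count += length $t->[1] if $t->[0] eq 'T';
    }
    return $count;
}

# Pass 2: a brand-new parser over the same string instead of a rewind.
sub truncate_html {
    my ($html, $limit) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";
    my ($out, $seen, @open) = ('', 0);
    while (my $t = $p->get_token) {
        my $type = $t->[0];
        if ($type eq 'S') {            # start tag: [S, tag, attr, attrseq, text]
            push @open, $t->[1] unless $void{ $t->[1] };
            $out .= $t->[4];
        }
        elsif ($type eq 'E') {         # end tag: [E, tag, text]
            pop @open if @open && $open[-1] eq $t->[1];
            $out .= $t->[2];
        }
        elsif ($type eq 'T') {         # text: [T, text, is_data]
            my $text = $t->[1];
            if ($seen + length($text) > $limit) {
                # Keep only what still fits, then stop reading tokens.
                $out .= substr($text, 0, $limit - $seen) . '...';
                last;
            }
            $seen += length $text;
            $out .= $text;
        }
        # Comments, declarations and processing instructions are dropped.
    }
    # Close whatever is still open, innermost first.
    $out .= "</$_>" for reverse @open;
    return $out;
}

my $html = '<p><b>this <i>is something to truncate</i></b></p>';
print truncate_html($html, 10), "\n";
```

For a percentage cut, something like truncate_html($html, int(count_text_chars($html) * 0.5)) does the counting pass first and then hands the resulting character limit to the second parser.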