Moving through tree's using LWP
Hi all.
I posted previously about how to move through a tree one node at a time
using HTML::TreeBuilder, which is used with LWP.
I wish to move by a single node and get the following info:
tag of the current HTML element.
Text of the HTML element.
Attributes of the current HTML element.
Of course, if there are none, then report undef.
Any assistance would be great, since I have tried a range of things and
cannot work it out.
Sean
Re: Moving through tree's using LWP
Hi Sean,
On Wednesday 12 Jan 2011 12:43:24 Sean Murphy wrote:
> Hi all.
>
> I posted previously about how to move through a tree one node at a time
> using HTML::TreeBuilder, which is used with LWP.
>
> I wish to move by a single node and get the following info:
>
> tag of the current HTML element.
Did the ->tag() method work for you?
http://search.cpan.org/~jfearn/HTML-
Tree-4.1/lib/HTML/Element.pm#$h-%3Etag%28%29_or_$h-%3Etag%28%27tagname%27%29
(sorry for the broken link).
Regards,
Shlomi Fish
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: Moving through tree's using LWP
At 12:52 +0200 12/01/2011, Shlomi Fish wrote:
>http://search.cpan.org/~jfearn/HTML-
>Tree-4.1/lib/HTML/Element.pm#$h-%3Etag%28%29_or_$h-%3Etag%28%27tagname%27%29
>
>(sorry for the broken link).
Links won't break in proper mailers if you enclose them in <>
<http://search.cpan.org/~jfearn/HTML-Tree-4.1/lib/HTML/Element.pm#$h-%3Etag%28%29_or_$h-%3Etag%28%27tagname%27%29>
JD
Re: Moving through tree's using LWP
Hi All.
I have read the page and the O'Reilly book on LWP. I must be thick or
something, but when I put the content of the web page into TreeBuilder
via a scalar and then try to print the tag, I get:
HTML::Element=HASH(0x41b1074)->Tag ( )
Below is the code extract. I have included the HTML::Element and
HTML::TreeBuilder modules.
sub download_file {
    print "accessing new stream \n";
    # content of web page passed to this routine.
    my $root = HTML::TreeBuilder->new;
    $root = HTML::TreeBuilder->new_from_content($_[0]);
    #scan_for_non_table_text($root->find_by_tag_name('h5'));
    scan_for_non_table_text($root);
    $root->delete; # erase this tree because we're done with it
    return;
} # end sub

sub scan_for_non_table_text {
    my $element = $_[0];
    #return if $element->tag eq 'table'; # prune!
    foreach my $child ($element->content_list) {
        if (ref $child) { # it's an element
            print "This is an element\n";
            print "$child->Tag ( )\n";
            scan_for_non_table_text($child); # recurse!
        } else { # it's a text node!
            my $text .= $child;
            print "text node: $text\n";
        }
    }
    return;
}
I can get the text, but not the name of the tag or the attributes. I am
starting at the top of the tree. The subroutine naming is misleading because
I have done so much playing around and haven't tidied things up.
So what am I doing wrong?
Sean
----- Original Message -----
From: "John Delacour" <johndelacour [at] gmail.com>
To: <beginners [at] perl.org>
Sent: Wednesday, January 12, 2011 10:17 PM
Subject: Re: Moving through tree's using LWP
> At 12:52 +0200 12/01/2011, Shlomi Fish wrote:
>
>>http://search.cpan.org/~jfearn/HTML-
>>Tree-4.1/lib/HTML/Element.pm#$h-%3Etag%28%29_or_$h-%3Etag%28%27tagname%27%29
>>
>>(sorry for the broken link).
>
> Links won't break in proper mailers if you enclose them in <>
>
>
> <http://search.cpan.org/~jfearn/HTML-Tree-4.1/lib/HTML/Element.pm#$h-%3Etag%28%29_or_$h-%3Etag%28%27tagname%27%29>
>
>
> JD
Re: Moving through tree's using LWP
At 11:05 PM +1100 1/12/11, Sean Murphy wrote:
>Hi All.
>
>I have read the page and the O'Reilly book on LWP. I must be thick
>or something, but when I put the content of the web page into
>TreeBuilder via a scalar and then try to print the tag, I get:
>
>HTML::Element=HASH(0x41b1074)->Tag ( )
>
>Below is the code extract. I have included the HTML::Element and
>HTML::TreeBuilder modules.
>print "$child->Tag ( )\n";
Methods and subroutines are not called ("interpolated") within double
quotes. You can call the method and save in a scalar variable, then
print the value of that variable:
    my $child_tag = $child->Tag();
    print "$child_tag\n";
... or print the value returned from the method directly:
    print $child->Tag(), "\n";
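The difference can be seen in a small self-contained sketch. It assumes a hypothetical Demo::Node class as a stand-in for an HTML::Element object (Demo::Node is not part of HTML::Tree), so that it runs with core Perl only:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an HTML::Element-like object (illustration only).
package Demo::Node;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub tag { my ($self) = @_; return $self->{tag} }

package main;

my $child = Demo::Node->new(tag => 'h5');

# Only $child is interpolated here; "->tag()" stays as literal text,
# so the string holds the stringified object followed by "->tag()".
my $wrong = "$child->tag()";

# Calling the method outside the quotes gives the real return value.
my $right = $child->tag();

print "interpolated: $wrong\n";
print "called:       $right\n";
```

The first line printed has the same shape as the HTML::Element=HASH(0x...) output reported above, which is the sign that the method was never called.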
Re: Moving through tree's using LWP
On 12/01/2011 16:03, Jim Gibson wrote:
> At 11:05 PM +1100 1/12/11, Sean Murphy wrote:
>> Hi All.
>>
>> I have read the page and the O'Reilly book on LWP. I must be thick or
>> something, but when I put the content of the web page into
>> TreeBuilder via a scalar and then try to print the tag, I get:
>>
>> HTML::Element=HASH(0x41b1074)->Tag ( )
>>
>> Below is the code extract. I have included the HTML::Element and
>> HTML::TreeBuilder modules.
>
>
>> print "$child->Tag ( )\n";
>
>
> Methods and subroutines are not called ("interpolated") within double
> quotes. You can call the method and save in a scalar variable, then
> print the value of that variable:
>
> my $child_tag = $child->Tag();
> print "$child_tag\n";
>
> ... or print the value returned from the method directly:
>
> print $child->Tag(), "\n";
Also, the method name should not be capitalised; Perl method names are
case-sensitive. It is
    $child->tag()
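Putting the two fixes together, a walk that reports the tag, attributes and text of each node might look like the sketch below. The sample markup, the walk() name and the @seen array are assumptions for illustration; tag, all_external_attr and content_list are HTML::Element methods, and the HTML-Tree distribution needs to be installed:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup is an assumption, used only to have something to parse.
my $html = '<html><body><h5 class="title">Hello</h5><p>World</p></body></html>';

my @seen;    # tag names in the order they are visited

sub walk {
    my ($element, $depth) = @_;
    my $indent = '  ' x $depth;

    my $tag = $element->tag;    # lower-case method name
    push @seen, $tag;

    # all_external_attr returns name/value pairs, skipping internal "_" keys.
    my %attr = $element->all_external_attr;
    my $attrs = %attr
        ? join(', ', map { $_ . '=' . (defined $attr{$_} ? $attr{$_} : 'undef') } sort keys %attr)
        : 'undef';

    print $indent, "tag: $tag, attributes: $attrs\n";

    foreach my $child ($element->content_list) {
        if (ref $child) {
            walk($child, $depth + 1);            # an element: recurse
        }
        else {
            print $indent, "  text: $child\n";   # a plain text node
        }
    }
    return;
}

my $root = HTML::TreeBuilder->new_from_content($html);
walk($root, 0);
$root->delete;    # free the tree when done
```

If all the text beneath an element is wanted as one string rather than node by node, HTML::Element also provides as_text.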
Re: Moving through tree's using LWP
On Wed, Jan 12, 2011 at 11:03 AM, Jim Gibson <jimsgibson [at] gmail.com> wrote:
> Methods and subroutines are not called ("interpolated") within double
> quotes.
Unless you choose to be dirty and "trick"[1] Perl into doing it anyway:
    print "@{[ $child->tag() ]}\n";
This works by dereferencing (i.e., @{} ) an anonymous array
reference (i.e., [] ) that holds the method's return value.
Of course, it's more difficult to read and write (i.e., error-prone),
and probably somewhat wasteful considering the extra steps needed to
make it happen, so Jim Gibson's suggestion to print the result
directly or store it into a scalar first is certainly preferred in
good code. :P As I said, it's dirty. ^^
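For comparison, here is a small sketch of the trick next to the plain version. The current_tag() helper is hypothetical and simply stands in for a method call such as the one above, so the example runs with core Perl only:

```perl
use strict;
use warnings;

# Hypothetical helper standing in for a method call on an element.
sub current_tag { return 'h5' }

# [ ... ] builds an anonymous array from the call's return value;
# @{ ... } dereferences it, and arrays do interpolate in double quotes.
my $tricky = "tag: @{[ current_tag() ]}";

# The plainer equivalent, without the trick.
my $plain = 'tag: ' . current_tag();

print "$tricky\n";
print "$plain\n";
```

Note that the call inside the brackets is evaluated in list context, and if it returns several values they are joined with the list separator (a space by default), which is one more reason to prefer the plain form.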
[1] http://www.perlcircus.org/subs.shtml
--
Brandon McCaig <http://www.bamccaig.com> <bamccaig [at] gmail.com>