Dereference Links

I'm trying to dereference the @{$links} produced
by WWW::SimpleRobot and am having a heck of
a time getting it done. Can anybody help?
You can see some of the things I have tried
below.

I know I can do this link extraction myself with
HTML::LinkExtor, or at least think I can, but
I'd like to know how to dereference the links in this script.


Mike Flannigan



#!/usr/local/bin/perl
#
use strict;
use warnings;
use WWW::SimpleRobot;

my $robot = WWW::SimpleRobot->new(
    URLS           => [ 'http://www.portofhouston.com/' ],
    FOLLOW_REGEX   => "^http://www.portofhouston.com//",
    DEPTH          => 1,
    TRAVERSAL      => 'depth',
    VISIT_CALLBACK => sub {
        my ( $url, $depth, $html, $links ) = @_;
        my @linkder = @{$links};
        print STDERR "Visiting $url\n\n";
        # print STDERR "Depth = $depth\n";
        # print STDERR "HTML = $html\n";
        # print STDERR "Links = @{$links}\n";
        # print STDERR "Links = @linkder\n";
        # foreach (@linkder){
        #     print STDERR "$_\n";
        # }
        for (my $num = 0; $num <= $#linkder; $num++) {
            print STDERR "$linkder[$num]\n";
        }
        # for (my $num = 0; $num <= $#linkder; $num++) {
        #     print STDERR "${$linkder}[$num]\n";
        # }
    },
    BROKEN_LINK_CALLBACK => sub {
        my ( $url, $linked_from, $depth ) = @_;
        print STDERR "$url looks like a broken link on $linked_from\n";
        print STDERR "Depth = $depth\n";
    }
);

$robot->traverse;
my @urls  = @{$robot->urls};
my @pages = @{$robot->pages};
for my $page (@pages) {
    my $url               = $page->{url};
    my $depth             = $page->{depth};
    my $modification_time = $page->{modification_time};
}

print "\nAll done.\n";

__END__



--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Mike Flannigan [ Fri, 21 January 2011 15:01 ] [ ID #2053651 ]

Re: Dereference Links

On Fri, 21 Jan 2011 08:01:03 -0600, Mike Flannigan wrote:
> I'm trying to dereference the @{$links} produced by WWW::SimpleRobot and
> am having a heck of a time getting it done. Can anybody help? You can
> see some of the things I have tried below.

That module hasn't been updated since 2001. You'll have a much easier
time using WWW::Mechanize and many more people will be in a position to
help you.

--
Peter Scott
http://www.perlmedic.com/ http://www.perldebugged.com/
http://www.informit.com/store/product.aspx?isbn=0137001274
http://www.oreillyschool.com/courses/perl3/

Peter Scott [ Mon, 24 January 2011 00:18 ] [ ID #2053745 ]

Re: Dereference Links


You also might want to look into Data::Dumper to see exactly what you're
working with.

Off the top, though:

    map { @$_ } @$arr

might be a start, where $arr is your array reference (here, $links).
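A minimal sketch of both suggestions using only core Perl. The nested data is hypothetical sample data standing in for the callback's $links. Its [ tag, attribute => url ] layout is an assumption: it matches what HTML::LinkExtor's links method returns, and Data::Dumper is how you would confirm the real layout. flatten_links is a hypothetical helper name.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data standing in for the callback's $links:
# an array reference whose elements are themselves array references,
# shaped like [ tag, attribute => url ] (an assumption, not confirmed).
my $links = [
    [ 'a',   'href', 'http://www.example.com/about.html' ],
    [ 'img', 'src',  'http://www.example.com/logo.gif' ],
];

# Sort hash keys and use compact indentation so the dump is easy to read.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;
print STDERR Dumper($links);

# Shawn's one-liner wrapped in a sub: dereference each inner array
# and flatten all of their elements into a single list.
sub flatten_links {
    my ($arr) = @_;
    return map { @$_ } @$arr;
}

print STDERR "$_\n" for flatten_links($links);
```

The flattened list mixes tag names, attribute names and URLs together, so it is mainly useful for a quick look. To keep each link's fields together, loop over the outer array and dereference each element individually instead.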

Shawn Wilson [ Mon, 24 January 2011 01:56 ] [ ID #2053746 ]

Re: Dereference Links

On 21/01/2011 14:01, Mike Flannigan wrote:
>
> I'm trying to dereference the @{$links} produced
> by WWW::SimpleRobot and am having a heck of
> a time getting it done. Can anybody help?
> You can see some of the things I have tried
> below.

Hey Mike

What you have written can be fixed by changing it to

for (my $num = 0; $num <= $#linkder; $num++) {
    print STDERR "@{$linkder[$num]}\n";
}

or even

for (my $num = 0; $num <= $#{$links}; $num++) {
    print STDERR "@{$links->[$num]}\n";
}

but it is much clearer and more Perlish to write

foreach my $link (@$links) {
    print STDERR "@{$link}\n";
}

Remember: everywhere you could put a simple variable identifier you can
put a reference. Surrounding it in braces is always valid and helps
resolve ambiguity, so @linkder is the same as @{linkder}, and both hold
the same list as @{$links}. Likewise, $linkder[$num] (or $links->[$num])
is an array reference, and can be dereferenced with @{$linkder[$num]}.
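A short sketch of the points above, using hypothetical sample data in place of the callback's $links. It assumes each element is a [ tag, attribute => url, ... ] array reference, which is the layout HTML::LinkExtor's links method returns. Check your own data with Data::Dumper before relying on that. The sketch shows three things:

- Interpolating an element without dereferencing it prints only something like ARRAY(0x...).
- Braces around the reference dereference it.
- A hypothetical helper, link_urls, pulls out just the URLs.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the $links argument of VISIT_CALLBACK.
my $links = [
    [ 'a',   'href', 'http://www.example.com/about.html' ],
    [ 'img', 'src',  'http://www.example.com/logo.gif' ],
];

# Interpolating an element directly prints only the reference itself,
# something like "ARRAY(0x...)", not the array's contents.
print STDERR "$links->[0]\n";

# @{$links} dereferences the outer array; @{$link} dereferences each
# inner one, and interpolation joins its elements with spaces.
foreach my $link ( @{$links} ) {
    print STDERR "@{$link}\n";
}

# Collect just the URLs. Assumes each element is [ tag, attr => value, ... ],
# so everything after the tag name can be copied into a hash.
sub link_urls {
    my ($link_list) = @_;
    my @urls;
    foreach my $link ( @{$link_list} ) {
        my ( $tag, %attr ) = @{$link};
        push @urls, values %attr;
    }
    return @urls;
}

print STDERR "$_\n" for link_urls($links);
```

If an element ever carries more than one attribute/URL pair, values %attr returns them in no particular order. In that case use $attr{href} (or whichever attribute you need) instead.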

HTH,

Rob


Rob Dixon [ Mon, 24 January 2011 15:39 ] [ ID #2053756 ]

Re: Dereference Links



On 1/23/2011 5:21 PM, beginners-digest-help [at] perl.org wrote:

> That module hasn't been updated since 2001. You'll have a much easier
> time using WWW::Mechanize and many more people will be in a position to
> help you.
>
> Peter Scott


Thank you for the reply.
I appreciate it.


Mike Flannigan


Mike Flannigan [ Tue, 25 January 2011 03:10 ] [ ID #2053814 ]

Re: Dereference Links

On 1/25/2011 6:07 PM, Rob and Shawn wrote:
> Hey Mike



Thank you Rob and Shawn. It worked well.
I can't claim I know fully why, but it worked.

I sure hope Perl6 gets rid of dereferencing.


Mike Flannigan



Mike Flannigan [ Thu, 27 January 2011 15:12 ] [ ID #2053962 ]