Spidering

Hi everyone, I am a beginner in Perl. Can you give me pseudocode and
sample code for a spider program? It will be helpful in understanding web
interfaces. Thank you.


--

VinoRex.E
Research Scholar
DBT-Computational Biology Facility
Bharathiar University

vinorex [ Mon, 01 August 2011 12:03 ] [ ID #2062870 ]

Re: Spidering

Hello,

> Hi everyone, I am a beginner in Perl. Can you give me pseudocode and
> sample code for a spider program? It will be helpful in understanding web
> interfaces. Thank you.

Check out WWW::Mechanize - http://search.cpan.org/perldoc?WWW::Mechanize
The SYNOPSIS section will help you get started.
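
As a rough illustration of the kind of thing the SYNOPSIS covers, here is a minimal sketch that lists the links on one page. The helper name page_links and the command-line handling are my own choices for this example, not something taken from the module's documentation; only get, links, url_abs and the autocheck option are WWW::Mechanize calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# page_links is a hypothetical helper name chosen for this sketch.
# It fetches one page and returns the absolute URL of every link on it.
# $mech is expected to behave like a WWW::Mechanize object.
sub page_links {
    my ($mech, $url) = @_;
    $mech->get($url);
    return map { $_->url_abs->as_string } $mech->links;
}

# Only touch the network when a start URL is given on the command line.
if ( my $start = shift @ARGV ) {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new( autocheck => 1 );    # die on HTTP errors
    print "$_\n" for page_links( $mech, $start );
}
```

Run it with a URL as its only argument. Feeding each returned link back into page_links is the step that would turn this into a spider.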

Regards,
Alan Haggai Alavi.
--
The difference makes the difference.

--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Alan Haggai Alavi [ Mon, 01 August 2011 12:14 ] [ ID #2062871 ]

Re: Spidering

On Aug 1, 2011 6:16 AM, "Alan Haggai Alavi" <alanhaggai [at] alanhaggai.org>
wrote:
> Check out WWW::Mechanize - http://search.cpan.org/perldoc?WWW::Mechanize
> The SYNOPSIS section will help you get started.
>

Mechanize is probably where I'd go. However, IIRC, there is also a Perl-based
web spider that I remember seeing while looking through user-agent names.

Shawn Wilson [ Mon, 01 August 2011 12:29 ] [ ID #2062872 ]

Re: Spidering

On 01/08/2011 11:03, VinoRex.E wrote:
>
> Hi everyone, I am a beginner in Perl. Can you give me pseudocode and
> sample code for a spider program? It will be helpful in understanding web
> interfaces. Thank you.

If you can't write your own pseudocode for a web spider then check
Bharathiar University for a more appropriate course. One version goes

   function fetchall(URL)
     content = get(URL)
     loop for it over findlinks(content)
       content = content + fetchall(it)
     return content
   end

Since the purpose of your efforts is to learn Perl, I think a module
like WWW::Mechanize is the wrong choice. To write a program that
accesses the internet, you should install and study the LWP library.
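
For what it's worth, the pseudocode above might be sketched in Perl roughly as follows. This is only one way to do it, under a few assumptions of my own: the helper names lwp_get and html_links are made up for the example, HTML::LinkExtor is my pick for finding links, and I have added a "seen" hash and a depth limit, because the pseudocode as written would recurse forever on two pages that link to each other.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Recursive fetch following the pseudocode. $get->($url) returns the page
# content or undef; $links->($content, $url) returns a list of absolute URLs.
# $seen and $depth are additions that stop endless recursion.
sub fetchall {
    my ($url, $get, $links, $depth, $seen) = @_;
    $seen ||= {};
    return '' if $depth < 0 || $seen->{$url}++;
    my $content = $get->($url);
    return '' unless defined $content;
    my $all = $content;
    for my $link ( $links->($content, $url) ) {
        $all .= fetchall( $link, $get, $links, $depth - 1, $seen );
    }
    return $all;
}

# Hypothetical helper: fetch one page with LWP::UserAgent.
sub lwp_get {
    my ($url) = @_;
    require LWP::UserAgent;
    my $res = LWP::UserAgent->new( timeout => 10 )->get($url);
    return $res->is_success ? $res->decoded_content : undef;
}

# Hypothetical helper: collect the href of every <a> tag, made absolute
# against $base by HTML::LinkExtor.
sub html_links {
    my ($html, $base) = @_;
    require HTML::LinkExtor;
    my @found;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            push @found, "$attr{href}" if $tag eq 'a' && $attr{href};
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;
    return grep { m{^https?://} } @found;
}

# Only touch the network when a start URL is given on the command line.
if ( my $start = shift @ARGV ) {
    my $text = fetchall( $start, \&lwp_get, \&html_links, 1 );
    print length($text), " characters fetched\n";
}
```

Passing the fetch and link-finding routines in as code references keeps fetchall itself free of any network code, so it can be tried out with a small fake "web" stored in a hash before pointing it at a real site.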

Rob

Rob Dixon [ Mon, 01 August 2011 19:51 ] [ ID #2062881 ]

Re: Spidering

On Mon, Aug 01, 2011 at 06:51:37PM +0100, Rob Dixon wrote:
> Since the purpose of your efforts is to learn Perl, I think a module
> like WWW::Mechanize is the wrong choice. To write a program that
> accesses the internet, you should install and study the LWP library.

For my first ever web app I started with Mechanize because I'd seen it
recommended here so many times. I don't believe it's possible to use
Mechanize without becoming quite familiar with much of the LWP library
and related modules, particularly LWP::UserAgent, HTML::TreeBuilder,
and HTML::Form.
JMHO,
Mike
--
Satisfied user of Linux since 1997.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

Mike McClain [ Wed, 03 August 2011 01:47 ] [ ID #2062958 ]

Re: Spidering

On Tue, Aug 2, 2011 at 19:47, Mike McClain <mike.junk [at] cox.net> wrote:
> For my first ever web app I started with Mechanize because I'd seen it
> recommended here so many times. I don't believe it's possible to use
> Mechanize without becoming quite familiar with much of the LWP library
> and related modules, particularly LWP::UserAgent, HTML::TreeBuilder,
> and HTML::Form.
> JMHO,

Yeah, that's why I like Web::Scraper. Now that I know it (even though
it's been three months since I've needed it), I can still scrape a site
in 15 minutes. For more intensive work I can see the case for Mechanize,
but most sites aren't that complex.

Shawn Wilson [ Wed, 03 August 2011 02:08 ] [ ID #2062960 ]

Re: Spidering

On Aug 1, 10:51 am, rob.di... [at] gmx.com (Rob Dixon) wrote:
> Since the purpose of your efforts is to learn Perl, I think a module
> like WWW::Mechanize is the wrong choice. To write a program that
> accesses the internet, you should install and study the LWP library.

LWP::RobotUA can be used in conjunction with the other modules in the
LWP library suite too. It provides methods to ensure appropriate
spidering behavior, i.e., not hitting sites too fast and heeding a
site's 'robots.txt' guidelines. This is very important for any
spidering program you write.
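
A small sketch of how that might look, assuming a made-up agent name and contact address (both placeholders you would replace with your own) and two helper names, make_robot and fetch_politely, that I invented for the example. Note that delay is given in minutes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# make_robot is a hypothetical helper. The agent name and e-mail address
# are placeholders, not real values.
sub make_robot {
    require LWP::RobotUA;
    my $ua = LWP::RobotUA->new(
        agent => 'my-spider/0.1',
        from  => 'me@example.com',
    );
    $ua->delay( 10 / 60 );    # minutes: wait 10 seconds between requests to a host
    return $ua;
}

# fetch_politely is a hypothetical helper. It fetches each URL in turn and
# returns a hash reference of URL => content for the requests that succeeded.
# Requests refused because of robots.txt come back as failed responses and
# are reported with warn.
sub fetch_politely {
    my ($ua, @urls) = @_;
    my %body;
    for my $url (@urls) {
        my $res = $ua->get($url);
        if ( $res->is_success ) {
            $body{$url} = $res->decoded_content;
        }
        else {
            warn "$url: ", $res->status_line, "\n";
        }
    }
    return \%body;
}

# Only touch the network when URLs are given on the command line.
if (@ARGV) {
    my $pages = fetch_politely( make_robot(), @ARGV );
    printf "%s: %d characters\n", $_, length $pages->{$_} for sort keys %$pages;
}
```

Since LWP::RobotUA is a subclass of LWP::UserAgent, the object from make_robot should be usable anywhere a plain user agent is expected.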

--
Charles DeRykus


derykus [ Wed, 03 August 2011 12:09 ] [ ID #2062967 ]