problem w/ robotrules record parse

I've run into the same problem J and T ran into here:
http://www.mail-archive.com/libwww [at] perl.org/msg05452.html

Namely, malformed robots.txt files in the wild are being parsed rather
draconianly by WWW::RobotRules.

I've cobbled together a "fixed" version of RobotRules.pm -- how can I
get it reviewed and ultimately blessed by the LWP community?

http://www.geocities.com/mvaneerde/RobotRules.pm.txt

--
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
Matthew.van.Eerde [ Wed, 21 September 2005 18:04 ] [ ID #977240 ]

Re: problem w/ robotrules record parse

On Wed, Sep 21, 2005 at 09:04:33AM -0700, Matthew.van.Eerde [at] hbinc.com (Matthew.van.Eerde [at] hbinc.com) wrote:
> I've cobbled together a "fixed" version of RobotRules.pm -- how can
> I get it reviewed and ultimately blessed by the LWP community?

It's not a community thing. Send a patch directly to Gisle, the owner
of LWP.

xoxo,
Andy

--
Andy Lester => andy [at] petdance.com => www.petdance.com => AIM:petdance
Andy [ Wed, 21 September 2005 18:07 ] [ ID #977241 ]

Re: problem w/ robotrules record parse

Andy Lester <andy [at] petdance.com> writes:

> On Wed, Sep 21, 2005 at 09:04:33AM -0700, Matthew.van.Eerde [at] hbinc.com (Matthew.van.Eerde [at] hbinc.com) wrote:
> > I've cobbled together a "fixed" version of RobotRules.pm -- how can
> > I get it reviewed and ultimately blessed by the LWP community?
>
> It's not a community thing. Send a patch directly to Gisle, the owner
> of LWP.

I still prefer patches to be posted to this list rather than sent to me
directly. That way others can comment on the patch or pick it up for
their local use even if I'm not able to process it in a timely manner.

--Gisle
gisle [ Wed, 21 September 2005 18:55 ] [ ID #977242 ]

RE: problem w/ robotrules record parse

Gisle Aas wrote:
> Andy Lester <andy [at] petdance.com> writes:
>
>> Matthew.van.Eerde [at] hbinc.com (Matthew.van.Eerde [at] hbinc.com) wrote:
>>> I've cobbled together a "fixed" version of RobotRules.pm
>>
>> Send a patch directly to Gisle
>
> I still prefer patches to be posted to this list

Here's the patch, for the list.

http://www.geocities.com/mvaneerde/RobotRules.patch.txt

--
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
Matthew.van.Eerde [ Wed, 21 September 2005 19:01 ] [ ID #977243 ]

RE: problem w/ robotrules record parse

Matthew.van.Eerde wrote:
> Gisle Aas wrote:
>> Andy Lester <andy [at] petdance.com> writes:
>>
>>> Matthew.van.Eerde [at] hbinc.com (Matthew.van.Eerde [at] hbinc.com) wrote:
>>>> I've cobbled together a "fixed" version of RobotRules.pm
>>>
>>> Send a patch directly to Gisle
>>
>> I still prefer patches to be posted to this list
>
> Here's the patch, for the list.
>
> http://www.geocities.com/mvaneerde/RobotRules.patch.txt

And here's a smaller patch - only eleven new lines of code - which
should have the same net effect.

http://www.geocities.com/mvaneerde/RobotRules.patch-3.txt

--
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
Matthew.van.Eerde [ Wed, 21 September 2005 19:58 ] [ ID #977244 ]

Re: problem w/ robotrules record parse

<Matthew.van.Eerde [at] hbinc.com> writes:

> And here's a smaller patch - only eleven new lines of code - which
> should have the same net effect.

This patch looks good. I'll apply it. Can you provide an update to
t/robot/rules.t as well?

Regards,
Gisle
gisle [ Wed, 21 September 2005 20:11 ] [ ID #977245 ]

Re: problem w/ robotrules record parse

<Matthew.van.Eerde [at] hbinc.com> writes:

> I've added a "warn" line in the case where a record separation is assumed... see
> http://www.geocities.com/mvaneerde/RobotRules.patch-4.txt
>
> rules.t patch:
> http://www.geocities.com/mvaneerde/rules-patch.txt

These patches have now been applied. Thanks!

--Gisle
gisle [ Wed, 21 September 2005 21:38 ] [ ID #977246 ]