RE: RobotRules fails on user-agents with spaces

Gisle Aas wrote:
> <Matthew.van.Eerde [at] hbinc.com> writes:
>
>> The problem... if I include a space in my robot's user agent, it
>> will fail to recognize robots.txt records targeted to my robot.
>
> You are not allowed to have space in the user agent name. See section
> "3.8 Product Tokens" of RFC 2616 [1]. Isn't it an option to just
> rename your spider to something that follows the spec?

Oops! Yes, of course. I will rename my spider accordingly.
Patch proposal withdrawn.

> I'm not really opposed to this patch if product names with spaces are
> actually in common use. Do you have data to suggest it is?

Well, I do... here are some spiders that hit my site last week that are of
this form:
Syndication Engine/1.1 (http://www.hexlet.com)
Feedster Crawler/1.0; Feedster, Inc.
Jakarta Commons-HttpClient/3.0-rc1
FAST Enterprise Crawler/6.4 (helpdesk at fast.no)
Jakarta HTTP Client/1.0
UPG1 UP/4.0 (compatible; Blazer 1.0)

On the other hand it's doubtful that any of these use RobotRules.pm, so
these don't imply that a patch is called for.
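
For anyone who wants to check a spider name against the spec before
choosing it, here is a rough sketch in Python (not Perl, and not part of
RobotRules.pm) of the "token" rule from RFC 2616 section 2.2 and the
"product" rule from section 3.8. The function names is_token and
is_product are made up for this illustration.

```python
# Separators from RFC 2616 section 2.2; a token may not contain any of
# these, nor control characters, nor anything outside US-ASCII.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(s):
    """True if s is a non-empty RFC 2616 token."""
    return bool(s) and all(
        32 < ord(ch) < 127 and ch not in SEPARATORS for ch in s
    )


def is_product(s):
    """True if s matches: product = token ["/" product-version]."""
    name, slash, version = s.partition("/")
    if slash and not is_token(version):
        return False
    return is_token(name)


if __name__ == "__main__":
    for candidate in ("Jakarta HTTP Client/1.0", "JakartaHTTPClient/1.0"):
        print(candidate, is_product(candidate))
```

Note that the User-Agent header as a whole may carry several
space-separated products and comments, so a name with a space in it is
read as more than one product rather than as a single name.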

--
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
Matthew.van.Eerde [ Fri, 14 October 2005 17:32 ] [ ID #1013827 ]