HTML::Parser is not thread-safe

The README file for HTML::Parser says to report bugs to this mailing
list.


I have tried using HTML::Parser in a threaded application. If an
HTML::Parser object exists when there is more than one thread, it has
problems destroying the object afterwards (and the program dies).

Here are a few one-liners (and their output) that demonstrate the
problem:

$ perl -MHTML::Parser -Mthreads -e'(async{new HTML::Parser})->join'
Bad signature in parser state object at 3767c0.
Unbalanced string table refcount: (1) for "_hparser_xs_state" during
global destruction.
Scalars leaked: 4

$ perl -MHTML::Parser -Mthreads -e'$p=new HTML::Parser; (async{})->join'
Scalars leaked: -13
Bad signature in parser state object at 62ccd0 during global
destruction.

But if I destroy my HTML::Parser object before creating a thread,
there is no problem:
$ perl -MHTML::Parser -Mthreads -le'$p=new HTML::Parser; undef $p;
(async{})->join; print "ok"'
ok

I hope this is helpful. I'm afraid I know almost nothing about C and
XS, so I can't be of any more help.


Father Chrysostomos.


P.S.: I am using threads.pm version 1.57 and HTML::Parser version 3.55.
Here is the output from perl -V:

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Platform:
osname=darwin, osvers=8.8.0, archname=darwin-thread-multi-2level
uname='darwin treebeard.local 8.8.0 darwin kernel version 8.8.0: fri sep 8 17:18:57 pdt 2006; root:xnu-792.12.6.obj~1release_ppc power macintosh powerpc '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-g -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include',
optimize='-O3',
cppflags='-no-cpp-precomp -g -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='4.0.0 20041026 (Apple Computer, Inc. build 4061)', gccosandvers='darwin8'
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=4, nvtype='double', nvsize=8,
Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib
libs=-ldbm -ldl -lm -lc
perllibs=-ldl -lm -lc
libc=, so=dylib, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib'


Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT
PERL_MALLOC_WRAP USE_ITHREADS USE_LARGE_FILES
USE_PERLIO
Built under darwin
Compiled at Jan 9 2007 19:29:53
@INC:
/usr/local/lib/perl5/5.8.8/darwin-thread-multi-2level
/usr/local/lib/perl5/5.8.8
/usr/local/lib/perl5/site_perl/5.8.8/darwin-thread-multi-2level
/usr/local/lib/perl5/site_perl/5.8.8
/usr/local/lib/perl5/site_perl
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6
/Library/Perl
/Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6
/Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6
/Library/Perl/5.8.1
.
Sprout [ Thu, 11 January 2007 08:26 ] [ ID #1592677 ]

Re: HTML::Parser is not thread-safe

On 1/11/07, Father Chrysostomos <sprout [at] cpan.org> wrote:
> I have tried using HTML::Parser in a threaded application. If an
> HTML::Parser object exists when there is more than one thread, it has
> problems destroying the object afterwards (and the program dies).
>
> Here are a few one-liners (and their output) that demonstrate the
> problem:
>
> $ perl -MHTML::Parser -Mthreads -e'(async{new HTML::Parser})->join'
> Bad signature in parser state object at 3767c0.
> Unbalanced string table refcount: (1) for "_hparser_xs_state" during
> global destruction.
> Scalars leaked: 4

I see the same noise here on Linux, so there is definitely a problem
here. I have no idea what the problem is though. I have tried to
avoid learning much about 'threads'. There should be no inherent
reason for HTML::Parser not to be thread safe. Anybody with threads
know-how that can help?

--
Gisle Aas
gisle [ Thu, 11 January 2007 22:55 ] [ ID #1592678 ]

Re: HTML::Parser is not thread-safe

On 1/11/07, Gisle Aas <gisle [at] aas.no> wrote:
> Anybody with threads know-how that can help?

Bo Lindbergh provided a patch that fixes this problem and
HTML-Parser-3.56 has now been uploaded to CPAN with this fix. Thanks
Bo!

--Gisle
gisle [ Fri, 12 January 2007 12:12 ] [ ID #1593844 ]
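
For readers who want to check whether an installation already includes this fix, here is a minimal sketch. It assumes only that HTML::Parser is installed; the variable names are illustrative and do not come from the thread.

```perl
use strict;
use warnings;
use Config;
use HTML::Parser;

# True if this perl was built with interpreter threads (ithreads),
# which is what threads.pm requires.
my $has_ithreads = $Config{useithreads} ? 1 : 0;

# Calling VERSION with an argument dies if the installed module is older
# than the requested version, so eval turns that into a boolean.
my $has_fix = eval { HTML::Parser->VERSION(3.56); 1 } ? 1 : 0;

printf "HTML::Parser %s, ithreads: %s, 3.56 or later: %s\n",
    HTML::Parser->VERSION,
    ($has_ithreads ? 'yes' : 'no'),
    ($has_fix      ? 'yes' : 'no');
```

If the last field reports "no", upgrading HTML-Parser from CPAN should pick up the release mentioned above.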

Re: HTML::Parser is not thread-safe

On Jan 12, 2007, at 3:12 AM, Gisle Aas wrote:

> On 1/11/07, Gisle Aas <gisle [at] aas.no> wrote:
>> Anybody with threads know-how that can help?
>
> Bo Lindbergh provided a patch that fixes this problem and
> HTML-Parser-3.56 has now been uploaded to CPAN with this fix. Thanks
> Bo!
>
> --Gisle

That was quick. Thank you!

It seems to work fine now, except that, if a Parser object is created
within a thread *other than* the main thread, perl complains about
leaking scalars when the program exits.

$ perl -MHTML::Parser -Mthreads -le'(async{new HTML::Parser})->join;
END { print "end" }'
end
Scalars leaked: 1

But if the object is created in the main thread, there are no error
messages at all.


Father Chrysostomos


P.S.: I am not trying to put pressure on anyone--this module works
well enough for me as it is. I am simply trying to help by pointing
out bugs.
Sprout [ Fri, 12 January 2007 22:22 ] [ ID #1594571 ]

Re: HTML::Parser is not thread-safe

In article <AC620479-1A75-4311-B2B0-0CBC9833B8D3 [at] cpan.org>,
sprout [at] cpan.org (Father Chrysostomos) wrote:
> It seems to work fine now, except that, if a Parser object is created
> within a thread *other than* the main thread, perl complains about
> leaking scalars when the program exits.
>
> $ perl -MHTML::Parser -Mthreads -le'(async{new HTML::Parser})->join;
> END { print "end" }'
> end
> Scalars leaked: 1

However, perldoc threads says:
> Returning objects from threads does not work.

So don't do what you did in that example. :-)


/Bo Lindbergh
blgl [ Sat, 13 January 2007 10:36 ] [ ID #1594573 ]

Re: HTML::Parser is not thread-safe

> In article <AC620479-1A75-4311-B2B0-0CBC9833B8D3[at]cpan.org>,
> sprout[at]cpan.org (Father Chrysostomos) wrote:
> > It seems to work fine now, except that, if a Parser object is created
> > within a thread *other than* the main thread, perl complains about
> > leaking scalars when the program exits.
> >
> > $ perl -MHTML::Parser -Mthreads -le'(async{new HTML::Parser})->join;
> > END { print "end" }'
> > end
> > Scalars leaked: 1
>
> However, perldoc threads says:
> > Returning objects from threads does not work.
>
> So don't do what you did in that example. :-)
>
>
> /Bo Lindbergh
>
I'm sorry. You're right. My example was badly written. If I put ";
return" before the closing brace, it works.

Father Chrysostomos
Sprout [ Sat, 13 January 2007 21:34 ] [ ID #1595149 ]
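
To illustrate the point the thread settles on (use the parser inside the thread and hand back plain data rather than the object itself), here is a minimal sketch. It assumes a perl built with ithreads and HTML::Parser 3.56 or later; the helper name count_start_tags_in_thread and the sample markup are hypothetical, not taken from the thread.

```perl
use strict;
use warnings;
use threads;
use HTML::Parser;

# Hypothetical helper: parse $html in a worker thread and return only a
# plain number, so no HTML::Parser object crosses the thread boundary.
sub count_start_tags_in_thread {
    my ($html) = @_;

    # Scalar context here means the thread body also runs in scalar context.
    my $thr = threads->create(sub {
        my $count = 0;
        my $p = HTML::Parser->new(
            api_version => 3,
            # The handler is called once per start tag; the tag name is
            # passed in but not needed for counting.
            start_h => [ sub { $count++ }, 'tagname' ],
        );
        $p->parse($html);
        $p->eof;
        return $count;    # a plain scalar, not the parser object
    });

    return $thr->join;
}

print count_start_tags_in_thread('<p>Hello <b>world</b></p>'), "\n";
```

The parser is created, used, and destroyed entirely inside the worker thread; only the count is returned through join, which avoids the "Returning objects from threads does not work" caveat quoted from perldoc threads.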