Out of memory, HTML::TableExtract
Hi experts,
Have you ever run into an "Out of memory" error while using
HTML::TableExtract? My HTML files are fairly large, but I still didn't
expect this to happen.
Could you suggest some workarounds? I call this subroutine from inside
another for loop.
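use strict;
use warnings;
use HTML::TableExtract;

sub zParseHTMLFiles {
    my ( $lrefFileList, $lrefColNames ) = @_;
    my @ldata;
    foreach my $lFile (@$lrefFileList) {
        # A fresh parser per file; the previous one is freed each iteration.
        my $lTableExtract =
          HTML::TableExtract->new( headers => [@$lrefColNames] );
        chomp($lFile);
        $lTableExtract->parse_file($lFile);
        foreach my $ls ( $lTableExtract->tables ) {
            # Take rows from the current table ($ls). Calling
            # $lTableExtract->rows here would return the first matched
            # table's rows on every pass.
            foreach my $lrow ( $ls->rows ) {
                chomp( $lrow->[-1] ) if defined $lrow->[-1];
                push( @ldata, $lrow );
            }
        }
    }
    return \@ldata;
}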
Thanks
Jins Thomas
Re: Out of memory, HTML::TableExtract
Maybe because you aren't closing each file after you have done your
thing and it remains in memory?
On 2011-01-06 02:26:13 -0500, Jins Thomas said:
> [snip]
--
Robert
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: Out of memory, HTML::TableExtract
> Maybe because you aren't closing each file after you have done your thing
> and it remains in memory?
Well, I may be wrong, but since he is reusing the same variable for
each file, each new instance replaces the older one, so the files
cannot all remain open and hence cannot all be in memory.
I think the issue here is that the array is filling up the memory.
Maybe he needs to write the data to a temp file and empty the array
after each file.
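A minimal sketch of that idea, using only core modules: rather than pushing every row onto one big array, each file's rows are appended to a temporary file as tab-separated lines, and the per-file rows are then dropped. The helper name append_rows and the hard-coded sample rows are made up for illustration; in the real subroutine the rows would come from each table's rows method.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: append one file's rows to an open handle as
# tab-separated lines, so nothing accumulates in memory.
sub append_rows {
    my ( $fh, $rows ) = @_;
    for my $row (@$rows) {
        # Undefined cells become empty strings; assumes cells contain
        # no tabs or newlines.
        print {$fh} join( "\t", map { defined $_ ? $_ : '' } @$row ), "\n";
    }
    return scalar @$rows;
}

my ( $out, $tmpname ) = tempfile( UNLINK => 1 );

# Stand-ins for the rows extracted from two HTML files.
my @per_file = (
    [ [ 'a', 1 ], [ 'b', 2 ] ],
    [ [ 'c', 3 ] ],
);

my $total = 0;
for my $rows (@per_file) {
    # Only the temp file grows; no combined array is kept.
    $total += append_rows( $out, $rows );
}
close $out or die "close failed: $!";

# Read rows back one line at a time when needed (here just the first).
open my $in, '<', $tmpname or die "cannot open $tmpname: $!";
my $first = <$in>;
close $in;
chomp $first;
my @cells = split /\t/, $first, -1;
print "$total rows written, first row: @cells\n";
```

Whether this helps depends on where the memory actually goes: it only limits the size of the caller's array, not what the parser itself needs for a single file.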
Cheers,
Parag
On Thu, Jan 6, 2011 at 6:59 PM, Robert <sigzero [at] gmail.com> wrote:
> [snip]
Re: Out of memory, HTML::TableExtract
On Jan 5, 10:56 pm, jinstho... [at] gmail.com (Jins Thomas) wrote:
> Hi experts,
>
> Have you ever experienced Out of memory problem while using
> HTML::TableExtract. I'm having little large html files, still i didn't
> expect this to happen
>
If the html files are really big, HTML::TableExtract might be
filling memory. If that's the problem and the html output
is being generated by a program in fixed format, you may
need to parse the html yourself with a regex.
Another possible strategy, if your own arrays are filling
memory, is to use a DBM to offload memory to disk.
For an example: perldoc DB_File
--
Charles DeRykus
Re: Out of memory, HTML::TableExtract
On Jan 5, 10:56 pm, jinstho... [at] gmail.com (Jins Thomas) wrote:
> Hi experts,
>
> Have you ever experienced Out of memory problem while using
> HTML::TableExtract. I'm having little large html files, still i didn't
> expect this to happen
>
> Would you be able to suggest some workarounds for this. I'm using this
> subroutine in another for loop.
>
[snip]
Using a DBM may help as you grow arrays. The DBM
will trade memory for disk. A very simple example:
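use DB_File;
use Fcntl;    # for O_RDWR and O_CREAT
...
# An array can only be tied to a DB_RECNO database.
tie @ldata, 'DB_File', 'ldata.dbm', O_RDWR|O_CREAT, 0666, $DB_RECNO
    or die "tie failed: $!";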
If HTML::TableExtract itself is using too much memory,
you may be able to replace it with a lighter regex of
your own to pull out the table data. But this will be
reliable only if the HTML is known to be generated
programmatically, so that its layout does not vary.
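As a rough sketch of that regex approach, assuming the generator writes each table row on a single line as <tr><td>...</td>...</tr>, with no nested tables and no attribute values containing ">": the file can then be read line by line, so only one row is in memory at a time. The sub name cells_from_line and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the cell texts of one <tr> line, or an
# empty list if the line holds no complete row.
sub cells_from_line {
    my ($line) = @_;
    return unless $line =~ m{<tr\b[^>]*>(.*?)</tr>}is;
    my $inner = $1;
    # Capture the contents of each <td> or <th> cell.
    my @cells = $inner =~ m{<t[dh]\b[^>]*>(.*?)</t[dh]>}gis;
    # Trim surrounding whitespace from every cell.
    s/^\s+|\s+$//g for @cells;
    return @cells;
}

# Sample input standing in for lines read with: while ( my $line = <$fh> )
my @lines = (
    '<table>',
    '<tr><th>Name</th><th>Count</th></tr>',
    '<tr><td> foo </td><td>42</td></tr>',
    '</table>',
);

for my $line (@lines) {
    my @cells = cells_from_line($line) or next;
    print join( ' | ', @cells ), "\n";
}
```

Anything less regular than this (rows split across lines, nested tables, markup inside cells) is where a real HTML parser becomes necessary again.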
--
Charles DeRykus
Re: Out of memory, HTML::TableExtract
Hi DeRykus,
Sorry for the late reply.
I was able to test DB_File with your example, thanks. But I'm facing
a problem: I can't store a multidimensional array with DB_File. Each
row reference (its address) is stored as just a string.
Is there some option that lets us store multidimensional arrays
(like the two-dimensional array you get from HTML tables)?
Thanks
Jins Thomas
On Sat, Jan 8, 2011 at 10:45 AM, C.DeRykus <derykus [at] gmail.com> wrote:
> [snip]
Re: Out of memory, HTML::TableExtract
On Jan 26, 11:28 pm, jinstho... [at] gmail.com (Jins Thomas) wrote:
> Hi DeRykus
>
> Sorry for replying late.
>
> I was able to test DB_File with your example, thanks. But i'm facing
> a problem. I'm not able to access multi dimensional array with this
> DB_File. Address is being stored just a string.
>
> Do we have some options where we can access multi dimensional arrays
> (like two dimensional array from html tables)
> ....
MLDBM or MLDBM::Easy are options. Also DBM::Deep.
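For illustration: a plain DBM can only hold flat strings, so a row reference ends up stored in its string form (something like ARRAY(0x...)). A minimal sketch of one workaround for simple two-dimensional data is to flatten each row into a single string before storing it. Here a plain array @store stands in for the array tied to DB_File (an assumption made to keep the example self-contained), and the tab delimiter assumes no cell contains a tab or newline.

```perl
use strict;
use warnings;

# @store stands in for the array tied to DB_File (assumption); like a
# DBM it is only given plain strings.
my @store;

my $row = [ 'foo', 'bar', 42 ];

# Storing the reference directly would keep only its string form.
my $as_string = "$row";    # looks like ARRAY(0x...)

# Workaround: flatten each row into one string before storing it.
push @store, join( "\t", @$row );

# Rebuild the row when reading it back.
my @cells = split /\t/, $store[0], -1;
print "stringified ref: $as_string\n";
print "second cell: $cells[1]\n";
```

The modules named above take care of this kind of serialisation for you and cope with arbitrary nested data, which the delimiter trick does not.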
--
Charles DeRykus
Re: Out of memory, HTML::TableExtract
On Thu, Jan 27, 2011 at 4:44 PM, C.DeRykus <derykus [at] gmail.com> wrote:
> [snip]
>
> MLDBM or MLDBM::Easy are options. Also DBM::Deep.
But the MLDBM documentation talks only about hashes and has no examples
for arrays, so I got confused.
Re: Out of memory, HTML::TableExtract
On Jan 27, 3:29 am, jinstho... [at] gmail.com (Jins Thomas) wrote:
> [snip]
>
> But MLDBM documentation talks only about hashes, no examples for arrays. So
> got confused.
>
>
DBM::Deep supports array DBMs. See the docs.
For example:
use DBM::Deep;
my $db = DBM::Deep->new(
    file => "foo-array.db",
    type => DBM::Deep->TYPE_ARRAY,
);
$db->[0] = "foo";
...
--
Charles DeRykus