Perl "Out of Memory!" Issue

Perl "Out of Memory!" Issue

on 15.12.2008 05:11:53 by fzarabozo

Hello Guys,

I'm getting the "Out of memory!" error from ActivePerl 5.10 (1003) running
on a box with an Intel Core 2 Duo 2.4 GHz and 4 GB of RAM under Windows XP Pro.

I'm using XML::Simple to parse a 120 MB XML file. My code so far is as
simple as this:

-----------------------------
use strict;
use warnings;
use XML::Simple;

my $xml = XMLin('file.xml');
-----------------------------

That's pretty much all I need to get the "Out of memory!" error after 90
seconds. As I see it, the XML file is really big, but it shouldn't be anywhere
near enough to exhaust this computer's (or Perl's) memory.

I opened the Task Manager to watch the memory usage while the script was
running, and I could see the memory used by Perl grow until it was using
about 3 GB, at which point it aborted.

While looking for an answer on Google, I found that several people using the
XML:: modules are getting an "Out of memory!" error with big files.

Is this a matter of bad memory usage from XML::Simple? Does anyone know a
better way to parse huge XML files?

I mean... this is embarrassing for me. I'm the biggest Perl promoter and
defender here at work, and I refuse to tell the others that Perl isn't able
to parse an XML file without crashing! There's gotta be a better way... Am
I right? :-|


Cheers,

Paco



Re: Perl "Out of Memory!" Issue

on 15.12.2008 06:53:03 by Christian Walde

On Mon, 15 Dec 2008 05:11:53 +0100, "Zarabozo, Francisco (GE, Corporate)" <fzarabozo@hotmail.com> wrote:

> use XML::Simple;
> "Out of memory!"


That's solved in a simple manner. Get a module that doesn't load the entire file into memory at once. Such as this: http://search.cpan.org/~mirod/XML-Twig-3.32/Twig.pm
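For instance, a minimal sketch with XML::Twig might look like the following (untested here, and 'item' is just a placeholder for whatever element encloses one record in your file):

-----------------------------
use strict;
use warnings;
use XML::Twig;

my $count = 0;
my $twig  = XML::Twig->new(
    twig_handlers => {
        # 'item' is a placeholder -- use the element that encloses one
        # catalog record in the real file.
        item => sub {
            my ( $t, $elem ) = @_;
            $count++;
            # ... extract whatever you need from $elem here ...
            $t->purge;    # release everything parsed so far
        },
    },
);
$twig->parsefile('file.xml');
print "Processed $count records\n";
-----------------------------

That way only one record at a time is kept in memory, no matter how big the file is.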

--
Regards,
Christian Walde



Re: Perl "Out of Memory!" Issue

on 15.12.2008 17:09:55 by Jenda Krynicky

From: "Zarabozo, Francisco \(GE, Corporate\)"
> I'm getting the "Out of memory!" error from ActivePerl 5.10 (1003) running
> on a box with Intel Core 2 Duo 2.4 Ghz with 4 GB in ram with Windows XP Pro.
>
> I'm using XML::Simple to parse a 120 MB XML file.

Don't. While XML::Simple is kinda nice if the XML is fairly small and
simple, once it grows big you'd better reach for a different tool.
The "parse the whole XML into a maze of objects"-style modules are
out of the question in this case; their memory footprint would most
likely be even bigger. So you are left with those that let you process
the file in chunks: either stream-based parsers like the SAX modules
and XML::Parser (IMHO, they lead to code that's hard to understand
and debug), or parsers that let you specify what tag encloses a
digestible chunk and then hand you the data of that tag and its
content one chunk at a time, like XML::Twig. Or a module that lets you
filter the tags as they are encountered, transform the data structure
as it's built, and process the data structure at whatever level(s) is
convenient, like XML::Rules.
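
As a rough illustration of the stream-based style (just a sketch; 'item' is a made-up element name, and a real script would collect data instead of merely counting):

-----------------------------
use strict;
use warnings;
use XML::Parser;

# Handlers fire as the file is read, so only the current element needs
# to be held in memory.
my $count  = 0;
my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ( $expat, $element, %attrs ) = @_;
            $count++ if $element eq 'item';    # 'item' is a made-up name
        },
        Char => sub {
            my ( $expat, $text ) = @_;
            # ... accumulate character data for the current element ...
        },
    },
);
$parser->parsefile('file.xml');
print "Saw $count item elements\n";
-----------------------------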

HTH, Jenda
===== Jenda@Krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery


Re: Perl "Out of Memory!" Issue

on 16.12.2008 13:09:06 by fzarabozo

Hello All,

Thank you for your answers. I think you're both right about handling files
by chunks instead of loading a huge file into memory at once. However,
there's definitely something wrong in the way that XML::Simple manages
memory usage, which is my point.

The file I'm loading is a 124 MB file. There's no reason at all to reach 3
GB in memory allocation.

Look at this.

The XML file I'm loading is a simple catalog that contains catalog elements,
and each element contains properties like price and other stuff. Today, I
made a test with XML::Simple. Instead of making it parse the whole file, I
opened the file in my script and pulled out the catalog elements one by one
with a non-greedy regex. I parsed each element with XML::Simple, which
returns a hash reference, and then added that hash reference to a main hash
I created at the beginning of the script.
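
Roughly, the test looked like this (a sketch from memory; the <element> tag and the 'id' key are placeholders for the real catalog's names):

-----------------------------
use strict;
use warnings;
use XML::Simple;

open my $fh, '<', 'file.xml' or die "Can't open file.xml: $!";
my $content = do { local $/; <$fh> };    # slurp the raw text (~124 MB)
close $fh;

my %catalog;
while ( $content =~ m{(<element>.*?</element>)}sg ) {
    my $record = XMLin($1);              # parse one small record at a time
    $catalog{ $record->{id} } = $record; # keyed by a placeholder 'id' field
}
-----------------------------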

At the end, I got the whole catalog into my main hash, exactly as
XML::Simple should have done it with the whole file. I watched the memory
usage while my script was working, and it never took more than 150 MB of
memory (and that was at the end), which is the logical thing to expect.
Also, the script ran very fast (less than 3 minutes) for all 30,000
catalog elements.

Anyway, thank you very much for your suggestions, I'm using them too to get
the best solution working.

Cheers,

Paco Zarabozo A.

Re: Perl "Out of Memory!" Issue

on 16.12.2008 13:51:17 by Angelos Karageorgiou


Zarabozo, Francisco (GE, Corporate) wrote:
> Hello All,
>
> Thank you for your answers. I think you're both right about handling files
> by chunks instead of loading a huge file into memory at once. However,
> there's definitely something wrong in the way that XML::Simple manages
> memory usage, which is my point.
>
> The file I'm loading is a 124 MB file. There's no reason at all to reach 3
> GB in memory allocation.
>

Ye gods, XML is a poor choice as a DB replacement, as you have witnessed
first hand.

BTW, my two cents says that Perl bombs out due to excessive recursion depth!
I have had it happen to me on occasion. The trick is to break out of the
recursion early.
