Working with a 1-GB XML file...

Hi. I have a large XML file (about 1 GB) that I would like to be
able to interrogate in my code. Given its size, it's out of the
question to read it all into memory. I'd like to avoid having to
convert this thing to an RDB.

Does anyone know of a module that can treat such a file as
disk-resident data?

TIA!

kj
kj [ Thu, 17 January 2008 18:40 ] [ ID #1910960 ]

Re: Working with a 1-GB XML file...

On 2008-01-17, kj <socyl [at] 987jk.com.invalid> wrote:
>
> Hi. I have a large XML file (about 1 GB) that I would like to be
> able to interrogate in my code. Given its size, it's out of the
> question to read it all into memory. I'd like to avoid having to
> convert this thing to an RDB.
>
> Does anyone know of a module that can treat such a file as
> disk-resident data?

You should probably read

http://perl-xml.sourceforge.net/faq/#parser_selection

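It sounds like you might want a SAX-based parser: it reports elements
to your code as events while it reads the file, so the whole document
never has to be held in memory.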
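A minimal SAX sketch, assuming the XML::SAX distribution is installed and that the file is a sequence of `<record id="...">` elements (the element and attribute names and the `RecordHandler` class are hypothetical). The handler sees one start-tag event at a time, so memory use depends on what the handler chooses to keep, not on the size of the file.

```perl
use strict;
use warnings;

# Hypothetical SAX handler: counts <record> elements and remembers their ids.
package RecordHandler;
use base 'XML::SAX::Base';

sub start_element {
    my ($self, $el) = @_;
    return unless $el->{Name} eq 'record';
    $self->{count}++;
    # SAX2 attribute keys have the form "{NamespaceURI}LocalName".
    my $id = $el->{Attributes}{'{}id'};
    push @{ $self->{ids} }, $id->{Value} if $id;
}

package main;
use XML::SAX::ParserFactory;
use File::Temp qw(tempfile);

# Small stand-in for the real 1-GB file (assumed layout).
my ($fh, $file) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<records><record id="a"/><record id="b"/></records>';
close $fh;

my $handler = RecordHandler->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_uri($file);    # events are delivered while the file is read

print "count=$handler->{count} ids=@{ $handler->{ids} }\n";
```

The factory picks whichever SAX parser is installed (XML::SAX::PurePerl at minimum); the handler code does not change if a faster backend such as XML::LibXML::SAX is available.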

--keith

Keith Keller [ Thu, 17 January 2008 19:07 ] [ ID #1910962 ]

Re: Working with a 1-GB XML file...

kj <socyl [at] 987jk.com.invalid> wrote:
> Hi. I have a large XML file (about 1 GB) that I would like to be
> able to interrogate in my code.

In what ways do you want to interrogate it? Is all the data in the file
relevant to you, or could you extract just the relevant parts of it into
a much smaller, memory-resident set? (XML::Twig might be good for that.)
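A minimal XML::Twig sketch of that idea, assuming each item is a `<record id="...">` element with a `<name>` child (hypothetical names, as is the `%name_by_id` hash). The handler copies only the wanted field into a hash and then calls `purge`, which discards the elements parsed so far, so memory ends up holding roughly just the extracted subset.

```perl
use strict;
use warnings;
use XML::Twig;
use File::Temp qw(tempfile);

# Small stand-in for the real file (assumed layout).
my ($fh, $file) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} <<'XML';
<records>
  <record id="1"><name>alpha</name><blob>lots of data</blob></record>
  <record id="2"><name>beta</name><blob>more data</blob></record>
</records>
XML
close $fh;

my %name_by_id;    # the small, memory-resident subset we keep

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each <record> has been completely parsed.
        record => sub {
            my ($t, $record) = @_;
            $name_by_id{ $record->att('id') } = $record->field('name');
            $t->purge;    # free everything parsed so far
        },
    },
);
$twig->parsefile($file);

print "$_ => $name_by_id{$_}\n" for sort keys %name_by_id;
```

The `<blob>` payload is parsed but thrown away by `purge`; only the id/name pairs survive.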

> Given its size, it's out of the
> question to read it all into memory. I'd like to avoid having to
> convert this thing to an RDB.

How about converting it to a DBM::Deep file?
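A sketch of that conversion, assuming DBM::Deep and XML::Twig are both installed and the same hypothetical `<record id="...">` layout (the `data.xml`/`data.db` file names are also made up): stream the XML once, store each record in a DBM::Deep file keyed by id, then reopen the file and look records up without reparsing the XML.

```perl
use strict;
use warnings;
use XML::Twig;
use DBM::Deep;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $xml_file = "$dir/data.xml";    # stand-in for the real 1-GB file
my $db_file  = "$dir/data.db";

open my $out, '>', $xml_file or die "Cannot write $xml_file: $!";
print {$out} <<'XML';
<records>
  <record id="1"><name>alpha</name><size>10</size></record>
  <record id="2"><name>beta</name><size>20</size></record>
</records>
XML
close $out;

# One-time conversion: stream the XML and write each record to disk.
{
    my $db = DBM::Deep->new($db_file);
    XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ($t, $rec) = @_;
                $db->{ $rec->att('id') } = {
                    name => $rec->field('name'),
                    size => $rec->field('size'),
                };
                $t->purge;    # keep the in-memory tree small
            },
        },
    )->parsefile($xml_file);
}

# Later (e.g. in another program): reopen the file and look records up by key.
my $db = DBM::Deep->new($db_file);
print "$db->{2}{name} ($db->{2}{size})\n";
```

After the one-time conversion, lookups read only the parts of the DBM::Deep file they need, which is one way to get the "disk-resident" access asked about without setting up an RDB.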

> Does anyone know of a module that can treat such a file as
> disk-resident data?

Well, no module is needed to treat it as disk-resident data, as that is
exactly what it is already. You need to give us a functional definition of
how you want to access the data. That will most likely drive the storage,
not the other way around.

You might be able to use DBD::AnyData, but there is no particular reason to
think it will like the format your XML is already in, or that it will be
fast.

Xho

xhoster [ Thu, 17 January 2008 19:10 ] [ ID #1910963 ]

Re: Working with a 1-GB XML file...

kj <socyl [at] 987jk.com.invalid> wrote:

> Hi. I have a large XML file (about 1 GB) that I would like to be
> able to interrogate in my code. Given its size, it's out of the
> question to read it all into memory. I'd like to avoid having to
> convert this thing to an RDB.
>
> Does anyone know of a module that can treat such a file as
> disk-resident data?

It depends a lot on /what/ is in the XML file. If it consists of records
you have to process one by one, XML::Twig might be the right answer. If you
have to process the file in a stream-based way, SAX or a similar module
might be the answer.

--
John

http://johnbokma.com/
John Bokma [ Thu, 17 January 2008 20:28 ] [ ID #1910969 ]
Perl » comp.lang.perl.misc » Working with a 1-GB XML file...
