Getting the text from a PDF file
Hello,
I would like to extract the whole text from a PDF document. Can you
recommend a perl module that can do this under Windows?
I searched cpan.org and found a great many modules. I tested a few of
them, but none could extract the text, even though it displays fine in
Acrobat Reader: they returned only garbage or nothing at all, gave an
error, or were incompatible with Windows...
Thank you very much.
IP
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: Getting the text from a PDF file
Ion Pop wrote:
> I would like to extract the whole text from a PDF document. Can you
> recommend a perl module that can do this under Windows?
>
> I searched on cpan.org and I found very many modules, I tested a few of
> them, but none of them was able to extract the text, which can be seen
> well with Acrobat Reader, but they extracted only garbage, or nothing,
> or just gave an error, or they were incompatible with Windows...
Your PDF could just contain a set of pictures instead of textual data.
--
Ruud
Re: Getting the text from a PDF file
----- Original Message -----
From: "Ion Pop" <ionpop123 [at] gmail.com>
Newsgroups: perl.beginners
To: <beginners [at] perl.org>
Sent: Sunday, February 28, 2010 7:57 AM
Subject: Getting the text from a PDF file
> Hello,
>
> I would like to extract the whole text from a PDF document. Can you
> recommend a perl module that can do this under Windows?
>
> I searched on cpan.org and I found very many modules, I tested a few of
> them, but none of them was able to extract the text, which can be seen
> well with Acrobat Reader, but they extracted only garbage, or nothing, or
> just gave an error, or they were incompatible with Windows...
>
> Thank you very much.
>
> IP
>
Here is a link from this group that explains how to use 'xpdf' (not a
Perl module).
http://groups.google.com/group/perl.beginners/browse_frm/thread/33e8352da6aaaa4/ec5b13be708ec05d?hl=en&lnk=gst&q=How+to+pull+Text#ec5b13be708ec05d
Chris
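
[Editor's sketch] Since xpdf is not a Perl module, one way to use it from a
Perl program is to run its pdftotext command-line tool and read the output.
The following is only a minimal sketch under some assumptions: pdftotext is
installed and on the PATH, and the helper names pdftotext_command and
pdf_to_text are hypothetical, not part of any module. The '-layout' option
and the '-' output argument (write to standard output) are taken from the
pdftotext manual. Note that the list form of a piped open on Windows needs
Perl 5.22 or later.

```perl
use strict;
use warnings;

# Build the argument list for xpdf's pdftotext.
# Hypothetical helper; 'exe' defaults to 'pdftotext' (assumed to be on PATH).
sub pdftotext_command {
    my ($pdf_path, %opt) = @_;
    my $exe = $opt{exe} // 'pdftotext';
    my @cmd = ($exe);
    push @cmd, '-layout' if $opt{layout};   # try to keep the page layout
    push @cmd, $pdf_path, '-';              # '-' sends the text to stdout
    return @cmd;
}

# Run pdftotext and return everything it prints as one string.
sub pdf_to_text {
    my ($pdf_path, %opt) = @_;
    my @cmd = pdftotext_command($pdf_path, %opt);

    # List-form piped open avoids shell quoting problems with odd file
    # names (on Windows this requires Perl 5.22 or newer).
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    local $/;                               # slurp mode
    my $text = <$fh>;
    close($fh) or die "$cmd[0] failed (exit status $?)\n";
    return $text;
}

# Usage: perl pdf2txt.pl some.pdf
if (@ARGV) {
    print pdf_to_text($ARGV[0], layout => 1);
}
```

On Windows the executable name can be passed explicitly, for example
pdf_to_text('report.pdf', exe => 'pdftotext.exe'), where report.pdf is
just a placeholder file name.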
Re: Getting the text from a PDF file
From: "Dr.Ruud" <rvtol+usenet [at] isolution.nl>
>
>> I would like to extract the whole text from a PDF document. Can you
>> recommend a perl module that can do this under Windows?
>>
>> I searched on cpan.org and I found very many modules, I tested a few of
>> them, but none of them was able to extract the text, which can be seen
>> well with Acrobat Reader, but they extracted only garbage, or nothing, or
>> just gave an error, or they were incompatible with Windows...
>
> Your PDF could just contain a set of pictures instead of textual data.
>
> --
> Ruud
The PDF I tried contains textual data and tables.
I mean, I tried CAM::PDF and was able to get the text from two PDF
files. It was strangely formatted, of course, with broken words, but at
least I got some text. From the third PDF file, however, the same program
returned only garbage.
I have also tried pdftotext.exe, and it extracted the text much better,
but I would prefer a Perl-based solution.
IP
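
[Editor's sketch] For readers who want to reproduce the CAM::PDF attempt
described above, here is a minimal sketch. It uses CAM::PDF's new, numPages
and getPageText methods. The looks_like_text check is a hypothetical
heuristic added for illustration, with an arbitrary 0.9 threshold, to flag
pages whose output is mostly unprintable bytes; it only counts printable
ASCII and whitespace, so it would wrongly flag text made up largely of
non-ASCII characters.

```perl
use strict;
use warnings;

# Hypothetical heuristic: call the output usable if at least 90% of its
# characters are printable ASCII or whitespace. The threshold is arbitrary.
sub looks_like_text {
    my ($text) = @_;
    return 0 unless defined $text && length $text;
    my $ok = () = $text =~ /[\x20-\x7E\s]/g;   # count matching characters
    return $ok / length($text) >= 0.9 ? 1 : 0;
}

# Return the extracted text of each page, one list element per page.
sub extract_pdf_text {
    my ($file) = @_;
    require CAM::PDF;            # loaded lazily, only when actually needed
    no warnings 'once';          # $CAM::PDF::errstr is mentioned only here
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file: $CAM::PDF::errstr\n";

    my @pages;
    for my $n (1 .. $pdf->numPages()) {
        my $text = $pdf->getPageText($n);
        push @pages, defined $text ? $text : '';
    }
    return @pages;
}

# Usage: perl campdf_text.pl some.pdf
if (@ARGV) {
    my $n = 0;
    for my $page (extract_pdf_text($ARGV[0])) {
        $n++;
        if (looks_like_text($page)) {
            print $page, "\n";
        }
        else {
            warn "Page $n: output does not look like text, skipping\n";
        }
    }
}
```

A page that fails the check could then be handed to an external tool such
as pdftotext instead.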