
Extracting Data from PDF files
Hello,
I posted a question earlier about creating a PDF file from a PDF form
submission, which we now have working. We are able to create the PDF file
to be attached to an email.
The issue I'm having now is extracting some specific data from the PDF
files we create. We need to extract a couple of form field values from
each file. I've been reviewing the various PDF modules and haven't been
able to figure it out. The modules I've been looking at are
PDF::API2::Simple and PDF::FDF::Simple. These seem to just create/edit
PDF files, but I need to extract specific data from the created PDF file.
Is there a way to do this with these modules, or some other method?
Thanks,
Mike(mickalo)Blezien
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =-=-=-=-=-=-=-=-=-=-=
Thunder Rain Internet Publishing
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= -=-=-=-=-=-=-=-=-=-=-
Re: Extracting Data from PDF files
On Mar 3, 2011 6:07 AM, "Mike Blezien" <mickalo [at] frontiernet.net> wrote:
> Is there another way to do this with these modules or some other method ?
Maybe I'm missing something, but why don't you just dump all of the form data
into a db? Then you can create as many PDFs as you like. I mean, I've used
a PDF scraping module (you can even do OCR with one), but it isn't fun
because the data is generally not nicely formatted for this. That probably
isn't the case for you, but it doesn't matter because you have access to the
pre-processed data.
Re: Extracting Data from PDF files
----- Original Message -----
From: "shawn wilson" <ag4ve.us [at] gmail.com>
Cc: "Perl List" <beginners [at] perl.org>
Sent: Thursday, March 03, 2011 5:22 AM
Subject: Re: Extracting Data from PDF files
> Maybe I'm missing something but why don't you just dump all of the form data
> into a db and then you can create as many pdf as you like? I mean, I've used
> a pdf scraping module (you can even do ocr with one) but it isn't fun
> because the data is generally not nicely formatted for this. This probably
> isn't the case for you but who cares because you have access to the
> pre-processed data.
Shawn,
You mean dump it into a database (db)? The data is mostly binary, so I'm not
sure how you'd "scrape" it to extract the data, but I'm not really familiar
with this approach :)
Mike
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: Extracting Data from PDF files
On Mar 3, 2011 6:35 AM, "Mike Blezien" <mickalo [at] frontiernet.net> wrote:
> Shawn,
>
> You mean dump it into a database (db)? The data is mostly binary, so I'm not
> sure how you'd "scrape" it to extract the data, but I'm not really familiar
> with this approach :)
>
You said your data was coming from a PDF form, right? I've never done this per
se; however, IIRC, the data is posted to a db, a web CGI, or a text file. If
this is the case, why not get the text from the db? It's plain text at that
point, no?
Re: Extracting Data from PDF files
----- Original Message -----
From: "shawn wilson" <ag4ve.us [at] gmail.com>
Cc: "Perl List" <beginners [at] perl.org>
Sent: Thursday, March 03, 2011 6:04 AM
Subject: Re: Extracting Data from PDF files
> You said your data was coming from pdf form, right? I've never done this per
> se, however IIRC, the data is posted to a db, web cgi, or a text file. If
> this is the case, why not get the text from the db - its plain text at that
> point, no?
I wish it was that simple. All the data passed from the PDF form is basically
binary, and I haven't been able to figure out how to extract the specific form
field data from the file.
Mike
Re: Extracting Data from PDF files
I basically run our PDFs through a PDF-to-text converter and extract the
data from the text files. It is pretty simple.
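A minimal sketch of the text-parsing half of that approach, assuming the converter has already produced plain text in which each field sits on its own line as "Label: value". That layout, the sample field names, and the extract_fields helper are illustrative assumptions, not something stated in this thread; the pattern would need adjusting to whatever your converter actually emits.

```perl
use strict;
use warnings;

# Pull "Label: value" pairs out of text that a PDF-to-text converter
# has already produced. The one-field-per-line layout is an assumption.
sub extract_fields {
    my ($text, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my %found;
    for my $line (split /\r?\n/, $text) {
        # Label, a colon, then the value; surrounding whitespace is dropped.
        next unless $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        my ($label, $value) = ($1, $2);
        # With no labels requested, keep every pair that was found.
        $found{$label} = $value if !@wanted || $want{$label};
    }
    return %found;
}

# Hypothetical converter output for a two-field form.
my $sample = "Order Form\nName: Jane Doe\nEmail: jane\@example.com\n";
my %fields = extract_fields($sample, 'Name', 'Email');
print "$_ = $fields{$_}\n" for sort keys %fields;
```

In practice $sample would be read from the text file the converter writes, and the list of labels would be the form fields you care about.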
On 3/3/2011 6:21 AM, Mike Blezien wrote:
> I wish it was that simple. All the data passed is basically all binary
> from the PDF form, haven't be able to figure out how to extract the
> actual specific form field data in the file.
>
> Mike