
parse data from a report file
Hi All,
I'm trying to parse data from a report file and I'm having trouble producing desired results.
Here is a data example from the report:
PONumber Line InvoicedQty UnitCost Amount Curr Extended Amount
Fr Date Company Department Account ItemNum ItemDescription
--------- ------------------------- ------ --------------------- ------------------
1023112-0000 1 1.0000 102.3419 102.34 USD 102.34
03A 10/13/10 213 31000 - 10810- 138328 ARMBD ART LN
Vendor: 1288 ALIMED Buyer:A02 VALERIE BAGALA
1026244-0000 1 1.0000 284.2525 284.25 USD 284.25
03A 10/29/10 213 31000 - 10810- 279784 BAGS CRUSHER
Vendor: 1338 SHARPE LINES INC Buyer:A02 VALERIE BAGALA
1024877-0000 1 4.0000 140.4800 561.92 USD 561.92
03A 10/26/10 213 31000 - 10810- 235228 SYR 1ML AMBER W/ TIP CAP
Vendor: 2472 BAXA CORP Buyer:A02 VALERIE BAGALA
1000066-0000 1 .9000 241.6845 217.52 USD 217.52
03A 05/19/10 213 41000 - 10810- 145155 NDL JAMSHIDI 11GA DISP STR
Vendor: 2686 CARDINAL HEALTH 200 INC Buyer:A04 DEAN SCHUMACHER
--------- ------------------------- ------ --------------------- ------------------
A complete record in the report expands to more than 1 line.
Each record begin with a line starting with 4 to 7 digits and -0000
each record ends with a line contains the word Vendor:'
I need to extract some elements from each record and combine the extracted data in one new record (1 line per record)
so that later I can bcp the new data into a table/database.
Here is my code:
#!/usr/bin/perl -w
use strict;
my $PO_file = "/home/sybase/scripts125/pl/test/simple_SH135.dat";
open(IN,"$PO_file") || die "Fail open $PO_file";
my ($line, [at] part1, [at] part2, [at] part3, $rec_part1, $rec_part2, $rec_part3, $complete_record);
print "Need to extract the following data\n";
print "PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|D escription|VendorID\n";
while ($line=<IN>) {
chomp $line;
$line =~ s/^\s+//go;
$line =~ s/\s+/ /go;
$line =~ s/,//go;
# Part1
# Data Example: 1023112-0000 1 1.0000 102.3419 102.34 USD 102.34
############################################################ ############Part1:
if ($line =~ /-0000/){
#if ($line =~ /^\d{7}-0000/){
my [at] part1 = split(/\s+/,$line);
my $PONumber = $part1[0];
my $Quantity = $part1[2];
my $UnitCost = $part1[3];
my $ExtAmt = $part1[6];
$rec_part1 = join "|",($PONumber, $Quantity, $UnitCost, $ExtAmt);
}# end part 1
### Part2:
#Data Example: 03A 10/13/10 213 31000 - 10810- 138328 ARMBD ART LN
############################################################ ##########
if ($line =~ /(\d{2}\/\d{2}\/\d{2})/){
#if ($line =~ /^.{5}(\d{2}\/\d{2}\/\d{2})/){ #Data Eg: 03A 10/13/10 213 31000 #why not work?
my [at] part2 = split(/\s+/,$line,8); # last group has multiple words is descriptions
my $PurchFr = $part2[0] ;
my $Date = $part2[1] ;
my $Company = $part2[2] ;
my $Dept = $part2[3] ;
my $Acct = $part2[5] ;
my $ItemNumber = $part2[6] ;
my $Desc = $part2[7];
#$rec_part2 = join '|',($part2[1],$part2[2] ,$part2[6],$part2[7]);
$rec_part2 = join '|',($Date,$Company ,$ItemNumber,$Desc);
#print "rec_part2: $rec_part2 \n";
}# end Part2
##Part3: VendorID
# Data Example: Vendor: 1288 ALIMED Buyer:A02 VALERIE BAGALA
############################################################ ##########
if ($line =~ /^Vendor:/){
my [at] part3 = split(/\s+/,$line);
my $VendorID = $part3[1];
$rec_part3 = $VendorID;
} #end part3
$complete_record = "$rec_part1".'|'."$rec_part2".'|'."$rec_part3";
print "$complete_record\n";
}#end while
My questions:
I expect my program to produce 4 records, like:
PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|De scription|VendorID
1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMB D ART LN|1288
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686
However it produces unwanted results, like below. Any pointers would be greatly appreciated.
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMB D ART LN|1288
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1288
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
1024877-0000|4.0000|140.4800|561.92|10/29/10|213|279784|BAGS CRUSHER|1338
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|1338
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
1000066-0000|.9000|241.6845|217.52|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2472
1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686
1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: parse data from a report file
loan tran wrote:
> Hi All,
Hello,
> I'm trying to parse data from a report file and I'm having trouble producing desired results.
>
> Here is a data example from the report:
>
> PONumber Line InvoicedQty UnitCost Amount Curr Extended Amount
> Fr Date Company Department Account ItemNum ItemDescription
> --------- ------------------------- ------ --------------------- ------------------
> 1023112-0000 1 1.0000 102.3419 102.34 USD 102.34
> 03A 10/13/10 213 31000 - 10810- 138328 ARMBD ART LN
> Vendor: 1288 ALIMED Buyer:A02 VALERIE BAGALA
>
> 1026244-0000 1 1.0000 284.2525 284.25 USD 284.25
> 03A 10/29/10 213 31000 - 10810- 279784 BAGS CRUSHER
> Vendor: 1338 SHARPE LINES INC Buyer:A02 VALERIE BAGALA
>
> 1024877-0000 1 4.0000 140.4800 561.92 USD 561.92
> 03A 10/26/10 213 31000 - 10810- 235228 SYR 1ML AMBER W/ TIP CAP
> Vendor: 2472 BAXA CORP Buyer:A02 VALERIE BAGALA
>
> 1000066-0000 1 .9000 241.6845 217.52 USD 217.52
> 03A 05/19/10 213 41000 - 10810- 145155 NDL JAMSHIDI 11GA DISP STR
> Vendor: 2686 CARDINAL HEALTH 200 INC Buyer:A04 DEAN SCHUMACHER
>
> --------- ------------------------- ------ --------------------- ------------------
It looks like you have a blank line between each record so you should
probably use paragraph mode to read the data.
> A complete record in the report expands to more than 1 line.
> Each record begin with a line starting with 4 to 7 digits and -0000
> each record ends with a line contains the word Vendor:'
>
> I need to extract some elements from each record and combine the extracted data in one new record (1 line per record)
> so that later I can bcp the new data into a table/database.
>
> Here is my code:
>
> #!/usr/bin/perl -w
> use strict;
>
> my $PO_file = "/home/sybase/scripts125/pl/test/simple_SH135.dat";
> open(IN,"$PO_file") || die "Fail open $PO_file";
You don't need to enclose variables in quotes, this is not a shell script:
perldoc -q quoting
You should probably use the three argument form of open and you should
include the $! variable in the error message so you know why it failed:
open IN, '<", $PO_file or die "Fail open '$PO_file' because: $!";
> my ($line, [at] part1, [at] part2, [at] part3, $rec_part1, $rec_part2, $rec_part3, $complete_record);
You should declare variables in the smallest possible scope instead of
all at the beginning.
> print "Need to extract the following data\n";
> print "PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|D escription|VendorID\n";
>
> while ($line=<IN>) {
while ( my $line = <IN> ) {
> chomp $line;
> $line =~ s/^\s+//go;
The /g option does nothing because the pattern is anchored at the
beginning of the line. The /o option does nothing because there are no
variables in the pattern.
perldoc -q /o
If you used split(' ',$line) instead of split(/\s+/,$line) you wouldn't
need to remove leading whitespace anyway.
> $line =~ s/\s+/ /go;
The /o option does nothing because there are no variables in the
pattern. No need to modify whitespace here when you are just going to
remove them with split later.
> $line =~ s/,//go;
The /o option does nothing because there are no variables in the
pattern. Although I didn't see any ',' characters in your data above,
it is probably better to use tr/// instaed:
$line =~ tr/,//d;
> # Part1
> # Data Example: 1023112-0000 1 1.0000 102.3419 102.34 USD 102.34
> ############################################################ ############Part1:
> if ($line =~ /-0000/){
> #if ($line =~ /^\d{7}-0000/){
> my [at] part1 = split(/\s+/,$line);
> my $PONumber = $part1[0];
> my $Quantity = $part1[2];
> my $UnitCost = $part1[3];
> my $ExtAmt = $part1[6];
Or you could do:
my ( $PONumber, $Quantity, $UnitCost, $ExtAmt ) = ( split ' ', $line )[
0, 2, 3, 6 ];
> $rec_part1 = join "|",($PONumber, $Quantity, $UnitCost, $ExtAmt);
> }# end part 1
>
>
> ### Part2:
> #Data Example: 03A 10/13/10 213 31000 - 10810- 138328 ARMBD ART LN
> ############################################################ ##########
> if ($line =~ /(\d{2}\/\d{2}\/\d{2})/){
> #if ($line =~ /^.{5}(\d{2}\/\d{2}\/\d{2})/){ #Data Eg: 03A 10/13/10 213 31000 #why not work?
> my [at] part2 = split(/\s+/,$line,8); # last group has multiple words is descriptions
> my $PurchFr = $part2[0] ;
> my $Date = $part2[1] ;
> my $Company = $part2[2] ;
> my $Dept = $part2[3] ;
> my $Acct = $part2[5] ;
> my $ItemNumber = $part2[6] ;
> my $Desc = $part2[7];
> #$rec_part2 = join '|',($part2[1],$part2[2] ,$part2[6],$part2[7]);
> $rec_part2 = join '|',($Date,$Company ,$ItemNumber,$Desc);
> #print "rec_part2: $rec_part2 \n";
> }# end Part2
>
>
> ##Part3: VendorID
> # Data Example: Vendor: 1288 ALIMED Buyer:A02 VALERIE BAGALA
> ############################################################ ##########
> if ($line =~ /^Vendor:/){
> my [at] part3 = split(/\s+/,$line);
> my $VendorID = $part3[1];
> $rec_part3 = $VendorID;
> } #end part3
>
> $complete_record = "$rec_part1".'|'."$rec_part2".'|'."$rec_part3";
> print "$complete_record\n";
>
> }#end while
>
> My questions:
>
> I expect my program to produce 4 records, like:
>
> PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|De scription|VendorID
> 1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMB D ART LN|1288
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686
>
> However it produces unwanted results, like below. Any pointers would be greatly appreciated.
>
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
> 1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMB D ART LN|1288
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1288
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
> 1024877-0000|4.0000|140.4800|561.92|10/29/10|213|279784|BAGS CRUSHER|1338
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|1338
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1000066-0000|.9000|241.6845|217.52|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2472
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686
That is because for every line of input you are printing a line of
output but you really only want to print a line of output for every
*record*.
John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: parse data from a report file
On 27/01/2011 06:58, loan tran wrote:
> Hi All,
>
> I'm trying to parse data from a report file and I'm having trouble producing desired results.
>
> Here is a data example from the report:
>
> PONumber Line InvoicedQty UnitCost Amount Curr Extended Amount
> Fr Date Company Department Account ItemNum ItemDescription
> --------- ------------------------- ------ --------------------- ------------------
> 1023112-0000 1 1.0000 102.3419 102.34 USD 102.34
> 03A 10/13/10 213 31000 - 10810- 138328 ARMBD ART LN
> Vendor: 1288 ALIMED Buyer:A02 VALERIE BAGALA
>
> 1026244-0000 1 1.0000 284.2525 284.25 USD 284.25
> 03A 10/29/10 213 31000 - 10810- 279784 BAGS CRUSHER
> Vendor: 1338 SHARPE LINES INC Buyer:A02 VALERIE BAGALA
>
> 1024877-0000 1 4.0000 140.4800 561.92 USD 561.92
> 03A 10/26/10 213 31000 - 10810- 235228 SYR 1ML AMBER W/ TIP CAP
> Vendor: 2472 BAXA CORP Buyer:A02 VALERIE BAGALA
>
> 1000066-0000 1 .9000 241.6845 217.52 USD 217.52
> 03A 05/19/10 213 41000 - 10810- 145155 NDL JAMSHIDI 11GA DISP STR
> Vendor: 2686 CARDINAL HEALTH 200 INC Buyer:A04 DEAN SCHUMACHER
>
> --------- ------------------------- ------ --------------------- ------------------
>
>
> A complete record in the report expands to more than 1 line.
> Each record begin with a line starting with 4 to 7 digits and -0000
> each record ends with a line contains the word Vendor:'
>
> I need to extract some elements from each record and combine the extracted data in one new record (1 line per record)
> so that later I can bcp the new data into a table/database.
>
> Here is my code:
>
> #!/usr/bin/perl -w
> use strict;
>
> my $PO_file = "/home/sybase/scripts125/pl/test/simple_SH135.dat";
> open(IN,"$PO_file") || die "Fail open $PO_file";
>
> my ($line, [at] part1, [at] part2, [at] part3, $rec_part1, $rec_part2, $rec_part3, $complete_record);
>
> print "Need to extract the following data\n";
> print "PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|D escription|VendorID\n";
>
> while ($line=<IN>) {
> chomp $line;
> $line =~ s/^\s+//go;
> $line =~ s/\s+/ /go;
> $line =~ s/,//go;
>
> # Part1
> # Data Example: 1023112-0000 1 1.0000 102.3419 102.34 USD 102.34
> ############################################################ ############Part1:
> if ($line =~ /-0000/){
> #if ($line =~ /^\d{7}-0000/){
> my [at] part1 = split(/\s+/,$line);
> my $PONumber = $part1[0];
> my $Quantity = $part1[2];
> my $UnitCost = $part1[3];
> my $ExtAmt = $part1[6];
> $rec_part1 = join "|",($PONumber, $Quantity, $UnitCost, $ExtAmt);
> }# end part 1
>
>
> ### Part2:
> #Data Example: 03A 10/13/10 213 31000 - 10810- 138328 ARMBD ART LN
> ############################################################ ##########
> if ($line =~ /(\d{2}\/\d{2}\/\d{2})/){
> #if ($line =~ /^.{5}(\d{2}\/\d{2}\/\d{2})/){ #Data Eg: 03A 10/13/10 213 31000 #why not work?
> my [at] part2 = split(/\s+/,$line,8); # last group has multiple words is descriptions
> my $PurchFr = $part2[0] ;
> my $Date = $part2[1] ;
> my $Company = $part2[2] ;
> my $Dept = $part2[3] ;
> my $Acct = $part2[5] ;
> my $ItemNumber = $part2[6] ;
> my $Desc = $part2[7];
> #$rec_part2 = join '|',($part2[1],$part2[2] ,$part2[6],$part2[7]);
> $rec_part2 = join '|',($Date,$Company ,$ItemNumber,$Desc);
> #print "rec_part2: $rec_part2 \n";
> }# end Part2
>
>
> ##Part3: VendorID
> # Data Example: Vendor: 1288 ALIMED Buyer:A02 VALERIE BAGALA
> ############################################################ ##########
> if ($line =~ /^Vendor:/){
> my [at] part3 = split(/\s+/,$line);
> my $VendorID = $part3[1];
> $rec_part3 = $VendorID;
> } #end part3
>
> $complete_record = "$rec_part1".'|'."$rec_part2".'|'."$rec_part3";
> print "$complete_record\n";
>
> }#end while
>
> My questions:
>
> I expect my program to produce 4 records, like:
>
> PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|De scription|VendorID
> 1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMB D ART LN|1288
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686
>
> However it produces unwanted results, like below. Any pointers would be greatly appreciated.
>
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D ART LN|1288
> 1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMB D ART LN|1288
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1288
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS CRUSHER|1338
> 1024877-0000|4.0000|140.4800|561.92|10/29/10|213|279784|BAGS CRUSHER|1338
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|1338
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1000066-0000|.9000|241.6845|217.52|10/26/10|213|235228|SYR 1ML AMBER W/ TIP CAP|2472
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2472
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL JAMSHIDI 11GA DISP STR|2686
My thoughts are of a much simpler solution. I hope the program below helps.
Rob
use strict;
use warnings;
my $po_file = '/home/sybase/scripts125/pl/test/simple_SH135.dat';
while (<$in>) {
chomp;
next unless /-0000/;
my [at] record = (split)[0, 2, 3, 6];
$_ = <DATA>;
push [at] record, (split)[1, 2, 6, 7];
$_ = <DATA>;
die "No 'Vendor' found in record" unless /Vendor:/;
push [at] record, (split)[1];
print join('|', [at] record), "\n";
}
**OUTPUT**
1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D|1288
1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS |1338
1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR| 2472
1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL|2 686
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: parse data from a report file
John and Rob,
Thanks a lot for the pointers. Very helpfull.
--- On Thu, 1/27/11, Rob Dixon <rob.dixon [at] gmx.com> wrote:
> From: Rob Dixon <rob.dixon [at] gmx.com>
> Subject: Re: parse data from a report file
> To: beginners [at] perl.org
> Cc: "loan tran" <loan_tr [at] yahoo.com>
> Date: Thursday, January 27, 2011, 11:33 AM
> On 27/01/2011 06:58, loan tran
> wrote:
> > Hi All,
> >
> > I'm trying to parse data from a report file and I'm
> having trouble producing desired results.
> >
> > Here is a data example from the report:
> >
> >=A0 =A0 =A0
> PONumber=A0=A0=A0Line=A0
> InvoicedQty=A0=A0=A0UnitCost=A0 Amount
> Curr=A0 Extended Amount
> > Fr=A0 =A0 Date=A0=A0=A0Company
> Department=A0=A0=A0Account=A0 ItemNum=A0
> =A0 =A0=A0=A0ItemDescription
> > --------- ------------------------- ------
> --------------------- ------------------
> >=A0 =A0 =A0=A0=A01023112-0000=A0
> =A0
> 1=A0=A0=A01.0000=A0=A0=A0102.3419=A0=A0=A0102.34
> USD=A0 =A0 102.34
> > 03A=A0 10/13/10=A0 213 31000=A0=A0=A0-
> 10810-=A0=A0=A0138328=A0 =A0 =A0 =A0
> ARMBD ART LN
> > Vendor:=A0 =A0=A0=A01288 ALIMED=A0
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0
> =A0 =A0 =A0 =A0 Buyer:A02=A0 VALERIE BAGALA
> >
> >=A0 =A0 =A0=A0=A01026244-0000=A0
> =A0
> 1=A0=A0=A01.0000=A0=A0=A0284.2525=A0=A0=A0284.25
> USD=A0 =A0 =A0=A0=A0284.25
> > 03A=A0 10/29/10=A0 213 31000=A0 -
> 10810-=A0 279784=A0 =A0 =A0
> =A0=A0=A0BAGS CRUSHER
> > Vendor:=A0 =A0=A0=A01338 SHARPE LINES
> INC=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0
> Buyer:A02=A0 VALERIE BAGALA
> >
> >=A0 =A0 =A0=A0=A01024877-0000=A0
> =A0 1=A0
> 4.0000=A0=A0=A0140.4800=A0=A0=A0561.92
> USD=A0=A0=A0561.92
> > 03A=A0 10/26/10=A0 213 31000=A0 -
> 10810-=A0 235228=A0 =A0 =A0
> =A0=A0=A0SYR 1ML AMBER W/ TIP CAP
> > Vendor:=A0 =A0=A0=A02472 BAXA CORP=A0
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0
> =A0 =A0=A0=A0Buyer:A02=A0 VALERIE BAGALA
> >
> >=A0 =A0 =A0=A0=A01000066-0000=A0
> =A0=A0=A01=A0 .9000=A0 241.6845=A0 =A0
> 217.52 USD=A0 =A0=A0=A0217.52
> > 03A=A0 05/19/10=A0 213 41000 - 10810-=A0
> 145155=A0=A0=A0NDL JAMSHIDI 11GA DISP STR
> > Vendor:=A0 =A0=A0=A02686 CARDINAL HEALTH
> 200 INC=A0 =A0 =A0
> =A0=A0=A0Buyer:A04=A0 DEAN SCHUMACHER
> >
> > --------- ------------------------- ------
> --------------------- ------------------
> >
> >
> > A complete record in the report expands to more than 1
> line.
> > Each record begin with a line starting with 4 to 7
> digits and -0000
> > each record ends with a line contains the word
> Vendor:'
> >
> > I need to extract some elements from each record and
> combine the extracted data in one new record (1 line per
> record)
> > so that later I can bcp the new data into a
> table/database.
> >
> > Here is my code:
> >
> > #!/usr/bin/perl=A0 -w
> > use strict;
> >
> > my $PO_file =3D
> "/home/sybase/scripts125/pl/test/simple_SH135.dat";
> > open(IN,"$PO_file") || die "Fail open $PO_file";
> >
> > my ($line, [at] part1, [at] part2, [at] part3, $rec_part1,
> $rec_part2, $rec_part3, $complete_record);
> >
> > print "Need to extract the following data\n";
> > print
> "PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|D escription|Ve=
ndorID\n";
> >
> > while ($line=3D<IN>) {
> >=A0 =A0 chomp $line;
> >=A0 =A0 $line =3D~ s/^\s+//go;
> >=A0 =A0 $line =3D~ s/\s+/ /go;
> >=A0 =A0 $line =3D~ s/,//go;
> >
> >=A0 =A0 # Part1
> >=A0 =A0 # Data Example:=A0 =A0 =A0
> 1023112-0000=A0 =A0
> 1=A0=A0=A01.0000=A0=A0=A0102.3419=A0=A0=A0102.34
> USD=A0 =A0 102.34
> >=A0 =A0
> ############################################################ ############P=
art1:
> >=A0 =A0 if ($line =3D~ /-0000/){
> >=A0 =A0 #if ($line =3D~ /^\d{7}-0000/){
> >=A0 =A0 =A0=A0=A0my [at] part1 =3D
> split(/\s+/,$line);
> >=A0 =A0 =A0=A0=A0my $PONumber =3D
> $part1[0];
> >=A0 =A0 =A0=A0=A0my $Quantity =3D
> $part1[2];
> >=A0 =A0 =A0=A0=A0my $UnitCost =3D
> $part1[3];
> >=A0 =A0 =A0=A0=A0my $ExtAmt =3D
> $part1[6];
> >=A0 =A0 =A0=A0=A0$rec_part1 =3D join
> "|",($PONumber, $Quantity, $UnitCost, $ExtAmt);
> >=A0 =A0 }# end part 1
> >
> >
> >=A0 =A0 ### Part2:
> >=A0 =A0 #Data Example: 03A=A0 10/13/10=A0
> 213 31000=A0=A0=A0-
> 10810-=A0=A0=A0138328=A0 =A0 =A0 =A0
> ARMBD ART LN
> >=A0 =A0
> ############################################################ ##########
> >=A0 =A0 if ($line =3D~ /(\d{2}\/\d{2}\/\d{2})/){
> >=A0 =A0 #if ($line =3D~
> /^.{5}(\d{2}\/\d{2}\/\d{2})/){ #Data Eg: 03A=A0
> 10/13/10=A0 213 31000 #why not work?
> >=A0 =A0 =A0 =A0 my [at] part2 =3D
> split(/\s+/,$line,8);=A0 # last group has multiple words
> is descriptions
> >=A0 =A0 =A0 =A0 my $PurchFr =3D $part2[0] ;
> >=A0 =A0 =A0 =A0 my $Date =3D $part2[1] ;
> >=A0 =A0 =A0 =A0 my $Company =3D $part2[2] ;
> >=A0 =A0 =A0 =A0 my $Dept =3D $part2[3] ;
> >=A0 =A0 =A0 =A0 my $Acct =3D $part2[5] ;
> >=A0 =A0 =A0 =A0 my $ItemNumber =3D $part2[6]
> ;
> >=A0 =A0 =A0 =A0 my $Desc =3D $part2[7];
> >=A0 =A0 =A0 =A0 #$rec_part2 =3D join
> '|',($part2[1],$part2[2] ,$part2[6],$part2[7]);
> >=A0 =A0 =A0 =A0 $rec_part2 =3D join
> '|',($Date,$Company ,$ItemNumber,$Desc);
> >=A0 =A0 =A0 =A0 #print "rec_part2:
> $rec_part2 \n";
> >=A0 =A0 }# end Part2
> >
> >
> >=A0 =A0 ##Part3: VendorID
> >=A0 =A0 # Data Example: Vendor:=A0
> =A0=A0=A01288 ALIMED=A0 =A0 =A0 =A0
> =A0 =A0 =A0 Buyer:A02=A0 VALERIE BAGALA
> >=A0 =A0
> ############################################################ ##########
> >=A0 =A0 if ($line =3D~ /^Vendor:/){
> >=A0 =A0 =A0 my [at] part3 =3D split(/\s+/,$line);
> >=A0 =A0 =A0 my $VendorID =3D $part3[1];
> >=A0 =A0 =A0 $rec_part3 =3D $VendorID;
> >=A0 =A0 } #end part3
> >
> >=A0 =A0 $complete_record =3D
> "$rec_part1".'|'."$rec_part2".'|'."$rec_part3";
> >=A0 =A0 print "$complete_record\n";
> >
> > }#end while
> >
> > My questions:
> >
> > I expect my program to produce 4 records, like:
> >
> >
> PONumber|Quantity|UnitCost|ExtAmt|Date|Company|ItemNumber|De scription|Ven=
dorID
> >
> 1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMB D
> ART LN|1288
> >
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS
> CRUSHER|1338
> >
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR
> 1ML AMBER W/ TIP CAP|2472
> >
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL
> JAMSHIDI 11GA DISP STR|2686
> >
> > However it produces unwanted results, like below. Any
> pointers would be greatly appreciated.
> >
> >
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D
> ART LN|
> >
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D
> ART LN|1288
> >
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D
> ART LN|1288
> >
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D
> ART LN|1288
> >
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D
> ART LN|1288
> >
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D
> ART LN|1288
> >
> 1026244-0000|1.0000|284.2525|284.25|10/13/10|213|138328|ARMB D
> ART LN|1288
> >
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS
> CRUSHER|1288
> >
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS
> CRUSHER|1338
> >
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS
> CRUSHER|1338
> >
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS
> CRUSHER|1338
> >
> 1024877-0000|4.0000|140.4800|561.92|10/29/10|213|279784|BAGS
> CRUSHER|1338
> >
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR
> 1ML AMBER W/ TIP CAP|1338
> >
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR
> 1ML AMBER W/ TIP CAP|2472
> >
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR
> 1ML AMBER W/ TIP CAP|2472
> >
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR
> 1ML AMBER W/ TIP CAP|2472
> >
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR
> 1ML AMBER W/ TIP CAP|2472
> >
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR
> 1ML AMBER W/ TIP CAP|2472
> >
> 1000066-0000|.9000|241.6845|217.52|10/26/10|213|235228|SYR
> 1ML AMBER W/ TIP CAP|2472
> >
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL
> JAMSHIDI 11GA DISP STR|2472
> >
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL
> JAMSHIDI 11GA DISP STR|2686
> >
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL
> JAMSHIDI 11GA DISP STR|2686
>
> My thoughts are of a much simpler solution. I hope the
> program below helps.
>
> Rob
>
> use strict;
> use warnings;
>
> my $po_file =3D
> '/home/sybase/scripts125/pl/test/simple_SH135.dat';
>
> while (<$in>) {
>
> =A0=A0=A0chomp;
> =A0=A0=A0next unless /-0000/;
>
> =A0=A0=A0my [at] record =3D (split)[0, 2, 3, 6];
>
> =A0=A0=A0$_ =3D <DATA>;
> =A0=A0=A0push [at] record, (split)[1, 2, 6, 7];
>
> =A0=A0=A0$_ =3D <DATA>;
> =A0=A0=A0die "No 'Vendor' found in record" unless
> /Vendor:/;
> =A0=A0=A0push [at] record, (split)[1];
>
> =A0=A0=A0print join('|', [at] record), "\n";
> }
>
> **OUTPUT**
>
> 1023112-0000|1.0000|102.3419|102.34|10/13/10|213|138328|ARMB D|1288
> 1026244-0000|1.0000|284.2525|284.25|10/29/10|213|279784|BAGS |1338
> 1024877-0000|4.0000|140.4800|561.92|10/26/10|213|235228|SYR| 2472
> 1000066-0000|.9000|241.6845|217.52|05/19/10|213|145155|NDL|2 686
>
> --
> To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
> For additional commands, e-mail: beginners-help [at] perl.org
> http://learn.perl.org/
>
>
> =0A=0A=0A
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/