Unicode, ANSI, and regular expressions in Perl 5.8


on 14.04.2006 23:17:16 by ianm

I attempted to run a simple line-by-line regular expression match on a
logfile output by another program. The code, however, refused to match,
even though I saw:

4:39:35 PM: 1382400 bytes video frame encoded.
4:39:35 PM: 19200 bytes audio sample encoded.
Encoding ends at: 4:39:36 PM.
Encoding Successful.

in Notepad. When I eventually printed the file line by line, I got:

4 : 3 9 : 3 5 P M : 1 3 8 2 4 0 0 b y t e s v i d e o f r a m e
e n c o d e d .

4 : 3 9 : 3 5 P M : 1 9 2 0 0 b y t e s a u d i o s a m p l e
e n c o d e d .

E n c o d i n g e n d s a t : 4 : 3 9 : 3 6 P M .

E n c o d i n g S u c c e s s f u l .

I then realized that the logfile was in Unicode. After saving the file in
ANSI, the pattern match worked. Unfortunately, I have to automate this
procedure.

So, how do I either 1) convert a line of Unicode into ANSI or 2) tell my
regular expression to wisen up? My research seems to turn up nothing but
the importance of being Unicode and how it will make us international
friends.

Code follows:

    # Open file for reading
    if (open(INFILE, "< $logfilepathandname")) {
        $answer = 'false';
        while (<INFILE>) {
            $line = $_;
            print $line . "\n";

            if ($line =~ m/Successful/) {
                $answer = 'true';
            }
            # print $line . "\n";
        }
    } else {
        $answer = 'Could not open file ' . $logfilepathandname . ".\n";
    }

Restrictions: The output file I wish to read is in Unicode. I cannot
change this format. I cannot run the output file through some conversion
program before processing.

Thank you for all of your help.

Regards,

Ian

_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

RE: Unicode, ANSI, and regular expressions in Perl 5.8

on 15.04.2006 04:35:46 by HBullock

Try...

    use Encode;
    open(my $FH, "<:encoding(UTF-16)", $file)
        or die "Could not open $file: $!";
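
For anyone landing here later, a fuller sketch of that suggestion might look like the following. It assumes the log is UTF-16 with a byte order mark at the start (which is what Windows tools usually write when they say "Unicode"); if the file has no BOM, the layer would need an explicit UTF-16LE or UTF-16BE instead. The sub name log_reports_success and the temporary sample file are made up for illustration. The :raw layer is there so that no CRLF translation is applied to the bytes before they are decoded.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical helper: returns 'true' if any line of a UTF-16 log
# matches /Successful/, otherwise 'false'.
sub log_reports_success {
    my ($path) = @_;

    # :raw first, so no CRLF translation runs on the undecoded bytes.
    # Plain "UTF-16" (no LE/BE suffix) expects a byte order mark.
    open(my $fh, '<:raw:encoding(UTF-16)', $path)
        or die "Could not open file $path: $!\n";

    my $answer = 'false';
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
        $answer = 'true' if $line =~ /Successful/;
    }
    close($fh);
    return $answer;
}

# Build a small sample log (assumed layout: BOM + UTF-16LE, CRLF endings).
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\xFF\xFE"
    . encode('UTF-16LE',
        "Encoding ends at: 4:39:36 PM.\r\nEncoding Successful.\r\n");
close($tmp_fh);

my $result = log_reports_success($tmp_path);
print "$result\n";
```

Once the handle decodes the file on read, each line is an ordinary Perl character string, so the original m/Successful/ pattern needs no changes.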

