unicode weirdness

I have code that reads the clipboard, expecting text copied out of Firefox
from mint.com. However when I copy the lines I end up with a bunch of
unicode characters mixed in. The n-dash is particularly irritating, and I
want to change it to a regular hyphen.

When I "paste" into a BBEdit UTF8 window, BBEdit says the n-dash is
character x2013, so I thought this code would work:

**************************************
my ($date,$comment,$mcat,$amt) = split /\t/;
my @pd = parseDate($date);
my $ds = dateStamp(@pd);

# make sure $amt looks like a real amount
$amt =~ tr/$,//d;
$amt =~ s/\x{2013}/-/g;
**************************************

... but it does nothing; the substitution doesn't find the n-dash.

So I went in and added this code to test it:

**************************************
print $amt, "\n";
print $_.": ord(".ord($_).") chr(".chr(ord($_)).")\n" for split(//,$amt);
exit;
**************************************

And here's what it prints:

**************************************
–16.58
?: ord(226) chr(?)
?: ord(128) chr(?)
?: ord(147) chr(?)
1: ord(49) chr(1)
6: ord(54) chr(6)
.: ord(46) chr(.)
5: ord(53) chr(5)
8: ord(56) chr(8)
**************************************

Uh -- I thought perl treated unicode characters as regular characters (as
opposed to bytes)? Why does the n-dash come up as three separate
characters? How do I change that n-dash into a hyphen?

TIA.

- Bryan



--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Bryan Harris [ Sat, 24 July 2010 21:35 ] [ ID #2045075 ]

Re: unicode weirdness

Bryan Harris wrote:

> I have code that reads the clipboard, expecting text copied out of Firefox
> from mint.com. However when I copy the lines I end up with a bunch of
> unicode characters mixed in. The n-dash is particularly irritating, and I
> want to change it to a regular hyphen.

Check out Text::Unidecode.
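
For what it's worth, the ord() dump above (226, 128, 147) is the UTF-8 byte sequence E2 80 93 for U+2013, which suggests $amt holds undecoded bytes rather than characters. A minimal sketch, assuming the clipboard text really does arrive as raw UTF-8 bytes (the hard-coded $bytes below is a stand-in for that input, not the original clipboard code): decode first with the core Encode module, and the original substitution should then match. Text::Unidecode likewise works on decoded character strings, so the decode step would be needed before it as well.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the clipboard returns raw UTF-8 bytes. These three bytes
# (226, 128, 147 in the ord() dump) stand in for the copied n-dash.
my $bytes = "\xE2\x80\x93" . "16.58";

# Turn the byte string into a character string; the three bytes
# collapse into the single character U+2013.
my $amt = decode('UTF-8', $bytes);

printf "length before decode: %d, after: %d\n", length($bytes), length($amt);
printf "first character: U+%04X\n", ord($amt);

# With characters instead of bytes, the original substitution matches.
$amt =~ s/\x{2013}/-/g;
print "$amt\n";    # prints "-16.58"
```

If the data is read from a filehandle instead, opening it with a ':encoding(UTF-8)' layer would do the same decoding on input.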

--
Ruud

rvtol+usenet [ Sun, 25 July 2010 11:13 ] [ ID #2045080 ]