Regex to remove non printable characters

on 21.03.2005 14:14:52 by Ramprasad A Padmanabhan

Hello All
I want to remove all characters with ascii values > 127 from a string

Can someone show me an efficient way of doing this?
Currently I am reading the string char-by-char and checking
each char's ascii value. I think there must be a better way.

Thanks
Ram


----------------------------------------------------------
Netcore Solutions Pvt. Ltd.
Website: http://www.netcore.co.in
Spamtraps: http://cleanmail.netcore.co.in/directory.html
----------------------------------------------------------

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org

Re: Regex to remove non printable characters

on 21.03.2005 15:48:39 by Offer Kaye

On Mon, 21 Mar 2005 18:44:52 +0530, Ramprasad A Padmanabhan wrote:
> Hello All
> I want to remove all characters with ascii values > 127 from a string
>
> Can someone show me an efficient way of doing this?
> Currently I am reading the string char-by-char and checking
> each char's ascii value. I think there must be a better way.
>
> Thanks
> Ram
>

$string =~ s/(.)/(ord($1) > 127) ? "" : $1/egs;

Overview:
* For every ("g") char(".") (saved in $1), if its numeric code is >
127, replace it with an empty string, otherwise leave it alone.

Explanation:
* If the string to be modified is saved in $string, then:
$string =~ s/(.)/EXPR/g;
will replace every character in $string with EXPR, due to the "g" modifier.
* Since I'm using the "e" modifier as well, EXPR is taken to be Perl
code, which is evaluated (think of "e" as short for "eval"), and the
value it returns is used for the substitution. It basically means
writing an inline subroutine that returns the value you want,
depending on the current char (which was saved in $1).
* See "perldoc -f ord" for an explanation of the ord() function, and
"perldoc perlop" for an explanation of the ?: "Conditional Operator".
* The "s" modifier is just an extra precaution on my side - if you
have embedded newlines in your string, "." will not match them unless
you use the "s" modifier. In this case it is not really needed, since a
newline has code 10 and would be left alone either way.
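
To make the points above concrete, here is a minimal sketch of the same substitution wrapped in a subroutine. The name strip_high_chars and the sample string are only illustrations, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper name; it wraps the s///egs substitution described above.
sub strip_high_chars {
    my ($string) = @_;
    # /e evaluates the replacement as Perl code, /g visits every match,
    # and /s lets "." match newlines as well.
    $string =~ s/(.)/ord($1) > 127 ? "" : $1/egs;
    return $string;
}

# Sample input with two characters above 127 (0xE9 and 0xEF).
my $sample = "caf\xE9 na\xEFve\n";
print strip_high_chars($sample);   # prints "caf nave" and a newline
```

The subroutine works on a copy of its argument, so the caller's string is left unchanged.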

Hope this helps,
--
Offer Kaye


Re: Regex to remove non printable characters

on 21.03.2005 21:53:36 by krahnj

Ramprasad A Padmanabhan wrote:
> Hello All

Hello,

> I want to remove all characters with ascii values > 127 from a string

By definition, ASCII only includes the characters in the range 0 to 127, so
anything above 127 is a non-ASCII character.

> Can someone show me an efficient way of doing this?
> Currently I am reading the string char-by-char and checking
> each char's ascii value. I think there must be a better way.

$string =~ tr/\x80-\xFF//d;
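
A short sketch of how this behaves, with subroutine names made up for illustration. The second subroutine is a variation under an extra assumption: if the string may hold decoded characters above 0xFF, the range \x80-\xFF will not cover them, while the complemented form (the "c" modifier) deletes everything outside 0x00-0x7F.

```perl
use strict;
use warnings;

# Hypothetical helper: deletes only characters in the range 0x80-0xFF.
sub delete_high_bytes {
    my ($s) = @_;
    # With /d and an empty replacement list, tr deletes the listed characters.
    $s =~ tr/\x80-\xFF//d;
    return $s;
}

# Hypothetical helper: deletes everything that is not in 0x00-0x7F.
sub delete_non_ascii {
    my ($s) = @_;
    # /c complements the search list, so /cd deletes all non-ASCII characters,
    # including code points above 0xFF.
    $s =~ tr/\x00-\x7F//cd;
    return $s;
}

print delete_high_bytes("na\xEFve caf\xE9"), "\n";       # prints "nave caf"
print delete_non_ascii("5\x{20AC} for caf\xE9"), "\n";   # prints "5 for caf"
```

For plain byte strings the two forms give the same result; they differ only when characters above 0xFF are present.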


John
--
use Perl;
program
fulfillment


Re: Regex to remove non printable characters

on 22.03.2005 13:13:54 by Offer Kaye

On Mon, 21 Mar 2005 12:53:36 -0800, John W. Krahn wrote:
>
> $string =~ tr/\x80-\xFF//d;
>


No no, he can't use that - that solution is much too elegant! It will
also quite probably run faster than my suggested solution!


Very nice solution! Here's a variation, using the "s///" operator:
$string =~ s/[\x80-\xFF]//g;

--
Offer Kaye


Re: Regex to remove non printable characters

on 22.03.2005 13:18:16 by Chris Devers

On Tue, 22 Mar 2005, Offer Kaye wrote:

> On Mon, 21 Mar 2005 12:53:36 -0800, John W. Krahn wrote:
>
> > $string =~ tr/\x80-\xFF//d;
>
> Very nice solution! Here's a variation, using the "s///" operator:
> $string =~ s/[\x80-\xFF]//g;

If you benchmark it, I suspect the tr/// version will be much faster.

It's a simpler operation than s///, so if you can get away with using a
translation instead of a substitution, you should get a speed boost.
In a lot of cases, the tr/// is *too* simple, and you're stuck. But in
this example, it works, and should do well.

As always though, the only way to be positive is to measure it :-)
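
One way to measure it is the core Benchmark module. This is only a sketch: the test string is an assumption (mostly ASCII with a few high characters), and the numbers will vary with the Perl version and the input, so no results are shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: mostly ASCII text with a few characters above 127.
my $input = ("plain text with caf\xE9 and na\xEFve words\n") x 200;

# Each candidate copies the input first, so every run starts from the same string.
my %candidates = (
    tr_delete => sub { my $s = $input; $s =~ tr/\x80-\xFF//d; $s },
    s_class   => sub { my $s = $input; $s =~ s/[\x80-\xFF]//g; $s },
    s_eval    => sub { my $s = $input; $s =~ s/(.)/ord($1) > 127 ? "" : $1/egs; $s },
);

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```

The table printed by cmpthese shows the rate of each candidate and the relative difference between them.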


--
Chris Devers


Re: Regex to remove non printable characters

on 22.03.2005 21:13:45 by krahnj

Offer Kaye wrote:
> On Mon, 21 Mar 2005 12:53:36 -0800, John W. Krahn wrote:
>
>>$string =~ tr/\x80-\xFF//d;
>
>
> No no, he can't use that - that solution is much too elegant! It will
> also quite probably run faster than my suggested solution!
>

>
> Very nice solution! Here's a variation, using the "s///" operator:
> $string =~ s/[\x80-\xFF]//g;

Well, s/he did say s/he wanted a regex so I originally thought
s/[^[:ascii:]]+//g but then s/he asked for "a efficient way" so I had to go
with transliteration. :-)
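
For completeness, a small sketch of the regex version mentioned above. The subroutine name and the sample string are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper name; uses the POSIX character class [:ascii:].
sub strip_non_ascii {
    my ($s) = @_;
    # [[:ascii:]] matches code points 0-127; the ^ inside the class negates it,
    # and + removes each run of non-ASCII characters in a single substitution.
    $s =~ s/[^[:ascii:]]+//g;
    return $s;
}

# Sample input with 0xE9 twice and one character above 0xFF.
print strip_non_ascii("r\xE9sum\xE9\x{263A}"), "\n";   # prints "rsum"
```

Unlike the \x80-\xFF range, the negated class also removes characters above 0xFF.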


John
--
use Perl;
program
fulfillment
