mod_perl unicode, cgi and binmode

I need to process and output data delivered from a web browser via the
CGI interface.
To deal with "real" Unicode data I set both STDIN and STDOUT to
utf8 with binmode, as recommended at
http://www.perldoc.com/perl5.8.0/pod/perluniintro.html (my script would
not work otherwise).

While this works perfectly in a standard CGI environment, it does not work
under mod_perl: Perl reads the input from the CGI form but does not treat
it as Unicode.


I set up a simple script that reads lines from a text field and prints
them out sorted (sort order according to the German locale).

As long as you only enter "standard" Western characters like A-Z everything
is fine, but as soon as you enter German umlauts, special Spanish
characters or the like, the script produces garbage under mod_perl.

mod_perl:
http://www.goldfisch.at/mod_perl/unicodetest7.pl

standard-cgi:
http://www.customers.goldfisch.at/cgi-bin/unicodetest7.pl

Perl is 5.8.5, mod_perl is the latest 1.99_16, and Apache is 2.0.51.

If somebody can show me a way to read Unicode without using binmode, I
would be very glad too. I didn't manage to get "real" Unicode without it.

Thanks a lot,
peter

---------------unicodetest7.pl-------------------------------------
#!/usr/local/bin/perl -w
use CGI;
use strict;

use POSIX qw(locale_h);
use locale;
setlocale(LC_COLLATE, "de_AT");    # collate according to the de_AT locale

# treat both standard streams as UTF-8
binmode(STDOUT, ":utf8");
binmode(STDIN, ":utf8");

my $query = CGI->new;
my $charset = 'UTF-8';
$CGI::XHTML = 0;
print $query->header(-charset => $charset),
      $query->start_html(-title => 'Unicodetest');
print "cgi-version = ", $CGI::VERSION, " \x{263a}", "<br><br>\n";

if ($query->param('submit')) {
    print "your input sorted : <br><br>";

    my $si = $query->param('unicode');
    $si =~ s/\r//g;
    # --- the following is to fix some unresolved CGI-problem
    my $sin = '';
    foreach (0 .. length($si) - 1) {
        $sin .= chr(ord(substr($si, $_, 1)));
    }
    $si = $sin;
    #----

    foreach (sort(split(/\n/, $si))) {
        s/\r|\n//g;
        print $_;
        print "  (length=", length($_), ")";
        print "  ";
        # dump the code point of every character in hex
        foreach my $i (0 .. length($_) - 1) {
            print sprintf("%04x", ord(substr($_, $i, 1))) . " ";
        }
        print "<br>\n";
    }
}

print "<br><br>enter your unicode-testtext here :\n",
      $query->start_multipart_form,
      $query->textarea(-name => 'unicode', -rows => 10, -columns => 100),
      "\n<br>\n",
      $query->submit(-name => 'submit', -value => 'proceed'), "\n",
      $query->endform, "\n";
print $query->end_html;
----------------------------





--
mag. peter pilsl
goldfisch.at
IT-management
tel +43 699 1 3574035
fax +43 699 4 3574035
pilsl [at] goldfisch.at

--
Report problems: http://perl.apache.org/bugs/
Mail list info: http://perl.apache.org/maillist/modperl.html
List etiquette: http://perl.apache.org/maillist/email-etiquette.html
pilsl [ Tue, 28 September 2004 00:43 ] [ ID #263714 ]

Re: mod_perl unicode, cgi and binmode

peter pilsl wrote:
> I need to process and output data delivered via a webbrowser using the
> CGI-interface.
> To deal with "real" unicode-data I set the whole STDIN and STDOUT to
> utf8 with binmode (as recommended at
> http://www.perldoc.com/perl5.8.0/pod/perluniintro.html. My script would
> not work otherwise)
>
> While this works perfect in a standard CGI-environment it does not work
> under mod_perl. Perl reads the input from the CGI-form and does not read
> it as unicode.

STDIN is not used with mod_perl. I'd say don't use CGI::param() directly;
use your own param wrapper function(s) that call Encode::decode_utf8() or
utf8::decode() on the returned values. Wrapper functions are useful anyway
for untainting input or for supporting more than one CGI input module (like
Apache::Request in addition to CGI.pm).

Simplified example:

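sub param
{
    # MODPERL, $apr (Apache::Request) and $cgi (CGI.pm) are set up elsewhere
    my $str = undef;
    if (MODPERL) { $str = $apr->param(shift()) }
    else { $str = $cgi->param(shift()) }
    # turn the UTF-8 encoded bytes into Perl characters, in place
    utf8::decode($str);
    return $str;
}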
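
A self-contained variant of such a wrapper might look like the sketch below. The `%raw` hash and the `decode_param` function are hypothetical stand-ins for the real parameter source (CGI.pm or Apache::Request) and are not part of either module; only `Encode::decode` is an existing API.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for raw request parameters: UTF-8 encoded
# bytes that have not yet been decoded into Perl characters.
my %raw = (unicode => "\xC3\xA4\xC3\xB6\xC3\xBC");    # bytes for "äöü"

# Hypothetical wrapper: fetch a raw value and decode it from UTF-8.
sub decode_param {
    my ($name) = @_;
    my $bytes = $raw{$name};
    return undef unless defined $bytes;
    # With the default CHECK value the input is left untouched and
    # malformed sequences are replaced by U+FFFD.
    return Encode::decode('UTF-8', $bytes);
}

my $str = decode_param('unicode');
printf "bytes: %d, characters: %d\n", length($raw{unicode}), length($str);
```

The six raw bytes become three characters after decoding, so length(), sort and regular expressions then operate on characters rather than on bytes.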

Markus Wichitill [ Tue, 28 September 2004 02:12 ] [ ID #263715 ]

Re: mod_perl unicode, cgi and binmode

Markus Wichitill wrote:
> peter pilsl wrote:
>
>> I need to process and output data delivered via a webbrowser using the
>> CGI-interface.
>> To deal with "real" unicode-data I set the whole STDIN and STDOUT to
>> utf8 with binmode (as recommended at
>> http://www.perldoc.com/perl5.8.0/pod/perluniintro.html. My script
>> would not work otherwise)
>>
>> While this works perfect in a standard CGI-environment it does not
>> work under mod_perl. Perl reads the input from the CGI-form and does
>> not read it as unicode.
>
>
> STDIN is not used with mod_perl.

It depends on how you write your program. When you don't qualify your read
and print calls with $r, you do use STDIN. mod_perl overrides it and makes
the qualified $r->read() calls behind the scenes (via the perlio layer), so
in the end mod_cgi and mod_perl do essentially the same thing. If you call
binmode inside your script, I think it should work just fine, since the
perlio layer subclasses PerlIOBase_binmode, which is supposed to do the
right thing. You can find a few examples of its usage in the modperl test
suite (in the source package); just grep for 'binmode'.

> I'd say, don't use CGI::param()
> directly, use your own param wrapper function(s) that call
> Encode::decode_utf8() or utf8::decode() for the returned values. Wrapper
> functions are useful anyway for untainting input or supporting more than
> one CGI input module (like Apache::Request in addition to CGI.pm).
>
> Simplified example:
>
> sub param
> {
> my $str = undef;
> if (MODPERL) { $str = $apr->param(shift()) }
> else { $str = $cgi->param(shift()) }
> utf8::decode($str);
> return $str;
> }

Boris (CC'ed) has started a similar discussion on the modperl dev list,
which has now been redirected to the apreq list (the home of
Apache::Request). Boris is going to post the details on how to make
Apache::Request handle Unicode/UTF-8 transparently for users. I can't see
the post yet, but it should appear soon. Subscribe via
apreq-dev-subscribe [at] httpd.apache.org if you want to take part in that
discussion.

--
__________________________________________________________________
Stas Bekman JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide ---> http://perl.apache.org
mailto:stas [at] stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org http://ticketmaster.com

Stas Bekman [ Tue, 28 September 2004 03:32 ] [ ID #263716 ]

Re: mod_perl unicode, cgi and binmode

Stas Bekman wrote:
>> STDIN is not used with mod_perl.
>
> It depends on how you write your program. When you don't qualify your
> read and print calls with $r, then you do use STDIN, though mod_perl
> overrides it, and does the qualified $r->read() calls behind the scenes
> (via the perlio layer), but essentially mod_cgi and mod_perl do exactly
> the same thing at the end. If you turn the binmode inside your script, I
> think it should work just fine, since the perlio layer subclasses
> PerlIOBase_binmode, which is supposed to do the right thing.

Yes, I was just talking about his example, which uses CGI.pm, which in turn
gets its input from $r->args and $r->read under mod_perl, so binmode(STDIN)
won't help.
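
The difference can be illustrated with a small sketch. It assumes an in-memory filehandle as a stand-in for STDIN and a plain variable as a stand-in for data obtained through $r->read; neither is what mod_perl or CGI.pm does internally. The sketch only shows that a :utf8 layer affects nothing but the data that actually passes through the handle it is attached to.

```perl
use strict;
use warnings;
use Encode ();

my $bytes = "\xC3\xBC\n";    # UTF-8 encoded "ü" followed by a newline

# Route 1: the bytes pass through a filehandle carrying a :utf8 layer,
# which is what binmode(STDIN, ":utf8") arranges under plain CGI.
open(my $fh, '<:utf8', \$bytes) or die "cannot open in-memory handle: $!";
my $via_handle = <$fh>;
close($fh);
chomp $via_handle;

# Route 2: the bytes are handed over directly (stand-in for $r->read),
# so no I/O layer ever sees them and they stay undecoded.
my $direct = $bytes;
chomp $direct;

printf "via handle: %d char(s)\n", length($via_handle);
printf "direct:     %d char(s)\n", length($direct);
printf "decoded:    %d char(s)\n", length(Encode::decode('UTF-8', $direct));
```

Only the first route yields a decoded character automatically. The second needs the explicit decode step, which is what a param wrapper supplies.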

> Boris (CC'ed) has started a similar discussion on the modperl dev list,
> which is now redirected to the apreq list (the home of Apache::Request),
> Boris is going to post the details on how to make Apache::Request handle
> unicode/utf8 transparently for the users. I can't see the post yet, but
> it should happen soon. subscribe to the
> apreq-dev-subscribe [at] httpd.apache.org if you want to take part in that
> discussion.

I've seen the discussion, but I'm not really interested in UTF-8 support for
either APR::Table or Apache::Request, since I don't put my own strings in
APR tables and I use param wrapper functions anyway. Calling utf8::decode()
for a few parameters is no big deal. I'm more interested in the UTF-8
support that's hopefully coming with DBI 1.44, since rewriting hundreds of
strings in hashes fetched via DBI would be much more of a performance issue.
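
To make the cost concrete, here is a minimal sketch of what decoding the strings by hand involves. The `$row` hash is a hypothetical example shaped like a hash ref returned by a DBI fetch, and `decode_row` is a hypothetical helper, not part of DBI.

```perl
use strict;
use warnings;

# Hypothetical row whose values are still UTF-8 encoded bytes.
my $row = {
    name => "J\xC3\xBCrgen",
    city => "K\xC3\xB6ln",
    note => undef,              # NULL columns come back as undef
};

# Hypothetical helper: decode every defined value in place.
# values() yields aliases, so modifying $value changes the hash entry.
sub decode_row {
    my ($hash) = @_;
    for my $value (values %$hash) {
        utf8::decode($value) if defined $value;
    }
    return $hash;
}

decode_row($row);
printf "%s: %d characters\n", $_, length($row->{$_})
    for grep { defined $row->{$_} } sort keys %$row;
```

Every fetched row needs a pass like this, which is why decoding inside the driver would be preferable for large result sets.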

Markus Wichitill [ Tue, 28 September 2004 04:47 ] [ ID #263717 ]