Re: svn commit: r567258 - in /jakarta/site: docs/ docs/site/ docs/site/downloads/ docs/site/news/ do

On 19/08/07, Roland Weber <ossfwot [at] dubioso.net> wrote:
> sebb wrote:
> > Is there a way to fix build.xml so that the user's default encoding
> > does not affect the output? Or perhaps we could add a check and warn
> > if the encoding is wrong?
> >
> > The xml source files are already flagged as ISO-8859-1, as is the
> > stylesheet, which uses output encoding ISO-8859-1 as well, which one
> > might have hoped would be enough...
>
> I don't know what the exact symptoms of the problem are.

Here is a sample diff:

http://svn.apache.org/viewvc/jakarta/site/docs/site/news/200 206.html?r1=567256&r2=567257

The u-umlaut characters were replaced by '?'.

[But I don't know exactly how the mangled version was generated.]
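One plausible way to end up with '?' in place of a u-umlaut (an assumption, not a diagnosis of this particular commit) is that some step writes text through a charset that cannot represent the character, for example a platform default of US-ASCII. Java's String.getBytes(Charset) does not fail in that case; it silently substitutes the charset's replacement byte, which is '?'. A minimal sketch; the class name and the sample word are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {
    // Encodes text with the given charset. String.getBytes(Charset) replaces
    // unmappable characters with the charset's replacement byte ('?').
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        String text = "M\u00fcnchen"; // hypothetical sample containing U+00FC
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        byte[] ascii = encode(text, StandardCharsets.US_ASCII);
        // ISO-8859-1 can represent the u-umlaut as the single byte 0xFC.
        System.out.printf("ISO-8859-1: 0x%02X%n", latin1[1] & 0xFF);
        // US-ASCII cannot, so the byte becomes 0x3F, i.e. '?'.
        System.out.printf("US-ASCII:   0x%02X%n", ascii[1] & 0xFF);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // M?nchen
    }
}
```

If the build's output ever passes through a writer that uses the JVM default encoding, this is exactly the kind of place where the user's locale would leak into the generated HTML.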

> This is what the XSLT spec says about output encodings [1]:
>
> > The encoding attribute specifies the preferred encoding to use for
> > outputting the result tree. XSLT processors are required to respect
> > values of UTF-8 and UTF-16. For other values, if the XSLT processor
> > does not support the specified encoding it may signal an error; if
> > it does not signal an error it should use UTF-8 or UTF-16 instead.

Ah, thanks - that could well explain the problem.

> Is the output generated in UTF-8 or UTF-16? If so, the solution
> would be to use one of those as the output encoding, since only
> those are required to be supported on all platforms.

The output is currently generated in iso-8859-1 (or iso-8859-15); the
input is specified using either a literal u-umlaut or an entity/character reference.

Unfortunately changing to UTF-8 would mean changing all the html files...

I'll see about adding a check - should be easy enough to generate a
dummy html file from an xml containing some accented characters and
check that the result is as expected.

> cheers,
> Roland
>
> [1] http://www.w3.org/TR/xslt#section-XML-Output-Method
sebb [ Sun, 19 August 2007 12:22 ]

Re: svn commit: r567258

Hi Sebastian,

> The u-umlaut characters were replaced by '?'.
>
> [But I don't know exactly how the mangled version was generated.]
>
> The output is currently generated in iso-8859-1 (or iso-8859-15); the
> input is specified using either a literal u-umlaut or an entity/character reference.

That's a nasty one to track down. Apart from encoding specs in
the style sheet, there's also the encoding in the <?xml?> line
of the source file to consider. The source file specifies
ISO-8859-1. I wonder whether svn might screw up the charset
on co/ci. Isn't there also a tool that does some postprocessing
in order to normalize the XML? If an XML processor generates
UTF instead of the specified ISO-8859-1, and the next processor
expects ISO-* as input, the data could get screwed up. You'd
have to follow the whole chain from input to final output.
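To illustrate how a mismatch between two steps of such a chain can damage the data, here is a minimal sketch in Java (the class name, the hop() helper and the sample word are hypothetical; no actual tool in the site build is modelled). UTF-8 output read as ISO-8859-1 turns each u-umlaut into two characters, while ISO-8859-1 output read as UTF-8 hits the malformed byte 0xFC, which the decoder replaces with U+FFFD; writing that back out as ISO-8859-1 then yields '?', the symptom seen in the diff.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ChainMismatch {
    // One hop in a hypothetical chain: text is written out in one charset
    // and read back by a tool that assumes another.
    static String hop(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String original = "M\u00fcnchen";
        // UTF-8 bytes C3 BC read as ISO-8859-1: U+00FC becomes U+00C3 U+00BC.
        String doubled = hop(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // ISO-8859-1 byte FC read as UTF-8 is malformed and becomes U+FFFD.
        String replaced = hop(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // U+FFFD is not representable in ISO-8859-1, so it is written as '?'.
        String mangled = hop(replaced, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(doubled.length() + " chars after the UTF-8 -> ISO-8859-1 hop");
        System.out.println(mangled); // M?nchen
    }
}
```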

> I'll see about adding a check - should be easy enough to generate a
> dummy html file from an xml containing some accented characters and
> check that the result is as expected.

That's probably the best approach.

cheers,
Roland
Roland Weber [ Sun, 19 August 2007 13:37 ]

Re: svn commit: r567258

The JDK version used may also have to do with it:
http://issues.apache.org/bugzilla/show_bug.cgi?id=38781
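Since the JDK version and the platform default encoding can both matter, the build could at least log them and warn when the default differs from the encoding the sources and stylesheet declare, which is roughly the warning asked about at the start of the thread. A minimal sketch; the class name and the expected value are assumptions, and a mismatch is only a hint, not proof that the output will be wrong:

```java
import java.nio.charset.Charset;

public class BuildEnvReport {
    // Assumed expected value: the site sources and stylesheet declare ISO-8859-1.
    static final String EXPECTED = "ISO-8859-1";

    // Summarises the JDK version and the encoding settings that a
    // default-charset writer would pick up.
    static String report() {
        return "java.version=" + System.getProperty("java.version")
            + ", file.encoding=" + System.getProperty("file.encoding")
            + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    static boolean defaultCharsetMatches() {
        return Charset.defaultCharset().name().equals(EXPECTED);
    }

    public static void main(String[] args) {
        System.out.println(report());
        if (!defaultCharsetMatches()) {
            System.out.println("WARNING: default charset is not " + EXPECTED
                + "; generated HTML may differ from other builds");
        }
    }
}
```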

cheers,
Roland
Roland Weber [ Sun, 19 August 2007 13:58 ]