Encoding problem in SQLXML - bulkload

I got an XML Data in UTF-8 encoding like this

Dah$)A(&li

but when I run the schema to insert data into SQL Server 2005 Express
edition, the data will become like this

Dah$)A!'¦li

How can I solve the encoding problem?

XML data:
=======================================

<crew program='SH008774030000'>
<member>
<role>Director</role>
<givenname>Dah$)A(&li</givenname>
<surname>Hall</surname>
</member>
<member>
<role>Writer</role>
<givenname>Dah$)A(&li</givenname>
<surname>Hall</surname>
</member>
</crew>


XML schema:
=======================================

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">

<xsd:annotation>
<xsd:appinfo>
<sql:relationship name="OrderOD" parent="productionID"
parent-key="program"
child="productionCrew" child-key="program"/>

<sql:relationship name="ODProduct" parent="productionCrew"
parent-key="role givenname surname" child="crew"
child-key="role givenname surname"/>
</xsd:appinfo>
</xsd:annotation>


<xsd:element name="crew" sql:relation="productionID">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="member" sql:relation="crew"
sql:relationship="OrderOD ODProduct">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="role" type="xsd:string"/>
<xsd:element name="givenname" type="xsd:string"/>
<xsd:element name="surname" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
<xsd:attribute name="program" type="xsd:string"/>
</xsd:complexType>
</xsd:element>


</xsd:schema>

Database table:
=======================================

CREATE TABLE ProgramListings.dbo.productionCrew
(
program VARCHAR(20),
role VARCHAR(20),
givenname VARCHAR(20),
surname VARCHAR(20),
PRIMARY KEY(program, role, surname)
)

CREATE TABLE ProgramListings.dbo.crew
(
role VARCHAR(20),
givenname VARCHAR(20),
surname VARCHAR(20),
PRIMARY KEY(role, surname)

)
worlman385 [ So, 30 M盲rz 2008 11:42 ] [ ID #1931569 ]

Re: Encoding problem in SQLXML - bulkload

>I got an XML Data in UTF-8 encoding like this
>
> Dah$)A(&li
>
> but when I run the schema to insert data into SQL Server 2005 Express
> edition, the data will become like this
>
> Dah$)A!'0"7li
>
> How can I solve the encoding problem?

Your XML is not well-formed. Entity references need to be specified in
place of illegal XML characters (e.g. "&" instead of "&"). Note that
this is not specific to SQLXML but part of the basic XML standards. See
http://www.w3.org/TR/REC-xml/.

A CDATA section is commonly used in XML in order to eliminate the need to
escape illegal characters. In a CDATA section, only the end tag ("]]>").
For example:

<crew program='SH008774030000'>
<member>
<role>Director</role>
<givenname><![CDATA[Dah$)A(&li]]></givenname>
<surname>Hall</surname>
</member>
<member>
<role>Writer</role>
<givenname><![CDATA[Dah$)A(&li]]></givenname>
<surname>Hall</surname>
</member>
</crew>

--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

<worlman385 [at] yahoo.com> wrote in message
news:23ouu39b3arbpskfmgdtfof9b81ilskjg3 [at] 4ax.com...
>I got an XML Data in UTF-8 encoding like this
>
> Dah$)A(&li
>
> but when I run the schema to insert data into SQL Server 2005 Express
> edition, the data will become like this
>
> Dah$)A!'0"7li
>
> How can I solve the encoding problem?
>
> XML data:
> =======================================
>
> <crew program='SH008774030000'>
> <member>
> <role>Director</role>
> <givenname>Dah$)A(&li</givenname>
> <surname>Hall</surname>
> </member>
> <member>
> <role>Writer</role>
> <givenname>Dah$)A(&li</givenname>
> <surname>Hall</surname>
> </member>
> </crew>
>
>
> XML schema:
> =======================================
>
> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> elementFormDefault="qualified"
> xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
>
> <xsd:annotation>
> <xsd:appinfo>
> <sql:relationship name="OrderOD" parent="productionID"
> parent-key="program"
> child="productionCrew" child-key="program"/>
>
> <sql:relationship name="ODProduct" parent="productionCrew"
> parent-key="role givenname surname" child="crew"
> child-key="role givenname surname"/>
> </xsd:appinfo>
> </xsd:annotation>
>
>
> <xsd:element name="crew" sql:relation="productionID">
> <xsd:complexType>
> <xsd:sequence>
> <xsd:element name="member" sql:relation="crew"
> sql:relationship="OrderOD ODProduct">
> <xsd:complexType>
> <xsd:sequence>
> <xsd:element name="role" type="xsd:string"/>
> <xsd:element name="givenname" type="xsd:string"/>
> <xsd:element name="surname" type="xsd:string"/>
> </xsd:sequence>
> </xsd:complexType>
> </xsd:element>
> </xsd:sequence>
> <xsd:attribute name="program" type="xsd:string"/>
> </xsd:complexType>
> </xsd:element>
>
>
> </xsd:schema>
>
> Database table:
> =======================================
>
> CREATE TABLE ProgramListings.dbo.productionCrew
> (
> program VARCHAR(20),
> role VARCHAR(20),
> givenname VARCHAR(20),
> surname VARCHAR(20),
> PRIMARY KEY(program, role, surname)
> )
>
> CREATE TABLE ProgramListings.dbo.crew
> (
> role VARCHAR(20),
> givenname VARCHAR(20),
> surname VARCHAR(20),
> PRIMARY KEY(role, surname)
>
> )
Dan Guzman [ So, 30 M盲rz 2008 16:37 ] [ ID #1931578 ]

Re: Encoding problem in SQLXML - bulkload

> In a CDATA section, only the end tag ("]]>").

This sentence should have been:

In a CDATA section, only the end tag ("]]>") is recognized as markup.


--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

"Dan Guzman" <guzmanda [at] nospam-online.sbcglobal.net> wrote in message
news:6E2D5295-651D-4DE4-ACF6-149251E0433B [at] microsoft.com...
> >I got an XML Data in UTF-8 encoding like this
>>
>> Dah$)A(&li
>>
>> but when I run the schema to insert data into SQL Server 2005 Express
>> edition, the data will become like this
>>
>> Dah$)A!'0"7li
>>
>> How can I solve the encoding problem?
>
> Your XML is not well-formed. Entity references need to be specified in
> place of illegal XML characters (e.g. "&" instead of "&"). Note that
> this is not specific to SQLXML but part of the basic XML standards. See
> http://www.w3.org/TR/REC-xml/.
>
> A CDATA section is commonly used in XML in order to eliminate the need to
> escape illegal characters. In a CDATA section, only the end tag ("]]>").
> For example:
>
> <crew program='SH008774030000'>
> <member>
> <role>Director</role>
> <givenname><![CDATA[Dah$)A(&li]]></givenname>
> <surname>Hall</surname>
> </member>
> <member>
> <role>Writer</role>
> <givenname><![CDATA[Dah$)A(&li]]></givenname>
> <surname>Hall</surname>
> </member>
> </crew>
>
> --
> Hope this helps.
>
> Dan Guzman
> SQL Server MVP
> http://weblogs.sqlteam.com/dang/
>
> <worlman385 [at] yahoo.com> wrote in message
> news:23ouu39b3arbpskfmgdtfof9b81ilskjg3 [at] 4ax.com...
>>I got an XML Data in UTF-8 encoding like this
>>
>> Dah$)A(&li
>>
>> but when I run the schema to insert data into SQL Server 2005 Express
>> edition, the data will become like this
>>
>> Dah$)A!'0"7li
>>
>> How can I solve the encoding problem?
>>
>> XML data:
>> =======================================
>>
>> <crew program='SH008774030000'>
>> <member>
>> <role>Director</role>
>> <givenname>Dah$)A(&li</givenname>
>> <surname>Hall</surname>
>> </member>
>> <member>
>> <role>Writer</role>
>> <givenname>Dah$)A(&li</givenname>
>> <surname>Hall</surname>
>> </member>
>> </crew>
>>
>>
>> XML schema:
>> =======================================
>>
>> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>> elementFormDefault="qualified"
>> xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
>>
>> <xsd:annotation>
>> <xsd:appinfo>
>> <sql:relationship name="OrderOD" parent="productionID"
>> parent-key="program"
>> child="productionCrew" child-key="program"/>
>>
>> <sql:relationship name="ODProduct" parent="productionCrew"
>> parent-key="role givenname surname" child="crew"
>> child-key="role givenname surname"/>
>> </xsd:appinfo>
>> </xsd:annotation>
>>
>>
>> <xsd:element name="crew" sql:relation="productionID">
>> <xsd:complexType>
>> <xsd:sequence>
>> <xsd:element name="member" sql:relation="crew"
>> sql:relationship="OrderOD ODProduct">
>> <xsd:complexType>
>> <xsd:sequence>
>> <xsd:element name="role" type="xsd:string"/>
>> <xsd:element name="givenname" type="xsd:string"/>
>> <xsd:element name="surname" type="xsd:string"/>
>> </xsd:sequence>
>> </xsd:complexType>
>> </xsd:element>
>> </xsd:sequence>
>> <xsd:attribute name="program" type="xsd:string"/>
>> </xsd:complexType>
>> </xsd:element>
>>
>>
>> </xsd:schema>
>>
>> Database table:
>> =======================================
>>
>> CREATE TABLE ProgramListings.dbo.productionCrew
>> (
>> program VARCHAR(20),
>> role VARCHAR(20),
>> givenname VARCHAR(20),
>> surname VARCHAR(20),
>> PRIMARY KEY(program, role, surname)
>> )
>>
>> CREATE TABLE ProgramListings.dbo.crew
>> (
>> role VARCHAR(20),
>> givenname VARCHAR(20),
>> surname VARCHAR(20),
>> PRIMARY KEY(role, surname)
>>
>> )
>
Dan Guzman [ So, 30 M盲rz 2008 16:56 ] [ ID #1932805 ]

Re: Encoding problem in SQLXML - bulkload

Thanks your help Dan!

But no! the XML is well formed:
http://www.oniva.com/upload/1356/x1.jpg

but after using the COM object of SQLXML to load XML file into
Database, the data will look like this:
http://www.oniva.com/upload/1356/x1.jpg

notice the givenname data is changed.

I think the input is UTF-8 data but the SQLXML interface convert UTF-8
to ASCII so the data is messed up when loaded from XML to database.

Since some data in XML is non-ASCII



>Your XML is not well-formed. Entity references need to be specified in
>place of illegal XML characters (e.g. "&" instead of "&"). Note that
>this is not specific to SQLXML but part of the basic XML standards. See
>http://www.w3.org/TR/REC-xml/.
>
>A CDATA section is commonly used in XML in order to eliminate the need to
>escape illegal characters. In a CDATA section, only the end tag ("]]>").
>For example:
>
><crew program='SH008774030000'>
><member>
><role>Director</role>
><givenname><![CDATA[Dah$)A(&li]]></givenname>
><surname>Hall</surname>
></member>
><member>
><role>Writer</role>
><givenname><![CDATA[Dah$)A(&li]]></givenname>
><surname>Hall</surname>
></member>
></crew>
worlman385 [ Mo, 31 M盲rz 2008 00:46 ] [ ID #1932808 ]

Re: Encoding problem in SQLXML - bulkload

Sorry, the link of second one should be
http://www.oniva.com/upload/1356/x2.jpg

>
>but after using the COM object of SQLXML to load XML file into
>Database, the data will look like this:
>http://www.oniva.com/upload/1356/x1.jpg
>
worlman385 [ Mo, 31 M盲rz 2008 00:57 ] [ ID #1932811 ]

Re: Encoding problem in SQLXML - bulkload

> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.

Does your XML include a processing instruction to specify UTF-8 encoding?
For example:

<?xml version="1.0" encoding="UTF-8" ?>

--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

<worlman385 [at] yahoo.com> wrote in message
news:7m50v3dsmbcdqcrvhf2qm0c0q0apc724mo [at] 4ax.com...
> Thanks your help Dan!
>
> But no! the XML is well formed:
> http://www.oniva.com/upload/1356/x1.jpg
>
> but after using the COM object of SQLXML to load XML file into
> Database, the data will look like this:
> http://www.oniva.com/upload/1356/x1.jpg
>
> notice the givenname data is changed.
>
> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.
>
> Since some data in XML is non-ASCII
>
>
>
>>Your XML is not well-formed. Entity references need to be specified in
>>place of illegal XML characters (e.g. "&" instead of "&"). Note that
>>this is not specific to SQLXML but part of the basic XML standards. See
>>http://www.w3.org/TR/REC-xml/.
>>
>>A CDATA section is commonly used in XML in order to eliminate the need to
>>escape illegal characters. In a CDATA section, only the end tag ("]]>").
>>For example:
>>
>><crew program='SH008774030000'>
>><member>
>><role>Director</role>
>><givenname><![CDATA[Dah$)A(&li]]></givenname>
>><surname>Hall</surname>
>></member>
>><member>
>><role>Writer</role>
>><givenname><![CDATA[Dah$)A(&li]]></givenname>
>><surname>Hall</surname>
>></member>
>></crew>
Dan Guzman [ Mo, 31 M盲rz 2008 00:11 ] [ ID #1932814 ]

Re: Encoding problem in SQLXML - bulkload

(worlman385 [at] yahoo.com) writes:
> Thanks your help Dan!
>
> But no! the XML is well formed:
> http://www.oniva.com/upload/1356/x1.jpg
>
> but after using the COM object of SQLXML to load XML file into
> Database, the data will look like this:
> http://www.oniva.com/upload/1356/x1.jpg
>
> notice the givenname data is changed.
>
> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.

Your original post was encoded in iso-2022-cn, which may explain why I
and Dan so very funny characters.

Since your target columns are varchar, I need to ask: what is the collation
of these columns? If that collation does not include in its ANSI set,
you cannot get the name right. Then again, then you should get "e". What
you got appears to be UTF-8 interpreted as ANSI.

I don't have any experience of XML bulkload, so I don't know what is
going on. You could try to add

<?xml version="1.0" encoding="utf-8" ?>

first in the file, even though this should not be needed since UTF-8
is the default for XML.


--
Erland Sommarskog, SQL Server MVP, esquel [at] sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downlo ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books .mspx
Erland Sommarskog [ Mo, 31 M盲rz 2008 00:13 ] [ ID #1932815 ]

Re: Encoding problem in SQLXML - bulkload

Yes, I put the string like
<?xml version="1.0" encoding="UTF-8" ?>
also tried other encoding string but no luck.

I tried to put the parameter in the connection string:
sendStringParametersAsUnicode=true

but doesn't help

=======================
hr = pIXMLBulkLoad->put_ConnectionString(L"provider=SQLOLEDB;data
source=(local)\\SQLEXPRESS;database=ProgramListings;integrat ed
security=SSPI;sendStringParametersAsUnicode=true");
}









>> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
>> to ASCII so the data is messed up when loaded from XML to database.
>
>Does your XML include a processing instruction to specify UTF-8 encoding?
>For example:
>
><?xml version="1.0" encoding="UTF-8" ?>
worlman385 [ Mo, 31 M盲rz 2008 01:25 ] [ ID #1932816 ]

Re: Encoding problem in SQLXML - bulkload

Just solve the problem

use NVARCHAR instead of VARCHAR

NVARCHAR - support unicdoe

==============

CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)


>> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
>> to ASCII so the data is messed up when loaded from XML to database.
>
>Your original post was encoded in iso-2022-cn, which may explain why I
>and Dan so very funny characters.
>
>Since your target columns are varchar, I need to ask: what is the collation
>of these columns? If that collation does not include $)A(& in its ANSI set,
>you cannot get the name right. Then again, then you should get "e". What
>you got appears to be UTF-8 interpreted as ANSI.
>
>I don't have any experience of XML bulkload, so I don't know what is
>going on. You could try to add
>
> <?xml version="1.0" encoding="utf-8" ?>
>
>first in the file, even though this should not be needed since UTF-8
>is the default for XML.
>
worlman385 [ Mo, 31 M盲rz 2008 01:34 ] [ ID #1932817 ]

Re: Encoding problem in SQLXML - bulkload

Just solve the problem

use NVARCHAR instead of VARCHAR

NVARCHAR - support unicdoe

==============

CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)
worlman385 [ Mo, 31 M盲rz 2008 01:34 ] [ ID #1932818 ]

Re: Encoding problem in SQLXML - bulkload

Just solve the problem

use NVARCHAR instead of VARCHAR

NVARCHAR - support unicdoe

==============

CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)
worlman385 [ Mo, 31 M盲rz 2008 01:34 ] [ ID #1932819 ]

Re: Encoding problem in SQLXML - bulkload

> Just solve the problem
>
> use NVARCHAR instead of VARCHAR

I'm glad you were able to figure this out. I had assumed that your database
default collation was appropriate for the characters being stored.

--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

<worlman385 [at] yahoo.com> wrote in message
news:4s80v3dun5cl5vf7vks6mnkqt7tdk116uq [at] 4ax.com...
> Just solve the problem
>
> use NVARCHAR instead of VARCHAR
>
> NVARCHAR - support unicdoe
>
> ==============
>
> CREATE TABLE ProgramListings.dbo.productionCrew
> (
> program NVARCHAR(20),
> role NVARCHAR(20),
> givenname NVARCHAR(20),
> surname NVARCHAR(20),
> PRIMARY KEY(program, role, surname)
> )
Dan Guzman [ Mo, 31 M盲rz 2008 13:42 ] [ ID #1932831 ]
Datenbanken » comp.databases.ms-sqlserver » Encoding problem in SQLXML - bulkload

Vorheriges Thema: Re: Help Needed For writting a query (SQl Server 2005).
Nächstes Thema: Re: How to Interrogate SQL Server Tables for Specific Values - Here's