<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1505" name=GENERATOR></HEAD>
<BODY style="MARGIN: 4px 4px 1px; FONT: 10pt Tahoma">
<DIV dir=ltr align=left><SPAN class=873102612-30062005>Luc,</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>Bytes are not the same 
thing as characters! There are several conventions ("encodings") for 
representing characters as byte sequences. XML uses the Unicode 
character set (it contains a great many characters; see the code charts at 
<A href="http://www.unicode.org">http://www.unicode.org</A>), and its default 
encoding is UTF-8, but other encodings can be used as well, provided they are 
named in the XML declaration.</SPAN></DIV>
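<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN>&nbsp;</DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>As a rough illustration 
(a Python sketch, not anything taken from your files; the helper name hex_bytes 
is made up for the example), here is the same NO-BREAK SPACE character written 
out under three different encodings:</SPAN></DIV>
<PRE>
```python
# Illustrative sketch: one character, three different byte sequences.
nbsp = "\u00a0"  # NO-BREAK SPACE, Unicode code point U+00A0


def hex_bytes(data):
    """Format a bytes object as space-separated uppercase hex pairs."""
    return " ".join(format(b, "02X") for b in data)


utf8 = nbsp.encode("utf-8")         # two bytes
latin1 = nbsp.encode("iso-8859-1")  # one byte
utf16 = nbsp.encode("utf-16-be")    # two bytes, big-endian, no byte order mark

print("UTF-8:     ", hex_bytes(utf8))    # C2 A0
print("ISO-8859-1:", hex_bytes(latin1))  # A0
print("UTF-16BE:  ", hex_bytes(utf16))   # 00 A0
```
</PRE>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>The character is the 
same in all three cases; only the bytes differ, which is why a parser has to 
know (or assume) which encoding was used to write the file.</SPAN></DIV>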
<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>In UTF-8, 
only characters up to 127 (0x7F) are represented by a single byte. The 
non-breaking space character, U+00A0, is represented by the two-byte sequence 'C2 
A0'. Your sample document has some of these encoded correctly, for instance within the 
<Auteur> tag for the <Ouvrage> whose <Nuart> contains 
"9610767":</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN
class=873102612-30062005>...<BR>000001b0 3c 2f 54 69 74 72 65 3e 3c
41 75 74 65 75 72 3e </Titre><Auteur></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>000001c0 c2
a0 3c 2f 41 75 74 65 75 72 3e 3c 50 72 69 78
..</Auteur><Prix<BR>...</SPAN></DIV>
<DIV> </DIV>
<DIV><SPAN class=873102612-30062005>Where you see the dodgy 'A0' byte
(at file offset 0x00001140, if I'm not mistaken), you should have 'C2 A0',
i.e. two bytes instead of one. You may need to check how these data are
generated.</SPAN></DIV>
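<DIV><SPAN class=873102612-30062005></SPAN>&nbsp;</DIV>
<DIV><SPAN class=873102612-30062005>To locate such bytes without a hex editor, 
something along these lines may help. It is only a sketch: find_invalid_utf8 is 
a hypothetical helper, the sample bytes are invented, and the repair step 
assumes the data were really written as ISO-8859-1, which I have not verified 
for your files.</SPAN></DIV>
<PRE>
```python
def find_invalid_utf8(data):
    """Return the offsets of bytes that strict UTF-8 decoding rejects."""
    offsets = []
    pos = 0
    while True:
        try:
            # Decoding the remainder succeeds once no bad bytes are left.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            offsets.append(bad)
            pos = bad + 1          # resume just after the offending byte
    return offsets


# Invented sample data: the same text written under two encodings.
good = "Prix\u00a0net".encode("utf-8")      # NBSP stored as C2 A0
bad = "Prix\u00a0net".encode("iso-8859-1")  # NBSP stored as a lone A0

print(find_invalid_utf8(good))  # []
print(find_invalid_utf8(bad))   # [4]

# Assumption: the whole input is ISO-8859-1, so re-encoding it repairs it.
repaired = bad.decode("iso-8859-1").encode("utf-8")
print(repaired == good)  # True
```
</PRE>
<DIV><SPAN class=873102612-30062005>On a real file you would read the raw bytes 
with open(path, "rb").read() and pass them to the helper; the offsets it 
returns can then be looked up in a hex dump like the one above.</SPAN></DIV>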
<DIV><SPAN class=873102612-30062005></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>Look for an 
explanation of UTF-8 (and other) encodings on the Web -- you will see 
that there is more to it than one might expect.</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>Best
regards,</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>--</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>Jacques
Deseyne</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN> </DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<B>From:</B> owner-xep-support@renderx.com
[mailto:owner-xep-support@renderx.com] <B>On Behalf Of </B>LUC
AUDRAIN<BR><B>Sent:</B> Thursday, June 30, 2005 11:58 AM<BR><B>To:</B>
msulyaev@renderx.com; xep-support@renderx.com<BR><B>Subject:</B> Rép. : Re:
[xep-support] Invalid UTF-8 byte<BR><BR></DIV>
<DIV></DIV>
<DIV>Hello Michael,</DIV>
<DIV> </DIV>
<DIV>I think it is a 0A that I have after the XML declaration, as I have at 
the end of each line of this file. The invalid UTF-8 byte is 0xA0.</DIV>
<DIV> </DIV>
<DIV>Looking a bit more closely, I have found this 'A0' byte: it is in the 
line beginning with "<Nuart>4776027" inside the element Run.</DIV>
<DIV> </DIV>
<DIV>Now, I still don't understand why it is an invalid UTF-8 byte, because 
when I open this file in UltraEdit in Hex mode I see "00A0", and "00A0" is a 
valid Unicode character! I could filter it out here, but in some cases I may need it, 
as it is the "NO-BREAK SPACE".</DIV>
<DIV> </DIV>
<DIV>What's wrong?</DIV>
<DIV> </DIV>
<DIV>Best regards</DIV>
<DIV> </DIV>
<DIV>Luc AUDRAIN<BR>__________________________________<BR>DSI /
Infocube<BR>Informatique Éditoriale<BR>HACHETTE LIVRE<BR>43, quai de
Grenelle<BR>75015 PARIS<BR>00 33 1 43 92 38 12<BR><A
href="mailto:laudrain@hachette-livre.fr">laudrain@hachette-livre.fr</A><BR><BR>>>>
msulyaev@renderx.com 24/06/2005 17:28:42 >>><BR>Hello,
Luc,<BR><BR>Your .xml file is invalid: it has a 0xA0 byte after the xml
declaration <BR>and before anything else, e.g. like here (the last byte
shown):<BR><BR>3C 3F 78 6D 6C 20 76 65 ¦ 72 73 69 6F 6E 3D 22 31 <?xml
version="1<BR>2E 30 22 20 65 6E 63 6F ¦ 64 69 6E 67 3D 22 55 54 .0"
encoding="UT<BR>46 2D 38 22 3F 3E 20 20 ¦ 20 20 20 20 20 20 20 20
F-8"?><BR>20 20 20 20 20 20 20 20 ¦ 20 20 20 20 20 20 20 20<BR>A0
<<BR><BR>Use any HEX editor to fix.<BR><BR>-- <BR>Best regards,<BR>Michael
Sulyaev<U> <A
href="mailto:msulyaev@renderx.com">mailto:msulyaev@renderx.com</A></U>
<BR>RenderX.<BR><BR><BR><BR>LUC AUDRAIN wrote:<BR>> Hello,<BR>> <BR>>
On some XML files, I have an error message on validation :<BR>> <BR>>
[error] Error reported by XML parser; SystemID: file:/J:/Traitement <BR>>
BdC/Depot TXT/lg/OPERATION ARTEMIS CHASSE 23 AOUT 2005.xml; Line#: -1;
<BR>> Column#: 949<BR>> [error]
javax.xml.transform.TransformerException: Error reported by XML <BR>>
parser error: formatting failed: <BR>>
javax.xml.transform.TransformerException: org.xml.sax.SAXParseException:
<BR>> invalid UTF-8 byte (check the XML declaration) (code: 0xa0)<BR>>
<BR>> I found information on the Renderx Web Site in this answer<BR>>
*From*: Mike Trotman <<U> <A
href="mailto:mike.trotman@datalucid.com">mike.trotman@datalucid.com</A></U>
<BR>> <<U> <A
href="mailto:mike.trotman@datalucid.com?Subject=Re:%20[xep-support]%20UTF%20data%20format">mailto:mike.trotman@datalucid.com?Subject=Re:%20[xep-support]%20UTF%20data%20format</A></U>
>> <BR>> <BR>> *Date*: Mon May 02 2005 - 08:14:51 PDT<BR>> and
tried without success.<BR>> <BR>> The workaround I found is to save the
XML file again from any text or <BR>> XML editor (such as XMLSpy) and it works 
fine.<BR>> <BR>> In order to find what's wrong in my source file, I'd 
like to know how to <BR>> use the line and column information in the error 
message : Line#: -1; <BR>> Column#: 949.<BR>> <BR>> Best
regards.<BR>> <BR>> <BR>> <BR>> <BR>> <BR>> <BR>>
<BR>> Luc AUDRAIN<BR>> __________________________________<BR>> DSI /
Infocube<BR>> Informatique Éditoriale<BR>> HACHETTE LIVRE<BR>> 43,
quai de Grenelle<BR>> 75015 PARIS<BR>> 00 33 1 43 92 38 12<BR>> <U><A
href="mailto:laudrain@hachette-livre.fr">laudrain@hachette-livre.fr</A></U>
<<U> <A
href="mailto:laudrain@hachette-livre.fr">mailto:laudrain@hachette-livre.fr</A></U>
><BR>> <BR></DIV></BLOCKQUOTE></BODY></HTML>