<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1479" name=GENERATOR></HEAD>
<BODY style="MARGIN: 4px 4px 1px; FONT: 10pt Tahoma">
<DIV>Hello <SPAN class=873102612-30062005>Jacques,</SPAN></DIV>
<DIV> </DIV>
<DIV>Thank you for your kind answer. I have found the reason for this ill-formed UTF-8 character:<BR>The wrong "A0" code was written directly from the text input file to the XML file, whereas the other "A0" codes were passed through the UTF-8 conversion routines. That's why this one wasn't correctly encoded in UTF-8 and the others were.</DIV>
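<DIV>A minimal sketch of the conversion step described above, assuming the text input file is ISO-8859-1 (the thread does not state its encoding). The helper name latin1_to_utf8 and the sample bytes are hypothetical, not taken from the actual routines:</DIV>

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes assumed to be ISO-8859-1 as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


# Hypothetical fragment of the text input; A0 is a no-break space in ISO-8859-1.
source = b"Prix\xa0: 12"

converted = latin1_to_utf8(source)
print(converted.hex(" "))  # 50 72 69 78 c2 a0 3a 20 31 32

# After conversion the lone A0 has become C2 A0, so strict decoding succeeds.
text = converted.decode("utf-8")
```

<DIV>Any byte that reaches the XML file without going through such a step keeps its single-byte form, which is what happened to the one bad "A0".</DIV>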
<DIV> </DIV>
<DIV>Thank you very much for your help.</DIV>
<DIV> </DIV>
<DIV>Best regards.</DIV>
<DIV> </DIV>
<DIV>Luc</DIV>
<DIV> </DIV>
<DIV>The question is: why does the first code, "C2 A0", work fine, and not the next one?</DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV><BR>>>> Jacques.DESEYNE@swift.com 30/06/2005 14:52:24 >>><BR></DIV>
<DIV style="FONT: 10pt Tahoma; COLOR: #000000">
<DIV dir=ltr align=left><SPAN class=873102612-30062005>Luc,</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>Bytes are not the same thing as characters! There are several conventions ("encodings") for representing characters as byte sequences. XML uses the Unicode character set (it contains a great many characters; see the code charts at <A href="http://www.unicode.org">http://www.unicode.org</A>), and its default encoding is UTF-8, but other encodings can be used as well.</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>In UTF-8, only characters up to 127 (0x7F) are represented by a single byte. The non-breaking space character U+00A0 is represented by the byte sequence 'C2 A0'. Your sample document has some of these, for instance within the <Auteur> tag for the <Ouvrage> whose <Nuart> contains "9610767":</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>...<BR>000001b0 3c 2f 54 69 74 72 65 3e 3c 41 75 74 65 75 72 3e </Titre><Auteur></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>000001c0 c2 a0 3c 2f 41 75 74 65 75 72 3e 3c 50 72 69 78 ..</Auteur><Prix<BR>...</SPAN></DIV>
<DIV> </DIV>
<DIV><SPAN class=873102612-30062005>Where you see the dodgy 'A0' byte (at file offset 0x00001140, if I'm not mistaken), you should have 'C2 A0', i.e. two bytes instead of one. You may need to check how these data are generated.</SPAN></DIV>
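<DIV>As a small illustration of the point above (a sketch only; the sample bytes are made up), the following shows that the character U+00A0 becomes two bytes in UTF-8 but one byte in ISO-8859-1, and that a strict UTF-8 decoder rejects a lone A0 byte:</DIV>

```python
# U+00A0 is NO-BREAK SPACE. Its code point is 0xA0, but its UTF-8
# encoding is the two-byte sequence C2 A0.
nbsp = "\u00a0"

utf8_bytes = nbsp.encode("utf-8")
latin1_bytes = nbsp.encode("iso-8859-1")

print(utf8_bytes.hex(" "))    # c2 a0
print(latin1_bytes.hex(" "))  # a0

# A lone A0 is a UTF-8 continuation byte with no lead byte before it,
# so strict decoding fails at that position.
bad_offset = None
try:
    b"Prix\xa0".decode("utf-8")
except UnicodeDecodeError as err:
    bad_offset = err.start
    print(bad_offset, hex(err.object[bad_offset]))  # 4 0xa0
```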
<DIV><SPAN class=873102612-30062005></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>Look for an explanation of UTF-8 (and other) encodings on the Web -- you will see that there is more to it than one might expect.</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>Best regards,</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>--</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005>Jacques Deseyne</SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=873102612-30062005></SPAN> </DIV><BR>
<BLOCKQUOTE dir=ltr style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<B>From:</B> owner-xep-support@renderx.com [mailto:owner-xep-support@renderx.com] <B>On Behalf Of </B>LUC AUDRAIN<BR><B>Sent:</B> Thursday, June 30, 2005 11:58 AM<BR><B>To:</B> msulyaev@renderx.com; xep-support@renderx.com<BR><B>Subject:</B> Rép. : Re: [xep-support] Invalid UTF-8 byte<BR><BR></DIV>
<DIV></DIV>
<DIV>Hello Michael,</DIV>
<DIV> </DIV>
<DIV>I think that what I have after the XML declaration is a 0A, as I have at the end of each line of this file. The invalid UTF-8 byte is 0xA0.</DIV>
<DIV> </DIV>
<DIV>Looking a bit more closely, I have found this 'A0' byte: it is in the line beginning with "<Nuart>4776027" inside the element Run.</DIV>
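<DIV>For locating such a byte without a hex editor, here is a minimal sketch. The function name find_invalid_utf8 and the sample data are hypothetical, and the line and column it reports are counted in bytes, which is not necessarily how the parser computes its own Line# and Column# values:</DIV>

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, line, column) of the first byte that breaks
    strict UTF-8 decoding, or None if the data is valid UTF-8.

    offset is 0-based; line and column are 1-based and counted in bytes.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        offset = err.start
        line = data.count(b"\n", 0, offset) + 1
        line_start = data.rfind(b"\n", 0, offset) + 1
        column = offset - line_start + 1
        return offset, line, column
    return None


# Made-up sample with a lone A0 byte on the second line.
sample = b"ok line\nNuart 4776027\xa0 end\n"
print(find_invalid_utf8(sample))  # (21, 2, 14)
```

<DIV>To check a real file, read it in binary mode, for example with open(path, "rb"), and pass the resulting bytes to the function.</DIV>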
<DIV> </DIV>
<DIV>Now, I still don't understand why it is an invalid UTF-8 byte, because when I open this file in UltraEdit in hex mode I see "00A0", and "00A0" is a valid Unicode character! I could filter it out here, but in some cases I may need it, as it is the "NO-BREAK SPACE".</DIV>
<DIV> </DIV>
<DIV>What's wrong?</DIV>
<DIV> </DIV>
<DIV>Best regards</DIV>
<DIV> </DIV>
<DIV>Luc AUDRAIN<BR>__________________________________<BR>DSI / Infocube<BR>Informatique Éditoriale<BR>HACHETTE LIVRE<BR>43, quai de Grenelle<BR>75015 PARIS<BR>00 33 1 43 92 38 12<BR><A href="mailto:laudrain@hachette-livre.fr">laudrain@hachette-livre.fr</A><BR><BR>>>> msulyaev@renderx.com 24/06/2005 17:28:42 >>><BR>Hello, Luc,<BR><BR>Your .xml file is invalid: it has a 0xA0 byte after the xml declaration <BR>and before anything else, e.g. like here (the last byte shown):<BR><BR>3C 3F 78 6D 6C 20 76 65 ¦ 72 73 69 6F 6E 3D 22 31 <?xml version="1<BR>2E 30 22 20 65 6E 63 6F ¦ 64 69 6E 67 3D 22 55 54 .0" encoding="UT<BR>46 2D 38 22 3F 3E 20 20 ¦ 20 20 20 20 20 20 20 20 F-8"?><BR>20 20 20 20 20 20 20 20 ¦ 20 20 20 20 20 20 20 20<BR>A0 <<BR><BR>Use any HEX editor to fix.<BR><BR>-- <BR>Best regards,<BR>Michael Sulyaev<U> <A href="mailto:msulyaev@renderx.com">mailto:msulyaev@renderx.com</A></U> <BR>RenderX.<BR><BR><BR><BR>LUC AUDRAIN wrote:<BR>> Hello,<BR>> <BR>> On some XML files, I have an error message on validation :<BR>> <BR>> [error] Error reported by XML parser; SystemID: file:/J:/Traitement <BR>> BdC/Depot TXT/lg/OPERATION ARTEMIS CHASSE 23 AOUT 2005.xml; Line#: -1; <BR>> Column#: 949<BR>> [error] javax.xml.transform.TransformerException: Error reported by XML <BR>> parser error: formatting failed: <BR>> javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: <BR>> invalid UTF-8 byte (check the XML declaration) (code: 0xa0)<BR>> <BR>> I found information on the Renderx Web Site in this answer<BR>> *From*: Mike Trotman <<U> <A href="mailto:mike.trotman@datalucid.com">mike.trotman@datalucid.com</A></U> <BR>> <<U> <A href="mailto:mike.trotman@datalucid.com?Subject=Re:%20[xep-support]%20UTF%20data%20format">mailto:mike.trotman@datalucid.com?Subject=Re:%20[xep-support]%20UTF%20data%20format</A></U> >> <BR>> <BR>> *Date*: Mon May 02 2005 - 08:14:51 PDT<BR>> and tried without success.<BR>> <BR>> The workaround I found is to save the XML file again from any 
text or <BR>> XML editor (such as XMLSpy), and it works fine.<BR>> <BR>> In order to find what's wrong in my source file, I'd like to know how to <BR>> use the line and column information in the error message: Line#: -1; <BR>> Column#: 949.<BR>> <BR>> Best regards.<BR>> <BR>> Luc AUDRAIN<BR></DIV></BLOCKQUOTE></DIV></BODY></HTML>