[xep-support] RE: Rép. : Re: [xep-support] Invalid UTF-8 byte

DESEYNE Jacques Jacques.DESEYNE at swift.com
Thu Jun 30 05:52:24 PDT 2005


Luc,
 
Bytes are not the same thing as characters! There are several conventions ("encodings") for representing characters as byte
sequences. XML uses the Unicode character set (it contains quite a lot of characters, see the code charts at
http://www.unicode.org) and its default encoding is UTF-8, but other encodings can be used as well.
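
To make the byte/character distinction concrete, here is a minimal Python sketch (Python 3.8 or later is assumed, for `bytes.hex` with a separator). It encodes the single character U+00A0 in three encodings and then tries to read a lone A0 byte as UTF-8. The three encodings are just illustrative picks, not something taken from your file.

```python
# NO-BREAK SPACE is the Unicode character U+00A0: one character, no bytes yet.
nbsp = "\u00a0"

# The same character becomes different byte sequences in different encodings.
for encoding in ("utf-8", "iso-8859-1", "utf-16-be"):
    print(encoding, nbsp.encode(encoding).hex(" "))
# utf-8 c2 a0
# iso-8859-1 a0
# utf-16-be 00 a0

# In UTF-8 a lone A0 byte is a continuation byte, so it cannot stand alone.
try:
    b"\xa0".decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

So "00A0" names the character (or its UTF-16 form), while a UTF-8 file has to store it as the two bytes C2 A0.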
 
In UTF-8, only characters up to 127 (0x7F) are represented by a single byte. The non-breaking space character (U+00A0) is
represented by the byte sequence 'C2 A0'. Your sample document has some of these, for instance within the <Auteur> tag for the <Ouvrage>
where <Nuart> contains "9610767":
 
...
000001b0   3c 2f 54 69 74 72 65 3e 3c 41 75 74 65 75 72 3e   </Titre><Auteur>
000001c0   c2 a0 3c 2f 41 75 74 65 75 72 3e 3c 50 72 69 78   ..</Auteur><Prix
...
 
Where you see the dodgy 'A0' byte (at file offset 0x00001140, if I'm not mistaken), you should have 'C2 A0', i.e. two bytes instead
of one. You may need to check how these data are generated.
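
If you would rather not hunt for such bytes by eye, a short script can list their offsets. The following is only a sketch in Python: `find_invalid_utf8` is a name I made up, and the sample bytes are a shortened imitation of your data, not a copy of it.

```python
def find_invalid_utf8(data):
    """Return the offset at which each invalid UTF-8 sequence starts."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onwards is valid UTF-8
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice data[pos:].
            offsets.append(pos + exc.start)
            pos += exc.end  # resume just after the invalid sequence
    return offsets


# Hypothetical sample: a correct 'C2 A0' followed later by a lone 'A0'.
sample = b"<Auteur>\xc2\xa0</Auteur><Run>\xa0</Run>"
for offset in find_invalid_utf8(sample):
    print(f"invalid byte 0x{sample[offset]:02X} at offset 0x{offset:08X}")
```

For a real file, read it in binary mode (for example `open(path, "rb").read()`, with `path` standing for your own file name) and pass the result to the function.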
 
Look for an explanation of UTF-8 (and other) encodings on the Web -- you will see that there's more to it than one might
expect.
 
Best regards,
--
Jacques Deseyne
 


  _____  

From: owner-xep-support at renderx.com [mailto:owner-xep-support at renderx.com] On Behalf Of LUC AUDRAIN
Sent: Thursday, June 30, 2005 11:58 AM
To: msulyaev at renderx.com; xep-support at renderx.com
Subject: Rép. : Re: [xep-support] Invalid UTF-8 byte


Hello Michael,
 
I think that it is a 0A that I have after the XML declaration, as I have at the end of each line of this file. The invalid UTF-8 byte is
a 0xA0.
 
Looking a bit more closely, I have found this 'A0' byte: it is in the line beginning with "<Nuart>4776027", inside the element
Run.
 
Now, I still don't understand why it is an invalid UTF-8 byte, because when I open this file in UltraEdit in Hex mode I see "00A0"
and "00A0" is a valid Unicode character! I could filter it out here, but in some cases I may need it, as it is the "NO-BREAK SPACE".
 
What's wrong?
 
 
 
 
 
Best regards
 
Luc AUDRAIN
__________________________________
DSI / Infocube
Informatique Éditoriale
HACHETTE LIVRE
43, quai de Grenelle
75015 PARIS
00 33 1 43 92 38 12
laudrain at hachette-livre.fr

>>> msulyaev at renderx.com 24/06/2005 17:28:42 >>>
Hello, Luc,

Your .xml file is invalid: it has a 0xA0 byte after the xml declaration 
and before anything else, e.g. like here (the last byte shown):

3C 3F 78 6D 6C 20 76 65 ¦ 72 73 69 6F 6E 3D 22 31 <?xml version="1
2E 30 22 20 65 6E 63 6F ¦ 64 69 6E 67 3D 22 55 54 .0" encoding="UT
46 2D 38 22 3F 3E 20 20 ¦ 20 20 20 20 20 20 20 20 F-8"?>
20 20 20 20 20 20 20 20 ¦ 20 20 20 20 20 20 20 20
A0 <

Use any HEX editor to fix.
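
If there are too many such bytes to fix by hand, the repair can be scripted. The sketch below is in Python and rests on an assumption that should be checked against how the data were produced: that every byte which is not valid UTF-8 was meant as ISO-8859-1. The handler name "latin1_fallback" and the function `repair_to_utf8` are invented for this example.

```python
import codecs


def _latin1_fallback(exc):
    """Decode the undecodable bytes as ISO-8859-1 and continue after them."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    replacement = exc.object[exc.start:exc.end].decode("iso-8859-1")
    return replacement, exc.end


# "latin1_fallback" is a name chosen for this sketch, not a built-in handler.
codecs.register_error("latin1_fallback", _latin1_fallback)


def repair_to_utf8(data):
    """Return data re-encoded as valid UTF-8 (assumes stray bytes are Latin-1)."""
    text = data.decode("utf-8", errors="latin1_fallback")
    return text.encode("utf-8")


# Hypothetical sample: a lone A0 next to an already correct C2 A0.
broken = b"<Run>\xa0</Run><Auteur>\xc2\xa0</Auteur>"
fixed = repair_to_utf8(broken)
print(fixed.hex(" "))
```

Sequences that are already valid UTF-8 pass through unchanged; only the stray bytes are reinterpreted. If the data actually came from Windows-1252, bytes in the range 80 to 9F would need that codec instead.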

-- 
Best regards,
Michael Sulyaev mailto:msulyaev at renderx.com 
RenderX.



LUC AUDRAIN wrote:
> Hello,
> 
> On some XML files, I get an error message on validation:
> 
> [error] Error reported by XML parser; SystemID: file:/J:/Traitement 
> BdC/Depot TXT/lg/OPERATION ARTEMIS CHASSE 23 AOUT 2005.xml; Line#: -1; 
> Column#: 949
> [error] javax.xml.transform.TransformerException: Error reported by XML 
> parser error: formatting failed: 
> javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: 
> invalid UTF-8 byte (check the XML declaration) (code: 0xa0)
> 
> I found information on the RenderX Web site in this answer
> *From*: Mike Trotman <mike.trotman at datalucid.com>
> *Thread*: [xep-support] UTF data format
> *Date*: Mon May 02 2005 - 08:14:51 PDT
> and tried it without success.
> 
> The workaround I found is to save the XML file again from any text or 
> XML editor (such as XMLSpy), and then it works fine.
> 
> In order to find what's wrong in my source file, I'd like to know how to 
> use the line and column information in the error message: Line#: -1; 
> Column#: 949.
> 
> Best regards.
> 
> 
> 
> 
> 
> 
> 
> Luc AUDRAIN
> __________________________________
> DSI / Infocube
> Informatique Éditoriale
> HACHETTE LIVRE
> 43, quai de Grenelle
> 75015 PARIS
> 00 33 1 43 92 38 12
> laudrain at hachette-livre.fr
> 

