[xep-support] unexpected bidi behaviour with english text in right-to-left mode

G. Ken Holman gkholman at CraneSoftwrights.com
Wed May 12 05:27:27 PDT 2010


At 2010-05-12 10:10 +0200, Gerald Wiesinger wrote:
>we are using a common stylesheet to create documents in various languages,
>one of them: Hebrew.
>
>the data is within the XML file and can be any language, pure
>left-to-right, pure right-to-left and mixed with numbers and text.
>
>the only control we use is the writing mode which we set on the highest
>possible container level for Hebrew documents to rl-tb and for the others
>lr-tb.
>however, the data which gets fed into the rendering process can be any and
>we don't know and should not know or understand the content.
>
>the problem now is that in case we receive e.g. an English address for a
>Hebrew (right-to-left) document we find some (not all) special characters
>like commas and full stops in the wrong sequence.
>
>example:
>
>the address:
>19 LEONARDO DE-VINZI ST.
>TEL AVIV.
>
>gets printed as
>.19 LEONARDO DE-VINZI ST
>.TEL AVIV
>
>i understand that the change to the lr writing mode would solve this

I don't believe that is correct.  I believe all the specification 
requires is that you embed the text of different sources using the following:

    ...content from source A...
    <bidi-override unicode-bidi="embed">
      ...content from source B...
    </bidi-override>
    ...content from source A...

The embedding protects the enclosed Unicode characters from being 
influenced by the directionality of the surrounding characters.
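
For readers who want to see the fragment above in context, here is a minimal sketch of a complete block using the namespaced fo:bidi-override element. The fo:block wrapper and the "Address:" label are assumptions made for illustration; only the unicode-bidi="embed" wrapper comes from the message itself.

```xml
<!-- Sketch only: the block wrapper and the "Address:" label are hypothetical. -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- content from source A, e.g. literal text from the stylesheet -->
  Address:
  <!-- content from source B, e.g. a string taken from the input XML -->
  <fo:bidi-override unicode-bidi="embed">19 LEONARDO DE-VINZI ST.</fo:bidi-override>
</fo:block>
```

XSL-FO also defines a direction property that can accompany unicode-bidi. It is left out of this sketch because the original poster says the direction of the incoming content is not known in advance.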

There is a section of my XSL-FO book and class that discusses this.

>but we cannot modify it as the language of the content is unknown.

That's okay ... XSL-FO processors are supposed to recognize Unicode 
characters and follow the Unicode bi-directional algorithm which 
accommodates embedding.

When a string of Unicode characters from a single source is used, 
there is typically no need at all to think of this.  But when mixing 
characters from different sources (say from the input file and the 
stylesheet, or from two places in the input files) embedding solves 
many problems simply.
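
When the mixing happens between the stylesheet and the input file, one place to apply the wrapper is the template that copies the input string into the result. The following is a rough sketch under that assumption; the element name address-line is hypothetical and stands for whatever element carries the data in the actual input.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- "address-line" is a hypothetical element name in the input XML -->
  <xsl:template match="address-line">
    <fo:block>
      <!-- following the advice above: embed the input string so that it
           is shielded from the characters that surround it -->
      <fo:bidi-override unicode-bidi="embed">
        <xsl:value-of select="."/>
      </fo:bidi-override>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

Because the wrapper is applied to every matched string regardless of its content, the stylesheet does not need to know which language the data is in.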

>the occurrence of this is imminent for full stops after Latin characters or
>numbers.

I hope this helps.

. . . . . . . . . . . Ken
