[xep-support] unexpected bidi behaviour with english text in right-to-left mode

G. Ken Holman gkholman at CraneSoftwrights.com
Wed May 12 08:08:40 PDT 2010


At 2010-05-12 16:17 +0200, Gerald Wiesinger wrote:
>The data comes from one and the same data source, and it's either lr, rl or
>mixed.
>Thus, we cannot implement embedding levels. This would require analyzing
>the content and splitting it into fragments, which is almost the last thing
>on the list from a software design point of view.

Unless there is a bug in XEP, I am not successfully
communicating my thoughts to you.  Note that in my message I said
that the "two sources" can both be in the same input file, just
in different places.
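For illustration only: CSS2, whose definition XSL-FO adopts for the unicode-bidi property, says that unicode-bidi="embed" corresponds to inserting LRE (U+202A) for direction ltr, or RLE (U+202B) for direction rtl, at the start of the element and PDF (U+202C) at its end. A minimal Python sketch of that plain-text equivalent follows, assuming the direction of the second fragment is known. The helper name embed and the sample strings are hypothetical, and the code only builds the string; it does not run the bidi algorithm.

```python
# Unicode explicit directional formatting characters (UAX #9).
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed(fragment: str, direction: str) -> str:
    """Wrap a fragment in an explicit embedding level.

    Hypothetical helper mirroring the CSS2 description of
    unicode-bidi="embed": LRE (ltr) or RLE (rtl) before the
    content and PDF after it.
    """
    if direction == "ltr":
        opener = LRE
    elif direction == "rtl":
        opener = RLE
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return opener + fragment + PDF


# "Source A": a Hebrew label (the word for "address") plus a colon.
label = "\u05DB\u05EA\u05D5\u05D1\u05EA: "
# "Source B": an English address taken from elsewhere in the same input.
address = "19 LEONARDO DE-VINZI ST."

line = label + embed(address, "ltr")
print(ascii(line))  # escapes show where LRE and PDF were inserted
```

Under the Unicode Bidirectional Algorithm, the trailing full stop inside such an embedding should resolve left-to-right together with the English text instead of taking the right-to-left paragraph direction.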

>The same data in the database can be accessed using WinSQL or QMF for
>Windows (which has some other bugs), and there the data is shown correctly;
>even Notepad does it right.

Fine ... I know these tools implement the Unicode bi-directional algorithm.

It wasn't clear to me if you are using your stylesheet to mix your 
input data into your result (which is a typical source of *exactly* 
the problem you described), or if you are simply reporting that a 
string of Unicode characters known to work in other tools simply 
doesn't work properly in XEP (which is typical of a rendering bug and 
not a stylesheet problem).

I got the impression you were injecting English addresses into the 
Hebrew result.  My mistake.

>Another problem occurs when the English phrase ends with a closing
>bracket, which becomes an opening bracket and is moved to the leftmost
>position.
>
>e.g.
>first part of the text (embraced text)
>
>gets printed as
>
>(first part of the text (embraced text
>
>Because there are Unicode-based tools and applications out there that treat
>this information correctly, I am tempted to consider this a bug.

If you are not using your stylesheet to combine different parts of 
your source, and you are simply streaming the content of your source 
onto the page, then I agree it is a rendering bug.

You'll note in my message I said "When a string of Unicode characters 
from a single source is used, there is typically no need at all to 
think of this." and I stand by that.  Thus, if your string of 
characters in XSL-FO was the same string of characters in your input, 
you should feel comfortable reporting what you see as a rendering problem.

Full stops and parentheses are examples of characters without a strong
direction of their own (the full stop is a "weak" character and
parentheses are "neutral" characters), and as such they are influenced
by their neighbouring characters according to the Unicode Bidirectional
Algorithm, which has to be implemented by the XSL-FO formatter.
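To check which characters in these examples lack a strong direction, Python's standard unicodedata module reports each character's Bidi_Class and whether it is mirrored. In this small sketch (the sample characters are chosen for illustration), the full stop and comma come out as CS (common separator, weak) and the parentheses as ON (other neutral). The parentheses are also flagged as mirrored, meaning a parenthesis that resolves to right-to-left is drawn as its mirror image, which is relevant to the parenthesis example above.

```python
import unicodedata

# Characters from the examples in this thread, plus one Hebrew letter.
samples = ["T", "1", ".", ",", "(", ")", " ", "\u05D0"]

for ch in samples:
    # bidirectional() returns the Bidi_Class value: "L" (strong
    # left-to-right), "R" (strong right-to-left), "EN" (European number),
    # "CS" (common separator, weak), "ON" (other neutral), "WS" (whitespace).
    # mirrored() returns 1 for characters drawn as their mirror image
    # when they resolve to right-to-left, such as parentheses.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"class={unicodedata.bidirectional(ch)} "
          f"mirrored={unicodedata.mirrored(ch)}")
```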

I'm sure the XEP developers would welcome a concise example of this 
for their review.

. . . . . . . . . . . Ken

>____________________________________
>Best Regards
>Gerald Wiesinger
>
>From: "G. Ken Holman" <gkholman at CraneSoftwrights.com> (sent by owner-xep-support at renderx.com)
>To: xep-support at renderx.com
>Date: 12.05.2010 14:27
>Subject: Re: [xep-support] unexpected bidi behaviour with english text in right-to-left mode
>Please respond to: xep-support at renderx.com
>
>At 2010-05-12 10:10 +0200, Gerald Wiesinger wrote:
> >We are using a common stylesheet to create documents in various languages,
> >one of them Hebrew.
> >
> >The data is within the XML file and can be in any language: pure
> >left-to-right, pure right-to-left, or mixed with numbers and text.
> >
> >The only control we use is the writing mode, which we set on the highest
> >possible container level: rl-tb for Hebrew documents and lr-tb for the
> >others.
> >However, the data that gets fed into the rendering process can be anything,
> >and we don't know, and should not need to know or understand, the content.
> >
> >The problem now is that when we receive, e.g., an English address for a
> >Hebrew (right-to-left) document, some (not all) special characters, such
> >as commas and full stops, appear in the wrong sequence.
> >
> >example:
> >
> >the address:
> >19 LEONARDO DE-VINZI ST.
> >TEL AVIV.
> >
> >gets printed as
> >.19 LEONARDO DE-VINZI ST
> >.TEL AVIV
> >
> >I understand that changing to the lr writing mode would solve this
>
>I don't believe that is correct.  I believe all the specification
>requires is that you embed the text of different sources using the
>following:
>
>     ...content from source A...
>     <fo:bidi-override unicode-bidi="embed">
>       ...content from source B...
>     </fo:bidi-override>
>     ...content from source A...
>
>The embedding protects Unicode characters from being influenced by
>surrounding characters.
>
>There is a section of my XSL-FO book and class that discusses this.
>
> >but we cannot modify it, as the language of the content is unknown.
>
>That's okay ... XSL-FO processors are supposed to recognize Unicode
>characters and follow the Unicode bi-directional algorithm which
>accommodates embedding.
>
>When a string of Unicode characters from a single source is used,
>there is typically no need at all to think of this.  But when mixing
>characters from different sources (say from the input file and the
>stylesheet, or from two places in the input files) embedding solves
>many problems simply.
>
> >This happens consistently with full stops after Latin characters or
> >numbers.
>
>I hope this helps.
>
>. . . . . . . . . . . Ken


--
XSLT/XQuery training:   after http://XMLPrague.cz 2011-03-28/04-01
Vote for your XML training:   http://www.CraneSoftwrights.com/f/i/
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/f/
G. Ken Holman                 mailto:gkholman at CraneSoftwrights.com
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/f/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
