[xep-support] line-breaking on typographic spaces

David Tolpin dvd at davidashen.net
Thu Oct 20 11:17:15 PDT 2005


Hi,

I am slowly recalling the reasoning that led me to abandon the idea
of supporting Unicode spaces in XEP.

Here is the picture:

1) According to UAX#14, http://www.unicode.org/reports/tr14/#SP, Zs
characters are breaking spaces, which means that the line can be
broken on either side of the character. On the other hand, whitespace
treatment and collapsing are defined in terms of XML spaces, which are

S ::=    (#x20 | #x9 | #xD | #xA)+

As a result, typographic spaces are not collapsed the way XML spaces
are: when someone uses them, they end up hanging from the ends of the
line if the line-breaking algorithm decides to break on them, breaking
alignment on both edges.

2) Spaces, except for the THIN SPACE (U+2009), cannot change their
width due to alignment. This means that they cannot be output as
normal glyphs and then used with letter-spacing/word-spacing --
letter spacing would change the effective widths of the spaces. Thus,
any implementation that just preserves the spaces from the range as
Unicode code points breaks their intended use.

3) XSL has an idiom that expresses what the spaces are intended for  
in traditional typography - inline spaces on inlines (space-start,  
space-end -- Sergey, thanks for the hint). A professional typesetting  
toolchain should provide markup that automatically translates such  
things as figure numbers and quoted strings in French into properly  
spaced structures, using inline spaces, not unicode space characters.

4) In cases where legacy data must be dealt with, the frontend that
processes the document before it is fed to an XSL formatter can and
should transform the spaces into space-filled leaders, which ensure
the desired look and allow the behavior to be adjusted to the
traditions of a particular publishing house.
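
To make point 4 concrete, here is a minimal sketch of such a frontend
step. It is not XEP code. The function name, the width table and the
choice of em-based lengths are my assumptions for illustration; the
only XSL pieces used are fo:leader with leader-pattern="space" and
leader-length. The widths follow the character names (EN SPACE as half
an em, and so on); a publishing house would substitute its own values
and add entries for the remaining spaces.

```python
from xml.sax.saxutils import escape

# Assumed widths in em, derived from the Unicode character names.
# Adjust or extend this table to match house style.
SPACE_WIDTHS_EM = {
    "\u2002": 1 / 2,  # EN SPACE
    "\u2003": 1.0,    # EM SPACE
    "\u2004": 1 / 3,  # THREE-PER-EM SPACE
    "\u2005": 1 / 4,  # FOUR-PER-EM SPACE
    "\u2006": 1 / 6,  # SIX-PER-EM SPACE
}


def spaces_to_leaders(text: str) -> str:
    """Return XSL-FO inline content in which each typographic space
    listed in SPACE_WIDTHS_EM is replaced by a space-filled leader of
    fixed length. All other characters are XML-escaped and kept."""
    parts = []
    for ch in text:
        width = SPACE_WIDTHS_EM.get(ch)
        if width is None:
            parts.append(escape(ch))
        else:
            parts.append(
                '<fo:leader leader-pattern="space" '
                'leader-length="%.3fem"/>' % width
            )
    return "".join(parts)
```

The result is meant to be placed inside an fo:block or fo:inline where
the fo prefix is already bound to the XSL namespace.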


****

With these points in mind, the options were to:

   - provide complete implementation of typographic spaces, which  
most probably would mean just translating them to leaders in the XSL  
compiler (an internal part of XEP);
   - keep them as glyphs;
   - expose the problem and filter out all those spaces, replacing
them with #x20, thus ensuring that the formatter yields readable
documents, albeit with a less than optimal look, when the problematic
approach is used.

After discussing the issue, we chose the third path, because it
still allows the first alternative to be implemented in full and
employed, and does not hardcode contradictory semantics into the
formatting kernel. Given that the corresponding filter is but a few
lines in any modern programming language, we have thus left room for
adjustments and customizations, while still providing powerful
machinery to handle all cases and approaches.
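
As an illustration of how small such a filter can be, here is a sketch
in Python. It is not the filter shipped with XEP or with the DocBook
stylesheets. The function name is made up, and the range U+2000 to
U+200A (EN QUAD through HAIR SPACE) is my assumption about which
characters count as "those spaces"; widen or narrow it as needed.

```python
import re

# Assumed range: U+2000 (EN QUAD) through U+200A (HAIR SPACE).
# NO-BREAK SPACE (U+00A0) is deliberately left alone.
TYPOGRAPHIC_SPACES = re.compile("[\u2000-\u200a]")


def normalize_spaces(text: str) -> str:
    """Replace every typographic space in the assumed range with a
    plain #x20, so that XML whitespace treatment applies to it."""
    return TYPOGRAPHIC_SPACES.sub(" ", text)
```

Applied to the text nodes of a document before formatting, this gives
the readable but less than optimal result of the third option.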

I am thankful to Jirka Kosek for supporting this approach in the
DocBook XSL stylesheets; the XSLT code is necessarily verbose because
string manipulation is not the area where XSLT shines. The example
shows that it is just as easy to implement a spaces-to-leaders
transformation for XSL itself, so that typographic spaces work with
any XSL formatter, and in a way you define, which is important given
that the currently available definitions are vague and contradictory.

I also believe that XSL is an interchange format. When talking about
high-quality typesetting, one is expected to implement a tool chain
where hair-splitting activities like inserting hair spaces in the
right places are unnecessary because higher-level markup does the
job, both by providing the typesetter with convenient abstractions
and macros and by translating the contents into structured XSL, with
ipd spaces, not with Unicode space characters.

David Tolpin

