[xep-support] linefeed normalization

Mon Apr 5 09:32:21 PDT 2004

Nikolai Grigoriev wrote:

> the question of U+2028 is a complicated one. The XSL-FO spec 
> does not constrain its processing in any way. There is no 
> mention that it is subject to normalization, but equally no 
> indication that it is expected to produce a line break at 
> all. The effects of this character are therefore not 
> well-defined: I doubt whether it can be considered a valid linefeed.

I agree that it is not well-defined.

> In XEP, we treat U+000A, U+000D, and U+2028 as complete 
> equivalents. (This refers to the data that come to the 
> formatter, after linefeed normalization in the processor.) 
> The logic is straightforward: a character is either a 
> linefeed or not; linefeeds terminate lines and are subject 
> to the effects of linefeed-treatment; non-linefeeds do 
> neither of these.

Ken's distinction between a linefeed and a LINE SEPARATOR is relevant here, I
think. They are different concepts. The fact that linefeed-treatment is
supposed to affect *only* U+000A, but affects U+2028 in XEP, indicates some
misunderstanding.
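To make the two readings concrete, here is a minimal Python sketch (an illustration only, not XEP's actual code) of how a formatter might apply the four XSL linefeed-treatment values (ignore, preserve, treat-as-space, treat-as-zero-width-space) to a text run. The function name and both character sets are hypothetical: XEP_LINEFEEDS follows the behavior Nikolai describes, while STRICT_LINEFEEDS follows the reading that linefeed-treatment affects U+000A only. It assumes end-of-line normalization has already happened and ignores white-space collapsing and line wrapping.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, used by treat-as-zero-width-space

# Hypothetical character sets for the two readings discussed above.
XEP_LINEFEEDS = frozenset("\u000a\u000d\u2028")  # XEP: all three are linefeeds
STRICT_LINEFEEDS = frozenset("\u000a")           # strict: U+000A only


def apply_linefeed_treatment(text, treatment="treat-as-space",
                             linefeeds=XEP_LINEFEEDS):
    """Return the lines produced after applying a linefeed-treatment value.

    Only "preserve" lets a linefeed terminate a line; the other values
    replace or drop the character, so the run stays a single line.
    Characters outside `linefeeds` (e.g. U+2028 in the strict reading)
    pass through untouched.
    """
    if treatment == "preserve":
        lines, current = [], []
        for ch in text:
            if ch in linefeeds:
                lines.append("".join(current))  # linefeed ends the line
                current = []
            else:
                current.append(ch)
        lines.append("".join(current))
        return lines

    replacement = {
        "ignore": "",
        "treat-as-space": " ",
        "treat-as-zero-width-space": ZWSP,
    }.get(treatment)
    if replacement is None:
        raise ValueError(f"unknown linefeed-treatment: {treatment!r}")
    return ["".join(replacement if ch in linefeeds else ch for ch in text)]
```

With the XEP set, "one\u2028two\nthree" under preserve yields three lines; with the strict set it yields two, and the first line still contains the U+2028 character, whose rendering the spec leaves undefined.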

> One can argue if this is a correct behaviour. However, I 
> believe that it is inherently unsafe to rely on Unicode text 
> flow control characters in systems that have their own markup 
> to express the same semantics. There is no reason to use 

I mostly agree with this. However, there really is no semantic in XSL-FO
that says "force a line break here". It is true that you can say "start a
new block here", but that really is a different concept.

> U+2028 or U+2029 if you have explicit paragraph structure set 
> by <fo:block>s; it is risky to mix LRO/RLO/LRE/RLE with 

I think U+2029 really is the same as saying "start a new block", and agree
that there is no good reason to use it in XSL-FO.

> fo:bidi-override. If you need explicit line breaks inside 
> non-preformatted text, set a <br/> element in the input XML 
> vocabulary and match it to <fo:block/> in the stylesheet. In 
> this way, your intent is clear to everybody.

There is a good reason not to take this approach unless it is necessary.
Simply inserting an </fo:block><fo:block> combination does not do the job.
The new block created here may not have the same properties -- things like
space-before, keeps, etc. could easily be different. Now, I acknowledge
that this can be worked around in the stylesheet, but it does add an
order-of-magnitude level of complexity.
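In an XSLT stylesheet, Nikolai's suggestion is usually a one-line template such as <xsl:template match="br"><fo:block/></xsl:template>, with the fo prefix bound to the XSL-FO namespace. To keep the examples in one language, the minimal sketch below emulates that rewrite in Python with xml.etree.ElementTree. The <para> element and the function name para_to_fo are hypothetical, and only text and <br/> children are handled.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
FO_BLOCK = f"{{{FO_NS}}}block"  # ElementTree's {namespace}local-name form
ET.register_namespace("fo", FO_NS)  # serialize with the usual fo: prefix


def para_to_fo(para: ET.Element) -> ET.Element:
    """Map a hypothetical <para> to fo:block and each <br/> to an empty fo:block.

    Emulates an XSLT template like
    <xsl:template match="br"><fo:block/></xsl:template>.
    Only text and <br/> children are handled in this sketch.
    """
    block = ET.Element(FO_BLOCK)
    block.text = para.text
    for child in para:
        if child.tag != "br":
            raise NotImplementedError(f"unhandled element: {child.tag}")
        empty = ET.SubElement(block, FO_BLOCK)  # the explicit line break
        empty.tail = child.tail                 # text that followed the <br/>
    return block


if __name__ == "__main__":
    para = ET.fromstring("<para>First line<br/>second line</para>")
    print(ET.tostring(para_to_fo(para), encoding="unicode"))
```

Note that the <br/> becomes an empty fo:block nested inside the paragraph's block, so the text on both sides stays in the original block rather than being split into sibling blocks.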

> One additional consideration: in XML 1.1, U+2028 will be 
> subject to parser-side linefeed normalization. It implies 
> that you never get it from user text; and if you generate an 
> entity just to make it appear after the normalization, why 
> not generate a piece of markup instead?

OK. I find this to be persuasive, and it means ultimately that either I or
the authors of the XML 1.1 standard have misunderstood what the Unicode
standard was trying to do with U+2028.
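XML 1.1's end-of-line handling translates the sequences #xD #xA and #xD #x85, and the single characters #x85, #x2028, and any #xD not followed by #xA or #x85, into a single #xA before parsing. A character reference such as &#x2028; is not affected, because the parser expands it only afterwards. Python's standard-library parser implements XML 1.0 only, so the minimal sketch below (helper names are hypothetical) applies the XML 1.1 rule to the raw text by hand and then parses it. It illustrates Nikolai's point: a literal U+2028 arrives as a linefeed, while a character reference still delivers U+2028.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.1 end-of-line handling: #xD#xA, #xD#x85, #x85, #x2028, and any lone
# #xD become a single #xA. The pairs come first so each matches as one unit.
_XML11_EOL = re.compile("\r\n|\r\x85|[\r\x85\u2028]")


def normalize_eol_xml11(raw: str) -> str:
    """Apply XML 1.1 line-end normalization to raw (unparsed) text."""
    return _XML11_EOL.sub("\n", raw)


def parse_as_xml11(raw: str) -> ET.Element:
    """Hypothetical helper: normalize like an XML 1.1 parser, then parse.

    ElementTree's parser is XML 1.0 only, so the XML 1.1 step is done by
    hand on the raw text; character references are left untouched here
    and expanded afterwards by the parser.
    """
    return ET.fromstring(normalize_eol_xml11(raw))


if __name__ == "__main__":
    literal = parse_as_xml11("<p>one\u2028two</p>")      # raw U+2028 in the text
    reference = parse_as_xml11("<p>one&#x2028;two</p>")  # character reference
    print(repr(literal.text))    # 'one\ntwo'
    print(repr(reference.text))  # 'one\u2028two'
```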

This leaves only the issue of documentation. I would simply suggest that
Section 7.1 of the document "XSL Formatting Objects in XEP 3.7" be modified
to include your comment above that U+2028 is always treated within XEP as a
linefeed character.

Thanks again to both Nikolai and Ken for your explanations. This is not an
issue I feel strongly about, and I didn't mean for it to turn into a big
deal.

Victor Mote
