[xep-support] Improving hyphenation algorithm

David Tolpin dvd at davidashen.net
Wed Aug 25 08:27:28 PDT 2004


Jirka Kosek:
> Hi,
> 
> I have used XEP for longer documents containing a lot of free-flowing
> text over the last few months. I found a lot of words with bad
> hyphenation. I was very surprised, as I use the same hyphenation files
> as for TeX, which hyphenates almost all words correctly. Then I found
> the following statement in the XEP documentation:

Both the original Czech hyphenation table for TeX and the modified
one for XEP contain a bug. (XEP understands TeX accents, so the
modification is not necessary.)

The u with ring, a Czech character, I believe, is written as '\r u'
instead of '\ru', and is thus translated into ^^b0u rather than ^^f5.

Both TeX and XEP hyphenate Czech incorrectly. The results differ only
because of differences in error handling (natural, since TeX accepts
8-bit characters and XEP treats all non-alphanumeric characters as
separators): TeX gives too few hyphenation points, while XEP gives
too many.

produktu hyphenates as pro-duktu in TeX, pro-du-k-tu in XEP.

Both are wrong, but the bug is in the TeX hyphenation table. When
all occurrences of ^^b0u are replaced with ^^f5 (or '\r u' with '\ru'),
both TeX and XEP give the correct hyphenation:

pro-duk-tu 

And all other hyphenation results are exactly the same.
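
For anyone who wants to apply the fix to a local copy of the table, the
replacement could be scripted roughly as follows. This is only a sketch:
the function name fix_u_ring is made up for illustration, and it assumes
the faulty spellings appear literally as '\r u' or '^^b0u' in the text
of the pattern file.

```python
def fix_u_ring(pattern_text):
    """Rewrite the two faulty spellings of u-ring named in this post.

    Assumption: the pattern file contains them as literal character
    sequences, so a plain string replacement is enough.
    """
    # '\r u' (accent macro, space, letter) becomes '\ru'
    fixed = pattern_text.replace("\\r u", "\\ru")
    # the already-translated form '^^b0u' becomes '^^f5'
    fixed = fixed.replace("^^b0u", "^^f5")
    return fixed
```

Lines that contain neither spelling pass through unchanged.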

Neither TeX nor XEP uses the absolute values in hyphenation patterns;
that would make no sense, because the patterns are not designed for it.
The values serve only to compute the right hyphenation points: at each
position the largest value among all matching patterns is taken, and
a break is allowed there if that value is odd.
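
The "largest value, then check whether it is odd" rule can be sketched
in a few lines. This is an illustration only, not the code of TeX or
XEP. The names parse_pattern and hyphenate, the left_min and right_min
limits, and the four toy patterns in the usage note are all made up for
the example and are not taken from the Czech table.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'u2kt' into its letters and gap values.

    values[k] is the value for the gap just before letters[k]; a gap
    with no digit in the pattern gets the value 0.
    """
    letters = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return word with '-' at every gap whose largest value is odd.

    left_min and right_min (assumed defaults) are the minimum number
    of letters kept before the first and after the last break.
    """
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."      # '.' marks the word edges
    levels = [0] * (len(padded) + 1)       # levels[i]: gap before padded[i]

    # Try every substring; keep the largest value seen at each gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values:
                for offset, value in enumerate(values):
                    gap = start + offset
                    levels[gap] = max(levels[gap], value)

    # An odd maximum allows a break; an even one forbids it.
    pieces, last = [], 0
    for pos in range(left_min, len(word) - right_min + 1):
        if levels[pos + 1] % 2 == 1:       # gap before word[pos]
            pieces.append(word[last:pos])
            last = pos
    pieces.append(word[last:])
    return "-".join(pieces)
```

With the toy patterns o1d, u1k, u2kt and k1t, hyphenate("produktu", ...)
returns pro-duk-tu: the 2 from u2kt beats the 1 from u1k, and 2 is even,
so no break is allowed before the k. Drop u2kt and the same call returns
pro-du-k-tu. Only the parity of the winning value matters, not its size.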

Hyphenation algorithms in TeX and XEP are identical.

David Tolpin
RenderX