[xep-support] Hyphenation at /

Jim Melton jim.melton at acm.org
Tue Apr 6 09:13:33 PDT 2004


Alexander,

At 05:09 AM 4/5/2004 Monday, Alexander Peshkov wrote:
>Hello Jim,
>
>1. Character '/' is not a letter and therefore is not considered to be
>    part of the hyphenation rule (our hyphenator is aimed to hyphenate
>    words of natural languages, not arbitrary strings).

Of course that is a true statement.  However, "soft-hyphen" is not a 
letter, but it is taken into account during the hyphenation algorithm.  I 
have worked with several hyphenation systems in the past (among other 
previous lives, I once wrote newspaper typesetting systems) and some of 
them had pre-defined lists of characters that could be used to break 
"words" (strings of characters) while others allowed the user or the system 
configurator to specify such characters.  (Common characters for this 
purpose were "/", ".", "-", and "_", all of which were extremely useful for 
typesetting, or formatting, technical works such as those about programming 
languages.)

I don't think that it would be a violation of your principles to allow 
users to specify a list of "additional characters" at which words/strings 
could be broken.  It is *extremely* inconvenient for document authors to 
have to manually insert zero-width spaces (no such thing on *my* keyboard, 
so I'd have to use a character entity reference) or soft hyphens (same 
problem), especially when there might be literally scores or hundreds of 
places in a large, dynamic document.

>    I recommend you to add zero-width spaces (U+200B) after and/or
>    before slash character so that string could be broken apart at
>    these points.
>2. Again, you have to add zero-width spaces (or soft hyphens if you
>    want to have visible hyphenation characters) after every character
>    (or after every character triad in your case).
>
>Note that you can add those symbols automatically using XSLT-preprocessing.

Interesting observation, but I think I'm feeling particularly dense this 
morning.  I started to say that I am uncertain how I can efficiently 
examine every ordinary character string in a very large document to see if 
it has a "/" in it and then replace that string with "&zwsp;/&zwsp;" (or 
the equivalent).  However, I saw Ken Holman's message that described just 
how to do that sort of thing and I realized that I already (sort of) knew 
how.  (Thanks, Ken!)
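For anyone who would rather do this preprocessing outside XSLT, here is a 
minimal sketch of the same idea in Python.  It is only an illustration, not 
anything XEP provides: the function name add_break_opportunities and the 
break_chars parameter are my own inventions, and I am assuming that a 
zero-width space placed after each listed character is enough for the 
formatter to break there.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def add_break_opportunities(text, break_chars="/"):
    """Return text with a zero-width space after each break character.

    break_chars is a hypothetical, user-supplied list of characters
    (for example "/._-") at which a line break should be permitted.
    """
    return "".join(
        ch + ZWSP if ch in break_chars else ch
        for ch in text
    )
```

Calling add_break_opportunities("usr/local/bin") puts a U+200B after each 
slash; passing "/._-" as the second argument covers the other common break 
characters as well.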

But Jim Quest also asked about hyphenating (or at least breaking) strings like 
"RT54XIOP", which is a separate problem that is of genuine interest to 
others, such as myself.  Again, I've worked with a number of systems whose 
algorithms permit such strings to be arbitrarily broken so column widths 
(for example) are not violated and so character "scrunching" does not have 
to be performed.  Again, I know that such strings are not "words of natural 
languages", but many technical subjects, especially in the computer field, 
have such non-words liberally sprinkled throughout books, etc. that are 
about those subjects.
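Until something like that exists in the formatter, the "zero-width space 
after every character triad" workaround can at least be automated.  The 
sketch below is one possible approach, under assumptions of my own: the 
function name break_long_tokens, the max_len and chunk parameters, and the 
rule that only tokens containing a digit are treated as non-words are all 
hypothetical choices, not XEP behavior.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def break_long_tokens(text, max_len=6, chunk=3):
    """Insert zero-width spaces into long tokens that contain a digit.

    Tokens are runs of non-whitespace. A token longer than max_len that
    contains at least one digit (an assumed heuristic for "not a natural
    language word") is split into pieces of `chunk` characters joined by
    U+200B. All other tokens are left untouched for the hyphenator.
    """
    def split_token(match):
        token = match.group(0)
        if len(token) <= max_len or not any(c.isdigit() for c in token):
            return token
        pieces = [token[i:i + chunk] for i in range(0, len(token), chunk)]
        return ZWSP.join(pieces)

    return re.sub(r"\S+", split_token, text)
```

With the defaults, "RT54XIOP" becomes "RT5", "4XI", "OP" joined by 
zero-width spaces, while an ordinary word such as "hyphenation" is passed 
through unchanged.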

I'd like to add my voice to Jim Quest's in asking that RenderX reconsider 
the decisions regarding hyphenating/breaking character sequences based on 
either predefined non-letter characters or non-letter characters specified 
by the person installing/configuring XEP, and regarding arbitrarily 
breaking character sequences when no such "break character" can be found and 
no hyphenation rule can be applied.

Thanks,
    Jim

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063              Personal email: jim at melton dot name
USA                                                Fax : +1.801.942.3345
========================================================================
=  Facts are facts.  However, any opinions expressed are the opinions  =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
========================================================================  
