[xep-support] Operating on all of the pcdata of an XML file: Considered harmful?

Louis Amdur LAmdur at symantec.com
Thu Mar 31 21:00:06 PST 2005


I know this issue is something of a chestnut on this list, but I'd like to 
solicit some feedback to see how other folks are handling the issue.

My understanding: When XEP encounters a long string with no "natural" line
break point (e.g., a programlisting or URL without spaces), it squeezes
the characters of the string together if the string cannot fit in the
given space. By RenderX's lights, this is a feature rather than a bug, as
it exposes weaknesses in stylesheets.

One solution is to insert zero-width spaces (ZWS, U+200B) to allow a long
string to break when necessary--this can be accomplished manually in the
markup, or automatically through a pre-processing step. For us, the manual
option is a non-starter, as we translate our content into up to thirty
target languages, and translation vendors never see the markup itself,
just the pcdata (our translation memory protects the markup). So we're
looking at automating this, knowing that an automated solution won't
always create graceful line breaks in all contexts. I've seen some XSL
code fragments that test for string length and then insert ZWS code
points between the characters of strings that exceed a given threshold
length--the same could be accomplished, perhaps more efficiently, through
Python or Perl during a pre-processing step.

The person who is responsible for implementing and maintaining our XSL
tool chain is, however, resistant to such an approach, claiming that he
has "philosophical objections" to performing an operation on all of the
pcdata of an XML file. Other than that it lacks elegance, I don't really
understand the foundation of his objection to this sort of solution--all
sorts of organizations pre-process pcdata as a matter of course (not to
mention all sorts of non-XML text streams, as well). I'm not really
interested in forcing any solution down his throat--I just have an
immediate need to create a bulletproof method for allowing long strings to
break.
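
For concreteness, here is a rough sketch of the kind of Python
pre-processing step I have in mind. It is not taken from any existing tool
chain: the function names (break_long_runs, process_tree) and the default
threshold of 20 characters are made up for illustration, and it assumes
the source file can be loaded with the standard-library ElementTree
parser. It inserts U+200B between the characters of every whitespace-free
run that is longer than the threshold, in all text nodes.

```python
import re
import xml.etree.ElementTree as ET

ZWSP = "\u200b"  # ZERO WIDTH SPACE


def break_long_runs(text, threshold=20):
    """Insert ZWSP between the characters of each unbroken run whose
    length exceeds `threshold` (hypothetical default)."""
    if not text:
        return text  # element text/tail may be None or empty
    # A "run" is a stretch with no whitespace and no existing ZWSP, so
    # running the step twice does not insert a second set of ZWSPs.
    pattern = re.compile(r"[^\s\u200b]{%d,}" % (threshold + 1))
    return pattern.sub(lambda m: ZWSP.join(m.group(0)), text)


def process_tree(root, threshold=20):
    """Apply break_long_runs to all pcdata (text and tail) under root.
    Attribute values and markup are left untouched."""
    for elem in root.iter():
        elem.text = break_long_runs(elem.text, threshold)
        elem.tail = break_long_runs(elem.tail, threshold)
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        "<para>See <ulink>http://www.renderx.com/tos.html</ulink> "
        "for details.</para>"
    )
    process_tree(doc, threshold=10)
    print(ET.tostring(doc, encoding="unicode"))
```

This treats every code point as a possible break position, so it will
happily split between a base character and a combining mark; a production
version would probably want to restrict the step to particular elements
(programlisting, ulink, and so on) and to choose break positions more
carefully.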

Ideally, I think RenderX should provide a configuration option that would 
allow text to break rather than squeeze, with the caveat that such breaks 
may often not be very pretty. Lacking that, I am gunning for a simple 
pre-processing step that will do the same.


____________
Lou Amdur
Senior Principal Information Developer
Symantec
(310) 449-7005