[xep-support] Operating on all of the pcdata of an XML file: Considered harmful?

Martin Holmes mholmes at uvic.ca
Fri Apr 1 09:29:52 PST 2005


Hi there,

Here are two solutions I use for inserting zero-width spaces, one 
specific to URLs, and one more general:

<!-- This template was written by Martin Holmes. It inserts zero-width 
spaces before and after each slash in a URL, enabling the URL to be 
broken appropriately at line ends. A string that contains no slash is 
output unchanged. -->
	<xsl:template name="BreakOnSlashes">
		<xsl:param name="InString"/>
		<xsl:choose>
			<!-- Output the text before the first slash, then the slash wrapped in zero-width spaces, and recurse on the rest. -->
			<xsl:when test="contains($InString, '/')">
				<xsl:value-of select="substring-before($InString, '/')"/>
				<xsl:text>&#x200b;&#x002f;&#x200b;</xsl:text>
				<xsl:call-template name="BreakOnSlashes">
					<xsl:with-param name="InString" select="substring-after($InString, '/')"/>
				</xsl:call-template>
			</xsl:when>
			<!-- No slash left: output whatever remains as it is. -->
			<xsl:otherwise>
				<xsl:value-of select="$InString"/>
			</xsl:otherwise>
		</xsl:choose>
	</xsl:template>
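
In case it helps, this is roughly how the URL template might be called. It is only a sketch: the ulink element and its url attribute are a guess at your markup (DocBook-style), and it assumes the xsl and fo prefixes are already bound in the stylesheet that holds the BreakOnSlashes template.

```xml
<!-- Hypothetical caller: assumes DocBook-style <ulink url="..."> markup
     and XSL-FO output. Change the match pattern to suit your own element. -->
<xsl:template match="ulink">
	<!-- The link target keeps the original URL; only the displayed text
	     receives the zero-width spaces. -->
	<fo:basic-link external-destination="url({@url})">
		<xsl:call-template name="BreakOnSlashes">
			<xsl:with-param name="InString" select="string(@url)"/>
		</xsl:call-template>
	</fo:basic-link>
</xsl:template>
```

Keeping the unmodified URL in external-destination matters, because a zero-width space inside the target itself would probably break the link.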
	
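<!-- This recursive template, attributed to Michael Smith, Julien 
Letessier and Nikolai Grigoriev, comes from:
http://www.dpawson.co.uk/docbook/styling/fo.html#d2635e175
It inserts a zero-width space between each pair of adjacent characters, 
unless one of the two is already a space character. -->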

     <xsl:template name="intersperse-with-zero-spaces">
         <xsl:param name="str"/>
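         <!-- Characters that never get a zero-width space next to them. The ordinary space (&#x20;) is included so that no redundant break point is added beside it. -->
         <xsl:variable name="spacechars">&#x9;&#xA;&#x20;&#x2000;&#x2001;&#x2002;&#x2003;&#x2004;&#x2005;&#x2006;&#x2007;&#x2008;&#x2009;&#x200A;&#x200B;</xsl:variable>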

         <xsl:if test="string-length($str) &gt; 0">
             <xsl:variable name="c1"
                     select="substring($str, 1, 1)"/>
             <xsl:variable name="c2"
                     select="substring($str, 2, 1)"/>

             <xsl:value-of select="$c1"/>
             <xsl:if test="$c2 != '' and
                     not(contains($spacechars, $c1) or
                     contains($spacechars, $c2))"><xsl:text>&#x200b;</xsl:text></xsl:if>

             <xsl:call-template name="intersperse-with-zero-spaces">
                 <xsl:with-param name="str" select="substring($str, 2)"/>
             </xsl:call-template>
         </xsl:if>
     </xsl:template>
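
The general template could be hooked up along these lines. Again, this is a sketch rather than a recommendation: the programlisting element name is an assumption taken from your description, and the rule is assumed to live in the same stylesheet as the intersperse-with-zero-spaces template.

```xml
<!-- Hypothetical rule: sends every text node inside a programlisting
     through the template above. The element name is an assumption. -->
<xsl:template match="programlisting//text()">
	<xsl:call-template name="intersperse-with-zero-spaces">
		<xsl:with-param name="str" select="."/>
	</xsl:call-template>
</xsl:template>
```

Limiting the match to the elements that actually hold long unbreakable strings avoids touching all of the pcdata in the file. One caveat: the template recurses once per character, so a very long text node may run into the recursion limit of some XSLT processors.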

Cheers,
Martin


Louis Amdur wrote:
> I know this issue is something of a chestnut on this list, but I'd like to 
> solicit some feedback to see how other folks are handling the issue.
> 
> My understanding: When XEP encounters a long string with no "natural" line 
> break point (e.g., a programlisting or URL without spaces), it squeezes 
> the characters of the string together if the string cannot fit in the 
> given space. By RenderX's lights, this is a feature rather than a bug, as 
> it exposes weaknesses in stylesheets. One solution is to insert zero-width 
> spaces to allow a long string to break when necessary--this can be 
> accomplished manually in the markup, or automatically through a 
> preprocessing step. For us, the manual option is a non-starter, as we 
> translate our content to up to thirty target languages, and translation 
> vendors never see the markup itself, just the pcdata (our translation 
> memory protects the markup). So we're looking at automating this, knowing 
> that an automated solution won't always create graceful line breaks in all 
> contexts. I've seen some XSL code fragments on how to test for string 
> length and then insert ZWS code points between the characters of strings 
> that exceed a given threshold length--the same could be accomplished, 
> perhaps more efficiently, through Python or Perl during a pre-processing 
> step. The person who is responsible for implementing and maintaining our 
> XSL tool chain is, however, resistant to such an approach, claiming that 
> he has "philosophical objections" to performing an operation on all of the 
> pcdata of an XML file. Other than lacking elegance, I don't really 
> understand the foundation of his objection to this sort of solution--all 
> sorts of organizations pre-process pcdata as a matter of course (not to 
> mention all sorts of non-XML text streams, as well). I'm not really 
> interested in forcing any solution down his throat--I just have an 
> immediate need to create a bulletproof method for allowing long strings to 
> break. 
> 
> Ideally, I think RenderX should provide a configuration option that would 
> allow text to break rather than squeeze, with the caveat that such breaks 
> may often not be very pretty. Lacking that, I am gunning for a simple 
> pre-processing step that will do the same.
> 
> 
> ____________
> Lou Amdur
> Senior Principal Information Developer
> Symantec
> (310) 449-7005
> -------------------
> (*) To unsubscribe, send a message with words 'unsubscribe xep-support'
> in the body of the message to majordomo at renderx.com from the address
> you are subscribed from.
> (*) By using the Service, you expressly agree to these Terms of Service http://www.renderx.com/tos.html