[xep-support] Preserving whitespace in programlisting output

Jim Melton jim.melton at acm.org
Mon Jul 11 07:38:00 PDT 2005


Kenneth,

The problem is caused by the very nature of PDF and its ancestor 
PostScript.  There are no "margins" on a PDF page.  Each character is 
placed at a specific absolute point on the page (well, more precisely, the 
first character of each little group of sequential contiguous characters is 
placed at such a point and the subsequent characters in the group are 
placed at that same point plus a horizontal offset determined by the 
"width" of the preceding character(s)).

Thus, if you have the following text:
=====
This is some sample text.
   This indents 3 spaces.
This is more text.
=====
the second line does not "start" at the same place as the first and third 
lines.  It starts at its own left-most point without any preceding space 
characters at all.  That starting point is simply a position farther to the 
right; the fact that it is offset from the starting point of the first and 
third lines by three times the width of a space character is not recorded 
in the PDF data as space characters at all.

Therefore, when you copy text from the PDF file, there *are no spaces* at 
the start of indented lines to be copied.  Period.
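
To make that concrete, here is a small Python model of the idea.  It is 
only a sketch, not how any particular PDF producer or viewer is written: 
the names TextRun, layout and naive_copy are made up for illustration, and 
the 6-point space width and the starting coordinates are assumptions.  Each 
line becomes one run placed at an absolute position, and a simple "copy" 
that only collects the characters in each run gets no leading spaces.

```python
from dataclasses import dataclass

SPACE_WIDTH = 6.0   # assumed width of one space, in points
LEFT_X = 72.0       # assumed x position where unindented lines start


@dataclass
class TextRun:
    x: float    # absolute horizontal position of the first character
    y: float    # absolute vertical position of the baseline
    text: str   # the characters drawn starting at (x, y)


def layout(lines, top_y=700.0, leading=12.0):
    """Place each line as one run; leading spaces turn into an x offset."""
    runs = []
    for i, line in enumerate(lines):
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        x = LEFT_X + indent * SPACE_WIDTH
        y = top_y - i * leading
        runs.append(TextRun(x, y, stripped))
    return runs


def naive_copy(runs):
    """Collect only the characters of each run, from top to bottom."""
    ordered = sorted(runs, key=lambda r: -r.y)
    return "\n".join(r.text for r in ordered)


sample = [
    "This is some sample text.",
    "   This indents 3 spaces.",
    "This is more text.",
]
runs = layout(sample)
copied = naive_copy(runs)
print(copied)
```

In this model the indentation survives only as the larger x value of the 
second run; the copied text has three lines that all start at column zero.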

Try generating some PDF (in non-compressed mode) from any application, 
including Acrobat as well as XEP, and look at the internal structure of 
such lines.
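
As a rough idea of what to look for, the Python sketch below builds an 
uncompressed text object for the three sample lines using the standard PDF 
text operators (BT/ET, Tf, Tm, Tj).  The font name /F1, the coordinates and 
the 6-point space width are assumptions for illustration, and the helper 
names are hypothetical; real producers differ in detail (for example in 
which positioning operator they use), so treat this as a sketch rather than 
the output of XEP or Acrobat.

```python
def pdf_escape(text):
    """Escape the characters that are special inside a PDF literal string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def content_stream(lines, left_x=72, top_y=700, leading=12, space_width=6):
    """Build a text object; indentation becomes a larger x, not spaces."""
    ops = ["BT", "/F1 10 Tf"]                  # begin text, select assumed font
    for i, line in enumerate(lines):
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        x = left_x + indent * space_width
        y = top_y - i * leading
        ops.append(f"1 0 0 1 {x} {y} Tm")      # set the absolute text position
        ops.append(f"({pdf_escape(stripped)}) Tj")  # show the characters
    ops.append("ET")                           # end text
    return "\n".join(ops)


stream = content_stream([
    "This is some sample text.",
    "   This indents 3 spaces.",
    "This is more text.",
])
print(stream)
```

The second line shows up as a Tm operator with a larger x value followed by 
a string that begins directly with "This"; no space characters appear in 
front of it.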

Hope this helps,
    Jim


At 7/11/2005 02:49 AM, Kenneth Johansson wrote:
>Hi David,
>
>In general I agree with you that HTML might be a better choice for online 
>reading, but the readers require PDFs in this case, so creating another 
>format is not an option.
>
>We use CHM and PDF for User guides and PDF for sysadm, installation and 
>upgrade guides.
>
>The installation engineers use the PDF both online and in binders. Mostly 
>they copy commands but occasionally they copy chunks of code, like this:
>
>WISE =
>(DESCRIPTION =
>(ADDRESS_LIST =
>(ADDRESS =
>(PROTOCOL = TCP)(HOST = <WISE_HOST>)(PORT = 1521)
>)
>)
>(CONNECT_DATA = (SERVICE_NAME = WISE)
>)
>)
>
>which was copied from a PDF, losing all the indentation.
>
>Btw, I don't have a problem with tabs since we don't use tabs in our 
>programlistings, but rather with spaces, which I'd expect to be 
>available in the PDF.
>
>/Kenneth

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
   Co-Chair, W3C XML Query WG; F&O (etc.) editor    Fax : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
========================================================================
=  Facts are facts.   But any opinions expressed are the opinions      =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
======================================================================== 