<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi all,<br>
<br>
When text is copied from a PDF generated by XEP, for example by an
annotation tool such as Hypothes.is (or when we simply want to copy
text from the PDF into another document), spaces get lost between
the last word of one line and the first word of the next, and
between adjacent words set in different font styles (e.g.
italics/normal). <br>
<br>
Example 1:<br>
<br>
PDF (Source):<br>
word1 word2<br>
word3 word4<br>
<br>
Copy:<br>
word1 word2word3 word4<br>
<br>
Example 2:<br>
<br>
PDF (Source):<br>
*word1* word2 word3<br>
<br>
Copy:<br>
word1word2 word3<br>
<br>
Is this problem related to the way spaces are encoded within PDFs,
and is there a way to generate PDFs with XEP that avoids it? I
found a post on an Adobe forum saying that the problem is related
to the way the rendering engine generates strings:
<br>
<br>
<blockquote type="cite">The problem is that the PDF may or may not
have the apparent spaces encoded as space characters, particularly
at line ends, but also between words and perhaps even between
characters. The rendering engine (Ps or PDF driver) may have
chosen to break "<em>word1 word2</em>" into two strings with two
starting coordinates and no U+0020 space character (or alternative
space characters) at all.</blockquote>
Source: <a class="moz-txt-link-freetext" href="https://forums.adobe.com/thread/1367541">https://forums.adobe.com/thread/1367541</a><br>
<br>
Is this correct? Are the spaces lost because they are not encoded
at all, so that the words are separated only by different starting
coordinates? If so, can this be avoided by encoding the spaces
explicitly when the PDF is generated? My impression, however, is
that the PDF viewer (and not the PDF itself) might be the source of
the problem.<br>
<br>
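To make the question concrete, here is a minimal sketch in Python of
how I understand the situation described in the quote. It is only an
illustration under assumptions: the names <code>TextRun</code>,
<code>naive_copy</code>, <code>heuristic_copy</code> and
<code>gap_ratio</code>, as well as all coordinates and thresholds, are
made up, and this is not how XEP or any particular viewer is known to
work. Each run stands for one string placed at its own starting
coordinates, with no space character at the run boundary.<br>
<br>
<pre>
```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """One string as it might be positioned on the page (hypothetical model)."""
    text: str
    x: float          # left edge of the run, in points
    y: float          # baseline of the run, in points
    width: float      # horizontal advance of the run, in points
    font_size: float  # font size, in points


def naive_copy(runs: list[TextRun]) -> str:
    """Concatenate the runs exactly as stored, ignoring their positions."""
    return "".join(run.text for run in runs)


def heuristic_copy(runs: list[TextRun], gap_ratio: float = 0.2) -> str:
    """Insert a space where the positions suggest a word boundary.

    A space is added between two runs when the baseline changes (new line)
    or when the horizontal gap exceeds gap_ratio times the font size,
    unless one of the runs already carries a space at the boundary.
    """
    parts = []
    prev = None
    for run in runs:
        if prev is not None:
            new_line = abs(run.y - prev.y) > 0.5 * prev.font_size
            gap = run.x - (prev.x + prev.width)
            wide_gap = gap > gap_ratio * prev.font_size
            spaced = prev.text.endswith(" ") or run.text.startswith(" ")
            if (new_line or wide_gap) and not spaced:
                parts.append(" ")
        parts.append(run.text)
        prev = run
    return "".join(parts)


# Example 1: two lines, no space character at the line end.
example1 = [
    TextRun("word1 word2", 72.0, 700.0, 60.0, 10.0),
    TextRun("word3 word4", 72.0, 688.0, 60.0, 10.0),
]

# Example 2: an italic word followed by normal text on the same line;
# the visual gap exists only as a 3 pt offset between the two runs.
example2 = [
    TextRun("word1", 72.0, 700.0, 28.0, 10.0),
    TextRun("word2 word3", 103.0, 700.0, 60.0, 10.0),
]

print(naive_copy(example1))
print(heuristic_copy(example1))
print(naive_copy(example2))
print(heuristic_copy(example2))
```
</pre>
With these made-up numbers, <code>naive_copy</code> reproduces the
broken output from both examples, while <code>heuristic_copy</code>
restores the spaces. If this model is roughly right, the result of a
copy depends on whether the viewer applies such a heuristic, unless the
PDF itself contains the space characters.<br>
<br>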
Thanks!<br>
Armin<br>
<br>
<br>
<pre class="moz-signature" cols="72">--
Dr. Armin Günther
Information Technology
Leibniz Institute for Psychology Information (ZPID)
54286 Trier, Germany
Phone: +49(0)651-201-2055
Fax: +49(0)651-201-2604
E-Mail: <a class="moz-txt-link-abbreviated" href="mailto:guenther@zpid.de">guenther@zpid.de</a>
<a class="moz-txt-link-abbreviated" href="http://www.zpid.de/en">www.zpid.de/en</a></pre>
</body>
</html>