[xep-support] Encoding spaces at line ends and between words of different style

Armin Günther guenther at zpid.de
Tue May 3 08:28:51 PDT 2016


Hi all,

When text is copied from a PDF generated by XEP, for example by an 
annotation tool like Hypothes.is (or when we simply want to copy text 
from the PDF into another document), spaces between words on adjacent 
lines or between words of different font styles (e.g. italics/normal) 
get lost.

Example 1:

PDF (Source):
word1 word2
word3 word4

Copy:
word1 word2word3 word4

Example 2:

PDF (Source):
*word1* word2 word3

Copy:
word1word2 word3

Is this problem related to the way spaces are encoded within PDFs, and 
is there a way to generate PDFs with XEP that avoids it? I found a post 
on an Adobe forum saying that the problem is related to the way the 
rendering engine generates strings:

> The problem is that the PDF may or may not have the apparent spaces 
> encoded as space characters, particularly at line ends, but also 
> between words and perhaps even between characters. The rendering 
> engine (Ps or PDF driver) may have chosen to break "/word1 word2/" 
> into two strings with two starting coordinates and no U+0020 space 
> character (or alternative space characters) at all.
Source: https://forums.adobe.com/thread/1367541

Is this correct - are the spaces lost because they are not encoded at 
all and the words are separated just by different starting coordinates? 
If so, can this be avoided by encoding spaces when the PDF is generated? 
My impression is, however, that the PDF viewer might be a source of the 
problem (and not the PDF).
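
To make the question concrete, here is a minimal Python sketch of what I 
assume happens on the extraction side. It is not how XEP or any particular 
viewer actually works: the Run class, the coordinates, and the space_width 
and line_tol thresholds are all invented for illustration. It models the 
page as positioned strings without a space between them, then compares a 
naive join with a join that infers spaces from the gaps.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A hypothetical positioned string, as a PDF might store it."""
    text: str
    x: float      # left edge of the string
    y: float      # baseline position
    width: float  # total advance width of the string


def naive_extract(runs):
    # Concatenate strings exactly as stored; no space is ever inferred.
    return "".join(r.text for r in runs)


def gap_aware_extract(runs, space_width=3.0, line_tol=2.0):
    # Insert a space when the next string starts on a new line or when
    # the horizontal gap is at least half an (assumed) space width.
    out = []
    prev = None
    for r in runs:
        if prev is not None:
            new_line = abs(r.y - prev.y) > line_tol
            gap = r.x - (prev.x + prev.width)
            has_space = prev.text.endswith(" ") or r.text.startswith(" ")
            if not has_space and (new_line or gap >= 0.5 * space_width):
                out.append(" ")
        out.append(r.text)
        prev = r
    return "".join(out)


# Example 1: two lines, no space character at the line end.
example1 = [
    Run("word1 word2", 72.0, 700.0, 60.0),
    Run("word3 word4", 72.0, 686.0, 60.0),
]

# Example 2: italic word and normal words stored as separate strings.
example2 = [
    Run("word1", 72.0, 700.0, 28.0),
    Run("word2 word3", 103.0, 700.0, 60.0),
]

for runs in (example1, example2):
    print(repr(naive_extract(runs)), "->", repr(gap_aware_extract(runs)))
```

With these made-up numbers the naive join reproduces both copy results 
from my examples, while the gap-aware join restores the missing spaces. 
If viewers differ in whether they apply such a heuristic, that would 
explain why the viewer looks like a source of the problem.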

Thanks!
Armin


-- 
Dr. Armin Günther

Information Technology
Leibniz Institute for Psychology Information (ZPID)
54286 Trier, Germany

Phone: +49(0)651-201-2055
Fax: +49(0)651-201-2604
E-Mail: guenther at zpid.de
www.zpid.de/en
