[xep-support] Re: Problem rendering a tagged PDF

Kevin Brown kevin at renderx.com
Thu Nov 1 11:33:43 PDT 2012


I thought it best to post some information here on the new Section
508/tagged PDF capabilities.

 

Now, the first thing to keep in mind with Section 508/tagged PDF support is
that we are generating the PDF from the XEP Intermediate Format (XEPOUT). This
means that all of these attributes really only have direct meaning *if* they
are placed on the object that has the content. If you placed them on
objects without content, there would be no <xep:text> element in the
XEPOUT to carry the attribute. Take, for instance, this example:

 

<fo:table-cell rx:pdf-structure-tag="TH"><fo:block>I am a table
header</fo:block></fo:table-cell>

 

The <table-cell> has no textual content in the XEPOUT; the <block> inside it
does. Thus, the use of @rx:pdf-structure-tag in this instance would not yield
what you want. You will only hit this in a few places. Most commonly you
would be attempting that very example, to override a table-cell to a
table header. Another would be marking "chunks" of things as artifacts
(omitted from the structure). Since these attributes are not inherited and
must be on the textual content elements, for artifacts you will need to mark
all the objects that lead to textual content as such. We'll get to some
examples later.
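
As a minimal sketch of the table-header case above (the namespace
declarations are included only to keep the fragment self-contained; normally
they sit on <fo:root>), the attribute moves from the cell to the block that
actually carries the text:

```xml
<fo:table-cell xmlns:fo="http://www.w3.org/1999/XSL/Format"
               xmlns:rx="http://www.renderx.com/XSL/Extensions">
  <!-- The block holds the text, so the block carries the attribute. -->
  <fo:block rx:pdf-structure-tag="TH">I am a table header</fo:block>
</fo:table-cell>
```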

 

The only real exception to this is the role-map that was mentioned in the
first emails to xep-support. This is new to XEP for this release. It
provides two basic functions:

 

1)      Specifying what tags you would like for common structures in the
PDF. These will be the actual tags used inside the PDF (unless very specific
structures are overridden using the @rx:pdf-structure-tag attribute).

2)      Specifying what tags you would like omitted from the PDF.

 

This second item is very important. XSL FO tagging is by nature very verbose
and is used both for structure and for presentation. Proper PDF tagging for
screen readers is for structure only and should not contain unneeded
tags/hierarchy.  A portion of a rolemap might look like the one below (note:
this is but one example, reflecting the tagging desired by one customer. Most
notably, they wanted only the items in lists and no labels, and this example
does that very thing).

 

This portion is mapping the common fo structures to tags in the PDF
structure. Thus, the <root> becomes <Document> in PDF tags, <block> becomes
<P> and so on. You would also note that there are a few structures here that
are contextual combinations, namely "header-table-cell", "body-table-cell".
They have obvious meaning, a "header-table-cell" is a <table-cell> whose
ancestor is <table-header>.

 

  <structure-elements>

      <structure-element name="root" role-mapping="Document"/>

      <structure-element name="block" role-mapping="P"/>

      <structure-element name="page-number" role-mapping="Artifact"/>

      <structure-element name="page-number-citation"
role-mapping="PageNumberCitation"/>

      <structure-element name="page-number-citation-last"
role-mapping="PageNumberCitationLast"/>

      <structure-element name="external-graphic"
role-mapping="ExternalGraphic"/>

      <structure-element name="instream-foreign-object"
role-mapping="InstreamForeignObject"/>

      <structure-element name="basic-link" role-mapping="Link"/>

      <structure-element name="footnote" role-mapping="Footnote"/>

      <structure-element name="footnote-label"
role-mapping="FootnoteLabel"/>

      <structure-element name="footnote-body" role-mapping="FootnoteBody"/>

      <structure-element name="list-block" role-mapping="L"/>

      <structure-element name="list-item" role-mapping="LI"/>

      <structure-element name="table" role-mapping="T"/>

      <structure-element name="table-row" role-mapping="TR"/>

      <structure-element name="header-table-cell" role-mapping="TH"/>

      <structure-element name="body-table-cell" role-mapping="TD"/>

      <structure-element name="footer-table-cell" role-mapping="TD"/>

      <structure-element name="ruler" role-mapping="Ruler"/>   

  </structure-elements>

 

This section is the rollup section, which means these tags are completely
omitted from the structure.  The @role-mapping here is really meaningless,
but we leave it in to be able to copy from this area to the previous area if
we want to include/exclude tags. So, given the section below, tags in FO like
<flow>, <flow-section>, <block-container> will have no structure in the PDF
at all, their children "rolled up" to the parent structure.

 

  <structure-elements roll-up="true">

      <structure-element name="leader" role-mapping="Artifact"/>

      <structure-element name="list-item-label" role-mapping="P"/>

      <structure-element name="list-item-body" role-mapping="P"/>

      <structure-element name="page-sequence" role-mapping="Sect"/>

      <structure-element name="flow" role-mapping="Flow"/>

      <structure-element name="flow-section" role-mapping="Flow"/>

      <structure-element name="static-content"
role-mapping="StaticContent"/>

      <structure-element name="block-container"
role-mapping="BlockContainer"/>

      <structure-element name="inline" role-mapping="Span"/>

      <structure-element name="wrapper" role-mapping="Wrapper"/>

      <structure-element name="table-header" role-mapping="TableHeader"/>

      <structure-element name="table-body" role-mapping="TableBody"/>

      <structure-element name="table-footer" role-mapping="TableFooter"/>

  </structure-elements>
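
To make the roll-up concrete, here is a small sketch of a complete FO
document (the page master name and the text are made up for illustration).
Under the two rolemap sections above, one would expect the structure tree to
be roughly a <Document> containing two <P> elements, with no tags for
<page-sequence>, <flow> or <block-container>. That expected result is an
inference from the description above, not verified output.

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
                           page-width="8.5in" page-height="11in">
      <fo:region-body margin="1in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- page-sequence, flow and block-container are in the roll-up list -->
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block-container>
        <!-- block is mapped to P in the first rolemap section -->
        <fo:block>First paragraph.</fo:block>
        <fo:block>Second paragraph.</fo:block>
      </fo:block-container>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```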

 

So, to set up your environment for tagging PDF, the first step is to decide
what tags you want in the PDF and adjust your rolemap appropriately.

 

Next, let's cover @rx:pdf-structure-tag and the special case of artifacts.

 

@rx:pdf-structure-tag provides a way to specify a very specific tag for
content. The most common use is for marking the heading structure in tagged
PDF. Tagged PDF normally has heading structures (like H1...H9) so that screen
readers can navigate the PDF. There are no direct equivalents to these
in FO, nor could one use the rolemap to say "all of something would be an H1",
for example.  Remembering that structures with direct textual content are the
targets for these attributes, one would normally put
@rx:pdf-structure-tag on the <block> element. For example:

 

<fo:block rx:pdf-structure-tag="H1">I am Section Heading 1</fo:block>

 

Would lead to:

 

<H1>I am Section Heading 1</H1> 

 

In the Tagged PDF.

 

And because the attribute applies only to direct content items, be careful in
how you create structures. For example:

 

<fo:block rx:pdf-structure-tag="H1"><fo:inline font-weight="bold">I am
Section Heading 1</fo:inline></fo:block>

 

Will not yield what you think, because the <block> has no content. The
<inline> tag is what yields the textual content. Moving the
@rx:pdf-structure-tag to the <inline> would not be the right solution either,
because the tagged PDF would be <P><H1>I am Section Heading 1</H1></P>. So you
would need to move the @font-weight to the <block> tag, or at least ensure
some non-space textual content exists inside the <fo:block>, for this to work.
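
A sketch of the first fix described above, with @font-weight moved onto the
<block> so that the block itself holds the text (namespace declarations are
included only to keep the fragment self-contained):

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          xmlns:rx="http://www.renderx.com/XSL/Extensions"
          rx:pdf-structure-tag="H1"
          font-weight="bold">I am Section Heading 1</fo:block>
```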

 

Now, there are two special cases of @rx:pdf-structure-tag. First we will
cover "Artifact". An artifact in PDF structure tags means that the content
is to be omitted completely. This is typically used for content like headers
and footers. Using @rx:pdf-structure-tag="Artifact" does just that.

The other special case we created to allow PDF tagging of table headers
*within* the table-body. PDF structure tagging allows any cell to be called a
TH. Normally, these are at the top of the table. In XSL FO, this is easy
because they are the <table-cell> elements that are descendants of
<table-header>.  But PDF structure tagging also allows something like the
left column to be table-cells marked as TH, specifically row headers. There
is no FO construct for this. To overcome this, when producing the PDF output,
RenderX will look at the child elements of any table-cell. If any child has
@rx:pdf-structure-tag="TH", then it will mark that cell as a TH in the
structure. If the table cell occurs in the table-body, it is marked with
scope="row"; if it occurs in the table-header, it is marked with scope="both"
(because header cells without the attribute are already marked with
scope="column"). This allows you to create TH cells in a TR structure that
can be row headers or column headers to meet complete Section 508
requirements.
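
A small sketch of a row header following the rule just described. The table
content is invented for illustration. The header cells need no attribute,
since the rolemap already maps "header-table-cell" to TH; only the first cell
of the body row is marked, through its child block.

```xml
<fo:table xmlns:fo="http://www.w3.org/1999/XSL/Format"
          xmlns:rx="http://www.renderx.com/XSL/Extensions">
  <fo:table-header>
    <fo:table-row>
      <!-- column headers: TH via the rolemap, no attribute needed -->
      <fo:table-cell><fo:block>Region</fo:block></fo:table-cell>
      <fo:table-cell><fo:block>Sales</fo:block></fo:table-cell>
    </fo:table-row>
  </fo:table-header>
  <fo:table-body>
    <fo:table-row>
      <!-- row header: the child block marks its cell as TH, scope="row" -->
      <fo:table-cell>
        <fo:block rx:pdf-structure-tag="TH">North</fo:block>
      </fo:table-cell>
      <fo:table-cell><fo:block>100</fo:block></fo:table-cell>
    </fo:table-row>
  </fo:table-body>
</fo:table>
```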

 

The three attributes for marking content to be read differently (or to
provide content to be read where there is none, as on an image) are fairly
straightforward. These are @rx:alt-description, @rx:actual-text and
@rx:abbreviation. One would use any one of them like this:

 

<fo:external-graphic src="url('logo.jpg')" rx:alt-description="An image of the
logo of the company."/>
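
The other two attributes follow the same pattern. The sketch below is an
assumption about typical usage rather than confirmed XEP behavior: it takes
@rx:actual-text to supply replacement text for what is printed, and
@rx:abbreviation to supply the expanded form of an abbreviation, as the
corresponding PDF structure entries do. The text values are invented.

```xml
<fo:block-container xmlns:fo="http://www.w3.org/1999/XSL/Format"
                    xmlns:rx="http://www.renderx.com/XSL/Extensions">
  <!-- assumed: the reader speaks the attribute value instead of the glyphs -->
  <fo:block rx:actual-text="Telephone 555-0100">Tel. 555-0100</fo:block>
  <!-- assumed: the attribute carries the expansion of the abbreviation -->
  <fo:block rx:abbreviation="Portable Document Format">PDF</fo:block>
</fo:block-container>
```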

 

The other two attributes, @rx:pdf-artifact-type and @rx:pdf-artifact-subtype,
allow further classification of Artifacts and are placed on the same element
as rx:pdf-structure-tag="Artifact" if required.
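
As a sketch of marking a "chunk" as an artifact, every block that carries
text in a page header gets the attribute, since it is not inherited. The
"Pagination" and "Header" values are taken from the artifact types in the PDF
specification; that XEP accepts exactly these strings is an assumption, and
the header text is invented.

```xml
<fo:static-content xmlns:fo="http://www.w3.org/1999/XSL/Format"
                   xmlns:rx="http://www.renderx.com/XSL/Extensions"
                   flow-name="xsl-region-before">
  <!-- each text-bearing block is marked individually -->
  <fo:block rx:pdf-structure-tag="Artifact"
            rx:pdf-artifact-type="Pagination"
            rx:pdf-artifact-subtype="Header">Annual Report 2012</fo:block>
  <fo:block rx:pdf-structure-tag="Artifact"
            rx:pdf-artifact-type="Pagination"
            rx:pdf-artifact-subtype="Header">Draft</fo:block>
</fo:static-content>
```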

 

Our services team worked extensively on a few recent projects for the US
Government to assist them to create XSL FO that would produce PDFs that meet
their needs and agency requirements. We are happy to assist anyone who has
similar needs, just ask us directly.

 

Kevin Brown

RenderX

 

 

From: Daniel Boughton [mailto:daniel.r.boughton at rrd.com] 
Sent: Thursday, November 01, 2012 7:39 AM
To: kevin at renderx.com
Cc: Charles B Porter
Subject: Re: [xep-support] Re: Problem rendering a tagged PDF

 

Hi Kevin,

 

I was wondering if you could also provide some documentation on the new
extensions for tagged documents.  You did send Charles Porter (another
member of our team) a sample SBC document showing the use of
rx:alt-description and rx:pdf-structure-tag, but I still have questions on
the use of the other extensions: rx:pdf-structure-type, rx:alt-description,
rx:actual-text, rx:abbreviation, rx:pdf-artifact-type and
rx:pdf-artifact-subtype. The section on Accessibility Support in the
documentation is not very helpful.

 

Is there any difference between using the attribute role and
rx:alt-description?

 

Thanks,

____________________________________________________________________________
______
Daniel Boughton | Technical Analyst IV | RR Donnelley
W6545 Quality Dr | Greenville, WI 54942 
Office: 920.997.3635 | Mobile: 920.450.3581 | Fax: 920.997.3754
 <http://inside.rrd.net/insiderrd/pages/google/rrd_google_signature.aspx>
daniel.r.boughton at rrd.com
 <http://www.rrdonnelley.com/> http://www.rrdonnelley.com





On Wed, Oct 31, 2012 at 4:25 PM, Kevin Brown <kevin at renderx.com> wrote:

Daniel:

 

I spoke with the developer and yes this change was made at the release
(moving from an inclusion to using an <option>). So you are correct that it
is an option and he also confirms the bug that relative paths do not work.

 

We will patch *very fast* and get you a new version of software to test.

 

Kevin Brown

RenderX

 

From: xep-support-bounces at renderx.com
[mailto:xep-support-bounces at renderx.com] On Behalf Of Kevin Brown
Sent: Wednesday, October 31, 2012 1:53 PM
To: 'RenderX Community Support List'; daniel.r.boughton at rrd.com
Subject: [xep-support] Re: Problem rendering a tagged PDF

 

I am confused at best. I guess I need to examine an installation but I did
not think ROLE_MAP was an option. It is implemented just like any section of
xep.xml which can be either internal or external. Meaning that xep.xml
should have:

 

<role-map href="rolemap.xml"/>

 

Kevin Brown

 

 

From: xep-support-bounces at renderx.com
[mailto:xep-support-bounces at renderx.com] On Behalf Of Daniel Boughton
Sent: Wednesday, October 31, 2012 8:23 AM
To: xep-support at renderx.com
Subject: [xep-support] Problem rendering a tagged PDF

 

I ran into a problem testing the new version of XEP (version 4.21) when
trying to generate a tagged PDF.  I get two different errors depending on
the environment.  

 

Running under the server configuration I get the error:

    "PDFException - Pdflib internal error. Broken xref. Please report to
manufacturer!.".  

 

Running under Oxygen I get the warning:

   "com.renderx.util.RoleMapConfigurationException: Cannot parse
rolemap.xml: java.net.MalformedURLException: Invalid URL or non-existent
file: rolemap.xml; PDF accessibility turned off!"

 

I found that if I change the ROLE_MAP option to use an absolute path, <option
name="ROLE_MAP" value="C:\Program Files\RenderX\XEP\rolemap.xml"/>, it
works.

 

I am assuming that this is a bug because fonts use relative paths and so
does the SPOT_COLOR_TRANSLATION_TABLE option.  It's not a problem using the
absolute path running under Oxygen, but it would be a problem trying to
handle that in our server configuration.



 
