[xep-support] Indexing again

Gustaf Liljegren gustaf.liljegren at xml.se
Tue Jan 15 09:11:05 PST 2002


I'm working on an index solution for a book. Nikolai once advised me
to use XEP's own XML output to collect the page numbers generated by the
<fo:page-number-citation> elements. The reason I want to do this is that
an index can come with so many requirements that I'd prefer to just
get the page numbers out and build the index pages in isolation from
the rest of the book.

Before I start implementing this, I'd like to ask for more
advice on how to do it. The current plan is something like this:

1. The XML file (the book) contains words/phrases marked up with <index>
tags. The stylesheet has a page-sequence for the index, where index items
are collected (word + page). There is no particular styling on these pages
since they will be replaced later, and no sorting is needed at
this stage.

2. Generate the FO file. The index pages now contain unique identifiers
(not page numbers, as I once assumed before thinking it through):

<fo:page-number-citation ref-id="d0e1076"/>

3. Run XEP in PDF mode to get the finished book, except for the final index
pages.

4. Run XEP in XML mode to get the words + page numbers in readable XML
markup.

5. Write a SAX filter to sort out the index data, and collect it in a new
XML file (called the "index file").

6. Write an FO stylesheet to generate the index pages from the index file.
Styling and sorting are added here.

7. Replace the index pages in the PDF.
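
To make step 5 a little more concrete, here is a minimal sketch of such a
filter. It uses Python's xml.sax rather than Java, only for brevity. Note that
the input format is an assumption made for illustration, not XEP's actual XML
output: I assume each piece of text appears as a <text value="..."/> element,
and that the index page-sequence writes every item as "IDX|term|page" so that
index lines are easy to tell apart from body text. The names IndexCollector,
collect_index, write_index_file and the <index>/<item> layout of the index file
are hypothetical as well.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Hypothetical marker that the index page-sequence would put in front of
# every index item, so the filter can tell index lines from body text.
MARKER = "IDX|"


class IndexCollector(xml.sax.ContentHandler):
    """Collects (term, page) pairs from marked <text value="..."/> elements.

    The element name "text" and attribute name "value" are assumptions,
    not the real XEP output vocabulary.
    """

    def __init__(self):
        super().__init__()
        self.entries = []

    def startElement(self, name, attrs):
        if name != "text":
            return
        value = attrs.get("value", "")
        if not value.startswith(MARKER):
            return
        # Split on the last "|" so the term itself may contain a bar.
        parts = value[len(MARKER):].rsplit("|", 1)
        if len(parts) != 2:
            return  # malformed item, skip it
        term, page = parts
        self.entries.append((term.strip(), page.strip()))


def collect_index(formatter_xml):
    """Parse the formatter's XML output and return the unsorted entries."""
    handler = IndexCollector()
    xml.sax.parseString(formatter_xml.encode("utf-8"), handler)
    return handler.entries


def write_index_file(entries):
    """Serialize the entries as a small "index file" (hypothetical layout)."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("index", {})
    for term, page in entries:
        gen.startElement("item", {"page": page})
        gen.characters(term)
        gen.endElement("item")
    gen.endElement("index")
    gen.endDocument()
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '<document><page>'
        '<text value="Chapter 1"/>'
        '<text value="IDX|apple|12"/>'
        '<text value="IDX|pear tree|40"/>'
        '</page></document>'
    )
    print(write_index_file(collect_index(sample)))
```

Sorting and merging of page numbers are deliberately left out here, since they
belong to step 6.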

I know it's possible to build your own XEP formatter just to collect this
kind of information, but that would take me too long this time, since I'm
not experienced in Java or the XEP architecture. I have therefore
chosen to build this filtering routine on the common XML output.

Things I'm pondering now: Can the process be streamlined to make it
more efficient? How should the page-sequence for the index pages be written
to make filtering as easy and reliable as possible? Is step 5 a task for
XSLT rather than SAX? Any comments or ideas are greatly appreciated.

Gustaf

