[xep-support] Splitting of large document

Jost Klopfstein jost at axostech.com
Fri Jun 3 11:30:03 PDT 2005


Brian,

You can extract almost everything from the XEP intermediate format (the one exception may be SVG drawings, which appear to be stored in an encoded form).

It is a flat XML format, organized in pages, with x and y coordinates for every element.
For details, see http://www.renderx.com/reference.html#appendix_C

This approach works fine if you have a linear workflow:
1) process section 1
2) process section 2 with starting page information from previous step
3) process next sections
...
n) extract necessary information for indexes and TOC from the XEP intermediate format files (based on rx:pinpoint data)
n+1) build TOC
n+2) build indexes
n+3) assemble book
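
Step n) could look roughly like the Python sketch below. It is only an illustration: the namespace URI, the xep:page and xep:pinpoint element names and the "value" attribute are assumptions based on my reading of the reference, and the sample document is hand-written, so check all of them against your own intermediate files. The helper name collect_pinpoints and the page_offset parameter are made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the XEP intermediate format; verify against your files.
XEP_NS = "http://www.renderx.com/XEP/xep"


def collect_pinpoints(xep_xml, page_offset=0):
    """Return (value, page_number) pairs for every pinpoint in one section.

    page_offset is the number of pages produced by all previous sections.
    """
    root = ET.fromstring(xep_xml)
    entries = []
    # Pages are counted in document order rather than read from an attribute.
    for index, page in enumerate(root.iter(f"{{{XEP_NS}}}page"), start=1):
        for pin in page.iter(f"{{{XEP_NS}}}pinpoint"):
            entries.append((pin.get("value"), page_offset + index))
    return entries


# Hand-written stand-in for one rendered section (not real XEP output).
sample = """<xep:document xmlns:xep="http://www.renderx.com/XEP/xep">
  <xep:page width="1000" height="1000">
    <xep:pinpoint x="10" y="900" value="chapter-1"/>
  </xep:page>
  <xep:page width="1000" height="1000">
    <xep:pinpoint x="10" y="500" value="widget-a"/>
  </xep:page>
</xep:document>"""

# This section starts after 40 pages of earlier sections.
toc = collect_pinpoints(sample, page_offset=40)
print(toc)
```

Running this once per section, with page_offset set to the running page total, gives the raw data for the TOC and index steps.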


This concept fails with pointers (references/links) between sections, as you may have to change the pagination of a previous section while rendering a new section.

Cheers, Jost

Jost Klopfstein
*Axos Technologies Inc.*
OnDemand & Transactional Document Solutions, powered by XML
IT Consulting

*604 628-2248  Phone*
604-324-2380  Fax
jost (at) axostech.com
http://www.axostech.com


  ----- Original Message ----- 
  From: Brian J. Butler 
  To: xep-support at renderx.com 
  Sent: Friday, June 03, 2005 10:53 AM
  Subject: Re: [xep-support] Splitting of large document


  The biggest memory consumer seems to be the process that compresses the PDF document.  I find that our document requires more than 3GB to process with this option enabled and less than 1.5GB without it.

  Try turning this option off in your xep.xml file by uncommenting the COMPRESS line, as shown below:

      <!-- Backend options -->
      <generator-options format="PDF">
        <option name="COMPRESS" value="false"/>
        <!-- <option name="PDF_VERSION" value="1.3"/> -->
      </generator-options>
        
  By the way, I have a problem similar to yours. Our 2200 page book is only one part of a larger catalog.  Unfortunately, the other sections contain references to products in the main section.  I had posted some messages here earlier to see if there is a way to write page numbers and other data into the XEP log. I thought I would extract the data with an editor and put it in a database, using it later to look up page references for the other sections. I don't think this is possible, but someone suggested looking at the XEP intermediate format.  I have not done this yet.  If you look at it and can distill the information I would like to know how to do it.  If I try it first, I will post here.
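
The database lookup described above might be sketched like this in Python, assuming the page numbers have already been extracted somehow. The table name page_ref, the helper names and the product ids are all hypothetical and only serve the illustration; an in-memory SQLite database stands in for whatever database would really be used.

```python
import sqlite3


def build_page_index(entries):
    """Store (ref_id, page) pairs in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_ref (ref_id TEXT PRIMARY KEY, page INTEGER)")
    conn.executemany("INSERT INTO page_ref VALUES (?, ?)", entries)
    conn.commit()
    return conn


def lookup_page(conn, ref_id):
    """Return the page for ref_id, or None if the id is unknown."""
    row = conn.execute(
        "SELECT page FROM page_ref WHERE ref_id = ?", (ref_id,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical data: product ids from the main section and their pages.
conn = build_page_index([("product-1001", 57), ("product-1002", 58)])
print(lookup_page(conn, "product-1001"))
```

The other sections would then call lookup_page while their FO is generated, so the page references are plain text by the time XEP renders them.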

  BJB

  Jost Klopfstein wrote: 
Mike, Brian,

Thanks for your hints.

The document is in the 1000+ pages class with plenty of large SVG drawings
(>500 Kbytes).

I will check with my customer if I can drop the references between chapters.
If so, then I can use the split technology and build the TOC and Indexes
from the separate sections in XEP intermediate format.
Otherwise they have to buy a bigger server...

Does anyone know if there is an option to prevent XEP from processing
the whole document in memory?

Cheers, Jost

----- Original Message ----- 
From: "Mike Trotman" <mike.trotman at datalucid.com>
To: <xep-support at renderx.com>
Sent: Friday, June 03, 2005 3:05 AM
Subject: Re: [xep-support] Splitting of large document


  I have successfully processed 100 MB+ documents of 1000+ pages - mainly
consisting of heavily formatted tables with 15 x 20 cells per page,
multiple pages per table, lots of data per cell, footnotes etc.
This included bookmarks and a simple Table Of Contents with internal
links to individual tables.

By placing each table / document chunk within a separate
<fo:page-sequence> I was able to keep the memory requirements very low
(not much more than the default).
I'm now also using XSLT pre-processing where I produce each
<fo:page-sequence> in a separate XSL-FO file and generate  a master
processing document which sets up regions and page masters
and contains a list of the separate <fo:page-sequence> files to include.
I then process this master list with a simple XSLT to produce the final
FO for output to PDF.
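
The assembly step described above is done with XSLT in the original setup. The sketch below shows the same idea in Python instead, purely as an illustration: it takes the page masters and the separate fo:page-sequence chunks as strings (the real setup reads them from files named in a master list) and wraps them in one fo:root. The function name assemble_fo and the sample content are made up for this example.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # keep the usual fo: prefix on output


def assemble_fo(master_set_xml, sequence_xmls):
    """Build one fo:root from a layout-master-set and page-sequence chunks."""
    root = ET.Element(f"{{{FO_NS}}}root")
    root.append(ET.fromstring(master_set_xml))
    for xml_text in sequence_xmls:
        seq = ET.fromstring(xml_text)
        if seq.tag != f"{{{FO_NS}}}page-sequence":
            raise ValueError(f"expected fo:page-sequence, got {seq.tag}")
        root.append(seq)
    return root


# Minimal stand-ins for the page masters and two separately generated chunks.
master = (
    f'<fo:layout-master-set xmlns:fo="{FO_NS}">'
    '<fo:simple-page-master master-name="page"><fo:region-body/>'
    "</fo:simple-page-master></fo:layout-master-set>"
)
chunks = [
    f'<fo:page-sequence xmlns:fo="{FO_NS}" master-reference="page">'
    f'<fo:flow flow-name="xsl-region-body"><fo:block>Table {n}</fo:block>'
    "</fo:flow></fo:page-sequence>"
    for n in (1, 2)
]

fo_root = assemble_fo(master, chunks)
print(ET.tostring(fo_root, encoding="unicode"))
```

Keeping one fo:page-sequence per chunk is the point of the technique; the assembly itself stays trivial whichever tool does it.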

I haven't used indexes (the TOC references etc. are constructed by the
XSLT) - so don't know what sort of overhead this produces.


Mike

Brian J. Butler wrote:

    I have also been working on a very large document (88MB FO file, 2200
pages of technical text and drawings). I can offer the following three
suggestions:

1. Make sure your Java -Xmx size is as large as possible.  With
Windows this will be approximately -Xmx1600m.
2. Use the XEP flag to turn off PDF compression (in xep.xml or command
line).  This will result in a very large PDF, but you can compress it
after rendering by opening it in Adobe Acrobat and then saving.
3. Switch to a 64-bit Solaris platform (Opteron processors).  We
benchmarked one of these machines and found that we can set -Xmx to
an almost unlimited amount of memory.  It is also very fast.

BJB

Jost Klopfstein wrote:

      Hi,

I ran into memory problems while rendering a large book with TOC,
indexes and references between sections.
I first thought I could just render section by section into XEP
intermediate format and then assemble the pieces with some custom
code into a large PDF using the PDF output generator.
However, I will lose the TOC, indexes and the references between
sections.

Any ideas?

Thanks,
Jost

        -- 
Brian J. Butler
BJB Software, Inc.
76 Bayberry Lane
Holliston, MA 01746

E-mail: bjbutler at bjbsoftware.com
Web:    http://www.bjbsoftware.com
Phone:  508-429-1441
Fax:    419-710-1867



      

-------------------
(*) To unsubscribe, send a message with words 'unsubscribe xep-support'
in the body of the message to majordomo at renderx.com from the address
you are subscribed from.
(*) By using the Service, you expressly agree to these Terms of Service
    http://www.renderx.com/terms-of-service.html
      