PDF2TEXT.COM 

PDF XML Converter
Developer Edition V4.0 (COM)

PDF XML Converter (P2X) extracts the text information from a PDF file and writes it to an XML file. All functions are encapsulated in a COM component. The exposed methods and interfaces are the same as those of PDF Plain Text Extractor (P2T), but the output file is in XML format. Please see PDF Plain Text Extractor (P2T) Server Edition (COM) for detailed technical information. You can integrate the component into your own application and redistribute it royalty-free.

The output XML format is defined in PDFDocument.xsd.

Output XML sample


<?xml version="1.0" encoding="UTF-8"?>
<PDFDocument>
  <PDFInfo>
      <Title><![CDATA[ PDF Reference ]]></Title>
      <Subject><![CDATA[PDF Reference 1.4]]></Subject>
      <Author><![CDATA[Smith.H]]></Author>
      <Creator><![CDATA[PDF Writer]]></Creator>
      <Producer><![CDATA[Adobe Acrobat]]></Producer>
      <CreateDate><![CDATA[2002/06/15]]></CreateDate>
      <KeyWords><![CDATA[PDF Reference]]></KeyWords>
  </PDFInfo>
  <Pages>
    <Page>
      <PageNumber>1</PageNumber>
      <PDFElement>
          <Coordinate_X>12</Coordinate_X>
          <Coordinate_Y>34</Coordinate_Y>
          <DataString>
<![CDATA[
Hello, this is a data chunk with
special chars "~@@^%^$(^#\''"'and
line break.CDATA will deal with
this kind of data perfectly.
]]>
          </DataString>
      </PDFElement>
      .
      .
      .
    </Page>
    .
    .
    .
  </Pages>
</PDFDocument>
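
As a rough illustration of how an application might consume this output, here is a minimal Python sketch that reads the document info and the positioned text chunks with the standard library. The element names are taken from the sample above. The function name read_p2x, the trimmed SAMPLE string, and the choice to treat coordinates as floating-point numbers are assumptions made for this example, not part of the P2X component. PDFDocument.xsd remains the authoritative description of the format.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the sample output (assumption: real P2X
# output follows the same element names; check PDFDocument.xsd).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<PDFDocument>
  <PDFInfo>
    <Title><![CDATA[ PDF Reference ]]></Title>
    <Author><![CDATA[Smith.H]]></Author>
  </PDFInfo>
  <Pages>
    <Page>
      <PageNumber>1</PageNumber>
      <PDFElement>
        <Coordinate_X>12</Coordinate_X>
        <Coordinate_Y>34</Coordinate_Y>
        <DataString><![CDATA[
Hello, this is a data chunk
with a line break.
]]></DataString>
      </PDFElement>
    </Page>
  </Pages>
</PDFDocument>
"""


def read_p2x(xml_bytes):
    """Hypothetical helper: return (info, pages) from P2X-style XML bytes.

    info  -> dict mapping each PDFInfo child tag to its stripped text
    pages -> list of (page_number, [(x, y, text), ...])
    """
    root = ET.fromstring(xml_bytes)

    # The parser hands back CDATA sections as ordinary element text.
    info_node = root.find("PDFInfo")
    info = {}
    if info_node is not None:
        info = {child.tag: (child.text or "").strip() for child in info_node}

    pages = []
    for page in root.iter("Page"):
        number = int(page.findtext("PageNumber"))
        elements = []
        for el in page.findall("PDFElement"):
            # Assumption: coordinates are numeric; float also accepts "12".
            x = float(el.findtext("Coordinate_X"))
            y = float(el.findtext("Coordinate_Y"))
            text = (el.findtext("DataString") or "").strip()
            elements.append((x, y, text))
        pages.append((number, elements))
    return info, pages


info, pages = read_p2x(SAMPLE.encode("utf-8"))
print(info.get("Title"))
for number, elements in pages:
    for x, y, text in elements:
        print(number, x, y, repr(text))
```

Because the text chunks are wrapped in CDATA, special characters and line breaks inside DataString come through unchanged. The sketch strips only leading and trailing whitespace.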


 © Copyright 2003 Powered by Retsina Software Solutions