PDF Reader

Related products: FME Form

Complementing the PDF writer (which is being unified from separate 2D/3D variants), this reader would read vector/raster features out of geospatial PDFs.
I would also use it to compare non-geospatial vector PDFs, and would like to use one of the PDF files read as a template over which additional content could be added.

This would make some of my projects a lot easier. For a certain client I have to design a map in vector and then deliver individual layers as PNG files (sometimes in excess of 20000 pixels per side). My current workflow is to write out individual PDFs from Illustrator and rasterize them in Photoshop. This often takes a lot of time and requires some manual steps on my part (open file, set size, save file). If I could just run it through FME, that would at the very least save me the manual work.
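
As a rough sketch of the "set size" step, the resolution needed to hit a target pixel size can be worked out from the page size, assuming the usual PDF convention of 72 points per inch. The function name and the A0 page dimensions below are illustrative assumptions, not something FME or Illustrator provides.

```python
POINTS_PER_INCH = 72.0  # default PDF user space unit: 1 point = 1/72 inch


def dpi_for_target_pixels(page_width_pt, page_height_pt, target_long_side_px):
    """Return the DPI at which the page's longer side renders to the target pixel count.

    Hypothetical helper for illustration only.
    """
    long_side_in = max(page_width_pt, page_height_pt) / POINTS_PER_INCH
    return target_long_side_px / long_side_in


# Example: an A0 page (about 2384 x 3370 points) rendered 20000 px on its long side
dpi = dpi_for_target_pixels(2384, 3370, 20000)
print(round(dpi, 1))  # prints 427.3
```

The resulting value could then be passed to whatever rasterizer is used, so the size no longer has to be set by hand for each file.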


Would be very useful. We get lots of site plans and data in PDFs; FME could save lots of time in digitising sites.


This would be very useful at the moment. We have upcoming requests to read PDFs, so we would be interested in having a reader to analyse geospatial PDFs.


I can confirm that we've been laying the groundwork for this. It won't be in 2016.1, but I'd be surprised if we didn't have some form of PDF reading by the end of calendar 2016. @ciarab can I ask you to send a couple of sample PDFs to support@safe.com so we can be sure your scenario is targeted?


I never realized I would need this, but true enough, I have projects that would require it. Thanks for the update @daleatsafe. If you need some more PDFs to try, let me know.


There is a new open source implementation in GDAL that reads PDF. More information here: http://blog.klokantech.com/2016/08/pdfium-geopdf-driver-in-gdal-21.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+klokan-blog-osgeo+%28Blog%3A+Klokan+Petr+P%C5%99idal%29 @fme_lizard @daleatsafe


When are we expecting this (PDF reader), if at all?

Thanks


We have been using A-PDF Data Extractor to extract data from PDFs. We use a SystemCaller to connect to the app. We hope to see a similar feature directly in FME, without the need for a third-party app.


At this moment I have no need for a PDF reader.

But I will vote for it as it might speed up the improvements for the PDF writer that I do need:

https://knowledge.safe.com/idea/38680/better-pdf-writer-support.html


Any notable progress on the PDF reader?


I tried reading text from a PDF file using a PythonCaller and the pdfminer library, and it went pretty well. Maybe useful as a start:

import fme
import fmeobjects
import chardet
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from cStringIO import StringIO  # Python 2 only

# Template Function interface:
# When using this function, make sure its name is set as the value of
# the 'Class or Function to Process Features' transformer parameter
def processFeature(feature):
    # Path to the PDF, taken from the 'SourcePdfFile' parameter
    path = FME_MacroValues['SourcePdfFile']

    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    # Create a PDF interpreter object.
    interpreter = PDFPageInterpreter(rsrcmgr, device)

    # Process each page contained in the document.
    fp = open(path, 'rb')
    try:
        for page in PDFPage.get_pages(fp):
            interpreter.process_page(page)
    finally:
        fp.close()
    data = retstr.getvalue()

    # Guess the encoding and decode only when the guess is confident enough
    e = chardet.detect(data)
    u = None
    try:
        if e['confidence'] > 0.3:
            u = unicode(data, e['encoding'])
    except Exception:
        pass

    # Fall back to the raw byte string if decoding was not possible
    if u:
        feature.setAttribute('pdfcontent', u)
    else:
        feature.setAttribute('pdfcontent', data)

There is also a custom reader at the hub:

PDF2TextReader



Is there a PDF to Excel reader in FME?


Please build something for PDF conversion!


I use poppler to read PDFs as raster. Basically it just converts the PDF files to JPGs, and then you read the JPGs.


https://poppler.freedesktop.org/
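
A minimal sketch of that conversion step, assuming poppler's `pdftoppm` command-line tool is installed and on the PATH. The function names, default resolution, and file names are illustrative assumptions; something like this could be run from a PythonCaller or as a pre-processing script before reading the images.

```python
import shutil
import subprocess


def build_pdftoppm_command(pdf_path, out_prefix, dpi=300, image_format="jpeg"):
    """Build the pdftoppm argument list (hypothetical helper, for illustration)."""
    if image_format not in ("jpeg", "png"):
        raise ValueError("image_format must be 'jpeg' or 'png'")
    # pdftoppm usage: pdftoppm [options] PDF-file output-prefix
    return ["pdftoppm", "-" + image_format, "-r", str(dpi), pdf_path, out_prefix]


def rasterize_pdf(pdf_path, out_prefix, dpi=300, image_format="jpeg"):
    """Run pdftoppm, which writes one image per page named from the prefix."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm was not found on the PATH")
    command = build_pdftoppm_command(pdf_path, out_prefix, dpi, image_format)
    subprocess.run(command, check=True)


# Example (file names are placeholders):
# rasterize_pdf("site_plan.pdf", "site_plan_page", dpi=150)
```

The resulting image files can then be read with a regular raster reader.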



We might have had something to do with that long ago....


I'm late to the party, but I vote for this. My primary use would be change detection between two GeoPDFs.


Hi all -- what better way to start the year than trying out the new PDF reader in the FME 2018 betas? Builds 18236 and later have it. Get it from http://www.safe.com/download and let us know what you think. @ciarab @marko @redgeographics @geospatiallover @gschleusner @sigtill @cartoscro @dannymatranga @zubairsm FYI


@croningarrett our long-awaited PDF reader 😉

Same here, but on MS Windows the latest binary I could find was v0.51, quite a way behind the latest release. Not that it seems to matter for simple image extraction.