UtterAccess Forums _ Access Automation _ Retrieve Data From Pdfs

Posted by: GroverParkGeorge Nov 20 2019, 10:18 AM

I've been asked by a colleague about importing data from PDFs into an Access/SQL Server application.

First, being retired, this is a project that I will probably pass on.

That said, I was hoping to be able to offer some advice on how to proceed. In a nutshell, the client will receive multiple reports in PDF format, dozens of them at a time by my guess. The data from those PDFs needs to be imported into the SQL Server tables. In the past, this was done with text or CSV files, as I recall. For whatever reason, they now want to use PDFs instead.

So, my question is whether there are third-party tools that can identify and extract the relevant data from a PDF into a temp table in Access for export back to the SQL Server. Or is there a way to import the data directly into the SQL Server?


Posted by: cheekybuddha Nov 20 2019, 12:28 PM

A few suggestions: https://stackoverflow.com/questions/3650957/how-to-extract-text-from-a-pdf

hth,

d

Posted by: theDBguy Nov 20 2019, 12:29 PM

Hi George. Not sure about third-party tools (there must be some out there), but my demo on working with PDFs (on my website) uses the Acrobat API to extract data from a PDF form. You could take a look.

Posted by: GroverParkGeorge Nov 20 2019, 12:31 PM

Thanks. I should have been more precise. This needs to be done from Access, preferably, but it sounds like there are tools that can be adapted to do the job.

Posted by: GroverParkGeorge Nov 20 2019, 12:32 PM

I should have started on your website.

Unfortunately, I don't think they're working with Acrobat forms, though. More like "reports" saved as PDFs.

Posted by: cheekybuddha Nov 20 2019, 12:34 PM

>> This needs to be done from Access <<
Ah, I see! You meant third party Access tools!

I just envisaged shelling out to one of the suggested tools in the SO thread.
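
For what it's worth, shelling out could look roughly like the sketch below. It is written in Python rather than VBA purely for illustration, and it assumes the Poppler `pdftotext` command-line tool is installed and on the PATH. That choice of tool is an assumption; nothing in this thread settles on one. The function names `build_command` and `extract_text` are made up for the example.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path):
    """Build the pdftotext command line.

    -layout asks pdftotext to keep the physical layout, which tends to
    keep report columns lined up in the text output.
    """
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def extract_text(pdf_path):
    """Run pdftotext on one PDF and return the extracted text.

    Assumes pdftotext is installed; raises if it cannot be found.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    txt_path = Path(pdf_path).with_suffix(".txt")
    # check=True raises CalledProcessError if the tool reports a failure.
    subprocess.run(build_command(pdf_path, txt_path), check=True)
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

The same idea should carry over to Access by running the command from VBA and then reading the resulting text file, though I have not tried that here.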

Posted by: theDBguy Nov 20 2019, 12:35 PM

Ah, well, PDFtk can extract text from a PDF, but then determining which parts of that text are data would be another matter.
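
To give an idea of what that "another matter" might involve: once some tool has turned the PDF into plain text, one option is to match each line against a pattern and keep only the lines that look like data rows. Below is a minimal sketch in Python. The report layout (invoice number, ISO date, amount) is entirely hypothetical, as are the names `ROW_PATTERN`, `parse_rows`, `rows_to_csv` and the sample text; the real reports would need their own pattern.

```python
import csv
import io
import re

# Hypothetical data row: invoice number, ISO date, amount with two decimals.
ROW_PATTERN = re.compile(
    r"^\s*(?P<invoice>\d{4,})\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})\s*$"
)

# Made-up example of text extracted from a report saved as PDF.
SAMPLE = """Monthly Report        Page 1
Invoice   Date         Amount
10023     2019-11-01   1,250.00
10024     2019-11-03      75.50
Total                  1,325.50
"""


def parse_rows(text):
    """Return only the lines that match the data-row pattern, as dicts."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            rows.append({
                "invoice": match.group("invoice"),
                "date": match.group("date"),
                # Strip thousands separators so the value imports as a number.
                "amount": match.group("amount").replace(",", ""),
            })
    return rows


def rows_to_csv(rows):
    """Serialize the parsed rows as CSV text, ready for a regular import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["invoice", "date", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(parse_rows(SAMPLE)))
```

Titles, column headers and the totals line fall through because they don't match the pattern. Writing the result back out as CSV would let the existing text/CSV import route be reused, assuming it is still in place.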

Posted by: GroverParkGeorge Nov 20 2019, 01:08 PM

Yes, sloppy posting on my part.