Meet our PDF to Excel libraries for the PDFTables API

PDF to Excel libraries on GitHub

If you aren't already aware, PDFTables has an API! You can interact with it in a number of languages, including Python, PHP, Java and many more.

In the past, we provided example snippets on our PDF to Excel API page for each of the languages.

We're always looking for ways to make it as easy as possible for developers to use our API, so we've decided to package the code for each language into easy-to-use libraries and host them on GitHub!

Here are the new libraries, and links to the GitHub repositories with examples:

Python - pdftables/python-pdftables-api

Java - pdftables/java-pdftables-api

C# - pdftables/csharp-pdftables-api

PHP - pdftables/php-pdftables-api

Go - pdftables/go-pdftables-api

R - expersso/pdftables (Unofficial package)

C/C++ - pdftables/c-pdftables-api

How to activate the Developer tab in Excel

So you've found a tutorial that shows you how to code a macro, but you've hit a roadblock: where's the Developer tab they keep referring to?

Excel doesn't show the Developer tab by default, so you'll need to dive into the options to find the correct setting.

In this tutorial, I'll show you how to activate the Excel Developer tab.

Step 1

Open Excel from the Start menu, and create a blank workbook by double-clicking the Blank workbook option (selected by default).

Creating a blank workbook in Excel

Step 2

Open the Excel options window (File → Options) and go to the Customize Ribbon tab. Then, on the right-hand side, tick the box next to Developer. Click OK to close the dialog.

Enabling the Developer tab in the Excel options window

Step 3

Back on the spreadsheet view, you'll see that a new Developer tab has been added to the end of the tab list.

Click DEVELOPER, and you'll see the following:

The activated Developer tab in Excel

Congratulations! You've activated the Developer tab!

From now on, the tab will be visible by default whenever you create a new Excel workbook.

Next steps

Want to try your hand at creating your own macro? Check out our easy-to-follow PDF to Excel using VBA guide.

Alternatively, if you'd like to learn more about VBA as a programming language, check out the Excel VBA Tutorial from

How to convert a PDF to Excel with Python

If you're a Python user and want to convert PDFs without uploading them manually to, you can make use of our brand-new Python library for the PDFTables API.

In this tutorial, I'll be showing you how to get the library set up on your local machine, and how to use it to convert a PDF in a folder to Excel.

An example of a PDF conversion with Python
Here's an example of a PDF that I've converted with the library. In order to properly test the library, make sure you have a PDF handy!

Before we start

If you haven't already, install Python on your machine from the Python website. You can use either Python 3.5.x or 2.7.x, as the PDFTables API works with both.

For this tutorial, I'll be using the Windows Python IDLE Shell, but the instructions are almost identical for Linux and Mac. You'll also need pip, Python's package manager. It is included with Python 2.7.9 and later and with Python 3.4 and later, but older versions need it installed separately.

You'll also need to create an account on in order to get your free PDFTables API key.

Step 1

In your terminal/command line, install the PDFTables Python library with:

pip install git+

Or if you'd prefer to install it manually, you can download it from python-pdftables-api, and install it with:

python setup.py install

Step 2

Create a new Python script, and add the following code:

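import pdftables_api

# Create a client using your PDFTables API key
c = pdftables_api.Client('my-api-key')
# Convert input.pdf and save the result as output.xlsx
c.xlsx('input.pdf', 'output.xlsx')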

Now, you'll need to make the following changes to the script:

  • Replace my-api-key with your PDFTables API key, which you can get here.
  • Replace input.pdf with the PDF you want to convert.
  • Replace output.xlsx with the name of the converted spreadsheet.

Now, save your finished script as in the same directory as the PDF document you want to convert.

PDF and Python script in the conversion directory

Step 3

Open your command line/terminal, and change your directory (e.g. cd C:/Users/Bob) to the folder you saved your script and PDF in, then run the following command:


To find your converted spreadsheet, navigate to the folder in your file explorer and hey presto, you've converted a PDF to Excel with Python!

Converted Excel spreadsheet in its directory
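
If you have a whole folder of PDFs, you could wrap the same call in a loop. The sketch below is a hypothetical helper and not part of the library. It assumes only the Client(api_key) constructor and the xlsx(input, output) method used in the script above. The PDFTABLES_API_KEY environment variable name is our own choice.

```python
import os
from pathlib import Path


def convert_folder(client, folder="."):
    """Convert every .pdf file in `folder` to an .xlsx file next to it.

    `client` is assumed to behave like pdftables_api.Client, i.e. to have an
    xlsx(input_path, output_path) method. Returns the output paths written.
    """
    written = []
    # sorted() gives a predictable order; depending on your platform the
    # pattern may be case-sensitive, so files ending in .PDF can be skipped
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        xlsx_path = pdf_path.with_suffix(".xlsx")
        client.xlsx(str(pdf_path), str(xlsx_path))
        written.append(xlsx_path)
    return written


def main():
    # Imported here so convert_folder can be reused without the library
    import pdftables_api

    # Hypothetical variable name; keeps the API key out of the script itself
    client = pdftables_api.Client(os.environ["PDFTABLES_API_KEY"])
    for path in convert_folder(client):
        print("Converted:", path)
```

To use it, set PDFTABLES_API_KEY in your environment and add a call to main() at the end of the file. Then run the script from the folder that contains your PDFs.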

Insurance brokerage Milner Group automates PDF to Excel conversion and realises 10x financial savings

Converting PDFs to Excel for Milner Group
Automation realises 10X financial savings, increases efficiency and means better accuracy in our reporting.
- Warren Yancey, Marketing Director

Established in 1958, The Milner Group is a highly respected insurance brokerage in Atlanta, Georgia. Like all insurance agencies, it has been looking at ways to improve the service it offers its valued insurance agents. It also wants a laser focus on debt commitments.

Improving Customer Service

The processing of agent commissions relies on information flows from many carriers like TransAmerica Life Assurance and Legal and General America. Each carrier sends its commission reports to The Milner Group in a PDF document. Processing these reports more quickly means that agents can get paid faster.

10x Financial Savings Through Automation

“We’ve been doing this by hand for 13 years,” says Danielle Graves, Director of Internal Operations at The Milner Group. “The last five years have seen a big increase in volume and we’ve been looking to automate it. We want to take the raw data from PDFs from multiple carriers and import it into OneHQ, our customer relationship management software. Processing these PDFs by hand, copying and pasting the information every time, is a slow and laborious job. And we have a lot of PDFs to process each week.”

An example of the PDFs that are being converted on a regular basis

Josh Powell of Innovative Operations, Milner’s trusted integrator, identified as a solution. He’s been working with our data engineers to create a system for Milner. The solution was developed over a few weeks.

“It’s simple!” Josh says. “We’ve written a Ruby on Rails app that calls the PDFTables API, runs a series of validation routines on the extracted PDF table, and produces a spreadsheet for each carrier that’s exactly in the form that the office staff need. In just a few mouse clicks, the operations team at Milner are ready to import carrier data to OneHQ. Our next step is to automatically push the data to OneHQ.”

Danielle added, “I am extremely pleased with the progress and joint collaboration on this project.”

Milner Group reports automation login interface
The new interface for automatically converting carrier reports

Supercharging Their Team

Warren Yancey, Marketing Director says, “We welcome innovation. Our ops team are saving a huge amount of time which can be better spent on direct customer service. Automation realises 10X financial savings, increases efficiency and means better accuracy in our reporting.”

About Milner Group

The Milner Group is a full-service insurance brokerage agency providing impaired risk, life, annuity, health, disability and long-term care insurance products to agents nationwide.

About Innovation Operations

We help small and medium sized businesses leverage technology to streamline their operations, better share information, and manage complex workflows. We personally design and build custom software when appropriate, and integrate third party services when they deliver value.

Press Contact: Josh Powell / josh (at) / Tel +1 (305) 814 4878


PDFTables is made by The Sensible Code Company. We make products that turn messy information into valuable data. We work with systems integrators and corporate customers to help streamline front and back office operations that rely on external data sources. We also make QuickCode, a place where statisticians and economists can upskill in Python and R whilst working on their operational data.

Press Contact: Tristan Bacon / tristan (at) / Tel +44 (0)151 3315200

New and improved PDF to Excel conversions with PDFTables

We continue to improve the algorithm that analyzes and retrieves content from your PDFs. We've recently implemented some larger updates to our algorithm and thought this would be a good opportunity to show you some of our work!

Let's consider some examples - we'll start with what we internally call a shipping-manifest-type PDF.

A shipping-manifest-type table and its PDFTables output

Customers frequently approach us with this type of document (in all its variations!) and give us feedback on how the conversion went. Having access to customer PDFs enables us to fine-tune our algorithm, which improves the results our customers get. We make changes to the PDFTables algorithm with one aim only: to minimize the time you have to spend on post-processing the extracted data!

If you'd like to find out whether we can improve the content extraction for your PDF files, get in touch at . Please don't forget to attach the relevant PDFs!

Next let's answer the question you've probably been dying to ask:

How hard can it be to extract data from PDFs?

PDF files contain only the most basic information necessary to display their content. Most of the time, all we have to work with is a series of simple graphical instructions, such as drawing a character at a given point or drawing a line from point A to point B. PDFs generally do not contain higher-level data structures such as tables. While humans can easily recognize such structures visually, teaching a computer to do the same is quite a different matter. This is where PDFTables comes in: our algorithm analyzes the spatial information contained in a PDF to construct tables!
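
To give a flavour of what analysing spatial information involves, here is a toy sketch of one small step: grouping positioned words into rows by their vertical position. This is not the PDFTables algorithm. The Word type and the fixed y_tolerance are assumptions made for illustration. Real documents also need column detection, and they have to cope with varying font sizes, rotated text and much more.

```python
from dataclasses import dataclass


@dataclass
class Word:
    x: float  # left edge of the word
    y: float  # baseline; in this sketch y grows down the page
    text: str


def words_to_rows(words, y_tolerance=2.0):
    """Group positioned words into table rows, each ordered left to right.

    A word joins the current row if its baseline is within y_tolerance of
    the baseline of that row's first word; otherwise it starts a new row.
    """
    rows = []  # list of (row_y, [Word, ...]) pairs, top to bottom
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(w.y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append(w)
        else:
            rows.append((w.y, [w]))
    return [[w.text for w in sorted(ws, key=lambda w: w.x)] for _, ws in rows]
```

Note how much depends on the tolerance. If it is set too small, words on slightly misaligned baselines split into separate rows. If it is set too large, neighbouring rows merge.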

Consider this fairly straightforward example:

A straightforward table and its PDFTables output

PDFTables did a great job recognizing the column header containing multiple lines of text. This is one of the improvements we've recently made: using the line information to deduce whether multiple lines of text belong in one cell.
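
Here is a simplified sketch of that idea. Within one column, it stacks the text lines from top to bottom and only starts a new cell when a horizontal ruling line is drawn between two of them. The function name, the input format and the "merge unless a rule separates them" rule are our own assumptions. PDFTables' actual heuristics weigh more evidence than this.

```python
def merge_cell_lines(text_lines, rule_ys):
    """Join stacked text lines in one column into cells.

    text_lines: (y, text) tuples in any order; y grows down the page.
    rule_ys: y positions of horizontal ruling lines drawn in the PDF.
    Consecutive lines share a cell unless a rule lies strictly between them.
    """
    cells = []
    prev_y = None
    for y, text in sorted(text_lines):
        separated = prev_y is None or any(prev_y < r < y for r in rule_ys)
        if separated:
            cells.append(text)  # a ruling line (or the top) starts a new cell
        else:
            cells[-1] += " " + text  # same cell: continue the header text
        prev_y = y
    return cells
```

For example, a header written as "Unit" over "price", with a rule drawn underneath it, comes out as a single "Unit price" cell above the values.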

The above table had a clear structure, so next let's look at one that might not be as obvious:

PDFTables output for a less obvious table
When we look at this table ourselves, it is immediately clear which content belongs together, and therefore which rows should be merged and which shouldn't. It is less clear to a computer, though, because it lacks the semantic and contextual understanding we have. The algorithm can learn from past data and apply heuristics, and use this information to compute an outcome.
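
Here is one simple heuristic for the row-merging problem, shown purely as an illustration; it isn't necessarily one PDFTables uses. If a row's first cell is empty, the heuristic treats the row as wrapped text belonging to the row above. The sketch assumes every row has the same number of cells.

```python
def merge_continuation_rows(rows):
    """Fold rows whose first cell is empty into the row above.

    rows: equal-length lists of cell strings, top to bottom.
    Returns a new list; the input rows are not modified.
    """
    merged = []
    for row in rows:
        if merged and not row[0].strip():
            # Continuation row: append each non-empty cell to the cell above
            merged[-1] = [
                f"{above} {below}".strip() if below.strip() else above
                for above, below in zip(merged[-1], row)
            ]
        else:
            merged.append(list(row))
    return merged
```

A rule this simple goes wrong as soon as a genuine new row has an empty first cell. That is exactly the kind of case that's obvious to a person but not to a computer.
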
That's why it is important that you send us any PDF that you'd like to see improved!
Every additional PDF helps us to improve our system and to find edge cases. I hope we've given you a small insight into what makes PDFTables work, as well as hinted at some of the work that's still ahead of us.
Got a PDF for which PDFTables returns an output that you'd like to see improved? Get in touch at or on Twitter.

Amazingly, PDFs are more popular each year

More people search for "pdf" (relative to other search terms) than they used to: more than twice as many now as at the low point back in 2007. That's in addition to the increase in overall search volume for all terms! Although PDF feels like a dated format to geeks, it is really about the same age as the web, and was designed for a similar purpose: sharing documents. In 1991, executives didn't read on screens, so PDF differed from HTML by concentrating on printing. John Warnock, cofounder of Adobe, describes the vision in the very first memo on PDFs:

Imagine being able to send full text and graphics documents (newspapers, magazine articles, technical manuals etc.) over electronic mail distribution networks. These documents could be viewed on any machine and any selected document could be printed locally. This capability would truly change the way information is managed.

PDF became an ISO standard as late as January 2008, so it isn't surprising that it is increasingly popular. Right now, the highest search volume for "pdf" comes from Cuba. All the rest of the top 10 are African countries.

Just over a decade earlier, the countries most interested in PDFs were very different, with Iran and North Africa at the top.

Maybe there's a particular stage of Internet access, open Government publishing or corporate reporting, which causes the search volume for "pdf" to increase in a particular country.

Here's to the PDF! Digital paper, yes, but still useful!

Getting contacts out of PDFs and into your phone

Henry Morris

Henry Morris is always in a hurry. He's changing the lives of thousands of young people and he has no time to waste. When I ask what motivates him, he replies:

Working with amazing people to make a positive impact on the world

He's a serial social entrepreneur. After a short stint playing at being an investment banker, he set up his first social enterprise, called upReach. When I ask him why, he says:

It's about helping players to play the game better

upReach helps undergraduates from less-privileged backgrounds secure top jobs. It gives people the skills they need to land jobs that might ordinarily seem out of reach, and to date it has supported 400 undergraduates. When the organisation became operational, he knew it was time to move on.

Of his new start-up, PiC (Performance in Context), which is also about social mobility, Henry says:

This time we're trying to change the game

PiC looks at a person’s performance in relation to their educational background. People with massive potential are being ignored as the measurement system is skewed to look at the top performers from the best schools.

It’s arguably easier for a young person to be as good as the next in a highly successful school, whereas the value of a young person being a top performer in a low-performing school can be much higher.

Henry wants to re-calibrate the measurement system and open data is his friend. His efforts are all the more critical as people fork out the equivalent of a mortgage to put themselves through tertiary level education.

PiC front page

When Stephen Covey wrote The 7 Habits of Highly Effective People he must have had Henry in mind. He's busy and he manages his time very effectively.

Apple Contacts is his preferred digital Rolodex. When I spoke to Henry he was between appointments. He’s proactive and meets lots of people at conferences. He often asks the event organiser to dispatch the delegate list to him electronically. So it's no surprise that it usually arrives as a PDF.

He uses to convert the PDF to Excel. He transfers the data into a Google Sheet with headers that map to Apple Contacts, and adds notes that give context to each entry. Once the Google Sheet is ready, he exports a CSV and imports it into Apple Contacts. Job done!

Contacts data entry template
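
Henry maps the columns by hand in a Google Sheet. If you'd rather script that step, here is a minimal sketch using Python's standard csv module. It assumes the converted delegate list has already been saved as CSV. The column names in HEADER_MAP are hypothetical. Change them to match your own delegate list and the headers your contacts import expects.

```python
import csv
import io

# Hypothetical column names: keys are delegate-list headers,
# values are the headers expected by your contacts import template.
HEADER_MAP = {
    "Delegate name": "Name",
    "Organisation": "Company",
    "Email address": "Email",
}


def to_contacts_csv(delegates_csv, note):
    """Rename mapped columns, drop the rest, and add a note to every row.

    delegates_csv: CSV text with a header row.
    Returns CSV text ready to import as contacts.
    """
    reader = csv.DictReader(io.StringIO(delegates_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[*HEADER_MAP.values(), "Note"])
    writer.writeheader()
    for row in reader:
        contact = {new: row.get(old, "") for old, new in HEADER_MAP.items()}
        # Context such as where and when you met; one shared note is assumed
        contact["Note"] = note
        writer.writerow(contact)
    return out.getvalue()
```

For example, pass in the text of the exported CSV together with a note naming the event, then write the result to a new file for import.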

When Henry needs to make contact, he's able to see when and where he met the person.

Thanks for using Henry and long may your amazing work continue!