
How to convert a PDF to Excel or CSV with Python

If you're a Python user and you want to be able to convert PDFs without uploading them manually to PDFTables.com, you can make use of our brand-new Python PDFTables API.

In this tutorial, I'll show you how to set the library up on your local machine and how to use it to convert a PDF stored in a local folder to Excel or CSV.

An example of a PDF conversion with Python
Here's an example of a PDF that I've converted with the library. In order to properly test the library, make sure you have a PDF handy!


Before we start

If you haven't already, install Anaconda on your machine from the Anaconda website. You can use either Python 3.6.x or 2.7.x, as the PDFTables API works with both. Anaconda also installs pip, which provides a simple way to install the PDFTables API Python package.

For this tutorial, I'll be using the Windows Python IDLE Shell, but the instructions are almost identical for Linux and Mac.


Step 1

In your terminal/command line, install the PDFTables Python library with:

pip install git+https://github.com/pdftables/python-pdftables-api.git

If git is not recognised, download it here. Then, run the above command again.

Or, if you'd prefer to install it manually, you can download it from the python-pdftables-api repository, then install it from inside the downloaded folder with:

python setup.py install
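
Before moving on, it can help to confirm that the package is visible to the Python interpreter you plan to use. The snippet below is a minimal sketch that assumes Python 3 (it relies on importlib.util.find_spec); the helper name is_installed is my own and is not part of the PDFTables library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # pdftables_api is the module name imported in Step 2
    if is_installed("pdftables_api"):
        print("pdftables_api is installed")
    else:
        print("pdftables_api not found; try the pip command again")
```

If this reports that the module is missing, the most likely cause is that pip installed into a different Python environment from the one running the check.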

Step 2

Create a new Python script then add the following code:

import pdftables_api

c = pdftables_api.Client('my-api-key')
c.xlsx('input.pdf', 'output')  # replace c.xlsx with c.csv to convert to CSV

Now, you'll need to make the following changes to the script:

  • Replace my-api-key with your PDFTables API key, which you can get here.
  • Replace input.pdf with the PDF you would like to convert.
  • Replace output with the name you'd like to give the converted document.
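
If you'd rather not paste your API key into the script, one possible variation is sketched below. It reads the key from an environment variable and checks that the PDF exists before calling the API. The variable name PDFTABLES_API_KEY and the helper convert_pdf are assumptions made for this example, not something the library defines; the only library calls used are Client, xlsx and csv, as in the script above.

```python
import os


def convert_pdf(client, pdf_path, out_name, fmt="xlsx"):
    """Convert pdf_path by calling client.xlsx or client.csv, chosen by fmt."""
    if fmt not in ("xlsx", "csv"):
        raise ValueError("fmt must be 'xlsx' or 'csv'")
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError(pdf_path)
    # Same calls as the script above: c.xlsx(...) or c.csv(...)
    getattr(client, fmt)(pdf_path, out_name)


def main():
    # PDFTABLES_API_KEY is a hypothetical variable name chosen for this sketch
    api_key = os.environ.get("PDFTABLES_API_KEY")
    if not api_key:
        print("Set the PDFTABLES_API_KEY environment variable first")
        return
    import pdftables_api  # imported here so the helper works without it
    client = pdftables_api.Client(api_key)
    convert_pdf(client, "input.pdf", "output", fmt="xlsx")


if __name__ == "__main__":
    main()
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object, and switching to CSV is just a matter of passing fmt="csv".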

Now, save your finished script as convert-pdf.py in the same directory as the PDF document you want to convert.

PDF and Python script in the conversion directory


Step 3

Open your command line/terminal and change your directory (e.g. cd C:/Users/Bob) to the folder you saved your convert-pdf.py script and PDF in, then run the following command:

python convert-pdf.py

To find your converted spreadsheet, open the same folder in your file explorer. Hey presto, you've converted a PDF to Excel or CSV with Python!
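
If you'd like to check for the result from Python rather than the file explorer, here is a small sketch. It assumes the converted file is saved in the same folder under the output name you chose, followed by a file extension such as .xlsx or .csv; find_converted is a hypothetical helper written for this example.

```python
import glob
import os


def find_converted(folder, out_name):
    """Return sorted paths in folder named out_name plus any extension."""
    pattern = os.path.join(glob.escape(folder), glob.escape(out_name) + ".*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # "output" is the name passed to c.xlsx or c.csv in Step 2
    for path in find_converted(".", "output"):
        print(path)
```

An empty result suggests the conversion did not finish, or that the script was run from a different folder.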

Converted Excel spreadsheet in its directory

Looking to convert multiple PDF files at once?

Check out our blog post here.
