In this blog post we will be showing you how to change the delimiter in a CSV file using Python. A CSV (comma-separated values) file is a text file that uses commas to separate values and can be opened in Microsoft Excel, Google Sheets, a text editor and more.
There may be a time when you need to change the delimiter in order to, for example, import your file into an application. In this example we will be changing the delimiter from a comma to a semicolon.
Before we start
If you don't have Python installed on your computer, follow steps 1 and 2 of blog post Convert PDF to Excel, CSV or XML with Python. You will now have Python, Anaconda and the PDFTables Python library installed.
I will be changing the delimiter in a sample invoice from a freight company that has been converted from PDF to CSV using PDFTables.com.
Step 1
Create a new Python file in the location where your CSV file is saved. Make sure the file is saved as .py
format and use a file
name of your choice. Add the following code to the new file:
import csv reader = csv.reader(open("freight_invoice.csv", "rU"), delimiter=',') writer = csv.writer(open("output.txt", 'w'), delimiter=';') writer.writerows(reader) print("Delimiter successfully changed")
You will need to make some or all of the following changes to the script:
- Replace
freight_invoice.csv
with name of your input file - Replace
output.txt
with name you'd like to give your output file - Replace the semicolon in
delimiter=';'
with a new delimiter of your choice
Step 2
Open up an Anaconda Prompt instance. You can find this by searching in your computers search box. Press Enter to open an instance.
Locate the folder in which the Python script is saved using cd ..
or cd *folder_name*
to move around the directories.
Step 3
Type python change_delimiter.py
(replacing change_delimiter.py
with the name of your Python file) then press Enter.
The comma-separated file will now be read in then a new file will be output in .txt
format with the new delimiter.
You will see the message Delimiter successfully changed
once the script has finished running.
Step 4
You can now open the new file in a text editor or import it into an application.
Converting PDF to CSV first
If you need to extract data from a PDF first, you can convert PDF to CSV using PDFTables.com. The following code uses the PDFTables API to convert PDF to CSV, then uses the code snippet from above to change the delimiter.
import csv import pdftables_api c = pdftables_api.Client('my-api-key') c.csv('freight_invoice.pdf', 'freight_invoice.csv') reader = csv.reader(open("freight_invoice.csv", "rU"), delimiter=',') writer = csv.writer(open("output.txt", 'w'), delimiter=';') writer.writerows(reader) print("Delimiter successfully changed")
You will need to make the following changes to the script:
- Replace
my-api-key
with your API key which you will find here.
Do you have more questions?
Check out our other blog posts here or our FAQ page. Also, feel free to contact us.
Love PDFTables? Leave us a review on our Trustpilot page!