Automate Invoice Parsing with AI and n8n
In this post, I’ll show you how to build an automatic invoice parser that converts invoices (images or PDFs) into structured data in a database such as Airtable, or in Google Sheets. The automation combines OCR with an LLM, so you no longer have to type invoice details in by hand.
How it Works
Instead of manually reading invoices, you simply drop them into a Google Drive folder. The system automatically extracts key data and stores it in your database.
Demo
Let’s look at an example invoice with:
- Invoice number
- Date
- Due date
- Vendor name
- Total amount
I’ll drop this invoice into my designated Google Drive folder. Within a minute, the data will appear in my Airtable database.
Building the Automation Step-by-Step
Here’s how to build this automation using n8n:
- Trigger
- Use the Google Drive trigger “On Changes Involving a Specific Folder.”
- Set up your Google Drive credentials (my Google Sheets setup video covers the process; the same steps apply to Google Drive).
- Set “Poll Times” to every minute.
- Select the specific folder you created for invoices (e.g., “Invoices”).
- Set “Watch For” to “File Created.”
- Download File
- Use the Google Drive “Download File” action.
- Set up the same Google Drive credentials.
- Select the file “By ID”.
- Take the file ID from the trigger data.
- Parse the Invoice (OCR.space API)
- Use the HTTP Request node to connect to the OCR.space free OCR API (https://ocr.space/).
- Sign up for a free API key on the OCR.space website.
- Set the request method to “POST” and the URL to `https://api.ocr.space/parse/image`.
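- Set the authentication to “Header Auth”, name the header “APIKEY”, and use the API key you received as its value.
- Set the “Body Content Type” to “Form Data”.
- Send the invoice as a binary file parameter and point it at “data”, the binary property that holds the file from the download step.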
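- Clean the data with a Set node
- Add a Set node that keeps only the parsed text from the OCR response (the `ParsedText` value inside `ParsedResults`).
- Extract data with the LLM Chain node
- Add a Basic LLM Chain node and write a prompt that asks the model to extract the invoice fields from the parsed text.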
- Define the required output:
- Invoice Number
- Date
- Due date
- Vendor name
- Total amount
- Add an OpenAI chat model and connect it to the chain.
- Output Data Structure (LLM Chain Node)
- In the Basic LLM Chain node, enable “Require Specific Output Format”.
- Select “Define Below” and create a schema that lists the fields above.
- Store Data in Airtable
- Use the Airtable “Create a Record” action.
- Set up your Airtable access token.
- Choose your base and table.
- Map the extracted data fields (vendor name, invoice number, etc.) to the corresponding Airtable columns.
- Enable “Typecast” in the options so that Airtable converts the extracted date strings into its date field format.
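The OCR step in the list above (the HTTP Request node followed by the Set node) can also be sketched outside n8n. The Python below is only an illustration under stated assumptions, not the workflow's actual implementation: it builds a multipart POST to `https://api.ocr.space/parse/image` with the key in an `apikey` header, then pulls the text out of the JSON response. The function names and the sample response are made up for this example; the `file` form field and the `ParsedResults`, `ParsedText`, and `IsErroredOnProcessing` response fields are taken from OCR.space's documented API, so check them against the current docs before relying on them.

```python
import mimetypes
import urllib.request
import uuid

OCR_URL = "https://api.ocr.space/parse/image"  # endpoint named in the post


def build_ocr_request(file_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for OCR.space."""
    boundary = uuid.uuid4().hex
    # Guess the MIME type from the file extension, falling back to raw bytes.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        file_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "apikey": api_key,  # same role as the Header Auth credential in n8n
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(OCR_URL, data=body, headers=headers, method="POST")


def extract_parsed_text(ocr_response: dict) -> str:
    """Join the text of every parsed page, as the Set node does for one item."""
    if ocr_response.get("IsErroredOnProcessing"):
        raise ValueError(f"OCR failed: {ocr_response.get('ErrorMessage')}")
    pages = ocr_response.get("ParsedResults") or []
    return "\n".join(page.get("ParsedText", "").strip() for page in pages)


# Hypothetical response, trimmed to the fields used here.
sample_response = {
    "IsErroredOnProcessing": False,
    "ParsedResults": [{"ParsedText": "ACME Corp\r\nInvoice #INV-001\r\nTotal: 120.50\r\n"}],
}
invoice_text = extract_parsed_text(sample_response)

# To actually send a request (needs a real key and network access):
#   import json
#   with urllib.request.urlopen(build_ocr_request(data, "invoice.pdf", key)) as resp:
#       invoice_text = extract_parsed_text(json.load(resp))
```

Keeping the request builder separate from the response parser mirrors the split between the HTTP Request node and the Set node, and makes the parsing logic easy to try against a saved response.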
Testing and Activation
Test the workflow with a new invoice. Once everything works, activate the workflow. Now, any invoice added to your Google Drive folder will be automatically parsed and added to your Airtable database.
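While testing, it can help to check what the LLM returned before trusting the rows in Airtable. The following is a minimal sketch, assuming the output schema uses the field names shown (`invoice_number`, `date`, `due_date`, `vendor_name`, `total_amount`) and ISO-formatted dates. Those names, the Airtable column names, and both helper functions are hypothetical; the payload shape (`fields` plus `typecast`) follows Airtable's create-record request body.

```python
from datetime import date

# JSON Schema-style description of the five fields; the names are assumptions.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "due_date": {"type": "string"},
        "vendor_name": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "date", "due_date", "vendor_name", "total_amount"],
}

_PYTHON_TYPES = {"string": str, "number": (int, float)}

# Hypothetical mapping from schema fields to Airtable column names.
AIRTABLE_COLUMNS = {
    "invoice_number": "Invoice Number",
    "date": "Date",
    "due_date": "Due Date",
    "vendor_name": "Vendor Name",
    "total_amount": "Total Amount",
}


def validate_invoice(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = [f"missing field: {name}"
                for name in INVOICE_SCHEMA["required"] if name not in record]
    for name, spec in INVOICE_SCHEMA["properties"].items():
        if name in record:
            value = record[name]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, _PYTHON_TYPES[spec["type"]]):
                problems.append(f"wrong type for {name}")
    for name in ("date", "due_date"):
        value = record.get(name)
        if isinstance(value, str):
            try:
                date.fromisoformat(value)
            except ValueError:
                problems.append(f"{name} is not an ISO date (YYYY-MM-DD)")
    return problems


def build_airtable_payload(record: dict) -> dict:
    """Map schema fields to column names and request best-effort type conversion."""
    fields = {AIRTABLE_COLUMNS[name]: value for name, value in record.items()
              if name in AIRTABLE_COLUMNS}
    return {"fields": fields, "typecast": True}


# Hypothetical LLM output for the demo invoice.
sample_record = {
    "invoice_number": "INV-001",
    "date": "2024-03-01",
    "due_date": "2024-03-15",
    "vendor_name": "ACME Corp",
    "total_amount": 120.5,
}
issues = validate_invoice(sample_record)
payload = build_airtable_payload(sample_record)
```

A check like this could sit in an n8n Code node between the LLM chain and the Airtable node, so malformed extractions are flagged instead of being written to the table.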
Conclusion
This automation streamlines invoice processing, saving you significant time and effort. Check out the video for a detailed walkthrough.