GSoC 2018, Debian: Community Bonding
Hey all! Last time on "What's up April?" I ended by saying "A few more interesting things happened in these one and half month but I cannot really tell you about those right now, need to wait for a week or two". I know it has been more than two weeks 😜, almost a month now. But here it is: **drum-roll begins**
My proposal for Google Summer of Code 2018 with Debian has been accepted. I will be working on "Extracting data from PDF invoices and bills for financial accounting".
Here is the link.
23rd April to 14th May is the Community Bonding period, where we, the selected students, get familiar with our organisation, communicate with our mentors and try to set up a workflow. For this project, I am working with three mentors: Manuel Riel, Thomas Levine and Pieter Willem Moerenhout.
We have been discussing the project and decided to make slight changes to my proposal. Below is the gist of it.
Despite efforts to develop new formats for the exchange of invoices, most invoices are still exchanged via PDF, mainly due to the great fallback they provide (anyone can view, print, sign or stamp them). Purely machine-readable formats, like EDIFACT, are only used for high-volume business relationships by large companies.
In January 2018, France finalized a new standard called Factur-X, which builds on a German standard ("ZUGFeRD") as well as the European norm EN 16931. This standard allows for the embedding of different XML-based invoice representations, like CII, inside the PDF. Benefits of using this standard are:
- No change in processes. Companies can keep sending invoices in PDF format.
- Invoices can still be printed and can have graphical design elements.
- Invoice recipients using compatible accounting software can directly process these machine-readable invoices.
- No separate record needs to be kept except the original file.
- Invoices lacking an embedded version can have it added retroactively.
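To give a feel for what "machine-readable" means here, below is a minimal sketch that reads a few fields from a CII-style XML document of the kind Factur-X embeds in a PDF. It uses only the Python standard library and is not the API of any existing library. The sample invoice is invented, heavily simplified and not a complete or validated Factur-X file; the element names and namespaces reflect my understanding of the CII schema, and `read_invoice_summary` is a hypothetical helper made up for this illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Namespace prefixes as I understand them from the CII schema (assumption).
NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100",
    "udt": "urn:un:unece:uncefact:data:standard:UnqualifiedDataType:100",
}

# Invented, heavily simplified sample. A real embedded file has many more fields.
SAMPLE_XML = """
<rsm:CrossIndustryInvoice
    xmlns:rsm="urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"
    xmlns:ram="urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100"
    xmlns:udt="urn:un:unece:uncefact:data:standard:UnqualifiedDataType:100">
  <rsm:ExchangedDocument>
    <ram:ID>INV-2018-001</ram:ID>
    <ram:IssueDateTime>
      <udt:DateTimeString format="102">20180514</udt:DateTimeString>
    </ram:IssueDateTime>
  </rsm:ExchangedDocument>
  <rsm:SupplyChainTradeTransaction>
    <ram:ApplicableHeaderTradeSettlement>
      <ram:InvoiceCurrencyCode>EUR</ram:InvoiceCurrencyCode>
      <ram:SpecifiedTradeSettlementHeaderMonetarySummation>
        <ram:GrandTotalAmount>119.00</ram:GrandTotalAmount>
      </ram:SpecifiedTradeSettlementHeaderMonetarySummation>
    </ram:ApplicableHeaderTradeSettlement>
  </rsm:SupplyChainTradeTransaction>
</rsm:CrossIndustryInvoice>
"""


def read_invoice_summary(xml_text):
    """Return invoice number, issue date, currency and total from CII-style XML."""
    root = ET.fromstring(xml_text)
    settlement = "rsm:SupplyChainTradeTransaction/ram:ApplicableHeaderTradeSettlement"
    total = root.findtext(
        settlement
        + "/ram:SpecifiedTradeSettlementHeaderMonetarySummation/ram:GrandTotalAmount",
        namespaces=NS,
    )
    return {
        "number": root.findtext("rsm:ExchangedDocument/ram:ID", namespaces=NS),
        "issue_date": root.findtext(
            "rsm:ExchangedDocument/ram:IssueDateTime/udt:DateTimeString",
            namespaces=NS,
        ),
        "currency": root.findtext(settlement + "/ram:InvoiceCurrencyCode", namespaces=NS),
        # Decimal avoids float rounding on money; None if the field is missing.
        "total": Decimal(total) if total is not None else None,
    }


summary = read_invoice_summary(SAMPLE_XML)
print(summary)
```

Accounting software that understands the format can pick up values like these directly instead of guessing them from the rendered page, which is the whole point of embedding the XML.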
We will advance the ecosystem for machine-readable invoice exchange and make it easily accessible for the whole Python community by making the following contributions:
- Python library to read/write/add/edit Factur-X metadata in different XML flavors.
- Command line interface to process PDF files and access the main library functions.
- Way to add structured data to existing files, including data from legacy accounting systems (via the invoice2data project).
- New desktop (web?) GUI to add, edit, import and export Factur-X metadata in and out of PDF files.
You can find more about my project here.
Right now, I am reading the code of the Factur-X library and working on improving invoice2data by solving its open issues.