InvoiceX-GUI: Google Summer of Code Project
As most of you might be aware, this summer I have been working as a Google Summer of Code student at Debian. It has been an enjoyable and satisfying journey so far, full of anxiety and excitement, if I may say so. I have been working on a project that involves building a GUI for the Factur-X library. The Factur-X library reads a PDF invoice to discover an attached XML file that follows an e-invoicing standard such as Factur-X, ZUGFeRD or UBL. It can validate the XML, read the field values and also edit them.
Most of the GUI-related work has been completed, so I thought it was the right time to explain the project and how it works in some detail. I will go through the tech stack used, the dependencies, writing tests and the difficulties I faced.
With InvoiceX-GUI you can currently:
- Preview the opened PDF.
- Search for any attached standard XML.
- Display the fields in a dock.
- Edit the fields.
- Export metadata as XML, JSON or YAML.
- Save the PDF.
- Automatically discover field values for an empty XML.
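To give a feel for the export feature, here is a minimal sketch of turning a set of invoice fields into JSON and XML using only the standard library. It is not the code InvoiceX-GUI uses: the helper names, the field names and the sample values are made up for illustration, and the XML is a flat generic structure, not a Factur-X document. YAML would need a third-party package such as PyYAML, so it is left out here.

```python
import json
import xml.etree.ElementTree as ET


def to_json(fields):
    """Serialise a flat dict of invoice fields to pretty-printed JSON."""
    return json.dumps(fields, indent=2, sort_keys=True)


def to_xml(fields, root_tag="invoice"):
    """Serialise a flat dict to simple XML (NOT the Factur-X schema)."""
    root = ET.Element(root_tag)
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up sample values, only for illustration
fields = {"seller": "Example GmbH", "amount": "119.00", "currency": "EUR"}
print(to_json(fields))
print(to_xml(fields))
```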
Framework and Packaging
For InvoiceX-GUI, I am using the PyQt5 framework, a Python wrapper for the well-known C++ Qt framework. The large number of widgets and its stability were the advantages that made me choose PyQt5 over Kivy. I explained the problems with Kivy in one of my previous blog posts (GSoC 2018: Week 6). I have to say the learning curve for PyQt5 is really steep. The main reason is the lack of documentation: for the most part I had to rely on the Qt5 documentation, which is written for C++, and on Stack Overflow.
GUI development is not Python's strong suit, and you have to struggle a lot to package your application properly. I might have spent more than 40 hours just searching for a stable and easy-to-use packaging library. I went through most of the well-known ones, such as bbfreeze, cx_Freeze, py2exe and py2app, but in the end settled on PyInstaller because it is cross-platform and has a much better community than the others.
PyInstaller lets you edit a spec file, which contains everything from how the application should behave to the binaries and data files to include.
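PyInstaller generates the spec file from the options you pass on the first run, so one way to see what ends up in it is to look at those options. The sketch below builds such an option list and hands it to PyInstaller's Python entry point. The script name, application name and data folder are hypothetical, and this is not the actual InvoiceX-GUI build setup.

```python
import os


def build_pyinstaller_args(entry_script, app_name, data_files=(), windowed=True):
    """Build a PyInstaller option list; data_files holds (source, dest_dir) pairs."""
    args = [entry_script, "--name", app_name, "--noconfirm"]
    if windowed:
        args.append("--windowed")  # do not open a console window for a GUI app
    for source, dest_dir in data_files:
        # Older PyInstaller releases expect os.pathsep between source and
        # destination; check the documentation of the installed version.
        args += ["--add-data", f"{source}{os.pathsep}{dest_dir}"]
    return args


def build(args):
    """Run PyInstaller with the given options (requires PyInstaller installed)."""
    import PyInstaller.__main__  # imported lazily so the helper above works without it
    PyInstaller.__main__.run(args)
```

Calling `build(build_pyinstaller_args("main.py", "demo-app", [("icons", "icons")]))` would write a `demo-app.spec` next to the script, which can then be edited by hand and passed to `pyinstaller` directly on later builds.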
Packaging has been the most frustrating part of this project. But the feeling of finally building an executable after putting in so much effort is second to none.
Structure of GUI
InvoiceX-GUI has been given a legacy look, with a classic Menu Bar, Tool Bar, Docks and a Central Stage. The menu bar contains three menus: File, Commands and Help.
- File: Contains options such as Open PDF, Save PDF and View Fields.
- Commands: Contains all the options for working with the attached XML.
- Help: Links to the GitHub repository and the About option.
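The layout described above maps quite directly onto PyQt5's QMainWindow. Here is a minimal sketch of that structure, not the real InvoiceX-GUI code: the window title, the entries in the Commands menu and the placeholder widgets are my assumptions for illustration.

```python
import sys

# Menu layout from the description above; the Commands entries are hypothetical
MENUS = {
    "File": ["Open PDF", "Save PDF", "View Fields"],
    "Commands": ["Edit Fields", "Export Metadata"],
    "Help": ["GitHub Repository", "About"],
}


def build_main_window():
    """Create a main window with a menu bar, tool bar, dock and central widget."""
    from PyQt5.QtCore import Qt
    from PyQt5.QtWidgets import QDockWidget, QLabel, QMainWindow, QTreeWidget

    window = QMainWindow()
    window.setWindowTitle("InvoiceX-GUI sketch")
    toolbar = window.addToolBar("Main")
    for title, labels in MENUS.items():
        menu = window.menuBar().addMenu(title)
        for label in labels:
            action = menu.addAction(label)
            if title == "File":
                toolbar.addAction(action)  # mirror the File actions in the tool bar
    dock = QDockWidget("Fields", window)
    dock.setWidget(QTreeWidget())  # placeholder for the field list
    window.addDockWidget(Qt.RightDockWidgetArea, dock)
    window.setCentralWidget(QLabel("PDF preview goes here"))
    return window


def main():
    """Start the Qt event loop (needs PyQt5 and a display)."""
    from PyQt5.QtWidgets import QApplication

    app = QApplication(sys.argv)
    window = build_main_window()
    window.show()
    sys.exit(app.exec_())
```

Calling `main()` opens the window. The PyQt5 imports live inside the functions so that the menu layout can be inspected or tested on a machine without Qt installed.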
To preview the PDF I am using the external applications ImageMagick and Ghostscript. They are invoked through subprocess, and the snippet for this is:
# Render the PDF as a trimmed, flattened JPEG preview with ImageMagick
convert = ['convert', '-verbose', '-density', '150', '-trim',
           self.fileName, '-quality', '100', '-flatten',
           '-sharpen', '0x1.0', '.load/preview.jpg']
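For readers who want to try this outside the application, the same command can be wrapped in a small standalone helper. This is a sketch under a few assumptions: the function names are mine, ImageMagick's `convert` and Ghostscript are expected to be on the PATH, and failures are simply reported as False.

```python
import shutil
import subprocess
from pathlib import Path


def build_preview_command(pdf_path, out_path, density=150):
    """Return the ImageMagick argument list used to render a preview image."""
    return [
        "convert", "-verbose", "-density", str(density), "-trim",
        str(pdf_path), "-quality", "100", "-flatten",
        "-sharpen", "0x1.0", str(out_path),
    ]


def render_preview(pdf_path, out_path=".load/preview.jpg"):
    """Render pdf_path to out_path; return True on success, False otherwise."""
    if shutil.which("convert") is None:
        return False  # ImageMagick is not installed or not on PATH
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        build_preview_command(pdf_path, out_path),
        capture_output=True, text=True,
    )
    return result.returncode == 0
```

Passing the command as a list rather than a single string avoids shell quoting problems when the file name contains spaces.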
One of the most exciting features of InvoiceX-GUI is automatic field detection from a PDF invoice. For this I am using the invoice2data library, to which I contribute and of which I am an active maintainer. invoice2data extracts text from PDF invoices and, if the text matches any of the existing or specified templates, maps the values it finds to fields such as seller, amount and date. It also supports extracting text from photographs of invoices using Tesseract OCR.
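Roughly, the detection step could look like the sketch below. `extract_data` is invoice2data's extraction function. The mapping helper, the GUI field names and the assumption that the result uses keys such as `issuer`, `amount` and `date` are mine, for illustration only.

```python
import datetime

# Assumed mapping from invoice2data result keys to GUI field names (hypothetical)
FIELD_MAP = {"issuer": "seller", "amount": "amount", "date": "date"}


def extract_invoice_fields(pdf_path):
    """Run invoice2data on a PDF; return a dict, empty if no template matched."""
    from invoice2data import extract_data  # lazy import: needs invoice2data installed

    result = extract_data(pdf_path)  # uses the bundled templates by default
    return result or {}


def map_to_gui_fields(extracted, field_map=FIELD_MAP):
    """Rename extracted keys to GUI field names and make dates printable."""
    fields = {}
    for source_key, gui_key in field_map.items():
        value = extracted.get(source_key)
        if value is None:
            continue  # field not found on this invoice
        if isinstance(value, datetime.datetime):
            value = value.date()
        if isinstance(value, datetime.date):
            value = value.isoformat()
        fields[gui_key] = value
    return fields
```

The GUI would then fill its form from `map_to_gui_fields(extract_invoice_fields(path))` and leave any field that was not detected empty for manual editing.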
Writing code for the GUI is just one part of the project; the other part involves writing tests, setting up Continuous Integration, adding a Makefile and documenting the project.
All these things were new to me, except perhaps writing tests, which I learned while contributing to the invoice2data library. This week was all about setting up Continuous Integration, adding a Makefile and writing tests for the project.