GSoC 2018: Final Report



This is the final report of my Google Summer of Code 2018 project; it also serves as my final code submission.

For the last three months, I have been working with Debian on the project "Extracting Data from PDF Invoices and Bills Details". More information about the project is available here:

https://wiki.debian.org/SummerOfCode2018/Projects/ExtractingDataFromPDFInvoicesAndBillsDetails

My mentor and I agreed to modify the scope of the summer's work, as discussed here: http://blog.harshitjoshi.in/2018/05/gsoc-2018-debian-community-bonding.html

The goal was to advance the ecosystem for machine-readable invoice exchange and make it easily accessible to the whole Python community through the following contributions:
  • A Python library to read, write, add and edit Factur-X metadata in different XML flavors.
  • A command-line interface to process PDF files and access the main library functions.
  • A way to add structured data to existing files or from legacy accounting systems (via the invoice2data project).
  • A new desktop GUI to add, edit, import and export Factur-X metadata into and out of PDF files.

Short overview

The project work can be divided into two parts:
  • Main deliverable: creating a GUI for the Factur-X library
  • Pre-requisites for the main deliverable: improving the invoice2data library and bringing the Factur-X library to a working state

Contributions to invoice2data

invoice2data is a modular Python library to support your accounting process, tested on Python 2.7 and 3.4+. Its main steps are:
  1. Extract text from PDF files using different techniques, such as pdftotext, pdfminer or Tesseract OCR.
  2. Search the extracted text with regular expressions, using a YAML-based template system.
  3. Save the results as CSV, JSON or XML, or rename the PDF files to match their content.
My contributions: https://github.com/invoice-x/invoice2data/commits?author=duskybomb
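Steps 2 and 3 can be illustrated with a small standard-library sketch. This is not invoice2data's own code (its README shows the library being called as `extract_data('path/to/my/file.pdf')`); it only mimics the idea, assuming a template that has already been loaded from YAML into a dictionary with `issuer`, `keywords` and `fields` keys. The template values and the sample text are made up for illustration.

```python
import json
import re

# Hypothetical template, shaped like a parsed YAML template file.
TEMPLATE = {
    "issuer": "Example Telecom",
    "keywords": ["Example Telecom", "Invoice"],
    "fields": {
        "invoice_number": r"Invoice No\.\s+(\w+)",
        "date": r"Date:\s+(\d{4}-\d{2}-\d{2})",
        "amount": r"Total:\s+([\d.]+)",
    },
}


def matches(template, text):
    """A template applies only if all of its keywords occur in the text."""
    return all(keyword in text for keyword in template["keywords"])


def extract_fields(template, text):
    """Run each field regex over the text and keep its first capture group."""
    result = {"issuer": template["issuer"]}
    for name, pattern in template["fields"].items():
        match = re.search(pattern, text)
        if match:
            result[name] = match.group(1)
    return result


# Made-up text, standing in for the output of step 1 (e.g. pdftotext).
sample_text = """Example Telecom
Invoice No. INV042
Date: 2018-08-01
Total: 49.90 EUR"""

if matches(TEMPLATE, sample_text):
    # Step 3: save the extracted fields, here as JSON.
    print(json.dumps(extract_fields(TEMPLATE, sample_text), indent=2))
```

The sketch keeps every value as a string; the real library can also parse dates and amounts according to template options, which is left out here.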

Contributions to Factur-X

Factur-X is a Franco-German standard, compliant with the European norm EN 16931, for embedding XML representations of invoices in PDF files. This library provides an interface for reading, editing and saving this metadata.
My contributions: https://github.com/invoice-x/factur-x-ng/commits?author=duskybomb
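Factur-X stores the invoice as an XML file in the UN/CEFACT Cross Industry Invoice (CII) syntax, attached to the PDF. The sketch below is not the factur-x-ng API; it only illustrates, with Python's standard ElementTree module, what reading and editing that metadata amounts to once the XML has been taken out of the PDF. The sample document is made up and heavily trimmed, and the `FIELDS` mapping and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces of the Cross Industry Invoice (CII) syntax used by Factur-X.
NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:"
           "ReusableAggregateBusinessInformationEntity:100",
    "udt": "urn:un:unece:uncefact:data:standard:UnqualifiedDataType:100",
}

# Keep the familiar prefixes when the tree is serialized again.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# Made-up, heavily trimmed invoice: a real factur-x.xml has many more elements.
SAMPLE_XML = """<rsm:CrossIndustryInvoice
  xmlns:rsm="urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"
  xmlns:ram="urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100"
  xmlns:udt="urn:un:unece:uncefact:data:standard:UnqualifiedDataType:100">
  <rsm:ExchangedDocument>
    <ram:ID>INV042</ram:ID>
    <ram:IssueDateTime>
      <udt:DateTimeString format="102">20180801</udt:DateTimeString>
    </ram:IssueDateTime>
  </rsm:ExchangedDocument>
</rsm:CrossIndustryInvoice>"""

# Hypothetical field names mapped to paths relative to the root element.
FIELDS = {
    "invoice_number": "rsm:ExchangedDocument/ram:ID",
    "issue_date": "rsm:ExchangedDocument/ram:IssueDateTime/udt:DateTimeString",
}


def read_field(root, name):
    """Return the text of a field, or None if the element is absent."""
    node = root.find(FIELDS[name], NS)
    return None if node is None else node.text


def write_field(root, name, value):
    """Replace the text of an existing field element."""
    node = root.find(FIELDS[name], NS)
    if node is None:
        raise KeyError("field not present in XML: " + name)
    node.text = value


root = ET.fromstring(SAMPLE_XML)
print(read_field(root, "invoice_number"))  # INV042
write_field(root, "invoice_number", "INV043")
print(ET.tostring(root, encoding="unicode"))
```

Saving the edited metadata would then mean re-embedding the serialized XML into the PDF, which needs a PDF library and is outside this sketch.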

Organisation Page

An organisation, invoice-x, was created on GitHub to bring all the repositories together in one place.
Link to the organisation page: https://github.com/invoice-x/

Organisation Website

A static website briefly explaining the whole project. Link to website: https://www.invoice-x.org/

Main Deliverable Repository

This repository contains the code for the GUI of the Factur-X library. Link to the repository: https://github.com/invoice-x/invoicex-gui

invoicex-gui: invoice2data integration with invoicex-gui and factur-x-ng

Overview

Pre-requisites for Main Deliverable

Factur-X

Before working on the GUI for Factur-X, I first needed to bring the Factur-X library to a working state. My mentor, Manuel, did the initial refactoring of the project after forking the original repository, https://github.com/akretion/factur-x.

Since then I have added a few features to the library:
  • Fixed checking of embedded resources
  • Converted the documentation format from md to rst
  • Added unit tests for Factur-X
  • Added a new feature to export metadata in JSON and YAML formats
  • Cleaned XML template to add
  • Added validation of country and currency codes against the ISO standards (ISO 3166 and ISO 4217)
  • Implemented command-line options
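The JSON export and the ISO code validation can be pictured together in a short sketch. It is not the factur-x-ng implementation: the metadata keys (`seller_country`, `currency`) are hypothetical, and the code sets are tiny illustrative subsets, whereas a real validator needs the complete ISO 3166-1 alpha-2 and ISO 4217 lists (for example from a package such as pycountry).

```python
import json

# Illustrative subsets only; real validation needs the full ISO tables.
ISO_3166_ALPHA2_SAMPLE = {"DE", "FR", "IN", "US"}
ISO_4217_SAMPLE = {"EUR", "INR", "USD"}


def validate_codes(metadata):
    """Return a list of problems with the country and currency codes."""
    errors = []
    country = metadata.get("seller_country")
    if country not in ISO_3166_ALPHA2_SAMPLE:
        errors.append("unknown country code: %r" % country)
    currency = metadata.get("currency")
    if currency not in ISO_4217_SAMPLE:
        errors.append("unknown currency code: %r" % currency)
    return errors


def export_json(metadata):
    """Serialize the metadata as JSON, refusing invalid codes."""
    errors = validate_codes(metadata)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps(metadata, indent=2, sort_keys=True)


metadata = {"invoice_number": "INV042", "seller_country": "FR", "currency": "EUR"}
print(export_json(metadata))
```

A YAML export could follow the same pattern, passing the validated dictionary to PyYAML's `yaml.safe_dump` instead of `json.dumps`.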

Invoice2data

I started contributing to invoice2data in February, and it became the first open source project I contributed to. My first contribution was just a typo fix in the documentation, but it introduced me to the world of Free and Open Source Software (FOSS).

Since being selected for Google Summer of Code 2018, I have made the following contributions:
  • Removed required fields in favour of more flexible data extraction
  • Added a feature to extract all fields mentioned in a template
  • Updated the README and worked on converting it from md to rst
  • Added checks for the dependencies Tesseract and ImageMagick
  • Changed subprocess input from a plain string to a list
  • Added more tests and checked coverage locally
  • Fixed the way invoice2data handles lists
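The dependency check and the switch to list-based subprocess calls can be sketched as follows. This is an illustration, not the code merged into invoice2data: the helper names are hypothetical, while `-layout`, `-enc` and the trailing `-` (write to stdout) are standard pdftotext options. The point of the list form is that each element reaches the program as exactly one argument, so file names with spaces need no quoting and no shell is involved.

```python
import shutil
import subprocess


def missing_dependencies(programs):
    """Return the external programs that cannot be found on PATH."""
    return [name for name in programs if shutil.which(name) is None]


def pdftotext_command(pdf_path):
    """Build the pdftotext call as an argument list, not a shell string."""
    # -layout keeps the physical layout; "-" sends the text to stdout.
    return ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, "-"]


def pdf_to_text(pdf_path):
    """Extract text from a PDF, failing early if pdftotext is missing."""
    missing = missing_dependencies(["pdftotext"])
    if missing:
        raise RuntimeError("missing dependencies: " + ", ".join(missing))
    completed = subprocess.run(
        pdftotext_command(pdf_path), stdout=subprocess.PIPE, check=True
    )
    return completed.stdout.decode("utf-8")


# The OCR path needs Tesseract and ImageMagick ("convert") on PATH.
print(missing_dependencies(["tesseract", "convert"]))
```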

Main Deliverable

Invoicex-GUI

My main deliverable was a graphical user interface for the Factur-X library, which I built with the PyQt5 framework. The other options were Kivy and wxWidgets; my prior experience with PyQt5, together with a Kivy bug related to Debian's touchpad driver, led me to choose PyQt5.

Making the GUI was one of the most challenging parts of the GSoC project, and the sparse documentation for PyQt5 did not help. I have three years of experience with C++, which I used to learn more about PyQt5 through the original Qt documentation, written for C++.

The GUI includes:
  • Selecting a PDF and searching it for any embedded standard
  • If no standard is found, showing a pop-up to select the standard to be added
  • Editing the metadata of an existing embedded standard
  • Exporting metadata
  • Validating metadata
  • Using invoice2data to extract field data from an invoice
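The first two GUI steps boil down to a small decision that can be kept apart from the Qt widgets. The sketch below is an assumption about how such logic could look, not the invoicex-gui code: it takes the names of the files attached to the PDF (which a PDF library would supply), maps the attachment names commonly used by Factur-X and ZUGFeRD to a flavor, and returns a hypothetical step name that the GUI would translate into either the "choose a standard" pop-up or the metadata editor.

```python
# Hypothetical mapping from embedded attachment name to XML flavor.
KNOWN_ATTACHMENTS = {
    "factur-x.xml": "factur-x",
    "zugferd-invoice.xml": "zugferd",
    "ZUGFeRD-invoice.xml": "zugferd",
}


def detect_standard(attachment_names):
    """Return the flavor of the first recognised attachment, else None."""
    for name in attachment_names:
        if name in KNOWN_ATTACHMENTS:
            return KNOWN_ATTACHMENTS[name]
    return None


def next_step(attachment_names):
    """Decide what the GUI should show after a PDF has been opened."""
    flavor = detect_standard(attachment_names)
    if flavor is None:
        # No embedded standard: show the pop-up to choose one to add.
        return ("ask_standard", None)
    # Embedded standard found: open the metadata editor for it.
    return ("edit_metadata", flavor)


print(next_step(["factur-x.xml"]))  # ('edit_metadata', 'factur-x')
print(next_step([]))                # ('ask_standard', None)
```

Keeping this logic free of Qt makes it testable without a display; the Qt layer then only maps each step name to a dialog or window.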

Weekly Work Done

https://lists.debian.org/debian-outreach/2018/05/msg00015.html (week 1)
https://lists.debian.org/debian-outreach/2018/05/msg00029.html (week 2)
https://lists.debian.org/debian-outreach/2018/06/msg00003.html (week 3)
https://lists.debian.org/debian-outreach/2018/06/msg00029.html (week 4)
https://lists.debian.org/debian-outreach/2018/06/msg00078.html (week 5)
https://lists.debian.org/debian-outreach/2018/06/msg00106.html (week 6)
https://lists.debian.org/debian-outreach/2018/06/msg00136.html (week 7)
https://lists.debian.org/debian-outreach/2018/07/msg00019.html (week 8)
https://lists.debian.org/debian-outreach/2018/07/msg00072.html (week 9, 10)
https://lists.debian.org/debian-outreach/2018/07/msg00105.html (week 11)
https://lists.debian.org/debian-outreach/2018/08/msg00011.html (week 12) 
