Data science at the Netherlands Tax and Customs Administration
How can data science improve tax administration? Mark Pijnenburg, a senior data scientist at the Netherlands Tax and Customs Administration, decided to investigate this in a dissertation. He talks about his experiences: 'Sometimes a technique is scientifically interesting, but not applicable in reality.'
Doubled workload
Over the past twenty years, the amount of work for the Tax and Customs Administration has doubled, while the number of employees - some 30,000 - has remained roughly the same. ‘So we have to work more efficiently,’ says Mark Pijnenburg, who recently obtained his PhD at the Leiden Institute of Advanced Computer Science (LIACS). ‘Data science can help us do that.’
Conflicts of interest
Doing research at the place where you work has its advantages and disadvantages, says Pijnenburg. One disadvantage is that you can be seen as a butcher who inspects his own meat. ‘I think that as an internal researcher it is more difficult to investigate sensitive issues such as discrimination. That stirs up a lot of emotion, both internally and externally, and from the outside people might think there is a conflict of interest. So I didn't do any research into discrimination but looked at the technical side of the data analysis.’
On the other hand, it creates new opportunities, says the data scientist. ‘This doctoral research gave me the opportunity to investigate new methods at the Tax and Customs Administration, something I would normally never have had time for. In addition, my co-supervisor Wojtek Kowalczyk gave me useful tips, for example about finding specialist literature. Both aspects have improved my work at the Tax and Customs Administration.’
Putting science into practice
In addition, Pijnenburg was able to bring his practical experiences to science, resulting in cross-pollination. ‘At conferences, many people found my practical experiences at the Tax and Customs Administration interesting. That resulted in great conversations from which I received valuable input.’
Pijnenburg also made an impact within his own organisation. ‘The management team of the small and medium-sized business unit, which employs more than 8,000 people, read one of my articles. I thought it was great that my work was also appreciated at that level of the organisation.’
A grey area
However, not all of Pijnenburg's insights were adopted by the Tax and Customs Administration. ‘For example, I carried out research into so-called factorisation machines, which can make a good contribution to our risk models. We use specific information, such as postal codes and industry. The combination of restaurant and industrial site, for example, would be suspicious. My research showed that this method worked quite well, but the management team decided not to apply it in practice. Postal codes can also be related to socio-economic background, and then it soon becomes a grey area.’
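For readers unfamiliar with the technique, the sketch below shows how a second-order factorisation machine scores a combination of two categorical features. It is a generic textbook illustration, not the model studied in the dissertation or anything used by the Tax and Customs Administration. The category names, the latent dimension `k` and the randomly drawn parameters are all assumptions made for the example; a real model would learn `w0`, `w` and `V` from data.

```python
import numpy as np


def fm_score(x, w0, w, V):
    """Second-order factorisation machine score for one feature vector x.

    score = w0 + sum_i w[i] x[i] + sum_{i<j} <V[i], V[j]> x[i] x[j]
    """
    linear = w0 + w @ x
    # Pairwise term via the standard linear-time identity:
    # sum_{i<j} <V[i], V[j]> x_i x_j
    #   = 0.5 * sum_f ((V^T x)_f ** 2 - ((V ** 2)^T (x ** 2))_f)
    xv = V.T @ x
    pairwise = 0.5 * np.sum(xv ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise


def one_hot(area, industry, areas, industries):
    """Encode one (area type, industry) pair as a concatenated one-hot vector."""
    x = np.zeros(len(areas) + len(industries))
    x[areas.index(area)] = 1.0
    x[len(areas) + industries.index(industry)] = 1.0
    return x


# Hypothetical categories, chosen only to mirror the example in the text.
AREAS = ["residential", "industrial_site", "city_centre"]
INDUSTRIES = ["restaurant", "wholesale"]

# Untrained, randomly initialised parameters (assumption for illustration).
rng = np.random.default_rng(0)
n_features = len(AREAS) + len(INDUSTRIES)
k = 2  # size of each feature's latent vector
w0 = 0.0
w = rng.normal(scale=0.1, size=n_features)
V = rng.normal(scale=0.1, size=(n_features, k))

x = one_hot("industrial_site", "restaurant", AREAS, INDUSTRIES)
score = fm_score(x, w0, w, V)
```

The point of the design is that the interaction between "industrial_site" and "restaurant" is not a separate weight for every possible pair but the inner product of two short latent vectors, so the model can estimate interactions even for combinations that rarely occur together in the data. That same property is why a feature such as postal code can carry indirect information about other characteristics.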