Data Science Research Programme
PhD candidates in their own words
The Data Science Research Programme at Leiden University combines data science with PhD projects in a wide range of disciplines. The programme has been running for over two years and is now producing its first surprising results. A number of PhD candidates talk about their experiences and research below.
Searching for 27 million patterns in 6,000 tax treaties
PhD candidate Manon Wintgens is using an algorithm to trawl through thousands of international tax treaties. She hopes to detect a system in the dizzying interplay between countries, businesses and documents. It is a unique research project. ‘One of the latest findings from my research is that former colonies generally adopt tax treaties that were entered into by the former coloniser. We don’t yet know why they do this, but recognising the pattern is an important step in itself. My research also shows that many more treaties are signed in certain regions, such as Western Europe, than in Africa. This was already known, but this analysis has confirmed it. It means that you can recognise, and perhaps even predict, the behaviour of the actors: countries, treaties and businesses.’
The patient forum as a goldmine
Patients with a particular disease often know a great deal about it: for instance, previously unknown side effects of a particular medicine, or which diet can help relieve the symptoms. This knowledge is valuable in two ways: it can provide pointers for new research into the disease and how to treat it, and patients can use it to support one another. Neuroscientist Anne Dirkson is working on software that can collect this kind of knowledge. ‘To give some idea of how much work this entails, one of the forums that I am researching consists of 36,277 messages and 1,255,741 words. Ultimately, we want to be able to compare all this knowledge with existing data to see what we already know and what is new.’
Building a bridge between data science and the social and behavioural sciences
What is the best living environment for dementia patients? To answer this question, Daniela Gawehns is using data mining methods to search through different types of data sources. Along the way, her research is building a bridge between two disciplines that are sometimes somewhat wary of each other. ‘What makes this project complex is that you’re combining data that doesn’t really fit together: the observations by social scientists on the one hand, and the “hard data” from the trackers on the other. We hope that this complex data will help us find links between the different data sources. No one has ever done this before, particularly not within the context of a care home.’
A digital eye for archaeologists
Thanks to satellites, elevation measurements and other forms of remote sensing, archaeologists now receive an astonishing number of images, on which they can find previously undiscovered archaeological objects. But so much material is available now that it is impossible to look at it all manually. In his PhD research, Wouter Verschoof-van der Vaart is therefore developing a universal artificial intelligence (AI) system that can find and classify archaeological objects in digital images. ‘The results are very encouraging. The first version of the system detected 80% of our known burial mounds in the Veluwe, and the second almost 90%. I’m going to spend the next two years refining the system, and will make sure that people who aren’t data scientists will also be able to use it.’
Comparing sign languages
There are 130 known sign languages and dialects in the world. It can be difficult for linguists to translate these sign languages, as each language or dialect differs in handshape, hand position, movement size and more. For his PhD project, computer scientist Manolis Fragkiadakis is developing a tool that can compare videos of different sign languages. This would make it possible to detect differences between sign languages and to prevent translation errors. Ultimately, the tool could be used to compare sign languages from all around the world.