The patient forum as a goldmine
Patients with certain diseases discuss their experiences and support one another on specialised internet forums. With the right data-science methods, these forums can be a goldmine of information for researchers. PhD candidate Anne Dirkson is researching these methods.
The Data Science Research Programme at Leiden University combines data science with PhD projects in a wide range of disciplines. The programme has been running for just over two years and is now producing its first striking results. We discuss some of them in this series of articles.
Patients with a particular disease often know a great deal about it. They may know, for instance, about previously unknown side effects of a particular medicine, or which diet can help relieve the symptoms. This knowledge is valuable in two ways: it can provide pointers for new research into the disease and its treatment, and patients can use it to support one another.
Neuroscientist Anne Dirkson, a PhD candidate in the Data Science Research Programme, is developing software that can collect this kind of knowledge: a data-mining system that searches through the discussions on specialist patient forums. She is using several patient forums to develop the software, but her research focuses in particular on a forum for patients with a rare form of tumour, gastrointestinal stromal tumour (GIST). She hopes to create a tool that can search all sorts of forums for this kind of information.
You have been working on this for a year now. What are the initial findings?
‘I’ve almost finished tidying up the data from the forums. That means getting rid of abbreviations, spelling mistakes and colloquialisms, because these kinds of anomalies make it difficult for software to search through text. I have also separated the messages in which patients describe their own experiences from the rest, and removed other types of message, such as messages of support. Over the next year, we are going to build a system that can search through the text and build up the knowledge base. To give some idea of how much work this entails, one of the forums that I am researching consists of 36,277 messages and 1,255,741 words.
‘One challenge will be getting the system to recognise concepts such as a side effect. There are many ways to describe a headache, for instance, and it is crucial for our knowledge base that all these descriptions end up in the category “headache”. Ultimately, once the knowledge base has been built, we also want to compare all the collected knowledge with existing data to see what is already known and what is new. The knowledge base should also give experts some direction for new research, and we want to examine how plausible the directions it points to actually are. There is still a lot to do, but it should be possible.’
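To give a rough idea of the two steps described above (tidying up informal forum language, then grouping different descriptions of a side effect under one concept), here is a minimal Python sketch. The article does not say which methods Dirkson uses, so this is not her system: the lookup tables and the function names `normalise` and `find_concepts` are hypothetical and purely illustrative. A research system would more likely rely on much larger vocabularies or machine-learned models than on a few hand-written dictionary entries.

```python
import re

# Hypothetical, illustrative table of abbreviations and misspellings.
LEXICAL_FIXES = {
    "hdache": "headache",
    "meds": "medication",
}

# Hypothetical, illustrative mapping from free-text phrases to side-effect concepts.
CONCEPTS = {
    "headache": "headache",
    "head is pounding": "headache",
    "splitting head": "headache",
    "migraine": "headache",
    "feel sick": "nausea",
    "nauseous": "nausea",
}


def normalise(text: str) -> str:
    """Lower-case the text, drop punctuation and replace known abbreviations or misspellings word by word."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(LEXICAL_FIXES.get(word, word) for word in words)


def find_concepts(text: str) -> set[str]:
    """Return the side-effect categories whose phrases occur as whole words in the normalised text."""
    clean = normalise(text)
    found = set()
    for phrase, concept in CONCEPTS.items():
        if re.search(rf"\b{re.escape(phrase)}\b", clean):
            found.add(concept)
    return found


if __name__ == "__main__":
    post = "Since starting the new meds my head is pounding and I feel sick"
    print(find_concepts(post))  # both "headache" and "nausea" are detected
```

Two quite different phrasings, “head is pounding” and a misspelt “hdache”, both end up in the same “headache” category, which is the kind of grouping the knowledge base needs.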
Did you know much about data science when you joined the Data Science Research Programme?
‘I have a background in neuroscience, and in that field you do come into contact with big data, for instance when mapping the brain. But I didn’t know much about data-mining methods. The good thing about the Data Science Research Programme is that you come into contact with PhD candidates who are very strong in the theory of data science as well as with PhD candidates who know a lot about a particular field. This has helped me brush up on my knowledge of data science and its fields of application.’
What is the benefit of a programme such as the Data Science Research Programme?
‘Within the academic world, it is always a challenge to find other researchers with research methods that could be of interest to you. The Data Science Research Programme places all the PhD candidates who might be able to work together in one room, which means you quickly get a good idea of possible data-science methods. You also learn a lot from each other when it comes to practical matters, such as solving programming problems.’
The Data Science Research Programme is a University-wide programme that aims to advance data science research and accelerate the use of data science methods at all faculties of Leiden University.