Using statistics to prevent the loss of blood donors
The Sanquin blood bank gathers data on every donation. Around 720,000 donations are made every year. ‘That generates a mountain of highly valuable data,’ says Leiden PhD candidate Marieke Vinkenoog.
To be able to extract useful information from the data, Sanquin has joined Leiden University's Data Science Research Programme, an interfaculty research programme combining specialist knowledge with data science.
‘The aim of my PhD research is to use data science to make blood banks more efficient,’ explains Vinkenoog, who has been working with Sanquin for the past year via the Data Science Research Programme. ‘At the moment I’m looking at the measurements of the haemoglobin levels of donors.’ These levels are measured prior to each donation.
Haemoglobin levels
There has to be a sufficient level of the protein haemoglobin in the blood to transport oxygen throughout the body. When a donor gives blood, about half a litre of blood is taken, and the haemoglobin it contains goes with it. If a donor's haemoglobin level is already low before the donation, it could be too low afterwards. Donating blood would then be bad for the donor's health, so they are not allowed to give blood on that occasion. They can return after three months to try again.
‘Around six per cent of women and three per cent of men are sent home for this reason without giving blood,’ Vinkenoog explains. ‘That’s inefficient for the blood banks. It costs them time, and they don’t receive any blood in return. Not only that, but it’s also demotivating for the donor.’ People who are unable to donate blood for this reason often don’t come back to the blood bank after being refused two or three times. This means that Sanquin loses donors.
Predictive modelling
A haemoglobin level that is too low to allow a donor to give blood can be caused by diet and lifestyle. After giving blood, it takes a number of weeks before the level is restored. How quickly that happens varies from person to person. Sanquin has therefore been working for a number of years with models that predict how often it can call on donors without causing their haemoglobin levels to fall too low, which could result in a fruitless trip to the blood bank. ‘Traditional statistical models were always used,’ Vinkenoog explains. ‘These models work best with structured data, where a person's haemoglobin level is measured regularly, possibly weekly. But Sanquin’s data come from the real world, where data collection can be rather irregular: sometimes a person will come and give blood again after three months, or maybe after two years. That makes it difficult to construct a predictive model.’
For some loyal donors, who have been giving blood for maybe ten years, there may be no regular measurements, but nonetheless there are a lot of measurements. Vinkenoog also makes use of these data. ‘I’m hoping to discover a predictable trend in the data using modern machine learning techniques. These can be trained to recognise relationships in large amounts of data.’
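To make the problem of irregular measurements concrete, the sketch below shows one common way to prepare such data for a machine learning model: each visit becomes a row that records how long ago the previous visit was, alongside summaries of the donor's earlier measurements. This is only an illustration and not the method used by Vinkenoog or Sanquin; the function name `interval_features`, the feature names, the units and the sample values are all hypothetical.

```python
from datetime import date


def interval_features(visits):
    """Turn one donor's irregular visit history into per-visit feature rows.

    visits: list of (date, haemoglobin level) tuples, sorted by date.
    Units and feature names are assumptions made for this illustration.
    """
    rows = []
    # Start at the second visit: the first one has no history to learn from.
    for i in range(1, len(visits)):
        prev_day, prev_hb = visits[i - 1]
        day, hb = visits[i]
        past_levels = [level for _, level in visits[:i]]
        rows.append({
            # The gap between visits is an explicit feature, so a return
            # after three months and one after two years are treated differently.
            "days_since_previous": (day - prev_day).days,
            "previous_hb": prev_hb,
            "mean_past_hb": sum(past_levels) / len(past_levels),
            "n_past_visits": i,
            # The value a model would be trained to predict.
            "target_hb": hb,
        })
    return rows


# Made-up measurements for a single hypothetical donor.
history = [
    (date(2015, 1, 10), 8.9),
    (date(2015, 4, 14), 8.6),
    (date(2017, 4, 20), 9.1),
]
features = interval_features(history)
for row in features:
    print(row)
```

Because the time between visits is a feature in its own right, a model trained on rows like these does not need the measurements to be evenly spaced.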
If Vinkenoog develops a model for the haemoglobin level within the coming year, she will be able to explore other ways of personalising blood donation so that it fits better in the life of the donor. ‘But for now I have my hands full with the haemoglobin data.’
Text: Dorine Schenk