'Biologists also need to be a bit of a data analyst'
Biologists today have to be able to work with big data. Data analysis skills should be taught from the start of the degree programme or, even better, in secondary school. This is the message of Vera van Noort, the new Professor of Computational Biology. Inaugural lecture 22 January.
Biologists have always been fascinated by living things, and research on the structure of living organisms has always been central to their work. 'In 17th-century botany, the main objective was to determine the multiplicity of biological structures so that plants could be categorised into groups and genera,' Professor Van Noort explains by way of illustration. 'In the 20th and 21st centuries, biology is more about understanding mechanisms and functions.' Biological structures, such as the three-dimensional structures of proteins in cells, are still analysed. 'But this is mainly to discover how these proteins work and the system in which they work.'
Studying the biological system
This kind of research uses large amounts of data. To study a biological system, researchers measure the presence of particular chemical substances, such as proteins in a cell. 'We no longer measure the presence or absence of a single protein in the cell, but of thousands of proteins at the same time, as well as the concentration of each of these proteins. We call this proteomics, and it generates enormous amounts of data.' The same applies to genomics, which produces huge numbers of DNA sequences. Microscopy is not lagging behind either, producing enormous numbers of high-resolution images. 'All this means that biology has become a big data science.'
Data analysis
Biologists today have to be able to handle all this data: they need to be able to structure data, use databases skilfully, apply the right statistical modelling techniques and draw the right conclusions, even when the data is incomplete or noisy. 'You have to understand where correlations come from and whether there could be problems with the underlying data that might lead to the wrong conclusions. In short, a biologist needs to have the same skills as a data analyst,' Van Noort posits.
Learning skills
You can't learn these skills from a single course in biostatistics as part of your Biology programme, Van Noort warns. She compares it to playing the piano. A year of piano lessons in your childhood is not enough to play the piano beautifully for the rest of your life. You have to keep practising, even if you don't intend to become a composer or a concert pianist. 'In the same way, I am convinced that we have to teach data analysis skills from the start of the degree programme, and preferably from primary or secondary school, and keep these skills up to date, even if we don't want to become mathematicians or statisticians. The earlier we start, the better we will be. The more we practise, the more natural it will become.'
Specialist in biology
But does this mean that any data analyst can do biological research, making biologists redundant? Not at all, Van Noort reassures us. 'Without sound knowledge of biological systems you can't come up with research questions in biology. Simply amassing random data and applying a model to it is not useful. As biologists we have to be specialists in biological knowledge, and we also need particular skills from other disciplines.'