A whole new (computational) world
Historical linguist Lauren Fonteyn has been awarded a substantial Digital Infrastructure grant and selected as a member of the prestigious Young Academy Leiden (YAL).
It was a great day at the office for Lauren Fonteyn, a historical linguist at the Leiden University Centre for Linguistics. First, she was awarded just under 438,000 EUR for MacBERTh, a two-year interdisciplinary research project. The project aims to create a computational language model for understanding how meanings and concepts are conveyed in historical texts. And if that wasn’t enough, she also received news that she had been selected as a member of the Young Academy Leiden (YAL). This group of early-career academics strives to boost research across disciplines, engage in science and policy education, and encourage public outreach.
“I am beyond happy that we have been awarded the Digital Infrastructure grant. This grant will help us create the first set of deep neural language models pre-trained on historical textual material (Dutch and English) from different time periods,” says Fonteyn. “And I feel very honoured to have been selected as a member for YAL. I’m looking forward to contributing to their efforts of stimulating interdisciplinary research and connecting researchers to society.”
Digital Humanities and Social Sciences
The digitisation of texts has offered researchers a wealth of information to analyse. However, generating this data ‘by hand’ is extremely labour-intensive and time-consuming. This is why a growing number of Humanities and Social Sciences scholars are becoming interested in computational research methods. Unfortunately, many of these researchers do not have easy access to state-of-the-art computational language models. And so Fonteyn and her colleagues started thinking about how existing computational models could be adapted to suit the specific needs of Humanities and Social Sciences researchers.
“I believe that providing access for researchers from Humanities and Social Sciences to models and tools developed in the machine learning community will encourage dialogue and build necessary bridges between different fields and disciplines,” Fonteyn says.
From BERT to MacBERTh
One example of a so-called deep neural computational language model is BERT (Bidirectional Encoder Representations from Transformers), which can capture subtle and complex meanings of words and phrases by compressing the context in which they occur into numeric vectors. “This is particularly exciting, as models such as BERT offer a more objective and data-driven way of approaching texts, and may help avoid (unintended) biases of researchers,” Fonteyn explains. Together with her colleagues, Fonteyn wondered whether they could train BERT so that it could be applied to historical texts and language varieties. The project MacBERTh was born.
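To make the idea of turning context into numeric vectors more concrete, here is a minimal Python sketch. It is not BERT and not the MacBERTh method: it swaps in a much simpler count-based technique, where each occurrence of a word is represented by counts of its neighbouring words. The example sentences and the helper names (`context_vector`, `cosine`, `bank_vector`) are invented for illustration only.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, index, window=2):
    """Count the words within `window` positions of tokens[index]."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    # Neighbours to the left and right, excluding the target word itself.
    return Counter(tokens[lo:index] + tokens[index + 1:hi])


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bank_vector(sentence):
    """Context vector for the first occurrence of 'bank' in a sentence."""
    tokens = sentence.split()
    return context_vector(tokens, tokens.index("bank"))


# Invented toy sentences (assumption: not taken from any historical corpus).
river_1 = bank_vector("we sat on the bank of the river")
river_2 = bank_vector("they walked along the bank of the river")
money = bank_vector("she paid the money into the bank account")

same_sense = cosine(river_1, river_2)
other_sense = cosine(river_1, money)
print(f"river vs river: {same_sense:.2f}")
print(f"river vs money: {other_sense:.2f}")
```

In this toy setting the two ‘riverbank’ uses of bank end up with more similar vectors than the riverbank and financial uses do. A model such as BERT pursues the same goal with learned, dense vectors that take the context on both sides of a word into account, rather than with raw neighbour counts.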
Fonteyn: “With MacBERTh, researchers interested in how meanings and concepts are conveyed in historical texts will be able to address questions like: which words or phrases express that a person is considered ‘an outsider’? Or: which terms does Shakespeare use in Macbeth to refer to the concept of ‘murder’?” As part of the project, an interactive app will also be created, which will allow anyone to visualise and learn how the meanings and usage of English and Dutch words and phrases have changed over time.
Crossing boundaries and connecting to society
These are exciting times for Fonteyn. As a researcher who is passionate about facilitating interdisciplinary research and engaging with wider society, she will definitely be an asset to YAL. Her new research project MacBERTh epitomises these passions as well. With a team of colleagues from various universities and research fields (Linguistics, Literary Studies, Digital Humanities and Social Sciences), she is eager to play a part in crossing boundaries and showing how her research benefits the research community and society as a whole.
Researchers involved in MacBERTh
Co-applicants: Dr Jelena Prokic (Leiden University), Prof Dr Els Stronks (Utrecht University), Dr Lieke Stelling (Utrecht University), Prof Dr Nicoline van der Sijs (Radboud University, INT), Dr Gijsbert Rutten (Leiden University), Dr John Boy (Leiden University), Dr Ju-Sung Lee (Erasmus University Rotterdam), Dr Andreas van Cranenburgh (University of Groningen), and Prof Dr Gertjan van Noord (University of Groningen).
Advisory partners: Prof Dr Antal van den Bosch (Meertens Institute KNAW), Dr Ernst van den Hemel (Meertens Institute KNAW), and Dr Folgert Karsdorp (Meertens Institute KNAW).