Research programme
Phaeton
Improving pandemic preparedness through a collaborative privacy-by-design data modelling environment
- Duration: 2024 - 2025
- Contact: Marco Spruit
- Funding: ZonMW Pandemic Preparedness SA 2023 programme
- Partners: TNO, LUMC/PHEG, LIACS
The Phaeton project creates a ready-to-use modelling infrastructure that allows data analysis and modelling experts from around the world to jointly and rapidly create the best-performing models, providing quick, transparent and accurate support to decision makers during a pandemic. We solve data access hurdles through a unique, previously studied and tested privacy-by-design approach that insulates sensitive data from experts yet allows efficient model development.
In addition, we further speed up model development by providing an open-source, free-to-use infrastructure with up-to-date data and models. It includes a collaboration hub for the modelling community and a leaderboard approach with an audit trail for submitting and validating models. This assures that policy makers have instant access to the best models and forecasts, and that experts can learn from and update each other to prevent duplicate work.
Background & Relevance
Instant access to the latest high-quality (epidemiological) models is vital to support decision making by policy makers during pandemics, offering insights into disease transmission, outcome predictions, and intervention effectiveness. Such models guide informed decisions on public health responses such as lockdowns and vaccination programmes. Making efficient use of the community's global knowledge and experience in data analysis and modelling (DAM) will improve the guidance of important decisions during a pandemic, as experienced during the COVID-19 pandemic.
Problem Definition & Objectives
Access to relevant citizen data and health outcomes is regulated by laws such as the GDPR, because health data are amongst the most sensitive data and their reuse should therefore be as secure as possible. Providing DAM experts with synthetic data that realistically represent the spread and structure of the real data (without being so realistic that privacy is compromised) would allow model training without ever needing access to sensitive real data. Moreover, during modelling, a substantial amount of time is lost to data preparation and model validation. Reliable performance comparison is hampered by limited model availability and by implementation differences between models. Model development could be boosted through scientific crowdsourcing, by creating an attractive modelling hub where experts have a ready-to-use environment and are challenged to submit their best models in a standardized and audited leaderboard fashion. In this way, decision makers can more easily select the best model available at that time for the question at hand.
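To make the synthetic data idea concrete, the sketch below fits simple per-column summary statistics to a toy table and samples new rows from them. It is only an illustration under stated assumptions: the function names, the toy records and the choice of independent normal distributions are all hypothetical and are not the method used in Phaeton. Independent sampling preserves the spread of each column but drops the correlations between columns, which a realistic generator would also need to handle.

```python
import random
import statistics

def fit_marginals(rows):
    """Summarise each numeric column by its mean and standard deviation."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]

def sample_synthetic(marginals, n, seed=0):
    """Draw n synthetic rows, each column from its own normal distribution.

    Assumption: columns are treated as independent, so relationships
    between columns in the real data are not reproduced.
    """
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu, sigma in marginals] for _ in range(n)]

# Toy stand-in for sensitive records: (age, days in hospital). Invented values.
real_rows = [(34, 2), (51, 5), (67, 9), (45, 3), (72, 11), (29, 1)]

marginals = fit_marginals(real_rows)          # only these summaries are used
synthetic_rows = sample_synthetic(marginals, n=1000)
```

Only the fitted summaries are passed to the sampler, so no real record is copied into the synthetic table. Whether such summaries are themselves safe to release is a separate privacy question that this sketch does not address.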
Plan of Approach
The infrastructure we propose consists of three main parts: a Model Development & Collaboration hub (MDC-hub), a Trusted Model Validation Environment (TMVE) and a Model Submission and Leaderboard API (MSLAB-API, where API stands for application programming interface). The MDC-hub is a portable computing environment that DAM experts worldwide can use to collaborate, develop their models and perform analyses. This ready-to-use, flexible and reproducible virtual environment contains up-to-date synthetic data, models, standardized performance measures and an MSLAB-API calling service for dependency-packed model submission.
The TMVE is a restricted, supervised environment without internet access that can safely harbour real data and allows verified models to run safely and generate standardized performance reports. The MSLAB-API can be called from the MDC-hub with a new model, invoking an AI-based malicious code checker and alerting the TMVE operator when needed. Models considered safe can be manually transferred to the TMVE and run there under full audit, resulting in a privacy-risk-sanitized, standardized model performance report. This report is transferred to the MSLAB-API and sent back to the expert. The audited model run is added to an online, fully transparent, open leaderboard. Participation in leaderboards can be voluntary or encouraged through incentives such as bounties and sponsorships provided by the government to speed up development.
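The submission flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: all class and function names are hypothetical, the code checker is a trivial keyword filter standing in for the AI-based checker, mean absolute error stands in for the standardized performance measures, and the transfer into the TMVE, which the text describes as manual, is automated here for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Submission:
    expert: str
    model_name: str
    source: str                         # code text shown to the checker
    predict: Callable[[float], float]   # the model itself

@dataclass
class Report:
    model_name: str
    expert: str
    mean_abs_error: float               # aggregate only, no raw records

# Placeholder rules; the real checker is described as AI-based.
SUSPICIOUS = ("import socket", "subprocess", "open(")

def looks_malicious(source: str) -> bool:
    """Stand-in for the malicious code checker."""
    return any(token in source for token in SUSPICIOUS)

class TMVE:
    """Holds the real data; only aggregate reports leave this object."""
    def __init__(self, real_data):
        self._real_data = list(real_data)
        self.audit_log = []

    def run(self, sub: Submission) -> Report:
        errors = [abs(sub.predict(x) - y) for x, y in self._real_data]
        self.audit_log.append(f"ran {sub.model_name} for {sub.expert}")
        return Report(sub.model_name, sub.expert, sum(errors) / len(errors))

class MslabApi:
    def __init__(self, tmve: TMVE):
        self.tmve = tmve
        self.leaderboard = []
        self.operator_alerts = []

    def submit(self, sub: Submission) -> Optional[Report]:
        if looks_malicious(sub.source):
            self.operator_alerts.append(sub.model_name)  # alert the operator
            return None
        report = self.tmve.run(sub)      # manual transfer in the description
        self.leaderboard.append(report)
        self.leaderboard.sort(key=lambda r: r.mean_abs_error)  # best first
        return report                    # sent back to the expert

# Invented toy data standing in for the real data kept inside the TMVE.
tmve = TMVE(real_data=[(1, 2.0), (2, 4.1), (3, 5.9)])
api = MslabApi(tmve)
api.submit(Submission("alice", "linear", "def predict(x): return 2 * x", lambda x: 2 * x))
api.submit(Submission("bob", "constant", "def predict(x): return 4", lambda x: 4.0))
api.submit(Submission("mallory", "leaky", "import socket", lambda x: 0.0))
```

In this toy run the flagged submission never reaches the TMVE, and the two accepted models appear on the leaderboard ordered by error. The expert only ever receives the aggregate report, never the records held by the TMVE.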
The outputs of this project will be delivered as open-source software in public code repositories (i.e. Codeberg) and on publicly available websites.
Impact & Innovation
This project has the ambition to bring together and foster collaboration between DAM experts. It strongly facilitates the transfer of knowledge from DAM experts to regulatory institutes such as RIVM, as well as to policy makers (and transparently to the public), so that we are optimally prepared for future pandemics through faster, more effective, safer and more accurate policies.
This project innovates in (AI-supported) privacy-by-design modelling, in scientific crowdsourcing to accelerate model development, and in lowering the boundaries and increasing transparency between the scientific (modelling) community, regulators and the public.