Our multidisciplinary team focuses on the state-of-the-art generation and in-depth evaluation of synthetic health data.
With the rise of ChatGPT and DALL-E, generative AI has gained widespread recognition. Synthetic data are artificially created data that are generated from an original dataset and modelled to replicate its characteristics, structure, and dependencies.
Medical data are sensitive, so access to them is tightly regulated, which may limit innovation and research. Because synthetic data can sever the direct link to the original records on which they are based, they can be opened up to the broader scientific community and to practitioners. Synthetic data therefore have the potential to advance medical research and improve patient outcomes while addressing the privacy and ethical concerns associated with real medical data.
Furthermore, in the era of deep learning, large amounts of data are increasingly important, yet the number of samples available for a specific medical problem is often limited. Synthetic data augmentation may help to overcome this limitation for small-sample and multi-party datasets. Synthetic data may also help to address unbalanced and biased datasets, leading to more robust models.
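As a rough illustration of how synthetic samples might rebalance a dataset, the sketch below creates new minority-class rows by interpolating between random pairs of existing ones. This is a simplified scheme in the spirit of SMOTE (which interpolates towards nearest neighbours rather than random partners), not a method described by the team. The function name `oversample_minority` and all values are hypothetical.

```python
import random

def oversample_minority(minority, n_new, rng):
    """Create n_new synthetic rows by interpolating between random pairs of minority rows."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)      # two distinct existing rows
        t = rng.random()                    # interpolation weight in [0, 1)
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

rng = random.Random(42)

# Made-up toy data: (age, systolic blood pressure) for a rare condition.
minority = [(61.0, 148.0), (67.0, 155.0), (58.0, 142.0), (72.0, 160.0)]
majority_size = 20  # assumed size of the majority class

new_rows = oversample_minority(minority, majority_size - len(minority), rng)
balanced_minority = minority + new_rows
print(len(balanced_minority), "minority rows after augmentation")
```

Because every new row lies on a line segment between two real rows, this approach stays within the observed range of each feature. It offers no privacy guarantee, so it suits augmentation rather than data sharing.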
Finally, synthetic medical data can be used to create cohorts of simulated patients for educational purposes and to build synthetic control arms for interventional clinical trials.
Synthetic data can be generated with a broad spectrum of methods, ranging from statistical techniques to recent machine learning techniques such as generative adversarial networks (GANs), variational autoencoders (VAEs), normalizing flows, and diffusion models.
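At the simple, statistical end of that spectrum, one can fit a parametric distribution to the original data and sample new records from it. The sketch below does this for two variables with a bivariate Gaussian, so that the synthetic rows reproduce the means, spreads, and correlation of the original without copying any record. It is a minimal example under stated assumptions: the "original" data are themselves simulated, and the names `fit_bivariate_gaussian` and `sample_synthetic` are hypothetical.

```python
import math
import random
import statistics

def fit_bivariate_gaussian(rows):
    """Estimate means, standard deviations, and the correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)

def sample_synthetic(params, n, rng):
    """Draw n synthetic rows from the fitted bivariate Gaussian."""
    mx, my, sx, sy, rho = params
    residual = math.sqrt(max(0.0, 1.0 - rho ** 2))  # guard against rounding error
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + residual * z2)    # induces correlation rho with x
        rows.append((x, y))
    return rows

rng = random.Random(0)

# Stand-in for a real dataset: made-up (age, systolic blood pressure) pairs.
original = []
for _ in range(500):
    age = rng.gauss(55, 12)
    original.append((age, 100 + 0.5 * age + rng.gauss(0, 8)))

params = fit_bivariate_gaussian(original)
synthetic = sample_synthetic(params, 5000, rng)

rho_original = params[4]
rho_synthetic = fit_bivariate_gaussian(synthetic)[4]
print(f"correlation: original {rho_original:.2f}, synthetic {rho_synthetic:.2f}")
```

A Gaussian captures only linear dependencies between continuous variables. Mixed data types and non-linear structure are where the machine learning models listed above are typically considered.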