
Synthetic data for research on rare diseases
An interdisciplinary team laid the groundwork for SHARE — a synthetic health data repository for rare diseases — at the second Sandpit workshop of the Wübben Stiftung Wissenschaft in Frankfurt.
The goal is to create a standardized, privacy-compliant, and openly accessible data environment that supports the training of AI models and global health research. For rare diseases, high-quality data is often difficult to access or simply unavailable. Synthetic data is artificially generated and can reflect the statistical properties of real data, which makes it especially valuable for clinical research on rare diseases. Another challenge is comparing already trained AI models globally, since the datasets they were trained on are often not accessible. SHARE aims to close this gap by providing standardized data to researchers worldwide.
Jannik Schaaf and Richard Noll from Goethe University Frankfurt led the group that addressed the technical and ethical challenges of generating and validating synthetic data. Key insights from the workshop included:
- Integrating unstructured electronic patient data is complex, especially when handwritten notes are involved.
- A narrow disease focus facilitates implementation and project validation.
- International interoperability standards such as OMOP, FHIR, and SNOMED CT are essential.
- Robust validation and quality assurance processes are required for synthetic data.
The results of the Sandpit workshop “Building SHARE: the Synthetic Health dAta REpository” provide a solid foundation for addressing key development challenges in building a synthetic health data repository for rare diseases. The SHARE initiative is now moving forward with the next steps toward real-world implementation.
The idea behind the funding format is to invite researchers and relevant stakeholders to work outside their comfort zones in unconventional constellations, generating unorthodox ideas for projects and solutions that tackle highly relevant challenges in today's society.