A brain, wired and cabled
©SerrNovik
Sandpit
University Medical Center Frankfurt
Life Sciences

Vital data

How can artificially generated data help improve research and treatment of rare diseases?

Rare diseases affect millions in Germany. The available data are often insufficient to allow early identification, research, or treatment. Experts in computer science and medicine have now laid the foundations for the development of a public database at a sandpit event funded by Wübben Stiftung Wissenschaft.

Arrhythmogenic right ventricular cardiomyopathy (ARVC), an inherited disease of the heart muscle, is one of around 7,000 rare diseases that each affect only one in several thousand people. In ARVC, fat and connective tissue grow in the right ventricle in place of muscle cells. This leads to arrhythmia, which carries the risk of sudden cardiac death.

“Patients with rare diseases like ARVC are often not given a correct diagnosis for years. It usually takes five to seven years for their illness to be identified,” says Jannik Schaaf from Goethe University Frankfurt/University Hospital Frankfurt. As a medical information scientist, he is committed to improving data availability so diseases that are hard to identify can be diagnosed faster and studied. Treatments are currently only available for around five to ten percent of these diseases. “Usually, patients are just given treatment to alleviate their symptoms, with the aim of improving their quality of life.”

Most rare diseases are genetic in origin and occur in childhood, with around 20 percent appearing later in life, as a result of a viral infection, for instance. Taken together, people with these diseases are by no means rare: there are around four million such patients in Germany alone. “These case numbers are approaching those of widespread diseases and are placing a great strain on the health system,” says Schaaf. “It is essential that we press ahead with research. And an important basis for this research is data.”

Synthetic data can avoid legal pitfalls

In June 2025, 13 medical, computer science, and economics experts joined patient representatives at a Wübben Stiftung Wissenschaft sandpit workshop to develop a digital platform called SHARE that will make data on rare diseases available to researchers worldwide. SHARE, which stands for “Synthetic Health dAta Repository,” uses synthetic data instead of real patient data.

Synthetic data are generated from real data with the help of AI, and medical staff check and improve the output. This creates artificial datasets that are statistically very similar to real patient data – in terms of symptoms, lab test results, family history, etc. – but do not allow direct inferences to be drawn about individuals. Researchers can use the data to develop AI models to help diagnose rare diseases, for example, or to develop treatments. “With synthetic data you can generate artificial patients with different characteristics and test how they respond to a drug,” says Schaaf.
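
To make the idea of "statistically similar but artificial" records more concrete, here is a minimal sketch in Python. It is not the method planned for SHARE, which the article describes only as AI-based generation with review by medical staff. Instead, it swaps in a much simpler technique: it fits a two-variable normal distribution to a cohort and draws new records from it. The "real" cohort here is randomly generated stand-in data, and the two columns (age and a lab value) and the function names `fit` and `sample` are illustrative assumptions.

```python
import math
import random


def fit(rows):
    """Estimate means, standard deviations and correlation of two columns."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    return mean_x, mean_y, sd_x, sd_y, cov / (sd_x * sd_y)


def sample(params, n, rng):
    """Draw n artificial records from the fitted bivariate normal model."""
    mean_x, mean_y, sd_x, sd_y, rho = params
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + sd_x * z1
        # Mixing z1 into y reproduces the fitted correlation between columns.
        y = mean_y + sd_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        records.append((x, y))
    return records


rng = random.Random(0)

# Stand-in for a real cohort: random (age, lab value) pairs, NOT patient data.
real = []
for _ in range(200):
    age = rng.gauss(40, 12)
    real.append((age, 0.5 * age + rng.gauss(0, 5)))

params = fit(real)
synthetic = sample(params, 1000, rng)

print("fitted on stand-in cohort:", [round(p, 2) for p in params])
print("re-fitted on synthetic:   ", [round(p, 2) for p in fit(synthetic)])
```

The synthetic records share the averages, spread, and correlation of the cohort they were fitted to, yet none of them is a copy of an original record. A toy model like this gives no formal privacy guarantee and captures far less structure than real clinical data would need, which is one reason the approach described above relies on AI models and on checks by medical staff.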

For researchers, obtaining data on rare diseases is often a laborious process. Although there are now data integration centers at all university hospitals, provisioning the data takes a long time because of data protection regulations, and there is often a lack of standardization. By contrast, the artificial data on which SHARE is based will be publicly accessible, anonymized, and standardized for ease of use.

Synthetic data are particularly important for generating statistically relevant cohorts for rare diseases with low case numbers. Patient data from different hospitals and EU countries can be added without the need to spend considerable amounts of time on requests and data processing for every use case. “Synthetic data are ‘enablers’ of research and development and can dramatically speed up our understanding of rare diseases as well as their diagnosis and treatment,” says Benedikt Langenberger of the Digital Health Cluster at the Hasso Plattner Institute in Potsdam, who is helping to develop SHARE.

From ideas to concrete solutions in just three days

Wübben Stiftung Wissenschaft’s interdisciplinary sandpit format offered a chance to think through the SHARE idea from many different angles and produce a design. “Where else do you get the opportunity to concentrate on a topic for three days without distractions and with a wide range of different experts from numerous countries?” Jannik Schaaf asks.

Day one kicked off with two keynote speeches on the topic. Health economist and medical information scientist Andreas Goldschmidt brought the participants up to speed with the current situation. Ruth Biller from the self-help association ARVC-Selbsthilfe, whose daughter died suddenly from ARVC at the age of 14, contributed the perspective of patients and their families. “I am campaigning so that other families can be spared our fate,” she says. “Good data are incredibly important because without them and without patient registries for rare diseases, you can’t have evidence-based medicine.”

Following the keynote speeches, the participants took part in a brainstorming session to identify the main challenges involved in creating a functioning SHARE platform. They then ranked these challenges in order of priority during a World Café session in small groups at separate tables. At the end of this process, four central challenges emerged: defining the aims of SHARE, ensuring data quality, ensuring a user focus, and standardizing the data.

On the second day of the workshop, the participants used interactive design thinking methods to formulate ideas for solutions, flesh them out, and rank them in order of priority. The result was a concrete roadmap for the next steps. Besides producing a technical paper, which has since been published, the main aim is to develop a prototype of the SHARE data repository. The idea is to involve a broad network of experts from research, clinical practice, and ethics, and to consult with patient representatives. Initially, SHARE will only contain data relating to a few rare diseases, including ARVC.

Jannik Schaaf and his team face a mammoth task. “In everyday life, you wouldn’t dare tackle such a complex topic because it’s simply too involved but, thanks to the sandpit, we managed to lay a solid foundation,” says Schaaf. As a medical information scientist, he has already worked on the development of an AI model that helps family physicians reach diagnoses in cases with non-specific symptoms, as part of a research project funded by Germany’s Federal Ministry of Health. A prototype of that model has already been produced.

A core group from the sandpit is now applying for an EU grant to develop a prototype of the SHARE repository. Once all the technical and legal issues have been resolved, they could, for example, set up a company or an alternative organization to maintain the platform and obtain valuable basic data in the long term. “We need to motivate researchers to make their data available so we can use that data to generate synthetic datasets,” says Schaaf. He hopes a large community of people will take shape who recognize the value of the initiative and will upload data of their own accord. “If we succeed in making synthetic data commonplace, there will be noticeable benefits for patients.”

Jannik Schaaf
©IMI Unimedizin Frankfurt

Jannik Schaaf is Professor of Digital Health, focusing on chronic and rare diseases, and Deputy Director of the Institute for Medical Informatics at Goethe University Frankfurt/University Hospital Frankfurt. He heads the research group on Digital Health & Artificial Intelligence.