Each year, around 14,000 people in Germany are diagnosed with leukemia. The success of a therapy depends greatly on the timing and accuracy of the diagnosis. A few years ago, the diagnosis of leukemias and lymphomas under the guidelines of the WHO (World Health Organization) was still highly morphology-dependent; today, an increasing number of genetic and molecular genetic markers are used to identify forms of leukemia. For example, the current edition of the WHO classification includes various genetic changes (mutations, gene fusions, and overexpression) that either call for a specific diagnosis or at least describe the clonality of the disease (Swerdlow et al., Blood, 2016; Arber et al., Blood, 2016).
Leukemia occurs in various forms. While certain types of leukemia are highly uniform in their manifestation and molecular profile, other sub-entities show a significantly wider spectrum. It is precisely this diversity that makes both the diagnosis and the choice of the best therapy difficult. At the same time, growing knowledge of the disease-causing changes in leukemia cells increasingly allows these changes to be targeted directly by therapies. One prominent example is the class of tyrosine kinase inhibitors (TKI), which are used highly successfully to treat chronic myeloid leukemia (CML), where they specifically attack the disease-causing target BCR-ABL1. This type of targeted therapy is a prime example of customized medicine (personalized/precision medicine). For this purpose, it is important to know as much as possible about the underlying molecular processes.
This is why the 5,000 genome project was launched at MLL. To gain as much knowledge as possible, we have begun to examine a diverse range of leukemia and lymphoma sub-groups in this project. Our Biobank also enables us to include rare forms of leukemia and lymphoma, so that a very wide spectrum of entities can be covered. We take advantage of high-throughput sequencing and examine both the genome (WGS, Whole Genome Sequencing) and the transcriptome (RNA-Seq) of a patient in order to obtain as much genetic information as possible. By combining WGS and RNA-Seq, we not only validate the variants found on both levels but also investigate whether the mutations found are transcribed and expressed and whether the translocations found lead to a fusion transcript. Furthermore, we attempt to correlate the genotype with the expression profile in order to find out more about genetic changes and their impact on the cell. For example, which changes in the transcriptome do patients with mutations in one of the splicing genes exhibit?
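The comparison of the two levels can be illustrated with a minimal sketch. It assumes that variant calls from WGS and RNA-Seq have already been reduced to (chromosome, position, reference allele, alternate allele) records; the names `Variant` and `cross_validate_variants` and the example coordinates are hypothetical placeholders and do not describe the actual MLL analysis pipeline.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (an assumption for this sketch)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def cross_validate_variants(wgs, rna):
    """Split variant calls by the level(s) on which they were observed."""
    wgs_set, rna_set = set(wgs), set(rna)
    return {
        "both": wgs_set & rna_set,      # seen in genome and transcriptome
        "dna_only": wgs_set - rna_set,  # seen in the genome only
        "rna_only": rna_set - wgs_set,  # seen in the transcriptome only
    }


# Made-up placeholder calls, not real patient data.
wgs_calls = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 2000, "C", "T")]
rna_calls = [Variant("chr1", 1000, "A", "G"), Variant("chr3", 3000, "G", "A")]

result = cross_validate_variants(wgs_calls, rna_calls)
```

In this sketch, a variant in the "both" group is confirmed on both levels and is evidently transcribed. A variant in the "dna_only" group would need a closer look, since its absence from the RNA-Seq calls could mean that the allele is not expressed or simply that the region was not sufficiently covered.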
Classifiers that predict a diagnosis can be trained on both DNA and RNA profiles, and combining the two profiles may improve the prediction even further. The analysis of expression profiles makes it possible to identify altered cellular pathways. Based on this information, networks can be built that allow conclusions to be drawn about the function of the cell. These networks can also be used to model in silico the potential effects and success of possible therapeutic interventions.
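The idea of combining both profiles for classification can be sketched as follows. The text does not name a specific classifier, so a simple nearest-centroid classifier is used here purely as a stand-in. The feature layout (binary mutation flags followed by normalized expression values), the labels `entity_A` and `entity_B`, and all numbers are assumptions made up for illustration.

```python
import math


def combine(dna_profile, rna_profile):
    """Concatenate DNA and RNA features into one feature vector."""
    return list(dna_profile) + list(rna_profile)


def train_centroids(samples, labels):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for vec, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Made-up toy data: two mutation flags (DNA) and two expression values (RNA).
training = [
    combine([1, 0], [0.9, 0.1]),
    combine([1, 0], [0.8, 0.2]),
    combine([0, 1], [0.1, 0.9]),
    combine([0, 1], [0.2, 0.8]),
]
labels = ["entity_A", "entity_A", "entity_B", "entity_B"]

centroids = train_centroids(training, labels)
prediction = predict(centroids, combine([1, 0], [0.7, 0.3]))
```

Concatenating the feature vectors is only one possible way to combine the two data types. In practice, features on different scales would usually be normalized before a distance-based method is applied.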
In addition to the newly obtained genetic information, we at MLL can characterize patients thoroughly using data from routine diagnostics: morphology, immunophenotyping, chromosome analysis, and mutation analyses are examined in relation to one another (orthogonal comparison). Furthermore, we have follow-up information and clinical data, which give us insight into the progression of the disease in each individual patient. Combining all of these data allows significantly more comprehensive risk stratification and prognoses to be calculated.
We process all of the genetic information obtained, both the genome data and the transcriptome data, with the latest analysis methods. Artificial intelligence and sufficiently large data processing capacities (cloud computing) are what make it possible for us to perform these comprehensive analyses today.
Because we are unable to cover all aspects of the various forms of leukemia equally well with our own research projects, and because the data generated contain a wide range of different types of information, we make it possible for research groups worldwide to work on the data from the 5,000 genome project together with us here at MLL. These research groups send individual scientists to discuss the potential of various research ideas and the utility of our data for these projects, and then to work on the projects together with us. To ensure the security of our data in accordance with the new EU General Data Protection Regulation (GDPR), the data never leave our secure storage location. The scientists work on the data exclusively from within MLL (Data management/storage) to examine and answer their various research questions. Thanks to this controlled access, we are able to make the 5,000 sequenced cases available to the hematological research community and learn as much as possible about leukemia and its mechanisms.
Swerdlow et al., Blood. 2016 May 19;127(20):2375–90.
Arber et al., Blood. 2016 May 19;127(20):2391–405.