Research

Sharing knowledge.
Advancing science.

Innovation
Diagnostics of the future
5,000 genome project

Publications
Biobank
Cooperations

We are scientists and doctors. We engage in a large number of scientific projects and international collaborations that contribute to the advancement of knowledge in the diagnosis and prognosis of leukemias and lymphomas. Moreover, we promptly develop and implement the latest technical innovations, including test procedures, devices and individual tests. In doing so, we are significant drivers of progress in laboratory diagnostics and routines.

Research Report 2023

Innovation - shaping the future

Implementation on the cutting edge

Steady innovation, the implementation of the latest scientific insights, devices and methods – as well as their scalability – are key to rapid advancement in diagnostics. Our clear scientific focus and a broad range of international collaborations help us drive this outstanding innovation at the necessary speed and in the required quality.

Automation - Making processes reproducible

Modern diagnostics are predicated on standard operating procedures (SOPs), i.e. reproducible and automated process steps, on state-of-the-art technologies, and on thoroughly trained and highly educated employees. MLL has automated and standardized its workflows to the greatest possible extent, and devices are controlled by a central software platform. All samples are barcoded for processing. Our proprietary laboratory information and management system (LIMS) controls laboratory processes, integrates external software solutions and optimizes all of our procedures.

Exploiting digitization effectively

Our team and its equipment work with comprehensive software solutions and maximum automation, supported by artificial intelligence wherever useful, to deliver precisely the right outcome for your individual requirements. Blood and bone marrow samples are processed fully automatically, software-driven and barcode-controlled, by state-of-the-art robots and measurement systems. A central software structure controls this process, so findings are shared directly between devices and staff during operation and sample analysis.

Diagnostics of the future

Today, we are engaged with the topics of new methodological developments, automation, the Internet of Things (IoT), big data, cloud computing, data security, and artificial intelligence. These aspects have not stood still even in the field of medicine – on the contrary, they allow us to forge new paths and to further improve diagnostics for our patients. What motivates us is putting these new technologies to the test and gradually making them available to each of our patients in routine diagnostics.

Artificial intelligence in medicine

Modern analytical techniques provide physicians with an ever-increasing amount of information, the analysis of which is becoming increasingly difficult without the help of computers. This flood of data has led to intense change in the field of data analysis over the past two decades. While the original way of programming was to teach the computer to solve problems by means of well-defined rules, nowadays the models developed are capable of learning on their own and are thus getting closer and closer to artificial intelligence.

Instead of the computer being given rules, sample data (e.g. images, texts, audio) is collected, from which the algorithm (= computer program) independently extracts the relevant information and derives its own rules. Such algorithms are often modeled on the functioning of the human brain and are therefore called "neural networks".
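To make the contrast between hand-written rules and learned rules concrete, here is a minimal sketch of a perceptron, the single artificial neuron from which neural networks are built. The two features and all example values are invented for illustration; this is a textbook toy, not one of the models MLL uses.

```python
def predict(weights, bias, x):
    """Apply the learned rule: +1 if the weighted sum is non-negative, else -1."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0 else -1


def train_perceptron(samples, labels, epochs=20, learning_rate=0.1):
    """Learn weights and a bias from labelled examples (labels are +1 or -1)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if predict(weights, bias, x) != y:
                # Misclassified example: shift the decision boundary towards it.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


# Hypothetical two-feature examples (e.g. normalised cell size, nucleus-to-cytoplasm ratio).
examples = [[0.2, 0.1], [0.3, 0.2], [0.1, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
labels = [-1, -1, -1, 1, 1, 1]

weights, bias = train_perceptron(examples, labels)
print([predict(weights, bias, x) for x in examples])
```

No rule such as "large cells with a high ratio belong to class +1" is written anywhere; the weights that encode it are derived from the examples.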

Artificial intelligence in cytomorphology

In medicine, there are several areas in which artificial intelligence is or can be applied. The use of neural networks is particularly advanced in image recognition. According to the WHO guidelines, the diagnosis of leukemia still relies heavily on cytomorphology, i.e. the assessment of blood and bone marrow smears to describe and differentiate malignant and healthy cells. The morphologist looks for abnormal patterns in the appearance and number of different cell types, which are then classified according to established guidelines.

However, the quality of the result depends heavily on the morphologist's experience: even among experienced and trained hematopathologists, reproducibility is only 75 to 90%. In addition, manual evaluation can be tedious and time-consuming, which limits both the number of cells that can be assessed per sample and overall sample throughput. Advances in digital microscopic imaging and in machine learning techniques have now made automated image processing and classification possible. To standardize the differentiation of peripheral blood cells, we established a workflow that automatically acquires and digitizes microscopic images of blood smears and, in collaboration with AWS, trained a machine learning (ML) model that identifies 21 predefined classes of cell types. The blood smears are first scanned at 10x magnification to define the relevant area, and images of individual cells are then generated by a high-resolution 40x scan. These images are fed to the ML model, which returns the class (= cell type) with the highest probability for each image. In a first interim analysis, the cell differentiation results of the experts and of the ML model showed high agreement, so we are confident that the method can soon be used in routine diagnostics to support hematologists. In another project, we are working with the Institute for Artificial Intelligence in Healthcare (Helmholtz Zentrum, Munich) on the automated analysis of bone marrow smears, and the initial results are extremely promising.
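The final step described above, picking the most probable class for each cell image, can be sketched as follows. The class names (a small hypothetical subset of the 21 classes), the probability values and the simple per-cell agreement measure are assumptions for illustration; the real model's output format and MLL's comparison method are not described here.

```python
from collections import Counter

# Hypothetical subset of cell classes; the model described above distinguishes 21.
CLASSES = ["neutrophil", "lymphocyte", "monocyte", "eosinophil", "blast"]


def classify(probabilities):
    """Return the class with the highest predicted probability for one cell image."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASSES[best_index]


def differential(per_image_probs):
    """Turn per-image predictions into a percentage differential count."""
    counts = Counter(classify(p) for p in per_image_probs)
    total = sum(counts.values())
    return {cls: 100.0 * counts[cls] / total for cls in CLASSES}


def agreement(model_labels, expert_labels):
    """Fraction of cells to which model and expert assign the same class."""
    matches = sum(m == e for m, e in zip(model_labels, expert_labels))
    return matches / len(expert_labels)


# Made-up model outputs for four cell images (each row sums to 1).
probs = [
    [0.90, 0.05, 0.02, 0.02, 0.01],
    [0.10, 0.80, 0.05, 0.03, 0.02],
    [0.05, 0.10, 0.70, 0.05, 0.10],
    [0.60, 0.20, 0.10, 0.05, 0.05],
]
model = [classify(p) for p in probs]
expert = ["neutrophil", "lymphocyte", "monocyte", "lymphocyte"]
print(differential(probs))
print(agreement(model, expert))
```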

As with all ML-based algorithms, the more data available, the more accurate the prediction. The same holds for a morphologist: the more time they have spent at the microscope and the wider the range of smears they have viewed, the more accurate and rapid their assessment becomes.

Artificial intelligence in cytogenetics

Similar methods can be applied in all areas that rely primarily on the analysis of images. To date, the greatest success with artificial intelligence at MLL has been achieved in cytogenetics, where an ML-based system has been used since November 2019 to automate various steps of chromosome analysis. Chromosome analysis aims to obtain patient-specific information by classifying chromosomes and detecting any chromosomal aberrations. The chromosomes are classified by size and banding pattern and displayed in a karyogram. Generating such a karyogram is a time-consuming and complex process: for example, the chromosomes in the recorded metaphases must first be carefully separated from one another before they can be assigned to their place in the karyogram. Automatically separating individual chromosomes is not a trivial task, as chromosomes sometimes overlap. Since February 2021, MLL's cytogenetics department has used an algorithm for automatic chromosome separation that requires only limited manual support or correction. Since November 2019, a trained and optimized neural network has enabled the automatic classification of individual chromosomes and the generation of karyograms for patients without cytogenetic alterations. In summer 2021, this algorithm was further optimized so that all recorded metaphases per patient are now analyzed simultaneously, which has further increased the number of cases reported within 7 days. Further improvements now allow numerical aberrations (gain or loss of whole chromosomes) to be classified reliably as well. Structural aberrations (e.g. translocations, inversions) are more challenging; however, during automatic classification, chromosomes that clearly differ from normal chromosomes are set aside for manual classification, which saves time even for aberrant karyotypes.
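The routing idea in the last sentences, assigning confidently classified chromosomes automatically and setting ambiguous ones aside, can be illustrated with a small sketch. The input format, the 0.9 confidence threshold and the mock classifier output are hypothetical; the copy-number check covers autosomes only and is meaningful only once the set-aside chromosomes have been assigned.

```python
from collections import defaultdict

LABELS = [str(i) for i in range(1, 23)] + ["X", "Y"]  # 22 autosomes plus sex chromosomes
AUTOSOMES = LABELS[:22]


def build_karyogram(predictions, threshold=0.9):
    """Group chromosomes by predicted class; send low-confidence ones to manual review.

    predictions: list of (chromosome_id, {label: probability}) pairs (hypothetical format).
    threshold: hypothetical minimum confidence for automatic assignment.
    """
    karyogram = defaultdict(list)
    manual_review = []
    for chrom_id, probs in predictions:
        label, confidence = max(probs.items(), key=lambda item: item[1])
        if confidence >= threshold:
            karyogram[label].append(chrom_id)
        else:
            manual_review.append(chrom_id)
    return karyogram, manual_review


def numerical_aberrations(karyogram):
    """Autosomes whose copy number differs from the expected two (e.g. trisomy, monosomy)."""
    copies = {label: len(karyogram.get(label, [])) for label in AUTOSOMES}
    return {label: n for label, n in copies.items() if n != 2}


def mock_prediction(chrom_id, label, confidence):
    """Stand-in for a classifier output: most probability mass on one label."""
    return chrom_id, {label: confidence, "other": round(1 - confidence, 2)}


# Mock metaphase with an extra chromosome 8 and one ambiguous chromosome.
predictions = []
for label in AUTOSOMES:
    for _ in range(3 if label == "8" else 2):
        predictions.append(mock_prediction(len(predictions), label, 0.97))
predictions.append(mock_prediction(len(predictions), "X", 0.97))
predictions.append(mock_prediction(len(predictions), "Y", 0.60))  # below threshold

karyogram, manual = build_karyogram(predictions)
print(numerical_aberrations(karyogram))
print(manual)
```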

Artificial intelligence in immunophenotyping

ML-based models are also used at MLL in immunophenotyping, where flow cytometry distinguishes malignant from healthy cells based on their antigen expression patterns. Individual cell types are characterized by the expression of specific antigen combinations. The various hematologic neoplasms are diagnosed by interpreting the two-dimensional plots recorded by flow cytometry. Each analysis involves measuring thousands of cells, which greatly increases the amount of data. In collaboration with AWS, several ML-based models have been trained on the raw flow cytometry data, allowing six different subtypes of hematologic neoplasms (AML, MDS, ALL, T-NHL, B-NHL, multiple myeloma/MGUS) to be classified. The models are currently being tested and evaluated in routine settings. We anticipate that the trained models will take over up to 75% of routine data analysis in immunophenotyping in the future. Our next steps focus on classifying additional entities, applying transfer learning to achieve universal applicability, and extending the models to detect measurable residual disease patterns.

Artificial intelligence in molecular genetics

In molecular genetics, the growing number of sequencing analyses means that data volumes are increasing and manual interpretation of the data is becoming ever more difficult. Whereas analyses were previously limited to individual genes, high-throughput sequencing methods now allow the simultaneous study of the entire genome (WGS) and/or transcriptome (RNA-Seq). The goal is not only to analyze gene-specific changes and/or overexpression in a high-throughput manner, but also to uncover underlying regulatory mechanisms and identify recurring genetic patterns. For example, are there certain combinations of genetic alterations that characterize the clinical picture of a particular type of leukemia? Several molecular markers that distinguish the different subtypes of leukemia are already known, but knowledge is still limited. The immense amounts of data make manual sifting through genomic data impossible, and since one does not know exactly what to look for, one cannot tell a computer how to find it. For this reason, machine learning methods are used that learn independently from the data and extract relevant information. In principle, this approach pursues two goals: the automatic classification of unknown samples, and further insight into the fundamentals of the various diseases. To achieve this, the algorithm, often a neural network, is trained on genomic data from the various subtypes, and its performance is evaluated. This is a highly iterative process of finding the parameter settings that yield the best performance and thus the most accurate classification. Even though the genomes of different people are 99.9% identical, they differ in a large number of polymorphisms. To prevent these individual differences from degrading the classifier's performance, large amounts of training data are needed to cover the diversity that occurs and to ensure accurate estimates.
Since the algorithm itself identifies the features that characterize the individual subtypes, it is plausible that it will also reveal new correlations and associations that help to better understand the molecular basis of these diseases. Together with the features already known from routine diagnostics, this should make improved diagnosis and prognostic assessment possible.

Whole genome sequencing (WGS)

Each nucleated human cell contains the complete diploid chromosome set (46,XY or 46,XX) in its nucleus, which holds the entire genetic information of a person. The DNA that carries this information consists of about 3 billion base pairs, which encode approx. 23,000 genes. Because every human cell contains the same DNA regardless of its function, DNA represents the most fundamental building block of the cell.

Whole genome sequencing (WGS) aims to read a person’s complete genetic information, detect polymorphisms, and identify somatic mutations, which play an important role in cancer diagnostics. WGS can also detect gains and losses of chromosomal material (copy number variations, CNVs) and rearrangements such as translocations (structural variants, SVs).

In addition to the search for disease-associated mutations and changes, attempts are increasingly being made to obtain predictive information from genome-wide data, e.g. on the response to individual therapies (genome-wide association studies, GWAS). The more that is known about the tumor and the genetic background of the patient, the more precisely therapies can be targeted in the future.

Preparing the DNA – Library preparation

There are two fundamentally different approaches to library preparation for WGS: PCR-free and amplification-based. The PCR-free method requires a relatively large amount of input DNA (1 µg) but avoids PCR artifacts. Sufficient DNA for a PCR-free library prep can generally be obtained from bone marrow and peripheral blood. If the starting material is fixed tissue (formalin-fixed, paraffin-embedded; FFPE) or cell-free DNA from liquid biopsy samples, a pre-amplification method must be chosen in order to obtain sufficient material for sequencing.
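The choice between the two approaches can be written as a simple decision rule. This is a sketch under stated assumptions: the 1 µg threshold and the FFPE/cfDNA rule come from the text, while the material labels and the function name are hypothetical, and in practice DNA quality matters as well as quantity.

```python
PCR_FREE_MIN_INPUT_NG = 1_000  # about 1 µg of input DNA for PCR-free WGS (figure from the text)
AMPLIFICATION_REQUIRED = {"FFPE", "cfDNA"}  # materials named in the text as needing pre-amplification


def choose_wgs_library_prep(material: str, dna_input_ng: float) -> str:
    """Hypothetical helper: pick 'PCR-free' or 'pre-amplification' for a WGS library."""
    if material in AMPLIFICATION_REQUIRED or dna_input_ng < PCR_FREE_MIN_INPUT_NG:
        return "pre-amplification"
    return "PCR-free"


for material, amount_ng in [("bone marrow", 1500), ("peripheral blood", 600),
                            ("FFPE", 2000), ("cfDNA", 20)]:
    print(material, amount_ng, choose_wgs_library_prep(material, amount_ng))
```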

At MLL, library prep is performed in a fully automated procedure by pipetting robots (Hamilton NGS Star). This ensures standardized and homogeneous library prep.

Sequencing

At MLL, sequencing is performed using Illumina's sequencing-by-synthesis method on the NovaSeq 6000, the latest generation of sequencing devices. While a coverage (depth) of 30× is often sufficient in human genetics, tumor biology requires the detection of somatic mutations, including those present only in small clones. Therefore, sequencing is usually performed at a coverage of 60–90× or higher.
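Why small clones call for deeper sequencing can be shown with a simplified binomial model: at a given position, each read independently carries the variant with probability equal to the variant allele frequency (VAF). The sketch ignores sequencing errors, mapping problems and tumor purity, and the requirement of at least five supporting reads is an assumed calling threshold, not MLL's actual setting.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant-supporting reads) under binomial read sampling."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Hypothetical scenario: heterozygous mutation in a clone making up 20% of cells -> VAF ~ 10%.
clone_fraction = 0.20
vaf = clone_fraction / 2
for depth in (30, 60, 90):
    print(depth, round(detection_probability(depth, vaf, min_alt_reads=5), 3))
```

Under these assumptions, the same small clone is usually missed at 30× but reliably detected at 90×, which is the reasoning behind the higher coverage.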

Data analysis

Subsequently, data analysis is conducted. At MLL, the data from the sequencing devices is transferred directly to the Amazon Web Services (AWS) cloud in Frankfurt and analyzed in Illumina’s BaseSpace Sequence Hub. Data protection requirements are complied with in accordance with the EU General Data Protection Regulation (GDPR) and ensured via ISO 27001 certification (Cloud Computing). First, the reads are aligned (iSAAC, Illumina) to the reference genome, i.e. the fragments are mapped to their positions in the genome. Variant calling (Strelka, Illumina) follows, i.e. determining a patient's changes relative to a reference sequence (GRCh37, hg19). Usually, a “tumor-normal comparison” is performed here: by sequencing both the tumor and a normal control, e.g. a buccal swab or peripheral blood, the two genomes of the same person can be compared, allowing the differences in the tumor to be identified. MLL uses in-house analysis pipelines to remove further irrelevant changes.
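The principle of the tumor-normal comparison can be sketched as a set operation on variant calls. This is only an illustration with made-up coordinates: callers such as Strelka model tumor and normal reads jointly and statistically rather than matching finished call lists, so exact matching like this is a deliberate simplification.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # position on the reference sequence
    ref: str
    alt: str


def split_somatic_germline(tumor_calls, normal_calls):
    """Naive tumor-normal comparison by exact variant matching."""
    tumor, normal = set(tumor_calls), set(normal_calls)
    somatic = tumor - normal   # present only in the tumor
    germline = tumor & normal  # shared with the normal control: inherited polymorphisms
    return sorted(somatic), sorted(germline)


# Made-up calls for illustration only.
tumor_calls = [Variant("1", 1000, "A", "G"), Variant("2", 2000, "C", "T"),
               Variant("17", 3000, "G", "A")]
normal_calls = [Variant("1", 1000, "A", "G")]

somatic, germline = split_somatic_germline(tumor_calls, normal_calls)
print("somatic:", somatic)
print("germline:", germline)
```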


Whole exome sequencing (WES)

Unlike whole genome sequencing (WGS), whole exome sequencing (WES) focuses on the protein-coding regions of the genome, collectively called the exome. A person’s exome accounts for only about 1% of the genome, which is why only approx. 30 million base pairs are read during WES. However, the majority of disease-associated mutations and changes are found in the exome, as sequence changes there directly affect the structure, and hence the function, of proteins and can therefore alter the function of the cell.

Although WES allows gene mutations to be detected, it provides only an incomplete view of a patient’s genome. Procedures such as GWAS (genome-wide association studies), which also detect changes in non-coding regions, can therefore only be performed to a limited extent. Chromosomal changes (structural variations, SV; copy number variations, CNV) can only be detected if they affect coding regions.

Preparing the DNA – Library preparation

In addition to DNA fragmentation, end repair, and the ligation of adapters containing unique indexes (so that, after sequencing, each read can be unambiguously assigned to its patient), library preparation for WES also involves enriching the coding sequences. Using probes whose sequences are complementary to the coding regions of the genome, the exome sequences are specifically selected (captured) and enriched. The xGen Exome Research Panel (IDT, Integrated DNA Technologies) uses 429,826 probes to enrich 39 Mb of genomic sequence (19,396 genes) and prepare it for sequencing.

At MLL, library prep is performed in a fully automated procedure by pipetting robots (Hamilton NGS Star). This ensures standardized and homogeneous library prep.

Sequencing

At MLL, sequencing is performed using Illumina's sequencing-by-synthesis method on the NovaSeq 6000, the latest generation of sequencing devices. For WES, a coverage (depth) of more than 100× is generally targeted, as the detection of somatic mutations, including those in small clones, is of great importance in tumor biology.
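The difference in scale between WGS and WES follows from simple arithmetic: the ideal sequencing yield is the target size multiplied by the mean coverage. The sketch uses the ~3 billion base-pair genome and the 39 Mb exome panel mentioned in this section; the 2 × 150 bp paired-end read length is an assumption, and real experiments need extra reads to compensate for duplicates, off-target reads and uneven coverage.

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 billion base pairs, as stated in the text
EXOME_TARGET_BP = 39_000_000    # 39 Mb targeted by the exome panel described above


def required_yield(target_bp, coverage, read_length=150, paired=True):
    """Ideal sequencing yield (bases, fragments) to reach a given mean coverage.

    Assumes every base sequenced lands on target: no duplicates, no off-target
    reads, perfectly even coverage. read_length is an assumed value.
    """
    bases = target_bp * coverage
    bases_per_fragment = read_length * (2 if paired else 1)
    fragments = bases / bases_per_fragment
    return bases, fragments


wgs_bases, wgs_pairs = required_yield(GENOME_SIZE_BP, 90)
wes_bases, wes_pairs = required_yield(EXOME_TARGET_BP, 100)
print(f"WGS 90x:  {wgs_bases / 1e9:.0f} Gb, {wgs_pairs / 1e6:.0f} M read pairs")
print(f"WES 100x: {wes_bases / 1e9:.1f} Gb, {wes_pairs / 1e6:.0f} M read pairs")
```

Even at a higher target coverage, WES needs roughly 70 times fewer sequenced bases than WGS, which is why much deeper coverage is affordable for the exome.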

Data analysis

Subsequently, data analysis is conducted. At MLL, the data from the sequencing devices is transferred directly to the Amazon Web Services (AWS) cloud in Frankfurt and analyzed in Illumina’s BaseSpace Sequence Hub. Data protection requirements are complied with in accordance with the EU General Data Protection Regulation (GDPR) and ensured via ISO 27001 certification (Cloud Computing). Firstly, the alignment of the reads (iSAAC, Illumina) to the reference genome (GRCh37, hg19) takes place, i.e. the mapping of the fragments to their position in the genome. Subsequently, variant calling is performed, i.e. determining the changes for a patient as compared to a reference sequence. For the further filtering of relevant changes, in-house analysis pipelines are available at MLL.

RNA sequencing (RNA-Seq)

Each cell in the human body has an identical copy of the genome (DNA) – the full set of genetic material. However, the transcriptomes of the cells differ. RNA sequencing (RNA-Seq) analyzes the transcriptome, i.e. it quantifies the genes transcribed (from DNA to RNA) in the cell. The transcriptome forms the basis of a cell’s identity and its associated functions.

Differentiated cells possess a specific repertoire of biological functions. For example, white blood cells play an important role in the immune system, red blood cells in the transportation of oxygen to the individual organs, and blood platelets in clotting. A particular set of genes is necessary for each of these functions, as well as for regulating the lifetime of a cell. Gene expression is controlled strictly via various mechanisms. In the case of an illness such as cancer, abnormal gene regulation occurs, which significantly modifies the transcriptome of the affected cells and influences the proportion of the genes transcribed. These changes can be detected and quantified using RNA-Seq, for example by comparing the transcriptome of the tumor cells with the profile of healthy cells.

In addition to changes in gene expression, RNA-Seq also allows fusion genes to be detected, which are the result of structural changes (translocations of chromosomal material). A person’s transcriptome contains not only protein-coding transcripts, but also transcripts that do not lead to the formation of a protein. These transcripts can be subdivided into two groups based on their length: short RNAs (microRNA, snoRNA, snaRNA, etc.) with a length of 20–24 bases and the long non-protein-coding RNAs (long non-coding RNAs, lncRNAs) with a length of over 200 bases. These transcripts are involved in the regulation of gene expression, making them a good starting point for interventions and therapies.

Preparing the RNA – Library preparation

As with the analysis of DNA (WGS, WES), library preparation is performed before the transcriptome is sequenced. This process includes fragmentation of the RNA, removal of ribosomal RNA or enrichment of messenger RNA (mRNA), synthesis of cDNA from the RNA, ligation of unique indexes that make it possible to tell samples apart, and subsequent enrichment of the material by PCR. At MLL, library prep is performed in a fully automated procedure by pipetting robots (Hamilton NGS Star). This ensures standardized and homogeneous library prep.

Sequencing

The library prepared in this way is then loaded onto the sequencing device and read using the sequencing-by-synthesis method. At MLL, the device used is the NovaSeq 6000, the latest generation of sequencing devices from Illumina. To achieve sufficient accuracy in the transcriptome analysis, the target is 50 million reads (sequenced fragments) per sample.

Data analysis

Subsequently, data analysis is conducted. At MLL, the data from the sequencing devices is transferred directly to the Amazon Web Services (AWS) cloud in Frankfurt and analyzed in Illumina’s BaseSpace Sequence Hub. Data protection requirements are complied with in accordance with the EU General Data Protection Regulation (GDPR) and ensured via ISO 27001 certification (Cloud Computing). First, the reads are aligned (STAR, Illumina) to the reference genome (GRCh37, hg19), i.e. the fragments are mapped to their positions in the genome. Next, the counts, i.e. the number of reads per gene, are determined and then normalized in an internal MLL pipeline. The normalized counts are the starting point for all further analyses. Three fusion callers (Manta, STAR-Fusion, Arriba) are used to detect fusion transcripts. Because the number of false positives is comparatively high, only fusion transcripts detected by at least two of these callers are considered for evaluation.
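The "at least two of three callers" rule can be sketched as a simple vote. Representing each fusion as a (5' gene, 3' gene) pair is a simplification, and the caller outputs below are invented. The counts-per-million (CPM) function is likewise only an assumed example of library-size normalization, since the method used in MLL's internal pipeline is not described.

```python
from collections import Counter


def consensus_fusions(calls_by_caller, min_callers=2):
    """Keep fusions reported by at least `min_callers` different callers.

    calls_by_caller maps a caller name to the fusions it reported, each as a
    (5' gene, 3' gene) pair. Matching on gene pairs alone is a simplification;
    real pipelines may also compare breakpoint positions.
    """
    support = Counter()
    for fusions in calls_by_caller.values():
        support.update(set(fusions))  # count each caller at most once per fusion
    return sorted(f for f, n in support.items() if n >= min_callers)


def counts_per_million(counts):
    """Simple library-size normalization (CPM); an assumed method, not MLL's pipeline."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


# Invented caller outputs for one sample.
calls = {
    "Manta": [("BCR", "ABL1"), ("GENE_X", "GENE_Y")],
    "STAR-Fusion": [("BCR", "ABL1")],
    "Arriba": [("BCR", "ABL1"), ("GENE_P", "GENE_Q")],
}
print(consensus_fusions(calls))
print(counts_per_million({"GENE_A": 500, "GENE_B": 1500}))
```

Requiring agreement between independent callers trades a little sensitivity for a large reduction in caller-specific false positives.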

Liquid biopsy

In addition to the DNA in cells, cell-free DNA can also be obtained from bodily fluids. In most cases, this refers to DNA circulating freely in the blood (cfDNA, cell-free DNA), which is assumed to be released from apoptotic cells. Tumors are characterized by high rates of proliferation and apoptosis. During apoptosis, a cell undergoes programmed cell death, which causes it to break apart and release its DNA into the surrounding tissue. This type of diagnostics is called a “liquid biopsy.” It is a non-invasive method that is preferred for monitoring the course of previously diagnosed cancers and for assessing the response to therapy without having to perform time-consuming tissue biopsies in the case of solid tumors.

Because the concentration of cfDNA is extremely low, it must first be amplified using special methods before it can be examined for changes (mutations). This makes it possible to test whether cells from a residual tumor are still present in the body, which would increase the risk of recurrence. In addition, intensive work is underway to develop tests that aim to enable early cancer detection using cell-free DNA from blood. Besides cfDNA, a liquid biopsy can generally yield a second type of DNA: cell-bound DNA from circulating tumor cells (CTCs), which indicate a possible metastasis of the primary tumor. The value of liquid biopsy with cfDNA detection in patients with lymphomas is currently being evaluated.

Preparing the DNA – Extraction and library preparation

For cfDNA extraction, 10 ml of blood is drawn from the patient into special blood collection tubes. In these tubes, the blood is anticoagulated and stabilized for transport, and it can be stored for up to 7 days. Hemolysis and apoptosis of the blood cells are inhibited, so that no cellular DNA from decaying blood cells enters the plasma. The cfDNA can then be selectively isolated from the plasma. Special extraction kits allow the cfDNA, which occurs in very low concentrations, to be isolated from large volumes (approx. 10 ml of plasma) and eluted in a small volume (20 µl), thereby concentrating it.
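A quick calculation shows why the elution step matters and why cfDNA remains scarce. The concentration factor uses the volumes given above and ignores extraction losses; the approximate mass of one haploid human genome (about 3.3 pg) is a standard figure, while the 10 ng yield and the 0.1% mutant allele fraction are hypothetical values chosen only for illustration.

```python
HAPLOID_GENOME_PG = 3.3  # approximate mass of one haploid human genome in picograms


def concentration_factor(input_volume_ml, elution_volume_ul):
    """How much more concentrated the eluate is than the input plasma (no losses assumed)."""
    return (input_volume_ml * 1000) / elution_volume_ul


def haploid_genome_equivalents(total_dna_ng):
    """Number of haploid genome copies contained in a given mass of DNA."""
    return total_dna_ng * 1000 / HAPLOID_GENOME_PG


factor = concentration_factor(10, 20)    # 10 ml plasma eluted in 20 µl
copies = haploid_genome_equivalents(10)  # hypothetical total cfDNA yield of 10 ng
mutant_copies = copies * 0.001           # hypothetical mutant allele fraction of 0.1%
print(factor, round(copies), round(mutant_copies, 1))
```

Under these assumptions, only a handful of mutant DNA molecules are present in the entire sample, which illustrates why highly sensitive, amplification-based methods are required.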

The cfDNA obtained in this way can then be analyzed using PCR or next-generation sequencing and examined for markers that characterize tumors. For the library preparation preceding sequencing, it should be emphasized that cfDNA is often highly fragmented (~180 bp) and present in extremely low concentrations, so library preps with a pre-amplification step, in which the amount of DNA is first increased, should be used.

Cloud computing

Apart from allowing large quantities of data to be stored, cloud computing also enables the data to be processed and analyzed rapidly, as the required calculations can be performed in a highly parallel fashion. Thanks to cloud computing, we have computing capacity for research projects that allows rapid data processing, something that could only be realized in-house at great cost.

Cloud computing has made it possible for us to process the WGS data from the 5,000 genome project directly in the cloud via Illumina’s BaseSpace Sequence Hub. Furthermore, we can use our private domain in the cloud to upload our proprietary software and use it to analyze the data without it needing to be transferred. This means that the data is available directly to us for all analyses and scientific queries.

Where there is data analysis, data management is also necessary, as WGS in particular produces large amounts of data (~130 GB per patient at 90× coverage). This requires a custom infrastructure that supports not only analysis but also data storage. In the past, there was great skepticism about using the cloud for data processing, and even more so for data storage. However, the ever-growing amounts of data produced today make it increasingly clear that not only the hardware but also the maintenance of IT infrastructure comes at a high cost. It is therefore usually easier and more economical for specialized cloud providers to keep their hardware up to date while offering the highest security standards. MLL’s WGS, WES and RNA-Seq data are stored fully anonymized in a private AWS instance in Frankfurt (AWS, Amazon Web Services), to which only designated MLL employees have access. The data stored there consists exclusively of sequence data labeled with an arbitrary MLL_Identifier. No personal data whatsoever, such as clinical parameters or personal information, is stored in the AWS instance.
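The storage requirement scales directly with the number of patients. A rough estimate for the WGS data of the 5,000 genome project, using the ~130 GB per patient at 90× stated above and assuming, as a simplification, that data volume grows linearly with coverage (transcriptome data is not included):

```python
GB_PER_PATIENT_90X = 130  # approximate WGS data volume per patient at 90x coverage (from the text)


def wgs_storage_tb(n_patients, coverage=90):
    """Rough WGS storage estimate in TB, assuming volume scales linearly with coverage."""
    gb = n_patients * GB_PER_PATIENT_90X * coverage / 90
    return gb / 1000


print(wgs_storage_tb(5000))      # 5,000 genomes at 90x
print(wgs_storage_tb(5000, 60))  # the same cohort at 60x
```

Hundreds of terabytes for WGS alone, before any derived files or backups, illustrate why scalable cloud storage was chosen over in-house infrastructure.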

The data security measures comply with the highest standards of the EU General Data Protection Regulation (GDPR), as verified by external auditors in their reports, including ISO 27001, ISO 27017 and ISO 27018. Furthermore, AWS has been awarded the C5 attestation of the German Federal Office for Information Security (BSI).

5,000 genome project

From phenotype to genotype. We set out to sequence 5,000 genomes and transcriptomes from as many different leukemia and lymphoma entities as possible within one year. Over that year, we learned to sequence genomes and transcriptomes at the highest quality level, with high throughput and in the shortest possible time in a clinical setting. We are now applying this experience and expertise to analyze the data in new ways and to use artificial intelligence to reproduce standard diagnostics and develop a genetic classification of leukemias and lymphomas.

Publications

Besides routine diagnostics with rapid turn-around times, MLL focuses on applied research for leukemia diagnostics. We believe that our central task is to encourage discussion of our knowledge, data and findings and to place them at the disposal of the scientific community in talks, training and publications. We have contributed to a broad variety of collaborations on scientific issues within hematology for more than 20 years, especially through our international networks with other diagnostic laboratories and research groups around the world.

Biobank

Each year, we receive more than 75,000 samples for routine diagnostics. On receipt of a sample, the methods to be applied (cytomorphology, cytogenetics/FISH, immunophenotyping, molecular genetics) are chosen based on the submitting physician’s requirements. Furthermore, particularly for molecular genetic analyses, the submitting party can also have material preserved so that specific analyses can be ordered later as needed. For 40% of submissions, the patient also provides a declaration of consent permitting us to preserve the material and use it for future research where necessary. As a result, since MLL’s inception, we have been able to build a comprehensive biobank that currently makes it possible to address a large number of questions in a targeted fashion.

Cooperations

Global collaboration - We research and publish hand-in-hand with our national and international cooperation partners from science and business; discoveries and insights are made public in journals, talks, press releases or technical notes.

We enter into collaborations with renowned researchers who are world leaders in their scientific fields. By combining their expertise and research activities with the knowledge we have gained, and deepening it through joint scientific activities, we are expanding our understanding of leukemias and lymphomas. However, innovation and technical progress would not be possible without industrial partners, with whom we jointly advance diagnostics.

The diagnostic concepts emerging in our company collaborations aim to continuously improve the quality of test results, develop new diagnostic parameters and establish valid procedures with improved efficiency of laboratory processes. The industrial cooperation partners include:

You may also be interested in

Big data

We support scientists, researchers and physicians with browser-based tools for interpreting sequencing data for hematology diagnostics.

MLL Magazine

Here you will find news about our services, diseases, research results, events and other company and professional information.

MLLSEQ

We are the next generation: sequencing services. We are experts in sequencing and want to share this knowledge and experience with you through our sequencing service MLLSEQ.

Your contact person

»Our research is important to keep on the leading edge.«

Dr. rer. nat. Manja Meggendorfer, MBA

Biologist, Dipl.
Head of Molecular Genetics
Head of Research and Development

T: +49 89 99017-355
manja.meggendorfer@mll.com
