Large-scale sequencing experience from the MLL 5K project

18 August 2019


The MLL 5K project was started in 2017 with the aim of sequencing the genomes and transcriptomes of 5,000 patients with haematological malignancies. The goals were to gain deeper knowledge of their molecular profiles, to better understand the underlying complexity, to refine diagnosis, and to take another step towards personalized medicine.

The MLL Biobank covers more than 40 entities of haematological malignancies, including rare forms of leukemia and lymphoma. It not only offers a broad range of leukemia and lymphoma subgroups to choose from, but also provides sufficient DNA (1 µg) for PCR-free library preparation (TruSeq PCR-free, Illumina). PCR-free library preparation avoids PCR artifacts that can affect downstream analyses, but the use of such kits is usually limited by low amounts of input material.

Sequencing statistics

Within one year, more than 3,500 genomes with 90x coverage (150 bp paired-end) and 3,500 transcriptomes (100 bp paired-end) with a median of 50 million reads were sequenced on the HiSeq X and NovaSeq 6000 systems (Illumina). At the beginning of the project, only the S2 flow cell was available for the NovaSeq system, which limited the maximum throughput of five NovaSeq instruments to ~90 genomes (90x) per week. With the release of the S4 flow cell in October 2017, throughput could be increased to 200 genomes and 192 transcriptomes per week. For S2 runs, the median percentage of clusters passing filter (%PF) was 68.44% (range 64.22% - 74.59%), with a median Q30 score of 89.24%. For S4 runs, the median %PF was 67.04% (range 63.22% - 74.05%), with a median Q30 score of 88.99%.
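As a rough plausibility check on these figures, the expected mean coverage can be estimated as the number of sequenced bases divided by the genome size. The following sketch is only a back-of-the-envelope calculation: the genome size of 3.1 Gb and the function names are assumptions made for illustration, and duplicates, unmapped reads and uneven coverage are ignored.

```python
# Back-of-the-envelope coverage arithmetic (illustrative only).
# GENOME_SIZE_BP is an assumed approximate human genome size,
# not a value taken from the article.
GENOME_SIZE_BP = 3.1e9


def mean_coverage(read_pairs: float, read_length_bp: int,
                  genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    return read_pairs * 2 * read_length_bp / genome_size_bp


def read_pairs_for_coverage(target_coverage: float, read_length_bp: int,
                            genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Number of read pairs needed to reach a target mean depth."""
    return target_coverage * genome_size_bp / (2 * read_length_bp)


if __name__ == "__main__":
    pairs = read_pairs_for_coverage(90, 150)
    print(f"{pairs:.2e} read pairs for 90x at 2 x 150 bp")
```

Under these assumptions, a single 90x genome at 2 x 150 bp corresponds to roughly 9.3e8 read pairs, which gives a feeling for why flow cell capacity was the limiting factor for weekly throughput.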

Library quantification and normalisation

Sequencing quality and a homogeneous read distribution largely depend on exact library quantification. Multiple methods are available for DNA library quantification, of which quantitative real-time PCR (qPCR) and fluorometry (Qubit) are the most frequently used. More recent studies also make a case for ddPCR-based quantification [Robin et al. 2016]. Currently no unambiguous gold standard exists, but qPCR is the method recommended by Illumina and, hence, the KAPA Library Quantification Kit (Roche) was used to quantify the WGS libraries.

However, to obtain the maximum yield and make optimal use of the flow cell capacity while guaranteeing equal coverage for all samples, another normalization step was added. In a first run the samples were sequenced targeting 45x. Based on the obtained read distribution, the sample concentrations were adjusted, and a second run was started to top off the coverage of each sample to 90x. The strategy was especially useful for S4 flow cells, increasing the number of 90x genomes in two runs from 16 (source: Illumina specifications) to 18-19.

This strategy, including the quantification itself, might soon be replaced by quality control (QC) on the iSeq 100 System. A recently published Illumina white paper demonstrates that the iSeq 100 System enables efficient rebalancing of DNA libraries for sequencing runs on the NovaSeq 6000 system. If specific QC kits became available for it, the iSeq 100 System would be a noteworthy QC solution that could save time and money.
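The two-run top-off idea can be illustrated with a small calculation. The article does not state how the concentrations were adjusted, so the sketch below rests on an assumption: all samples were pooled in equal nominal amounts in the first run, and each sample yields the same coverage per unit of input in the second run as in the first. The function name `top_off_shares` and the sample names are hypothetical.

```python
from typing import Dict


def top_off_shares(observed: Dict[str, float], target: float = 90.0) -> Dict[str, float]:
    """Share of the second-run pool per sample (shares sum to 1).

    Assumption: equal input in run 1, so a sample's observed coverage is
    proportional to its yield per unit of input. The input needed in run 2
    is then proportional to (remaining deficit) / (observed coverage).
    """
    weights = {}
    for sample, coverage in observed.items():
        if coverage <= 0:
            raise ValueError(f"no usable coverage for sample {sample!r}")
        deficit = max(target - coverage, 0.0)  # samples above target get nothing
        weights[sample] = deficit / coverage
    total = sum(weights.values())
    if total == 0:
        return {sample: 0.0 for sample in observed}
    return {sample: w / total for sample, w in weights.items()}


if __name__ == "__main__":
    first_run = {"sample_A": 40.0, "sample_B": 45.0, "sample_C": 50.0}
    for name, share in top_off_shares(first_run).items():
        print(f"{name}: {share:.1%} of the second-run pool")
```

A sample that under-performed in the first run receives a larger share of the second pool for two reasons: it has a larger deficit, and it is expected to yield less per unit of input again.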

Reproducibility

During sequencing, the data are transferred directly to the Amazon Web Services (AWS) cloud in Frankfurt and pre-processed in Illumina’s BaseSpace Sequence Hub. The Illumina tumor/unmatched normal workflow was used for variant calling, with a mixture of genomic DNA from multiple anonymous donors serving as the normal control. In total, four normal controls were used: one control per gender and sequencing platform. For each sequencing platform, two patients were sequenced twice with comparable coverage (Table 1) to estimate the reproducibility on the same platform. In addition, two patients were sequenced on both systems to estimate the reproducibility between the two platforms (Table 1). Note, however, that the normal controls used for the two platforms showed a somewhat larger coverage difference (~40x), which might affect variant calling.

Table 1: Sample overview

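| Sample | Instrument | Coverage (x) |
|--------|------------|--------------|
| 1      | NovaSeq    | 112.87       |
| 1      | NovaSeq    | 119.06       |
| 2      | NovaSeq    | 109.91       |
| 2      | NovaSeq    | 110.64       |
| 3      | HiSeq X    | 127.18       |
| 3      | HiSeq X    | 127.58       |
| 4      | HiSeq X    | 121.65       |
| 4      | HiSeq X    | 126.35       |
| 5      | NovaSeq    | 141.82       |
| 5      | HiSeq X    | 140.94       |
| 6      | NovaSeq    | 120.88       |
| 6      | HiSeq X    | 125.32       |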

In general, the number of called variants depended on sample coverage on both platforms. However, slightly more variants were called for the HiSeq X samples than for the NovaSeq samples. For runs on the same instrument, the reproducibility of ‘PASS’ variant calls was good, with a percentage overlap between 92% and 95.5%. The overlap of ‘PASS’ variants between the two platforms, on the other hand, was lower, at 67.2% and 83.6% for the two patients sequenced on both systems. This might partly be attributed to the difference in coverage between the platform-specific normal samples. However, although fewer variants were called overall for the NovaSeq samples, the number of unique ‘PASS’ variants was higher than for the HiSeq X samples. Hence, the data quality of the NovaSeq samples seems to be slightly better than that of the HiSeq X samples.
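To make the overlap figures concrete, the following sketch shows one way such a percentage could be computed from two call sets. The article does not define the overlap measure, so the choice here (shared ‘PASS’ variants divided by the union of ‘PASS’ variants from both runs) is an assumption, as are the `Variant` record and the function names. Variants are matched on chromosome, position, reference and alternate allele.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Minimal, hypothetical variant record for illustration."""
    chrom: str
    pos: int
    ref: str
    alt: str
    filter_status: str  # e.g. "PASS" or a failed-filter label


Key = Tuple[str, int, str, str]


def pass_keys(variants: Iterable[Variant]) -> Set[Key]:
    """Keys of all variants whose filter status is PASS."""
    return {(v.chrom, v.pos, v.ref, v.alt)
            for v in variants if v.filter_status == "PASS"}


def percent_overlap(run_a: Iterable[Variant], run_b: Iterable[Variant]) -> float:
    """Shared PASS variants as a percentage of the union of PASS variants."""
    a, b = pass_keys(run_a), pass_keys(run_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


def unique_to_first(run_a: Iterable[Variant], run_b: Iterable[Variant]) -> int:
    """Number of PASS variants found only in the first run."""
    return len(pass_keys(run_a) - pass_keys(run_b))
```

Other definitions, such as dividing by the size of the smaller call set, would give different percentages, so the measure should be stated whenever such numbers are compared.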


References

Robin, Jérôme D., et al. "Comparison of DNA quantification methods for next generation sequencing." Scientific Reports 6 (2016): 24067.

The author

"Do you have questions about the article or would you like further information? Feel free to send me an e-mail."

Dr. Wencke Walter

Bioinformatician, M.Sc.
Head of the Innovation Department

T: +49 89 99017-545
