MLL uses AWS Cloud Computing infrastructure for NGS data

Thanks to rapid advances in sequencing technology, it has become easier to read ever larger amounts of DNA. Gene panels are already standard in leukemia diagnostics, and a switch to exome sequencing is expected in the medium term. However, the ability to interpret this data has not kept pace. Analyzing the large quantities of data from a standard next-generation sequencing (NGS) run requires high-performance computer clusters.

With lower sequencing costs and the introduction of devices with significantly higher capacity, sequencing throughput has increased. This means that the bottleneck has shifted away from sequencing and towards data processing and interpretation. The increasing availability of computing resources through cloud computing is therefore an alternative to a large initial investment in local computer infrastructure.

Hence, MLL performed an initial evaluation of cloud computing for NGS in the context of the 5,000 genome project. It quickly became clear that this project would produce petabytes of data that would need to be processed and stored securely over the long term in a computer environment that is secured according to ISO 27001 and complies with the GDPR. Building such infrastructure would have meant a huge initial workload and high costs for MLL. The maintenance costs would also have been immense. With the decision to utilize cloud computing and the associated pay-per-use approach, that was no longer an issue, and we were able to focus completely on the goal of the project: better molecular leukemia diagnostics.

After the initial positive experiences with cloud computing in this research project, we decided to switch to cloud computing for the evaluation of routine NGS diagnostics data as well. In previous years, we had been forced to look for costly local solutions in order to maintain the same turnaround time despite continuously increasing volumes of data.

The newly set-up NGS panel diagnostics were accredited in accordance with DIN EN ISO 15189 and DIN EN ISO/IEC 17025 in early 2019, using a pipeline developed specifically for this purpose on Illumina's BaseSpace and AWS Cloud Computing. With this pipeline, the amount of daily patient data no longer influences the processing duration, as the data of each patient can be processed in parallel in the cloud.
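The effect of per-patient parallelism on processing duration can be illustrated with a small model. The following Python sketch is not MLL's pipeline; the function name `wall_clock_minutes` and the per-patient durations are hypothetical and chosen only for illustration. It assumes each sample is handed to whichever worker becomes free next.

```python
import heapq


def wall_clock_minutes(durations, workers):
    """Estimate total run time when each sample goes to the next free worker."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not durations:
        return 0
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0] * min(workers, len(durations))
    heapq.heapify(free_at)
    for minutes in durations:
        start = heapq.heappop(free_at)            # earliest available worker
        heapq.heappush(free_at, start + minutes)  # busy until this sample finishes
    return max(free_at)


# Hypothetical per-patient analysis times in minutes (illustrative values only).
durations = [30, 45, 20, 40]
print(wall_clock_minutes(durations, workers=1))               # single local machine
print(wall_clock_minutes(durations, workers=len(durations)))  # one worker per patient
```

With a single worker the run takes the sum of all durations (135 minutes in this example); with one worker per patient it takes only as long as the slowest sample (45 minutes), regardless of how many patients are added.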

Because cloud computing can scale almost without limit, both in processing and in data storage, we consider ourselves well equipped for the future development of NGS, whether through the expansion of the panels, whole exome sequencing, or whole genome sequencing. Furthermore, we are striving to shift additional workflows to MLL’s dedicated partition on the AWS Cloud. As part of collaborative projects with AWS and its Envision engineering team, we are currently working on scalable approaches, for example to perform cell differentiation in cytomorphology via artificial intelligence and to automatically analyze flow cytometry data in immunophenotyping without manual gating.

The author

»Do you have questions regarding this article or do you need further information? Please send me an e-mail.«

Niroshan Nadarajah

Bioinformatician, M.Sc.
Innovation & Partner Management

T: +49 89 99017-567