Evaluating the chaos game representation of proteins for applications in machine learning models

2023

Journal of Molecular Modeling

Andrea Arsiccio, Lorenzo Stratta, Tim Menzen

Evaluating the chaos game representation of proteins for applications in machine learning models: prediction of antibody affinity and specificity as a case study

Machine learning techniques are becoming increasingly important in the selection and optimization of therapeutic molecules, as well as in the selection of formulation components and the prediction of long-term stability. Compared to first-principles models, machine learning techniques are easier to implement and can identify correlations that would be hard to describe at a mechanistic level, but they rely strongly on high-quality input training data. Here, we evaluate the potential of the “chaos game” representation to provide input data for machine learning models. The chaos game is an algorithm originally developed for the production of fractal structures and later applied to the representation of biological sequences, such as genes and proteins. Our results show that combining the chaos game representation with convolutional neural networks achieves accuracy comparable to that of other machine learning approaches, indicating that chaos game representations could be a valid alternative to existing featurization strategies for machine learning models of biological sequences.

We implement the chaos game in Python 3.8.10 and use it to produce fractal as well as novel expanding representations of protein sequences. We then feed the resulting images to a convolutional neural network built in Python 3.8.10 with the TensorFlow 2.9.1, Keras 2.9.0, and scikit-learn 1.1.1 packages. As a case study, we select a recently published dataset for the antibody emibetuzumab, with the objective of co-optimizing antibody variants for both high affinity and low non-specific binding.
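
To make the idea of a chaos game representation of a protein concrete, the following is a minimal sketch in pure Python. It is not the authors' implementation: the 20-sided polygon layout, the alphabetical vertex ordering, the contraction ratio of 0.5, the grid size, and all function names are assumptions chosen for illustration. The "expanding" representation mentioned above is not covered. Each residue moves the current point a fixed fraction of the way toward the polygon vertex assigned to that amino acid, and the visited points are then binned into a small count image of the kind that could be passed to a convolutional neural network.

```python
import math

# The 20 standard amino acids; the alphabetical ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def polygon_vertices(n):
    """Return the vertices of a regular n-gon inscribed in the unit circle."""
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


# Map each amino acid to one polygon vertex (hypothetical layout).
VERTEX = dict(zip(AMINO_ACIDS, polygon_vertices(len(AMINO_ACIDS))))


def chaos_game_points(sequence, ratio=0.5):
    """Run the chaos game over a protein sequence.

    Starting from the polygon center, each residue moves the current
    point a fraction `ratio` of the way toward that residue's vertex.
    The ratio of 0.5 is an assumption, not a value taken from the paper.
    """
    x, y = 0.0, 0.0
    points = []
    for residue in sequence:
        vx, vy = VERTEX[residue]  # raises KeyError for non-standard residues
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def rasterize(points, size=32):
    """Bin points from the square [-1, 1] x [-1, 1] into a size x size count grid."""
    image = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int((x + 1) / 2 * size), size - 1)
        row = min(int((1 - y) / 2 * size), size - 1)  # row 0 is the top of the image
        image[row][col] += 1
    return image


if __name__ == "__main__":
    pts = chaos_game_points("ACDEFGHIK")
    img = rasterize(pts, size=16)
    print(len(pts), sum(map(sum, img)))
```

Because every step is a convex combination of the current point and a vertex on the unit circle, all points stay inside the unit disk, so the fixed square used for rasterization always contains them.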

https://doi.org/10.1007/s00894-023-05777-0

Latest publications

Publication

NEW PUBLICATION: Challenges in the analysis of pharmaceutical lentiviral vector products

Daniela Stadler, Constanze Helbig, Klaus Wuchner, Jürgen Frank, Klaus Richter, Andrea Hawe, Tim Menzen

Publication

NEW PUBLICATION: High throughput multidimensional liquid chromatography approach for online protein removal and characterization

Maksymilian M. Zegota, Georg Schuster, Mauro De Pra, Tibor Müllner, Tim Menzen, Frank Steiner, Andrea Hawe

Publication

Use of Closed System Transfer Devices (CSTDs) with Protein-Based Therapeutic Drugs—A Non-Solution for a Non-Problem?

Jonas Fast, Twinkle Christian, Mirjam Crul, Wim Jiskoot, M. Reza Nejadnik, Annette Medina, Allison Radwick, Alavattam Sreedhara, Hugh Tole

Publication

Genome length determination in adeno-associated virus vectors with mass photometry

Cornelia Hiemenz, Nadine Baumeister, Constanze Helbig, Andrea Hawe, Sabrina Babutzka, Stylianos Michalakis, Wolfgang Friess, Tim Menzen

Publication

NEW PUBLICATION: Roadmap for Drug Product Development and Manufacturing of Biologics

Krishnan Sampathkumar, Bruce A. Kerwin

Publication

Possibilities and limitations of α-relaxation data of amorphous freeze-dried cakes to predict long term IgG1 antibody stability

Alexandra Roesch, Roland Windisch, Christian Wichmann, Willem F. Wolkers, Gideon Kersten, Tim Menzen

Publication

Three-Dimensional Homodyne Light Detection (3D-HLD) for High-Throughput Submicron Particle Analysis in (Highly Concentrated) Protein Biopharmaceuticals, Viral Vectors, and LNPs

Dominik Brandstetter, Constanze Helbig, Kentaro Osawa, Hiroyuki Minemura, Yumiko Anzai, Tetsuo Torisu, Susumu Uchiyama, Tim Menzen, Wolfgang Friess, Andrea Hawe

Publication

Osmotic properties of T cells determined by flow imaging microscopy in comparison to electrical sensing zone analysis

Alexandra Roesch, Roland Windisch, Christian Wichmann, Willem F. Wolkers, Gideon Kersten, Tim Menzen

Publication

Lyophilization cycle design for highly concentrated protein formulations supported by micro freeze-dryer and heat flux sensor

Marco Carfagna, Monica Rosa, Andrea Hawe, Wolfgang Frieß

Publication

Reversed phase liquid chromatography for recombinant AAV genome integrity assessment

Christoph Gstöttner, Andrei Hutanu, Sacha Boon, Aurelia Raducanu, Klaus Richter, Markus Haindl, Raphael Ruppert, Elena Domínguez-Vega