Developing AI models to study proteins and their encodings
- Date
- Mar 5, 2026
- Time
- 11:00 AM - 12:30 PM
- Speaker
- Rachel Kolodny
- Affiliation
- University of Haifa, Israel
- Series
- MPI-CBG Thursday Seminar
- Language
- English
- Main Topic
- Biology
- Host
- Agnes Toth-Petroczy
- Description
We develop AI models to better understand proteins and the information they encode. The first model, Contrastive Learning Sequence–Structure (CLSS), aims to map the protein universe by characterizing the relationships between amino acid sequences and structures. CLSS is a self-supervised contrastive learning model, trained on large and diverse sets of protein domains, that co-embeds sequence and structure in a shared high-dimensional space where distance reflects sequence–structure similarity. This representation naturally captures both evolutionary relationships and structural variation. We find that CLSS refines expert knowledge of the global organization of protein space, highlights transitional forms that resist hierarchical classification, and reveals links between domains from seemingly separate lineages, thereby improving our understanding of evolutionary design.
The second model focuses on codon selection. Codon usage is shaped by selective pressures that optimize multiple overlapping signals, which remain only partially understood. We trained AI models to predict gene codons from amino acid sequences in four organisms (S. cerevisiae, S. pombe, E. coli, and B. subtilis). The AI models significantly outperformed frequency-based baselines, indicating that dependencies between codons within genes can be learned. Performance gains were greater for highly expressed genes, and greater in bacteria than in eukaryotes, consistent with stronger selective pressure under larger effective population sizes. In S. cerevisiae and bacteria, accuracy also increased with protein length, suggesting that the models captured signals related to co-translational folding. Incorporating information from homologous proteins provided only a minor additional benefit, possibly reflecting complex codon-usage patterns in rapidly evolving genes.
Together, these studies provide practical tools and demonstrate how AI can be used to study the ways in which evolution has shaped the protein universe and its encodings.
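The abstract describes CLSS as a self-supervised contrastive model that co-embeds sequence and structure in a shared space. As a rough illustration of that general technique (not the CLSS implementation, whose loss is not stated here), the sketch below assumes a CLIP-style symmetric InfoNCE objective: row i of each embedding matrix comes from the same protein domain, and all other pairings in the batch serve as negatives. The encoders that would produce these embeddings are omitted, and the function names and the temperature of 0.1 are hypothetical choices.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(seq_emb: np.ndarray,
                               struct_emb: np.ndarray,
                               temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption (hypothetical, CLIP-style): row i of seq_emb and row i of
    struct_emb describe the same protein domain (positive pair); every other
    sequence/structure pairing in the batch is treated as a negative.
    """
    s = l2_normalize(seq_emb)
    t = l2_normalize(struct_emb)
    logits = s @ t.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Sequence -> structure: each row should assign high probability to its own structure.
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Structure -> sequence: each column should assign high probability to its own sequence.
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_s2t + loss_t2s) / 2)
```

Minimizing this loss pulls each sequence embedding toward its own structure embedding and pushes it away from the others, which is what makes distance in the shared space usable as a similarity measure. Pairs whose embeddings already agree yield a lower loss than unrelated pairs.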
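The codon models are compared against frequency-based baselines. The abstract does not specify those baselines. One simple, context-free baseline of this kind predicts, for each amino acid, the codon most often used for it in the training genes. The sketch below assumes in-frame, uppercase DNA coding sequences and the standard genetic code; all helper names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (assumption: NCBI table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(gene: str) -> list[str]:
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    return [gene[i:i + 3] for i in range(0, len(gene) - len(gene) % 3, 3)]


def fit_most_frequent_codon(genes: list[str]) -> dict[str, str]:
    """For each amino acid, pick the codon used most often for it in the training genes."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for gene in genes:
        for codon in codons(gene):
            counts[CODON_TABLE[codon]][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def accuracy(model: dict[str, str], genes: list[str]) -> float:
    """Fraction of codons in held-out genes that the baseline predicts correctly."""
    hits = total = 0
    for gene in genes:
        for codon in codons(gene):
            total += 1
            hits += int(model.get(CODON_TABLE[codon]) == codon)
    return hits / total if total else 0.0
```

Because this baseline sees only the amino acid at each position, any accuracy a learned model gains over it must come from context, such as the dependencies between codons within a gene that the abstract reports.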
Last modified: Feb 21, 2026, 7:36:06 AM
Location
Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG), CBG Large Auditorium, Pfotenhauerstraße 108, 01307 Dresden
- Phone
- +49 351 210-0
- Fax
- +49 351 210-2000
- MPI-CBG
- Homepage
- http://www.mpi-cbg.de
Organizer
Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden
