Developing AI models to study proteins and their encodings

Date
Mar 5, 2026
Time
11:00 AM - 12:30 PM
Speaker
Rachel Kolodny
Affiliation
University of Haifa, Israel
Series
MPI-CBG Thursday Seminar
Language
en
Main Topic
Biology
Host
Agnes Toth-Petroczy
Description
We develop AI models to better understand proteins and the information they encode. The first model, Contrastive Learning Sequence–Structure (CLSS), aims to map the protein universe by characterizing relationships between amino acid sequences and structures. CLSS is a self-supervised contrastive learning model trained on large and diverse sets of protein domains to co-embed sequence and structure into a shared high-dimensional space, where distance reflects sequence–structure similarity. This representation naturally captures both evolutionary relationships and structural variation. We find that CLSS refines expert knowledge of the global organization of protein space, highlights transitional forms that resist hierarchical classification, and reveals linkages between domains from seemingly separate lineages, thereby improving our understanding of evolutionary design.
The second model focuses on codon selection. Codon usage is shaped by selective pressures that optimize multiple, overlapping signals that remain only partially understood. We trained AI models to predict gene codons from amino acid sequences in four organisms (S. cerevisiae, S. pombe, E. coli, and B. subtilis). The AI models significantly outperformed frequency-based baselines, indicating that dependencies between codons within genes can be learned. Performance gains were greater for highly expressed genes and in bacteria compared to eukaryotes, consistent with stronger selective pressure under larger effective population sizes. In S. cerevisiae and bacteria, accuracy also increased with protein length, suggesting that the models captured signals related to co-translational folding. Incorporating information from homologous proteins provided only minor additional benefit, potentially reflecting complex codon-usage patterns in rapidly evolving genes.
Together, these studies provide practical tools and demonstrate how AI can be used to study how evolution has shaped the protein universe and its encodings.

Last modified: Feb 21, 2026, 7:36:06 AM

Location

Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG CBG Large Auditorium), Pfotenhauerstraße 108, 01307 Dresden
Phone
+49 351 210-0
Fax
+49 351 210-2000
E-Mail
MPI-CBG
Homepage
http://www.mpi-cbg.de

Organizer

Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden