Living Lab Lecture No. 30: Compressing Large Language Models
- Date
- 06.03.2025
- Time
- 11:00 - 12:00
- Speaker
- Aaron Klein
- Affiliation
- ScaDS.AI Dresden/Leipzig
- Series
- ScaDS.AI Lecture Series
- Language
- English
- Main topic
- Computer Science
- Host
- ScaDS.AI Dresden/Leipzig
- Description
- Large Language Models (LLMs) mark a new era in Artificial Intelligence. However, their large size poses significant challenges for inference in real-world applications due to substantial GPU memory requirements and high inference latency. In this talk, we discuss techniques to compress pre-trained LLMs, reducing their resource consumption during inference while maintaining their performance. More specifically, we approach the problem from a multi-objective Neural Architecture Search (NAS) perspective to jointly optimize performance and efficiency. By considering the LLM as a super-network consisting of a large but finite number of sub-networks, we can identify a set of Pareto-optimal sub-networks that balance parameter count and validation performance. We empirically demonstrate that using NAS techniques for fine-tuning enhances the prunability of pre-trained LLMs and explore how this impacts real-world applications.
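To illustrate the multi-objective selection described in the abstract, here is a minimal sketch (not from the talk itself; all sub-network names, sizes, and losses are hypothetical placeholders) of filtering candidate sub-networks of a super-network down to the Pareto front over parameter count and validation loss:

```python
# Illustrative sketch: Pareto-optimal sub-network selection.
# Objectives: minimize parameter count and minimize validation loss.

def pareto_front(candidates):
    """Return candidates not dominated in (params, val_loss).

    A candidate dominates another if it is no worse in both
    objectives and strictly better in at least one.
    """
    front = []
    for c in candidates:
        dominated = any(
            o["params"] <= c["params"] and o["val_loss"] <= c["val_loss"]
            and (o["params"] < c["params"] or o["val_loss"] < c["val_loss"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c["params"])

# Hypothetical sub-networks sampled from a fine-tuned super-network.
subnetworks = [
    {"name": "full",   "params": 7.0e9, "val_loss": 2.10},
    {"name": "prune1", "params": 5.2e9, "val_loss": 2.15},
    {"name": "prune2", "params": 5.0e9, "val_loss": 2.40},
    {"name": "prune3", "params": 3.1e9, "val_loss": 2.55},
    {"name": "prune4", "params": 3.3e9, "val_loss": 2.90},  # dominated by prune3
]

for s in pareto_front(subnetworks):
    print(f'{s["name"]}: {s["params"]:.1e} params, loss {s["val_loss"]:.2f}')
```

Each surviving sub-network offers a different trade-off between size and quality; a practitioner would pick the one that fits their GPU memory and latency budget.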
Last modified: 17.02.2025, 16:08:07
Venue
Online; please follow the link: https://tud.link/i8zf
Organizer
Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Chemnitzer Straße 46b, 2nd floor, 01187 Dresden
- Phone
- +49 351 463-40900
- ScaDS.AI homepage
- https://scads.ai