Living Lab Lecture No. 30: Compressing Large Language Models
- Date
- Mar 6, 2025
- Time
- 11:00 AM - 12:00 PM
- Speaker
- Aaron Klein
- Affiliation
- ScaDS.AI Dresden/Leipzig
- Series
- ScaDS.AI Lecture Series
- Language
- en
- Main Topic
- Computer Science
- Host
- ScaDS.AI Dresden/Leipzig
- Description
- Large Language Models (LLMs) mark a new era in Artificial Intelligence. However, their large size poses significant challenges for inference in real-world applications due to substantial GPU memory requirements and high inference latency. In this talk, we discuss techniques to compress pre-trained LLMs, reducing their resource consumption during inference while maintaining their performance. More specifically, we approach the problem from a multi-objective Neural Architecture Search (NAS) perspective to jointly optimize performance and efficiency. By considering the LLM as a super-network consisting of a large but finite number of sub-networks, we can identify a set of Pareto-optimal sub-networks that balance parameter count and validation performance. We empirically demonstrate that using NAS techniques for fine-tuning enhances the prunability of pre-trained LLMs and explore how this impacts real-world applications.
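The abstract sketches the central step of the approach: each sub-network of the pre-trained super-network is scored on two objectives (parameter count and validation performance), and only the Pareto-optimal configurations are kept. The snippet below is a minimal, illustrative sketch of that multi-objective selection in Python; the SubNetwork search space and the evaluate() stub are hypothetical placeholders, not the speaker's actual implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SubNetwork:
    """A hypothetical sub-network of the super-network, defined by how much
    of the pre-trained model it keeps (illustrative assumption)."""
    n_layers: int   # transformer layers kept
    n_heads: int    # attention heads kept per layer


def evaluate(net: SubNetwork) -> tuple[float, float]:
    """Return (parameter-count proxy, validation-loss proxy) for a sub-network.
    A real evaluation would slice the pre-trained LLM and score it on a
    validation set; this synthetic stand-in keeps the sketch self-contained."""
    params = net.n_layers * net.n_heads * 1e6
    val_loss = 2.0 + 10.0 / (net.n_layers * net.n_heads) + random.uniform(0.0, 0.05)
    return params, val_loss


def pareto_front(scored):
    """Keep every configuration that is not dominated in (params, val_loss);
    lower is better for both objectives."""
    front = []
    for i, (cfg_i, obj_i) in enumerate(scored):
        dominated = any(
            obj_j[0] <= obj_i[0] and obj_j[1] <= obj_i[1] and obj_j != obj_i
            for j, (_, obj_j) in enumerate(scored)
            if j != i
        )
        if not dominated:
            front.append((cfg_i, obj_i))
    return front


# Enumerate a small grid of candidate sub-networks of the super-network.
candidates = [SubNetwork(l, h) for l in (6, 12, 24) for h in (4, 8, 16)]
scored = [(c, evaluate(c)) for c in candidates]

for cfg, (params, loss) in sorted(pareto_front(scored), key=lambda x: x[1][0]):
    print(f"{cfg}: ~{params / 1e6:.0f}M params, val loss {loss:.3f}")
```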
Location
Online; please follow the link (https://tud.link/i8zf).
Organizer
Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Chemnitzer Straße 46b, 2. OG, 01187 Dresden
- Phone
- +49 351 463-40900
- ScaDS.AI
- Homepage
- https://scads.ai