Accommodating LLM Service over Heterogeneous Computational Resources
----------------------------------------------------------------------------------------------------
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
----------------------------------------------------------------------------------------------------
Date: Friday, February 14, 4:00 pm – 5:30 pm
Venue: ERB 513, The Chinese University of Hong Kong
Title: Accommodating LLM Service over Heterogeneous Computational Resources
Speaker: Professor Binhang YUAN, HKUST
Abstract:
Serving large language models (LLMs) is crucial to contemporary
AI applications. We focus on deploying such services in a
heterogeneous and potentially decentralized setting to mitigate the
substantial costs typically associated with centralized data centers.
Our work relies on carefully designed scheduling algorithms: we
model the computation capacity and inter-machine connections precisely
and propose an efficient search algorithm to find the optimal
allocations for different LLM serving paradigms. Our empirical study
suggests that the proposed method can effectively reduce the service
cost while preserving the service quality.
Biography:
Binhang YUAN is an Assistant Professor at the Department of Computer
Science and Engineering (CSE), the Hong Kong University of Science and
Technology (HKUST). He received his Ph.D. and master's degrees from
Rice University and his bachelor's degree from Fudan University.
Before joining HKUST, he was a Postdoc at the Swiss Federal Institute
of Technology Zurich (ETH Zurich). His main research interests are in
data management systems for machine learning and distributed and
decentralized machine learning systems. He won the VLDB Best Paper
Honorable Mention Award in 2019 and the SIGMOD Research Highlight
Award in 2020.
Everyone is welcome to attend the talk!
Date:
Friday, February 14, 2025 - 17:00