Pollux: Co-Adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

These are my reading notes on the OSDI 2021 best paper "Pollux: Co-Adaptive Cluster Scheduling for Goodput-Optimized Deep Learning".

Design and Architecture

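  • PolluxAgent: tunes each job's batch size and learning rate so that the job makes full use of the resources it has already been allocated (job-level granularity)
  • PolluxSched: adjusts the resources allocated to each job to improve fairness and shorten job completion time (cluster-wide granularity)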
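
To make the split between the two components concrete, here is a minimal sketch of the job-level decision: given a GPU allocation, pick the batch size that maximizes goodput, taken as throughput times statistical efficiency. The efficiency term follows the gradient-noise-scale form as I understand it from the paper. Everything else is an assumption for illustration: the throughput model, its constants (`t_per_sample`, `t_sync`), the candidate batch sizes, and all function names are hypothetical and are not Pollux's actual implementation. Learning-rate adjustment is left out.

```python
def efficiency(batch_size, base_batch_size, noise_scale):
    """Statistical efficiency of `batch_size` relative to `base_batch_size`.

    Uses the form (phi + M0) / (phi + M), where phi is the gradient noise
    scale. Treated here as a fixed input (assumption); a real system would
    estimate it during training.
    """
    return (noise_scale + base_batch_size) / (noise_scale + batch_size)


def throughput(batch_size, num_gpus, t_per_sample=0.001, t_sync=0.05):
    """Hypothetical throughput model in samples per second.

    Step time = per-GPU compute time + a fixed synchronization cost that
    only applies when more than one GPU is used. Constants are made up.
    """
    compute = (batch_size / num_gpus) * t_per_sample
    sync = t_sync if num_gpus > 1 else 0.0
    return batch_size / (compute + sync)


def goodput(batch_size, num_gpus, base_batch_size, noise_scale):
    """Goodput = system throughput x statistical efficiency."""
    return throughput(batch_size, num_gpus) * efficiency(
        batch_size, base_batch_size, noise_scale
    )


def best_batch_size(num_gpus, candidates, base_batch_size, noise_scale):
    """Job-level choice: the candidate batch size with the highest goodput."""
    return max(
        candidates,
        key=lambda m: goodput(m, num_gpus, base_batch_size, noise_scale),
    )


candidates = [128, 256, 512, 1024, 2048]
best_1gpu = best_batch_size(1, candidates, base_batch_size=128, noise_scale=1000)
best_8gpu = best_batch_size(8, candidates, base_batch_size=128, noise_scale=1000)
print(best_1gpu, best_8gpu)
```

Under these made-up numbers, a single GPU gains nothing from a larger batch, while eight GPUs prefer a larger batch because it amortizes the synchronization cost. A cluster-level scheduler can then compare each job's best achievable goodput across candidate allocations when deciding how to divide GPUs.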
Updated 2021-11-06