Distributed Machine Learning System
Dr. Wen-Syan Li (wensyanli@snu.ac.kr, Office: 942-412)

Goals

Emerging machine learning (ML) applications require both flexibility and performance. Flexibility requires systems to support multiple families of machine learning algorithms, data exchange between tasks on multiple nodes or tiers, and control of task dependencies. Performance requires systems to scale across nodes or in the cloud, adapt to environment changes in real time, and push computation to the edge, close to local data. In short, ML functionality needs to be distributed both horizontally (across nodes) and vertically (from edge computing units such as Google Coral to backend servers and the cloud) as needed to achieve performance while retaining high flexibility.

This course examines topics related to the above aspects, as well as system design principles such as:

  • Data Locality
  • Data Federation
  • Data Privacy
  • Power Efficiency
  • Ability to Work Offline
  • Software and Hardware Integration, such as dedicated hardware for accelerating model training, inference on chips, and inference in the cloud

Content

The course uses popular machine learning libraries as examples to demonstrate a distributed machine learning system end to end. Systems such as Berkeley Ray, Alibaba's Brain/machine learning platform, and other commercially available systems will be used so that students learn the design of real-world, large-scale distributed ML systems. The course also covers emerging topics in federated machine learning.

Industry experts (e.g., architects, business users, or operators) may be invited to present in class and share real-world best practices.

1. Distributed Machine Learning Overview

2. Machine Learning Algorithm Review and Parallelism Analysis

3. Open Source Big Data Platform

4. Open Source Big Data Platform

5. Vertical Machine Learning Distribution

6. Assignment Discussion / Final Group Project Topic Presentation

7. Berkeley Ray Distributed Machine Learning System

8. Internet Machine Learning Platforms 1

9. Internet Machine Learning Platforms 2

10. Enterprise Machine Learning Systems

11. Assignment Discussion / Final Group Project Topic Presentation (Second Checkpoint)

12. Federated Machine Learning Systems 1

13. Federated Machine Learning Systems 2

14. Advanced Topics in Distributed Machine Learning Systems

15. Final Project Presentation

16. Final Project Presentation

Prerequisite

Familiarity with Python, machine learning algorithms and libraries, and state-of-the-art open source big data platforms is needed for the programming assignments. Students are encouraged to review the recommended books or online material before the course begins. Presentations and participation in class discussions are required. Completion of the Machine Learning and Deep Learning course is required; to request an exception, please discuss with the instructor.

Note

There is no single required textbook for this course, as the lectures draw on multiple textbooks, articles, and web documents, as well as real scenarios from external companies. Papers, documents, technical reports, and white papers will be distributed in class. Because the Machine Learning and Deep Learning course is a prerequisite for this Distributed Machine Learning course, the following commercially available textbooks and resources are recommended as background material.

1. Pattern Recognition and Machine Learning (Information Science and Statistics) by Christopher M. Bishop, ISBN-13: 978-0387310732. Online material and a downloadable PDF are available at https://www.microsoft.com/en-us/research/people/cmbishop/prml-book/

2. Machine Learning by Tom M. Mitchell, ISBN-13: 978-1259096952. Online material is available at http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml

3. Deep Learning (Adaptive Computation and Machine Learning series) by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, ISBN-13: 978-0262035613. Online material is available at https://www.deeplearningbook.org/

4. Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning series), Second Edition, by Richard S. Sutton and Andrew G. Barto, ISBN-13: 978-0262039246. Online material is available at https://mitpress.mit.edu/books/reinforcement-learning-second-edition (a book focused on reinforcement learning)

5. Andrew Ng's online Machine Learning course, available at https://www.youtube.com/watch?v=PPLop4L2eGk&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN

6. Python Data Science Handbook by Jake VanderPlas (a good general reference)

Language Policy

This course will be taught in English. All lectures as well as exams and assignments will be given in English. Students will use English for answering exam questions and doing assignments.