Learning Methods for Data Science
(데이터사이언스를 위한 학습이론)

Fall 2020

Tue/Thu, 12:30 - 13:45

Instructor: Yongdai Kim (ydkim903@snu.ac.kr)

Summary

This course covers the principles of various methodologies used in statistical learning. It mainly covers supervised learning and, if time permits, unsupervised learning. Topics include linear models, shrinkage, decision trees and ensembles, SVM and RKHS, empirical risk minimization, and deep learning. Undergraduate-level mathematical statistics, knowledge of regression analysis, and familiarity with programs implementing machine learning algorithms are required.

Goal

This course teaches the principles and methodologies of learning for data analytics, a core technology for extracting useful information from data. Various data analytic methodologies used in data mining, machine learning, and deep learning are summarized, and the background principles and theories governing these methodologies are studied. Topics include the likelihood principle, probability density estimation, non-parametric regression, classification and decision theory, shrinkage and regularization, sparse learning, decision trees and ensembles, SVM, and deep learning. Related theories, including concentration of measure and minimax optimality, are also treated.

Content

1. Introduction to statistical learning

2. Linear classifier

3. Shrinkage estimator

4. Model assessment and selection

5. Basis expansion and Kernel methods

6. Model assessment

7. Ensemble

8. Midterm Exam

9. Function estimation in high dimensions

10. Support Vector Machine (SVM)

11. Empirical risk minimization

12. Kernel machines

13. Deep learning: Introduction

14. Deep learning: Theories

15. Final Exam

Grading Policy

  • Attendance: 10%

  • Assignment: 20%

  • Midterm Exam: 30%

  • Final Exam: 40%

Prerequisite

Mathematical statistics, undergraduate-level statistical knowledge (e.g., regression analysis), and experience with machine learning programs

Note

  • This course has no lab (practice) sessions, but the homework assignments include data analysis.
  • Students are expected to learn the computing programs used in the course on their own.