Text and Natural Language Big Data Analysis
(텍스트 및 자연어 빅데이터 분석 방법론)

Fall 2020

Tue/Thu, 15:30 - 16:45

Instructor: Hyopil Shin (hpshin@snu.ac.kr)
TA: Sang-Ah Lee (visualjan@snu.ac.kr)


The course covers both the theoretical background of Natural Language Processing or Computational Linguistics and recent Transformer-based methodologies. We will start with regular expressions, N-grams, entropy, and embeddings. After reviewing regression, encoder-decoder, and attention models, we will study the pre-trained Transformer models implemented by Huggingface. Students will work with various Transformer-based pre-trained models and apply them to a range of downstream tasks implemented in PyTorch. Knowledge of Python and deep learning is required for the class. Through lectures and programming assignments, students will learn the implementation tricks needed to make neural networks work on practical problems.

Textbook and Sites


01. Introduction to Natural Language Processing / Regular Expressions, Text Normalization and Edit Distance (1)
02. Regular Expressions, Text Normalization and Edit Distance (2) / Language Modeling with N-Grams (1)
03. Language Modeling with N-Grams (2) / Entropy and Maximum Entropy Models
04. Naive Bayes Classification and Sentiment / Linear Regression and Logistic Regression
05. Vector Semantics and Embeddings
06. Neural Networks Review for NLP
07. Sequence Processing with Recurrent Networks / Mid-Term Test
08. Encoder-Decoder Review / Attention Model
09. Transformer
10. BERT (Bidirectional Encoder Representations from Transformers) / Transformers by Huggingface (1)
11. Transformers by Huggingface (2)
12. Transformers by Huggingface For Korean (1)
13. Transformers by Huggingface For Korean (2)
14. Transformers by Huggingface For GPT Models
15. Final Test and Project Presentations

Grading Policy

  • Attendance: 10%
  • Assignment: 20%
  • Mid-term: 30%
  • Final Project: 40%


  • Please set up Python, PyTorch, and Colab for class
  • Basic knowledge of deep learning and programming in Python


This course serves both students taking “Text and Natural Language Big Data Analysis” in the Graduate School of Data Science and students taking “Studies in Computational Linguistics I” in the Dept. of Linguistics.

For more details: https://hpshin.github.io/NaturalLanguageBigDataAnalysis/index.html