Text and Natural Language Big Data Analysis
(Korean title: 텍스트 및 자연어 빅데이터 분석 방법론, "Text and Natural Language Big Data Analysis Methodology")

Instructor: Hyopil Shin (hpshin@snu.ac.kr)


The course covers both the theoretical background of Natural Language Processing, or Computational Linguistics, and recent Transformer-based methods. We start with regular expressions, N-grams, entropy, and embeddings. After reviewing regression, encoder-decoder, and attention models, we study the pre-trained Transformer models implemented by Hugging Face. Students will work with a range of Transformer-based pre-trained models and apply them to downstream tasks implemented in PyTorch. Knowledge of Python and deep learning is required. Through lectures and programming assignments, students will learn the implementation techniques needed to make neural networks work on practical problems.


01. Introduction to Natural Language Processing / Regular Expressions, Text Normalization and Edit Distance (1)

02. Regular Expressions, Text Normalization and Edit Distance (2) / Language Modeling with N-Grams (1)

03. Language Modeling with N-Grams (2) / Entropy and Maximum Entropy Models

04. Naive Bayes Classification and Sentiment / Linear Regression and Logistic Regression

05. Vector Semantics and Embeddings

06. Neural Networks Review for NLP

07. Sequence Processing with Recurrent Networks / Mid-Term Test

08. Encoder-Decoder Review / Attention Model

09. Transformer

10. BERT (Bidirectional Encoder Representations from Transformers) / Transformers by Hugging Face (1)

11. Transformers by Hugging Face (2)

12. Transformers by Hugging Face for Korean (1)

13. Transformers by Hugging Face for Korean (2)

14. Transformers by Hugging Face for GPT Models

15. Final Test and Project Presentations


Grading Policy

  • Attendance: 10%

  • Assignment: 20%

  • Mid-term: 30%

  • Final Project: 40%


Prerequisites

  • Set up Python, PyTorch, and Colab before class.
  • Basic knowledge of deep learning and Python programming is expected.
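
As a quick sanity check of the setup listed above, a short script along these lines may help. It is only a sketch under assumptions: the helper name `check_environment` is hypothetical, and it assumes the libraries are importable under the package names `torch` and `transformers`. It reports missing packages rather than failing.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "transformers")):
    """Return a dict mapping each package name to True if it is importable."""
    status = {}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    # Print the interpreter version, then one line per package.
    print("Python", sys.version.split()[0])
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

The same script can be run in a local terminal or pasted into a Colab cell; any package reported as missing needs to be installed before the first programming assignment.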


This course serves both students taking “Text and Natural Language Big Data Analysis” in the Graduate School of Data Science and students taking “Studies in Computational Linguistics I” in the Department of Linguistics.

For more details: https://hpshin.github.io/NaturalLanguageBigDataAnalysis/index.html