Methodologies for Text and Natural Language Big Data Analysis
Text and Natural Language Big Data Analysis
Hyopil Shin (hpshin@snu.ac.kr)
Goals
The course covers both the theoretical background of Natural Language Processing, or Computational Linguistics, and recent Transformer-based methodologies. We will start with regular expressions, N-grams, entropy, and embeddings. After reviewing regression, encoder-decoder models, and attention models, we will study the pre-trained Transformer models implemented in Huggingface. Students will work with a variety of Transformer-based pre-trained models and apply them to many downstream tasks implemented in PyTorch. Knowledge of Python and deep learning is required for the class. Through lectures and programming assignments, students will learn the implementation tricks needed to make neural networks work on practical problems.
Content
01. Introduction to Natural Language Processing / Regular Expressions, Text Normalization and Edit Distance (1)
02. Regular Expressions, Text Normalization and Edit Distance (2) / Language Modeling with N-Grams (1)
03. Language Modeling with N-Grams (2) / Entropy and Maximum Entropy Models
04. Naive Bayes Classification and Sentiment / Linear Regression and Logistic Regression
05. Vector Semantics and Embeddings
06. Neural Networks Review for NLP
07. Sequence Processing with Recurrent Networks / Mid-Term Test
08. Encoder-Decoder Review / Attention Model
09. Transformer
10. BERT (Bidirectional Encoder Representations from Transformers) / Transformers by Huggingface (1)
11. Transformers by Huggingface (2)
12. Transformers by Huggingface for Korean (1)
13. Transformers by Huggingface for Korean (2)
14. Transformers by Huggingface for GPT Models
15. Final Test and Project Presentations
Textbook
- Speech and Language Processing (3rd ed. Draft) (https://web.stanford.edu/~jurafsky/slp3/)
- Huggingface Transformers (https://huggingface.co/transformers/index.html)
- Deep Learning Tutorials based on PyTorch (https://www.deeplearningwizard.com/deep_learning/intro/)
Grading Policy
- Attendance: 10%
- Assignment: 20%
- Mid-term: 30%
- Final Project: 40%
Prerequisite
- Please set up Python, PyTorch, and Colab for the class
- Basic knowledge of deep learning and of programming in Python
Note
This course is for students taking "Text and Natural Language Big Data Analysis" in the Graduate School of Data Science as well as students taking "Studies in Computational Linguistics I" in the Department of Linguistics.
For more details: https://hpshin.github.io/NaturalLanguageBigDataAnalysis/index.html