Computer Science - UG3/G
Course Natural Language Processing I
Term 2023F
Final TBA
Credits 2
Staff WU Xiaokun 吴晓堃
Lecture 32 hours (8 weeks)
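Elective course for the Computer Science major
Core course for the Artificial Intelligence major
Course name Natural Language Processing I
Term Spring 2023
Assessment Exam / coursework assessment
Credits 2
Lecturer WU Xiaokun
Total hours 32 class hours (8 weeks)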

Course Information

Short Intro

Natural Language Processing (NLP) is a crucial part of Artificial Intelligence (AI) that models how people communicate with each other. The objective of this course is to provide a complete introduction to natural language processing techniques and their applications, especially in the era of deep learning.

Description

Natural Language Processing (NLP) is a crucial part of Artificial Intelligence (AI) that draws on methodologies from both Computer Science and Linguistics. NLP is sometimes referred to as Computational Linguistics (CL) when the emphasis is on linguistic structure. NLP is widely considered a fundamental instrument of the information age, since its applications help people communicate in many kinds of language: web search, advertising, language translation, etc. The objective of this course is to provide a complete introduction to natural language processing techniques and their applications, as a first step towards more specialized graduate-level topics. In recent years, Deep Learning approaches have greatly improved performance on almost every AI task, so modern Deep Learning methods for NLP form the main theme of the course.

The course will touch on the following topics:

Concepts will be illustrated with examples in the PyTorch framework.

Keywords: Natural Language Processing (NLP), Deep Neural Networks, PyTorch.

Course group

Previous offerings: [2022]

Closely related: [Artificial Intelligence], [Machine Learning], [Deep Learning]

Prerequisites

Required

Recommended

Teaching plan

  1. Introduction 导言, Basic Text Processing 文本处理基础
  2. Syntactic Structure 句法结构
  3. N-gram Language Models N元语法模型
  4. Naive Bayes 朴素贝叶斯, Logistic Regression 逻辑回归
  5. Vector Semantics 向量语义, Word Embeddings 词嵌入
  6. Sequence Labeling 序列标注, Parts of Speech 词类, Named Entities 命名实体

Several special topics:

  1. Lexicon Parsing 词典分词, Chinese Word Segmentation 中文分词
  2. Constituency Parsing 构成解析, Dependency Parsing 依存解析
  3. Recurrent Neural Networks 循环神经网络, Modern RNNs 现代循环神经网络
  4. Attention 注意力机制, Transformer, Pretrained models 预训练模型
  5. Encoder-Decoder 编码器-解码器, seq2seq 序列到序列学习

Several applications:

  1. Text Synthesis 文本合成
  2. Text Classification 文本分类, Sentiment Analysis 情感分析
  3. Machine Translation 机器翻译
  4. Question Answering 机器问答, Information Retrieval 信息检索
  5. Chatbots & Dialogue Systems 聊天机器人
  6. Speech Recognition 语音识别, Text-to-Speech 机器朗读

Schedule

Wednesday; Multiple locations.

Week Date Lecture Handouts
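1 2023/03/18 [Introduction] [Course Information] [Installation and Setup]
2 2023/03/25 [Text Processing]
3 2023/04/01 [Lexicon Parsing]
4 2023/04/08 [Evaluation] [Syntactic Structure]
5 2023/04/15 [N-grams]
6 2023/04/22 [Sequence Labeling]
7 2023/04/29 holiday
8 2023/05/06 [Vector Semantics]
9 2023/05/13 [Word Embeddings]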
10 2023/05/20 review

Methodology

The course is oriented toward solving practical problems, with equal emphasis on lectures and practice.

Each lecture is roughly organized into 3 progressive units:

  1. Core Concepts 核心概念: provides elementary knowledge of the topic
  2. Advanced Discussion 进阶讨论: provides in-depth understanding, mathematical formulations
  3. Practical Skills 实践技巧: provides problem-solving skills through hands-on programming training

Evaluation

Textbook

Not mandatory but recommended:

Resources


  1. https://hanlp.hankcs.com/

  2. https://web.stanford.edu/~jurafsky/slp3/

  3. https://web.stanford.edu/class/cs224n/