Multilingual Natural Language Processing

Basic information

Lecturer: Prof. Dr. Goran Glavaš
Teaching Assistants: Benedikt Ebing, Fabian David Schmidt

Lecture: Wednesday 10:00 - 12:00 in Übungsraum I (Informatik)
Exercises: Friday 12:00 - 14:00 in Übungsraum I (Informatik)

Kickoff: 26.4.2023

Intended audience: The course is recommended for master's students of all CS-oriented programs (Master Informatik, Master eXtended AI, Master Business Informatics). Prior knowledge of core NLP and machine learning concepts is desirable but not mandatory.

WueCampus course

Registration

It is not necessary to register for the lecture or exercises.
To access teaching materials and current announcements, you need to register for the WueCampus course.

To participate in the exam you must register for the exam via WueStudy.
All further information on this can be found in our WueCampus course.

Learning outcomes

Students will acquire theoretical and practical knowledge of modern multilingual natural language processing and gain insight into cutting-edge research in (multilingual) NLP. They will learn how to represent texts from different languages in shared representation spaces that enable semantic comparison and cross-lingual transfer for various NLP tasks. Upon successful completion of the course, students will be well equipped to solve practical NLP problems regardless of the language of the text data, and to determine the strategy that yields the best performance for any concrete target language.

Schedule (tentative)

Introduction

26.4. L1: Languages of the world & Linguistic Universals; Course organization

Block I: Fundamentals

10.5. L2: Language modeling, word embedding models, tokenization & vocabulary

17.5. L3: Deep Learning for (Modern) NLP — Perceptron/MLP, Backprop, Batching, Gradient Descent, Dropout…

24.5. L4: Transformer Almighty & Pretraining Language Models (autoregressive, masked language modeling)

Block II: Multilinguality

31.5. L5: Multilingual Word Embedding Spaces (and cross-lingual transfer using them)

7.6. L6: Multilingual LMs (and cross-lingual transfer using them); Tasks, Benchmarks & Evaluation

14.6. L7: Curse of Multilinguality, Modularization, and Language Adaptation

21.6. L8: Transfer for Token-Level Tasks: Word Alignment & Label Projection

Block III: Advanced

28.6. L9: Neural Machine Translation

5.7. L10: Multilingual Sentence Representations

12.7. L11: Prompting and Large Language Models (LLMs); Instruction Fine-Tuning, ChatGPT/GPT-4
