
    Multilingual Natural Language Processing

    Basic information

    Lecturer: Prof. Dr. Goran Glavaš
    Teaching Assistants: Benedikt Ebing, Fabian David Schmidt

    Lecture: Wednesday 10:00 - 12:00 in Übungsraum I (Informatik)
    Exercises: Friday 12:00 - 14:00 in Übungsraum I (Informatik)

    Kickoff: 26.4.2023 

    Intended audience: The course is recommended for master's students of all CS-oriented programs (Master Informatik, Master eXtended AI, Master Business Informatics). Prior knowledge of core NLP and machine learning concepts is desirable, albeit not mandatory.

    WueCampus course

    Registration

    It is not necessary to register for the lecture or exercises. 
    To get access to teaching materials and current announcements, you need to register for the WueCampus course.

    To participate in the exam, you must register for the exam via WueStudy.
    All further information on this can be found in our WueCampus course.

    Learning outcomes

    Students will acquire theoretical and practical knowledge of modern multilingual natural language processing and gain insight into cutting-edge research in (multilingual) NLP. They will learn how to represent texts from different languages in shared representation spaces that enable semantic comparison and cross-lingual transfer for various NLP tasks. Upon successful completion of the course, students will be well equipped to solve practical NLP problems regardless of the language of the text data, and to determine the optimal strategy for obtaining the best performance for any concrete target language.

    Schedule (tentative)

    Introduction

    26.4.  L1: Languages of the world & Linguistic Universals; Course organization

    Block I: Fundamentals

    10.5.  L2: Language modeling, word embedding models, tokenization & vocabulary

    17.5.  L3: Deep Learning for (Modern) NLP — Perceptron/MLP, Backprop, Batching, Gradient Descent, Dropout…

    24.5.  L4: Transformer Almighty & Pretraining Language Models (autoregressive, masked language modeling)

    Block II: Multilinguality

    31.5.  L5: Multilingual Word Embedding Spaces (and cross-lingual transfer using them)

    7.6.    L6: Multilingual LMs (and cross-lingual transfer using them); Tasks, Benchmarks & Evaluation

    14.6.  L7: Curse of Multilinguality, Modularization, and Language Adaptation

    21.6.  L8: Transfer for Token-Level Tasks: Word Alignment & Label Projection

    Block III: Advanced

    28.6.  L9: Neural Machine Translation

    5.7.    L10: Multilingual Sentence Representations

    12.7.  L11: Prompting and Large Language Models (LLMs); Instruction Fine-Tuning, ChatGPT/GPT-4