    Natural Language Processing

    Data Science for Digital Humanities 2

    Lecturers: Prof. Dr. Goran Glavaš, Lennart Keller

    Sessions: Thursdays 10:00 - 12:00 in the Sensalight building (John-Skilton-Str. 8a), 4th floor, room 4.23

    Kickoff: 27.4.2023

    WueCampus course

    Registration

    It is not necessary to register for the sessions. 
    To access teaching materials and current announcements, you need to register for the WueCampus course.

    To participate in the exam you must register for the exam via WueStudy.
    All further information will be shared in our WueCampus course.

    Objective of the lecture

    The course builds on Data Science for DH 1 (winter semester) and introduces complementary, and some more advanced, data science topics, with an emphasis on use cases and applications in the (computational/digital) humanities.

    Tentative schedule/content of the course: 

    27.4. Session #1: Introduction

    • Recap of DS4DH1
    • Course organization

    4.5. Session #2: Corpus linguistics

    • Lexical association measures
    • Multi-word expressions, collocations, idioms
    • Lexico-semantic resources: WordNet, BabelNet, PanLex

    11.5. Session #3: Topic modeling

    • Latent Dirichlet Allocation
    • Practical examples with LDA in Gensim
    • Homework project #1: pick a corpus, induce topics, analyze topics and topical distribution of documents, prepare a small-scale presentation 

    25.5. Session #4: Student presentations -- Topic Modeling Homeworks

    1.6. Session #5: Networks

    • Introduction to Graph Theory
    • Node importance -- degree centrality, closeness centrality, betweenness centrality
    • Shortest paths
    • Practical exercises with NetworkX
    • Homework project #2: analysis of a large-scale network dataset; prepare a small-scale presentation with insights

    15.6. Session #6: Student presentations -- Network Analysis

    22.6. Session #7: Evaluation & Statistical Testing

    • Common evaluation measures for classification and regression
    • Gold-standard annotation and inter-annotator agreement
    • Significance testing (parametric: Student’s t-test; non-parametric: Wilcoxon’s test)

    29.6. Session #8: Deep Learning

    • Convolutional NNs
    • Recurrent NNs
    • Attention mechanism and Transformers
    • Practical exercises in Keras

    6.7. Session #9: Interpretability & Fairness

    • Explainability and interpretability of machine learning models
    • Biases and fairness: data bias, model bias

    13.7. Session #10: Guest Lecture 

    • A talk by a prominent researcher in the area of Computational Humanities