Jan Pfister, M.Sc.

Chair of Data Science (Informatik X)
University of Würzburg
Campus Hubland Nord
Emil-Fischer-Straße 50
97074 Würzburg
Germany
Email: pfister[at]informatik.uni-wuerzburg.de
Phone: (+49 931) 31 - 81934
Office: Room 50.03.020 (Institutsgebäude Künstliche Intelligenz)
PGP Fingerprint: 9131 6BAF 85DE CFB3 AF89 86C9 568C 615A F18B C9C0
Research Interests & Projects
I work in the field of Natural Language Processing (NLP); in particular, I am interested in developing novel methods for understanding and extracting meaning from text. My work focuses on using large language models, including in combination with pointer networks, to capture the complexities of human language. I am currently applying these techniques to aspect-based sentiment analysis in order to extract fine-grained sentiment information from text. I am also working on advancing the German NLP landscape with artifacts such as the first comprehensive German benchmark and German models pretrained from scratch (such as LLäMmlein and ModernGBERT).
I joined the DMIR group for my PhD studies after receiving my master's degree in Computer Science from the University of Würzburg in 2021. I am currently responsible for BibSonomy.
Awards & Stuff
- First Place at SemEval 24 Shared Task 4, and Best Paper Honorable Mention Award over all tasks
- First Place at SemEval 25 Shared Task 2, and Best Paper Award Task 2
Teaching
- Seminar: Ausgewählte Themen des Machine Learning (SS '21 + WS '21 + WS '24)
- Lecture: Information Retrieval (SS '22)
- Project: Machine Learning in Natural Language Processing (since '22)
- Lecture: Text Mining (WS '23)
Publications
- BARTABSA++: Revisiting BARTABSA with Decoder LLMs. In Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025), H. Fei, K. Tu, Y. Zhang, X. Hu, W. Han, Z. Jia, Z. Zheng, Y. Cao, M. Zhang, W. Lu, N. Siddharth, L. Øvrelid, N. Xue, Y. Zhang (eds.), pp. 115–128. Association for Computational Linguistics, Vienna, Austria, 2025.
- SALT at SemEval-2025 Task 2: A SQL-based Approach for LLM-Free Entity-Aware-Translation. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), S. Rosenthal, A. Rosá, D. Ghosh, M. Zampieri (eds.), pp. 852–864. Association for Computational Linguistics, Vienna, Austria, 2025.
- CAIDAS at SemEval-2025 Task 7: Enriching Sparse Datasets with LLM-Generated Content for Improved Information Retrieval. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), S. Rosenthal, A. Rosá, D. Ghosh, M. Zampieri (eds.), pp. 1623–1638. Association for Computational Linguistics, Vienna, Austria, 2025.
- Identifying Axiomatic Mathematical Transformation Steps using Tree-Structured Pointer Networks. In Transactions on Machine Learning Research. 2025.
- ModernGBERT: German-only 1B Encoder Model Trained from Scratch. 2025.
- LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), W. Che, J. Nabende, E. Shutova, M. T. Pilehvar (eds.), pp. 2227–2246. Association for Computational Linguistics, Vienna, Austria, 2025.
- Pollice Verso at SemEval-2024 Task 6: The Roman Empire Strikes Back. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), A. K. Ojha, A. S. Doğruöz, H. Tayyar Madabushi, G. Da San Martino, S. Rosenthal, A. Rosá (eds.), pp. 1529–1536. Association for Computational Linguistics, Mexico City, Mexico, 2024.
- From Chat to Publication Management: Organizing your related work using BibSonomy & LLMs. In Proceedings of the 2024 Conference on Human Information Interaction and Retrieval (CHIIR ’24), pp. 386–390. Association for Computing Machinery, Sheffield, United Kingdom, 2024.
- SuperGLEBer: German Language Understanding Evaluation Benchmark. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), K. Duh, H. Gomez, S. Bethard (eds.), pp. 7904–7923. Association for Computational Linguistics, Mexico City, Mexico, 2024.
- OtterlyObsessedWithSemantics at SemEval-2024 Task 4: Developing a Hierarchical Multi-Label Classification Head for Large Language Models. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), A. K. Ojha, A. S. Doğruöz, H. Tayyar Madabushi, G. Da San Martino, S. Rosenthal, A. Rosá (eds.), pp. 602–612. Association for Computational Linguistics, Mexico City, Mexico, 2024.
- Pointer Networks: A Unified Approach to Extracting German Opinions. In Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023), M. Georges, A. Herygers, A. Friedrich, B. Roth (eds.), pp. 127–138. Association for Computational Linguistics, Ingolstadt, Germany, 2023.
- Zero-Shot Clickbait Spoiling by Rephrasing Titles as Questions. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pp. 1090–1095. Association for Computational Linguistics, Toronto, Canada, 2023.
- Optimizing Medical Service Request Processes through Language Modeling and Semantic Search. In 2023 7th International Conference on Medical and Health Informatics (ICMHI 2023), pp. 136–141. Association for Computing Machinery, Kyoto, Japan, 2023.
- Long-Term Effects of Perceived Friendship with Intelligent Voice Assistants on Usage Behavior, User Experience, and Social Perceptions. In Computers, 12(4). 2023.
- Higher-Order DeepTrails: Unified Approach to *Trails. In Lernen, Wissen, Daten, Analysen (LWDA) Conference Proceedings, Marburg, Germany, October 9-11, 2023, Vol. 3630 of CEUR Workshop Proceedings, M. Leyer, J. Wichmann (eds.), pp. 372–386. CEUR-WS.org, 2023.
- Point me to your Opinion, SenPoi. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pp. 1313–1323. Association for Computational Linguistics, Seattle, United States, 2022.
- Self-Supervised Multi-Task Pretraining Improves Image Aesthetic Assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 816–825. 2021.
- Emote-Controlled: Obtaining Implicit Viewer Feedback through Emote based Sentiment Analysis on Comments of Popular Twitch.tv Channels. In ACM Transactions on Social Computing. 2020.