Natural Language Processing

Assessing the State of the Art in Scene Segmentation. Zehe, Albin; Fischer, Elisabeth; Hotho, Andreas. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), L. Chiruzzo, A. Ritter, L. Wang (eds.), pp. 9922–9941. Association for Computational Linguistics, Albuquerque, New Mexico, 2025.

The detection of scenes in literary texts is a recently introduced segmentation task in computational literary studies. Its goal is to partition a fictional text into segments that are coherent across the dimensions time, space, action and character constellation. This task is very challenging for automatic methods, since it requires a high-level understanding of the text. In this paper, we provide a thorough analysis of the State of the Art and challenges in this task, identifying and solving a problem in the training procedure for previous approaches, analysing the generalisation capabilities of the models and comparing the BERT-based SotA to current Llama models, as well as providing an analysis of what causes errors in the models. Our change in training procedure provides a significant increase in performance. We find that Llama-based models are more robust to different types of texts, while their overall performance is slightly worse than that of BERT-based models.

@inproceedings{zehe2025assessing,
  abstract = {The detection of scenes in literary texts is a recently introduced segmentation task in computational literary studies. Its goal is to partition a fictional text into segments that are coherent across the dimensions time, space, action and character constellation. This task is very challenging for automatic methods, since it requires a high-level understanding of the text. In this paper, we provide a thorough analysis of the State of the Art and challenges in this task, identifying and solving a problem in the training procedure for previous approaches, analysing the generalisation capabilities of the models and comparing the BERT-based SotA to current Llama models, as well as providing an analysis of what causes errors in the models. Our change in training procedure provides a significant increase in performance. We find that Llama-based models are more robust to different types of texts, while their overall performance is slightly worse than that of BERT-based models.},
  address = {Albuquerque, New Mexico},
  author = {Zehe, Albin and Fischer, Elisabeth and Hotho, Andreas},
  booktitle = {Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
  editor = {Chiruzzo, Luis and Ritter, Alan and Wang, Lu},
  keywords = {scenes},
  month = {04},
  pages = {9922–9941},
  publisher = {Association for Computational Linguistics},
  title = {Assessing the State of the Art in Scene Segmentation},
  year = 2025
}

ModernGBERT: German-only 1B Encoder Model Trained from Scratch. Ehrmanntraut, Anton; Wunderle, Julia; Pfister, Jan; Jannidis, Fotis; Hotho, Andreas. 2025.

@misc{ehrmanntraut2025moderngbertgermanonly1bencoder,
  author = {Ehrmanntraut, Anton and Wunderle, Julia and Pfister, Jan and Jannidis, Fotis and Hotho, Andreas},
  keywords = {author:pfister},
  title = {ModernGBERT: German-only 1B Encoder Model Trained from Scratch},
  year = 2025
}

Adapting Sequential Recommender Models to Content Recommendation in Chat Data using Non-Item Page-Models. Zehe, Albin; Fischer, Elisabeth; Kaiser, Jonas; Wagner, Toni; Hotho, Andreas. In Proceedings of the Sixth Knowledge-aware and Conversational Recommender Systems Workshop. 2024.

@inproceedings{zehe2024adapting,
  author = {Zehe, Albin and Fischer, Elisabeth and Kaiser, Jonas and Wagner, Toni and Hotho, Andreas},
  booktitle = {Proceedings of the Sixth Knowledge-aware and Conversational Recommender Systems Workshop},
  keywords = {author:zehe},
  month = 10,
  title = {Adapting Sequential Recommender Models to Content Recommendation in Chat Data using Non-Item Page-Models},
  year = 2024
}

OtterlyObsessedWithSemantics at SemEval-2024 Task 4: Developing a Hierarchical Multi-Label Classification Head for Large Language Models. Wunderle, Julia; Schubert, Julian; Cacciatore, Antonella; Zehe, Albin; Pfister, Jan; Hotho, Andreas. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), A. K. Ojha, A. S. Doğruöz, H. Tayyar Madabushi, G. Da San Martino, S. Rosenthal, A. Rosá (eds.), pp. 602–612. Association for Computational Linguistics, Mexico City, Mexico, 2024.

For our submission for Subtask 1, we developed a custom classification head that is designed to be applied atop of a Large Language Model. We reconstructed the hierarchy across multiple fully connected layers, allowing us to incorporate previous foundational decisions in subsequent, more fine-grained layers. To find the best hyperparameters, we conducted a grid-search and to compete in the multilingual setting, we translated all documents to English.

@inproceedings{wunderle-etal-2024-otterlyobsessedwithsemantics,
  abstract = {For our submission for Subtask 1, we developed a custom classification head that is designed to be applied atop of a Large Language Model. We reconstructed the hierarchy across multiple fully connected layers, allowing us to incorporate previous foundational decisions in subsequent, more fine-grained layers. To find the best hyperparameters, we conducted a grid-search and to compete in the multilingual setting, we translated all documents to English.},
  address = {Mexico City, Mexico},
  author = {Wunderle, Julia and Schubert, Julian and Cacciatore, Antonella and Zehe, Albin and Pfister, Jan and Hotho, Andreas},
  booktitle = {Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)},
  editor = {Ojha, Atul Kr. and Do{\u{g}}ru{\"o}z, A. Seza and Tayyar Madabushi, Harish and Da San Martino, Giovanni and Rosenthal, Sara and Ros{\'a}, Aiala},
  keywords = {author:pfister},
  month = {06},
  pages = {602–612},
  publisher = {Association for Computational Linguistics},
  title = {{O}tterly{O}bsessed{W}ith{S}emantics at {S}em{E}val-2024 Task 4: Developing a Hierarchical Multi-Label Classification Head for Large Language Models},
  year = 2024
}

LLäMmlein: Compact and Competitive German-Only Language Models from Scratch. Pfister, Jan; Wunderle, Julia; Hotho, Andreas. 2024.

@misc{pfister2024llammleincompactcompetitivegermanonly,
  author = {Pfister, Jan and Wunderle, Julia and Hotho, Andreas},
  keywords = {author:pfister},
  title = {LLäMmlein: Compact and Competitive German-Only Language Models from Scratch},
  year = 2024
}

BibSonomy Meets ChatLLMs for Publication Management: From Chat to Publication Management: Organizing your related work using BibSonomy & LLMs. Völker, Tom; Pfister, Jan; Koopmann, Tobias; Hotho, Andreas. 2024.

The ever-growing corpus of scientific literature presents significant challenges for researchers with respect to discovery, management, and annotation of relevant publications. Traditional platforms like Semantic Scholar, BibSonomy, and Zotero offer tools for literature management, but largely require manual laborious and error-prone input of tags and metadata. Here, we introduce a novel retrieval augmented generation system that leverages chat-based large language models (LLMs) to streamline and enhance the process of publication management. It provides a unified chat-based interface, enabling intuitive interactions with various backends, including Semantic Scholar, BibSonomy, and the Zotero Webscraper. It supports two main use-cases: (1) Explorative Search & Retrieval - leveraging LLMs to search for and retrieve both specific and general scientific publications, while addressing the challenges of content hallucination and data obsolescence; and (2) Cataloguing & Management - aiding in the organization of personal publication libraries, in this case BibSonomy, by automating the addition of metadata and tags, while facilitating manual edits and updates. We compare our system to different LLM models in three different settings, including a user study, and we can show its advantages in different metrics.

@misc{volker2024bibsonomy,
  abstract = {The ever-growing corpus of scientific literature presents significant challenges for researchers with respect to discovery, management, and annotation of relevant publications. Traditional platforms like Semantic Scholar, BibSonomy, and Zotero offer tools for literature management, but largely require manual laborious and error-prone input of tags and metadata. Here, we introduce a novel retrieval augmented generation system that leverages chat-based large language models (LLMs) to streamline and enhance the process of publication management. It provides a unified chat-based interface, enabling intuitive interactions with various backends, including Semantic Scholar, BibSonomy, and the Zotero Webscraper. It supports two main use-cases: (1) Explorative Search & Retrieval - leveraging LLMs to search for and retrieve both specific and general scientific publications, while addressing the challenges of content hallucination and data obsolescence; and (2) Cataloguing & Management - aiding in the organization of personal publication libraries, in this case BibSonomy, by automating the addition of metadata and tags, while facilitating manual edits and updates. We compare our system to different LLM models in three different settings, including a user study, and we can show its advantages in different metrics.},
  author = {Völker, Tom and Pfister, Jan and Koopmann, Tobias and Hotho, Andreas},
  keywords = {from:janpf},
  note = {cite arxiv:2401.09092. Comment: Accepted at 2024 ACM SIGIR CHIIR. For a demo see http://professor-x.de/demos/bibsonomy-chatgpt/demo.mp4},
  title = {BibSonomy Meets ChatLLMs for Publication Management: From Chat to Publication Management: Organizing your related work using BibSonomy & LLMs},
  year = 2024
}

PreAdapter: Pre-training Language Models on Knowledge Graphs. Omeliyanenko, Janna; Hotho, Andreas; Schlör, Daniel. In International Semantic Web Conference ISWC 2024, to appear. 2024.

@article{omeliyanenko2024preadapter,
  author = {Omeliyanenko, Janna and Hotho, Andreas and Schlör, Daniel},
  journal = {International Semantic Web Conference ISWC 2024, to appear},
  keywords = {selected},
  title = {PreAdapter: Pre-training Language Models on Knowledge Graphs},
  year = 2024
}

Zero-Shot Clickbait Spoiling by Rephrasing Titles as Questions. Wangsadirdja, Dirk; Pfister, Jan; Kobs, Konstantin; Hotho, Andreas. In Proceedings of the The 17th International Workshop on Semantic Evaluation (SemEval-2023), pp. 1090–1095. Association for Computational Linguistics, Toronto, Canada, 2023.

In this paper, we describe our approach to the clickbait spoiling task of SemEval 2023. The core idea behind our system is to leverage pre-trained models capable of Question Answering (QA) to extract the spoiler from article texts based on the clickbait title without any task-specific training. Since oftentimes, these titles are not phrased as questions, we automatically rephrase the clickbait titles as questions in order to better suit the pretraining task of the QA-capable models. Also, to fit as much relevant context into the model's limited input size as possible, we propose to reorder the sentences by their relevance using a semantic similarity model. Finally, we evaluate QA as well as text generation models (via prompting) to extract the spoiler from the text. Based on the validation data, our final model selects each of these components depending on the spoiler type and achieves satisfactory zero-shot results. The ideas described in this paper can easily be applied in fine-tuning settings.

@inproceedings{wangsadirdja-etal-2023-jack,
  abstract = {In this paper, we describe our approach to the clickbait spoiling task of SemEval 2023. The core idea behind our system is to leverage pre-trained models capable of Question Answering (QA) to extract the spoiler from article texts based on the clickbait title without any task-specific training. Since oftentimes, these titles are not phrased as questions, we automatically rephrase the clickbait titles as questions in order to better suit the pretraining task of the QA-capable models. Also, to fit as much relevant context into the model's limited input size as possible, we propose to reorder the sentences by their relevance using a semantic similarity model. Finally, we evaluate QA as well as text generation models (via prompting) to extract the spoiler from the text. Based on the validation data, our final model selects each of these components depending on the spoiler type and achieves satisfactory zero-shot results. The ideas described in this paper can easily be applied in fine-tuning settings.},
  address = {Toronto, Canada},
  author = {Wangsadirdja, Dirk and Pfister, Jan and Kobs, Konstantin and Hotho, Andreas},
  booktitle = {Proceedings of the The 17th International Workshop on Semantic Evaluation (SemEval-2023)},
  keywords = {from:janpf},
  month = {07},
  pages = {1090–1095},
  publisher = {Association for Computational Linguistics},
  title = {Zero-Shot Clickbait Spoiling by Rephrasing Titles as Questions},
  year = 2023
}

CapsKG: Enabling Continual Knowledge Integration in Language Models for Automatic Knowledge Graph Completion. Omeliyanenko, Janna; Zehe, Albin; Hotho, Andreas; Schlör, Daniel. In International Semantic Web Conference ISWC 2023, to appear. 2023.

Automated completion of knowledge graphs is a popular topic in the Semantic Web community that aims to automatically and continuously integrate new appearing knowledge into knowledge graphs using artificial intelligence. Recently, approaches that leverage implicit knowledge from language models for this task have shown promising results. However, by fine-tuning language models directly to the domain of knowledge graphs, models forget their original language representation and associated knowledge. An existing solution to address this issue is a trainable adapter, which is integrated into a frozen language model to extract the relevant knowledge without altering the model itself. However, this constrains the generalizability to the specific extraction task and by design requires new and independent adapters to be trained for new knowledge extraction tasks. This effectively prevents the model from benefiting from existing knowledge incorporated in previously trained adapters. In this paper, we propose to combine the benefits of adapters for knowledge graph completion with the idea of integrating capsules, introduced in the field of continual learning. This allows the continuous integration of knowledge into a joint model by sharing and reusing previously trained capsules. We find that our approach outperforms solutions using traditional adapters, while requiring notably fewer parameters for continuous knowledge integration. Moreover, we show that this architecture benefits significantly from knowledge sharing in low-resource situations, outperforming adapter-based models on the task of link prediction.

@article{noauthororeditor,
  abstract = {Automated completion of knowledge graphs is a popular topic in the Semantic Web community that aims to automatically and continuously integrate new appearing knowledge into knowledge graphs using artificial intelligence. Recently, approaches that leverage implicit knowledge from language models for this task have shown promising results. However, by fine-tuning language models directly to the domain of knowledge graphs, models forget their original language representation and associated knowledge. An existing solution to address this issue is a trainable adapter, which is integrated into a frozen language model to extract the relevant knowledge without altering the model itself. However, this constrains the generalizability to the specific extraction task and by design requires new and independent adapters to be trained for new knowledge extraction tasks. This effectively prevents the model from benefiting from existing knowledge incorporated in previously trained adapters. In this paper, we propose to combine the benefits of adapters for knowledge graph completion with the idea of integrating capsules, introduced in the field of continual learning. This allows the continuous integration of knowledge into a joint model by sharing and reusing previously trained capsules. We find that our approach outperforms solutions using traditional adapters, while requiring notably fewer parameters for continuous knowledge integration. Moreover, we show that this architecture benefits significantly from knowledge sharing in low-resource situations, outperforming adapter-based models on the task of link prediction.},
  author = {Omeliyanenko, Janna and Zehe, Albin and Hotho, Andreas and Schlör, Daniel},
  journal = {International Semantic Web Conference ISWC 2023, to appear},
  keywords = {graph},
  title = {CapsKG: Enabling Continual Knowledge Integration in Language Models for Automatic Knowledge Graph Completion},
  year = 2023
}

Large Language Models and Knowledge Graphs: Opportunities and Challenges. Pan, Jeff Z.; Razniewski, Simon; Kalo, Jan-Christoph; Singhania, Sneha; Chen, Jiaoyan; Dietze, Stefan; Jabeen, Hajira; Omeliyanenko, Janna; Zhang, Wen; Lissandrini, Matteo; Biswas, Russa; de Melo, Gerard; Bonifati, Angela; Vakaj, Edlira; Dragoni, Mauro; Graux, Damien. In Transactions on Graph Data and Knowledge, 1(1), pp. 2:1–2:38. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2023.

@article{pan_et_al:TGDK.1.1.2,
  address = {Dagstuhl, Germany},
  annote = {Keywords: Large Language Models, Pre-trained Language Models, Knowledge Graphs, Ontology, Retrieval Augmented Language Models},
  author = {Pan, Jeff Z. and Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Chen, Jiaoyan and Dietze, Stefan and Jabeen, Hajira and Omeliyanenko, Janna and Zhang, Wen and Lissandrini, Matteo and Biswas, Russa and de Melo, Gerard and Bonifati, Angela and Vakaj, Edlira and Dragoni, Mauro and Graux, Damien},
  journal = {Transactions on Graph Data and Knowledge},
  keywords = {selected},
  number = 1,
  pages = {2:1–2:38},
  publisher = {Schloss Dagstuhl – Leibniz-Zentrum f{\"u}r Informatik},
  title = {Large Language Models and Knowledge Graphs: Opportunities and Challenges},
  volume = 1,
  year = 2023
}

Point me to your Opinion, SenPoi. Pfister, Jan; Wankerl, Sebastian; Hotho, Andreas. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pp. 1313–1323. Association for Computational Linguistics, Seattle, United States, 2022.

Structured Sentiment Analysis is the task of extracting sentiment tuples in a graph structure commonly from review texts. We adapt the Aspect-Based Sentiment Analysis pointer network BARTABSA to model this tuple extraction as a sequence prediction task and extend their output grammar to account for the increased complexity of Structured Sentiment Analysis. To predict structured sentiment tuples in languages other than English we swap BART for a multilingual mT5 and introduce a novel Output Length Regularization to mitigate overfitting to common target sequence lengths, thereby improving the performance of the model by up to 70%. We evaluate our approach on seven datasets in five languages including a zero shot crosslingual setting.

@inproceedings{pfister-etal-2022-senpoi,
  abstract = {Structured Sentiment Analysis is the task of extracting sentiment tuples in a graph structure commonly from review texts. We adapt the Aspect-Based Sentiment Analysis pointer network BARTABSA to model this tuple extraction as a sequence prediction task and extend their output grammar to account for the increased complexity of Structured Sentiment Analysis. To predict structured sentiment tuples in languages other than English we swap BART for a multilingual mT5 and introduce a novel Output Length Regularization to mitigate overfitting to common target sequence lengths, thereby improving the performance of the model by up to 70{\%}. We evaluate our approach on seven datasets in five languages including a zero shot crosslingual setting.},
  address = {Seattle, United States},
  author = {Pfister, Jan and Wankerl, Sebastian and Hotho, Andreas},
  booktitle = {Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)},
  keywords = {from:janpf},
  month = {07},
  pages = {1313–1323},
  publisher = {Association for Computational Linguistics},
  title = {Point me to your Opinion, {S}en{P}oi},
  year = 2022
}

The FairyNet Corpus - Character Networks for German Fairy Tales. Schmidt, David; Zehe, Albin; Lorenzen, Janne; Sergel, Lisa; Düker, Sebastian; Krug, Markus; Puppe, Frank. In Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pp. 49–56. Association for Computational Linguistics, Punta Cana, Dominican Republic (online), 2021.

This paper presents a data set of German fairy tales, manually annotated with character networks which were obtained with high inter rater agreement. The release of this corpus provides an opportunity of training and comparing different algorithms for the extraction of character networks, which so far was barely possible due to heterogeneous interests of previous researchers. We demonstrate the usefulness of our data set by providing baseline experiments for the automatic extraction of character networks, applying a rule-based pipeline as well as a neural approach, and find the neural approach outperforming the rule-approach in most evaluation settings.

@inproceedings{schmidt2021fairynet,
  abstract = {This paper presents a data set of German fairy tales, manually annotated with character networks which were obtained with high inter rater agreement. The release of this corpus provides an opportunity of training and comparing different algorithms for the extraction of character networks, which so far was barely possible due to heterogeneous interests of previous researchers. We demonstrate the usefulness of our data set by providing baseline experiments for the automatic extraction of character networks, applying a rule-based pipeline as well as a neural approach, and find the neural approach outperforming the rule-approach in most evaluation settings.},
  address = {Punta Cana, Dominican Republic (online)},
  author = {Schmidt, David and Zehe, Albin and Lorenzen, Janne and Sergel, Lisa and D{\"u}ker, Sebastian and Krug, Markus and Puppe, Frank},
  booktitle = {Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature},
  keywords = {networks},
  month = 11,
  pages = {49–56},
  publisher = {Association for Computational Linguistics},
  title = {The {F}airy{N}et Corpus - Character Networks for {G}erman Fairy Tales},
  year = 2021
}

Detecting Scenes in Fiction: A new Segmentation Task. Zehe, Albin; Konle, Leonard; Dümpelmann, Lea; Gius, Evelyn; Hotho, Andreas; Jannidis, Fotis; Kaufmann, Lucas; Krug, Markus; Puppe, Frank; Reiter, Nils; Schreiber, Annekea; Wiedmer, Nathalie. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. ACL, 2021.

@inproceedings{zehe2021detecting,
  author = {Zehe, Albin and Konle, Leonard and Dümpelmann, Lea and Gius, Evelyn and Hotho, Andreas and Jannidis, Fotis and Kaufmann, Lucas and Krug, Markus and Puppe, Frank and Reiter, Nils and Schreiber, Annekea and Wiedmer, Nathalie},
  booktitle = {Proceedings of the 16th Conference of the {E}uropean Chapter of the Association for Computational Linguistics: Volume 1, Long Papers},
  keywords = {author:zehe},
  publisher = {ACL},
  title = {Detecting Scenes in Fiction: A new Segmentation Task},
  year = 2021
}

Shared Task on Scene Segmentation @ KONVENS 2021. Zehe, Albin; Konle, Leonard; Guhr, Svenja; Dümpelmann, Lea; Gius, Evelyn; Hotho, Andreas; Jannidis, Fotis; Kaufmann, Lucas; Krug, Markus; Puppe, Frank; Reiter, Nils; Schreiber, Annekea. In Shared Task on Scene Segmentation @ KONVENS 2021, pp. 1–21. 2021.

@inproceedings{zehe2021shared,
  author = {Zehe, Albin and Konle, Leonard and Guhr, Svenja and Dümpelmann, Lea and Gius, Evelyn and Hotho, Andreas and Jannidis, Fotis and Kaufmann, Lucas and Krug, Markus and Puppe, Frank and Reiter, Nils and Schreiber, Annekea},
  booktitle = {Shared Task on Scene Segmentation @ KONVENS 2021},
  keywords = {author:zehe},
  pages = {1–21},
  title = {Shared Task on Scene Segmentation @ KONVENS 2021},
  year = 2021
}

Improving Sentiment Analysis with Biofeedback Data. Schlör, Daniel; Zehe, Albin; Kobs, Konstantin; Veseli, Blerta; Westermeier, Franziska; Brübach, Larissa; Roth, Daniel; Latoschik, Marc Erich; Hotho, Andreas. In Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020), pp. 28–33. European Language Resources Association (ELRA), Marseille, France, 2020.

Humans frequently are able to read and interpret emotions of others by directly taking verbal and non-verbal signals in human-to-human communication into account or to infer or even experience emotions from mediated stories. For computers, however, emotion recognition is a complex problem: Thoughts and feelings are the roots of many behavioural responses and they are deeply entangled with neurophysiological changes within humans. As such, emotions are very subjective, often are expressed in a subtle manner, and are highly depending on context. For example, machine learning approaches for text-based sentiment analysis often rely on incorporating sentiment lexicons or language models to capture the contextual meaning. This paper explores if and how we further can enhance sentiment analysis using biofeedback of humans which are experiencing emotions while reading texts. Specifically, we record the heart rate and brain waves of readers that are presented with short texts which have been annotated with the emotions they induce. We use these physiological signals to improve the performance of a lexicon-based sentiment classifier. We find that the combination of several biosignals can improve the ability of a text-based classifier to detect the presence of a sentiment in a text on a per-sentence level.

@inproceedings{schlor-etal-2020-improving,
  abstract = {Humans frequently are able to read and interpret emotions of others by directly taking verbal and non-verbal signals in human-to-human communication into account or to infer or even experience emotions from mediated stories. For computers, however, emotion recognition is a complex problem: Thoughts and feelings are the roots of many behavioural responses and they are deeply entangled with neurophysiological changes within humans. As such, emotions are very subjective, often are expressed in a subtle manner, and are highly depending on context. For example, machine learning approaches for text-based sentiment analysis often rely on incorporating sentiment lexicons or language models to capture the contextual meaning. This paper explores if and how we further can enhance sentiment analysis using biofeedback of humans which are experiencing emotions while reading texts. Specifically, we record the heart rate and brain waves of readers that are presented with short texts which have been annotated with the emotions they induce. We use these physiological signals to improve the performance of a lexicon-based sentiment classifier. We find that the combination of several biosignals can improve the ability of a text-based classifier to detect the presence of a sentiment in a text on a per-sentence level.},
  address = {Marseille, France},
  author = {Schl{\"o}r, Daniel and Zehe, Albin and Kobs, Konstantin and Veseli, Blerta and Westermeier, Franziska and Br{\"u}bach, Larissa and Roth, Daniel and Latoschik, Marc Erich and Hotho, Andreas},
  booktitle = {Proceedings of LREC2020 Workshop ``People in language, vision and the mind'' (ONION2020)},
  keywords = {author:zehe},
  month = {05},
  pages = {28–33},
  publisher = {European Language Resources Association (ELRA)},
  title = {Improving Sentiment Analysis with Biofeedback Data},
  year = 2020
}

Emote-Controlled: Obtaining Implicit Viewer Feedback through Emote based Sentiment Analysis on Comments of Popular Twitch.tv Channels. Kobs, Konstantin; Zehe, Albin; Bernstetter, Armin; Chibane, Julian; Pfister, Jan; Tritscher, Julian; Hotho, Andreas. In ACM Transactions on Social Computing. 2020.

@article{kobs2020emotecontrolled,
  author = {Kobs, Konstantin and Zehe, Albin and Bernstetter, Armin and Chibane, Julian and Pfister, Jan and Tritscher, Julian and Hotho, Andreas},
  journal = {ACM Transactions on Social Computing},
  keywords = {sentiment-analysis},
  title = {Emote-Controlled: Obtaining Implicit Viewer Feedback through Emote based Sentiment Analysis on Comments of Popular Twitch.tv Channels},
  year = 2020
}

HarryMotions – Classifying Relationships in Harry Potter based on Emotion Analysis. Zehe, Albin; Arns, Julia; Hettinger, Lena; Hotho, Andreas. In 5th SwissText & 16th KONVENS Joint Conference. 2020.

@inproceedings{zehe2020harrymotions,
  author = {Zehe, Albin and Arns, Julia and Hettinger, Lena and Hotho, Andreas},
  booktitle = {5th SwissText & 16th KONVENS Joint Conference},
  keywords = {author:zehe},
  title = {HarryMotions – Classifying Relationships in Harry Potter based on Emotion Analysis},
  year = 2020
}

Towards Predicting the Subscription Status of Twitch.tv Users. Kobs, Konstantin; Potthast, Martin; Wiegmann, Matti; Zehe, Albin; Stein, Benno; Hotho, Andreas. In Proceedings of ECML-PKDD 2020 ChAT Discovery Challenge on Chat Analytics for Twitch. 2020.

@article{kobstowards,
  author = {Kobs, Konstantin and Potthast, Martin and Wiegmann, Matti and Zehe, Albin and Stein, Benno and Hotho, Andreas},
  journal = {Proceedings of ECML-PKDD 2020 ChAT Discovery Challenge on Chat Analytics for Twitch},
  keywords = {author:zehe},
  title = {Towards Predicting the Subscription Status of Twitch.tv Users},
  year = 2020
}

LM4KG: Improving Common Sense Knowledge Graphs with Language Models. Omeliyanenko, Janna; Zehe, Albin; Hettinger, Lena; Hotho, Andreas. In International Semantic Web Conference. Springer, 2020.

@inproceedings{omeliyanenko2020reweight,
  author = {Omeliyanenko, Janna and Zehe, Albin and Hettinger, Lena and Hotho, Andreas},
  booktitle = {International Semantic Web Conference},
  keywords = {graph},
  organization = {Springer},
  title = {LM4KG: Improving Common Sense Knowledge Graphs with Language Models},
  year = 2020
}

On the Right Track! Analysing and Predicting Navigation Success in Wikipedia. Koopmann, Tobias; Dallmann, Alexander; Hettinger, Lena; Niebler, Thomas; Hotho, Andreas. In Proceedings of the 30th ACM Conference on Hypertext and Social Media, of HT ’19, pp. 143–152. ACM, Hof, Germany, 2019.

@inproceedings{Koopmann:2019:RTA:3342220.3343650,
  address = {New York, NY, USA},
  author = {Koopmann, Tobias and Dallmann, Alexander and Hettinger, Lena and Niebler, Thomas and Hotho, Andreas},
  booktitle = {Proceedings of the 30th ACM Conference on Hypertext and Social Media},
  keywords = {author:hotho},
  pages = {143--152},
  publisher = {ACM},
  series = {HT '19},
  title = {On the Right Track! Analysing and Predicting Navigation Success in Wikipedia},
  year = 2019
}

Detection of Scenes in Fiction. Gius, Evelyn; Jannidis, Fotis; Krug, Markus; Zehe, Albin; Hotho, Andreas; Puppe, Frank; Krebs, Jonathan; Reiter, Nils; Wiedmer, Nathalie; Konle, Leonard. In Proceedings of Digital Humanities 2019. 2019.

@inproceedings{gius2019detection,
  author = {Gius, Evelyn and Jannidis, Fotis and Krug, Markus and Zehe, Albin and Hotho, Andreas and Puppe, Frank and Krebs, Jonathan and Reiter, Nils and Wiedmer, Nathalie and Konle, Leonard},
  booktitle = {Proceedings of Digital Humanities 2019},
  keywords = {author:zehe},
  title = {Detection of Scenes in Fiction},
  year = 2019
}

Classification of text-types in German novels. Schlör, D; Schöch, C; Hotho, A. In Digital Humanities 2019: Conference Abstracts. 2019.

@inproceedings{schlor2019classification,
  author = {Schlör, D and Schöch, C and Hotho, A},
  booktitle = {Digital Humanities 2019: Conference Abstracts},
  keywords = {from:daschloer},
  title = {Classification of text-types in German novels},
  year = 2019
}

Analysing Direct Speech in German Novels. Jannidis, Fotis; Konle, Leonard; Zehe, Albin; Hotho, Andreas; Krug, Markus. In DHd 2018. 2018.

@inproceedings{jannidis2018analysing,
  author = {Jannidis, Fotis and Konle, Leonard and Zehe, Albin and Hotho, Andreas and Krug, Markus},
  booktitle = {DHd 2018},
  keywords = {directspeech},
  title = {Analysing Direct Speech in German Novels},
  year = 2018
}

Burrows’ Zeta: Exploring and Evaluating Variants and Parameters. Schöch, Christof; Schlör, Daniel; Zehe, Albin; Gebhard, Henning; Becker, Martin; Hotho, Andreas. In DH, pp. 274–277. 2018.

@inproceedings{schoech2018zeta,
  author = {Schöch, Christof and Schlör, Daniel and Zehe, Albin and Gebhard, Henning and Becker, Martin and Hotho, Andreas},
  booktitle = {DH},
  keywords = {zeta},
  pages = {274--277},
  title = {Burrows’ Zeta: Exploring and Evaluating Variants and Parameters},
  year = 2018
}

A White-Box Model for Detecting Author Nationality by Linguistic Differences in Spanish Novels. Zehe, Albin; Schlör, Daniel; Henny-Krahmer, Ulrike; Becker, Martin; Hotho, Andreas. In DH. ADHO, 2018.

@inproceedings{zehe2018whitebox,
  author = {Zehe, Albin and Schlör, Daniel and Henny-Krahmer, Ulrike and Becker, Martin and Hotho, Andreas},
  booktitle = {DH},
  keywords = {classification},
  organization = {ADHO},
  title = {A White-Box Model for Detecting Author Nationality by Linguistic Differences in Spanish Novels},
  year = 2018
}

ClaiRE at SemEval-2018 Task 7 - Extended Version. Hettinger, Lena; Dallmann, Alexander; Zehe, Albin; Niebler, Thomas; Hotho, Andreas. 2018.

In this paper we describe our post-evaluation results for SemEval-2018 Task 7 on classification of semantic relations in scientific literature for clean (subtask 1.1) and noisy data (subtask 1.2). Due to space limitations we publish an extended version of Hettinger et al. (2018) including further technical details and changes made to the preprocessing step in the post-evaluation phase. Due to these changes Classification of Relations using Embeddings (ClaiRE) achieved an improved F1 score of 75.11% for the first subtask and 81.44% for the second.

@misc{hettinger2018claire,
  abstract = {In this paper we describe our post-evaluation results for SemEval-2018 Task 7 on classification of semantic relations in scientific literature for clean (subtask 1.1) and noisy data (subtask 1.2). Due to space limitations we publish an extended version of Hettinger et al. (2018) including further technical details and changes made to the preprocessing step in the post-evaluation phase. Due to these changes Classification of Relations using Embeddings (ClaiRE) achieved an improved F1 score of 75.11% for the first subtask and 81.44% for the second.},
  author = {Hettinger, Lena and Dallmann, Alexander and Zehe, Albin and Niebler, Thomas and Hotho, Andreas},
  keywords = {extraction},
  note = {cite arxiv:1804.05825. Comment: This is the extended version for our work: ClaiRE at SemEval-2018 Task 7},
  title = {ClaiRE at SemEval-2018 Task 7 - Extended Version},
  year = 2018
}

ClaiRE at SemEval-2018 Task 7: Classification of Relations using Embeddings. Hettinger, Lena; Dallmann, Alexander; Zehe, Albin; Niebler, Thomas; Hotho, Andreas. In Proceedings of International Workshop on Semantic Evaluation (SemEval-2018). New Orleans, LA, USA, 2018.

@inproceedings{hettinger2018semeval,
  address = {New Orleans, LA, USA},
  author = {Hettinger, Lena and Dallmann, Alexander and Zehe, Albin and Niebler, Thomas and Hotho, Andreas},
  booktitle = {Proceedings of International Workshop on Semantic Evaluation (SemEval-2018)},
  keywords = {word},
  title = {ClaiRE at SemEval-2018 Task 7: Classification of Relations using Embeddings},
  year = 2018
}

Burrows Zeta: Varianten und Evaluation. Schöch, Christof; Calvo, José; Zehe, Albin; Hotho, Andreas. In DHd 2018. 2018.

@inproceedings{schoch2018burrows,
  author = {Schöch, Christof and Calvo, José and Zehe, Albin and Hotho, Andreas},
  booktitle = {DHd 2018},
  keywords = {zeta},
  title = {Burrows Zeta: Varianten und Evaluation},
  year = 2018
}

Learning Semantic Relatedness from Human Feedback Using Relative Relatedness Learning. Niebler, Thomas; Becker, Martin; Pölitz, Christian; Hotho, Andreas. In ISWC’17. 2017.

@inproceedings{niebler2017learning_3,
  author = {Niebler, Thomas and Becker, Martin and Pölitz, Christian and Hotho, Andreas},
  booktitle = {ISWC'17},
  keywords = {selected},
  title = {Learning Semantic Relatedness from Human Feedback Using Relative Relatedness Learning},
  year = 2017
}

Learning Word Embeddings from Tagging Data: A methodological comparison. Niebler, Thomas; Hahn, Luzian; Hotho, Andreas. In Proceedings of the LWDA. 2017.

@inproceedings{niebler2017learning_2,
  author = {Niebler, Thomas and Hahn, Luzian and Hotho, Andreas},
  booktitle = {Proceedings of the LWDA},
  keywords = {bibsonomy},
  title = {Learning Word Embeddings from Tagging Data: A methodological comparison},
  year = 2017
}

Towards Sentiment Analysis on German Literature. Zehe, Albin; Becker, Martin; Jannidis, Fotis; Hotho, Andreas. 2017.

@inproceedings{zehe2017sentiment,
  author = {Zehe, Albin and Becker, Martin and Jannidis, Fotis and Hotho, Andreas},
  keywords = {author:zehe},
  title = {Towards Sentiment Analysis on German Literature},
  year = 2017
}

Neutralising the Authorial Signal in Delta by Penalization: Stylometric Clustering of Genre in Spanish Novels. Tello, José Calvo; Schlör, Daniel; Henny-Krahmer, Ulrike; Schöch, Christof. In DH, R. Lewis, C. Raynor, D. Forest, M. Sinatra, S. Sinclair (eds.). Alliance of Digital Humanities Organizations (ADHO), 2017.

@inproceedings{conf/dihu/TelloSHS17,
  author = {Tello, José Calvo and Schlör, Daniel and Henny-Krahmer, Ulrike and Schöch, Christof},
  booktitle = {DH},
  crossref = {conf/dihu/2017},
  editor = {Lewis, Rhian and Raynor, Cecily and Forest, Dominic and Sinatra, Michael and Sinclair, Stéfan},
  keywords = {stylometry},
  publisher = {Alliance of Digital Humanities Organizations (ADHO)},
  title = {Neutralising the Authorial Signal in Delta by Penalization: Stylometric Clustering of Genre in Spanish Novels},
  year = 2017
}

Prediction of Happy Endings in German Novels. Zehe, Albin; Becker, Martin; Hettinger, Lena; Hotho, Andreas; Reger, Isabella; Jannidis, Fotis. In Proceedings of the Workshop on Interactions between Data Mining and Natural Language Processing 2016, P. Cellier, T. Charnois, A. Hotho, S. Matwin, M.-F. Moens, Y. Toussaint (eds.), pp. 9–16. 2016.

Identifying plot structure in novels is a valuable step towards automatic processing of literary corpora. We present an approach to classify novels as either having a happy ending or not. To achieve this, we use features based on different sentiment lexica as input for an SVM classifier, which yields an average F1-score of about 73%.

@inproceedings{zehe2016prediction,
  abstract = {Identifying plot structure in novels is a valuable step towards automatic processing of literary corpora. We present an approach to classify novels as either having a happy ending or not. To achieve this, we use features based on different sentiment lexica as input for an SVM classifier, which yields an average F1-score of about 73%.},
  author = {Zehe, Albin and Becker, Martin and Hettinger, Lena and Hotho, Andreas and Reger, Isabella and Jannidis, Fotis},
  booktitle = {Proceedings of the Workshop on Interactions between Data Mining and Natural Language Processing 2016},
  editor = {Cellier, Peggy and Charnois, Thierry and Hotho, Andreas and Matwin, Stan and Moens, Marie-Francine and Toussaint, Yannick},
  keywords = {novels},
  month = {07},
  pages = {9--16},
  title = {Prediction of Happy Endings in German Novels},
  year = 2016
}

Classification of Literary Subgenres. Hettinger, Lena; Jannidis, Fotis; Reger, Isabella; Hotho, Andreas. In DHd 2016. 2016.

@inproceedings{hettinger2016classification,
  author = {Hettinger, Lena and Jannidis, Fotis and Reger, Isabella and Hotho, Andreas},
  booktitle = {DHd 2016},
  keywords = {classification},
  title = {Classification of Literary Subgenres},
  year = 2016
}

Analyzing Features for the Detection of Happy Endings in German Novels. Jannidis, Fotis; Reger, Isabella; Zehe, Albin; Becker, Martin; Hettinger, Lena; Hotho, Andreas. 2016.

With regard to a computational representation of literary plot, this paper looks at the use of sentiment analysis for happy ending detection in German novels. Its focus lies on the investigation of previously proposed sentiment features in order to gain insight about the relevance of specific features on the one hand and the implications of their performance on the other hand. Therefore, we study various partitionings of novels, considering the highly variable concept of "ending". We also show that our approach, even though still rather simple, can potentially lead to substantial findings relevant to literary studies.

@misc{jannidis2016analyzing,
  abstract = {With regard to a computational representation of literary plot, this paper looks at the use of sentiment analysis for happy ending detection in German novels. Its focus lies on the investigation of previously proposed sentiment features in order to gain insight about the relevance of specific features on the one hand and the implications of their performance on the other hand. Therefore, we study various partitionings of novels, considering the highly variable concept of "ending". We also show that our approach, even though still rather simple, can potentially lead to substantial findings relevant to literary studies.},
  author = {Jannidis, Fotis and Reger, Isabella and Zehe, Albin and Becker, Martin and Hettinger, Lena and Hotho, Andreas},
  keywords = {author:zehe},
  note = {cite arxiv:1611.09028},
  title = {Analyzing Features for the Detection of Happy Endings in German Novels},
  year = 2016
}

Straight Talk! Automatic Recognition of Direct Speech in Nineteenth-Century French Novels. Schöch, Christof; Schlör, Daniel; Popp, Stefanie; Brunner, Annelen; Henny, Ulrike; Tello, José Calvo. In DH, pp. 346–353. 2016.

@inproceedings{schoch2016straight,
  author = {Schöch, Christof and Schlör, Daniel and Popp, Stefanie and Brunner, Annelen and Henny, Ulrike and Tello, Jos{\'e} Calvo},
  booktitle = {DH},
  keywords = {directspeech},
  pages = {346--353},
  title = {Straight Talk! Automatic Recognition of Direct Speech in Nineteenth-Century French Novels},
  year = 2016
}

Extracting Semantics from Unconstrained Navigation on Wikipedia. Niebler, Thomas; Schlör, Daniel; Becker, Martin; Hotho, Andreas. In KI, 30(2), pp. 163–168. 2016.

@article{journals/ki/NieblerS0H16,
  author = {Niebler, Thomas and Schlör, Daniel and Becker, Martin and Hotho, Andreas},
  journal = {KI},
  keywords = {selected},
  number = 2,
  pages = {163--168},
  title = {Extracting Semantics from Unconstrained Navigation on Wikipedia},
  volume = 30,
  year = 2016
}

Significance Testing for the Classification of Literary Subgenres. Hettinger, Lena; Jannidis, Fotis; Reger, Isabella; Hotho, Andreas. In DH 2016. 2016.

@inproceedings{hettinger2016significance,
  author = {Hettinger, Lena and Jannidis, Fotis and Reger, Isabella and Hotho, Andreas},
  booktitle = {DH 2016},
  keywords = {classification},
  title = {Significance Testing for the Classification of Literary Subgenres},
  year = 2016
}

Evaluating Emergent Semantics in Folksonomies on Human Intuition. Niebler, Thomas; Becker, Martin; Zoller, Daniel; Doerfel, Stephan; Hotho, Andreas. 2015.

@techreport{tr_niebler2015evaluating,
  author = {Niebler, Thomas and Becker, Martin and Zoller, Daniel and Doerfel, Stephan and Hotho, Andreas},
  keywords = {intuition},
  title = {Evaluating Emergent Semantics in Folksonomies on Human Intuition},
  year = 2015
}

Genre classification on German novels. Hettinger, Lena; Becker, Martin; Reger, Isabella; Jannidis, Fotis; Hotho, Andreas. In Proceedings of the 12th International Workshop on Text-based Information Retrieval. 2015.

@inproceedings{schwemmlein2015genre,
  author = {Hettinger, Lena and Becker, Martin and Reger, Isabella and Jannidis, Fotis and Hotho, Andreas},
  booktitle = {Proceedings of the 12th International Workshop on Text-based Information Retrieval},
  keywords = {classification},
  title = {Genre classification on German novels},
  year = 2015
}

Proceedings of the 1st International Workshop on Interactions between Data Mining and Natural Language Processing co-located with The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, DMNLP@PKDD/ECML 2014, Nancy, France, September 15, 2014. Cellier, Peggy; Charnois, Thierry; Hotho, Andreas; Matwin, Stan; Moens, Marie-Francine; Toussaint, Yannick. In Vol. 1202 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.

@proceedings{cellier2014proceedings,
  editor = {Cellier, Peggy and Charnois, Thierry and Hotho, Andreas and Matwin, Stan and Moens, Marie{-}Francine and Toussaint, Yannick},
  keywords = {from:hotho},
  publisher = {CEUR-WS.org},
  series = {{CEUR} Workshop Proceedings},
  title = {Proceedings of the 1st International Workshop on Interactions between Data Mining and Natural Language Processing co-located with The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, DMNLP@PKDD/ECML 2014, Nancy, France, September 15, 2014},
  volume = 1202,
  year = 2014
}

Computing semantic relatedness from human navigational paths on Wikipedia. Singer, Philipp; Niebler, Thomas; Strohmaier, Markus; Hotho, Andreas. In Proceedings of the 22nd international conference on World Wide Web companion, of WWW ’13 Companion, ACM (ed.), pp. 171–172. International World Wide Web Conferences Steering Committee, Rio de Janeiro, Brazil, 2013.

This paper presents a novel approach for computing semantic relatedness between concepts on Wikipedia by using human navigational paths for this task. Our results suggest that human navigational paths provide a viable source for calculating semantic relatedness between concepts on Wikipedia. We also show that we can improve accuracy by intelligent selection of path corpora based on path characteristics indicating that not all paths are equally useful. Our work makes an argument for expanding the existing arsenal of data sources for calculating semantic relatedness and to consider the utility of human navigational paths for this task.

@inproceedings{singer2013computing,
  abstract = {This paper presents a novel approach for computing semantic relatedness between concepts on Wikipedia by using human navigational paths for this task. Our results suggest that human navigational paths provide a viable source for calculating semantic relatedness between concepts on Wikipedia. We also show that we can improve accuracy by intelligent selection of path corpora based on path characteristics indicating that not all paths are equally useful. Our work makes an argument for expanding the existing arsenal of data sources for calculating semantic relatedness and to consider the utility of human navigational paths for this task.},
  address = {Republic and Canton of Geneva, Switzerland},
  author = {Singer, Philipp and Niebler, Thomas and Strohmaier, Markus and Hotho, Andreas},
  booktitle = {Proceedings of the 22nd international conference on World Wide Web companion},
  editor = {ACM},
  keywords = {selected},
  pages = {171--172},
  publisher = {International World Wide Web Conferences Steering Committee},
  series = {WWW '13 Companion},
  title = {Computing semantic relatedness from human navigational paths on Wikipedia},
  year = 2013
}

Computing Semantic Relatedness from Human Navigational Paths: A Case Study on Wikipedia. Singer, Philipp; Niebler, Thomas; Strohmaier, Markus; Hotho, Andreas. In International Journal on Semantic Web and Information Systems (IJSWIS), 9(4), pp. 41–70. IGI Global, 2013.

In this article, the authors present a novel approach for computing semantic relatedness and conduct a large-scale study of it on Wikipedia. Unlike existing semantic analysis methods that utilize Wikipedia’s content or link structure, the authors propose to use human navigational paths on Wikipedia for this task. The authors obtain 1.8 million human navigational paths from a semi-controlled navigation experiment – a Wikipedia-based navigation game, in which users are required to find short paths between two articles in a given Wikipedia article network. The authors’ results are intriguing: They suggest that (i) semantic relatedness computed from human navigational paths may be more precise than semantic relatedness computed from Wikipedia’s plain link structure alone and (ii) that not all navigational paths are equally useful. Intelligent selection based on path characteristics can improve accuracy. The authors’ work makes an argument for expanding the existing arsenal of data sources for calculating semantic relatedness and to consider the utility of human navigational paths for this task.

@article{singer2013computing,
  abstract = {In this article, the authors present a novel approach for computing semantic relatedness and conduct a large-scale study of it on Wikipedia. Unlike existing semantic analysis methods that utilize Wikipedia’s content or link structure, the authors propose to use human navigational paths on Wikipedia for this task. The authors obtain 1.8 million human navigational paths from a semi-controlled navigation experiment – a Wikipedia-based navigation game, in which users are required to find short paths between two articles in a given Wikipedia article network. The authors’ results are intriguing: They suggest that (i) semantic relatedness computed from human navigational paths may be more precise than semantic relatedness computed from Wikipedia’s plain link structure alone and (ii) that not all navigational paths are equally useful. Intelligent selection based on path characteristics can improve accuracy. The authors’ work makes an argument for expanding the existing arsenal of data sources for calculating semantic relatedness and to consider the utility of human navigational paths for this task.},
  author = {Singer, Philipp and Niebler, Thomas and Strohmaier, Markus and Hotho, Andreas},
  journal = {International Journal on Semantic Web and Information Systems (IJSWIS)},
  keywords = {from:hotho},
  number = 4,
  pages = {41--70},
  publisher = {IGI Global},
  title = {Computing Semantic Relatedness from Human Navigational Paths: A Case Study on Wikipedia},
  volume = 9,
  year = 2013
}
