HydrAS (concluded)

Methods for Hypothesis-driven Analysis of Sequential Data (HydrAS)

HydrAS is a DFG-funded project that started in 2022 and ran for three years.

Thematic description

The increased availability of large-scale digital trace data on human behavior requires the development of suitable algorithmic approaches in computer and data science. Such data often comes in the form of sequences, e.g., sequences of visited websites or locations in cities. To analyze this kind of data and extract knowledge at large scale, the applicants and others presented a novel computational approach that compares hypotheses (derived from intuition, previous studies, or social theories) with respect to their plausibility given the observed sequences, using a Bayesian approach.

In this project, we will develop fundamentally new data analysis methods in that direction that overcome current shortcomings. First, we will (1) systematize and simplify the process of hypothesis elicitation by integrating (semi-)automatic procedures for deriving interpretable base hypotheses from background knowledge and for combining base hypotheses with each other. Second, we aim to (2) develop methods that partition data sequences such that each part of the data can be succinctly described in terms of background information on the features and the transition behavior in each partition can be explained by given hypotheses, in order to account for heterogeneity in the data. Finally, we will (3) extend the general framework of hypothesis-based analysis of sequential data, which currently focuses on simple first-order Markov chain models, to more complex models such as hidden Markov models, continuous-time Markov chain models, or neural networks for sequential data. This would allow us to formalize more complex and more fine-grained hypotheses, to pick the models most suitable for a specific scenario, and to integrate additional information (e.g., time information) in an easily understandable way.

In contrast to many recently proposed methods in data science and machine learning, our research will not focus on methods that yield the maximum predictive power. Instead, we concentrate on finding potential explanations of the data generation process that human domain experts can understand, by incorporating their hypotheses directly into the analysis process. The project will thus provide unique opportunities to integrate hypothesis-driven data analysis on the one hand with advanced machine learning techniques on the other, in order to support the understanding of the underlying processes generating the observed sequences. While this project focuses on developing new data science methods for analyzing human behavior, we expect these to be easily transferable to other application areas featuring sequential data.
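To make the idea of comparing hypotheses on first-order Markov chains more concrete, the following minimal sketch may help. It is a simplified illustration and not the project's code: the function names, the concentration parameter `k`, and the prior construction `1 + k * belief` are assumptions chosen for this example. Each hypothesis is a matrix of believed transition weights, which is turned into a Dirichlet prior per state; the hypothesis with the higher marginal likelihood (evidence) is considered more plausible for the observed sequences.

```python
from math import lgamma


def transition_counts(sequences, n_states):
    """Count first-order transitions a -> b over all sequences."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def log_evidence(counts, hypothesis, k):
    """Log marginal likelihood of the transition counts under a
    Dirichlet prior derived from a hypothesis (assumed construction).

    Each hypothesis row is normalized and scaled by the concentration
    k, then added to a flat prior of 1 per transition. With k = 0,
    every hypothesis collapses to the same flat prior.
    """
    total = 0.0
    for n_row, h_row in zip(counts, hypothesis):
        row_sum = sum(h_row)
        # Rows without any belief fall back to the flat prior.
        alpha = [1.0 + (k * h / row_sum if row_sum > 0 else 0.0) for h in h_row]
        # Dirichlet-multinomial evidence for one row of the chain.
        total += lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(n_row))
        total += sum(lgamma(a + n) - lgamma(a) for a, n in zip(alpha, n_row))
    return total


# Toy data (hypothetical): users mostly move 0 -> 1 -> 2 -> 0.
sequences = [[0, 1, 2, 0, 1, 2, 0], [1, 2, 0, 1, 2]]
counts = transition_counts(sequences, n_states=3)

# Two competing hypotheses expressed as belief matrices.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # "users follow the cycle"
uniform = [[1, 1, 1]] * 3                  # "users move at random"

# A positive log Bayes factor favors the first hypothesis.
log_bf = log_evidence(counts, cycle, k=10) - log_evidence(counts, uniform, k=10)
print(f"log Bayes factor (cycle vs. uniform): {log_bf:.3f}")
```

In this sketch, larger values of `k` express stronger belief in a hypothesis, so evidences are usually compared across a range of `k` values rather than at a single one.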

We host a repository with related publications and code here.

Staff

The following persons are involved in this project:

Publications

2025

Modeling and Analyzing the Influence of Non-Item Pages on Sequential Next-Item Prediction. Fischer, Elisabeth; Zehe, Albin; Hotho, Andreas; Schlör, Daniel. In ACM Trans. Recomm. Syst. Association for Computing Machinery, New York, NY, USA, 2025.
Analyzing sequences of interactions between users and items, sequential recommendation models can learn user intent and make predictions about the next item. Next to item interactions, most systems also have interactions with what we call non-item pages: these pages are not related to specific items but still can provide insights into the user’s interests, as, for example, navigation pages. We therefore propose a general way to include these non-item pages in sequential recommendation models to enhance next-item prediction. First, we demonstrate the influence of non-item pages on following interactions using the hypotheses testing framework HypTrails and propose methods for representing non-item pages in sequential recommendation models. Subsequently, we adapt popular sequential recommender models to integrate non-item pages and investigate their performance with different item representation strategies as well as their ability to handle noisy data. To show the general capabilities of the models to integrate non-item pages, we create a synthetic dataset for a controlled setting and then evaluate the improvements from including non-item pages on two real-world datasets. Our results show that non-item pages are a valuable source of information, and incorporating them in sequential recommendation models increases the performance of next-item prediction across all analyzed model architectures.

@article{Fischer2025,
  abstract = {Analyzing sequences of interactions between users and items, sequential recommendation models can learn user intent and make predictions about the next item. Next to item interactions, most systems also have interactions with what we call non-item pages: these pages are not related to specific items but still can provide insights into the user’s interests, as, for example, navigation pages. We therefore propose a general way to include these non-item pages in sequential recommendation models to enhance next-item prediction. First, we demonstrate the influence of non-item pages on following interactions using the hypotheses testing framework HypTrails and propose methods for representing non-item pages in sequential recommendation models. Subsequently, we adapt popular sequential recommender models to integrate non-item pages and investigate their performance with different item representation strategies as well as their ability to handle noisy data. To show the general capabilities of the models to integrate non-item pages, we create a synthetic dataset for a controlled setting and then evaluate the improvements from including non-item pages on two real-world datasets. Our results show that non-item pages are a valuable source of information, and incorporating them in sequential recommendation models increases the performance of next-item prediction across all analyzed model architectures.},
  address = {New York, NY, USA},
  author = {Fischer, Elisabeth and Zehe, Albin and Hotho, Andreas and Schlör, Daniel},
  journal = {ACM Trans. Recomm. Syst.},
  keywords = {selected},
  month = {03},
  publisher = {Association for Computing Machinery},
  title = {Modeling and Analyzing the Influence of Non-Item Pages on Sequential Next-Item Prediction},
  year = 2025
}

Integrating Hidden Markov Models and Bayesian Inference for Sequential Data Analysis. Lappert, Julia. Master thesis. Julius-Maximilians-Universität Würzburg, Professur für Inverse Probleme, 2025.
@mastersthesis{lappert2025integrating,
  author = {Lappert, Julia},
  keywords = {masterthesis},
  school = {Julius-Maximilians-Universität Würzburg, Professur für Inverse Probleme},
  title = {Integrating Hidden Markov Models and Bayesian Inference for Sequential Data Analysis},
  year = 2025
}

2024

Bayesian Inference for Composition of Hypotheses in Sequential Data. Levermann, Max Johann. Master thesis. Julius-Maximilians-Universität Würzburg, Professur für Inverse Probleme, September 28, 2024.
@mastersthesis{Levermann2024,
  author = {Levermann, Max Johann},
  keywords = {masterthesis},
  month = {09},
  school = {Julius-Maximilians-Universität Würzburg, Professur für Inverse Probleme},
  title = {Bayesian Inference for Composition of Hypotheses in Sequential Data},
  type = {Masterarbeit},
  year = 2024
}

CompTrails: comparing hypotheses across behavioral networks. Koopmann, Tobias; Becker, Martin; Lemmerich, Florian; Hotho, Andreas. In Data Mining and Knowledge Discovery. 2024.
The term Behavioral Networks describes networks that contain relational information on human behavior. This ranges from social networks that contain friendships or cooperations between individuals, to navigational networks that contain geographical or web navigation, and many more. Understanding the forces driving behavior within these networks can be beneficial to improving the underlying network, for example, by generating new hyperlinks on websites, or by proposing new connections and friends on social networks. Previous approaches considered different hypotheses on a single network and evaluated which hypothesis fits best. These hypotheses can represent human intuition and expert opinions or be based on previous insights. In this work, we extend these approaches to enable the comparison of a single hypothesis between multiple networks. We unveil several issues of naive approaches that potentially impact comparisons and lead to undesired results. Based on these findings, we propose a framework with five flexible components that allow addressing specific analysis goals tailored to the application scenario. We show the benefits and limits of our approach by applying it to synthetic data and several real-world datasets, including web navigation, bibliometric navigation, and geographic navigation. Our work supports practitioners and researchers with the aim of understanding similarities and differences in human behavior between environments.

@article{Koopmann2024,
  abstract = {The term Behavioral Networks describes networks that contain relational information on human behavior. This ranges from social networks that contain friendships or cooperations between individuals, to navigational networks that contain geographical or web navigation, and many more. Understanding the forces driving behavior within these networks can be beneficial to improving the underlying network, for example, by generating new hyperlinks on websites, or by proposing new connections and friends on social networks. Previous approaches considered different hypotheses on a single network and evaluated which hypothesis fits best. These hypotheses can represent human intuition and expert opinions or be based on previous insights. In this work, we extend these approaches to enable the comparison of a single hypothesis between multiple networks. We unveil several issues of naive approaches that potentially impact comparisons and lead to undesired results. Based on these findings, we propose a framework with five flexible components that allow addressing specific analysis goals tailored to the application scenario. We show the benefits and limits of our approach by applying it to synthetic data and several real-world datasets, including web navigation, bibliometric navigation, and geographic navigation. Our work supports practitioners and researchers with the aim of understanding similarities and differences in human behavior between environments.},
  author = {Koopmann, Tobias and Becker, Martin and Lemmerich, Florian and Hotho, Andreas},
  journal = {Data Mining and Knowledge Discovery},
  keywords = {research:recommender},
  month = {01},
  title = {CompTrails: comparing hypotheses across behavioral networks},
  year = 2024
}

2023

Higher-Order DeepTrails: Unified Approach to *Trails. Koopmann, Tobias; Pfister, Jan; Markus, André; Carolus, Astrid; Wienrich, Carolin; Hotho, Andreas. In Lernen, Wissen, Daten, Analysen (LWDA) Conference Proceedings, Marburg, Germany, October 9-11, 2023, Vol. 3630 of CEUR Workshop Proceedings, M. Leyer, J. Wichmann (eds.), pp. 372–386. CEUR-WS.org, 2023.
@inproceedings{DBLP:conf/lwa/Koopmann23,
  author = {Koopmann, Tobias and Pfister, Jan and Markus, André and Carolus, Astrid and Wienrich, Carolin and Hotho, Andreas},
  booktitle = {Lernen, Wissen, Daten, Analysen {(LWDA)} Conference Proceedings, Marburg, Germany, October 9-11, 2023},
  editor = {Leyer, Michael and Wichmann, Johannes},
  keywords = {author:pfister},
  pages = {372--386},
  publisher = {CEUR-WS.org},
  series = {{CEUR} Workshop Proceedings},
  title = {Higher-Order DeepTrails: Unified Approach to *Trails},
  volume = 3630,
  year = 2023
}
