HydrAS

Methods for Hypothesis-driven Analysis of Sequential Data (HydrAS)

HydrAS is a DFG-funded project that started in 2022 and runs for three years.

Thematic description

The increasing availability of large-scale digital trace data on human behavior requires the development of suitable algorithmic approaches in computer and data science. Such data often come as sequences, e.g., sequences of visited websites or of locations in a city. To analyze this kind of data and extract knowledge at scale, the applicants and others presented a novel computational approach that uses a Bayesian framework to compare hypotheses (derived from intuition, previous studies, or social theories) with respect to their plausibility given the observed sequences.

In this project, we will develop fundamentally new data analysis methods in that direction that overcome current shortcomings. First, we will (1) systematize and simplify the process of hypothesis elicitation by integrating (semi-)automatic procedures for deriving interpretable base hypotheses from background knowledge and for combining base hypotheses with each other. Second, to account for heterogeneity in the data, we aim to (2) develop methods that partition data sequences such that each partition can be succinctly described in terms of background information on the features and the transition behavior within each partition can be explained by given hypotheses. Finally, we will (3) extend the general framework of hypothesis-based analysis of sequential data, which currently focuses on simple first-order Markov chain models, to more complex models such as hidden Markov models, continuous-time Markov chain models, or neural networks for sequential data. This would make it possible to formalize more complex and more fine-grained hypotheses, to pick the models most suitable for a specific scenario, and to integrate additional information (e.g., time information) in an easily understandable way.
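As a rough illustration of the existing first-order Markov chain framework mentioned above, the following sketch ranks two hypotheses by their Bayesian evidence. It is a minimal sketch under stated assumptions, not the project's implementation: each hypothesis is converted into Dirichlet pseudo-counts `1 + k * belief`, and the concentration factor `k`, the function names, and the toy location sequences are all hypothetical.

```python
import math
from collections import Counter


def transition_counts(sequences):
    """Count observed first-order transitions (a -> b) over all sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts


def log_evidence(counts, hypothesis, states, k=10.0):
    """Log marginal likelihood of the observed transitions for one hypothesis.

    hypothesis[a][b] is the believed probability of moving from a to b.
    Assumption: row a gets the prior Dirichlet(1 + k * hypothesis[a][b]),
    where k (hypothetical) expresses how strongly the hypothesis is believed.
    """
    total = 0.0
    for a in states:
        alpha = [1.0 + k * hypothesis[a].get(b, 0.0) for b in states]
        n = [counts.get((a, b), 0) for b in states]
        # Dirichlet-multinomial evidence for the transitions leaving state a
        total += math.lgamma(sum(alpha)) - math.lgamma(sum(alpha) + sum(n))
        total += sum(math.lgamma(al + c) - math.lgamma(al)
                     for al, c in zip(alpha, n))
    return total


# Toy data (hypothetical): two short location sequences.
states = ["home", "work", "shop"]
sequences = [
    ["home", "work", "home", "work", "home"],
    ["home", "work", "shop", "home", "work"],
]
counts = transition_counts(sequences)

# Hypothesis 1: every transition is equally likely.
uniform = {a: {b: 1.0 / len(states) for b in states} for a in states}
# Hypothesis 2: people mostly commute between home and work.
commute = {
    "home": {"work": 0.8, "shop": 0.1, "home": 0.1},
    "work": {"home": 0.8, "shop": 0.1, "work": 0.1},
    "shop": {"home": 0.8, "work": 0.1, "shop": 0.1},
}

# A positive log Bayes factor favors the commute hypothesis.
log_bayes_factor = (log_evidence(counts, commute, states)
                    - log_evidence(counts, uniform, states))
print(log_bayes_factor)
```

The hypothesis with the larger evidence is the more plausible explanation of the observed transitions under these assumptions. Varying `k` shows how the ranking behaves as belief in the hypotheses grows.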

In contrast to many recently proposed methods in data science and machine learning, our research will not focus on methods that yield maximum predictive power. Instead, we concentrate on finding potential explanations of the data generation process that human domain experts can understand, by incorporating their hypotheses directly into the analysis process. In that regard, the project provides unique opportunities to integrate hypothesis-driven data analysis with advanced machine learning techniques in order to support the understanding of the underlying processes that generate the observed sequences. While this project focuses on developing new data science methods for analyzing human behavior, we expect these methods to be easily transferable to other application areas featuring sequential data.

We host a repository with related publications and code here.

Staff

The following persons are involved in this project:

Publications

2024

Levermann, M.J. (2024) Bayesian Inference for Composition of Hypotheses in Sequential Data, Master's thesis.
@mastersthesis{Levermann2024,
  author = {Levermann, Max Johann},
  keywords = {masterthesis},
  month = {09},
  school = {Julius-Maximilians-Universität Würzburg, Professur für Inverse Probleme},
  title = {Bayesian Inference for Composition of Hypotheses in Sequential Data},
  type = {Masterarbeit},
  year = 2024
}

Koopmann, T., Becker, M., Lemmerich, F., and Hotho, A. (2024) “CompTrails: comparing hypotheses across behavioral networks”, Data Mining and Knowledge Discovery, available: https://doi.org/10.1007/s10618-023-00996-8.

@article{Koopmann2024,
  abstract = {The term Behavioral Networks describes networks that contain relational information on human behavior. This ranges from social networks that contain friendships or cooperations between individuals, to navigational networks that contain geographical or web navigation, and many more. Understanding the forces driving behavior within these networks can be beneficial to improving the underlying network, for example, by generating new hyperlinks on websites, or by proposing new connections and friends on social networks. Previous approaches considered different hypotheses on a single network and evaluated which hypothesis fits best. These hypotheses can represent human intuition and expert opinions or be based on previous insights. In this work, we extend these approaches to enable the comparison of a single hypothesis between multiple networks. We unveil several issues of naive approaches that potentially impact comparisons and lead to undesired results. Based on these findings, we propose a framework with five flexible components that allow addressing specific analysis goals tailored to the application scenario. We show the benefits and limits of our approach by applying it to synthetic data and several real-world datasets, including web navigation, bibliometric navigation, and geographic navigation. Our work supports practitioners and researchers with the aim of understanding similarities and differences in human behavior between environments.},
  author = {Koopmann, Tobias and Becker, Martin and Lemmerich, Florian and Hotho, Andreas},
  journal = {Data Mining and Knowledge Discovery},
  keywords = {research:recommender},
  month = {01},
  title = {CompTrails: comparing hypotheses across behavioral networks},
  year = 2024
}

2023

Koopmann, T., Pfister, J., Markus, A., Carolus, A., Wienrich, C., and Hotho, A. (2023) “Higher-Order DeepTrails: Unified Approach to *Trails”, in Leyer, M. and Wichmann, J., eds., Proceedings of the LWDA 2023 Workshops: FGWM, FGKD, and FGDB, Marburg (Germany), Oktober 9-11th, 2023, CEUR Workshop Proceedings, CEUR-WS.org.
@inproceedings{DBLP:conf/lwa/Koopmann23,
  author = {Koopmann, Tobias and Pfister, Jan and Markus, André and Carolus, Astrid and Wienrich, Carolin and Hotho, Andreas},
  booktitle = {Proceedings of the {LWDA} 2023 Workshops: FGWM, FGKD, and FGDB, Marburg (Germany), Oktober 9-11th, 2023},
  editor = {Leyer, Michael and Wichmann, Johannes},
  keywords = {research:fundamentals},
  publisher = {CEUR-WS.org},
  series = {{CEUR} Workshop Proceedings},
  title = {Higher-Order DeepTrails: Unified Approach to *Trails},
  year = 2023
}
