REGIO (concluded) - Data Science Chair

Science and innovation are characterized by local interaction: actors tend to cooperate with others who are close to them, where "closeness" can refer to different dimensions and properties. The aim of the REGIO project is to contribute to a better understanding of the role of closeness, especially geographic and thematic proximity, in the formation and success of collaborations in science and in private research and development. For this task, we work together with the Berlin School of Library and Information Science, the International Center for Higher Education Research and the Research Center L3S.

Our task in this project is to understand the emergence of publications within the research landscape. To this end, we advance a number of hypotheses on regional research networks and innovation clusters and compare them with publication data, for example from Web of Science. To test the plausibility of competing hypotheses, we transfer the HypTrails approach, which allows several hypotheses to be compared on a given dataset, to publication graphs and develop it further.

Results

At the beginning of the project, a data concept was developed which made it possible to generate concrete publication data sets for various research areas. The basic idea was to identify researchers of certain research areas via the central journals and conferences of these areas. This resulted in a dataset on the German and international AI landscape.

To understand this dataset temporally and thematically, a list of the subdomains of AI research was used. Researchers can thus be assigned to these subtopics, provided they have published in the corresponding journals and conferences. The result is a thematic categorization of the AI research landscape, which was made available to the public via the AIRankings website.
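The venue-based assignment described above can be sketched as follows. This is a minimal illustration and not the project's actual pipeline: the venue-to-subdomain table, the publication records and the function name `assign_subdomains` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from venues to AI subdomains (illustrative only).
VENUE_TO_SUBDOMAIN = {
    "Venue A": "Machine Learning",
    "Venue B": "Machine Learning",
    "Venue C": "Natural Language Processing",
}

# Hypothetical publication records: (author, venue, year).
PUBLICATIONS = [
    ("Author 1", "Venue A", 2018),
    ("Author 1", "Venue C", 2019),
    ("Author 2", "Venue B", 2019),
    ("Author 3", "Venue D", 2020),  # venue not in the list, so it is ignored
]


def assign_subdomains(publications, venue_to_subdomain):
    """Map each author to the subdomains of the venues they published in."""
    subdomains = defaultdict(set)
    for author, venue, _year in publications:
        subdomain = venue_to_subdomain.get(venue)
        if subdomain is not None:
            subdomains[author].add(subdomain)
    return dict(subdomains)


author_subdomains = assign_subdomains(PUBLICATIONS, VENUE_TO_SUBDOMAIN)
```

Authors who never published in one of the listed venues simply do not appear in the result, which mirrors the condition that researchers are only assigned to a subtopic if they published in its journals or conferences.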

Based on previous scientific work, various aspects of proximity were defined and quantified. The German AI dataset was used to extract information about researchers. To compare different aspects of proximity, the HypTrails approach was adapted from sequential data to graphs, which allows it to be applied to co-author graphs. The result is the new method AuthorTrails.
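The following sketch illustrates the general idea of a HypTrails-style comparison on a toy co-author graph: each hypothesis is turned into a Dirichlet prior, and hypotheses are ranked by the Bayesian evidence (marginal likelihood) of the observed collaboration counts. It rests on several assumptions: the prior elicitation is simplified (HypTrails itself uses a trial roulette scheme), each author's co-authorship counts are treated as one row of observations, and the count matrix, the hypotheses and the names `log_evidence`, `close_fit` and `poor_fit` are hypothetical. It is not the AuthorTrails implementation.

```python
from math import lgamma


def log_evidence(counts, hypothesis, k=5.0):
    """Log marginal likelihood of edge counts under a Dirichlet prior.

    counts[i][j]     : observed collaborations between author i and author j
    hypothesis[i][j] : belief that i collaborates with j (each row sums to 1)
    k                : strength of belief in the hypothesis (pseudo-counts per row)
    """
    total = 0.0
    for n_row, h_row in zip(counts, hypothesis):
        # Simplified elicitation (assumption): uniform base prior of 1 plus k * belief.
        alpha = [1.0 + k * h for h in h_row]
        total += lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(n_row))
        total += sum(lgamma(a + n) - lgamma(a) for a, n in zip(alpha, n_row))
    return total


# Toy co-author counts for three authors (hypothetical data).
counts = [
    [0, 4, 1],
    [4, 0, 0],
    [1, 0, 0],
]

# Three hypothetical hypotheses about who collaborates with whom.
hypotheses = {
    "uniform": [[1 / 3, 1 / 3, 1 / 3]] * 3,
    "close_fit": [[0.0, 0.8, 0.2], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    "poor_fit": [[0.0, 0.2, 0.8], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
}

evidence = {name: log_evidence(counts, h) for name, h in hypotheses.items()}
ranking = sorted(evidence, key=evidence.get, reverse=True)
```

A hypothesis that places its belief on the edges that are actually observed receives higher evidence than the uniform baseline, while one that contradicts the data falls below it. Repeating the computation for increasing values of `k` shows how the ranking changes as the belief in each hypothesis grows.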

Furthermore, different deep learning approaches were evaluated for predicting co-authorship. First, deep graph neural network approaches were used and extended to predict new links in the co-author graph. Second, we redefined the co-author prediction task and modeled it as a sequential prediction problem. BERT4Rec, the state-of-the-art model of this domain at the time, was extended and adapted to the new task. The resulting CoBERT model, developed in the project, has the advantage of taking temporal information into account and can predict additional potential co-authors for each author.
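To make the sequential formulation more concrete, the sketch below builds, for each author, a chronologically ordered sequence of co-authors in which all co-authors of the same paper share one position index. It only illustrates the input preparation such a model could use; the paper records and the function name `build_coauthor_sequences` are hypothetical, and this is not the CoBERT code.

```python
from collections import defaultdict

# Hypothetical papers: (year, paper_id, authors).
PAPERS = [
    (2019, "p2", ["alice", "carol", "dave"]),
    (2018, "p1", ["alice", "bob"]),
    (2020, "p3", ["alice", "bob"]),
]


def build_coauthor_sequences(papers):
    """Return, per author, chronologically ordered (co_author, paper_position) pairs."""
    sequences = defaultdict(list)
    positions = defaultdict(int)  # number of papers seen so far per author
    for _year, _paper_id, authors in sorted(papers):
        for author in authors:
            position = positions[author]
            for co_author in authors:
                if co_author != author:
                    # Co-authors of one paper share the same position index.
                    sequences[author].append((co_author, position))
            positions[author] += 1
    return dict(sequences)


sequences = build_coauthor_sequences(PAPERS)
```

A sequential recommender would then be trained to predict the co-authors at the next paper position from the earlier part of each sequence, in contrast to a link prediction model that only sees the static co-author graph.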

Lastly, the AuthorTrails method was extended. AuthorTrails allows the comparison of different hypotheses within a data set. The further developed method CompTrails, on the other hand, allows the comparison of the same hypothesis across different datasets. This allows thematic and temporal analysis of the research landscape.
The code for the web pages and methods is publicly available.

Partners

This project is done in cooperation with the following institutions:

International Center for Higher Education Research, University of Kassel
Berlin School of Library and Information Science, Humboldt University of Berlin
Research Center L3S, University of Hannover

Staff

The following persons are involved in this project:

Official Website

Visit regio-project.org

Additional websites funded by this project:

Publications

2024

Koopmann, T., Becker, M., Lemmerich, F., and Hotho, A. (2024) “CompTrails: comparing hypotheses across behavioral networks”, Data Mining and Knowledge Discovery, available: https://doi.org/10.1007/s10618-023-00996-8.
The term Behavioral Networks describes networks that contain relational information on human behavior. This ranges from social networks that contain friendships or cooperations between individuals, to navigational networks that contain geographical or web navigation, and many more. Understanding the forces driving behavior within these networks can be beneficial to improving the underlying network, for example, by generating new hyperlinks on websites, or by proposing new connections and friends on social networks. Previous approaches considered different hypotheses on a single network and evaluated which hypothesis fits best. These hypotheses can represent human intuition and expert opinions or be based on previous insights. In this work, we extend these approaches to enable the comparison of a single hypothesis between multiple networks. We unveil several issues of naive approaches that potentially impact comparisons and lead to undesired results. Based on these findings, we propose a framework with five flexible components that allow addressing specific analysis goals tailored to the application scenario. We show the benefits and limits of our approach by applying it to synthetic data and several real-world datasets, including web navigation, bibliometric navigation, and geographic navigation. Our work supports practitioners and researchers with the aim of understanding similarities and differences in human behavior between environments.

@article{Koopmann2024,
  abstract = {The term Behavioral Networks describes networks that contain relational information on human behavior. This ranges from social networks that contain friendships or cooperations between individuals, to navigational networks that contain geographical or web navigation, and many more. Understanding the forces driving behavior within these networks can be beneficial to improving the underlying network, for example, by generating new hyperlinks on websites, or by proposing new connections and friends on social networks. Previous approaches considered different hypotheses on a single network and evaluated which hypothesis fits best. These hypotheses can represent human intuition and expert opinions or be based on previous insights. In this work, we extend these approaches to enable the comparison of a single hypothesis between multiple networks. We unveil several issues of naive approaches that potentially impact comparisons and lead to undesired results. Based on these findings, we propose a framework with five flexible components that allow addressing specific analysis goals tailored to the application scenario. We show the benefits and limits of our approach by applying it to synthetic data and several real-world datasets, including web navigation, bibliometric navigation, and geographic navigation. Our work supports practitioners and researchers with the aim of understanding similarities and differences in human behavior between environments.},
  author = {Koopmann, Tobias and Becker, Martin and Lemmerich, Florian and Hotho, Andreas},
  journal = {Data Mining and Knowledge Discovery},
  keywords = {research:recommender},
  month = {01},
  title = {CompTrails: comparing hypotheses across behavioral networks},
  year = 2024
}

2021

Koopmann, T., Kobs, K., Herud, K., and Hotho, A. (2021) “CoBERT: Scientific Collaboration Prediction via Sequential Recommendation”, in 2021 International Conference on Data Mining Workshops (ICDMW), 45–54, available: https://doi.org/10.1109/ICDMW53433.2021.00013.
Collaborations are an important factor for scientific success, as the joint work leads to results individual scientists cannot easily reach. Recommending collaborations automatically can alleviate the time-consuming and tedious search for potential collaborators. Usually, such recommendation systems rely on graph structures modeling co-authorship of papers and content-based relations such as similar paper keywords. Models are then trained to estimate the probability of links between certain authors in these graphs. In this paper, we argue that the order of papers is crucial for reliably predicting future collaborations, which is not considered by graph-based recommendation systems. We thus propose to reformulate the task of collaboration recommendation as a sequential recommendation task. Here, we aim to predict the next co-author in a chronologically sorted sequence of an author’s collaborators. We introduce CoBERT, a BERT4Rec inspired model, that predicts the sequence’s next co-author and thus a potential collaborator. Since the order of co-authors of a single paper is not that important compared to the overall paper order, we leverage positional embeddings encoding paper positions instead of co-author positions in the sequence. Additionally, we inject content features about every paper and their co-authors. We evaluate CoBERT on two datasets consisting of papers from the field of Artificial Intelligence and the journal PlosOne. We show that CoBERT can outperform graph-based methods and BERT4Rec when predicting the co-authors of the next paper. We make our code and data available.

@inproceedings{koopmann2021cobert,
  abstract = {Collaborations are an important factor for scientific success, as the joint work leads to results individual scientists cannot easily reach. Recommending collaborations automatically can alleviate the time-consuming and tedious search for potential collaborators. Usually, such recommendation systems rely on graph structures modeling co-authorship of papers and content-based relations such as similar paper keywords. Models are then trained to estimate the probability of links between certain authors in these graphs. In this paper, we argue that the order of papers is crucial for reliably predicting future collaborations, which is not considered by graph-based recommendation systems. We thus propose to reformulate the task of collaboration recommendation as a sequential recommendation task. Here, we aim to predict the next co-author in a chronologically sorted sequence of an author’s collaborators. We introduce CoBERT, a BERT4Rec inspired model, that predicts the sequence’s next co-author and thus a potential collaborator. Since the order of co-authors of a single paper is not that important compared to the overall paper order, we leverage positional embeddings encoding paper positions instead of co-author positions in the sequence. Additionally, we inject content features about every paper and their co-authors. We evaluate CoBERT on two datasets consisting of papers from the field of Artificial Intelligence and the journal PlosOne. We show that CoBERT can outperform graph-based methods and BERT4Rec when predicting the co-authors of the next paper. We make our code and data available.},
  author = {Koopmann, Tobias and Kobs, Konstantin and Herud, Konstantin and Hotho, Andreas},
  booktitle = {2021 International Conference on Data Mining Workshops (ICDMW)},
  keywords = {research:recommender},
  month = 12,
  pages = {45--54},
  title = {CoBERT: Scientific Collaboration Prediction via Sequential Recommendation},
  year = 2021
}

Koopmann, T., Stubbemann, M., Kapa, M., Paris, M., Buenstorf, G., Hanika, T., Hotho, A., Jäschke, R., and Stumme, G. (2021) “Proximity dimensions and the emergence of collaboration: a HypTrails study on German AI research”, Scientometrics, available: https://doi.org/10.1007/s11192-021-03922-1.
Creation and exchange of knowledge depends on collaboration. Recent work has suggested that the emergence of collaboration frequently relies on geographic proximity. However, being co-located tends to be associated with other dimensions of proximity, such as social ties or a shared organizational environment. To account for such factors, multiple dimensions of proximity have been proposed, including cognitive, institutional, organizational, social and geographical proximity. Since they strongly interrelate, disentangling these dimensions and their respective impact on collaboration is challenging. To address this issue, we propose various methods for measuring different dimensions of proximity. We then present an approach to compare and rank them with respect to the extent to which they indicate co-publications and co-inventions. We adapt the HypTrails approach, which was originally developed to explain human navigation, to co-author and co-inventor graphs. We evaluate this approach on a subset of the German research community, specifically academic authors and inventors active in research on artificial intelligence (AI). We find that social proximity and cognitive proximity are more important for the emergence of collaboration than geographic proximity.

@article{koopmann2021proximity,
  abstract = {Creation and exchange of knowledge depends on collaboration. Recent work has suggested that the emergence of collaboration frequently relies on geographic proximity. However, being co-located tends to be associated with other dimensions of proximity, such as social ties or a shared organizational environment. To account for such factors, multiple dimensions of proximity have been proposed, including cognitive, institutional, organizational, social and geographical proximity. Since they strongly interrelate, disentangling these dimensions and their respective impact on collaboration is challenging. To address this issue, we propose various methods for measuring different dimensions of proximity. We then present an approach to compare and rank them with respect to the extent to which they indicate co-publications and co-inventions. We adapt the HypTrails approach, which was originally developed to explain human navigation, to co-author and co-inventor graphs. We evaluate this approach on a subset of the German research community, specifically academic authors and inventors active in research on artificial intelligence (AI). We find that social proximity and cognitive proximity are more important for the emergence of collaboration than geographic proximity.},
  author = {Koopmann, Tobias and Stubbemann, Maximilian and Kapa, Matthias and Paris, Michael and Buenstorf, Guido and Hanika, Tom and Hotho, Andreas and Jäschke, Robert and Stumme, Gerd},
  journal = {Scientometrics},
  keywords = {research:fundamentals},
  title = {Proximity dimensions and the emergence of collaboration: a HypTrails study on German AI research},
  year = 2021
}

2020

Stubbemann, M. and Koopmann, T. (2020) “The German and International AI Network Data Set”, available: https://doi.org/10.5281/zenodo.3693604.
@article{koopmann-germanai_2020,
  author = {Stubbemann, Maximilian and Koopmann, Tobias},
  keywords = {regio},
  month = {03},
  publisher = {Zenodo},
  title = {The German and International AI Network Data Set},
  year = 2020
}

REGIO - A mapping of the origins and success of cooperation relationships in regional research networks and innovation clusters