Science and innovation are characterized by local interaction: actors tend to cooperate with others who are "close" to them, where "closeness" can refer to different dimensions and properties. The aim of the REGIO project is to contribute to a better understanding of the role of "closeness", especially geographic and thematic proximity, in the formation and success of collaborations in science and in private research and development. For this task we work together with the Berlin School of Library and Information Science, the International Center for Higher Education Research and the Research Center L3S.
Our task in this project is to understand the emergence of publications within the research landscape. To this end, we will advance a number of hypotheses on regional research networks and innovation clusters and compare them with publication data, for example from Web of Science. In order to test the plausibility of competing hypotheses, we will transfer the HypTrails approach, which allows several hypotheses to be compared on a given dataset, to publication graphs and develop it further.
At the beginning of the project, a data concept was developed which made it possible to generate concrete publication data sets for various research areas. The basic idea was to identify researchers of certain research areas via the central journals and conferences of these areas. This resulted in a dataset on the German and international AI landscape.
In order to understand this dataset temporally and thematically, a list of the subdomains of the research domain AI was used. Researchers can thus be assigned to these subtopics, provided they have published in the corresponding journals and conferences. The result is a thematic categorization of the AI research landscape, which was made available to the public via the AIRankings website.
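The venue-based assignment described above can be illustrated with a minimal sketch. The venue-to-subdomain mapping, the author names and the function name `assign_subdomains` are hypothetical and chosen only for illustration; they are not taken from the project's data or code.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from venues to AI subdomains (illustration only).
VENUE_TO_SUBDOMAIN = {
    "ACL": "NLP",
    "EMNLP": "NLP",
    "CVPR": "Computer Vision",
    "ICML": "Machine Learning",
}


def assign_subdomains(publications):
    """Count, per author, the publications in each subdomain.

    `publications` is an iterable of (author, venue) pairs. Venues that
    are not in the mapping are ignored, so an author only receives a
    subdomain if they published in one of its listed venues.
    """
    profile = defaultdict(Counter)
    for author, venue in publications:
        subdomain = VENUE_TO_SUBDOMAIN.get(venue)
        if subdomain is not None:
            profile[author][subdomain] += 1
    return {author: dict(counts) for author, counts in profile.items()}


# Hypothetical example records.
publications = [
    ("alice", "ACL"),
    ("alice", "EMNLP"),
    ("alice", "CVPR"),
    ("bob", "ICML"),
    ("bob", "Unknown Workshop"),
]
profiles = assign_subdomains(publications)
print(profiles)
```

Keeping counts rather than a single label lets a researcher belong to several subdomains at once, which matches the idea of a thematic categorization rather than a strict partition.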
Based on previous scientific work, various aspects of proximity were defined and quantified. The German AI dataset was used to extract information about researchers. In order to compare different aspects of proximity, the HypTrails approach was adapted from sequential data to graphs, so that it can be applied to co-author graphs. The result is the new method AuthorTrails.
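To give an intuition for how hypotheses can be compared on a co-author graph, the following is a simplified sketch in the spirit of HypTrails, not the AuthorTrails implementation. It assumes that each hypothesis is a matrix of belief values between authors, turns each row into a Dirichlet prior with concentration factor `k`, and scores the hypothesis by the Dirichlet-multinomial log evidence of the observed co-authorship counts. The prior elicitation used here (`1 + k * normalized belief`) is a simplification, and all names and example matrices are hypothetical.

```python
from math import lgamma


def log_evidence(counts, belief, k):
    """Dirichlet-multinomial log evidence, summed over all authors (rows).

    counts[i][j]: observed number of co-authorships between authors i and j.
    belief[i][j]: how strongly the hypothesis expects i to work with j.
    k:            concentration factor, i.e. how strongly we trust the belief.
    """
    total = 0.0
    for row_counts, row_belief in zip(counts, belief):
        n = len(row_belief)
        belief_sum = sum(row_belief)
        # Normalize the belief row; fall back to uniform if it is empty.
        if belief_sum > 0:
            probs = [b / belief_sum for b in row_belief]
        else:
            probs = [1.0 / n] * n
        # Simplified prior: one uniform pseudo-count plus k weighted counts.
        alpha = [1.0 + k * p for p in probs]
        total += lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(row_counts))
        total += sum(
            lgamma(a + c) - lgamma(a) for a, c in zip(alpha, row_counts)
        )
    return total


# Hypothetical data: authors 0 and 1 share a city, as do authors 2 and 3.
counts = [[0, 5, 1, 0], [5, 0, 0, 0], [1, 0, 0, 4], [0, 0, 4, 0]]
geographic = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
uniform = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]

for k in (0, 1, 10):
    print(k, log_evidence(counts, geographic, k), log_evidence(counts, uniform, k))
```

A hypothesis is considered more plausible the higher its evidence is for increasing `k`. At `k = 0` all hypotheses share the same flat prior and therefore obtain the same evidence.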
Furthermore, different deep learning approaches for predicting co-authorship were evaluated. On the one hand, graph neural network approaches were used and extended to predict new links in the co-author graph. On the other hand, we redefined the co-author prediction task and modeled it as a sequential prediction problem. BERT4Rec, the state-of-the-art model in this domain at the time, was extended and adapted to the new task. The resulting CoBERT model, developed in the project, has the advantage of taking temporal information into account and can predict additional potential co-authors for each author.
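The reformulation of co-author prediction as a sequential task can be sketched as a data preparation step. The example below is not the CoBERT model; it only shows, under simple assumptions, how chronological co-author sequences could be built per author and how the last entry could be masked as a prediction target, similar to the evaluation setup of masked sequential recommenders. The function names, the `[MASK]` token and the example papers are hypothetical.

```python
from collections import defaultdict


def coauthor_sequences(papers):
    """Build, for every author, the chronological sequence of co-authors.

    `papers` is an iterable of (year, [authors]) pairs. Papers are sorted
    by year, so each sequence carries the temporal order of collaborations.
    """
    sequences = defaultdict(list)
    for _year, authors in sorted(papers, key=lambda paper: paper[0]):
        for author in authors:
            sequences[author].extend(c for c in authors if c != author)
    return dict(sequences)


def mask_last(sequence, mask_token="[MASK]"):
    """Replace the most recent co-author with a mask and return it as target."""
    return sequence[:-1] + [mask_token], sequence[-1]


# Hypothetical example papers.
papers = [
    (2019, ["alice", "bob"]),
    (2018, ["alice", "carol"]),
    (2020, ["alice", "dave", "bob"]),
]
sequences = coauthor_sequences(papers)
model_input, target = mask_last(sequences["alice"])
print(model_input, target)
```

A sequential model would then be trained to recover the masked entry from the preceding co-authors, which is how temporal information enters the prediction.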
Lastly, the AuthorTrails method was extended. AuthorTrails allows the comparison of different hypotheses within a single dataset. The extended method, CompTrails, in contrast allows the comparison of the same hypothesis across different datasets. This enables thematic and temporal analyses of the research landscape.
The code for the web pages and methods is publicly available.
This project is done in cooperation with the following institutions:
- International Center for Higher Education Research, University of Kassel
- Berlin School of Library and Information Science, Humboldt University of Berlin
- Research Center L3S, University of Hannover
Additional websites funded by this project:
“CoBERT: Scientific Collaboration Prediction via Sequential Recommendation”, in 2021 International Conference on Data Mining Workshops (ICDMW), 45–54 (2021).
“Proximity dimensions and the emergence of collaboration: a HypTrails study on German AI research”, Scientometrics (2021), available: https://doi.org/10.1007/s11192-021-03922-1
“Where to Submit? Helping Researchers to Choose the Right Venue”, in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, Association for Computational Linguistics: Online, 878–883 (2020), available: https://www.aclweb.org/anthology/2020.findings-emnlp.78