    Data Science Chair

    Paper accepted at the HyperText'19 conference

    01/20/2020

    The paper 'On the Right Track! Analysing and Predicting Navigation Success in Wikipedia' has been published at the HyperText'19 conference.

    In some of our recent research, we analysed click trails extracted from different Wikipedia games and predicted the outcome of each game. This short post explains why we did this and how it works.

    The internet is becoming more complex every day, and it is becoming harder to find web pages and sought-after information. You may have experienced this yourself. Suppose you want to find out whether glass bottles are allowed at the next festival you plan to visit, but the festival's website is rather confusing. After some searching that turns up only information about the line-up, you end up on the main page again and leave the site, annoyed. This is an unsatisfying result. One solution could be a sidebar that offers dynamic links depending on what the page assumes the user is looking for.

    In this case, the website could have adapted its recommended subpages once it recognised that you were clicking around at random, unable to find your target.

    Based on this example, we set the overall task: we want to help users find their target (in this case, information). In our work we take the first step by analysing these sequences of clicks (or click trails) and predicting whether the user will succeed in finding the target. This task is even harder when no information about the target is given, which is the case in a realistic setting. Unfortunately, there is a problem: for click trails on the open web, we do not know why the user left the page. Were they successful and in no need of help, or did they leave unsuccessfully and could have used some help? We decided to tackle this problem by using game-based datasets on the well-known information platform Wikipedia (namely WikiGame and WikiSpeedia). Thanks to the game setting, we know the user's target and can infer why they abandoned the task.

    In these games, a user is given a randomly selected start page (shown on the left side of the example figure), and the task is to reach the target page in as few clicks as possible using only hyperlinks. The result is a sequence of clicks. We analyse these click trails with respect to different properties (e.g. whether they pass through hub nodes, such as the page for Europe in this example) in order to get a better understanding of the navigation behaviour.
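The click-trail properties mentioned above can be made concrete with a small sketch. It is not the analysis code from the paper: the toy link graph, the function names, and the choice to define a hub as the page with the most incoming links are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy hyperlink graph: page -> pages it links to.
LINKS = {
    "Banana": ["Fruit", "Europe"],
    "Fruit": ["Banana", "Europe"],
    "Europe": ["Germany", "France", "Fruit", "Banana"],
    "Germany": ["Europe", "Berlin"],
    "France": ["Europe"],
    "Berlin": ["Germany"],
}


def hub_nodes(links, top_k=1):
    """Return the top_k pages by number of incoming links.

    Assumption: "hub" is defined here by in-degree; other definitions exist.
    """
    in_degree = Counter()
    for targets in links.values():
        in_degree.update(targets)
    return {page for page, _ in in_degree.most_common(top_k)}


def describe_trail(trail, target, links, top_k=1):
    """Summarise one click trail (a list of page titles, start page first)."""
    hubs = hub_nodes(links, top_k)
    return {
        "clicks": len(trail) - 1,                        # start page is not a click
        "uses_hub": any(page in hubs for page in trail),  # passes through a hub?
        "successful": trail[-1] == target,                # ended on the target page?
    }


summary = describe_trail(["Banana", "Europe", "Germany", "Berlin"], "Berlin", LINKS)
print(summary)
```

In this toy graph, "Europe" has the most incoming links, so a trail that passes through it counts as using a hub.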

    Based on the insights of our analysis, we use a deep learning approach to predict the outcome of the game. We try several settings, e.g. a prediction based only on the first few clicks, or based only on the last clicks. In general, we are able to outperform our baselines while using less information (specifically, we disregard the target node). Our top AUC score is 0.90, which shows that our approach is able to predict the outcome of the game in this setting. This brings us one step closer to the overall goal of helping users find their sought-after information in a live setting.
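To illustrate the evaluation settings and the AUC metric, here is a minimal sketch. It does not reproduce the deep learning model from the paper: the helper names are hypothetical, and the labels and scores are made-up numbers standing in for the output of some classifier. The AUC is computed from its pairwise definition, i.e. the fraction of (successful, unsuccessful) trail pairs in which the successful trail receives the higher score, with ties counted as one half.

```python
def first_pages(trail, k):
    """Keep only the first k pages of a trail (k is assumed to be positive)."""
    return trail[:k]


def last_pages(trail, k):
    """Keep only the last k pages of a trail (k is assumed to be positive)."""
    return trail[-k:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a successful game, 0 for an unsuccessful one.
    scores: predicted success scores, higher meaning "more likely successful".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


trail = ["Banana", "Europe", "Germany", "Berlin"]
print(first_pages(trail, 2))  # ['Banana', 'Europe']
print(last_pages(trail, 2))   # ['Germany', 'Berlin']

# Made-up labels and scores for four games, for illustration only.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of successful over unsuccessful games.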

    For more details and insights about this work, we refer you to the paper by Koopmann et al. [1].

    [1] Koopmann, T., Dallmann, A., Hettinger, L., Niebler, T. & Hotho, A. (2019). On the Right Track! Analysing and Predicting Navigation Success in Wikipedia. Proceedings of the 30th ACM Conference on Hypertext and Social Media (pp. 143–152), New York, NY, USA: ACM. ISBN: 978-1-4503-6885-8

     
