Student Projects and Theses

The KBS staff will be happy to advise you on current topic proposals for your Bachelor's or Master's thesis.

Note: Bachelor's theses may, where appropriate, also be carried out in pairs, provided that each individual's contribution can be identified. For correspondingly more extensive topics, please also look under the Master's theses.

BACHELOR'S AND MASTER'S THESES

  • Towards Sustainable Bicycle Tourism with Bike GPS (Prof. Nejdl)

    Bike GPS operates a tour portal (www.bike-gps.com) and, among other things, advises tourism regions in the Alps on how to build up bicycle tourism and how to make a name for themselves worldwide as a mountain bike destination. The results speak for themselves, for example at Lake Garda, Livigno, the Dolomites, the Ötztal, the Zillertal, Kaltern and many more.

    The basis of this success is always the largest possible range of tours and trails for different target groups, combined with links to all the tourist offerings of a region. To match this offering to guests' wishes, Bike GPS would like to develop, as part of a Master's thesis, an app for Android and iOS that displays a previously researched tour (from a large selection) on a map and lets the user start navigating immediately, precisely and unambiguously: One Click – Go. In addition, hotels, restaurants and refreshment stops, photos of tourist highlights, special panoramas and important details along the tour are to be shown.

    Navigation is based on the unique Bike GPS RichTrack system, which makes it possible to receive turn-by-turn directions with a countdown and to see one's own position in the elevation profile, even on a bike or e-MTB (much as in a car).

    Technically, the app, which is to be developed as open source, should be based on a cross-platform framework such as Xamarin, Cordova or Flutter; the maps should draw on OpenStreetMap material.

    Mountain bikers and e-bikers welcome: the work comprises researching bike tours by e-MTB in a small team, getting to know the system and, finally, developing a so far unique smartphone app to be presented to various tourism regions.

    Thesis supervisor and contact

    Prof. Dr. techn. Wolfgang Nejdl

  • Generating Abstractions for Hierarchical Reinforcement Learning based on Graph Representations (Megha Khosla, Daniel Kudenko)

    Reinforcement learning (RL) is a highly popular machine learning technique, mainly due to its natural fit to the agent paradigm (i.e. learning by repeatedly acting and sensing in an environment) and its resulting wide application potential. Despite these advantages, RL still suffers from scalability problems which cause it to require huge computing resources in complex domains.

    Hierarchical approaches that exploit abstractions of the application environment can help to speed up the learning. However, the abstractions require domain knowledge which is not always readily available. Therefore, in [1] it is suggested to generate suitable abstractions automatically. This project will investigate the use of graph representation techniques to generate such abstractions, based on state connectivity in the corresponding Markov Decision Process. The project will perform a comparative analysis of several selected graph representation methods and apply them to reinforcement learning in grid-world navigation tasks.
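    As a rough illustration of the idea (not the method of [1]), the sketch below builds the state-connectivity graph of a small two-room grid world and splits it into two abstract states using the sign of the Fiedler vector of the graph Laplacian, which is one simple graph representation technique. The grid layout, the function names and the choice of spectral partitioning are all assumptions made for this example.

```python
# Hypothetical two-room grid world: '#' is a wall, '.' is a free cell.
GRID = [
    "...#...",
    ".......",
    "...#...",
]


def build_state_graph(grid):
    """Return (states, adjacency lists) for the free cells of the grid."""
    states = [(r, c) for r, row in enumerate(grid)
              for c, ch in enumerate(row) if ch != "#"]
    index = {s: i for i, s in enumerate(states)}
    adjacency = [[] for _ in states]
    for (r, c), i in index.items():
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            j = index.get((r + dr, c + dc))
            if j is not None:
                adjacency[i].append(j)
    return states, adjacency


def fiedler_vector(adjacency, iterations=3000):
    """Approximate the Laplacian eigenvector of the second-smallest
    eigenvalue by power iteration on (shift*I - L), with the constant
    vector projected out in every step."""
    n = len(adjacency)
    shift = 2 * max(len(nbrs) for nbrs in adjacency)  # upper bound on eigenvalues of L
    x = [float(i) for i in range(n)]  # deterministic, non-symmetric start
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]
        # (shift*I - L) x, using (L x)_i = deg_i * x_i - sum of neighbour values
        y = [(shift - len(adjacency[i])) * x[i] + sum(x[j] for j in adjacency[i])
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def abstract_states(grid):
    """Map every free cell to abstract state 0 or 1 by Fiedler-vector sign.
    Cells on the bottleneck (here the doorway) may land on either side."""
    states, adjacency = build_state_graph(grid)
    vec = fiedler_vector(adjacency)
    return {s: int(v >= 0.0) for s, v in zip(states, vec)}


if __name__ == "__main__":
    for state, label in sorted(abstract_states(GRID).items()):
        print(state, "-> abstract state", label)
```

    In this toy layout the two rooms end up in different abstract states, which a hierarchical learner could then treat as high-level states; larger environments would need more than one split or a different representation method.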

    References:

    1. J. Burden, D. Kudenko (2020): “Uniform State Abstraction for Reinforcement Learning”, European Conference on Artificial Intelligence (ECAI).

     

    Thesis supervisor and contact

    Dr. Megha Khosla, Daniel Kudenko

  • Neural Models for Improving First Phase Information Retrieval (Dr. Koustav Rudra)

    Document retrieval typically consists of two phases. The first phase tries to retrieve as many relevant documents as possible for a given query from a large collection (typically tens of millions to billions of documents), whereas the second phase ranks the retrieved documents so that highly relevant documents appear at the top of the list. The first phase has to be simple and fast because it operates on the whole collection. Standard models such as BM25 and QL rely only on tf-idf-based term matching between query and document, without considering the context of words. Recent neural ranking models have shown promising performance on the reranking task. However, there has been limited work on neural models and indexing methods to improve first-stage retrieval. The objective of this thesis is to explore recent contextual autoregressive models to improve first-stage retrieval methods.
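    To make the term-matching baseline concrete, here is a minimal sketch of BM25 scoring over tokenised documents. It uses a commonly seen smoothed idf variant and the conventional defaults k1 = 1.2 and b = 0.75; the toy documents, the query and the function name are assumptions for illustration only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every tokenised document in `docs` against the tokenised `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation: longer-than-average documents are penalised.
        length_norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # pure term matching: unseen terms contribute nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy collection (hypothetical data).
DOCS = [
    "neural ranking models for document retrieval".split(),
    "mountain bike tours in the alps".split(),
    "first stage retrieval with term matching".split(),
]
QUERY = "document retrieval".split()

if __name__ == "__main__":
    for doc, score in zip(DOCS, bm25_scores(QUERY, DOCS)):
        print(round(score, 3), " ".join(doc))
```

    A document that shares no term with the query scores zero regardless of its meaning, which is exactly the limitation that context-aware first-stage models aim to address.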

    Problem Statement

    Given a large collection of items (documents, passages), the aim is twofold:

    • construct a neural retrieval model that exploits attention-based semantic term matching for effective recall@k
    • explore how to make inference efficient for this family of models by using ideas from KNN search and LSH.

     

    Challenges

    There are two key challenges in making progress on first-stage retrieval:

    1) Measuring recall is difficult for large datasets: The typical way in which recall is measured in existing document retrieval tasks (TREC) is by pooling multiple retrieval models followed by human judgement. But since all the pooled systems follow the same retrieval methods, relevant documents not surfaced by the initial retrieval method are not judged. This is a hard problem because re-annotation is not possible in such a missing-data setting.

    2) Efficiency constraints are strong: Since most retrieval methods based on term matching and KNN search are expected to be fast, any added model complexity results in unacceptable inference times. For example, deeper layers are almost always expensive. Such constraints force us to consider alternative design decisions that trade off accuracy for speed.

    We will try to handle the above-mentioned challenges in two ways: (1) consider a small dataset, and (2) rely on the expectation that a better retrieval model should improve the performance (MAP, nDCG) of the second-stage reranking task.
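    The measures mentioned here can be sketched in a few lines. The following is a minimal illustration of recall@k and nDCG@k for a single query, assuming a ranked list of document ids and a dictionary of graded relevance judgements; all names and the toy data are hypothetical.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def dcg_at_k(gains, k):
    """Discounted cumulative gain: gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, graded_relevance, k):
    """DCG of the ranking divided by the DCG of an ideal ordering.
    Unjudged documents are treated as non-relevant (gain 0)."""
    gains = [graded_relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(graded_relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy judgements and ranking (hypothetical data).
RELEVANCE = {"d1": 2, "d2": 1}
RANKED = ["d3", "d1", "d7", "d2"]

if __name__ == "__main__":
    print("recall@2:", recall_at_k(RANKED, set(RELEVANCE), 2))
    print("nDCG@4:  ", ndcg_at_k(RANKED, RELEVANCE, 4))
```

    Note that treating unjudged documents as non-relevant is precisely where the pooling bias described in challenge 1 enters the numbers.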

    An ideal candidate should have:

    • a strong background in Python and deep learning
    • motivation to learn and explore the data
    • knowledge of basic IR concepts

     

    Interested students are encouraged to email Dr. Koustav Rudra at rudra(at)l3s(dot)de to schedule a meeting.

    References:

    1. Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval [https://arxiv.org/abs/1910.10687]
    2. Two tower model, ICLR 2020 [openreview.net/forum]
    3. Efficient Training on Very Large Corpora via Gramian Estimation, ICLR 2019 [https://arxiv.org/pdf/1807.07187.pdf]
    4. StarSpace: Embed All The Things!, AAAI 2018 [https://arxiv.org/pdf/1709.03856.pdf]

     

    Thesis supervisor and contact

    Dr. Koustav Rudra

  • Identifying Sub-Topics and Generating a Fair Summary of Migration-Related News Articles (Dr. Koustav Rudra)

    Thousands of news articles are produced and consumed each day. These articles are scattered across a diverse set of topics such as business, economy, international relations and migration. Migration is an issue currently faced by many countries across the globe. Different news channels cover various news about migration-related activities. A close inspection reveals that migration-related news is composed of different sub-topics such as human trafficking, immigration and organized crime. Identifying these sub-topics and summarizing them is an essential first step towards a coherent real-time view of the situation. The objective of this project is twofold:

    1. Identifying migration-related sub-topics and classifying news sources into these categories.
    2. Generating a report/summary of the scenario by combining information from multiple sources. The objective is to represent each sub-topic in a fair way.

     

    An ideal candidate should have

    • a strong background in Python and deep learning
    • motivation to learn and explore the data
    • interest in some human-level annotation
    • knowledge of basic NLP concepts

     

    Interested students are encouraged to email Dr. Koustav Rudra at rudra(at)l3s(dot)de to schedule a meeting.

    References:

    1. Summarizing User-generated Textual Content: Motivation and Methods for Fairness in Algorithmic Summaries [https://arxiv.org/abs/1810.09147]
    2. Fair and Diverse DPP-based Data Summarization [https://arxiv.org/abs/1802.04023]

     

    Thesis supervisor and contact

    Dr. Koustav Rudra

  • Neural Ranking Models for Search (Dr. Koustav Rudra)

    Information retrieval consists of two phases. The first phase tries to retrieve as many relevant documents as possible for a given query, whereas the second phase ranks the retrieved documents so that highly relevant documents appear at the top of the list. In this project, we try to develop neural ranking models that comply well with IR-related axioms.

    An ideal candidate should have

    • a strong background in Python and deep learning
    • motivation to learn and explore the data
    • knowledge of basic IR concepts

     

    Interested students are encouraged to email Dr. Koustav Rudra at rudra(at)l3s(dot)de to schedule a meeting.

    References:

    1. Deeper Text Understanding for IR with Contextual Neural Language Modeling [https://dl.acm.org/doi/10.1145/3331184.3331303]
    2. CEDR: Contextualized Embeddings for Document Ranking [https://arxiv.org/abs/1904.07094]
    3. Diagnosing BERT with Retrieval Heuristics [http://www.abcamara.com/documents/publications/ECIR_2020.pdf]

     

    Thesis supervisor and contact

    Dr. Koustav Rudra

  • Modeling Recurring Concepts in Imbalanced Data Streams (M.Sc. Amir Abolfazli, Prof. Dr. Eirini Ntoutsi)

    Classifying an imbalanced stream of non-stationary data with recurring drift is a challenging task. Recurring concept drifts occur when the same concept, observed in the past, reappears in the data stream. In the real world, rush hour in traffic flows and seasons are examples of repeated patterns. A common approach to dealing with recurring drift is to retrain a classifier whenever a change is detected in the underlying distribution of data. However, if a recurring concept occurs, the model then needs to relearn the recurring concept. If we can recognise recurring concepts, we may perform faster and better by reverting to the classifiers previously trained on those concepts. Most of the existing approaches dealing with recurring concepts, such as GraphPool [1], RCD [2], Diversity Pool [3], and ECPF [4], do not have any mechanism for handling the class imbalance problem, where the distribution of instances with respect to the classes is not equal and often evolves in the streaming data. Having no mechanism for handling the class imbalance results in models that have poor predictive performance, particularly for the minority class, which is often of higher importance. Therefore, a better approach that takes the class imbalance into account is needed.

    In this Master’s thesis, we aim to model the recurring concepts in data streams whose distribution is skewed. For this purpose, the following subtasks must be done: 1) implementing a graphical model which models the concepts in an imbalanced data stream as states/nodes; 2) incorporating one oversampling method (random oversampling, localized oversampling, SMOTE, ...) to increase the number of minority instances before the classifier gets assigned to one state/node, and thus tackle the class imbalance; 3) dynamic visualization of the graphical model as a directed graph.
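    Of the oversampling options named in subtask 2, random oversampling is the simplest. The sketch below shows it on one window of instances, under the assumption that the stream is processed in batches; the function name and the toy data are hypothetical, and a real implementation would more likely build on an existing library.

```python
import random
from collections import Counter, defaultdict


def random_oversample(instances, labels, seed=0):
    """Duplicate randomly chosen minority instances until every class
    has as many instances as the largest class in this window."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = list(instances), list(labels)
    for y, xs in by_class.items():
        for _ in range(target - len(xs)):
            out_x.append(rng.choice(xs))  # sample with replacement
            out_y.append(y)
    return out_x, out_y


# One window from an imbalanced stream (hypothetical data).
WINDOW_X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
WINDOW_Y = [0, 0, 0, 0, 1]

if __name__ == "__main__":
    new_x, new_y = random_oversample(WINDOW_X, WINDOW_Y)
    print(Counter(new_y))
```

    Because the duplicates carry no new information, methods such as SMOTE synthesise new minority points instead; which option works best per state/node is part of what the thesis would investigate.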

    Requirements:

    • prerequisite course: Data Mining 2
    • solid programming skills in Python and its libraries (NumPy, scikit-learn, scikit-multiflow, ...)

     

    If you are interested, send an email to Amir Abolfazli (abolfazli[at]l3s.de) with your up-to-date CV and transcript of records attached.

    References:

    1. Ahmadi, Zahra, and Stefan Kramer. "Modeling recurring concepts in data streams: a graph-based framework." Knowledge and Information Systems 55.1 (2018): 15-44.
    2. Gonçalves Jr, Paulo Mauricio, and Roberto Souto Maior De Barros. "RCD: A recurring concept drift framework." Pattern Recognition Letters 34.9 (2013): 1018-1025.
    3. Chiu, Chun Wai, and Leandro L. Minku. "Diversity-based pool of models for dealing with recurring concepts." 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 2018.
    4. Anderson, Robert, et al. "Recurring concept meta-learning for evolving data streams." Expert Systems with Applications 138 (2019): 112832.

     

    Thesis supervisor and contact

    M.Sc. Amir Abolfazli, Prof. Dr. Eirini Ntoutsi

  • Machine Learning for Personalised Medicine (Dr. Megha Khosla)

    Thesis supervisor and contact

    Dr. Megha Khosla

  • Fairness-aware Online Learning under Class Imbalance (M.Sc. Vasileios Iosifidis, Prof. Dr. Eirini Ntoutsi)

    Fairness-aware online learning has become an evolving field during the last few years. Its goal is to maintain a classifier that performs well and does not discriminate over the course of the stream. Some initial works have been proposed to tackle discriminatory outcomes in online classification [1, 2]; however, these methods do not take into consideration the uneven class distribution over the course of the stream. If the imbalance problem is not tackled, the learner mainly learns the majority class and strongly misclassifies/rejects the minority. Such methods might appear to be fair under certain fairness definitions that rely on parity in the predictions between the protected and non-protected groups. In reality, though, the low discrimination scores are just an artifact of the low prediction rates for the minority class.

    In this Master's thesis, we want to investigate the combined problem of class imbalance and fairness-aware learning in the online setup. We focus on the Naive Bayes classifier, which has been extensively studied in the context of fairness, but only in the static setting. In this work, we plan to extend these models to the online setting, taking into account the imbalance of the population under different fairness notions such as statistical parity [3], equal opportunity [4], and equalized odds [4].
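    Two of the fairness notions named above can be written down compactly for binary predictions. The sketch below computes them as differences between the non-protected and the protected group; this sign convention, the function names and the toy data are assumptions for illustration, and the cited papers should be consulted for the exact definitions.

```python
def positive_rate(preds):
    """Share of positive (1) predictions in a list of 0/1 predictions."""
    return sum(preds) / len(preds) if preds else 0.0


def statistical_parity_difference(y_pred, protected):
    """P(pred = 1 | non-protected) - P(pred = 1 | protected)."""
    prot = [p for p, s in zip(y_pred, protected) if s]
    non_prot = [p for p, s in zip(y_pred, protected) if not s]
    return positive_rate(non_prot) - positive_rate(prot)


def equal_opportunity_difference(y_true, y_pred, protected):
    """Difference in true positive rates between the two groups,
    i.e. the same comparison restricted to truly positive instances."""
    prot = [p for t, p, s in zip(y_true, y_pred, protected) if s and t == 1]
    non_prot = [p for t, p, s in zip(y_true, y_pred, protected) if not s and t == 1]
    return positive_rate(non_prot) - positive_rate(prot)


# Toy data: 1 marks the positive class / the protected group (hypothetical).
Y_TRUE = [1, 1, 0, 0, 1, 1, 0, 0]
Y_PRED = [1, 1, 1, 0, 1, 0, 0, 0]
PROTECTED = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print("statistical parity diff:", statistical_parity_difference(Y_PRED, PROTECTED))
    print("equal opportunity diff: ", equal_opportunity_difference(Y_TRUE, Y_PRED, PROTECTED))
```

    A classifier that rarely predicts the positive class for anyone drives both differences towards zero, which is the artifact described above for imbalanced streams.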

    An ideal candidate should be:

    • a self-motivated and independent learner
    • knowledgeable about machine learning (good grades in Data Mining I, Data Mining II)
    • experienced with Python or Java

     

    Interested students are encouraged to email Eirini Ntoutsi at ntoutsi(at)l3s(dot)de and/or Vasileios Iosifidis at iosifidis(at)l3s(dot)de to schedule an appointment. A CV and transcript of records must be sent beforehand.

    References

    1. V. Iosifidis, H. Tran, E. Ntoutsi, "Fairness-enhancing interventions in stream classification", 30th International Conference on Database and Expert Systems Applications (DEXA), 2019.
    2. W. Zhang, E. Ntoutsi, "An Adaptive Fairness-aware Decision Tree Classifier", International Joint Conference on Artificial Intelligence (IJCAI), 2019.
    3. Kamiran, F., & Calders, T. (2012). Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1), 1-33.
    4. Hardt, M., Price, E. and Srebro, N., 2016. Equality of opportunity in supervised learning. In Advances in neural information processing systems (pp. 3315-3323).

     

     

    Thesis supervisor and contact

    M.Sc. Vasileios Iosifidis, Prof. Dr. Eirini Ntoutsi

  • Question Answering Using Deep Learning (Prof. Anand)

    Thesis supervisor and contact

    Prof. Dr. Avishek Anand

  • Faster Inference for Deep Neural Rankers (Prof. Anand)

    Thesis supervisor and contact

    Prof. Dr. Avishek Anand

  • Interpretability of Neural Models (Prof. Anand)

    Thesis supervisor and contact

    Prof. Dr. Avishek Anand

  • Neural Information Retrieval (Prof. Anand)

    Thesis supervisor and contact

    Prof. Dr. Avishek Anand

  • Uncertain graph embedding (Dr. Tuan-Anh Hoang)

    Graph embedding is an efficient approach to unsupervised representation learning for nodes in graphs, and it has shown strong performance in downstream applications [1]. There has been a number of works on different methods for graph embedding, e.g., DeepWalk [2], LINE [3], and Node2vec [4]. Most existing methods, however, do not consider the uncertainty in the input graphs. Examples of such uncertainty include edges that can only be observed probabilistically and node attributes that are measured only coarsely (e.g., young or middle-aged instead of a number for age).

    In this project, we would like to extend and adapt the existing graph embedding methods as well as develop new ones for working with uncertain graphs. We would also like to explore the efficiency of these methods for downstream applications in different contexts.
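    One conceivable starting point, shown here purely as an assumption and not as an established method, is to adapt DeepWalk-style random walks so that each transition is sampled in proportion to the probability that the edge exists. The walks could then be fed to a skip-gram model as in DeepWalk. The graph, the probabilities and the function names below are hypothetical.

```python
import random


def uncertain_random_walk(edge_probs, start, length, rng):
    """Random walk in which the next node is drawn with weight equal to
    the existence probability of the connecting edge.
    `edge_probs` maps each node to {neighbour: probability}."""
    walk = [start]
    while len(walk) < length:
        neighbours = edge_probs.get(walk[-1], {})
        if not neighbours:
            break  # dead end: stop the walk early
        nodes = list(neighbours)
        weights = [neighbours[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def generate_walks(edge_probs, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node of the graph."""
    rng = random.Random(seed)
    return [uncertain_random_walk(edge_probs, node, length, rng)
            for node in edge_probs for _ in range(walks_per_node)]


# Small uncertain graph: symmetric edge existence probabilities (hypothetical).
EDGE_PROBS = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9, "c": 0.5},
    "c": {"a": 0.1, "b": 0.5},
}

if __name__ == "__main__":
    for walk in generate_walks(EDGE_PROBS, walks_per_node=2, length=5):
        print(" ".join(walk))
```

    Weighting transitions this way makes unlikely edges appear rarely in the walk corpus; whether that is the right way to encode uncertainty, compared with for example sampling whole possible worlds, is one of the questions the project leaves open.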

    An ideal candidate for this project should be:

    • a self-motivated learner
    • experienced with programming languages (ideally C/C++ and/or Python)
    • knowledgeable about basic machine learning models

     

    References:

    1. Cui, Peng, et al. "A survey on network embedding." IEEE Transactions on Knowledge and Data Engineering 31.5 (2018): 833-852.
    2. Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "Deepwalk: Online learning of social representations." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014.
    3. Tang, Jian, et al. "Line: Large-scale information network embedding." Proceedings of the 24th international conference on world wide web. International World Wide Web Conferences Steering Committee, 2015.
    4. Grover, Aditya, and Jure Leskovec. "node2vec: Scalable feature learning for networks." Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2016.

     

    Thesis supervisor and contact

    Dr. Tuan-Anh Hoang

  • Automatic Text Summarization in cooperation with Volkswagen (Prof. Nejdl, Volkswagen)

    A thesis at Volkswagen in the Machine Learning team (Natural Language Processing, Automatic Text Summarization) at the Wolfsburg site.

    Thesis supervisor and contact

    Prof. Dr. techn. Wolfgang Nejdl

  • Data Analytics and Mobility (Dr. Elena Demidova, M.Sc. Nicolas Tempelmeier)

    Cities of the future have a growing demand for intelligent mobility services and infrastructure to support better mobility and enhance quality of life in urban areas. The increasing availability of urban data holds great potential to facilitate efficient mobility services and infrastructure, for instance, through a better understanding of long-term trends (such as e-mobility) and their impact on transportation needs, or the correlation of mobility behavior in densely populated areas with influencing factors such as weather, regional events or temporal fluctuations. Extraction, integration and analysis of heterogeneous mobility-related urban data is of interest to various stakeholder groups, including city inhabitants, city councils, providers of mobility services and public transportation. While data is spread across heterogeneous institutional repositories and Web platforms, semantic technologies and machine learning methods can be exploited to enable the extraction and analysis of data.

    Examples of such data are:

    • Queries for public transportation services
    • Historical traffic speed records
    • Environmental data
    • Warnings about traffic incidents
    • Location of construction sites
    • Calendar of scheduled events

     

    Possible topics for an MSc thesis in this context could address one of the following research questions using state-of-the-art machine learning and data mining methods:

    • Prediction of road traffic from public transportation query logs
    • Estimating the impact of events
    • Impact of traffic incidents
    • Verification of traffic incidents
    • Prediction of congestion and bottlenecks in traffic
    • Event detection

     

    For further information, see the homepage of the project “Data4UrbanMobility”: http://d4um.l3s.uni-hannover.de

    Thesis supervisor and contact

    Dr. Elena Demidova, M.Sc. Nicolas Tempelmeier

  • Deep Learning on Graph Structured Data (Dr. Megha Khosla)

    Thesis supervisor and contact

    Dr. Megha Khosla

  • Modelling and predicting the knowledge state of students (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth

  • Search as learning (SaL): Investigating the impact of images and videos in SaL scenarios (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth

  • Automatic highlighting of important segments in educational videos (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth

  • Automatic question generation for (educational) videos (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth

  • Using knowledge graphs for video question answering (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth

  • Automatic captioning for scholarly figures (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth

  • Analysing and Linking Graphical Representations in Computer Science Publications to their Implementation (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth

  • Linking Formulas in Scientific Publications with their Software Implementation (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth

  • Information extraction from scientific (textual) publications for knowledge graph enrichment (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth

  • Improving OCR for recognizing text in scientific videos (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth

  • Semi-automatic enrichment of scientific videos with external recommendations (Prof. Ewerth)

    Thesis supervisor and contact

    Prof. Dr. Ralph Ewerth