
Student Projects and Theses

The staff of the Knowledge-Based Systems group (KBS) will be happy to advise you with current topic suggestions for your Bachelor's or Master's thesis.

Note: Bachelor's theses may also be carried out by two students together, provided that each student's individual contribution can be identified. For correspondingly larger assignments, please also look at the Master's thesis topics.

BACHELOR'S AND MASTER'S THESES

  • Identifying a conserved host response to viral infections such as COVID-19 (Prof. Nejdl, Prof. Li)

    Clinical presentations of COVID-19 are highly variable: while the majority of patients experience mild to moderate symptoms, 10%–20% of patients develop pneumonia and severe disease. We recently performed the first single-cell RNA sequencing of blood cells to determine changes in immune cell composition and activation in mild versus severe COVID-19 over time [1]. A recent study based on a multi-cohort analysis of the host immune response identified conserved protective and detrimental modules associated with severity across viruses [2, 3]. We therefore hypothesize that viral infections induce a conserved host response and that this response is associated with disease severity. In this project, we will 1) implement an analysis framework for identifying the conserved host response and 2) apply it to transcriptome data from multiple cohorts of patients infected with COVID-19 and other viruses.
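    The analysis framework is not specified further here. As a minimal sketch of one widely used approach to multi-cohort analysis, each cohort contributes a bias-corrected effect size (Hedges' g, cases vs. controls) per gene, and the cohorts are then pooled with a DerSimonian-Laird random-effects model so that genes with a consistent effect across cohorts stand out. The function names and expression values are hypothetical, and expression is assumed to be already normalized within each cohort.

```python
import math
from statistics import mean, variance


def hedges_g(cases, controls):
    """Bias-corrected standardized mean difference (Hedges' g) and its variance."""
    n1, n2 = len(cases), len(controls)
    # Pooled standard deviation from the two sample variances
    s_pooled = math.sqrt(
        ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    )
    d = (mean(cases) - mean(controls)) / s_pooled
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias correction
    g = j * d
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var_g


def random_effects(effects):
    """DerSimonian-Laird pooling of per-cohort (g, variance) pairs."""
    gs = [g for g, _ in effects]
    vs = [v for _, v in effects]
    w = [1 / v for v in vs]
    fixed = sum(wi * gi for wi, gi in zip(w, gs)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, gs))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(gs) - 1)) / c) if c > 0 else 0.0  # between-cohort variance
    w_star = [1 / (v + tau2) for v in vs]
    pooled = sum(wi * gi for wi, gi in zip(w_star, gs)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2


# Hypothetical normalized expression of one gene in two independent cohorts
cohort_a = hedges_g(cases=[5.1, 5.8, 6.2, 5.9], controls=[4.0, 4.4, 4.1, 4.6])
cohort_b = hedges_g(cases=[7.0, 7.4, 6.8], controls=[6.1, 6.3, 5.9, 6.0])
pooled, se, tau2 = random_effects([cohort_a, cohort_b])
z = pooled / se  # a large |z| across cohorts suggests a conserved response
```

    In a full analysis this would run for every gene, followed by multiple-testing correction; genes with consistent effects across COVID-19 and other viral cohorts would then be candidates for the conserved response.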

    PDF-Flyer

    Thesis supervision and contact

    Prof. Dr. techn. Wolfgang Nejdl, Prof. Dr. Yang Li

  • Digital Transformation in Medicine (Prof. Nejdl)

    The programme enables students of medicine (doctoral students) and computer science (Master's students writing their final thesis) to conduct joint research in the area of "Digital Transformation in Medicine". Prior knowledge of computer science, or acquiring such knowledge during the programme, is not required.

    Projects from the following thematic complexes are offered:

    1. Tools and technologies for doctors:
      • Workflows and process optimisation
      • Decision support systems
    2. Tools and technologies for patients:
      • General support and monitoring
      • Analysis technologies
      • Artificial intelligence methods (e.g. machine learning/neural networks for image analysis)
      • Big Data and databases (e.g. data mining methods)
      • Robotics and assistance systems

     

    Each project comprises, as individual sub-projects, the development and evaluation of one of the technologies listed above, carried out jointly by the two students. In addition to a first supervisor and a co-supervisor from their own discipline, each student is assigned a further supervisor from the other discipline, who supports them with questions and with their work on the topic.

    Deadline: 6 April 2021

    Website PDF-Flyer

    Thesis supervision and contact

    Prof. Dr. techn. Wolfgang Nejdl

  • A benchmark for biomedical understanding (Tam Nguyen)

    Psychiatric disorders (PDs) rank 5th in terms of prevalence and account for 6.7% of “Disability-Adjusted Life Years”. We have explored different types of datasets to understand the landscape of research on psychiatric disorders. In particular, we designed a categorization of psychiatric data, including genomics data, molecular data, drug review data, research publication data, clinical data, etc. Moreover, we built a repository of related venues and downloadable resources, including top-tier journals (Bioinformatics, ACM Transactions on Computing for Healthcare, Nature Research, Genome Research, Social Psychiatry and Psychiatric Epidemiology, etc.), top-tier conferences (IEEE International Conference on Bioinformatics and Biomedicine, ACM Conference on Health, Inference, and Learning, etc.), and top-tier workshops (AI for public health, BioKDD, WWW AI In Health). In this project, we will develop a benchmark based on the explored datasets and relevant to the research community described above, for example a question answering task that helps users cross-check psychiatric facts.

    Possible benchmark topics include:

    1. A Benchmark for (bio)medical Machine Reading Comprehension
    2. Benchmarking the Generality and Domain Specificity of MRC models
    3. Benchmark on active learning for biomedical document annotation
    4. Benchmark on active learning for biomedical image segmentation
    5. Benchmark on active learning for biomedical question answering
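    For a machine reading comprehension or question answering benchmark, a common choice of metrics is SQuAD-style exact match (EM) and token-level F1 between the predicted answer and the gold answers. The following is a minimal sketch, assuming simple answer normalization (lowercasing, removing punctuation and English articles); a concrete benchmark may normalize differently. The example questions' answers are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, golds):
    """Average the best EM/F1 over all gold answers of each question (in percent)."""
    em = f1 = 0.0
    for pred, answers in zip(predictions, golds):
        em += max(exact_match(pred, a) for a in answers)
        f1 += max(token_f1(pred, a) for a in answers)
    n = len(predictions)
    return {"exact_match": 100 * em / n, "f1": 100 * f1 / n}


# Hypothetical system outputs and gold answers for two questions
predictions = ["Lithium.", "valproate"]
gold_answers = [["lithium"], ["lamotrigine"]]
scores = evaluate(predictions, gold_answers)
```

    Taking the maximum over several gold answers per question matters in the biomedical domain, where the same entity often has multiple valid surface forms.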

     

    An ideal candidate should have:

    • Good background in machine learning, deep learning, and programming (Python or R).
    • Knowledge of data analytics, especially of social media, textual, and knowledge graph data.
    • Experience with natural language processing tasks such as machine reading comprehension and question answering is a plus.
    • A love of reading and exploring scientific articles, preferably every day.
    • A proactive attitude toward learning new things, preferably every day.

    Interested students are encouraged to email Dr. Tam Nguyen (tamnguyen(at)l3s(dot)de) to discuss the topic.

    References:

    • Survey paper: Q. Jin, Z. Yuan, G. Xiong, Q. Yu, C. Tan, M. Chen, S. Huang, X. Liu, S. Yu, Biomedical Question Answering: A Comprehensive Review (2021).
    • Towards Medical Machine Reading Comprehension with Structural Knowledge and Plain Text (EMNLP 2020)
    • BioMRC: A Dataset for Biomedical Machine Reading Comprehension 

     

    Thesis supervision and contact

    Dr. Tam Nguyen

  • Debunking Medical Misinformation: Root Causes, Benchmarks, and Explanations (Tam Nguyen)

    An abundance of false or misleading medical information has been observed on the Web and particularly on social media, posing a considerable threat to public health while eroding trust in healthcare systems. Six out of ten people search for the cause of their medical condition online, and among those who found a diagnosis online, 35% do not visit a professional medical provider. The COVID-19 pandemic has exacerbated this problem by bringing forward an infodemic surrounding the coronavirus that spreads as quickly, and can be as deadly, as the virus itself. For example, rumours promoting methanol as a cure for COVID-19 resulted in more than 300 deaths and more than 1,000 people falling ill. This project aims to build a framework to combat medical misinformation, with a focus on social media analytics, detection benchmarks, knowledge-intensive tasks (question answering, machine reading comprehension, knowledge graph construction), and explainable mitigation measures.

    An ideal candidate should have:

    • Good background in data analytics, especially of social media, textual, and knowledge graph data.
    • Motivation to learn about and explore the data of interest, including some manual (human-level) annotation.
    • Knowledge of machine learning, deep learning, and programming (Python or R).
    • A love of reading and exploring scientific articles, preferably every day.
    • A proactive attitude toward learning new things, preferably every day.

    Interested students are encouraged to email Dr. Tam Nguyen (tamnguyen(at)l3s(dot)de) to discuss the topic.

    References:

    • Waszak, P.M., Kasprzycka-Waszak, W. and Kubanek, A., 2018. The spread of medical fake news in social media–the pilot quantitative study. Health Policy and Technology, 7(2), pp.115-118.
    • Naeem, S.B., Bhatti, R. and Khan, A., 2020. An exploration of how fake news is taking over social media and putting public health at risk. Health Information & Libraries Journal.
    • Treharne, T. and Papanikitas, A., 2020. Defining and detecting fake news in health and medicine reporting. Journal of the Royal Society of Medicine, 113(8), pp.302-305.

     

    Thesis supervision and contact

    Tam Nguyen

  • Intelligent Process Control Using Self-Learning Stability Maps (Svenja Reimer)

    Chatter vibrations in machining lead to poor surface quality. The occurrence of chatter is closely linked to the selected process parameters (depth of cut, width of cut, spindle speed). So far, elaborate simulations have been necessary to determine the relationships between process parameters and process stability and to select optimal process parameters. By using machine learning and modern monitoring systems, the machine can instead learn these relationships autonomously during the process ("self-learning stability maps"). The aim of this thesis is to develop and implement intelligent process control based on self-learning stability maps. The work includes, among other things, the following points:

    • Development of machining strategies for targeted data generation for self-learning stability maps
    • Development of strategies for "online learning" during the process
    • Development of an objective function for the controller
    • Cutting experiments on the online adaptation of process parameters
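    To illustrate the idea (this is not the method used in the thesis), a self-learning stability map can be sketched as an online classifier over process parameters: every monitored cut adds a labeled point (stable or chatter), and the controller's objective picks the parameter set with the highest material removal rate among those predicted to be stable. The sketch assumes a k-nearest-neighbour vote over scaled (spindle speed, depth of cut) pairs, a fixed width of cut and feed per tooth (so the material removal rate is proportional to speed times depth), and hypothetical class names, scales, and observations.

```python
import math


class StabilityMap:
    """Hypothetical online stability map: k-NN vote over observed cuts."""

    def __init__(self, k=3, speed_scale=10000.0, depth_scale=5.0):
        self.k = k
        self.scales = (speed_scale, depth_scale)  # rough ranges used to scale features
        self.observations = []  # list of ((speed_rpm, depth_mm), is_stable)

    def update(self, speed_rpm, depth_mm, is_stable):
        # Online learning step: every monitored cut becomes a new training point
        self.observations.append(((speed_rpm, depth_mm), is_stable))

    def _distance(self, a, b):
        return math.hypot((a[0] - b[0]) / self.scales[0], (a[1] - b[1]) / self.scales[1])

    def predict_stable(self, speed_rpm, depth_mm):
        if not self.observations:
            return False  # conservative: unexplored parameters are treated as unstable
        query = (speed_rpm, depth_mm)
        nearest = sorted(self.observations, key=lambda obs: self._distance(query, obs[0]))
        votes = sum(1 if stable else -1 for _, stable in nearest[: self.k])
        return votes > 0


def choose_parameters(stability_map, speeds, depths):
    """Objective: maximize material removal rate (proportional to speed * depth here)."""
    candidates = [
        (n, ap) for n in speeds for ap in depths if stability_map.predict_stable(n, ap)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0] * c[1])


# Hypothetical monitoring results from four cuts
smap = StabilityMap(k=1)
smap.update(6000, 1.0, True)
smap.update(6000, 3.0, False)
smap.update(9000, 2.0, True)
smap.update(9000, 3.0, False)
best = choose_parameters(smap, speeds=[6000, 9000], depths=[1.0, 2.0, 3.0])
```

    In a real controller, the data-generation strategy would deliberately probe uncertain regions of the map, and the objective function would typically add constraints such as tool wear or spindle power.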

    PDF page

    Thesis supervision and contact

    Svenja Reimer

  • Understanding Relevance Search over Medical Knowledge Graphs (Prof. Ganguly)

    There is an exponential rise in the amount of medical evidence being produced, which makes it very difficult for medical professionals to keep up with recent research studies in order to practice evidence-based medicine. In this master's thesis, we aim to accommodate richer query variations for online biomedical literature search and to restructure the document collection used for search into a knowledge graph. In other words, we will adapt the “exemplar query” setting developed by Mottin et al. (2016, 2018) to the biomedical information retrieval domain and try to achieve state-of-the-art performance in the TREC Precision Medicine Track.

    Specifically, the student will gain first-hand experience working with unstructured text (clinical notes) and knowledge graphs in the medical domain.
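    With exemplar queries, the user supplies an example element (for instance a relevant document or entity) instead of keywords, and the system returns elements of the knowledge graph that are structurally similar to it. The sketch below is a strong simplification and not Mottin et al.'s algorithm: it represents each entity by its set of (relation, neighbour) pairs and ranks other entities by Jaccard similarity to the exemplar. The relation names and toy triples are hypothetical.

```python
from collections import defaultdict


def build_signatures(triples):
    """Map each subject entity to its set of (relation, object) pairs."""
    signatures = defaultdict(set)
    for subj, rel, obj in triples:
        signatures[subj].add((rel, obj))
    return signatures


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def exemplar_search(triples, exemplar, top_k=3):
    """Rank entities by how closely their neighbourhood matches the exemplar's."""
    signatures = build_signatures(triples)
    target = signatures.get(exemplar, set())
    scores = [
        (entity, jaccard(target, sig))
        for entity, sig in signatures.items()
        if entity != exemplar
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scores[:top_k] if pair[1] > 0]


# Hypothetical triples extracted from a biomedical literature collection
triples = [
    ("doc1", "mentions", "BRAF"), ("doc1", "mentions", "melanoma"), ("doc1", "type", "clinical_trial"),
    ("doc2", "mentions", "BRAF"), ("doc2", "mentions", "melanoma"), ("doc2", "type", "article"),
    ("doc3", "mentions", "EGFR"), ("doc3", "mentions", "lung_cancer"), ("doc3", "type", "clinical_trial"),
]
result = exemplar_search(triples, exemplar="doc1")
```

    Matching on full (relation, neighbour) pairs rewards documents about the same gene and disease, which is the kind of similarity the TREC Precision Medicine topics ask for; richer structural matching over multi-hop subgraphs is where the thesis would go beyond this baseline.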

    Prerequisites:

    1. Good knowledge of Natural Language Processing (NLP) and Graphs
    2. Strong programming background in Python and working knowledge of standard Machine Learning and NLP libraries

    References:

    1. Mottin et al. Exemplar queries: a new way of searching (VLDB 2016)
    2. Gu et al. Relevance Search over Schema-Rich Knowledge Graphs (WSDM 2019)

    Thesis supervision and contact

    Prof. Niloy Ganguly, Soumyadeep Roy

  • Neural Models for Improving First Phase Information Retrieval (Dr. Koustav Rudra)

    Document retrieval problems typically consist of two phases. The first phase tries to retrieve as many relevant documents as possible for a given query from a large collection (typically tens of millions to billions of documents), whereas the second phase ranks the retrieved documents so that highly relevant documents appear at the top of the list. The first phase has to be simple and fast because it operates on the entire document collection. Standard models such as BM25 and query likelihood (QL) rely only on tf-idf-style term matching between query and document, without considering the context of words. Recent neural ranking models have shown promising performance for the reranking task. However, there has been limited work on neural models and indexing methods that improve first-stage retrieval. The objective of this thesis is to explore recent contextual autoregressive models to improve first-stage retrieval methods.
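    For reference, the term-matching baseline mentioned above can be sketched as Okapi BM25. This minimal version assumes whitespace tokenization, the common default parameters k1 = 1.2 and b = 0.75, and the idf variant log(1 + (N - df + 0.5) / (df + 0.5)); real systems score only documents found through an inverted index instead of scanning the whole collection. The example documents are hypothetical.

```python
import math
from collections import Counter


class BM25:
    def __init__(self, documents, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(doc.lower().split()) for doc in documents]  # term frequencies
        self.lengths = [sum(tf.values()) for tf in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        self.df = Counter(term for tf in self.docs for term in tf)  # document frequencies
        self.n = len(self.docs)

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, index):
        tf, length = self.docs[index], self.lengths[index]
        total = 0.0
        for term in query.lower().split():
            f = tf.get(term, 0)
            if f == 0:
                continue  # exact term matching: absent query terms contribute nothing
            norm = f + self.k1 * (1 - self.b + self.b * length / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / norm
        return total

    def search(self, query, k=10):
        scores = [(i, self.score(query, i)) for i in range(self.n)]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


docs = [
    "neural ranking models for search",
    "bm25 is a term matching model",
    "cooking pasta at home",
]
bm25 = BM25(docs)
top = bm25.search("term matching", k=1)
```

    The `continue` branch is exactly the limitation described above: a document about "retrieval" gets no credit for the query "search", which is what semantic first-stage models aim to fix.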

    Problem Statement

    Given a large collection of items (documents or passages), the aim is twofold:

    • construct a neural retrieval model that exploits attention-based semantic term matching for effective recall@k
    • explore how to make inference efficient for this family of models using ideas from k-nearest-neighbour (KNN) search and locality-sensitive hashing (LSH).
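    To make the efficiency side concrete: a two-tower (dual-encoder) model embeds queries and documents independently, so document vectors can be indexed offline and retrieval becomes a nearest-neighbour search. The sketch below assumes random-hyperplane LSH for cosine similarity in plain Python; the embeddings are random placeholders standing in for the output of a neural encoder, and the class and parameter names are hypothetical.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class HyperplaneLSH:
    """Random-hyperplane LSH: vectors with high cosine similarity tend to share a bucket."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = {}
        self.vectors = []

    def _key(self, vector):
        # One bit per hyperplane: which side of the plane the vector falls on
        return tuple(dot(vector, plane) > 0 for plane in self.planes)

    def index(self, vectors):
        for vector in vectors:
            doc_id = len(self.vectors)
            unit = normalize(vector)
            self.vectors.append(unit)
            self.buckets.setdefault(self._key(unit), []).append(doc_id)

    def query(self, vector, k=5):
        q = normalize(vector)
        candidates = self.buckets.get(self._key(q), [])
        # Exact cosine re-scoring, restricted to the candidate bucket
        return sorted(candidates, key=lambda i: dot(self.vectors[i], q), reverse=True)[:k]


# Placeholder "embeddings"; in the thesis these would come from a neural encoder
rng = random.Random(42)
doc_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(1000)]
lsh = HyperplaneLSH(dim=64, n_bits=8)
lsh.index(doc_embeddings)
hits = lsh.query(doc_embeddings[42], k=3)
```

    A single hash table trades recall for speed: near neighbours that land in a different bucket are missed. Using several tables with independent hyperplanes raises recall at a higher cost, which is the accuracy/efficiency trade-off this thesis has to navigate.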

     

    Challenges

    We face two key challenges in improving first-stage retrieval:

    1) Measuring recall is difficult for large datasets: In existing document retrieval tasks (TREC), recall is typically measured by pooling the results of multiple retrieval models, followed by human judgement. But since all the pooled models follow similar retrieval methods, relevant documents not surfaced by these initial retrieval methods are never judged. This is a hard problem because re-annotation is not possible in such a missing-data setting.

    2) Efficiency constraints are strong: Since most retrieval methods based on term matching and KNN search are expected to be fast, any added model complexity results in unacceptable inference times. For example, deeper layers are almost always expensive. Such constraints force us to consider alternative design decisions that trade off accuracy for speed.

    We will try to handle these challenges in two ways: 1. by considering a small dataset, and 2. by relying on the premise that a better retrieval model should improve the performance (MAP, nDCG) of the second-stage reranking task.
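    The pooling problem from the first challenge can be made concrete with a small sketch: judgements exist only for the union of the top-d documents of the participating runs, so the recall of a new run is computed against the judged relevant documents, and a relevant document that no pooled run retrieved is invisible to the evaluation. All run names, document IDs, and judgements below are hypothetical.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents of every participating run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def recall_at_k(ranking, judged_relevant, k):
    """Recall@k measured against the relevant documents that were actually judged."""
    if not judged_relevant:
        return 0.0
    return len(set(ranking[:k]) & judged_relevant) / len(judged_relevant)


def unjudged_fraction(ranking, pool, k):
    """Share of a run's top-k documents that never entered the judgement pool."""
    top = ranking[:k]
    return sum(doc not in pool for doc in top) / len(top) if top else 0.0


# Hypothetical runs and judgements for a single query
runs = {"bm25": ["d1", "d2", "d3", "d4"], "ql": ["d2", "d1", "d5", "d6"]}
pool = build_pool(runs, depth=2)  # only d1 and d2 get judged
judged_relevant = {"d2"}  # assessors marked d2 relevant
# Suppose d9 is also relevant, but no pooled run retrieved it, so it was never judged
new_run = ["d9", "d2", "d7"]
old_recall = recall_at_k(runs["bm25"], judged_relevant, k=3)  # 1.0
new_recall = recall_at_k(new_run, judged_relevant, k=3)  # also 1.0: finding d9 earns no credit
unjudged = unjudged_fraction(new_run, pool, k=3)  # 2 of the 3 documents were never judged
```

    Reporting the unjudged fraction alongside recall is one simple way to flag when a new first-stage model moves outside the judged pool.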

    An ideal candidate should have:

    • Strong background in Python and deep learning
    • Motivation to learn about and explore the data
    • Knowledge of basic IR concepts

     

    Interested students are encouraged to email Dr. Koustav Rudra at rudra(at)l3s(dot)de to schedule a meeting.

    References:

    1. Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval [https://arxiv.org/abs/1910.10687]
    2. Two-tower model (ICLR 2020) [openreview.net/forum]
    3. Efficient Training on Very Large Corpora via Gramian Estimation (ICLR 2019) [https://arxiv.org/pdf/1807.07187.pdf]
    4. StarSpace: Embed All The Things! (AAAI 2018) [https://arxiv.org/pdf/1709.03856.pdf]

     

    Thesis supervision and contact

    Dr. Koustav Rudra

  • Identifying Sub-Topics and Generating a Fair Summary of Migration-Related News Articles (Dr. Koustav Rudra)

    Thousands of news articles are produced and consumed every day. These articles are scattered across a diverse set of topics such as business, economy, international relations, migration, etc. Migration is an issue currently faced by many countries across the globe, and different news outlets cover various migration-related activities. A close inspection reveals that migration-related news is composed of different sub-topics such as human trafficking, immigration, and organized crime. Identifying these sub-topics and summarizing them is an essential first step towards a coherent real-time view of the situation. The objective of this project is twofold:

    1. Identifying migration-related sub-topics and classifying news sources into these categories.
    2. Generating a report/summary of the scenario by combining information from multiple sources, with the goal of representing each sub-topic fairly.
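    One simple way to operationalize "representing each sub-topic fairly" is to fix a per-sub-topic quota (here proportional to each sub-topic's share of the input sentences, allocated with the largest-remainder method) and fill it greedily with the highest-scoring sentences. This is a hypothetical baseline, not the method of the referenced papers; the sentence scores are assumed to come from some importance model and the sub-topic labels from the classifier in step 1.

```python
from collections import Counter, defaultdict


def proportional_quotas(labels, budget):
    """Largest-remainder allocation of `budget` summary slots across sub-topics."""
    counts = Counter(labels)
    total = sum(counts.values())
    exact = {t: budget * c / total for t, c in counts.items()}
    quotas = {t: int(x) for t, x in exact.items()}
    leftover = budget - sum(quotas.values())
    # Give the remaining slots to sub-topics with the largest fractional remainders
    for t in sorted(exact, key=lambda t: exact[t] - quotas[t], reverse=True)[:leftover]:
        quotas[t] += 1
    return quotas


def fair_summary(sentences, budget):
    """sentences: list of (text, subtopic, score); returns the selected texts."""
    quotas = proportional_quotas([s[1] for s in sentences], budget)
    by_topic = defaultdict(list)
    for text, topic, score in sentences:
        by_topic[topic].append((score, text))
    summary = []
    for topic, quota in quotas.items():
        best = sorted(by_topic[topic], reverse=True)[:quota]
        summary.extend(text for _, text in best)
    return summary


# Hypothetical scored sentences with sub-topic labels
sentences = [
    ("s1", "trafficking", 0.9), ("s2", "trafficking", 0.8),
    ("s3", "trafficking", 0.7), ("s4", "trafficking", 0.6),
    ("s5", "crime", 0.95), ("s6", "crime", 0.5),
    ("s7", "immigration", 0.4), ("s8", "immigration", 0.3),
]
summary = fair_summary(sentences, budget=4)
```

    A purely score-based top-4 here would be s5, s1, s2, s3 and would drop the immigration sub-topic entirely; the quota guarantees that it is represented.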

     

    An ideal candidate should have:

    • Strong background in Python and deep learning
    • Motivation to learn about and explore the data
    • Interest in some manual (human-level) annotation
    • Knowledge of basic NLP concepts

     

    Interested students are encouraged to email Dr. Koustav Rudra at rudra(at)l3s(dot)de to schedule a meeting.

    References:

    1. Summarizing User-generated Textual Content: Motivation and Methods for Fairness in Algorithmic Summaries [https://arxiv.org/abs/1810.09147]
    2. Fair and Diverse DPP-based Data Summarization [https://arxiv.org/abs/1802.04023]

     

    Thesis supervision and contact

    Dr. Koustav Rudra

  • Neural Ranking Models for Search (Dr. Koustav Rudra)

    Information retrieval typically consists of two phases. The first phase tries to retrieve as many relevant documents as possible for a given query, whereas the second phase ranks the retrieved documents so that highly relevant documents appear at the top of the list. In this project, we try to develop neural ranking models that comply well with IR axioms.
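    Compliance with IR axioms is usually checked with diagnostic document pairs that differ in exactly one property. As a sketch, the TFC1 axiom (all else being equal, including document length, a document with more occurrences of the query term should score higher) can be tested for any scoring function as follows. The `score_fn(query, document)` interface and the two toy scorers are hypothetical stand-ins for a neural ranker.

```python
def tfc1_pair(query_term, document):
    """Build a same-length variant of `document` with one extra occurrence of `query_term`."""
    tokens = document.split()
    for i, token in enumerate(tokens):
        if token != query_term:
            # Replace one non-query token so both documents keep the same length
            return " ".join(tokens[:i] + [query_term] + tokens[i + 1 :])
    return None  # the document consists only of the query term


def satisfies_tfc1(score_fn, query_term, documents):
    """Fraction of diagnostic pairs on which score_fn prefers the higher-tf variant."""
    results = []
    for doc in documents:
        variant = tfc1_pair(query_term, doc)
        if variant is not None:
            results.append(score_fn(query_term, variant) > score_fn(query_term, doc))
    return sum(results) / len(results) if results else float("nan")


def tf_score(query, document):
    return document.split().count(query)  # toy scorer that satisfies TFC1 by construction


def constant_score(query, document):
    return 0.0  # toy scorer that ignores term frequency


docs = ["neural models rank documents", "query term appears once here query"]
tf_agreement = satisfies_tfc1(tf_score, "query", docs)
constant_agreement = satisfies_tfc1(constant_score, "query", docs)
```

    Applied to a neural ranker, the agreement rate over many generated pairs indicates how consistently the model honours the axiom; similar generators can be written for other axioms such as length normalization.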

    An ideal candidate should have:

    • Strong background in Python and deep learning
    • Motivation to learn about and explore the data
    • Knowledge of basic IR concepts

     

    Interested students are encouraged to email Dr. Koustav Rudra at rudra(at)l3s(dot)de to schedule a meeting.

    References:

    1. Deeper Text Understanding for IR with Contextual Neural Language Modeling [https://dl.acm.org/doi/10.1145/3331184.3331303]
    2. CEDR: Contextualized Embeddings for Document Ranking [https://arxiv.org/abs/1904.07094]
    3. Diagnosing BERT with Retrieval Heuristics [http://www.abcamara.com/documents/publications/ECIR_2020.pdf]

     

    Thesis supervision and contact

    Dr. Koustav Rudra

  • Fairness-aware Online Learning under Class Imbalance (Dr. Vasileios Iosifidis, Prof. Dr. Wolfgang Nejdl)

    Fairness-aware online learning has become an evolving field over the last few years. Its goal is to maintain a classifier that performs well and does not discriminate over the course of the stream. Some initial works have been proposed to tackle discriminatory outcomes in online classification [1, 2]; however, these methods do not take into consideration the uneven class distribution over the course of the stream. If the imbalance problem is not tackled, the learner mainly learns the majority class and strongly misclassifies/rejects the minority class. Such methods might appear to be fair under fairness definitions that rely on parity in the predictions between the protected and non-protected groups. In reality, though, the low discrimination scores are just an artifact of the low prediction rates for the minority class.

    In this master's thesis, we want to investigate the combined problem of class imbalance and fairness-aware learning in the online setting. We focus on the Naive Bayes classifier, which has been extensively studied in the context of fairness, but only in the static setting. In this work, we plan to extend these models to the online setting, taking into account the imbalance of the population under different fairness notions such as statistical parity [3], equal opportunity [4], and equalized odds [4].
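    The artifact described above is easy to reproduce. The sketch below computes, for binary predictions, the statistical parity difference, the equal opportunity difference (gap in true positive rates), and an equalized odds gap (summarized here as the larger of the TPR and FPR gaps, one common choice), together with recall on the minority positive class. The function name, sign convention (non-protected minus protected), and the toy labels are hypothetical.

```python
def rate(values):
    return sum(values) / len(values) if values else 0.0


def fairness_report(y_true, y_pred, protected):
    """Group fairness measures for binary labels; protected[i] is True for the protected group."""
    groups = {g: [i for i in range(len(y_true)) if protected[i] == g] for g in (True, False)}

    def positive_rate(idx):
        return rate([y_pred[i] for i in idx])

    def tpr(idx):
        return rate([y_pred[i] for i in idx if y_true[i] == 1])

    def fpr(idx):
        return rate([y_pred[i] for i in idx if y_true[i] == 0])

    prot, non = groups[True], groups[False]
    tpr_gap = tpr(non) - tpr(prot)
    fpr_gap = fpr(non) - fpr(prot)
    return {
        "statistical_parity": positive_rate(non) - positive_rate(prot),
        "equal_opportunity": tpr_gap,
        "equalized_odds": max(abs(tpr_gap), abs(fpr_gap)),
        "minority_class_recall": tpr(prot + non),
    }


# Hypothetical imbalanced stream window: 2 positives out of 10 instances
y_true = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
protected = [True] * 5 + [False] * 5
always_negative = fairness_report(y_true, [0] * 10, protected)
one_sided = fairness_report(y_true, [0, 0, 0, 0, 0, 1, 0, 0, 0, 0], protected)
```

    The always-negative classifier scores a perfect 0 on every parity measure while its minority-class recall is 0, which is exactly why imbalance has to be handled jointly with fairness in the stream.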

    An ideal candidate should be:

    • a self-motivated and independent learner
    • knowledgeable about machine learning (good grades in Data Mining I and Data Mining II)
    • experienced with Python or Java

     

    Interested students are encouraged to email Wolfgang Nejdl and/or Vasileios Iosifidis to schedule an appointment. A CV and transcript of records must be sent beforehand.

    References

    1. V. Iosifidis, H. Tran, E. Ntoutsi. Fairness-enhancing interventions in stream classification. 30th International Conference on Databases and Expert Systems Applications (DEXA), 2019.
    2. W. Zhang, E. Ntoutsi. An Adaptive Fairness-aware Decision Tree Classifier. International Joint Conference on Artificial Intelligence (IJCAI), 2019.
    3. F. Kamiran, T. Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1), pp. 1-33, 2012.
    4. M. Hardt, E. Price, N. Srebro. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, pp. 3315-3323, 2016.

     

     

    Thesis supervision and contact

    M.Sc. Vasileios Iosifidis, Prof. Dr. techn. Wolfgang Nejdl

  • Question Answering Using Deep Learning (Prof. Anand)

    Thesis supervision and contact

    Prof. Dr. Avishek Anand

  • Faster Inference for Deep Neural Rankers (Prof. Anand)

    Thesis supervision and contact

    Prof. Dr. Avishek Anand

  • Interpretability of Neural Models (Prof. Anand)

    Thesis supervision and contact

    Prof. Dr. Avishek Anand

  • Neural Information Retrieval (Prof. Anand)

    Thesis supervision and contact

    Prof. Dr. Avishek Anand

  • Data Analytics and Mobility (Dr. Elena Demidova, M.Sc. Nicolas Tempelmeier)

    Cities of the future have a growing demand for intelligent mobility services and infrastructure to support better mobility and enhance quality of life in urban areas. The increasing availability of urban data holds great potential to facilitate efficient mobility services and infrastructure, for instance through a better understanding of long-term trends (such as e-mobility) and their impact on transportation needs, or of the correlation of mobility behavior in densely populated areas with influencing factors such as weather, regional events, or temporal fluctuations. The extraction, integration, and analysis of heterogeneous mobility-related urban data are of interest to various stakeholder groups, including city inhabitants, city councils, providers of mobility services, and public transportation operators. While the data is spread across heterogeneous institutional repositories and Web platforms, semantic technologies and machine learning methods can be exploited to enable its extraction and analysis.

    Examples of such data are:

    • Queries for public transportation services
    • Historical traffic speed records
    • Environmental data
    • Warnings about traffic incidents
    • Location of construction sites
    • Calendar of scheduled events

     

    Possible topics for an MSc thesis in this context could address one of the following research questions using state-of-the-art machine learning and data mining methods:

    • Prediction of road traffic from public transportation query logs
    • Estimating the impact of events
    • Impact of traffic incidents
    • Verification of traffic incidents
    • Prediction of congestion and bottlenecks in traffic
    • Event detection

     

    For further information, see the homepage of the project “Data4UrbanMobility”: http://d4um.l3s.uni-hannover.de

    Thesis supervision and contact

    Dr. Elena Demidova, M.Sc. Nicolas Tempelmeier

  • Modelling and predicting the knowledge state of students (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth

  • Search as learning (SaL): Investigating the impact of images and videos in SaL scenarios (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth

  • Automatic highlighting of important segments in educational videos (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth

  • Automatic question generation for (educational) videos (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth

  • Using knowledge graphs for video question answering (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth

  • Automatic captioning for scholarly figures (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth

  • Analysing and Linking Graphical Representations in Computer Science Publications to their Implementation (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth

  • Linking Formulas in Scientific Publications with their Software Implementation (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth

  • Information extraction from scientific (textual) publications for knowledge graph enrichment (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth

  • Improving OCR for recognizing text in scientific videos (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth

  • Semi-automatic enrichment of scientific videos with external recommendations (Prof. Ewerth)

    Thesis supervision and contact

    Prof. Dr. Ralph Ewerth