Project: Visual Analytics for the Analysis of Pangenomes

Description

Master's project with Astrid van den Brandt.

Studying the genetic variation underlying phenotypes is an important topic in genomics. In plant genomic research, for example, scientists analyze the variation between cultivars and wild types to develop crops with improved disease resistance. This analysis is commonly based on comparison to a single reference genome. Because the number of sequenced genomes is growing rapidly, and to avoid bias towards a single reference genome, the field is shifting towards the use of pangenomes, i.e., abstract representations of multiple genomes of a species or population. While pangenomes give a more complete picture of the genetic variation, their large size and complex data structure hinder analysis.

To deal with this, genome scientists need visual analytics tools that support interactive and exploratory analysis of pangenomes, so they can identify information relevant to variation analysis. A major challenge is handling multiple references while providing adequate context from heterogeneous (meta)data, such as annotations, evolutionary relationships, and phenotypes. To address this challenge, we aim to create new visual analytics approaches that support various comparative analysis tasks (small-scale variation; structural variation, such as synteny and presence-absence variation; and intra-genomic variation) in large sets of crop genomes and their plant pathogens.

So far, we have developed a design for analyzing small-scale variation within a single gene across 100 to 200 genomes. Potential challenges for future work include:

  1. Scaling up to larger pangenomes (> 200 genome sequences), using:
    • Tailored visual encodings and interaction techniques
    • Computational/algorithmic support
  2. Scaling up from genes to larger regions
  3. Displaying multiple regions at the same time
  4. Statistical calculations to assist pattern search and downstream analysis
  5. Tree comparison and interactive tree exploration

Besides the current work on small-scale variation, the other variation analysis tasks offer various, more open-ended challenges that could be addressed.

Please reach out to me if you are interested in working in this area or want to know more.

Details
Supervisor
Astrid van den Brandt