Dimensionality reduction (DR) techniques such as UMAP and t-SNE are widely used to project high-dimensional datasets into two-dimensional visualizations. These projections help analysts visually identify clusters, trends, and outliers in complex datasets. However, interpreting why specific groups appear in the projection remains difficult. While analysts can visually recognize patterns, it is often unclear which underlying data attributes are responsible for the observed structures. This lack of interpretability limits the usefulness of DR techniques in exploratory analysis, especially in domains where explainability is important.
Recent work in explainable AI and interactive visualization has shown that combining machine learning models with visual interaction can help users better understand complex analytical results. Decision trees, in particular, provide human-readable rules that describe differences between classes and are therefore promising candidates for explaining selections made within DR projections.
This project investigates how interactive decision trees can support the interpretation of dimensionality reduction results. The core idea is that a user interactively selects a group of points within a DR visualization (e.g., a cluster in a UMAP projection). The selected and non-selected points are then treated as two separate classes, and a decision tree classifier is trained on the underlying data attributes to distinguish between them.
The project focuses on designing and implementing an interactive visual analytics prototype that tightly integrates DR visualizations with decision tree explanations.
Important research questions include:
Possible extensions include comparing different explanation strategies, supporting iterative refinement of selections, or exploring how stable the explanations remain under varying DR parameters.
The project is expected to last 6 months. At the end, the student should deliver a report describing the work performed, the methodology used, and the resulting findings. The results are expected to contribute to a scientific journal publication.
Requirements:
Stef van den Elzen