Project: Do Large Language Models Understand Visualizations? Evaluating Visual Understanding vs. Textual Memorization in Visualization Annotation
Description
Problem Description
Large Language Models (LLMs) are increasingly used to generate natural‑language annotations for data visualizations such as line charts, scatterplots, and bar plots. These annotations can describe trends, highlight anomalies, or summarize relationships in the data. However, it remains unclear whether LLMs produce these annotations by genuinely interpreting the visual content or by exploiting memorized textual patterns, dataset artifacts, or statistical priors unrelated to the actual visualization.
This project investigates the extent to which LLM‑generated annotations reflect true visual understanding. Specifically, it examines whether LLMs rely on the rendered visual content or instead infer annotations from textual metadata, axis labels, or common dataset structures. Understanding this distinction is crucial for evaluating the reliability of LLM‑assisted visualization tools and for designing systems that support trustworthy data analysis.
References
- Ahmed Masry, Do Xuan Long, Jia Qing Tan, Shafiq Joty, and Enamul Hoque. 2022. ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2263–2279, Dublin, Ireland. Association for Computational Linguistics.
- Pooyan Rahmanzadehgervi, Logan Bolton, Mohammad Reza Taesiri, and Anh Nguyen. 2024. Vision language models are blind. arXiv preprint arXiv:2407.06581. https://doi.org/10.48550/arXiv.2407.06581
Details
- Supervisor: Fernando Paulovich