Data labeling, as a fundamental task in supervised machine
learning, refers to the annotation of data with representative labels. In
contrast to active learning (AL), interactive labeling relies on users’
knowledge and pattern identification ability to select meaningful instances to
label [1]. Previous work shows that this user-centered instance selection
method can relieve the cold-start problem and the query strategy definition
issue in AL. It can also speed up the labeling process by letting users
identify similar instances and label them at once. Different user strategies
may be defined to describe how users identify instances to label based on
visualization results [2].
The goal of this project is to predict user strategies and recommend instances to label, thereby facilitating the interactive labeling process. This requires a visual interactive labeling interface, for example the one shown in the figure. Algorithms or machine learning models can be used to learn and predict user selections [3][4]. We would also like to conduct a follow-up evaluation of the final solution. We therefore expect you to have programming fundamentals and basic knowledge of machine learning and visualization.
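To give a feel for the recommendation step, here is a minimal sketch that suggests unlabeled instances close to the one a user has just selected in a 2D scatterplot, so that similar instances can be labeled together. It is only an illustration under stated assumptions: the function name `recommend_similar`, the toy coordinates, and the choice of Euclidean distance in the projected space are hypothetical and are not taken from the cited work. The actual project may use learned models instead.

```python
import math


def recommend_similar(points, labeled, selected, k=3):
    """Return indices of the k unlabeled points nearest to the selected point.

    points   -- list of (x, y) coordinates in the 2D projection (assumed input)
    labeled  -- set of indices that already have a label
    selected -- index of the instance the user just picked
    """
    sx, sy = points[selected]
    candidates = [
        (math.hypot(x - sx, y - sy), i)
        for i, (x, y) in enumerate(points)
        if i != selected and i not in labeled
    ]
    candidates.sort()  # nearest first; ties broken by index
    return [i for _, i in candidates[:k]]


# Toy data (hypothetical): two visually separate groups in the scatterplot.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (0.3, 0.3)]
labeled = {2}  # instance 2 already has a label
suggestions = recommend_similar(points, labeled, selected=0, k=2)
print(suggestions)
```

A distance-based baseline like this assumes that the user prefers nearby, similar instances. Predicting other user strategies would require replacing the distance score with a model fitted to logged selections.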
References
[1] J. Bernard, M. Hutter, M. Zeppelzauer, D. Fellner, and M. Sedlmair, "Comparing Visual-Interactive Labeling with Active Learning: An Experimental Study," IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 1, pp. 298-308, Jan. 2018, doi: 10.1109/TVCG.2017.2744818.
[2] J. Bernard, M. Zeppelzauer, M. Lehmann, M. Müller, and M. Sedlmair, "Towards User-Centered Active Learning Algorithms," Computer Graphics Forum, vol. 37, pp. 121-132, 2018, doi: 10.1111/cgf.13406.
[3] C. Fan and H. Hauser, "Fast and Accurate CNN-based Brushing in Scatterplots," Computer Graphics Forum, vol. 37, pp. 111-120, 2018, doi: 10.1111/cgf.13405.
[4] K. Gadhave, J. Görtler, Z. Cutler, et al., "Predicting intent behind selections in scatterplot visualizations," Information Visualization, vol. 20, no. 4, pp. 207-228, 2021, doi: 10.1177/14738716211038604.