- Authors
- Chegini, Mohammad
- Bernard, Jürgen
- Cui, Jian
- Chegini, Fatemeh
- Sourin, Alexei
- Andrews, Keith
- Schreck, Tobias
- Title: Interactive Visual Labelling versus Active Learning: An Experimental Comparison
- DOI: 10.1631/FITEE.1900549
- Published in: Frontiers of information technology & electronic engineering
- Volume: 21
- Year of publication: 2020
- Issue: 4
- Pages: 524-535
- Licence: CC BY
- ISSN: 2095-9230
- Download statistics: 1737
- Peer reviewed: Yes
- Abstract: Methods from supervised machine learning allow the classification of new data automatically and are tremendously helpful for data analysis. The quality of supervised machine learning depends not only on the type of algorithm used, but also on the quality of the labelled dataset used to train the classifier. Labelling instances in a training dataset is often done manually, relying on selections and annotations by expert analysts, and is often a tedious and time-consuming process. Active learning algorithms can automatically determine a subset of data instances for which labels would provide useful input to the learning process. Interactive visual labelling techniques are a promising alternative, providing effective visual overviews from which an analyst can simultaneously explore data records and select items to label. By putting the analyst in the loop, higher accuracy can be achieved in the resulting classifier. While initial results of interactive visual labelling techniques are promising in the sense that user labelling can improve supervised learning, many aspects of these techniques are still largely unexplored. This paper presents a study conducted using the mVis tool to compare three interactive visualisations, similarity map, scatterplot matrix (SPLOM), and parallel coordinates, with each other and with active learning for the purpose of labelling a multivariate dataset. The results show that all three interactive visual labelling techniques surpass active learning algorithms in terms of classifier accuracy, and that users subjectively prefer the similarity map over SPLOM and parallel coordinates for labelling. Users also employ different labelling strategies depending on the visualisation used.