
Large scale ground truth generation for performance evaluation of computer vision methods / DI SALVO, Roberto. - (2013 Dec 10).

Large scale ground truth generation for performance evaluation of computer vision methods

DI SALVO, ROBERTO
2013-12-10

Abstract

This thesis proposes a set of novel video annotation methods for the performance evaluation of object detection, tracking, and recognition applications. Large-scale labeled datasets are of key importance for the development of automatic video analysis tools: on the one hand, they allow the training of multi-class classifiers; on the other hand, they support the evaluation of algorithms. This is widely recognized by the multimedia and computer vision communities, as witnessed by the growing number of available datasets; nevertheless, research still lacks usable and effective annotation tools, since a great deal of human effort is needed to generate high-quality ground truth data. Moreover, it is not feasible to collect large video ground truths, covering as many scenarios and object categories as possible, through the effort of isolated research groups alone. For these reasons, this thesis first presents a semi-automatic stand-alone tool for gathering ground truth data. It aims to improve the user experience by providing editing shortcuts, such as hotkeys and drag-and-drop, and by integrating computer vision algorithms that make the process largely automatic, requiring only little intervention from end users. In the same context, we also present a collaborative web-based platform for video ground truthing that integrates the stand-alone tools and provides an easy, intuitive user interface for plain video annotation and for instant sharing and integration of the generated ground truths, in order not only to alleviate much of the effort and time needed, but also to increase the quality of the generated annotations. These tools are specifically designed to help users collect annotations through simple interfaces that considerably facilitate their work, and they integrate novel methods for quality control; nevertheless, manual annotation remains a burdensome task in terms of the attention and time required to obtain good records. To motivate users and relieve them of this tiresome task, we devised strategies to create annotations automatically by processing data from the crowd. To this end, we first developed an online game to collect large amounts of noisy data. Exploiting this information, we then propose data-driven approaches, mainly based on image segmentation and statistical methods, which allow us to obtain reliable video annotations from the low-quality, noisy data gathered quickly and easily through the game. We also demonstrate that the quality of the obtained annotations increases as more users play the game, making it an effective and valid application for collecting consistent ground truth data.
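To make the statistical fusion step more concrete, the following minimal Python sketch shows one way noisy bounding boxes collected from several players could be merged into a single consensus annotation. The function names, coordinate convention, and overlap threshold are assumptions made for illustration only; they do not reproduce the thesis's actual algorithms.

# Illustrative sketch only: fuse noisy bounding boxes from several players
# into one consensus box. Boxes are (x1, y1, x2, y2); the min_iou threshold
# is a hypothetical value chosen for the example.
from statistics import median

def iou(a, b):
    # Intersection-over-union of two boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def consensus_box(boxes, min_iou=0.3):
    # Per-coordinate median of the boxes, after discarding boxes that
    # overlap too little with a provisional median box (outlier rejection).
    provisional = tuple(median(b[i] for b in boxes) for i in range(4))
    inliers = [b for b in boxes if iou(b, provisional) >= min_iou] or boxes
    return tuple(median(b[i] for b in inliers) for i in range(4))

# Example: three noisy player annotations of the same object; the third
# box overlaps the others poorly and is rejected as an outlier.
player_boxes = [(10, 12, 110, 98), (14, 10, 108, 102), (40, 60, 90, 95)]
print(consensus_box(player_boxes))

A robust statistic such as the median is a natural choice here because a few careless or adversarial players should not be able to drag the consensus annotation far from the true object boundary.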
Ground truth, performance evaluation, crowdsourcing
Files in this item:

TesiDottorato_VERSIONE_FINALE.pdf
Open access
License: PUBLIC - Public with Copyright
Size: 2.91 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/585461