Machine Learning, and in general Artificial Intelligence approaches, brought a great advance in each and every field of Computer Science increasing accuracy levels of predictors in any known problem. Indeed, this evolution enabled the construction of effective frameworks and solutions able to be used in investigative and forensics scenarios for detection of fakes and, in general, manipulations in multimedia contents. On the other hand, can we trust these systems? Is research activity going in the right direction? Are we just taking the low-hanging fruit without taking into account many real-case-in-the-wild situations? The purpose of this paper is to raise an alert to the research community in the specific context of synthetic voice detection, where data available for training is not big enough to give sufficient trust in the techniques available in the literature. To this aim, an exploratory investigation of the most common voice spoofing dataset was carried out and it was surprisingly easy to build simple classifiers without any Deep Learning techniques. Simple considerations on bitrate were sufficient to achieve an effective detection performance.
Is synthetic voice detection research going into the right direction?
Giudice, O;Stanco, F;Allegra, D
2022-01-01
Abstract
Machine Learning, and in general Artificial Intelligence approaches, brought a great advance in each and every field of Computer Science increasing accuracy levels of predictors in any known problem. Indeed, this evolution enabled the construction of effective frameworks and solutions able to be used in investigative and forensics scenarios for detection of fakes and, in general, manipulations in multimedia contents. On the other hand, can we trust these systems? Is research activity going in the right direction? Are we just taking the low-hanging fruit without taking into account many real-case-in-the-wild situations? The purpose of this paper is to raise an alert to the research community in the specific context of synthetic voice detection, where data available for training is not big enough to give sufficient trust in the techniques available in the literature. To this aim, an exploratory investigation of the most common voice spoofing dataset was carried out and it was surprisingly easy to build simple classifiers without any Deep Learning techniques. Simple considerations on bitrate were sufficient to achieve an effective detection performance.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.