Classifying audio with artificial intelligence can be done using TensorFlow's YAMNet model.
YAMNet is a deep neural network that predicts 521 audio event classes. It was trained on the AudioSet dataset from Google/YouTube.
Paradoxically, sound classification is performed through image analysis. Sound has at least three kinds of visual representation (a short sketch computing each follows the list):
- The Waveform
- The Fourier Transform
- The Mel Spectrogram
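As a rough illustration, here is a minimal sketch of how each representation can be computed with NumPy and librosa. The file path and the 16 kHz sample rate are placeholder assumptions, not values required by the article:

```python
import numpy as np
import librosa

# Load an audio file as a mono waveform (the path and sample rate are illustrative)
waveform, sample_rate = librosa.load('example.wav', sr=16000, mono=True)

# 1. The waveform: raw amplitude over time
print(waveform.shape)  # (num_samples,)

# 2. The Fourier transform: magnitude of the frequency components
spectrum = np.abs(np.fft.rfft(waveform))

# 3. The mel spectrogram: short-time spectra mapped onto the mel scale
mel = librosa.feature.melspectrogram(y=waveform, sr=sample_rate)
mel_db = librosa.power_to_db(mel)  # log scale, the image-like input models typically use
print(mel_db.shape)  # (n_mels, num_frames)
```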
YAMNet uses TensorFlow to run deep-learning models on Mel Spectrogram representations, performing pattern recognition on those images; this is what makes it suitable for audio classification.
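A minimal sketch of classifying a waveform with the YAMNet model published on TensorFlow Hub follows; the synthetic 440 Hz tone stands in for real audio, and the model is assumed to be the standard `google/yamnet/1` release, which expects mono 16 kHz float32 audio:

```python
import csv
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained YAMNet model from TensorFlow Hub
model = hub.load('https://tfhub.dev/google/yamnet/1')

# Read the 521 class names from the CSV bundled with the model
class_map_path = model.class_map_path().numpy().decode('utf-8')
with tf.io.gfile.GFile(class_map_path) as f:
    class_names = [row['display_name'] for row in csv.DictReader(f)]

# One second of a 440 Hz sine tone, mono, 16 kHz, float32 in [-1.0, 1.0]
waveform = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)

# The model returns per-frame scores, embeddings, and the log mel spectrogram it computed
scores, embeddings, log_mel_spectrogram = model(waveform)

# Average the scores over time and pick the most likely class
mean_scores = scores.numpy().mean(axis=0)
print('Predicted class:', class_names[mean_scores.argmax()])
```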
Going further: