Article·aaas.blog
audio · classification · sound-event-detection · environmental-audio · anomaly-detection · machine-learning · pytorch · transformers · zero-shot

Audio Classification

Learn to classify audio with machine learning: mel-spectrogram feature extraction, fine-tuning an Audio Spectrogram Transformer (AST), and zero-shot classification with CLAP.

intermediate · 2-3 hours · 3 steps
The play
  1. Mel-Spectrogram Feature Extraction
    Extract mel-spectrogram features from audio files using Librosa. Visualize the spectrogram and understand its parameters.
  2. Training with the Audio Spectrogram Transformer (AST)
    Train an Audio Spectrogram Transformer (AST) model for audio classification using PyTorch. Prepare your dataset and fine-tune the pre-trained AST model.
  3. Zero-Shot Audio Classification with CLAP
    Perform zero-shot audio classification using the CLAP model. Encode audio and text descriptions, then calculate similarity scores to classify audio without training.
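The scoring part of step 3 can be sketched without loading a model. The example below is a minimal illustration, not CLAP's actual API: the embeddings are random stand-ins for what CLAP's audio and text encoders would produce, the label strings are hypothetical, and the temperature value of 0.07 is an assumption chosen for illustration. It shows only how cosine similarity between one audio embedding and several text embeddings is turned into class probabilities.

```python
import numpy as np


def zero_shot_scores(audio_emb, text_embs, temperature=0.07):
    """Return cosine similarities and softmax probabilities for N text labels."""
    # L2-normalise so the dot product equals cosine similarity.
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ a  # shape (N,): one similarity per candidate label

    # Temperature-scaled softmax (temperature is an assumed value).
    logits = sims / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return sims, probs / probs.sum()


# Hypothetical labels; in practice each would be encoded by the text encoder,
# typically as a prompt such as "the sound of a siren".
labels = ["dog barking", "siren", "rain"]

# Stand-in embeddings: random vectors instead of real encoder outputs.
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(len(labels), 16))
# Fake an audio clip whose embedding lies close to the "siren" text embedding.
audio_emb = text_embs[1] + 0.1 * rng.normal(size=16)

sims, probs = zero_shot_scores(audio_emb, text_embs)
predicted = labels[int(np.argmax(probs))]
print(predicted)
```

With a real model, only the two embedding lines change: the audio clip and the label prompts are passed through the pre-trained encoders, and the scoring function stays the same. No training is involved, which is what makes the approach zero-shot.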
Starter code
Start by extracting mel-spectrogram features from a sample audio file. Then, explore pre-trained audio classification models like AST and CLAP.
Source
Audio Classification — Action Pack