Last Updated on 2021-05-12 by Clay
Just as torchvision is a module in PyTorch that specializes in processing images, torchaudio, the topic of today's note, is a module in PyTorch that specializes in processing audio.
With support for text, images, audio, and more, PyTorch is a really convenient deep learning framework!
As always, here is the official tutorial: PyTorch tutorial
Introduction to torchaudio
If you want to use torchaudio, you need to install it with the following command.
pip3 install torchaudio
It may take some time.
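Once the installation finishes, a quick check like the following can confirm that the packages are importable. This is only a minimal sketch using the standard library; the helper name is_installed is my own and not part of PyTorch.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


# Report on the two packages this note relies on.
for name in ('torch', 'torchaudio'):
    print(name, 'installed' if is_installed(name) else 'missing')
```

If either one is reported as missing, rerun the pip3 command above in the same Python environment.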
First, we need to import the packages and modules we need.
# -*- coding: utf-8 -*-
import torchaudio
import matplotlib.pyplot as plt
import requests
This is the official test sound file, but you can use your own. I save the file as test.wav. (According to the official instructions at the time of writing, torchaudio only supports .wav and .mp3.)
# Test music
fileName = 'test.wav'
url = "https://pytorch.org/tutorials//_static/img/steam-train-whistle-daniel_simon-converted-from-mp3.wav"
r = requests.get(url)

# Save
with open(fileName, 'wb') as f:
    f.write(r.content)
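If you would rather work with your own file and have none at hand, one option is to synthesize a short tone with Python's standard wave module. The sketch below is only an illustration: the function name write_sine_wav, the file name my_tone.wav, and the 440 Hz tone are all my own choices, not anything from the official tutorial.

```python
import math
import os
import struct
import tempfile
import wave


def write_sine_wav(path, freq_hz=440.0, seconds=1.0, sample_rate=44100):
    """Write a mono 16-bit PCM sine tone to `path` and return the frame count."""
    num_frames = int(seconds * sample_rate)
    frames = bytearray()
    for i in range(num_frames):
        # Half amplitude leaves headroom below the 16-bit limit.
        sample = 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        frames += struct.pack('<h', int(sample * 32767))
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)      # mono
        f.setsampwidth(2)      # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))
    return num_frames


# Hypothetical output location: a file in the system temp directory.
tone_path = os.path.join(tempfile.gettempdir(), 'my_tone.wav')
print(write_sine_wav(tone_path))
```

The resulting path can then be used in place of test.wav in the steps that follow. Note that this file is mono, so its waveform would have one channel instead of two.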
Draw a waveform graph
waveform, sample_rate = torchaudio.load(fileName)
print('Shape of waveform: {}'.format(waveform.size()))
print('Sample rate of waveform: {}'.format(sample_rate))

plt.figure()
plt.plot(waveform.t().numpy())
plt.show()
Output:
Shape of waveform: torch.Size([2, 276858])
Sample rate of waveform: 44100
You don't have to call numpy() to convert the data to a NumPy array; plotting the original tensor works too. The most important step is to transpose with t(): the waveform has shape [2, 276858] (channels, samples), and after transposing to [276858, 2] each column is drawn as its own line, so we get one waveform per channel.
- waveform: According to the explanation on the official website, it is the "original audio signal", here a tensor with one row per channel.
- sample_rate: The sampling rate, here 44100 Hz, which is the rate commonly used for CDs.
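These two values together tell us how long the clip is: the number of samples per channel divided by the sampling rate gives the duration in seconds. Here is a small worked calculation using the numbers printed above; the variable names are just for illustration.

```python
# Values taken from the printed output above.
num_channels, num_frames = 2, 276858
sample_rate = 44100  # samples per second

# Duration of the clip in seconds.
duration_seconds = num_frames / sample_rate
print(round(duration_seconds, 2))

# Time stamp (in seconds) of the first few samples, useful as an x-axis.
time_axis = [i / sample_rate for i in range(5)]
print(time_axis[1])
```

So the test file is a little over six seconds of stereo audio.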
In addition, we can also resample:
# Resample one channel to a tenth of the original rate
new_sample_rate = sample_rate // 10  # integer division keeps the rate an int
channel = 0
resample = torchaudio.transforms.Resample(sample_rate, new_sample_rate)
transformed = resample(waveform[channel, :].view(1, -1))
print('Shape of transformed waveform:', transformed.size())

plt.figure()
plt.plot(transformed[0, :].numpy())
plt.show()
Output:
Shape of transformed waveform: torch.Size([1, 27686])
tensor([ 4.5531e-03, 1.6837e-02, 8.0987e-03, …, -5.0898e-06, 6.0601e-06, 2.6707e-05])
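The new length can be checked by hand: resampling to a tenth of the rate should leave about a tenth of the samples. The sketch below assumes the output length is the scaled length rounded up, which is consistent with the 27686 printed above; it is plain arithmetic, not a call into torchaudio.

```python
orig_rate = 44100
new_rate = orig_rate // 10      # 4410 Hz
num_frames = 276858             # samples per channel before resampling

# Ceiling division with integers avoids floating-point rounding surprises.
expected_frames = -(-num_frames * new_rate // orig_rate)
print(expected_frames)

# The highest frequency a signal sampled at new_rate can represent (Nyquist).
nyquist_hz = new_rate / 2
print(nyquist_hz)
```

The second number is worth keeping in mind: after downsampling this far, anything above roughly 2.2 kHz in the original recording can no longer be represented.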
Spectrogram
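A frequency spectrum, as the name implies, is the representation of a "time domain" signal in the "frequency domain", usually obtained with the Fourier transform. It is usually drawn with the frequency on the x-axis and the amplitude on the y-axis. A spectrogram shows how that spectrum changes over time: the signal is cut into short windows, a spectrum is computed for each window, and the result is displayed as an image with time on the x-axis, frequency on the y-axis, and color indicating the magnitude.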
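To make the idea of moving from the time domain to the frequency domain concrete, here is a toy example in pure Python. It computes a naive discrete Fourier transform of an 8 Hz sine wave and finds the frequency bin with the largest magnitude. The sampling rate of 64 Hz and the helper dft_magnitudes are assumptions made for this sketch; real libraries use the much faster FFT.

```python
import cmath
import math

sample_rate = 64     # samples per second (toy value)
num_samples = 64     # one second of signal
tone_hz = 8          # frequency of the test tone

# Time-domain signal: a pure sine wave.
signal = [math.sin(2 * math.pi * tone_hz * n / sample_rate)
          for n in range(num_samples)]


def dft_magnitudes(x):
    """Naive DFT; returns magnitudes for bins 0 .. len(x) // 2."""
    n_total = len(x)
    mags = []
    for k in range(n_total // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_total)
                  for n in range(n_total))
        mags.append(abs(acc))
    return mags


magnitudes = dft_magnitudes(signal)
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * sample_rate / num_samples  # each bin spans 1 Hz here
print(peak_bin, peak_hz)
```

The peak lands on the bin that corresponds to the tone's frequency. A spectrogram repeats this kind of analysis on many short, overlapping slices of the audio.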
# Spectrogram
specgram = torchaudio.transforms.Spectrogram()(waveform)
print('Shape of spectrogram:', specgram.size())

plt.figure()
plt.imshow(specgram.log2()[0, :, :].numpy())
plt.show()
Output:
Shape of spectrogram: torch.Size([2, 201, 1385])
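For reference, the shape of the spectrogram tensor, (channels, frequency bins, time frames), can be estimated by hand. The sketch below assumes the default settings as I understand them (an FFT size of 400, a hop of half the window, and centered padding), so treat the parameter values as assumptions and check them against the documentation for your torchaudio version.

```python
num_frames = 276858          # samples per channel in the test file
sample_rate = 44100

n_fft = 400                  # assumed default FFT size
hop_length = n_fft // 2      # assumed default: half the window length

# A real-valued FFT of size n_fft yields n_fft // 2 + 1 frequency bins.
freq_bins = n_fft // 2 + 1

# With centered (padded) frames, one frame starts every hop_length samples.
time_frames = num_frames // hop_length + 1

print(freq_bins, time_frames)

# Resolution of each axis of the image.
hz_per_bin = sample_rate / n_fft
seconds_per_frame = hop_length / sample_rate
print(hz_per_bin, round(seconds_per_frame, 4))
```

Under these assumptions, each row of the image covers about 110 Hz and each column about 4.5 milliseconds.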
Additional Notes
Basically, I only went through the few functions I am likely to use. torchaudio provides many more transforms, and I plan to find time to write them up properly in a separate post rather than mix them into this brief introduction to visualizing audio files.
The official PyTorch tutorial also covers plotting other representations, such as the Mel spectrogram, but I probably won't need them for the time being.
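For the curious, the "Mel" in Mel spectrogram refers to a frequency scale that is spaced closer to how we perceive pitch. Below is a minimal sketch of one common conversion formula (often called the HTK-style formula). It is shown only to illustrate the idea and is not necessarily the exact variant any particular library uses; the function names are my own.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps in Hz become smaller and smaller steps in mels.
for f in (500.0, 1000.0, 2000.0, 4000.0):
    print(f, round(hz_to_mel(f), 1))
```

On this scale 1000 Hz maps to roughly 1000 mels, while higher frequencies are increasingly compressed.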
It is worth mentioning that the official tutorial also refers to Kaldi. Embarrassingly, I am completely unfamiliar with it! It seems to be a well-known speech recognition toolkit (written mainly in C++ rather than Python). I think I should study it more if I have time.
Read More
- [PyTorch] Tutorial(1) What is Tensor?
- [PyTorch] Tutorial(2) Automatic derivative
- [PyTorch] Tutorial(3) Introduction of Neural Networks
- [PyTorch] Tutorial(4) Train a model to classify MNIST dataset
- [PyTorch] Tutorial(5) How to train a model to classify CIFAR-10 database
- [PyTorch] Tutorial(6) Audio of Processing Module: torchaudio
- [PyTorch] Tutorial(7) Use Deep Generative Adversarial Network (DCGAN) to generate pictures