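Audio Processing in Python Using Pydub and the Google Speech Recognition API

In this tutorial, we will work with audio files. We will split the audio into chunks, recognize the speech in each chunk, and store the recognized content in a text file. Install the required modules with the commands below.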

pip install pydub

If you run the above command, you will get the following success message.

Collecting pydub
Downloading https://files.pythonhosted.org/packages/79/db/eaf620b73a1eec3c8c6f8f5b0b236a50f9da88ad57802154b7ba7664d0b8/pydub-0.23.1-py2.py3-none-any.whl
Installing collected packages: pydub
Successfully installed pydub-0.23.1

pip install audioread

If you run the above command, you will get the following success message.

Collecting audioread
Downloading https://files.pythonhosted.org/packages/2e/0b/940ea7861e0e9049f09dcfd72a90c9ae55f697c17c299a323f0148f913d2/audioread-2.1.8.tar.gz
Building wheels for collected packages: audioread
Building wheel for audioread (setup.py): started
Building wheel for audioread (setup.py): finished with status 'done'
Created wheel for audioread: filename=audioread-2.1.8-cp37-none-any.whl size=23098 sha256=92b6f46d6b4726e7a13233dc9d84744ba74e23187123e67f663650f24390dc9d
Stored in directory: C:\Users\hafeezulkareem\AppData\Local\pip\Cache\wheels\b9\64\09\0b6417df9d8ba8bc61a7d2553c5cebd714ec169644c88fc012
Successfully built audioread
Installing collected packages: audioread
Successfully installed audioread-2.1.8

pip install SpeechRecognition

If you run the above command, you will get the following success message.

Collecting SpeechRecognition
Downloading https://files.pythonhosted.org/packages/26/e1/7f5678cd94ec1234269d23756dbdaa4c8cfaed973412f88ae8adf7893a50/SpeechRecognition-3.8.1-py2.py3-none-any.whl (32.8MB)
Installing collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.8.1

There are two steps in this process.

Split the audio into chunks.
Extract the content of each chunk using **SpeechRecognition**.
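The first step can be sketched without any audio library, since it only involves arithmetic on millisecond offsets. The helper below is a minimal illustration, not part of pydub: the name `chunk_windows` is hypothetical, and the defaults for `point` (chunk length) and `rem` (overlap) simply mirror the values used in the tutorial's script.

```python
def chunk_windows(audio_length, point=60000, rem=8000):
    """Yield (start, end) windows in milliseconds.

    Each window is at most `point` ms long and starts `rem` ms
    before the previous window ended, so neighbours overlap.
    """
    if not 0 <= rem < point:
        raise ValueError('rem must be non-negative and smaller than point')
    start = 0
    while True:
        # clamp the last window to the end of the audio
        end = min(start + point, audio_length)
        yield start, end
        if end >= audio_length:
            break
        # step back by the overlap before starting the next window
        start = end - rem


windows = list(chunk_windows(150000))
print(windows)  # [(0, 60000), (52000, 112000), (104000, 150000)]
```

The overlap is presumably there so that a word cut off at the end of one chunk still appears whole at the start of the next one.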

Pick an audio file from your library. Let's start with the code.

Example

# importing the modules
import pydub
import speech_recognition
# loading the audio file
audio = pydub.AudioSegment.from_wav('audio.wav')
# length of the audio in milliseconds
audio_length = len(audio)
print(f'Audio Length: {audio_length}')
# chunk counter
chunk_counter = 1
# text file that collects the recognized content
audio_text = open('audio_text.txt', 'w+')
# length of each chunk in milliseconds
point = 60000
# overlap between consecutive chunks in milliseconds
rem = 8000
# initialising variables to track chunks and ending
flag = 0
start = 0
end = 0
# iterating through the audio; the flag check at the bottom ends the loop
for i in range(0, 2 * audio_length, point):
    # in the first iteration the chunk runs from 0 to point
    if i == 0:
        start = 0
        end = point
    else:
        # later chunks start rem milliseconds before the previous end
        start = end - rem
        end = start + point
    # if end reaches or passes audio_length, this is the last chunk
    if end >= audio_length:
        end = audio_length
        # to indicate stop
        flag = 1
    # getting a chunk from the audio
    chunk = audio[start:end]
    # chunk name
    chunk_name = f'chunk_{chunk_counter}'
    # storing the chunk to local storage
    chunk.export(chunk_name, format='wav')
    # printing the chunk boundaries
    print(f'{chunk_name} start: {start} end: {end}')
    # incrementing chunk counter
    chunk_counter += 1
    # recognising text from the audio
    # initialising the recognizer
    recognizer = speech_recognition.Recognizer()
    # reading the chunk back as audio data
    # note: listen() returns at the first pause after speech;
    # recognizer.record(chunk_audio) reads the whole chunk instead
    with speech_recognition.AudioFile(chunk_name) as chunk_audio:
        chunk_listened = recognizer.listen(chunk_audio)
    # recognizing content from the audio
    try:
        # getting content from the chunk
        content = recognizer.recognize_google(chunk_listened)
        # writing to the file
        audio_text.write(content + '\n')
    # speech could not be understood
    except speech_recognition.UnknownValueError:
        print('Audio is not recognized')
    # the request to the recognition service failed
    except speech_recognition.RequestError as error:
        print(f"Can't connect to the internet: {error}")
    # after the last chunk, close the file and leave the loop
    if flag == 1:
        audio_text.close()
        break
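The script above opens `audio_text.txt` by hand and closes it only when the last chunk is reached, so an unexpected exception would leave the file open. A minimal sketch of one alternative follows, assuming the recognition step is wrapped in a callable. The names `write_transcript`, `recognize`, `skip_on` and `fake_recognize` are hypothetical and exist only for this illustration; in the tutorial's setting, `skip_on` would hold `speech_recognition.UnknownValueError` and `speech_recognition.RequestError`.

```python
import os
import tempfile


def write_transcript(chunk_names, recognize, out_path, skip_on=(ValueError,)):
    """Write one line of text per recognized chunk.

    `recognize` is any callable that maps a chunk name to text and
    raises one of the `skip_on` exceptions when it fails.
    Returns the names of the chunks that were skipped.
    """
    failed = []
    # the with-block closes the file even if an exception escapes the loop
    with open(out_path, 'w') as audio_text:
        for name in chunk_names:
            try:
                content = recognize(name)
            except skip_on as error:
                print(f'{name}: not transcribed ({error})')
                failed.append(name)
                continue
            audio_text.write(content + '\n')
    return failed


def fake_recognize(name):
    """Stand-in for a real recognizer so the sketch runs offline."""
    if name == 'chunk_2':
        raise ValueError('unintelligible')
    return f'text of {name}'


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'audio_text.txt')
    failed = write_transcript(['chunk_1', 'chunk_2', 'chunk_3'], fake_recognize, path)
    with open(path) as transcript:
        lines = transcript.read().splitlines()

print(failed)  # ['chunk_2']
print(lines)   # ['text of chunk_1', 'text of chunk_3']
```

Passing the recognizer in as a function also makes the file-handling logic testable without a network connection.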

Output

If you run the above code, you will get the following result.

Audio Length: 480052
chunk_1 start: 0 end: 60000
chunk_2 start: 52000 end: 112000
chunk_3 start: 104000 end: 164000
chunk_4 start: 156000 end: 216000
chunk_5 start: 208000 end: 268000
chunk_6 start: 260000 end: 320000
chunk_7 start: 312000 end: 372000
chunk_8 start: 364000 end: 424000
chunk_9 start: 416000 end: 476000
chunk_10 start: 468000 end: 480052
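The number of chunks in this listing follows from the chunk length and the overlap: after the first chunk, every further chunk advances the end position by `point - rem` milliseconds. The sketch below checks that arithmetic against the output above; `expected_chunks` is a hypothetical helper written for this illustration, with defaults taken from the script.

```python
def expected_chunks(audio_length, point=60000, rem=8000):
    """Return how many overlapping chunks cover `audio_length` milliseconds."""
    if audio_length <= point:
        return 1
    # each chunk after the first moves the end forward by this much
    step = point - rem
    # integer ceiling of (audio_length - point) / step
    extra = (audio_length - point + step - 1) // step
    return 1 + extra


print(expected_chunks(480052))  # 10
```

For the 480052 ms file, the first chunk covers 60000 ms and the remaining 420052 ms need nine more steps of 52000 ms, which gives the ten chunks listed.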

Check the contents of the file.

# opening the file in read mode
with open('audio_text.txt', 'r') as file:
   print(file.read())

If you run the above code, you will get the following result.

English and I am here in San Francisco I am back in San Francisco last week we were
in Texas at a teaching country and The Reader of the teaching conference was a plan
e Re
improve teaching as a result you are
house backup file with bad it had some
English is coming soon one day only time
12 o1 a.m.
everything about her English now or powering on my email list
sports in your city check your email email
Harjeet girlfriend
next Tuesday
checking the year enjoying office English keep listening keep smiling keep enjoying
your English learning

Conclusion

If you have any questions about this tutorial, mention them in the comment section.

Hafeezul Kareem

Updated on: November 1, 2019
