easydel.inference.vwhisper.utils


easydel.inference.vwhisper.utils.chunk_iter_with_batch(audio_array: ndarray, chunk_length: int, stride_left: int, stride_right: int, batch_size: int, feature_extractor)

Split an audio array into overlapping chunks and yield the processed chunks in batches.

Parameters
  • audio_array – Input audio array

  • chunk_length – Length of each chunk in samples

  • stride_left – Left stride in samples

  • stride_right – Right stride in samples

  • batch_size – Number of chunks to process at once

  • feature_extractor – Feature extractor to process audio

Yields

Batches of processed audio chunks
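
The chunking idea described above can be illustrated with a minimal, self-contained sketch. It is not EasyDeL's implementation: the helper name `iter_chunk_batches` is hypothetical, the feature extraction step is left out, and the sketch assumes that consecutive chunks advance by `chunk_length - stride_left - stride_right` samples, with the first chunk having no left stride and the last chunk no right stride. It is written against plain Python sequences; a 1-D NumPy array slices the same way.

```python
def iter_chunk_batches(audio, chunk_length, stride_left, stride_right, batch_size):
    """Hypothetical helper: yield lists of (chunk, (length, left, right)) tuples."""
    # Assumption: each chunk starts `step` samples after the previous one,
    # so neighbouring chunks share stride_left + stride_right samples.
    step = chunk_length - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_length must exceed stride_left + stride_right")

    total = len(audio)
    batch = []
    for start in range(0, total, step):
        chunk = audio[start:start + chunk_length]
        is_last = start + chunk_length >= total
        # The first chunk has nothing to its left; the last has nothing to its right.
        left = 0 if start == 0 else stride_left
        right = 0 if is_last else stride_right
        batch.append((chunk, (len(chunk), left, right)))

        if len(batch) == batch_size:
            yield batch
            batch = []
        if is_last:
            break

    # Emit whatever is left over as a final, possibly smaller, batch.
    if batch:
        yield batch


if __name__ == "__main__":
    samples = list(range(10))
    for group in iter_chunk_batches(samples, chunk_length=6, stride_left=1,
                                    stride_right=1, batch_size=2):
        for chunk, stride in group:
            print(len(chunk), stride)
```

With ten samples, a chunk length of 6, and strides of 1 on each side, the sketch produces a single batch of two chunks whose stride tuples are `(6, 0, 1)` and `(6, 1, 0)`. Carrying the stride alongside each chunk lets a later decoding step discard the overlapping regions when stitching results together.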

easydel.inference.vwhisper.utils.process_audio_input(audio_input: Union[str, bytes, ndarray, Dict[str, Union[ndarray, int]]], feature_extractor)

Convert audio input into a NumPy array at the sampling rate expected by the feature extractor.

Parameters
  • audio_input – Input audio in various formats

  • feature_extractor – Feature extractor with sampling rate info

Returns

Tuple of (audio_array, stride)
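
As a rough illustration of how the different input forms might be normalized into an `(audio_array, stride)` pair, here is a minimal sketch. It is not the library's code. The helper name `normalize_audio_input` is hypothetical, and the dictionary keys `"array"`, `"sampling_rate"`, and `"stride"` are assumptions borrowed from the Hugging Face speech recognition pipeline convention; this page does not name them. The sketch also rejects a mismatched sampling rate instead of resampling, and does not decode file paths or raw bytes.

```python
def normalize_audio_input(audio_input, expected_rate):
    """Hypothetical helper: return (samples, stride) for dict or raw-sample input."""
    stride = None

    if isinstance(audio_input, dict):
        # Assumed key names; copy so the caller's dict is not mutated.
        data = dict(audio_input)
        stride = data.pop("stride", None)
        if "array" not in data or "sampling_rate" not in data:
            raise ValueError("dict input needs 'array' and 'sampling_rate' keys")
        if data["sampling_rate"] != expected_rate:
            # A full implementation would likely resample here instead.
            raise ValueError(
                f"expected {expected_rate} Hz, got {data['sampling_rate']} Hz"
            )
        samples = data["array"]
    elif isinstance(audio_input, (str, bytes)):
        # Decoding a file path or encoded bytes is out of scope for this sketch.
        raise NotImplementedError("file and byte inputs are not handled here")
    else:
        # Raw samples are assumed to already be at the expected rate.
        samples = audio_input

    return samples, stride
```

In this sketch, `expected_rate` stands in for the sampling rate that the real function reads from `feature_extractor`.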