Useful wrappers

play_dialog_with_stitching(voice: Voice, prompts: List[str | Dict[str, str]], generation_options: GenerationOptions = GenerationOptions(latencyOptimizationLevel=0, speed=1, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, model='eleven_multilingual_v2', output_format='mp3_highest', seed=None, language_code=None, pronunciation_dictionaries=None), first_prompt_pretext: Optional[str] = None, default_playback_options: PlaybackOptions = PlaybackOptions(runInBackground=False, portaudioDeviceID=None, onPlaybackStart=<function PlaybackOptions.<lambda>>, onPlaybackEnd=<function PlaybackOptions.<lambda>>, audioPostProcessor=<function PlaybackOptions.<lambda>>), auto_determine_emotion: bool = False)

This function generates and plays back a series of audio clips, using request stitching (passing the surrounding text to each generation as context).

Parameters:

voice (Voice) – The voice to use.
prompts (List[str | Dict[str, str]]) – The list of texts to generate. Each entry is either a plain string or a dict containing both a ‘prompt’ and a ‘next_text’, so that the next_text can be overridden manually. Dicts can also optionally contain a ‘playback_options’.
generation_options (GenerationOptions) – The GenerationOptions to use.
first_prompt_pretext (Optional[str]) – The previous_text to use for the first generation.
default_playback_options (PlaybackOptions) – The PlaybackOptions to apply to every generation, unless overridden.
auto_determine_emotion (bool) – Whether to automatically try to determine the emotion of the text and insert a matching next_text. Defaults to False.
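To make the stitching context concrete, here is a minimal sketch of how the prompts list could be turned into per-request context. It is not the library's implementation: build_stitching_requests is a hypothetical helper, and it assumes that, unless a dict overrides it, each prompt's previous_text is the preceding prompt (or first_prompt_pretext for the first one) and its next_text is the following prompt. The optional ‘playback_options’ key is ignored here.

```python
from typing import Dict, List, Optional, Union


def build_stitching_requests(
    prompts: List[Union[str, Dict[str, str]]],
    first_prompt_pretext: Optional[str] = None,
) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: pair each prompt with previous/next context text."""
    # Normalize every entry to a dict with at least a 'prompt' key.
    entries = [p if isinstance(p, dict) else {"prompt": p} for p in prompts]
    requests = []
    for i, entry in enumerate(entries):
        # Assumption: previous_text is the prior prompt, or the pretext for the first one.
        previous_text = entries[i - 1]["prompt"] if i > 0 else first_prompt_pretext
        # Assumption: next_text defaults to the following prompt unless the dict overrides it.
        default_next = entries[i + 1]["prompt"] if i + 1 < len(entries) else None
        requests.append({
            "text": entry["prompt"],
            "previous_text": previous_text,
            "next_text": entry.get("next_text", default_next),
        })
    return requests


reqs = build_stitching_requests(
    ["Hello there.", {"prompt": "Get out!", "next_text": "she shouted angrily."}],
    first_prompt_pretext="He said calmly,",
)
print(reqs[1]["next_text"])  # she shouted angrily.
```

The manual next_text in the second entry is the kind of override the prompts parameter allows: text that is never spoken but nudges the delivery of the prompt before it.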

sts_long_audio(source_audio: bytes | BinaryIO, voice: Voice, generation_options: GenerationOptions = GenerationOptions(latencyOptimizationLevel=0, speed=1, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, model='eleven_multilingual_sts_v2', output_format='mp3_highest', seed=None, language_code=None, pronunciation_dictionaries=None), speech_threshold: float = 0.5) → bytes

Processes a long audio file with speech-to-speech (STS) automatically, using Silero-VAD to split it at natural boundaries.

Parameters:

source_audio (bytes|BinaryIO) – The source audio.
voice (Voice) – The voice to use for STS.
generation_options (GenerationOptions) – The generation options to use. The model specified must support STS.
speech_threshold (float) – The minimum probability of being speech that a segment must reach to be recognized as speech (0.5, i.e. 50%, works for most audio files).

Returns:

The final audio, with all segments concatenated, as MP3 bytes.

Return type:

bytes
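speech_threshold acts as a cut-off on the VAD's per-frame speech probability. The sketch below is a simplified illustration, not the library's code: it assumes you already have one speech probability per fixed-length frame (for example from Silero-VAD) and groups consecutive frames at or above the threshold into (start_ms, end_ms) segments, each of which could then be sent to STS on its own. speech_segments and frame_ms are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def speech_segments(
    probs: Sequence[float], threshold: float = 0.5, frame_ms: int = 30
) -> List[Tuple[int, int]]:
    """Hypothetical helper: group frames whose speech probability is >= threshold."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index of the first frame in the current speech run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a speech run begins
        elif p < threshold and start is not None:
            segments.append((start * frame_ms, i * frame_ms))  # the run ended at frame i
            start = None
    if start is not None:  # the audio ended mid-speech
        segments.append((start * frame_ms, len(probs) * frame_ms))
    return segments


print(speech_segments([0.1, 0.7, 0.9, 0.2, 0.6, 0.8], threshold=0.5))
# [(30, 90), (120, 180)]
```

Raising the threshold makes segmentation stricter (borderline frames are dropped); real VAD pipelines usually also pad segment edges and merge very short gaps, which this sketch omits.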

class Synthesizer(defaultPlaybackOptions: PlaybackOptions = PlaybackOptions(runInBackground=True, portaudioDeviceID=None, onPlaybackStart=<function PlaybackOptions.<lambda>>, onPlaybackEnd=<function PlaybackOptions.<lambda>>, audioPostProcessor=<function PlaybackOptions.<lambda>>), defaultGenerationOptions: GenerationOptions = GenerationOptions(latencyOptimizationLevel=3, speed=1, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, model='eleven_multilingual_v2', output_format='mp3_highest', seed=None, language_code=None, pronunciation_dictionaries=None))

A helper class that lets you queue up multiple audio generations.

They are all downloaded together and played back in the order you queued them. I’ve found this gives the lowest possible latency.

start(): Begins processing the queued audio.

stop(): Stops playback once the current audio has finished.

abort(): Stops playback immediately.

change_output_device(portAudioDeviceID: int): Changes the current output device.

change_default_settings(defaultGenerationOptions: GenerationOptions | None = None, defaultPlaybackOptions: PlaybackOptions | None = None): Changes the default generation and/or playback settings.

add_to_queue(voice: Voice, prompt: str, generationOptions: GenerationOptions | None = None, playbackOptions: PlaybackOptions | None = None) → None: Adds an item to the synthesizer queue.

Parameters:

voice (Voice) – The voice that will speak the prompt.
prompt (str) – The prompt to be spoken.
generationOptions (GenerationOptions, optional) – Overrides the generation options for this generation.
playbackOptions (PlaybackOptions, optional) – Overrides the playback options for this generation.
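The Synthesizer's queueing behaviour (generations fetched together, played back in insertion order) follows the common "submit concurrently, consume in order" pattern. The sketch below illustrates that pattern with the standard library only and is not how Synthesizer is actually implemented: fake_generate stands in for a real generation request, play_in_order is a hypothetical name, and the two events mimic the documented difference between stop() (let the current item finish) and abort() (cut it off right away).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def fake_generate(prompt: str, delay: float) -> List[str]:
    """Hypothetical stand-in for a TTS request; returns the 'audio' as chunks."""
    time.sleep(delay)  # simulated latency, so later items may finish first
    return [f"{prompt}:{i}" for i in range(3)]


def play_in_order(
    prompts: Sequence[str],
    delays: Sequence[float],
    stop_event: threading.Event,
    abort_event: threading.Event,
) -> List[str]:
    played: List[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        # Submit every generation up front so they download concurrently.
        futures = [pool.submit(fake_generate, p, d) for p, d in zip(prompts, delays)]
        # Consume the futures in submission order, whatever order they complete in.
        for future in futures:
            if stop_event.is_set():  # like stop(): don't start the next item
                break
            for chunk in future.result():
                if abort_event.is_set():  # like abort(): stop mid-item
                    return played
                played.append(chunk)  # stand-in for writing to the output device
    return played


order = play_in_order(["first", "second"], [0.05, 0.0], threading.Event(), threading.Event())
print(order)  # ['first:0', 'first:1', 'first:2', 'second:0', 'second:1', 'second:2']
```

Even though "second" finishes downloading before "first", it is only played once "first" is done, which is the ordering guarantee described above.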

class ReusableInputStreamer(voice: Voice, defaultPlaybackOptions: PlaybackOptions = PlaybackOptions(runInBackground=True, portaudioDeviceID=None, onPlaybackStart=<function PlaybackOptions.<lambda>>, onPlaybackEnd=<function PlaybackOptions.<lambda>>, audioPostProcessor=<function PlaybackOptions.<lambda>>), generationOptions: GenerationOptions = GenerationOptions(latencyOptimizationLevel=3, speed=1, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, model='eleven_multilingual_v2', output_format='mp3_highest', seed=None, language_code=None, pronunciation_dictionaries=None), websocketOptions: WebsocketOptions = WebsocketOptions(try_trigger_generation=False, chunk_length_schedule=[125], enable_ssml_parsing=False, inactivity_timeout=20, sync_alignment=False, auto_mode=False))

A reusable wrapper around a websocket connection that also plays back the generated audio.

stop(): Stops playback once the current audio has finished.

abort(): Stops playback immediately.

change_settings(generationOptions: GenerationOptions | None = None, defaultPlaybackOptions: PlaybackOptions | None = None, websocketOptions: WebsocketOptions | None = None): Changes the settings, then re-establishes the socket.

queue_audio(prompt: Iterator[str] | AsyncIterator, playbackOptions: PlaybackOptions | None = None) → tuple[Future[OutputStream], Future[Queue]]

Queues up audio to be generated and played back.

Parameters:

prompt – The iterator to use for the generation.
playbackOptions – Overrides the playbackOptions for this generation.

Returns:

A tuple of two futures: one for the playback stream and one for the transcript queue.

Return type:

tuple
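Because queue_audio takes an iterator and returns futures, text can be fed in as it is produced (for example, piece by piece from a language model) and the results collected later. The sketch below only imitates that calling convention with standard-library types: fake_queue_audio is a hypothetical stand-in rather than the library's method, its transcript queue simply echoes the input text (what the real queue carries is not specified here), and the None end-of-stream sentinel is an assumption.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Iterator, List, Tuple


def fake_queue_audio(prompt: Iterator[str]) -> Tuple[Future, Future]:
    """Hypothetical stand-in mirroring queue_audio's (stream, transcript) return shape."""
    stream_future: Future = Future()
    transcript_future: Future = Future()

    def worker() -> None:
        transcript: "queue.Queue[str | None]" = queue.Queue()
        transcript_future.set_result(transcript)  # the queue is available right away
        for piece in prompt:  # consume the text as the iterator yields it
            transcript.put(piece)
        transcript.put(None)  # assumed end-of-stream sentinel
        stream_future.set_result("stream-handle")  # stands in for an OutputStream

    threading.Thread(target=worker, daemon=True).start()
    return stream_future, transcript_future


def text_chunks() -> Iterator[str]:
    # E.g. text arriving incrementally from a language model.
    yield from ["Hello, ", "this is ", "streamed text."]


stream_f, transcript_f = fake_queue_audio(text_chunks())
transcript_q = transcript_f.result()  # block until the transcript queue exists
words: List[str] = []
while (item := transcript_q.get()) is not None:
    words.append(item)
print("".join(words))  # Hello, this is streamed text.
```

Returning futures rather than finished values lets the caller start consuming as soon as each object exists, instead of waiting for the whole generation.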

class ReusableInputStreamerNoPlayback(voice: Voice, generationOptions: GenerationOptions = GenerationOptions(latencyOptimizationLevel=3, speed=1, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, model='eleven_multilingual_v2', output_format='mp3_highest', seed=None, language_code=None, pronunciation_dictionaries=None), websocketOptions: WebsocketOptions = WebsocketOptions(try_trigger_generation=False, chunk_length_schedule=[125], enable_ssml_parsing=False, inactivity_timeout=20, sync_alignment=False, auto_mode=False))

A reusable wrapper around a websocket connection that does not play the audio back; instead, the generated audio is returned as numpy arrays through a queue.

stop(): Stops the websocket.

abort(): Stops the websocket.

change_settings(generationOptions: GenerationOptions | None = None, defaultPlaybackOptions: PlaybackOptions | None = None, websocketOptions: WebsocketOptions | None = None): Changes the settings, then re-establishes the socket.

queue_audio(prompt: Iterator[str] | AsyncIterator) → tuple[Future[Queue], Future[Queue]]

Queues up audio to be generated, without playing it back.

Parameters:

prompt – The iterator to use for the generation.

Returns:

A tuple of two futures: one for the numpy audio queue and one for the transcript queue.

Return type:

tuple
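Once the audio-queue future resolves, the chunks still have to be drained from the queue. The sketch below shows one way to do that with a timeout, so that a stalled connection raises an error instead of blocking forever. drain_audio_queue is a hypothetical helper, the None end-of-audio sentinel is an assumption rather than documented behaviour, and plain lists stand in for the numpy arrays the real class produces (with those, you would join the chunks with np.concatenate).

```python
import queue
from itertools import chain
from typing import Any, List, Optional


def drain_audio_queue(audio_queue: "queue.Queue[Optional[Any]]", timeout: float = 5.0) -> List[Any]:
    """Hypothetical helper: collect audio chunks until a None sentinel (assumed) arrives."""
    chunks: List[Any] = []
    while True:
        # Raises queue.Empty if nothing arrives within `timeout` seconds.
        chunk = audio_queue.get(timeout=timeout)
        if chunk is None:  # assumed end-of-audio marker
            return chunks
        chunks.append(chunk)


# Simulated producer; the real chunks would be numpy arrays of samples.
q: "queue.Queue[Optional[Any]]" = queue.Queue()
for part in ([0.0, 0.1], [0.2], [0.3, 0.4]):
    q.put(part)
q.put(None)

samples = list(chain.from_iterable(drain_audio_queue(q)))
print(samples)  # [0.0, 0.1, 0.2, 0.3, 0.4]
```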