Helper functions/classes

class LibVoiceInfo(category: LibCategory | None = None, gender: LibGender | None = None, age: LibAge | None = None, accent: LibAccent | None = None, language: str | None = None)

Contains the information for a voice in the Voice Library.

to_query_params()

Converts filter attributes to a dictionary of query parameters, omitting None values.
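
As a rough illustration of the "omit None values" behavior described above, the sketch below uses a hypothetical stand-in dataclass. FilterSketch and its plain string fields are assumptions made for this example (the real class uses LibCategory, LibGender, LibAge and LibAccent), and this is not the library's implementation.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FilterSketch:
    """Hypothetical stand-in for LibVoiceInfo, using plain strings for every field."""
    category: Optional[str] = None
    gender: Optional[str] = None
    age: Optional[str] = None
    accent: Optional[str] = None
    language: Optional[str] = None

    def to_query_params(self) -> dict:
        # Keep only the attributes that were actually set.
        return {key: value for key, value in asdict(self).items() if value is not None}


params = FilterSketch(gender="female", language="en").to_query_params()
print(params)
```

Only the attributes that were set end up in the resulting dictionary, so unset filters never reach the query string.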

class PlaybackOptions(runInBackground: bool = False, portaudioDeviceID: int | None = None, onPlaybackStart: Callable[[], Any] = <function PlaybackOptions.<lambda>>, onPlaybackEnd: Callable[[], Any] = <function PlaybackOptions.<lambda>>, audioPostProcessor: Callable[[numpy.ndarray, int], numpy.ndarray] = <function PlaybackOptions.<lambda>>)

This class holds the options for playback.

Parameters:
  • runInBackground (bool, optional) – Whether to play/stream audio in the background or wait for it to finish playing. Defaults to False.

  • portaudioDeviceID (int, optional) – The ID of the audio device to use for playback. Defaults to the default output device.

  • onPlaybackStart (Callable, optional) – Function to call once the playback begins.

  • onPlaybackEnd (Callable, optional) – Function to call once the playback ends.

  • audioPostProcessor (Callable, optional) – Function to apply post-processing to the audio. Must take a float32 ndarray (of arbitrary length) and an int (the sample rate) as input and return another float32 ndarray.

class GenerationOptions(model_id: str | None = None, latencyOptimizationLevel: int = 0, speed: float | None = 1, stability: float | None = None, similarity_boost: float | None = None, style: float | None = None, use_speaker_boost: bool | None = None, model: Model | str | None = 'eleven_multilingual_v2', output_format: str = 'mp3_highest', seed: int | None = None, language_code: str | None = None, pronunciation_dictionaries: List[PronunciationDictionary] | None = None)

This class holds the options for TTS generation. If any option besides model_id and latencyOptimizationLevel is omitted, the stored value associated with the voice is used.

Parameters:
  • model (Model|str, optional) – The TTS model (or its ID) to use for the generation. Defaults to multilingual v2.

  • latencyOptimizationLevel (int, optional) – The level of latency optimization (0-4) to apply. Defaults to 0.

  • stability (float, optional) – A float between 0 and 1 representing the stability of the generated audio. If omitted, the current stability setting is used.

  • similarity_boost (float, optional) – A float between 0 and 1 representing the similarity boost of the generated audio. If omitted, the current similarity boost setting is used.

  • style (float, optional) – A float between 0 and 1 representing how much focus should be placed on the text vs the associated audio data for the voice’s style, with 0 being all text and 1 being all audio.

  • use_speaker_boost (bool, optional) – Boost the similarity of the synthesized speech and the voice at the cost of some generation speed.

  • output_format (str, optional) – Output format for the audio. mp3_highest and pcm_highest will automatically use the highest quality of that format you have available.

  • pronunciation_dictionaries (List[PronunciationDictionary], optional) – The pronunciation dictionaries to apply to this request (max 3).

  • seed (int, optional) – The seed to use for this generation. Determinism is not guaranteed.

  • speed (float, optional) – The speed setting. Between 0.7 and 1.2. Defaults to 1.

  • language_code (str, optional) – An ISO 639-1 code, used to enforce a language for the model. Currently supported by turbo v2.5 only.

Warning

The style and use_speaker_boost parameters are only available on v2 models, and will be ignored for v1 models.

Setting style to higher than 0 and enabling use_speaker_boost will both increase latency.

output_format is currently ignored when using speech to speech.

Warning

Using pcm_highest and mp3_highest will cache the resulting quality for the user object. You can use user.update_audio_quality() to force an update.
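
The numeric ranges listed for GenerationOptions can be checked before a request is made. The following is a minimal sketch, assuming a hypothetical helper named check_generation_settings that is not part of the library; it only encodes the ranges stated in the parameter list (0 to 1 for stability, similarity_boost and style, 0.7 to 1.2 for speed, 0 to 4 for latencyOptimizationLevel, at most 3 pronunciation dictionaries).

```python
from typing import List, Optional


def check_generation_settings(
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    style: Optional[float] = None,
    speed: Optional[float] = 1,
    latencyOptimizationLevel: int = 0,
    pronunciation_dictionaries: Optional[list] = None,
) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if every value is in range)."""
    problems = []
    unit_range = (
        ("stability", stability),
        ("similarity_boost", similarity_boost),
        ("style", style),
    )
    for name, value in unit_range:
        # None means "use the value stored with the voice", so it is always accepted.
        if value is not None and not 0 <= value <= 1:
            problems.append(f"{name} must be between 0 and 1")
    if speed is not None and not 0.7 <= speed <= 1.2:
        problems.append("speed must be between 0.7 and 1.2")
    if not 0 <= latencyOptimizationLevel <= 4:
        problems.append("latencyOptimizationLevel must be between 0 and 4")
    if pronunciation_dictionaries is not None and len(pronunciation_dictionaries) > 3:
        problems.append("at most 3 pronunciation dictionaries are allowed")
    return problems


print(check_generation_settings(stability=0.5, speed=1.1))
print(check_generation_settings(stability=1.5, speed=2))
```

Returning a list of messages rather than raising on the first problem lets a caller report every out-of-range value at once.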

class WebsocketOptions(try_trigger_generation: bool = False, chunk_length_schedule: List[int] = <factory>, enable_ssml_parsing: bool = False, inactivity_timeout: float | None = 20, sync_alignment: bool = False, auto_mode: bool = False)

This class holds the options for the websocket endpoint.

Parameters:
  • chunk_length_schedule (list[int], optional) – Chunking schedule for generation. If you pass [50, 120, 500], the first audio chunk will be generated after receiving 50 characters, the second after 120 more (so 170 total), and the third onwards after every further 500. Defaults to [50], so audio is always generated as soon as possible.

  • try_trigger_generation (bool, optional) – Whether to try to generate a chunk of audio at >50 characters, regardless of the chunk_length_schedule. Defaults to False. The value is sent with every message, but can be overridden.

  • enable_ssml_parsing (bool, optional) – Whether to enable parsing of SSML tags, such as breaks or pronunciations. Increases latency. Defaults to False.

  • inactivity_timeout (float, optional) – The time in seconds to wait before closing the connection if no messages are sent. Defaults to 20.

  • sync_alignment (bool, optional) – Whether to include timing data with every audio chunk. Defaults to False.

  • auto_mode (bool, optional) – Reduces latency by disabling all buffers. It is ONLY recommended when sending full sentences or phrases. Defaults to False.
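
The arithmetic behind chunk_length_schedule can be sketched as below. The chunk_thresholds function is a hypothetical helper written for this example, not part of the library; it simply accumulates the schedule as described above and reuses the last entry for every later chunk.

```python
from itertools import islice
from typing import Iterator, List


def chunk_thresholds(schedule: List[int]) -> Iterator[int]:
    """Yield the cumulative character counts at which each chunk would be generated."""
    if not schedule:
        raise ValueError("schedule must contain at least one entry")
    total = 0
    index = 0
    while True:
        # Once the schedule runs out, keep using its last entry.
        total += schedule[min(index, len(schedule) - 1)]
        yield total
        index += 1


first_five = list(islice(chunk_thresholds([50, 120, 500]), 5))
print(first_five)  # [50, 170, 670, 1170, 1670]

default_three = list(islice(chunk_thresholds([50]), 3))
print(default_three)  # [50, 100, 150]
```

With the default schedule of [50], a chunk becomes due every 50 characters, which matches the "as soon as possible" behavior.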

class StitchingOptions(previous_text: str | None = None, next_text: str | None = None, previous_request_ids: List[int | HistoryItem] | None = None, next_request_ids: List[int | HistoryItem] | None = None, auto_next_text: bool = False)

This class holds the options for request stitching and prompting.

Parameters:
  • previous_text (str, optional) – Prompt which will be placed before the quoted text.

  • next_text (str, optional) – Prompt which will be placed after the quoted text.

  • previous_request_ids (list[int|HistoryItem], optional) – A list of request_ids or HistoryItems generated before this generation. Overrides previous_text.

  • next_request_ids (list[int|HistoryItem], optional) – A list of request_ids or HistoryItems generated after this generation. Overrides next_text.

  • auto_next_text (bool, optional) – Automatically appends a next_text appropriate for the prompt. Defaults to False. Disabled if next_text is provided.
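
The precedence rules in the list above can be summarized in a small sketch. The effective_stitching function is hypothetical and only encodes one reading of the descriptions (request IDs win over the matching text prompt, and auto_next_text is switched off when next_text is given); how the library combines these internally is not shown here.

```python
from typing import Any, Dict, List, Optional


def effective_stitching(
    previous_text: Optional[str] = None,
    next_text: Optional[str] = None,
    previous_request_ids: Optional[List[Any]] = None,
    next_request_ids: Optional[List[Any]] = None,
    auto_next_text: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: work out which stitching inputs would take effect."""
    result: Dict[str, Any] = {}
    # Request IDs override the corresponding text prompt.
    if previous_request_ids:
        result["previous_request_ids"] = previous_request_ids
    elif previous_text is not None:
        result["previous_text"] = previous_text
    if next_request_ids:
        result["next_request_ids"] = next_request_ids
    elif next_text is not None:
        result["next_text"] = next_text
    # An explicit next_text disables the automatic one.
    result["auto_next_text"] = auto_next_text and next_text is None
    return result


print(effective_stitching(previous_text="He whispered:", previous_request_ids=["abc123"]))
print(effective_stitching(next_text="she said angrily.", auto_next_text=True))
```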

class GenerationInfo(history_item_id: str | None = None, request_id: str | None = None, tts_latency_ms: str | None = None, transcript: list[str] | None = None, character_cost: int | None = None)

This contains the information returned regarding a (non-websocket) generation.

class SFXOptions(duration_seconds: float | None = None, prompt_influence: float | None = None)

This contains the parameters for a sound effect generation.

run_ai_speech_classifier(audioBytes: bytes)

Runs ElevenLabs’ AI speech classifier on the provided audio data.

Parameters:

audioBytes – The bytes of the audio file (mp3, wav, most formats should work) you want to analyze.

Returns:

Dict containing all the information returned by the tool (usually just the probability of it being AI generated)

play_audio_v2(audioData: bytes | numpy.ndarray, playbackOptions: PlaybackOptions = PlaybackOptions(runInBackground=False, portaudioDeviceID=None, onPlaybackStart=<function PlaybackOptions.<lambda>>, onPlaybackEnd=<function PlaybackOptions.<lambda>>, audioPostProcessor=<function PlaybackOptions.<lambda>>), audioFormat: str | GenerationOptions = 'mp3_44100_128') → OutputStream

Plays the given audio and calls the given functions.

Parameters:
  • audioData (bytes|numpy.ndarray) – The audio data to play, either as bytes or as a float32 numpy array.

  • playbackOptions (PlaybackOptions, optional) – The playback options.

  • audioFormat (str|GenerationOptions, optional) – The format of audioData, using the same formats as GenerationOptions. Unless audioData is mp3 or a numpy array, the format must include the sample rate (for example pcm_44100). Defaults to mp3_44100_128.

Returns:

The OutputStream used for playback.

save_audio_v2(audioData: bytes | ndarray, saveLocation: BinaryIO | str, outputFormat: str, inputFormat: str | GenerationOptions = 'mp3_44100_128') → None

This function saves the audio data to the specified location or file-like object. soundfile is used for the conversion, so any format soundfile supports can be used.

Parameters:
  • audioData (bytes|numpy.ndarray) – The audio data.

  • saveLocation (str|BinaryIO) – The path (or file-like object) where the data will be saved.

  • outputFormat (str) – The format in which the audio will be saved (mp3/wav/ogg/etc).

  • inputFormat (str|GenerationOptions, optional) – The format of audioData, using the same formats as GenerationOptions. Formats other than mp3 must include the sample rate (for example pcm_44100). Defaults to mp3_44100_128.
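
Format strings such as mp3_44100_128 and pcm_44100 carry the codec first and the sample rate second. The sketch below shows one way to split them; split_audio_format is a hypothetical helper written for this example, and the library's own parsing may differ.

```python
from typing import Optional, Tuple


def split_audio_format(audio_format: str) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: split 'pcm_44100' into ('pcm', 44100).

    The sample rate is None when the string does not carry one (e.g. 'mp3_highest').
    """
    codec, _, rest = audio_format.partition("_")
    rate_text = rest.split("_")[0]
    sample_rate = int(rate_text) if rate_text.isdigit() else None
    return codec, sample_rate


print(split_audio_format("mp3_44100_128"))  # ('mp3', 44100)
print(split_audio_format("pcm_44100"))      # ('pcm', 44100)
print(split_audio_format("mp3_highest"))    # ('mp3', None)
```

Raw PCM has no header from which the sample rate could be recovered, which is the usual reason a format string for it has to state the rate explicitly.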

check_api_key(api_key: str | None) → bool

Checks if the provided API key is valid by making a request to the ElevenLabs API.

Parameters:

api_key (str) – The API key to check.

Returns:

True if the API key is valid, False otherwise.

Return type:

bool