Helper functions/classes
- class LibVoiceInfo(category: LibCategory | None = None, gender: LibGender | None = None, age: LibAge | None = None, accent: LibAccent | None = None, language: str | None = None)
Contains the information for a voice in the Voice Library.
- to_query_params()
Converts filter attributes to a dictionary of query parameters, omitting None values.
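As a rough illustration of what "omitting None values" means in practice, here is a minimal sketch. VoiceFilterSketch is a hypothetical stand-in rather than the library's LibVoiceInfo: it uses plain strings instead of the Lib* enums, and the real method may format values differently.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class VoiceFilterSketch:
    """Hypothetical stand-in for LibVoiceInfo; plain strings replace the Lib* enums."""
    category: Optional[str] = None
    gender: Optional[str] = None
    age: Optional[str] = None
    accent: Optional[str] = None
    language: Optional[str] = None

    def to_query_params(self) -> dict:
        # Keep only the attributes that were actually set.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


params = VoiceFilterSketch(gender="female", language="en").to_query_params()
print(params)  # {'gender': 'female', 'language': 'en'}
```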
- class PlaybackOptions(runInBackground: bool = False, portaudioDeviceID: int | None = None, onPlaybackStart: typing.Callable[[], typing.Any] = <function PlaybackOptions.<lambda>>, onPlaybackEnd: typing.Callable[[], typing.Any] = <function PlaybackOptions.<lambda>>, audioPostProcessor: typing.Callable[[numpy.ndarray, int], numpy.ndarray] = <function PlaybackOptions.<lambda>>)
This class holds the options for playback.
- Parameters:
runInBackground (bool, optional) – Whether to play/stream audio in the background or wait for it to finish playing. Defaults to False.
portaudioDeviceID (int, optional) – The ID of the audio device to use for playback. Defaults to the default output device.
onPlaybackStart (Callable, optional) – Function to call once the playback begins.
onPlaybackEnd (Callable, optional) – Function to call once the playback ends.
audioPostProcessor (Callable, optional) – Function to apply post-processing to the audio. Must take a float32 ndarray (of arbitrary length) and an int (the sample rate) as input and return another float32 ndarray.
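To make the audioPostProcessor contract concrete, the sketch below shows one function with the required shape: it accepts a float32 ndarray and a sample rate, and returns a float32 ndarray. The name halve_volume and the clipping to [-1, 1] are assumptions made for illustration; the library does not necessarily require clipping.

```python
import numpy as np


def halve_volume(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Scale the signal to half amplitude, keeping the float32 dtype."""
    # sample_rate is unused here, but the callback must still accept it.
    quieter = audio * np.float32(0.5)
    # Assumption: samples are floats in [-1, 1], so clip defensively.
    return np.clip(quieter, -1.0, 1.0).astype(np.float32)


chunk = np.array([0.5, -1.0, 0.25], dtype=np.float32)
processed = halve_volume(chunk, 44100)
```

A function like this could then be passed as PlaybackOptions(audioPostProcessor=halve_volume). Since the input may have arbitrary length, the function should not assume a fixed chunk size.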
- class GenerationOptions(model_id: str | None = None, latencyOptimizationLevel: int = 0, speed: float | None = 1, stability: float | None = None, similarity_boost: float | None = None, style: float | None = None, use_speaker_boost: bool | None = None, model: Model | str | None = 'eleven_multilingual_v2', output_format: str = 'mp3_highest', seed: int | None = None, language_code: str | None = None, pronunciation_dictionaries: List[PronunciationDictionary] | None = None)
This class holds the options for TTS generation. If any option besides model_id and latencyOptimizationLevel is omitted, the stored value associated with the voice is used.
- Parameters:
model (Model|str, optional) – The TTS model (or its ID) to use for the generation. Defaults to multilingual v2.
latencyOptimizationLevel (int, optional) – The level of latency optimization (0-4) to apply. Defaults to 0.
stability (float, optional) – A float between 0 and 1 representing the stability of the generated audio. If omitted, the current stability setting is used.
similarity_boost (float, optional) – A float between 0 and 1 representing the similarity boost of the generated audio. If omitted, the current similarity boost setting is used.
style (float, optional) – A float between 0 and 1 representing how much focus should be placed on the text vs the associated audio data for the voice’s style, with 0 being all text and 1 being all audio.
use_speaker_boost (bool, optional) – Boost the similarity of the synthesized speech and the voice at the cost of some generation speed.
output_format (str, optional) – Output format for the audio. mp3_highest and pcm_highest will automatically use the highest quality of that format you have available.
pronunciation_dictionaries (List[PronunciationDictionary], optional) – The pronunciation dictionaries to apply to this request (max 3).
seed (int, optional) – The seed to use for this generation. Determinism is not guaranteed.
speed (float, optional) – The speed setting. Between 0.7 and 1.2. Defaults to 1.
language_code (str, optional) – An ISO 639-1 code, used to enforce a language for the model. Currently supported by turbo v2.5 only.
Warning
The style and use_speaker_boost parameters are only available on v2 models, and will be ignored for v1 models.
Setting style to higher than 0 and enabling use_speaker_boost will both increase latency.
output_format is currently ignored when using speech to speech.
Warning
Using pcm_highest and mp3_highest will cache the resulting quality for the user object. You can use user.update_audio_quality() to force an update.
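The numeric ranges listed above can be checked before a request is built. The helper below is a hypothetical sketch, not part of the library, and it is not known whether GenerationOptions performs similar validation itself. It only encodes the ranges stated in the parameter descriptions.

```python
from typing import List, Optional


def check_generation_ranges(
    latencyOptimizationLevel: int = 0,
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    style: Optional[float] = None,
    speed: Optional[float] = 1,
    dictionary_count: int = 0,
) -> List[str]:
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    if not 0 <= latencyOptimizationLevel <= 4:
        problems.append("latencyOptimizationLevel must be between 0 and 4")
    unit_range = (
        ("stability", stability),
        ("similarity_boost", similarity_boost),
        ("style", style),
    )
    for name, value in unit_range:
        # None means the value stored with the voice is used, so it is accepted.
        if value is not None and not 0 <= value <= 1:
            problems.append(f"{name} must be between 0 and 1")
    if speed is not None and not 0.7 <= speed <= 1.2:
        problems.append("speed must be between 0.7 and 1.2")
    if dictionary_count > 3:
        problems.append("at most 3 pronunciation dictionaries are allowed")
    return problems


issues = check_generation_ranges(stability=1.5, speed=2.0)
print(issues)
```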
- class WebsocketOptions(try_trigger_generation: bool = False, chunk_length_schedule: typing.List[int] = <factory>, enable_ssml_parsing: bool = False, inactivity_timeout: float | None = 20, sync_alignment: bool = False, auto_mode: bool = False)
This class holds the options for the websocket endpoint.
- Parameters:
chunk_length_schedule (list[int], optional) – Chunking schedule for generation. If you pass [50, 120, 500], the first audio chunk will be generated after receiving 50 characters, the second after 120 more (so 170 total), and the third and all later chunks after 500 more each. Defaults to [50], so audio is always generated as soon as possible.
try_trigger_generation (bool, optional) – Whether to try to generate a chunk of audio at more than 50 characters, regardless of the chunk_length_schedule. Defaults to False. Sent with every message, but can be overridden.
enable_ssml_parsing (bool, optional) – Whether to enable parsing of SSML tags, such as breaks or pronunciations. Increases latency. Defaults to False.
inactivity_timeout (float, optional) – The time in seconds to wait before closing the connection if no messages are sent. Defaults to 20.
sync_alignment (bool, optional) – Whether to include timing data with every audio chunk. Defaults to False.
auto_mode (bool, optional) – Reduces latency by disabling all buffers. It is ONLY recommended when sending full sentences or phrases. Defaults to False.
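The arithmetic behind chunk_length_schedule can be worked through with a short sketch. chunk_thresholds is a hypothetical helper, and it assumes that the last schedule entry keeps repeating once the schedule is exhausted, which is one reading of "the third onwards after 500".

```python
from itertools import islice
from typing import Iterator, List


def chunk_thresholds(schedule: List[int]) -> Iterator[int]:
    """Yield the cumulative character counts at which each chunk would be generated."""
    if not schedule:
        raise ValueError("schedule must contain at least one entry")
    total = 0
    for length in schedule:
        total += length
        yield total
    # Assumption: after the schedule is exhausted, its last entry repeats.
    while True:
        total += schedule[-1]
        yield total


first_five = list(islice(chunk_thresholds([50, 120, 500]), 5))
print(first_five)  # [50, 170, 670, 1170, 1670]
```

With the default schedule of [50], the same helper yields 50, 100, 150, and so on.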
- class StitchingOptions(previous_text: str | None = None, next_text: str | None = None, previous_request_ids: List[int | HistoryItem] | None = None, next_request_ids: List[int | HistoryItem] | None = None, auto_next_text: bool = False)
This class holds the options for request stitching and prompting.
- Parameters:
previous_text (str, optional) – Prompt which will be placed before the quoted text.
next_text (str, optional) – Prompt which will be placed after the quoted text.
previous_request_ids (list[int|HistoryItem], optional) – A list of request_ids or HistoryItems generated before this generation. Overrides previous_text.
next_request_ids (list[int|HistoryItem], optional) – A list of request_ids or HistoryItems generated after this generation. Overrides next_text.
auto_next_text (bool, optional) – Automatically appends a next_text appropriate for the prompt. Defaults to False. Disabled if next_text is provided.
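The precedence rules described above (request IDs override the matching text, and auto_next_text is disabled when next_text is provided) can be summarized in a small sketch. effective_stitching is a hypothetical helper that only mirrors the wording of the parameter descriptions. It uses plain strings as placeholder IDs and treats an empty ID list as not given, both of which are assumptions.

```python
from typing import List, Optional


def effective_stitching(
    previous_text: Optional[str] = None,
    next_text: Optional[str] = None,
    previous_request_ids: Optional[List[str]] = None,
    next_request_ids: Optional[List[str]] = None,
    auto_next_text: bool = False,
) -> dict:
    """Return the options that would actually take effect under the documented rules."""
    return {
        # Request IDs override the corresponding text prompt.
        "previous_text": None if previous_request_ids else previous_text,
        "next_text": None if next_request_ids else next_text,
        "previous_request_ids": previous_request_ids,
        "next_request_ids": next_request_ids,
        # auto_next_text only applies when no explicit next_text was given.
        "auto_next_text": auto_next_text and next_text is None,
    }


resolved = effective_stitching(
    previous_text="Before",
    previous_request_ids=["placeholder-id"],
    next_text="After",
    auto_next_text=True,
)
print(resolved)
```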
- class GenerationInfo(history_item_id: str | None = None, request_id: str | None = None, tts_latency_ms: str | None = None, transcript: list[str] | None = None, character_cost: int | None = None)
This contains the information returned regarding a (non-websocket) generation.
- class SFXOptions(duration_seconds: float | None = None, prompt_influence: float | None = None)
This contains the parameters for a sound effect generation.
- run_ai_speech_classifier(audioBytes: bytes)
Runs ElevenLabs’ AI speech classifier on the provided audio data.
- Parameters:
audioBytes (bytes) – The bytes of the audio file to analyze (mp3, wav; most formats should work).
- Returns:
Dict containing all the information returned by the tool (usually just the probability of it being AI generated)
- play_audio_v2(audioData: bytes | numpy.ndarray, playbackOptions: elevenlabslib.helpers.PlaybackOptions = PlaybackOptions(runInBackground=False, portaudioDeviceID=None, onPlaybackStart=<function PlaybackOptions.<lambda>>, onPlaybackEnd=<function PlaybackOptions.<lambda>>, audioPostProcessor=<function PlaybackOptions.<lambda>>), audioFormat: str | elevenlabslib.helpers.GenerationOptions = 'mp3_44100_128') -> OutputStream
Plays the given audio and calls the given functions.
- Parameters:
audioData (bytes|numpy.ndarray) – The audio data to play, either as bytes or as a float32 numpy array.
playbackOptions (PlaybackOptions, optional) – The playback options.
audioFormat (str|GenerationOptions, optional) – The format of audioData, using the same format strings as GenerationOptions. For anything other than mp3 (or a numpy array), the format must include the sample rate (like pcm_44100). Defaults to mp3_44100_128.
- Returns:
The OutputStream used for playback.
- save_audio_v2(audioData: bytes | ndarray, saveLocation: BinaryIO | str, outputFormat: str, inputFormat: str | GenerationOptions = 'mp3_44100_128') -> None
Saves the audio data to the specified location or file-like object. soundfile is used for the conversion, so any format soundfile supports can be used.
- Parameters:
audioData (bytes|numpy.ndarray) – The audio data.
saveLocation (str|BinaryIO) – The path (or file-like object) where the data will be saved.
outputFormat (str) – The format in which the audio will be saved (mp3/wav/ogg/etc).
inputFormat (str|GenerationOptions, optional) – The format of audioData, using the same format strings as GenerationOptions. For anything other than mp3, the format must include the sample rate (like pcm_44100). Defaults to mp3_44100_128.
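The format strings used here (such as mp3_44100_128 or pcm_44100) carry the sample rate as their second field. The sketch below shows one way such a string could be split. parse_audio_format is a hypothetical helper and is not the library's own parser.

```python
from typing import Optional, Tuple


def parse_audio_format(audio_format: str) -> Tuple[str, Optional[int]]:
    """Split a format string into (codec, sample_rate), e.g. 'pcm_44100' -> ('pcm', 44100)."""
    parts = audio_format.split("_")
    codec = parts[0]
    # The second field, when present and numeric, is read as the sample rate in Hz.
    if len(parts) > 1 and parts[1].isdigit():
        return codec, int(parts[1])
    # No numeric field (e.g. 'mp3_highest'): the sample rate is unknown.
    return codec, None


print(parse_audio_format("mp3_44100_128"))  # ('mp3', 44100)
print(parse_audio_format("pcm_44100"))      # ('pcm', 44100)
```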
- check_api_key(api_key: str | None) -> bool
Checks if the provided API key is valid by making a request to the ElevenLabs API.
- Parameters:
api_key (str) – The API key to check.
- Returns:
True if the API key is valid, False otherwise.
- Return type:
bool