Voice
- class Voice(voiceData, linkedUser: User)
Represents a voice in the ElevenLabs API.
It’s the parent class for all voices, and is used directly for the premade ones.
- static voiceFactory(voiceData, linkedUser: User) Voice | EditableVoice | ClonedVoice | ProfessionalVoice
Initializes a new instance of Voice or one of its subclasses depending on voiceData.
- Parameters:
voiceData – A dictionary containing the voice data.
linkedUser – An instance of the User class representing the linked user.
- Returns:
The voice object
- Return type:
Voice | EditableVoice | ClonedVoice | ProfessionalVoice
- update_data() dict
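The dispatch described for voiceFactory can be pictured with the minimal sketch below. It is not the library's implementation: the stub classes, the "category" key, and the category strings are all assumptions made purely for illustration.

```python
# Illustrative stand-ins only; these are NOT the elevenlabslib classes.
class Voice:
    def __init__(self, voiceData: dict, linkedUser: object):
        self.voiceData = voiceData
        self.linkedUser = linkedUser

class EditableVoice(Voice): ...
class ClonedVoice(EditableVoice): ...
class ProfessionalVoice(EditableVoice): ...

# Assumed mapping from a "category" field to a class (hypothetical values).
_CATEGORY_TO_CLASS = {
    "premade": Voice,
    "cloned": ClonedVoice,
    "professional": ProfessionalVoice,
}

def voice_factory_sketch(voiceData: dict, linkedUser: object) -> Voice:
    """Pick a class from the voice data, falling back to EditableVoice."""
    cls = _CATEGORY_TO_CLASS.get(voiceData.get("category"), EditableVoice)
    return cls(voiceData, linkedUser)

voice = voice_factory_sketch({"category": "cloned", "name": "demo"}, linkedUser=None)
print(type(voice).__name__)
```

Because every stub derives from Voice, callers of such a factory can treat the result uniformly and only check for a subclass when they need subclass-specific methods.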
Tip
I’ve only added specific getters for the most common attributes (name/description).
Use this function for all other metadata.
Calling it also refreshes all the properties of the voice (name, description, etc.).
- Returns:
A dict containing all the metadata for the voice, such as the name, the description, etc.
- Return type:
dict
- property category
This property indicates the “type” of the voice: whether it’s premade, cloned, designed, etc.
- property linkedUser
Note
This property can also be set. This is mostly in case some future update adds shared voices (beyond the currently available premade ones).
The user currently linked to the voice, whose API key will be used to generate audio.
- Returns:
The user linked to the voice.
- Return type:
User
- edit_settings(stability: float | None = None, similarity_boost: float | None = None, style: float | None = None, use_speaker_boost: bool | None = None, speed: float | None = None)
Note
If any argument is omitted, its current value will be used instead.
Edit the settings of the current voice.
- Parameters:
stability (float, optional) – The stability to set.
similarity_boost (float, optional) – The similarity boost to set.
style (float, optional) – The style to set (v2 models only).
use_speaker_boost (bool, optional) – Whether to enable the speaker boost (v2 models only).
speed (float, optional) – The speed to set (Between 0.8 and 1.2).
- Raises:
ValueError – If the provided values don’t fit the correct ranges.
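The "omitted arguments keep their current value" behaviour, together with the range check, can be sketched as follows. This is a hypothetical helper, not the library's code: merge_settings_sketch and the settings dict are assumptions, and only the speed range (0.8 to 1.2) is taken from the documentation above.

```python
def merge_settings_sketch(current: dict, **overrides) -> dict:
    """Return new settings; overrides that are None keep the current value."""
    merged = dict(current)
    for key, value in overrides.items():
        if value is not None:
            merged[key] = value
    # Only the documented speed range is checked in this sketch.
    speed = merged.get("speed")
    if speed is not None and not 0.8 <= speed <= 1.2:
        raise ValueError(f"speed must be between 0.8 and 1.2, got {speed}")
    return merged

current = {"stability": 0.5, "similarity_boost": 0.75, "speed": 1.0}
updated = merge_settings_sketch(current, stability=0.3, speed=None)
print(updated)
```

Using None as the "not provided" marker is what lets a caller change a single setting without having to look up and resend the others.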
- generate_audio_v3(prompt: str | bytes | BinaryIO, generation_options: GenerationOptions = GenerationOptions(latencyOptimizationLevel=0, speed=1, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, model='eleven_multilingual_v2', output_format='mp3_highest', seed=None, language_code=None, pronunciation_dictionaries=None), prompting_options: PromptingOptions | None = None, stitching_options: StitchingOptions = StitchingOptions(previous_text=None, next_text=None, previous_request_ids=None, next_request_ids=None, auto_next_text=False)) tuple[Future[bytes], Future[GenerationInfo] | None]
Generates speech for the given text prompt or input audio and returns a future for the audio file’s bytes, alongside an optional future for the GenerationInfo of the request.
Tip
If you would like to save the audio to disk or otherwise, you can use helpers.save_audio_bytes().
- Parameters:
prompt (str|bytes|BinaryIO) – The text prompt or audio bytes/file pointer to generate speech for.
generation_options (GenerationOptions) – Options for the audio generation such as the model to use and the voice settings.
stitching_options (StitchingOptions, optional) – Options for request stitching and pre/post text.
- Returns:
A future that will contain the bytes of the audio file once the generation is complete.
An optional future that will contain the GenerationInfo object for the generation.
- Return type:
tuple[Future[bytes], Optional[Future[GenerationInfo]]]
Note
If using PCM as the output_format, the returned audio bytes are in WAV format.
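A minimal sketch of consuming a return value of this shape, using only the standard library. The fake_generate function is a hypothetical stand-in that produces silent WAV bytes locally; it does not call ElevenLabs and is not part of the library. Only the tuple shape and the WAV note come from the documentation above.

```python
import io
import wave
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional

def _make_wav_bytes() -> bytes:
    """Build 0.1 s of silent 16-bit mono WAV so the sketch needs no network."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(16000)
        wav_file.writeframes(b"\x00\x00" * 1600)
    return buffer.getvalue()

def fake_generate() -> tuple[Future, Optional[Future]]:
    """Hypothetical stand-in returning the same tuple shape as the signature."""
    executor = ThreadPoolExecutor(max_workers=1)
    audio_future = executor.submit(_make_wav_bytes)
    executor.shutdown(wait=False)  # the submitted task still runs to completion
    return audio_future, None  # the second element may be None

audio_future, info_future = fake_generate()
audio_bytes = audio_future.result()  # blocks until the audio is ready
info = info_future.result() if info_future is not None else None

# WAV bytes can be inspected with the standard-library wave module.
with wave.open(io.BytesIO(audio_bytes), "rb") as wav_file:
    frame_rate = wav_file.getframerate()
    frame_count = wav_file.getnframes()
print(frame_rate, frame_count)
```

Checking the second element for None before calling result() matters, since the signature marks that future as optional.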
- stream_audio_v3(prompt: str | Iterator[str] | Iterator[dict] | AsyncIterator | bytes | BinaryIO, playback_options: PlaybackOptions = PlaybackOptions(runInBackground=False, portaudioDeviceID=None, onPlaybackStart=<function PlaybackOptions.<lambda>>, onPlaybackEnd=<function PlaybackOptions.<lambda>>, audioPostProcessor=<function PlaybackOptions.<lambda>>), generation_options: GenerationOptions = GenerationOptions(latencyOptimizationLevel=0, speed=1, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, model='eleven_multilingual_v2', output_format='mp3_highest', seed=None, language_code=None, pronunciation_dictionaries=None), websocket_options: WebsocketOptions = WebsocketOptions(try_trigger_generation=False, chunk_length_schedule=[125], enable_ssml_parsing=False, inactivity_timeout=20, sync_alignment=False, auto_mode=False), prompting_options: PromptingOptions | None = None, stitching_options: StitchingOptions = StitchingOptions(previous_text=None, next_text=None, previous_request_ids=None, next_request_ids=None, auto_next_text=False), disable_playback: bool = False) tuple[Queue[ndarray], Queue[str] | None, Future[OutputStream] | None, Future[GenerationInfo] | None]
Generate and stream audio from the given prompt (or str iterator).
- Parameters:
prompt (str|Iterator[str]|Iterator[dict]|bytes|BinaryIO) – The text prompt to generate audio from OR an iterator that returns multiple strings or dicts (for input streaming) OR the bytes/file pointer of an audio file.
playback_options (PlaybackOptions, optional) – Options for the audio playback such as the device to use and whether to run in the background.
generation_options (GenerationOptions, optional) – Options for the audio generation such as the model to use and the voice settings.
websocket_options (WebsocketOptions, optional) – Options for the websocket streaming. Ignored when websockets are not used.
stitching_options (StitchingOptions, optional) – Options for request stitching and pre/post prompting the audio.
disable_playback (bool, optional) – Allows you to disable playback altogether.
- Returns:
A queue containing the numpy audio data as float32 arrays.
An optional queue for audio transcripts.
An optional future for controlling the playback, returned if playback is not disabled.
An optional future containing a GenerationInfo with metadata about the audio generation.
- Return type:
tuple[queue.Queue[numpy.ndarray], Optional[queue.Queue[str]], Optional[Future[OutputStream]], Optional[Future[GenerationInfo]]]
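Since the prompt may be an iterator of strings for input streaming, one way to produce such an iterator is a small generator. The sketch below is an assumption about how a caller might chunk text; sentence_chunks is a hypothetical helper and not part of the library.

```python
import re
from typing import Iterator

def sentence_chunks(text: str) -> Iterator[str]:
    """Yield the text one sentence at a time, keeping trailing whitespace."""
    for match in re.finditer(r"[^.!?]+[.!?]*\s*", text):
        chunk = match.group(0)
        if chunk.strip():
            yield chunk

text = "Hello there. How are you? Fine!"
chunks = list(sentence_chunks(text))
print(chunks)
```

A generator like this could be passed as the prompt in place of a plain string; keeping the whitespace in each chunk means the pieces join back into the original text unchanged.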
- get_preview_url() str | None
- Returns:
The preview URL of the voice, or None if it hasn’t been generated.
- Return type:
str|None
- get_preview_bytes() bytes
- Returns:
The preview audio bytes.
- Return type:
bytes
- Raises:
RuntimeError – If no preview URL is available.
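The relationship between the two preview methods (a URL that may be None, and a bytes getter that raises when it is) can be sketched like this. Both preview_bytes_sketch and fake_fetch are hypothetical; the real method's download logic is not shown in this documentation.

```python
from typing import Callable, Optional

def preview_bytes_sketch(preview_url: Optional[str],
                         fetch: Callable[[str], bytes]) -> bytes:
    """Download the preview, or raise if the voice has no preview URL."""
    if preview_url is None:
        raise RuntimeError("No preview URL is available for this voice.")
    return fetch(preview_url)

def fake_fetch(url: str) -> bytes:
    """Stand-in for an HTTP download so the sketch runs offline."""
    return b"audio-from:" + url.encode()

data = preview_bytes_sketch("https://example.invalid/preview.mp3", fake_fetch)
print(len(data))
```

Callers who prefer not to handle the exception can check the preview URL for None first.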
- class EditableVoice(voiceData, linkedUser: User)
Bases: Voice
This class is shared by all voices whose details can be edited and which can be deleted from an account.
- edit_voice(newName: str | None = None, newLabels: dict[str, str] | None = None, description: str | None = None)
Edit the name, labels, or description of the voice.
- Parameters:
newName (str, optional) – The new name
newLabels (dict[str, str], optional) – The new labels
description (str, optional) – The new description
- delete_voice()
This function deletes the voice, and also sets the voiceID to be empty.
- class DesignedVoice(voiceData, linkedUser: User)
Bases: EditableVoice
Represents a voice created via voice design.
Returns the share link for the voice.
Warning
If sharing is disabled, raises a RuntimeError.
- Returns:
The share link for the voice.
- class ClonedVoice(voiceData, linkedUser: User)
Bases: EditableVoice
Represents a voice created via instant voice cloning.
- get_samples() list[Sample]
- Returns:
The samples that make up this voice clone.
- Return type:
list[Sample]
- add_samples_by_path(samples: list[str] | str)
This function adds samples to the current voice by their file paths.
- Parameters:
samples (list[str]|str) – A list with the file paths to the audio files or a str containing a single path.
- Raises:
ValueError – If no samples are provided.
- add_samples_bytes(samples: dict[str, bytes])
This function adds samples to the current voice by their file names and bytes.
- Parameters:
samples (dict[str, bytes]) – A dictionary of audio file names and their respective bytes.
- Raises:
ValueError – If no samples are provided.
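The two sample-adding methods accept different shapes: one or more file paths, or a mapping of file names to bytes. A hedged sketch of turning the first shape into the second is shown below. load_samples_sketch is a hypothetical helper, not the library's code, and the demo file contains placeholder bytes rather than real audio.

```python
import os
import tempfile
from typing import Union

def load_samples_sketch(samples: Union[list, str]) -> dict:
    """Normalize a path or list of paths into {file name: file bytes}."""
    paths = [samples] if isinstance(samples, str) else list(samples)
    if not paths:
        raise ValueError("At least one sample is required.")
    loaded = {}
    for path in paths:
        with open(path, "rb") as sample_file:
            loaded[os.path.basename(path)] = sample_file.read()
    return loaded

# Demo with a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.mp3")
    with open(sample_path, "wb") as demo_file:
        demo_file.write(b"not real audio")
    loaded = load_samples_sketch(sample_path)
print(sorted(loaded))
```

Wrapping a lone string in a list up front keeps the rest of the function identical for both input shapes.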
- class ProfessionalVoice(voiceData, linkedUser: User)
Bases: EditableVoice
Represents a voice created via professional voice cloning.