Voice

Caution

Do not initialize these classes yourself! Just use the methods from ElevenLabsUser to get voices. If you really know what you’re doing, and still want to do it, use the voiceFactory method instead of the constructors.
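The recommended flow is a minimal sketch like the following. `ElevenLabsUser` comes from the library itself; the `get_voices_by_name` helper name is an assumption about its API, so check your installed version:

```python
# Sketch: obtain Voice objects through ElevenLabsUser rather than calling the
# constructors directly. The import is deferred inside the function so this
# file loads even if elevenlabslib is not installed.
def get_voice_by_name(api_key: str, name: str):
    from elevenlabslib import ElevenLabsUser  # requires elevenlabslib
    user = ElevenLabsUser(api_key)
    matches = user.get_voices_by_name(name)  # assumed helper; returns a list of Voice objects
    if not matches:
        raise ValueError(f"No voice named {name!r} found on this account.")
    return matches[0]
```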

class Voice(voiceData, linkedUser: User)

Represents a voice in the ElevenLabs API.

It’s the parent class for all voices, and used directly for the premade ones.

static edit_stream_settings(playbackBlockSize=None, downloadChunkSize=None) None

This function lets you override the default values used for the streaming function.

Danger

This change affects all voices.

Please only do this if you know what you’re doing.

Parameters:
  • playbackBlockSize (int) – The size (in frames) of the blocks used for playback.

  • downloadChunkSize (int) – The size (in bytes) of the chunks to be downloaded.
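Since the override is static and process-wide, a sketch like this would typically run once at startup (assuming `Voice` is importable from the package's top level; the chosen sizes are illustrative, not recommendations):

```python
# Sketch: raise the playback block size and download chunk size before any
# streaming call. This affects every Voice instance in the process.
def tune_streaming(block_frames: int = 2048, chunk_bytes: int = 8192) -> None:
    from elevenlabslib import Voice  # deferred import; requires elevenlabslib
    Voice.edit_stream_settings(
        playbackBlockSize=block_frames,   # frames per playback block
        downloadChunkSize=chunk_bytes,    # bytes per downloaded chunk
    )
```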

static voiceFactory(voiceData, linkedUser: User) Voice | EditableVoice | ClonedVoice | ProfessionalVoice

Initializes a new instance of Voice or one of its subclasses depending on voiceData.

Parameters:
  • voiceData – A dictionary containing the voice data.

  • linkedUser – An instance of the User class representing the linked user.

Returns:

The voice object

Return type:

Voice | DesignedVoice | ClonedVoice | ProfessionalVoice

update_data() dict

Tip

I’ve only added specific getters for the most common attributes (name/description).

Use this function for all other metadata.

Calling it also refreshes all the properties of the voice (name, description, etc.).

Returns:

A dict containing all the metadata for the voice, such as the name, the description, etc.

Return type:

dict
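A small usage sketch, where `voice` is assumed to be a Voice obtained through ElevenLabsUser and the `labels` key is an assumption about the metadata dict's contents:

```python
# Sketch: refresh a voice's metadata and build a one-line summary from it.
def describe_voice(voice) -> str:
    data = voice.update_data()            # re-fetches metadata and refreshes properties
    labels = data.get("labels", {})
    return f"{data.get('name')} ({voice.category}): {labels}"
```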

property category

This property indicates the “type” of the voice: premade, cloned, designed, etc.

property linkedUser

Note

This property can also be set. This is mostly in case some future update adds shared voices (beyond the currently available premade ones).

The user currently linked to the voice, whose API key will be used to generate audio.

Returns:

The user linked to the voice.

Return type:

User

edit_settings(stability: float | None = None, similarity_boost: float | None = None, style: float | None = None, use_speaker_boost: bool | None = None)

Note

If any argument is omitted, its current value will be kept.

Edit the settings of the current voice.

Parameters:
  • stability (float, optional) – The stability to set.

  • similarity_boost (float, optional) – The similarity boost to set.

  • style (float, optional) – The style to set (v2 models only).

  • use_speaker_boost (bool, optional) – Whether to enable the speaker boost (v2 models only).

Raises:

ValueError – If the provided values don’t fit the correct ranges.
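The [0, 1] range below is an assumption mirroring the ValueError documented above; this local check sketches the kind of validation you might run before hitting the API:

```python
# Sketch: pre-validate voice settings locally, assuming stability,
# similarity_boost, and style are all expected to fall within [0, 1].
def validate_voice_settings(stability=None, similarity_boost=None, style=None):
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
```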

generate_audio_v3(prompt: str | bytes | BinaryIO, generation_options: GenerationOptions = GenerationOptions(latencyOptimizationLevel=0, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, model='eleven_monolingual_v1', output_format='mp3_highest', forced_pronunciations=None), prompting_options: PromptingOptions | None = None) tuple[Future[bytes], Future[GenerationInfo] | None]

Generates speech for the given text prompt or audio input and returns a future containing the audio bytes, alongside an optional future with the generation metadata (including the new historyID).

Tip

If you would like to save the audio to disk or otherwise, you can use helpers.save_audio_bytes().

Parameters:
  • prompt (str|bytes|BinaryIO) – The text prompt or audio bytes/file pointer to generate speech for.

  • generation_options (GenerationOptions) – Options for the audio generation such as the model to use and the voice settings.

  • prompting_options (PromptingOptions) – Options for pre/post prompting the audio, for improved emotion. Ignored for speech to speech.

Returns:

  • A future that will contain the bytes of the audio file once the generation is complete.

  • An optional future that will contain the GenerationInfo object for the generation.

Return type:

tuple[Future[bytes], Optional[Future[GenerationInfo]]]

Note

If using PCM as the output_format, the return audio bytes are a WAV.
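A usage sketch, blocking on the returned future and saving to disk. The exact parameter names of `save_audio_bytes` are assumptions about the helper's signature:

```python
# Sketch: generate speech for a text prompt and write the result to a file
# once the generation future resolves.
def generate_and_save(voice, text: str, path: str) -> None:
    from elevenlabslib.helpers import save_audio_bytes  # deferred import
    audio_future, info_future = voice.generate_audio_v3(text)
    audio = audio_future.result()          # blocks until generation completes
    save_audio_bytes(audio, path, outputFormat="mp3")  # parameter name assumed
```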

stream_audio_v3(prompt: str | ~typing.Iterator[str] | ~typing.Iterator[dict] | ~typing.AsyncIterator | bytes | ~typing.BinaryIO, playback_options: ~elevenlabslib.helpers.PlaybackOptions = PlaybackOptions(runInBackground=False, portaudioDeviceID=None, onPlaybackStart=<function PlaybackOptions.<lambda>>, onPlaybackEnd=<function PlaybackOptions.<lambda>>, audioPostProcessor=<function PlaybackOptions.<lambda>>), generation_options: ~elevenlabslib.helpers.GenerationOptions = GenerationOptions(latencyOptimizationLevel=0, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, model='eleven_monolingual_v1', output_format='mp3_highest', forced_pronunciations=None), websocket_options: ~elevenlabslib.helpers.WebsocketOptions = WebsocketOptions(try_trigger_generation=False, chunk_length_schedule=[125], enable_ssml_parsing=False, buffer_char_length=-1), prompting_options: ~elevenlabslib.helpers.PromptingOptions | None = None, disable_playback: bool = False) tuple[Queue[ndarray], Queue[str] | None, Future[OutputStream] | None, Future[GenerationInfo] | None]

Generate and stream audio from the given prompt (or str iterator).

Parameters:
  • prompt (str|Iterator[str]|Iterator[dict]|bytes|BinaryIO) – The text prompt to generate audio from OR an iterator that returns multiple strings or dicts (for input streaming) OR the bytes/file pointer of an audio file.

  • playback_options (PlaybackOptions, optional) – Options for the audio playback such as the device to use and whether to run in the background.

  • generation_options (GenerationOptions, optional) – Options for the audio generation such as the model to use and the voice settings.

  • websocket_options (WebsocketOptions, optional) – Options for the websocket streaming. Ignored when not using websockets.

  • prompting_options (PromptingOptions, optional) – Options for pre/post prompting the audio, for improved emotion. Ignored for input streaming and STS.

  • disable_playback (bool, optional) – Allows you to disable playback altogether.

Returns:

  • A queue containing the numpy audio data as float32 arrays.

  • An optional queue for audio transcripts, populated if websocket streaming is used.

  • An optional future for controlling the playback, returned if playback is not disabled.

  • An optional future containing a GenerationInfo with metadata about the audio generation.

Return type:

tuple[queue.Queue[numpy.ndarray], Optional[queue.Queue[str]], Optional[Future[OutputStream]], Optional[Future[GenerationInfo]]]
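When playback is disabled you consume the audio queue yourself. This sketch assumes the queue yields float32 numpy chunks terminated by a None sentinel (the termination convention is an assumption):

```python
import queue

import numpy as np

# Sketch: drain the audio queue returned by stream_audio_v3 into one array.
def drain_audio_queue(audio_queue: "queue.Queue") -> np.ndarray:
    chunks = []
    while True:
        chunk = audio_queue.get()
        if chunk is None:          # assumed end-of-stream sentinel
            break
        chunks.append(chunk)
    return np.concatenate(chunks) if chunks else np.empty(0, dtype=np.float32)
```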

get_preview_url() str | None
Returns:

The preview URL of the voice, or None if it hasn’t been generated.

Return type:

str|None

get_preview_bytes() bytes
Returns:

The preview audio bytes.

Return type:

bytes

Raises:

RuntimeError – If no preview URL is available.

class EditableVoice(voiceData, linkedUser: User)

Bases: Voice

This class is shared by all the voices which can have their details edited and be deleted from an account.

edit_voice(newName: str | None = None, newLabels: dict[str, str] | None = None, description: str | None = None)

Edit the name/labels of the voice.

Parameters:
  • newName (str) – The new name.

  • newLabels (dict[str, str]) – The new labels.

  • description (str) – The new description.
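A small sketch, where `voice` is assumed to be any EditableVoice subclass (designed, cloned, or professional):

```python
# Sketch: rename a voice and replace its labels in one call.
def rename_and_tag(voice, new_name: str, tags: dict) -> None:
    voice.edit_voice(newName=new_name, newLabels=tags)
```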

delete_voice()

This function deletes the voice and sets its voiceID to an empty string.

class DesignedVoice(voiceData, linkedUser: User)

Bases: EditableVoice

Represents a voice created via voice design.

get_share_link() str

Returns the share link for the voice.

Warning

If sharing is disabled, raises a RuntimeError.

Returns:

The share link for the voice.

class ClonedVoice(voiceData, linkedUser: User)

Bases: EditableVoice

Represents a voice created via instant voice cloning.

get_samples() list[Sample]
Returns:

The samples that make up this voice clone.

Return type:

list[Sample]

add_samples_by_path(samples: list[str] | str)

This function adds samples to the current voice by their file paths.

Parameters:

samples (list[str]|str) – A list with the file paths to the audio files or a str containing a single path.

Raises:

ValueError – If no samples are provided.
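A batching sketch: gather every .mp3 in a directory and add them in one call. The `.mp3`-only filter is an arbitrary choice for illustration:

```python
import glob
import os

# Sketch: add all .mp3 files in a directory to a ClonedVoice at once.
def add_samples_from_dir(voice, directory: str) -> int:
    paths = sorted(glob.glob(os.path.join(directory, "*.mp3")))
    if paths:
        voice.add_samples_by_path(paths)
    return len(paths)
```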

add_samples_bytes(samples: dict[str, bytes])

This function adds samples to the current voice by their file names and bytes.

Parameters:

samples (dict[str, bytes]) – A dictionary of audio file names and their respective bytes.

Raises:

ValueError – If no samples are provided.

class ProfessionalVoice(voiceData, linkedUser: User)

Bases: EditableVoice

Represents a voice created via professional voice cloning.

get_samples() list[Sample]

Caution

There is an API bug here. The /voices/voiceID endpoint does not correctly return sample data for professional cloning voices.

Returns:

The samples that make up this professional voice clone.

Return type:

list[Sample]