voice-cloning - In-Depth Technical Topic Analysis

CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

python deep-learning tensorflow pytorch tts voice-cloning

Updated Mar 9, 2026
Python

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

text-to-speech tts voice-cloning vits voice-clone voice-cloneai

Updated Apr 30, 2026
Python

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

python text-to-speech deep-learning speech pytorch tts speech-synthesis voice-conversion vocoder voice-synthesis tacotron voice-cloning speaker-encodings melgan speaker-encoder multi-speaker-tts glow-tts hifigan tts-model

Updated Aug 16, 2024
Python

OpenBMB / VoxCPM

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

audio multilingual python text-to-speech speech pytorch tts speech-synthesis deeplearning voice-cloning voice-design tts-model minicpm voxcpm

Updated May 22, 2026
Python

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model with full-stack support for inference, training, and deployment.

python text-to-speech japanese chatbot multi-lingual tts english chinese korean cantonese natural-language-generation cross-lingual fine-grained fine-tuning voice-cloning audio-generation chatgpt gpt-4o cosyvoice

Updated May 25, 2026
Python

DrewThomasson / ebook2audiobook

Generate audiobooks from e-books, voice cloning & 1158+ languages!

multilingual windows linux docker mac kaggle audiobook tts english epub chinese gradio audiobooks colab-notebook voice-cloning xtts

Updated May 29, 2026
Python

Huanshere / VideoLingo

Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team

localization dubbing video-translation voice-cloning ai-translation

Updated Mar 24, 2026
Python

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Updated May 30, 2026
Python

abus-aikorea / voice-pro

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

text-to-speech translator audiobook podcasts tts speech-synthesis subtitles speech-recognition webui speech-to-text karaoke transcription gradio whisper voice-conversion voice-cloning yt-dlp faster-whisper whisperx

Updated Dec 5, 2025
Python

multimodal-art-projection / YuE

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

ai deep-learning llama gpt music-generation voice-cloning huggingface style-transfers audio-generation foundation-models llms

Updated Jun 4, 2025
Python

debpalash / OmniVoice-Studio

The open-source ElevenLabs alternative: a desktop app for local voice cloning, design, creation, dubbing, and dictation

text-to-speech self-hosted tts speech-recognition speech-to-text transcription video-editing asr dubbing voice-cloning voice-generation voice-ai elevenlabs local-ai dubbing-ai omnivoice omnivoice-studio

Updated Jun 4, 2026
Python

IAHispano / Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

text-to-speech ai voice speech pytorch tts rvc voice-conversion vc voice-cloning speech-to-speech vits voice-clone applio

Updated May 30, 2026
Python

OpenMOSS / MOSS-TTS

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.

audio text-to-speech multimodal voice-cloning llm audio-tokenizer

Updated Jun 4, 2026
Python

Camb-ai / MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

text-to-speech speech speech-synthesis prosody voice-cloning voice-cloneai

Updated Aug 1, 2024
Jupyter Notebook

High-Logic / Genie-TTS

GPT-SoVITS ONNX Inference Engine & Model Converter

text-to-speech tts voice-cloning vits voice-clone gpt-sovits

Updated Apr 18, 2026
Python

MiniMax-AI / MiniMax-MCP

Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.

text-to-speech mcp image-generation text-to-image video-generation image-to-video voice-cloning text-to-video mcp-server mcp-tools

Updated May 21, 2026
Python

Enemyx-net / VibeVoice-ComfyUI

A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.

text-to-speech tts voice-cloning ai-voice voice-generation ai-audio t2s ai-tts ai-voice-clone ai-voice-clonining voice-generator comfyui-nodes comfyui-custom-node comfyui-custom-nodes-text-to-speech vibevoice vibevoice-microsoft

Updated Feb 18, 2026
Python

voice-cloning-app / Voice-Cloning-App

A Python/PyTorch app for easily synthesising human voices

python text-to-speech deep-learning pytorch tts voice-cloning tacotron2

Updated Dec 2, 2024
Python

coqui-ai / open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

text-to-speech tts speech-synthesis voice-recognition speech-recognition speech-to-text stt speech-processing voice-activity-detection speech-separation speech-emotion-recognition voice-cloning

Updated Jun 6, 2024

devnen / Chatterbox-TTS-Server

Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.

python text-to-speech ai cuda web-ui api-server pytorch tts speech-synthesis rocm chatterbox speech-synthesis-api tts-api voice-cloning fastapi huggingface openai-api audio-generation chatterbox-tts

Updated May 26, 2026
Python

Here are 673 public repositories matching this topic...