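1. Introduction
CosyVoice is an innovative model from Alibaba for natural speech generation with control over language, timbre, and emotion. It performs well at multilingual speech generation, zero-shot speech generation, cross-lingual voice cloning, and instruction following.
GitHub: https://github.com/FunAudioLLM/CosyVoice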
2. Deployment
- Download the project files
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
- Enter the project directory and make sure the repository contains all required code and dependencies (submodules)
cd CosyVoice
git submodule update --init --recursive
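As a quick sanity check on the step above, the sketch below lists submodule directories that are declared in .gitmodules but still missing or empty on disk. The helper names (submodule_paths, missing_submodules) are illustrative and not part of CosyVoice, and the sketch assumes the usual `path = ...` entries in .gitmodules.

```python
from pathlib import Path


def submodule_paths(gitmodules_text):
    """Extract every `path = ...` value from the text of a .gitmodules file."""
    paths = []
    for line in gitmodules_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "path":
            paths.append(value.strip())
    return paths


def missing_submodules(repo_root="."):
    """Return declared submodule paths that are absent or empty on disk."""
    root = Path(repo_root)
    gitmodules = root / ".gitmodules"
    if not gitmodules.is_file():
        return []  # nothing declared, nothing to check
    declared = submodule_paths(gitmodules.read_text(encoding="utf-8"))
    return [
        p for p in declared
        if not (root / p).is_dir() or not any((root / p).iterdir())
    ]
```

Called from inside the CosyVoice directory, an empty list from missing_submodules() suggests the submodule checkout completed. Any path it returns is a hint to rerun git submodule update --init --recursive.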
- Create and activate a conda environment (the conda command requires Anaconda to be installed; this is covered in the post "Deep-Live-Cam Deployment – xlblog (xlweb.top)")
conda create -n cosyvoice python=3.8
conda activate cosyvoice
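To confirm that later commands really run inside the environment created above, a small check like the following can help. It is only a sketch: environment_problems is a made-up helper name, and it assumes conda exposes the active environment name through the CONDA_DEFAULT_ENV variable, which it normally sets on activation.

```python
import os
import sys


def environment_problems(version_info=sys.version_info, environ=os.environ):
    """Return human-readable mismatches against the environment created above."""
    problems = []
    # The environment in this guide is created with python=3.8.
    if tuple(version_info[:2]) != (3, 8):
        problems.append(
            "expected Python 3.8, found {}.{}".format(version_info[0], version_info[1])
        )
    # Assumption: conda sets CONDA_DEFAULT_ENV to the active environment name.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != "cosyvoice":
        problems.append("expected conda env 'cosyvoice', found {!r}".format(active))
    return problems
```

Calling environment_problems() with no arguments inspects the running interpreter. An empty list means both the Python version and the environment name match this guide.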
- Install pynini
conda install -y -c conda-forge pynini==2.1.5
# If the install fails because of an unstable network, proxy settings, or a temporarily unavailable Conda server, configure a mirror in China and install again
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --set show_channel_urls yes
conda install -y -c conda-forge pynini==2.1.5
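The try-then-fall-back flow above can also be scripted. The following is a minimal sketch that reuses exactly the commands from this step. The function names and the returned status strings are illustrative, and it assumes conda can be found on PATH.

```python
import shutil
import subprocess

INSTALL_CMD = ["conda", "install", "-y", "-c", "conda-forge", "pynini==2.1.5"]
MIRROR_CMDS = [
    ["conda", "config", "--add", "channels",
     "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/"],
    ["conda", "config", "--set", "show_channel_urls", "yes"],
]


def run_command(cmd):
    """Run one command and report success; assumes `conda` is on PATH."""
    executable = shutil.which(cmd[0])
    if executable is None:
        return False  # conda not found, treat as a failed command
    return subprocess.run([executable] + cmd[1:]).returncode == 0


def install_pynini(run=run_command):
    """Try the plain install first; configure the mirror and retry on failure."""
    if run(INSTALL_CMD):
        return "installed"
    for cmd in MIRROR_CMDS:
        if not run(cmd):
            return "mirror setup failed"
    return "installed after mirror" if run(INSTALL_CMD) else "failed"
```

The run parameter exists so the flow can be exercised with a stub instead of touching the real conda installation. Calling install_pynini() without arguments runs the actual commands.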
- Install the project dependencies
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
- Download the models
- Download with Python: create a Python file with the following contents and run it
# Download the models with the ModelScope SDK
from modelscope import snapshot_download
snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
- Or download with git (make sure git is installed first; this is covered in the post "Deep-Live-Cam Deployment – xlblog (xlweb.top)")
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd
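Whichever download method is used, the four model directories should end up under pretrained_models. The sketch below reports any that are missing or empty and rebuilds the matching git clone command from this step. The helper names are illustrative, and a non-empty directory is only a rough signal, not proof that the download is complete.

```python
from pathlib import Path

MODEL_NAMES = [
    "CosyVoice-300M",
    "CosyVoice-300M-SFT",
    "CosyVoice-300M-Instruct",
    "CosyVoice-ttsfrd",
]


def missing_models(base_dir="pretrained_models"):
    """Return model names whose directory is absent or empty under base_dir."""
    base = Path(base_dir)
    missing = []
    for name in MODEL_NAMES:
        target = base / name
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(name)
    return missing


def clone_command(name, base_dir="pretrained_models"):
    """Build the git clone command used in this step for one model."""
    return [
        "git", "clone",
        "https://www.modelscope.cn/iic/{}.git".format(name),
        "{}/{}".format(base_dir, name),
    ]
```

For example, printing " ".join(clone_command(name)) for each name returned by missing_models() gives the commands that still need to be run.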
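- Create the launch files: create three files with the contents shown below and change each file's extension to .bat. (The four features map onto three different models, so the web UI is started separately with each model. Saving the commands as .bat files is convenient, because a double-click is enough to launch one.)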
- Generation with built-in voices:
@echo off
call conda activate cosyvoice
start http://127.0.0.1:50000
python webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M-SFT
pause
- Built-in voices with tone fine-tuning:
@echo off
call conda activate cosyvoice
start http://127.0.0.1:50002
python webui.py --port 50002 --model_dir pretrained_models/CosyVoice-300M-Instruct
pause
- Generation with a cloned voice:
@echo off
call conda activate cosyvoice
start http://127.0.0.1:50001
python webui.py --port 50001 --model_dir pretrained_models/CosyVoice-300M
pause
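The three .bat files differ only in port and model directory, so the same idea can be sketched as one Python launcher for systems where .bat files are not an option. The mode names ("sft", "instruct", "clone") and function names are made up for this example. The ports, model paths, and webui.py flags are the ones from the files above. It assumes the script runs from the CosyVoice directory with the cosyvoice environment already active.

```python
import subprocess
import sys
import webbrowser

# mode -> (model directory, port), copied from the three .bat files
LAUNCH_CONFIGS = {
    "sft": ("pretrained_models/CosyVoice-300M-SFT", 50000),
    "clone": ("pretrained_models/CosyVoice-300M", 50001),
    "instruct": ("pretrained_models/CosyVoice-300M-Instruct", 50002),
}


def url_for(mode):
    """Return the local web UI address for a mode."""
    return "http://127.0.0.1:{}".format(LAUNCH_CONFIGS[mode][1])


def build_command(mode, python=sys.executable):
    """Build the same webui.py command line that the .bat file runs."""
    model_dir, port = LAUNCH_CONFIGS[mode]
    return [python, "webui.py", "--port", str(port), "--model_dir", model_dir]


def launch(mode):
    """Open the browser first, as the .bat files do, then start the web UI."""
    webbrowser.open(url_for(mode))
    return subprocess.call(build_command(mode))
```

Calling launch("sft") corresponds to the first .bat file. As with the .bat files, the browser tab opens before the server is ready, so the page may need a refresh once the web UI has finished loading.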