# Speech Synthesis and Recognition Service: Docker Deployment Guide

## 1. Introduction

The speech synthesis and recognition service (hereafter "the service") provides high-quality text-to-speech (TTS) and speech-to-text (ASR) functionality. To make it easy to deploy and use, we provide a Docker-based deployment, which this document describes step by step.

The service is built on PaddleSpeech and exposes TTS and ASR endpoints. It is a CPU-only service and can be deployed and used quickly via Docker.

## 2. Prerequisites

Before deploying, make sure your environment meets the following requirements:

1. Docker and Docker Compose are installed.
2. You are familiar with basic Linux command-line operations.
3. The server has sufficient compute resources (CPU, memory, and disk space); no GPU is required.

## 3. Getting the Docker Image

A pre-built Docker image is hosted on Qiniu Cloud at:

```
https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/paddlespeech.tar
```

## 4. Online Deployment

### Run the one-click installation script

```bash
curl -fsSL https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/install-paddlespeech.sh | sudo bash
```

As shown below:

![](https://qncdn.tairongkj.com/docs/images/20250921094451.png)

![](https://qncdn.tairongkj.com/docs/images/20250921095852.png)

### First run

The first time the TTS or ASR service is used, it automatically downloads its model files. The files are large and ***the download takes roughly 10 minutes***; please be patient.

You can follow the logs in real time with:

```bash
docker logs -f paddlespeech
```

On first use, the TTS service automatically downloads about 2 GB of model files from the network and starts once the download completes. The models are not downloaded again on subsequent runs.

![](https://qncdn.tairongkj.com/docs/images/20250921093139.png)

On first use, the ASR service automatically downloads about 1 GB of model files from the network and starts once the download completes. The models are not downloaded again on subsequent runs.

![](https://qncdn.tairongkj.com/docs/images/20250921093510.png)

## 5. Offline Deployment

### Files required for offline deployment

1. Docker image file: [download](https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/paddlespeech.tar)
2. Data volume files. Download the following archives and copy them to a temporary directory on the target machine, into a `data` subdirectory next to `install-paddlespeech-localh.sh` (a download sketch follows the directory listing below):
   - [home_dtuser_opt_paddlespeech_data_logs.tar.gz](https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/data/home_dtuser_opt_paddlespeech_data_logs.tar.gz)
   - [home_dtuser_opt_paddlespeech_data_nltk_data.tar.gz](https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/data/home_dtuser_opt_paddlespeech_data_nltk_data.tar.gz)
   - [home_dtuser_opt_paddlespeech_data_output.tar.gz](https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/data/home_dtuser_opt_paddlespeech_data_output.tar.gz)
   - [home_dtuser_opt_paddlespeech_data_paddlenlp_models.tar.gz](https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/data/home_dtuser_opt_paddlespeech_data_paddlenlp_models.tar.gz)
   - [home_dtuser_opt_paddlespeech_data_paddlespeech_models.tar.gz](https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/data/home_dtuser_opt_paddlespeech_data_paddlespeech_models.tar.gz)
3. Offline installation script: [download](https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/install-paddlespeech-localh.sh)
4. Demo web page for testing: [download](https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/tts_demo.zip)

***The offline package has two parts: the container image and the data volume files. Before the container is installed and started, the directories bound to its data volumes are restored from these archives, so the container does not need to download any models and can be used without Internet access.***

### 1. Download the Docker image and data volume files

In the example below the files are placed under `down` in the user's home directory. Keep the directory structure exactly as shown:

```bash
tst@tst-VMware-Virtual-Platform:~/down$ pwd
/home/tst/down
tst@tst-VMware-Virtual-Platform:~/down$ ls
data  install-paddlespeech-localh.sh
tst@tst-VMware-Virtual-Platform:~/down$ tree
.
├── data
│   ├── home_dtuser_opt_paddlespeech_data_logs.tar.gz
│   ├── home_dtuser_opt_paddlespeech_data_nltk_data.tar.gz
│   ├── home_dtuser_opt_paddlespeech_data_output.tar.gz
│   ├── home_dtuser_opt_paddlespeech_data_paddlenlp_models.tar.gz
│   └── home_dtuser_opt_paddlespeech_data_paddlespeech_models.tar.gz
└── install-paddlespeech-localh.sh

2 directories, 6 files
```

![](https://qncdn.tairongkj.com/docs/images/20250921171328.png)
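To save a few manual steps, the files listed above can be fetched straight from the download URLs with `wget`. This is only a convenience sketch: it assumes `wget` is available and uses `~/down` as the working directory so that the resulting layout matches the listing above; adjust paths as needed.

```bash
#!/usr/bin/env bash
# Sketch: download the offline installation files into ~/down (layout as shown above).
set -e
BASE=https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1

mkdir -p ~/down/data
cd ~/down

# Offline installation script
wget "$BASE/install-paddlespeech-localh.sh"

# Data volume archives
for f in logs nltk_data output paddlenlp_models paddlespeech_models; do
    wget -P data "$BASE/data/home_dtuser_opt_paddlespeech_data_${f}.tar.gz"
done

# The Docker image archive (item 1 above) can be fetched the same way if you need it locally:
# wget "$BASE/paddlespeech.tar"
```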
### 2. Make the installation script executable

```bash
chmod +x install-paddlespeech-localh.sh
```

### 3. Run the installation script

```bash
sudo ./install-paddlespeech-localh.sh
```

The installation script first extracts the archives and loads the Docker image, then starts the container. Once it has finished, you can check the running state with `docker ps`. Example run:

```bash
tst@tst-VMware-Virtual-Platform:~/down$ sudo ./install-paddlespeech-localh.sh
[INFO] 开始 PaddleSpeech 离线安装...
[SUCCESS] 磁盘空间检查通过(55GB)
[INFO] 检测网络连接...
[SUCCESS] 网络连接正常
[INFO] 创建目录结构...
[SUCCESS] 目录结构创建完成
[INFO] 解压数据卷...
[INFO] 正在解压: /home/tst/down/data/home_dtuser_opt_paddlespeech_data_logs.tar.gz 到 /dt_opt/paddlespeech/data/logs
./
[INFO] 正在解压: /home/tst/down/data/home_dtuser_opt_paddlespeech_data_nltk_data.tar.gz 到 /dt_opt/paddlespeech/data/nltk_data
./
./taggers/
./taggers/averaged_perceptron_tagger/
./taggers/averaged_perceptron_tagger/averaged_perceptron_tagger.pickle
./taggers/averaged_perceptron_tagger.zip
./corpora/
./corpora/cmudict/
./corpora/cmudict/cmudict
./corpora/cmudict/README
./corpora/cmudict.zip
[INFO] 正在解压: /home/tst/down/data/home_dtuser_opt_paddlespeech_data_output.tar.gz 到 /dt_opt/paddlespeech/data/output
./
./2753d658-b228-4ae6-94fc-3404a41f4526.wav
./9f24d71e-9f29-4d60-aa13-677fd19bba46.wav
./38ea94c6-e19b-4e41-8cca-454d8e374a15.wav
./d44aa891-c36b-473f-aef3-44484f0cd74b.wav
./c5ef53d3-502b-4f81-9fdd-583836ba7096.wav
./3931cd47-e022-4385-b588-4a61e64e5eed.wav
./1d213380-eee6-43ba-88db-99ba9b5609df.wav
./c85f65de-a4c1-43aa-a251-52ce4e9df271.wav
./7cd5fc65-665b-444c-9c9f-41f1ce8036f8.wav
./f74dbda7-1cf8-4056-bd30-523b74af8b0e.wav
[INFO] 正在解压: /home/tst/down/data/home_dtuser_opt_paddlespeech_data_paddlenlp_models.tar.gz 到 /dt_opt/paddlespeech/data/paddlenlp_models
./
./packages/
./models/
./models/bert-base-chinese/
./models/bert-base-chinese/bert-base-chinese-vocab.txt
./models/bert-base-chinese/tokenizer_config.json
./models/bert-base-chinese/vocab.txt
./models/bert-base-chinese/special_tokens_map.json
./models/embeddings/
./models/.locks/
./models/.locks/bert-base-chinese/
./datasets/
[INFO] 正在解压: /home/tst/down/data/home_dtuser_opt_paddlespeech_data_paddlespeech_models.tar.gz 到 /dt_opt/paddlespeech/data/paddlespeech_models
./
./conf/
./conf/cache.yaml
./models/
./models/G2PWModel_1.1/
./models/G2PWModel_1.1/config.py
./models/G2PWModel_1.1/MONOPHONIC_CHARS.txt
./models/G2PWModel_1.1/__pycache__/
./models/G2PWModel_1.1/__pycache__/config.cpython-39.pyc
./models/G2PWModel_1.1/__pycache__/config.cpython-310.pyc
./models/G2PWModel_1.1/record.log
./models/G2PWModel_1.1/bopomofo_to_pinyin_wo_tune_dict.json
./models/G2PWModel_1.1/POLYPHONIC_CHARS.txt
./models/G2PWModel_1.1/char_bopomofo_dict.json
./models/G2PWModel_1.1/g2pW.onnx
./models/G2PWModel_1.1/version
./models/G2PWModel_1.1.zip
./models/conformer_wenetspeech-zh-16k/
./models/conformer_wenetspeech-zh-16k/1.0/
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/conf/
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/conf/preprocess.yaml
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/conf/tuning/
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/conf/tuning/decode.yaml
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/conf/conformer.yaml
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/conf/conformer_infer.yaml
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/data/
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/data/lang_char/
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/data/lang_char/vocab.txt
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/data/mean_std.json
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/model.yaml
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/exp/
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/exp/conformer/
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/exp/conformer/checkpoints/
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar/exp/conformer/checkpoints/wenetspeech.pdparams
./models/conformer_wenetspeech-zh-16k/1.0/asr1_conformer_wenetspeech_ckpt_0.1.1.model.tar.gz
./models/fastspeech2_csmsc-zh/
./models/fastspeech2_csmsc-zh/1.0/
./models/fastspeech2_csmsc-zh/1.0/fastspeech2_nosil_baker_ckpt_0.4.zip
./models/fastspeech2_csmsc-zh/1.0/fastspeech2_nosil_baker_ckpt_0.4/
./models/fastspeech2_csmsc-zh/1.0/fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz
./models/fastspeech2_csmsc-zh/1.0/fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
./models/fastspeech2_csmsc-zh/1.0/fastspeech2_nosil_baker_ckpt_0.4/default.yaml
./models/fastspeech2_csmsc-zh/1.0/fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy
./models/fastspeech2_csmsc-zh/1.0/fastspeech2_nosil_baker_ckpt_0.4/pitch_stats.npy
./models/fastspeech2_csmsc-zh/1.0/fastspeech2_nosil_baker_ckpt_0.4/energy_stats.npy
./models/hifigan_csmsc-zh/
./models/hifigan_csmsc-zh/1.0/
./models/hifigan_csmsc-zh/1.0/hifigan_csmsc_ckpt_0.1.1.zip
./models/hifigan_csmsc-zh/1.0/hifigan_csmsc_ckpt_0.1.1/
./models/hifigan_csmsc-zh/1.0/hifigan_csmsc_ckpt_0.1.1/default.yaml
./models/hifigan_csmsc-zh/1.0/hifigan_csmsc_ckpt_0.1.1/snapshot_iter_2500000.pdz
./models/hifigan_csmsc-zh/1.0/hifigan_csmsc_ckpt_0.1.1/feats_stats.npy
./datasets/
[SUCCESS] 数据卷解压完成
[INFO] 解压镜像包...
[SUCCESS] 镜像包已移动到 /dt_opt/paddlespeech/images
[INFO] 加载Docker镜像...
daf557c4f08e: Loading layer [==================================================>]  81.04MB/81.04MB
d593918d433d: Loading layer [==================================================>]  4.123MB/4.123MB
7c4cff21e743: Loading layer [==================================================>]  41.36MB/41.36MB
25f6bb353e99: Loading layer [==================================================>]  5.12kB/5.12kB
bc3a46414f17: Loading layer [==================================================>]  1.536kB/1.536kB
8fc7185920f1: Loading layer [==================================================>]  3.584kB/3.584kB
bac2a7f893df: Loading layer [==================================================>]  847MB/847MB
092bab498a3d: Loading layer [==================================================>]  13.31MB/13.31MB
19db93cce63e: Loading layer [==================================================>]  2.56kB/2.56kB
2fc7499dded4: Loading layer [==================================================>]  3.042GB/3.042GB
7da6b60d7eb7: Loading layer [==================================================>]  2.53MB/2.53MB
057242764992: Loading layer [==================================================>]  8.704kB/8.704kB
9a996f3d539d: Loading layer [==================================================>]  3.072kB/3.072kB
4cdf0570391d: Loading layer [==================================================>]  104.2MB/104.2MB
Loaded image: dt_iot/paddlespeech:latest
[SUCCESS] 镜像加载成功
[INFO] 生成docker-compose.yaml...
[SUCCESS] docker-compose.yaml创建完成
[INFO] 启动容器...
[+] Running 2/2
 ✔ Network dtnet           Created    0.5s
 ✔ Container paddlespeech  Started    4.6s
[SUCCESS] 容器启动成功
[SUCCESS] PaddleSpeech 离线安装完成!
```

### 4. Check the container status

```bash
docker ps
```

### 5. Follow the container logs

```bash
docker logs -f paddlespeech
```

![](https://qncdn.tairongkj.com/docs/images/20250921171624.png)

### 6. Manage the container

Go to the container's deployment directory, /dt_opt/paddlespeech (where docker-compose.yaml is generated), and use the following commands as needed:

```bash
docker-compose up -d                        # start the container
docker-compose down                         # stop and remove the container
docker-compose restart                      # restart the container
docker-compose logs -f paddlespeech         # follow the logs
docker-compose exec paddlespeech /bin/bash  # open a shell inside the container
docker rm -f paddlespeech                   # force-remove the container
docker rmi dt_iot/paddlespeech:latest       # remove the image
docker volume rm paddlespeech_data          # remove the data volume
docker network rm dtnet                     # remove the network
docker system prune -a                      # clean up unused images, containers, volumes and networks
```

These are the most commonly used commands; pick whichever fits your situation.

### 7. Changing the port

Locate the file

```
/dt_opt/paddlespeech/docker-compose.yaml
```

and change the port mapping, as shown below:

![](https://qncdn.tairongkj.com/docs/images/20250921172307.png)

Only the first port of the mapping (the host-side port) needs to be changed to the port you want:

```bash
docker-compose down
sudo nano ./docker-compose.yaml
# change the port mapping on line 6, e.g.
#   - "8001:8001"
# save, exit, and restart the container for the change to take effect
docker-compose up -d
```
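After the container is back up, a quick request is enough to confirm that the service answers on the newly mapped port. This is only a sketch: it assumes you are on the Docker host and that the host-side port is 8001; substitute your own host name and port. The /tts endpoint and its parameters are documented in the usage section below.

```bash
# Sketch: verify that the service responds on the re-mapped port (adjust host and port as needed).
curl -s -o check.wav -w "HTTP %{http_code}\n" \
  --request POST "http://localhost:8001/tts" \
  --data-urlencode "text=端口测试" \
  --data "lang=zh" \
  --data "spk_id=0"
```

A 200 status and a playable `check.wav` indicate the new mapping works.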
## 6. Usage

To make it easier to try the API, we provide a test web page, available for [download](https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/tts_demo.zip) at:

```
https://datacdn.data-it.tech/HomeAssistant/dokerimages/paddlespeech1.1/tts_demo.zip
```

![](https://qncdn.tairongkj.com/docs/images/20250921170242.png)

### 1. Text-to-Speech (TTS) service

##### curl example

```curl
curl --request POST \
  --url http://ikuai.m-iot.tech:58150/tts \
  --header 'content-type: application/x-www-form-urlencoded' \
  --data 'text=请张三到一室做检查' \
  --data lang=zh \
  --data spk_id=0
```

##### HTTP API

```HTTP
POST /tts HTTP/1.1
Content-Type: application/x-www-form-urlencoded
Host: ikuai.m-iot.tech:58150
Content-Length: 103

text=%E8%AF%B7%E5%BC%A0%E4%B8%89%E5%88%B0%E4%B8%80%E5%AE%A4%E5%81%9A%E6%A3%80%E6%9F%A5&lang=zh&spk_id=0
```

##### Java examples

```java
// asynchttp
AsyncHttpClient client = new DefaultAsyncHttpClient();
client.prepare("POST", "http://ikuai.m-iot.tech:58150/tts")
  .setHeader("content-type", "application/x-www-form-urlencoded")
  .setBody("text=%E8%AF%B7%E5%BC%A0%E4%B8%89%E5%88%B0%E4%B8%80%E5%AE%A4%E5%81%9A%E6%A3%80%E6%9F%A5&lang=zh&spk_id=0")
  .execute()
  .toCompletableFuture()
  .thenAccept(System.out::println)
  .join();
client.close();

// java.net.http
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("http://ikuai.m-iot.tech:58150/tts"))
    .header("content-type", "application/x-www-form-urlencoded")
    .method("POST", HttpRequest.BodyPublishers.ofString("text=%E8%AF%B7%E5%BC%A0%E4%B8%89%E5%88%B0%E4%B8%80%E5%AE%A4%E5%81%9A%E6%A3%80%E6%9F%A5&lang=zh&spk_id=0"))
    .build();
HttpResponse<String> response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());

// okhttp
OkHttpClient client = new OkHttpClient();
MediaType mediaType = MediaType.parse("application/x-www-form-urlencoded");
RequestBody body = RequestBody.create(mediaType, "text=%E8%AF%B7%E5%BC%A0%E4%B8%89%E5%88%B0%E4%B8%80%E5%AE%A4%E5%81%9A%E6%A3%80%E6%9F%A5&lang=zh&spk_id=0");
Request request = new Request.Builder()
  .url("http://ikuai.m-iot.tech:58150/tts")
  .post(body)
  .addHeader("content-type", "application/x-www-form-urlencoded")
  .build();
Response response = client.newCall(request).execute();

// unirest
HttpResponse<String> response = Unirest.post("http://ikuai.m-iot.tech:58150/tts")
  .header("content-type", "application/x-www-form-urlencoded")
  .body("text=%E8%AF%B7%E5%BC%A0%E4%B8%89%E5%88%B0%E4%B8%80%E5%AE%A4%E5%81%9A%E6%A3%80%E6%9F%A5&lang=zh&spk_id=0")
  .asString();
```

##### JavaScript examples

```javascript
// XHR
const data = 'text=%E8%AF%B7%E5%BC%A0%E4%B8%89%E5%88%B0%E4%B8%80%E5%AE%A4%E5%81%9A%E6%A3%80%E6%9F%A5&lang=zh&spk_id=0';

const xhr = new XMLHttpRequest();
xhr.withCredentials = true;
xhr.addEventListener('readystatechange', function () {
  if (this.readyState === this.DONE) {
    console.log(this.responseText);
  }
});
xhr.open('POST', 'http://ikuai.m-iot.tech:58150/tts');
xhr.setRequestHeader('content-type', 'application/x-www-form-urlencoded');
xhr.send(data);

// Axios
import axios from 'axios';

const encodedParams = new URLSearchParams();
encodedParams.set('text', '请张三到一室做检查');
encodedParams.set('lang', 'zh');
encodedParams.set('spk_id', '0');

const options = {
  method: 'POST',
  url: 'http://ikuai.m-iot.tech:58150/tts',
  headers: {'content-type': 'application/x-www-form-urlencoded'},
  data: encodedParams,
};

try {
  const { data } = await axios.request(options);
  console.log(data);
} catch (error) {
  console.error(error);
}

// Fetch
const url = 'http://ikuai.m-iot.tech:58150/tts';
const options = {
  method: 'POST',
  headers: {'content-type': 'application/x-www-form-urlencoded'},
  body: new URLSearchParams({text: '请张三到一室做检查', lang: 'zh', spk_id: '0'})
};

try {
  const response = await fetch(url, options);
  const data = await response.blob(); // /tts returns audio/wav, so read the body as a Blob
  console.log(data);
} catch (error) {
  console.error(error);
}

// jQuery
const settings = {
  async: true,
  crossDomain: true,
  url: 'http://ikuai.m-iot.tech:58150/tts',
  method: 'POST',
  headers: {
    'content-type': 'application/x-www-form-urlencoded'
  },
  data: {
    text: '请张三到一室做检查',
    lang: 'zh',
    spk_id: '0'
  }
};

$.ajax(settings).done(function (response) {
  console.log(response);
});
```
##### C# examples

```csharp
// HttpClient
using System.Net.Http.Headers;
var client = new HttpClient();
var request = new HttpRequestMessage
{
    Method = HttpMethod.Post,
    RequestUri = new Uri("http://ikuai.m-iot.tech:58150/tts"),
    Content = new FormUrlEncodedContent(new Dictionary<string, string>
    {
        { "text", "请张三到一室做检查" },
        { "lang", "zh" },
        { "spk_id", "0" },
    }),
};
using (var response = await client.SendAsync(request))
{
    response.EnsureSuccessStatusCode();
    var body = await response.Content.ReadAsStringAsync();
    Console.WriteLine(body);
}

// RestSharp
var client = new RestClient("http://ikuai.m-iot.tech:58150/tts");
var request = new RestRequest("", Method.Post);
request.AddHeader("content-type", "application/x-www-form-urlencoded");
request.AddParameter("application/x-www-form-urlencoded", "text=%E8%AF%B7%E5%BC%A0%E4%B8%89%E5%88%B0%E4%B8%80%E5%AE%A4%E5%81%9A%E6%A3%80%E6%9F%A5&lang=zh&spk_id=0", ParameterType.RequestBody);
var response = client.Execute(request);
```

##### Go example

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	url := "http://ikuai.m-iot.tech:58150/tts"

	payload := strings.NewReader("text=%E8%AF%B7%E5%BC%A0%E4%B8%89%E5%88%B0%E4%B8%80%E5%AE%A4%E5%81%9A%E6%A3%80%E6%9F%A5&lang=zh&spk_id=0")

	req, _ := http.NewRequest("POST", url, payload)
	req.Header.Add("content-type", "application/x-www-form-urlencoded")

	res, _ := http.DefaultClient.Do(req)
	defer res.Body.Close()
	body, _ := io.ReadAll(res.Body)

	fmt.Println(res)
	fmt.Println(string(body))
}
```

##### Response

```
Preparing request to http://ikuai.m-iot.tech:58150/tts
Current time is 2025-09-21T07:49:17.236Z

POST http://ikuai.m-iot.tech:58150/tts
Accept: application/json, text/plain, */*
Content-Type: application/x-www-form-urlencoded
User-Agent: bruno-runtime/2.3.0

text=%E8%AF%B7%E5%BC%A0%E4%B8%89%E5%88%B0%E4%B8%80%E5%AE%A4%E5%81%9A%E6%A3%80%E6%9F%A5&lang=zh&spk_id=0

SSL validation: enabled

HTTP/1.1 200 OK
date: Sun, 21 Sep 2025 07:49:18 GMT
server: uvicorn
content-type: audio/wav
content-disposition: attachment; filename="7b41c15b-e0a4-4ba5-aeb1-ceff058c90e6.wav"
content-length: 89444
last-modified: Sun, 21 Sep 2025 07:49:20 GMT
etag: d85eaa2cadf7182ac9b54763e6a97e2c
request-duration: 2224

Request completed in 2224 ms
```

![](https://qncdn.tairongkj.com/docs/images/20250921160616.png)

![](https://qncdn.tairongkj.com/docs/images/20250921160647.png)

![](https://qncdn.tairongkj.com/docs/images/20250921160716.png)

![](https://qncdn.tairongkj.com/docs/images/20250921160755.png)
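As the response headers above show, /tts answers with an audio/wav attachment. From the command line, the quickest way to listen to the result is to save the body straight to a file; a minimal sketch, using the same host and parameters as the curl example above:

```bash
# Sketch: synthesize a sentence and save the returned WAV locally.
curl --request POST \
  --url http://ikuai.m-iot.tech:58150/tts \
  --header 'content-type: application/x-www-form-urlencoded' \
  --data-urlencode 'text=请张三到一室做检查' \
  --data 'lang=zh' \
  --data 'spk_id=0' \
  --output tts_result.wav
# Play it with any audio player, e.g.:
#   ffplay tts_result.wav
```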
### 2. Speech Recognition (ASR) service

#### Calling the service with a file upload

##### curl example

```curl
curl --request POST \
  --url http://ikuai.m-iot.tech:58150/asr \
  --header 'content-type: multipart/form-data' \
  --form 'file=@C:\Users\trphoenix\Documents\录音\录音 (6).m4a' \
  --form lang=zh \
  --form format=wav \
  --form sample_rate=16000
```

The file is uploaded directly; common audio formats such as wav, m4a and mp3 are supported, and the server converts the audio to wav before recognition.

```curl
curl --request POST \
  --url http://ikuai.m-iot.tech:58150/asr \
  --header 'content-type: multipart/form-data' \
  --form 'file=@C:\Users\trphoenix\Documents\录音\录音 (6).m4a' \
  --form lang=zh
```

Only the `file` and `lang` parameters are required; the other parameters can be omitted, as shown below:

![](https://qncdn.tairongkj.com/docs/images/20250921162309.png)

##### HTTP request

```HTTP
POST /asr HTTP/1.1
Content-Type: multipart/form-data; boundary=---011000010111000001101001
Host: ikuai.m-iot.tech:58150
Content-Length: 239

-----011000010111000001101001
Content-Disposition: form-data; name="file"

C:\Users\trphoenix\Documents\录音\录音 (6).m4a
-----011000010111000001101001
Content-Disposition: form-data; name="lang"

zh
-----011000010111000001101001--
```

Note: the generated code samples below insert the local file path into the multipart body as a literal string; in a real application you should read the file and send its contents as the `file` part instead.

##### Java examples

```java
// asynchttp
AsyncHttpClient client = new DefaultAsyncHttpClient();
client.prepare("POST", "http://ikuai.m-iot.tech:58150/asr")
  .setHeader("content-type", "multipart/form-data; boundary=---011000010111000001101001")
  .setBody("-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"file\"\r\n\r\nC:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a\r\n-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"lang\"\r\n\r\nzh\r\n-----011000010111000001101001--\r\n\r\n")
  .execute()
  .toCompletableFuture()
  .thenAccept(System.out::println)
  .join();
client.close();

// java.net.http
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("http://ikuai.m-iot.tech:58150/asr"))
    .header("content-type", "multipart/form-data; boundary=---011000010111000001101001")
    .method("POST", HttpRequest.BodyPublishers.ofString("-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"file\"\r\n\r\nC:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a\r\n-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"lang\"\r\n\r\nzh\r\n-----011000010111000001101001--\r\n\r\n"))
    .build();
HttpResponse<String> response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());

// okhttp
OkHttpClient client = new OkHttpClient();
MediaType mediaType = MediaType.parse("multipart/form-data; boundary=---011000010111000001101001");
RequestBody body = RequestBody.create(mediaType, "-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"file\"\r\n\r\nC:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a\r\n-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"lang\"\r\n\r\nzh\r\n-----011000010111000001101001--\r\n\r\n");
Request request = new Request.Builder()
  .url("http://ikuai.m-iot.tech:58150/asr")
  .post(body)
  .addHeader("content-type", "multipart/form-data; boundary=---011000010111000001101001")
  .build();
Response response = client.newCall(request).execute();

// unirest
HttpResponse<String> response = Unirest.post("http://ikuai.m-iot.tech:58150/asr")
  .header("content-type", "multipart/form-data; boundary=---011000010111000001101001")
  .body("-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"file\"\r\n\r\nC:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a\r\n-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"lang\"\r\n\r\nzh\r\n-----011000010111000001101001--\r\n\r\n")
  .asString();
```
##### JavaScript examples

```javascript
// XHR
const data = new FormData();
data.append('file', 'C:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a');
data.append('lang', 'zh');

const xhr = new XMLHttpRequest();
xhr.withCredentials = true;
xhr.addEventListener('readystatechange', function () {
  if (this.readyState === this.DONE) {
    console.log(this.responseText);
  }
});
xhr.open('POST', 'http://ikuai.m-iot.tech:58150/asr');
xhr.send(data);

// Axios
import axios from 'axios';

const form = new FormData();
form.append('file', 'C:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a');
form.append('lang', 'zh');

const options = {
  method: 'POST',
  url: 'http://ikuai.m-iot.tech:58150/asr',
  headers: {'content-type': 'multipart/form-data; boundary=---011000010111000001101001'},
  data: form
};

try {
  const { data } = await axios.request(options);
  console.log(data);
} catch (error) {
  console.error(error);
}

// Fetch
const url = 'http://ikuai.m-iot.tech:58150/asr';
const form = new FormData();
form.append('file', 'C:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a');
form.append('lang', 'zh');

const options = {method: 'POST', body: form};

try {
  const response = await fetch(url, options);
  const data = await response.json();
  console.log(data);
} catch (error) {
  console.error(error);
}

// jQuery
const form = new FormData();
form.append('file', 'C:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a');
form.append('lang', 'zh');

const settings = {
  async: true,
  crossDomain: true,
  url: 'http://ikuai.m-iot.tech:58150/asr',
  method: 'POST',
  headers: {},
  processData: false,
  contentType: false,
  mimeType: 'multipart/form-data',
  data: form
};

$.ajax(settings).done(function (response) {
  console.log(response);
});
```

##### C# examples

```csharp
// HttpClient
using System.Net.Http.Headers;
var client = new HttpClient();
var request = new HttpRequestMessage
{
    Method = HttpMethod.Post,
    RequestUri = new Uri("http://ikuai.m-iot.tech:58150/asr"),
    Content = new MultipartFormDataContent
    {
        new StringContent("C:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a")
        {
            Headers =
            {
                ContentDisposition = new ContentDispositionHeaderValue("form-data")
                {
                    Name = "file",
                    FileName = "C:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a",
                }
            }
        },
        new StringContent("zh")
        {
            Headers =
            {
                ContentDisposition = new ContentDispositionHeaderValue("form-data")
                {
                    Name = "lang",
                }
            }
        },
    },
};
using (var response = await client.SendAsync(request))
{
    response.EnsureSuccessStatusCode();
    var body = await response.Content.ReadAsStringAsync();
    Console.WriteLine(body);
}

// RestSharp
var client = new RestClient("http://ikuai.m-iot.tech:58150/asr");
var request = new RestRequest("", Method.Post);
request.AddHeader("content-type", "multipart/form-data; boundary=---011000010111000001101001");
request.AddParameter("multipart/form-data; boundary=---011000010111000001101001", "-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"file\"\r\n\r\nC:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a\r\n-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"lang\"\r\n\r\nzh\r\n-----011000010111000001101001--\r\n\r\n", ParameterType.RequestBody);
var response = client.Execute(request);
```
##### Go example

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	url := "http://ikuai.m-iot.tech:58150/asr"

	payload := strings.NewReader("-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"file\"\r\n\r\nC:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a\r\n-----011000010111000001101001\r\nContent-Disposition: form-data; name=\"lang\"\r\n\r\nzh\r\n-----011000010111000001101001--\r\n\r\n")

	req, _ := http.NewRequest("POST", url, payload)
	req.Header.Add("content-type", "multipart/form-data; boundary=---011000010111000001101001")

	res, _ := http.DefaultClient.Do(req)
	defer res.Body.Close()
	body, _ := io.ReadAll(res.Body)

	fmt.Println(res)
	fmt.Println(string(body))
}
```

##### Response

```json
{
  "result": "喂语音测试语音测试录音测试录音测试",
  "file_size": 185334,
  "wav_info": {
    "n_channels": 1,
    "sample_width": 2,
    "framerate": 16000,
    "n_frames": 115029,
    "duration": 7.1893125
  },
  "ffmpeg_log": ""
}
```

![](https://qncdn.tairongkj.com/docs/images/20250921164758.png)
"----------------------------147555471668548578107544\r\nContent-Disposition: form-data; name=\"file\"; filename=\"录音 (6).m4a\"\r\nContent-Type: wav\r\n\r\n", { "source": "[Circular]", "dataSize": 0, "maxDataSize": null, "pauseStream": true, "_maxDataSizeExceeded": false, "_released": false, "_bufferedEvents": [ { "0": "pause" } ], "_events": {}, "_eventsCount": 1 }, null, "----------------------------147555471668548578107544\r\nContent-Disposition: form-data; name=\"lang\"\r\n\r\n", "zh", null ], "_currentStream": null, "_insideLoop": false, "_pendingNext": false, "_boundary": "--------------------------147555471668548578107544" } ``` ![](https://qncdn.tairongkj.com/docs/images/20250921164909.png) ![](https://qncdn.tairongkj.com/docs/images/20250921164934.png) ***Network Logs *** ``` Preparing request to http://ikuai.m-iot.tech:58150/asr Current time is 2025-09-21T08:22:20.590Z POST http://ikuai.m-iot.tech:58150/asr Accept: application/json, text/plain, */* Content-Type: multipart/form-data; boundary=--------------------------147555471668548578107544 User-Agent: bruno-runtime/2.3.0 { "_overheadLength": 252, "_valueLength": 2, "_valuesToMeasure": [ { "fd": null, "path": "C:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a", "flags": "r", "mode": 438, "end": null, "bytesRead": 0, "_events": {}, "_readableState": { "highWaterMark": 65536, "buffer": [], "bufferIndex": 0, "length": 0, "pipes": [], "awaitDrainWriters": null }, "_eventsCount": 3 } ], "writable": false, "readable": true, "dataSize": 0, "maxDataSize": 2097152, "pauseStreams": true, "_released": false, "_streams": [ "----------------------------147555471668548578107544\r\nContent-Disposition: form-data; name=\"file\"; filename=\"录音 (6).m4a\"\r\nContent-Type: wav\r\n\r\n", { "source": { "fd": null, "path": "C:\\Users\\trphoenix\\Documents\\录音\\录音 (6).m4a", "flags": "r", "mode": 438, "end": null, "bytesRead": 0, "_events": {}, "_readableState": { "highWaterMark": 65536, "buffer": [], "bufferIndex": 0, "length": 0, "pipes": [], "awaitDrainWriters": null }, "_eventsCount": 3 }, "dataSize": 0, "maxDataSize": null, "pauseStream": true, "_maxDataSizeExceeded": false, "_released": false, "_bufferedEvents": [ { "0": "pause" } ], "_events": {}, "_eventsCount": 1 }, null, "----------------------------147555471668548578107544\r\nContent-Disposition: form-data; name=\"lang\"\r\n\r\n", "zh", null ], "_currentStream": null, "_insideLoop": false, "_pendingNext": false, "_boundary": "--------------------------147555471668548578107544" } SSL validation: enabled HTTP/1.1 200 OK date: Sun, 21 Sep 2025 08:22:21 GMT server: uvicorn content-length: 201 content-type: application/json request-duration: 6989 Request completed in 6989 ms ``` #### 浏览器录音识别示例 ![](https://qncdn.tairongkj.com/docs/images/20250921165339.png) ![](https://qncdn.tairongkj.com/docs/images/20250921165403.png) ![](https://qncdn.tairongkj.com/docs/images/20250921165449.png) ![](https://qncdn.tairongkj.com/docs/images/20250921165607.png) ![](https://qncdn.tairongkj.com/docs/images/20250921165632.png) ![](https://qncdn.tairongkj.com/docs/images/20250921165652.png) ***说明*** paddlespeech中用的一个库有可能是单线程的,所以在高并发场景下,可能会出现响应变慢的情况,这个时候,可以考虑增加实例来解决这个问题。 实际使用时,如果想提高并发能力,可以尝试更改环境变量中的PYTHONUNBUFFFERED=4,或更高一个适合的数字,来提高并发能力。 ![](https://qncdn.tairongkj.com/docs/images/20250921172804.png) ***《完》***