Looking for open-source speech recognition that can distinguish speakers

2026-04-13 12:53 | 1 view | 0 comments | SEO资源


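Question:

As the title says.
I've recently been looking for an open-source, self-hostable speech recognition project that can tell speakers apart.
Does anyone here have a recommendation?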

Answers:

--[1]--:

Yes: next-gen Kaldi (k2).

github.com

GitHub - k2-fsa/sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, websocket server/client, support 12 programming languages

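--[2]--:

Thanks, I'll deploy it tonight and take a look.

--[3]--:

Hmm, I remember my previous company used something like this, but I can't recall what it was.

--[4]--:

Isn't pyannote the simplest open-source option?

A large share of the other projects are modified versions built on it.

Then there is Microsoft's VibeVoice, which is also open source.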

--[5]--:

github.com

GitHub - MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

github.com

GitHub - FunAudioLLM/Fun-ASR

Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

github.com

GitHub - wenet-e2e/wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

These are a few projects GPT turned up; they may be worth a look.

--[6]--:

If it comes back to you, please ping me.

--[7]--:

github.com

GitHub - modelscope/3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

github.com

GitHub - pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

That said, judging by the benchmarks I've looked at, error rates in the teens or twenties (percent) are perfectly normal.
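For context on what such a number measures: assuming the figure quoted is the diarization error rate (DER), it is the share of reference speech time that is missed, falsely detected, or attributed to the wrong speaker. Below is a minimal frame-level sketch with made-up label sequences and a brute-force speaker mapping. Real scoring tools use an optimal assignment, handle overlapping speakers, and usually apply a forgiveness collar, so treat this only as an illustration of the arithmetic.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Simplified frame-level DER for two equal-length label sequences.

    None marks a non-speech frame. One speaker per frame, no collar.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(reference, hypothesis))
    total_speech = sum(r is not None for r, _ in pairs)
    if total_speech == 0:
        raise ValueError("reference contains no speech")

    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both_speech = sum(r is not None and h is not None for r, h in pairs)

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})
    # Pad with None so extra hypothesis speakers can stay unmatched.
    padded = ref_speakers + [None] * max(0, len(hyp_speakers) - len(ref_speakers))

    # Try every one-to-one mapping of hypothesis labels onto reference
    # labels and keep the one that agrees on the most frames.
    best_correct = 0
    for perm in permutations(padded, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(
            r is not None and h is not None and mapping[h] == r
            for r, h in pairs
        )
        best_correct = max(best_correct, correct)

    confusion = both_speech - best_correct
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical 10-frame example: one missed frame, one confused frame.
reference = ["A", "A", "A", "B", "B", None, "B", "B", "A", "A"]
hypothesis = ["s1", "s1", "s2", "s2", "s2", None, None, "s2", "s1", "s1"]
print(f"{frame_der(reference, hypothesis):.1%}")  # prints 22.2%
```

Here 2 of the 9 reference speech frames are wrong, which already lands in the range mentioned above.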

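--[8]--:

I don't think it has much to do with the number of people. The main problem is several people talking at the same time, which it cannot separate effectively. You can swap in a different speaker-embedding model yourself (CAM++ and so on), although I suspect the difference won't be large. It doesn't need a voiceprint library set up in advance, and it works quite well when people take turns speaking.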
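The point about not needing a voiceprint library can be illustrated with a toy sketch: a diarizer extracts one embedding per speech segment (from CAM++ or whichever model is swapped in) and groups the embeddings by similarity, so speakers are discovered rather than looked up. The 3-dimensional vectors, the 0.7 threshold, and the greedy clustering below are all illustrative assumptions, not what any project in this thread actually implements.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_segments(embeddings, threshold=0.7):
    """Greedy online clustering: returns one speaker index per segment."""
    centroids = []  # running mean embedding per discovered speaker
    counts = []     # number of segments merged into each centroid
    labels = []
    for emb in embeddings:
        scores = [cosine(emb, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            # Close enough to a known speaker: update that centroid.
            n = counts[best]
            centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(centroids[best], emb)
            ]
            counts[best] = n + 1
        else:
            # Nothing similar yet: treat this segment as a new speaker.
            best = len(centroids)
            centroids.append(list(emb))
            counts.append(1)
        labels.append(best)
    return labels


# Hypothetical per-segment embeddings for a turn-taking conversation.
segments = [
    [0.90, 0.10, 0.00],
    [0.10, 0.90, 0.10],
    [0.80, 0.20, 0.10],
    [0.00, 0.10, 0.95],
    [0.20, 0.85, 0.00],
]
print(cluster_segments(segments))  # prints [0, 1, 0, 2, 1]
```

A segment in which two people talk at once would yield an embedding that sits between two centroids, which is one plausible reason overlapping speech is the hard case.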

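--[9]--:

Feishu has this feature (I haven't looked into the details; I only know because my company's meeting summaries include it).
It isn't open source, though.

--[10]--:

Following this thread; I'm after the same thing.

--[11]--:

The scenario is roughly 2 to 6 people.

--[12]--:

Yeah, this looks feasible. For my purposes it's enough to tell apart which of A, B, or C is speaking.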
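If anonymous labels are enough, the remaining step is mostly bookkeeping: rename whatever speaker IDs the diarizer emits to A, B, C in order of first appearance, then attach each transcribed sentence to the speaker whose turns overlap it most. A minimal sketch, assuming the diarizer returns (start, end, speaker_id) tuples and the recognizer returns (start, end, text) tuples; both formats and all sample data are hypothetical.

```python
from string import ascii_uppercase


def relabel(turns):
    """Rename diarizer speaker IDs to A, B, C... by first appearance."""
    names = {}
    out = []
    for start, end, spk in sorted(turns):
        if spk not in names:
            names[spk] = ascii_uppercase[len(names)]  # assumes <= 26 speakers
        out.append((start, end, names[spk]))
    return out


def attribute(sentences, turns):
    """Assign each sentence to the speaker with the largest time overlap."""
    result = []
    for s_start, s_end, text in sentences:
        overlap = {}
        for t_start, t_end, name in turns:
            shared = min(s_end, t_end) - max(s_start, t_start)
            if shared > 0:
                overlap[name] = overlap.get(name, 0.0) + shared
        speaker = max(overlap, key=overlap.get) if overlap else "?"
        result.append((speaker, text))
    return result


# Hypothetical diarizer output (seconds) and recognizer output.
turns = [
    (0.0, 4.2, "spk_07"),
    (4.2, 9.0, "spk_02"),
    (9.0, 12.5, "spk_07"),
    (12.5, 15.0, "spk_11"),
]
sentences = [
    (0.3, 4.0, "Shall we start?"),
    (4.5, 8.8, "Yes, I have the numbers."),
    (8.9, 12.0, "Go ahead."),
    (12.7, 14.9, "One question first."),
]

lines = attribute(sentences, relabel(turns))
for speaker, text in lines:
    print(f"{speaker}: {text}")
```

Picking the speaker by largest overlap keeps a sentence intact when its timestamps spill slightly across a turn boundary, as the third sentence does here.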

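--[13]--:

Take a look at noScribe.
I once needed to turn meeting recordings into minutes with the speakers distinguished.
The results seemed decent.

--[14]--:

Alibaba's FunASR. I used it on a previous project and the results were pretty good.
Of course, it still doesn't do so well when several people speak at once.

--[15]--:

Feishu isn't an option for me, haha.

Tags: Artificial Intelligence, Quick Q&A, Software Development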
