Technical Proposal: Voice-Driven Structured Auto-Fill for the Physician Workstation

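📋 Background and Requirements

Add a voice button to the HIS (hospital information system) running on the physician's PC so that:

The physician dictates a description of the patient's condition
The system recognizes the speech and extracts structured fields
The extracted values are filled into the HIS form fields automatically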

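Constraints:

✅ Must work regardless of browser type
✅ No access to the external internet
✅ Only a button may be added; the HIS itself cannot be modified
✅ The solution must be generic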

🎯 Recommended Approach: Browser Extension + Local Speech Recognition

Architecture

┌───────────────────────────────────────────────────┐
│          Physician's browser environment          │
├───────────────────────────────────────────────────┤
│  ┌──────────────┐      ┌──────────────────────┐   │
│  │ HIS          │  ↔   │ Chrome extension     │   │
│  │ (web page)   │      │ (content script)     │   │
│  └──────────────┘      └──────────────────────┘   │
│         ↑                         ↕               │
│         │              ┌──────────────────────┐   │
│         │              │ Background           │   │
│         │              │ script               │   │
│         │              └──────────────────────┘   │
│         │                         ↕               │
│         │              ┌──────────────────────┐   │
│         └─────────────→│ Local speech service │   │
│                        │ (intranet-hosted)    │   │
│                        └──────────────────────┘   │
└───────────────────────────────────────────────────┘

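Technology Stack

Layer	Technology	Notes
UI injection	Chrome Extension	Adds a "voice input" button to the HIS page
Speech recognition	Web Speech API / local ASR service	The Web Speech API is built into the browser, but Chrome sends audio to a server-based engine, so networks without internet access need the local ASR service
Structuring	Local NLP service	Medical NLP model deployed on the intranet
Form filling	DOM manipulation	Fills the form through JavaScript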

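🔧 Core Technical Principles

1. How the Browser Extension Works

// manifest.json - extension configuration file
{
  "manifest_version": 3,
  "name": "医生语音助手",
  "version": "0.1.0",
  "permissions": ["activeTab", "scripting"],
  // Allow the extension to call the local ASR and NLP services
  "host_permissions": ["http://localhost:5000/*", "http://localhost:5001/*"],
  "content_scripts": [{
    // Narrow this pattern to the HIS origin in production
    "matches": ["<all_urls>"],
    "js": ["content.js"]
  }],
  "background": {
    "service_worker": "background.js"
  }
}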

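Workflow:

Content script: injected into the HIS page, where it adds the voice button
Button click: starts recording
Background script: receives the request and calls the recognition services, which avoids cross-origin restrictions on the HIS page
Structured data: returned to the content script, which fills in the form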
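
The workflow above can be sketched as a background-script message handler. This is a minimal sketch under stated assumptions: the `structure-text` message type, the `handleMessage` helper, and the `NLP_URL` constant are illustrative names that do not come from the original design, and the fetch function is injectable only so the logic can be exercised outside a browser.

```javascript
// background.js (sketch). NLP_URL and the "structure-text" message type are
// illustrative assumptions, not names defined by the proposal.
const NLP_URL = 'http://localhost:5001/extract';

// Handles one message from the content script. `fetchImpl` is injectable so
// the function can be tested without a browser or a running service.
async function handleMessage(message, fetchImpl = fetch) {
  if (!message || message.type !== 'structure-text') {
    return { ok: false, error: 'unknown message type' };
  }
  try {
    const response = await fetchImpl(NLP_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: message.text }),
    });
    if (!response.ok) {
      return { ok: false, error: `NLP service returned ${response.status}` };
    }
    return { ok: true, data: await response.json() };
  } catch (err) {
    // Network failures are reported to the content script instead of thrown.
    return { ok: false, error: String(err) };
  }
}

// Register the listener only when the extension API is present.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
    handleMessage(message).then(sendResponse);
    return true; // keep the channel open for the asynchronous response
  });
}
```

On the content-script side, a matching call would be `chrome.runtime.sendMessage({ type: 'structure-text', text }, (reply) => { ... })`, with the form filled only when `reply.ok` is true.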

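2. Speech Recognition

Option A: Web Speech API (simplest where the browser's recognition engine is reachable)

// content.js
const SpeechRecognitionImpl = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognitionImpl();

recognition.lang = 'zh-CN';
recognition.continuous = false;      // stop after one utterance
recognition.interimResults = false;  // deliver final results only

recognition.onresult = (event) => {
  const transcript = event.results[0][0].transcript;
  console.log('Transcript:', transcript);
  // Send the text to the local NLP service for structuring
  // (sendToNLPServer is defined elsewhere in the extension)
  sendToNLPServer(transcript);
};

// Surface failures such as "network" or "not-allowed" instead of failing silently
recognition.onerror = (event) => {
  console.error('Speech recognition error:', event.error);
};

recognition.start();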

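Advantages:

✅ Native browser API; no plug-in required
✅ Available in Chrome, Edge, and Safari (exposed as webkitSpeechRecognition in some of them)

Disadvantages:

⚠️ Recognition accuracy depends on the browser's engine
⚠️ In Chrome the audio is sent to a server-based recognition engine, so it does not work offline; on a network without internet access use Option B (local speech service)
⚠️ Not supported in Firefox (a fallback is required)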
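Option B: Local speech service

# Deploy an open-source speech recognition engine
# NOTE: the image name, port, and ASR_MODEL value below are unverified;
# check the sherpa-onnx documentation for the published image and its options
docker run -d -p 9000:9000 \
  -e ASR_MODEL=wav2vec2 \
  ghcr.io/k2-fsa/sherpa-onnx:latest

// Call the local speech service
async function transcribeAudio(audioBlob) {
  const response = await fetch('http://localhost:9000/asr', {
    method: 'POST',
    body: audioBlob
  });
  if (!response.ok) {
    throw new Error(`ASR service returned ${response.status}`);
  }
  return await response.json();
}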

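3. Structuring Medical Text

Use a locally deployed medical NLP model:

# Local NLP service (Flask example)
from flask import Flask, request, jsonify
from transformers import AutoTokenizer, AutoModelForTokenClassification

app = Flask(__name__)

# Load the Chinese medical NER model from a local directory
tokenizer = AutoTokenizer.from_pretrained("./medical-ner-model")
model = AutoModelForTokenClassification.from_pretrained("./medical-ner-model")

@app.route('/extract', methods=['POST'])
def extract_entities():
    payload = request.get_json(silent=True) or {}
    text = payload.get('text', '')
    if not text:
        return jsonify({"error": "missing 'text'"}), 400

    # Run named-entity recognition
    entities = recognize_medical_entities(text, model, tokenizer)

    # Structured output; keys are chief complaint, history of present illness,
    # diagnosis, and medication
    return jsonify({
        "主诉": entities.get("主诉"),
        "现病史": entities.get("现病史"),
        "诊断": entities.get("诊断"),
        "用药": entities.get("用药")
    })

def recognize_medical_entities(text, model, tokenizer):
    # Placeholder: implement the medical entity recognition logic here.
    # Must return a dict keyed by field name; an empty dict means nothing was found.
    return {}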

Recommended models:

Harvest - Chinese medical text understanding
PubMedBERT - biomedical NER for English text
MedSpaCy - clinical text processing (built on spaCy)

4. Auto-Filling the Form

// content.js
// Set a value and fire the events the HIS page listens for
function setFieldValue(element, value) {
  if (!element || value === undefined || value === null) return;
  element.value = value;
  element.dispatchEvent(new Event('input', { bubbles: true }));
  element.dispatchEvent(new Event('change', { bubbles: true }));
}

function fillForm(structuredData) {
  // Method 1: locate the field by ID
  setFieldValue(document.getElementById('chief_complaint'), structuredData.主诉);

  // Method 2: locate the field by placeholder text
  document.querySelectorAll('input[type="text"]').forEach((input) => {
    if (input.placeholder.includes('主诉')) {
      setFieldValue(input, structuredData.主诉);
    }
  });

  // Method 3: locate the field by XPath (complex forms)
  const xpath = "//label[contains(text(), '现病史')]/../input";
  const result = document.evaluate(xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  setFieldValue(result.singleNodeValue, structuredData.现病史);
}
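
Because the solution has to work across different HIS products, the field lookups could be driven by a per-site mapping rather than hard-coded in the fill function. The following is a minimal sketch under stated assumptions: `FIELD_MAP`, `fillFromMap`, and every selector in it are hypothetical placeholders, not part of the original design or of any real HIS.

```javascript
// Hypothetical mapping: structured-data key -> candidate CSS selectors,
// tried in order. The selectors are placeholders for one particular HIS.
const FIELD_MAP = {
  主诉: ['#chief_complaint', 'textarea[name="chiefComplaint"]'],
  现病史: ['#present_illness'],
  诊断: ['#diagnosis'],
  用药: ['#medication'],
};

// Fills every mapped field that has a non-empty value and reports which
// keys were written and which had no matching element on the page.
function fillFromMap(root, data, fieldMap = FIELD_MAP) {
  const report = { filled: [], missing: [] };
  for (const [key, selectors] of Object.entries(fieldMap)) {
    const value = data[key];
    if (value === undefined || value === null || value === '') continue;

    let target = null;
    for (const selector of selectors) {
      target = root.querySelector(selector);
      if (target) break;
    }
    if (!target) {
      report.missing.push(key);
      continue;
    }

    target.value = String(value);
    // Fire both events so framework-driven pages notice the change.
    target.dispatchEvent(new Event('input', { bubbles: true }));
    target.dispatchEvent(new Event('change', { bubbles: true }));
    report.filled.push(key);
  }
  return report;
}
```

In a content script this would be called as `fillFromMap(document, structuredData)`; the `missing` list gives the physician or the deployer a concrete signal that the mapping needs updating for that page.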

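📁 Project Structure

medical-voice-extension/
├── manifest.json              # Extension configuration file
├── src/
│   ├── content.js            # Content script (injected into the HIS page)
│   ├── background.js         # Background script (handles speech requests)
│   ├── ui.js                 # UI component (voice button)
│   └── utils/
│       ├── form-filler.js    # Form-filling helper
│       └── nlp-client.js     # NLP service client
├── server/                   # Local speech + NLP services
│   ├── app.py               # Flask service
│   ├── models/              # Medical NER models
│   └── asr/                 # Speech recognition engine
└── docs/                     # Documentation
    └── deployment.md        # Deployment guide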

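🚀 Deployment Steps

Step 1: Install the browser extension

Package the extension

cd medical-voice-extension
zip -r extension.zip . -x "*.git*" "node_modules/*"

Install in Chrome
- Open chrome://extensions/
- Enable "Developer mode"
- Click "Load unpacked"
- Select the extension folder (the unpacked directory, not the zip file)

Install in Edge
- Edge also supports Chrome extensions
- Open edge://extensions/
- Repeat the Chrome steps

Firefox fallback

// Feature-detect: browsers without SpeechRecognition (such as Firefox) use the local ASR service
if (!window.SpeechRecognition && !window.webkitSpeechRecognition) {
  useLocalASRServer();
}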

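Step 2: Deploy the local speech service

Option A: Sherpa-ONNX (recommended)

# Run the clone and model download on a machine with internet access,
# then copy the results to the intranet host

# 1. Build Sherpa-ONNX
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build && cd build
cmake -DSHERPA_ONNX_ENABLE_PYTHON=ON ..
make -j4

# 2. Download a Chinese model
# NOTE: the downloader script name and its flags are unverified;
# see the sherpa-onnx documentation for the list of pretrained models
cd ../
python ./scripts/sherpa-onnx-downloader.py \
  --model-type=paraformer \
  --language=zh

# 3. Start the service
# NOTE: check the server's --help output for the model arguments it accepts
./build/bin/sherpa-onnx-online-websocket-server \
  --port=6006 \
  --model-dir=./paraformer-zh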

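Option B: Whisper (typically higher accuracy)

# 1. Install dependencies
pip install faster-whisper flask flask-cors

# 2. Start the service
python asr_server.py

# asr_server.py
import os
import tempfile

from flask import Flask, request, jsonify
from flask_cors import CORS
from faster_whisper import WhisperModel

app = Flask(__name__)
CORS(app)  # allow cross-origin requests from the browser

# "base" is fetched from the model hub on first use; on an offline host,
# pass the path of a model directory copied over beforehand
model = WhisperModel("base", device="cpu", compute_type="int8")

@app.route('/transcribe', methods=['POST'])
def transcribe():
    audio_file = request.files.get('audio')
    if audio_file is None:
        return jsonify({"error": "missing 'audio' file"}), 400

    # Write the upload to a closed temp file so it can be reopened by name on any OS
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        audio_file.save(path)
        segments, info = model.transcribe(path, language="zh")
        # segments is a lazy generator; consume it before the file is removed
        text = "".join(segment.text for segment in segments)
    finally:
        os.remove(path)

    return jsonify({"text": text})

if __name__ == '__main__':
    app.run(port=5000, host='0.0.0.0')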

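Step 3: Deploy the medical NLP service

# 1. Install dependencies
pip install flask transformers torch

# 2. Download a medical NER model
# Use Harvest or a model you trained yourself
# NOTE: run the downloads on a machine with internet access, then copy the files
# to the intranet host; the repository and weight URLs below are unverified,
# so confirm they exist first
git clone https://github.com/chineseGLUE/Harvest.git
cd Harvest/models
wget https://huggingface.co/HuatGPT/HuatGPT-medical-ner/resolve/main/pytorch_model.bin

# 3. Start the service
cd ../
python nlp_server.py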

# nlp_server.py
from flask import Flask, request, jsonify
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

app = Flask(__name__)

# Load the model from a local directory
tokenizer = AutoTokenizer.from_pretrained("./models/medical-ner")
model = AutoModelForTokenClassification.from_pretrained("./models/medical-ner")
model.eval()  # inference mode

@app.route('/extract', methods=['POST'])
def extract_entities():
    data = request.get_json(silent=True) or {}
    text = data.get('text', '')
    if not text:
        return jsonify({"error": "missing 'text'"}), 400

    # Tokenize; truncate so long dictations do not exceed the model's input limit
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

    # Predict
    with torch.no_grad():
        outputs = model(**inputs)

    # Decode entities from the per-token label predictions
    predictions = torch.argmax(outputs.logits, dim=2)
    entities = parse_entities(tokenizer, predictions[0])

    return jsonify({
        "主诉": entities.get("CHIEF_COMPLAINT", ""),
        "现病史": entities.get("HISTORY", ""),
        "诊断": entities.get("DIAGNOSIS", ""),
        "用药": entities.get("MEDICATION", "")
    })

def parse_entities(tokenizer, predictions):
    # Placeholder: map the predicted label ids back to text spans here.
    # Must return a dict keyed by entity type (for example "DIAGNOSIS").
    return {}

if __name__ == '__main__':
    app.run(port=5001, host='0.0.0.0')

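Step 4: Point the extension at the local services

// src/utils/nlp-client.js
const CONFIG = {
  ASR_URL: 'http://localhost:5000/transcribe',  // speech recognition service
  NLP_URL: 'http://localhost:5001/extract',     // NLP service
};

async function processVoice(audioBlob) {
  // 1. Speech to text. The ASR server reads the upload from the multipart
  //    field "audio"; the file name is arbitrary.
  const form = new FormData();
  form.append('audio', audioBlob, 'speech.webm');
  const asrResult = await fetch(CONFIG.ASR_URL, {
    method: 'POST',
    body: form
  });
  if (!asrResult.ok) {
    throw new Error(`ASR request failed: ${asrResult.status}`);
  }
  const { text } = await asrResult.json();

  // 2. Text to structured fields
  const nlpResult = await fetch(CONFIG.NLP_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text })
  });
  if (!nlpResult.ok) {
    throw new Error(`NLP request failed: ${nlpResult.status}`);
  }
  return await nlpResult.json();
}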
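
Before the structured result reaches the form filler, it may be worth normalizing it so that a missing or malformed field never ends up in a patient record. A minimal sketch, assuming the four keys returned by the `/extract` endpoint; the `normalizeStructuredData` helper and the choice to join list values with a comma are illustrative assumptions.

```javascript
// The four fields the NLP service is expected to return.
const EXPECTED_KEYS = ['主诉', '现病史', '诊断', '用药'];

// Returns an object with exactly the expected keys, each a trimmed string.
// Missing or null fields become '', arrays are joined, extra keys are dropped.
function normalizeStructuredData(raw) {
  const source = raw !== null && typeof raw === 'object' ? raw : {};
  const result = {};
  for (const key of EXPECTED_KEYS) {
    const value = source[key];
    if (value === undefined || value === null) {
      result[key] = '';
    } else if (Array.isArray(value)) {
      result[key] = value.map((item) => String(item).trim()).join(', ');
    } else {
      result[key] = String(value).trim();
    }
  }
  return result;
}
```

A caller would wrap the service reply, for example `fillForm(normalizeStructuredData(await processVoice(blob)))`, so the filler can rely on every key being a string.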

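📖 Chinese-Language Tutorials and References

Browser extension development

Official Chrome documentation (Chinese)
- https://developer.chrome.com/docs/extensions/mv3/getstarted/
- Covers: extension basics, content scripts, message passing
MDN Web Docs (Chinese)
- https://developer.mozilla.org/zh-CN/docs/Mozilla/Add-ons/WebExtensions
- Covers: cross-browser compatibility, API details
Hands-on tutorials
- 《Chrome扩展开发实战》 (e-book)
- Search Bilibili for "Chrome扩展开发入门"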

Speech recognition

Web Speech API tutorial
- https://developer.mozilla.org/zh-CN/docs/Web/API/Web_Speech_API
- Includes complete example code
Sherpa-ONNX documentation
- https://github.com/k2-fsa/sherpa-onnx
- Deployment guide, model downloads, API documentation
Whisper
- https://github.com/openai/whisper
- Python deployment guide

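Medical NLP

Harvest, Chinese medical text understanding
- https://github.com/chineseGLUE/Harvest
- Medical entity recognition, relation extraction
Transformers documentation (Chinese)
- https://huggingface.co/docs/transformers/zh
- Model loading, inference, fine-tuning
Medical NER paper
- 《Chinese Medical Named Entity Recognition with BERT》
- Search Zhihu for "医疗NER BERT"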

Auto-filling forms

DOM manipulation tutorial
- https://developer.mozilla.org/zh-CN/docs/Web/API/Document_Object_Model
- Official MDN documentation in Chinese
XPath tutorial
- https://www.w3school.com.cn/xpath/index.asp
- Locating fields in complex forms

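⚠️ Notes

1. Browser compatibility

Browser	Web Speech API	Extension support	Recommended approach
Chrome	✅ (server-based engine, not offline)	✅	Local ASR service on networks without internet access; otherwise Web Speech API
Edge	✅	✅	Web Speech API if it works without internet access; otherwise local ASR service
Firefox	❌	✅	Local ASR service
Safari	✅	⚠️	Test extension support first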

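2. Adapting to the HIS

Common problems:

❌ The HIS uses Shadow DOM → traverse the shadow roots explicitly
❌ Field IDs are generated dynamically → match by XPath or placeholder text
❌ The form lives in an iframe → detect the iframe and operate on its document

Solutions:

// Fill a field inside an iframe (same-origin frames only)
function fillFormInIframe(data) {
  const iframe = document.querySelector('iframe');
  const iframeDoc = iframe && iframe.contentDocument; // null for cross-origin frames
  if (!iframeDoc) return;
  const input = iframeDoc.getElementById('field_id');
  if (!input) return;
  input.value = data.主诉;
  input.dispatchEvent(new Event('input', { bubbles: true }));
  input.dispatchEvent(new Event('change', { bubbles: true }));
}

// Shadow DOM traversal (open shadow roots only)
function queryShadowSelector(selector) {
  const elements = [];
  const walk = (node) => {
    if (node.shadowRoot) {
      // Collect every match in this shadow root, not just the first
      elements.push(...node.shadowRoot.querySelectorAll(selector));
      walk(node.shadowRoot);
    }
    node.childNodes?.forEach(walk);
  };
  walk(document.body);
  return elements;
}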
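
HIS pages often nest several frames, so a lookup that only inspects the first iframe can miss the target field. The sketch below walks every reachable same-origin frame; `collectDocuments` and `findField` are hypothetical helper names, and frames from another origin are skipped because their documents are not accessible to page scripts.

```javascript
// Returns the root document plus the documents of all nested,
// same-origin iframes, depth first.
function collectDocuments(rootDoc) {
  const docs = [rootDoc];
  for (const frame of rootDoc.querySelectorAll('iframe')) {
    let childDoc = null;
    try {
      childDoc = frame.contentDocument; // null when the frame is cross-origin
    } catch (_err) {
      childDoc = null; // defensive: treat any access error as "not reachable"
    }
    if (childDoc) {
      docs.push(...collectDocuments(childDoc));
    }
  }
  return docs;
}

// Returns the first element matching `selector` in any reachable document,
// or null when no document contains a match.
function findField(rootDoc, selector) {
  for (const doc of collectDocuments(rootDoc)) {
    const el = doc.querySelector(selector);
    if (el) return el;
  }
  return null;
}
```

In the content script this would be used as `findField(document, '#field_id')`. Setting `"all_frames": true` on the content script entry in the manifest is an alternative that also reaches cross-origin frames, at the cost of running the script once per frame.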

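3. Data security

✅ All data stays on the intranet and never crosses the external network
✅ Speech is processed by the local ASR service and is not uploaded to the cloud (this does not hold for Chrome's Web Speech API, which sends audio to a remote engine)
✅ Adding authentication to the NLP service is recommended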
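
On the client side, the authentication recommendation could be as small as attaching a shared token to each request. This is a sketch only: the bearer-token scheme, the `buildAuthorizedRequest` helper, and where the token is stored are all assumptions, and the services would need a matching check before this provides any protection.

```javascript
// Builds fetch() options for a JSON POST that carries a bearer token.
// The header scheme and the token source are assumptions for illustration.
function buildAuthorizedRequest(token, payload) {
  if (typeof token !== 'string' || token.length === 0) {
    throw new Error('a non-empty token is required');
  }
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify(payload),
  };
}
```

A call such as `fetch(CONFIG.NLP_URL, buildAuthorizedRequest(token, { text }))` would then replace the unauthenticated request to the NLP service.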

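4. Performance optimization

Speech recognition: keep a long-lived WebSocket connection instead of reconnecting for every utterance
NLP inference: use a quantized model (ONNX) to reduce memory usage
Form filling: batch the DOM writes to reduce reflows and repaints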
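
The connection-reuse point can be illustrated with a small helper that opens the WebSocket lazily and reconnects only after the previous socket has started closing. A minimal sketch: `createSocketPool` is a hypothetical name, and the constructor is injectable so the logic can be tested without a server.

```javascript
// Returns a getter that hands out one shared socket and replaces it only
// when the previous one is closing or closed.
function createSocketPool(url, WebSocketImpl = WebSocket) {
  let socket = null;
  return function getSocket() {
    // readyState: 0 = CONNECTING, 1 = OPEN, 2 = CLOSING, 3 = CLOSED
    if (socket === null || socket.readyState > 1) {
      socket = new WebSocketImpl(url);
    }
    return socket;
  };
}
```

Usage might look like `const getSocket = createSocketPool('ws://localhost:6006');`, where the URL assumes the port used for the speech service elsewhere in this proposal; each recording session then calls `getSocket()` instead of constructing a new connection.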

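🎯 Summary

Advantages:

✅ No changes to the HIS; only a button is added
✅ Supports the mainstream browsers (Chrome/Edge/Firefox)
✅ Runs fully offline when the local ASR service is used; no internet access needed
✅ Generic enough to be adapted to different HIS products

Technology stack:

Chrome Extension (UI injection)
Web Speech API / Sherpa-ONNX (speech recognition)
Transformers + Harvest (medical NER)
DOM manipulation (form filling)

Suggested learning path:

Learn Chrome extension development (1-2 days)
Get familiar with the Web Speech API (half a day)
Deploy the local speech service (1 day)
Deploy the medical NLP service (2-3 days)
Implement the auto-fill logic (1 day)

Total development time: about 5.5 to 7.5 working days, the sum of the estimates above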
