Update data fields, database, documentation, and templates
parent
07812ea8c4
commit
7a304eab31
209
AI智能处理说明.md
Normal file
@ -0,0 +1,209 @@
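# AI-Assisted Template Processing Guide

## Overview

Both scripts have been upgraded to AI-assisted versions. They call the SiliconFlow (硅基流动) large-model API to analyze document content and identify the text that should be replaced with placeholders.

### Key features

1. **AI analysis** - Uses a large model to understand document semantics and identify replaceable content
2. **Combined-field recognition** - Recognizes combined fields such as "山西XXXX集团有限公司(职务+姓名)"
3. **Rule-matching fallback** - After the AI pass, rule matching fills in what the AI missed
4. **Confidence filtering** - Applies only AI replacement suggestions with confidence above 0.7
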
## Scripts

### process_templates.py
- Processes the original templates (including .doc conversion)
- Supports AI analysis
- Suited to cases where .doc files need automatic conversion

### process_templates_docx_only.py
- Processes files that are already converted to .docx
- Supports AI analysis
- Suited to cases where the files were converted to .docx by hand

## Environment setup

### 1. Make sure the .env file is configured correctly

```env
SILICONFLOW_API_KEY=your-api-key
SILICONFLOW_MODEL=deepseek-ai/DeepSeek-V3.2-Exp
```

### 2. Install dependencies

```bash
pip install python-docx python-dotenv requests
```

## Usage

### Run the scripts

```bash
# Process already-converted .docx files (recommended)
python process_templates_docx_only.py

# Or process the original templates (including .doc conversion)
python process_templates.py
```

## What the AI analysis recognizes

### 1. Explicit field values
- Explicit information such as name, organization, and position
- Example: `张三` → `{{target_name}}`

### 2. Sample values
- Stand-in text such as XXX or 待填 ("to be filled in")
- Example: `被核查人姓名: XXX` → `被核查人姓名: {{target_name}}`

### 3. Combined fields
- Combined text that carries several fields at once
- Examples:
  - `山西XXXX集团有限公司(职务+姓名)` → `{{target_organization_and_position}}({{target_name}})`
  - `张三,男,1980年5月` → `{{target_name}},{{target_gender}},{{target_date_of_birth}}`

### 4. Formatted values
- Formatted content such as dates and times
- Example: `2024年12月7日` → `{{date}}`

### 5. Context-dependent fields
- Infers the meaning of a field from its context
- Example: in a "被核查人" (person under review) context, `张三` is recognized as `{{target_name}}`

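## Processing pipeline

### 1. AI analysis stage
- Analyzes document paragraphs and table cells
- Identifies replaceable content
- Produces replacement suggestions, each with a confidence score

### 2. Rule-matching stage
- Matches field names with regular expressions
- Fills in replacements the AI may have missed

### 3. Applying replacements
- Sorts the replacement suggestions by confidence
- Applies only replacements with confidence > 0.7
- Avoids replacing the same text twice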
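
The three rules above (sort by confidence, apply only suggestions above 0.7, skip duplicates) can be sketched as follows. This is a minimal illustration, not the scripts' actual code: the `Suggestion` class and the `select_suggestions` and `apply_suggestions` functions are hypothetical names introduced here, and the real suggestion format is not shown in this document.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7  # threshold stated in this document


@dataclass
class Suggestion:
    """Hypothetical shape of one AI replacement suggestion."""
    original: str      # text found in the document
    placeholder: str   # e.g. "{{target_name}}"
    confidence: float  # 0.0 to 1.0, as reported by the model


def select_suggestions(suggestions, threshold=CONFIDENCE_THRESHOLD):
    """Keep suggestions above the threshold, highest confidence first,
    and keep only the first suggestion for each original text."""
    selected = []
    seen = set()
    for s in sorted(suggestions, key=lambda s: s.confidence, reverse=True):
        if s.confidence <= threshold or s.original in seen:
            continue
        seen.add(s.original)
        selected.append(s)
    return selected


def apply_suggestions(text, suggestions):
    """Apply the selected suggestions to one paragraph or cell text."""
    for s in select_suggestions(suggestions):
        text = text.replace(s.original, s.placeholder)
    return text
```

Sorting before de-duplicating means that when two suggestions target the same text, the more confident one wins.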

## Sample output

### Successful run
```
处理: 1.请示报告卡(XXX).docx
类型: REPORT_CARD
输入: 模板\原始模板\...\1.请示报告卡(XXX).docx
输出: 模板\2-初核模版\1.初核请示\1.请示报告卡(XXX).docx
✓ AI分析已启用
处理: 1.请示报告卡(XXX).docx
✓ 处理成功,AI识别 5 处,规则匹配 2 处
```

### AI analysis unavailable (automatic fallback)
```
处理: 2.初步核实审批表(XXX).docx
⚠ AI分析不可用: 未配置 SILICONFLOW_API_KEY,将使用基础模式
处理: 2.初步核实审批表(XXX).docx
✓ 处理成功,规则匹配 8 处
```

## Notes

### 1. API configuration
- Make sure the `.env` file contains a valid API key
- If no key is configured, the script falls back to rule matching automatically
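
The fallback described above can be sketched as a small decision function. This is an assumption-laden illustration: `resolve_mode` is a hypothetical name, and the real scripts may read the configuration differently (for example after calling python-dotenv's `load_dotenv()` to populate the environment from `.env`). The default model name is the one shown in the `.env` example in this document.

```python
import os


def resolve_mode(env=None):
    """Return ("ai", api_key, model) when a key is configured,
    otherwise ("rules", None, None) to signal rule-only processing."""
    env = os.environ if env is None else env
    api_key = (env.get("SILICONFLOW_API_KEY") or "").strip()
    if not api_key:
        # No key: skip the AI stage entirely and use rule matching only.
        return ("rules", None, None)
    model = env.get("SILICONFLOW_MODEL", "deepseek-ai/DeepSeek-V3.2-Exp")
    return ("ai", api_key, model)
```

Treating a blank or whitespace-only key as missing avoids sending requests that are certain to be rejected.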
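
### 2. API call limits
- AI analysis increases processing time (each paragraph or table cell needs an API call)
- If an API call fails, AI analysis is skipped and rule matching is used instead

### 3. Confidence threshold
- By default, only replacements with confidence > 0.7 are applied
- The threshold can be adjusted in the code

### 4. Manual review
- AI results need to be reviewed by a person
- Some complex cases may need manual adjustment
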
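## FAQ

### Q1: What if AI analysis fails?

**A:** The script falls back to rule matching automatically, so processing continues. Check:
1. Whether the API key in the `.env` file is correct
2. Whether the network connection works
3. Whether the API quota is sufficient

### Q2: Why was some content not replaced?

**A:** Possible causes:
1. The AI confidence was below the threshold (0.7)
2. The content is not in the list of available fields
3. The content has an unusual format that the AI cannot recognize

**Solutions:**
- Inspect the generated template
- Adjust the unreplaced content by hand
- Lower the confidence threshold (requires a code change)

### Q3: Why was a combined field replaced incorrectly?

**A:** The AI tries to recognize combined fields but may not be accurate enough. Suggestions:
1. Review the replacements the AI produced
2. Fix incorrect combined fields by hand
3. Provide more examples to guide the AI

### Q4: Why is processing slow?

**A:** AI analysis calls the API, which adds processing time. You can:
1. Use AI analysis only for important documents
2. Take API rate limits into account when processing in batches
3. Pass `use_ai=False` to disable AI analysis
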
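## Performance tips

1. **Batch processing** - Process multiple files in one run to reduce API call overhead
2. **Cache results** - Cache AI analysis results so identical content is analyzed only once
3. **Selective use** - Use AI analysis only for complex documents and rule matching for simple ones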
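
The caching tip can be sketched as a thin wrapper around whatever function performs the AI call. `AnalysisCache` and its `analyze` argument are hypothetical names introduced for illustration; the scripts' real analysis function is not shown in this document.

```python
import hashlib


class AnalysisCache:
    """Memoize analysis results by a hash of the input text."""

    def __init__(self, analyze):
        self._analyze = analyze  # callable: text -> analysis result
        self._results = {}
        self.api_calls = 0       # how many times the wrapped callable ran

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._results:
            self.api_calls += 1
            self._results[key] = self._analyze(text)
        return self._results[key]
```

Templates in the same family often repeat identical paragraphs and table cells, which is where a cache like this would save calls.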

## Example scenarios

### Scenario 1: Combined-field replacement
```
Original text: 山西XXXX集团有限公司(职务+姓名)
AI finding:    contains organization information and name information
Result:        {{target_organization_and_position}}({{target_name}})
```

### Scenario 2: Context recognition
```
Original text: 被核查人:张三
AI finding:    in a "被核查人" context, "张三" is a name
Result:        被核查人:{{target_name}}
```

### Scenario 3: Formatted date
```
Original text: 2024年12月7日
AI finding:    date format
Result:        {{date}} or {{approval_time}}
```

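## Next steps

1. **Inspect the generated templates** - Open the processed template files
2. **Review the AI replacements** - Confirm that the placeholders are correct
3. **Adjust by hand** - Fix any incorrect replacements
4. **Run the initialization script** - `python init_all_templates.py`

## Troubleshooting

If something goes wrong, check:
1. The `.env` file configuration
2. Whether the API key is valid
3. The network connection
4. The script's output log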
BIN
__pycache__/template_ai_helper.cpython-312.pyc
Normal file
Binary file not shown.
2
app.py
@ -260,6 +260,8 @@ def extract():
|
||||
field_map = {field['field_code']: field for field in output_fields}
|
||||
|
||||
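# Build the response data in outputData order
# Note: when the AI extracts no value, return an empty string; defaults are not applied automatically
# Default values are described in the documentation; the frontend decides whether to apply them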
|
||||
for field_code in output_field_codes:
|
||||
field_value = ai_result.get(field_code, '')
|
||||
out_data.append({
|
||||
|
||||
115
check_conversion_env.py
Normal file
@ -0,0 +1,115 @@
"""
Check the .doc conversion environment.
Checks whether this system supports automatic conversion and prints guidance.
"""
import os
import sys
from pathlib import Path


def check_pywin32():
    """Check whether pywin32 is installed."""
    try:
        import win32com.client  # noqa: F401
        return True, "installed"
    except ImportError:
        return False, "not installed"


def check_word():
    """Check whether Microsoft Word is installed and reachable over COM.

    This check itself needs pywin32, so a missing pywin32 is also
    reported here as Word being unavailable.
    """
    try:
        import win32com.client
        word = win32com.client.Dispatch("Word.Application")
        word.Quit()
        return True, "installed and usable"
    except Exception as e:
        return False, f"not installed or unavailable: {e}"


def find_doc_files():
    """Find all .doc files under 模板/原始模板."""
    project_root = Path(__file__).parent
    templates_dir = project_root / "模板" / "原始模板"

    doc_files = []
    if templates_dir.exists():
        for root, dirs, files in os.walk(templates_dir):
            for file in files:
                if file.lower().endswith('.doc'):
                    doc_files.append(Path(root) / file)

    return doc_files


def main():
    print("=" * 80)
    print("Checking the .doc conversion environment")
    print("=" * 80)
    print()

    # Check pywin32
    has_pywin32, pywin32_status = check_pywin32()
    print(f"1. pywin32 status: {pywin32_status}")
    if not has_pywin32:
        print("   → Install with: pip install pywin32")
    print()

    # Check Word
    has_word, word_status = check_word()
    print(f"2. Microsoft Word status: {word_status}")
    if not has_word:
        print("   → Microsoft Word is required (not WPS)")
    print()

    # Find .doc files
    doc_files = find_doc_files()
    print(f"3. Found {len(doc_files)} .doc file(s):")
    if doc_files:
        for i, doc_file in enumerate(doc_files[:10], 1):  # show only the first 10
            rel_path = doc_file.relative_to(Path(__file__).parent)
            print(f"   {i}. {rel_path}")
        if len(doc_files) > 10:
            print(f"   ... and {len(doc_files) - 10} more")
    else:
        print("   No .doc files found")
    print()

    # Summary and recommendations
    print("=" * 80)
    print("Conversion recommendations")
    print("=" * 80)

    if has_pywin32 and has_word:
        print("✓ Environment ready; automatic conversion is available")
        print("  Run: python process_templates.py")
    elif has_word and not has_pywin32:
        print("⚠ Word is installed but pywin32 is not")
        print("  Options:")
        print("  1. Install pywin32: pip install pywin32")
        print("  2. Or use the batch script: 批量转换doc到docx.bat")
        print("  3. Or convert the files by hand")
    elif not has_word:
        print("✗ Microsoft Word is not installed")
        print("  Options:")
        print("  1. Install Microsoft Word (not WPS)")
        print("  2. Or use the batch script: 批量转换doc到docx.bat (requires Word)")
        print("  3. Or convert the files by hand (recommended)")
        print("  4. Or use an online conversion tool")
    else:
        print("✗ This environment does not support automatic conversion")
        print("  Options:")
        print("  1. Convert the files by hand (recommended)")
        print("  2. Use an online conversion tool")

    print()
    print("For details, see: doc转换说明.md")
    print("=" * 80)


if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        print("\n\nInterrupted by user")
        sys.exit(1)
    except Exception as e:
        print(f"\nError: {e}")
        import traceback
        traceback.print_exc()
        sys.exit(1)
311
check_data_sync_status.py
Normal file
@ -0,0 +1,311 @@
|
||||
"""
|
||||
综合检查脚本:验证数据结构、接口、测试页面和Swagger是否已同步更新
|
||||
"""
|
||||
import pymysql
|
||||
import json
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
DB_CONFIG = {
|
||||
'host': '152.136.177.240',
|
||||
'port': 5012,
|
||||
'user': 'finyx',
|
||||
'password': '6QsGK6MpePZDE57Z',
|
||||
'database': 'finyx',
|
||||
'charset': 'utf8mb4'
|
||||
}
|
||||
|
||||
TENANT_ID = 615873064429507639
|
||||
|
||||
# 期望的字段编码列表(谈话前安全风险评估表)
|
||||
PRE_INTERVIEW_FIELDS = [
|
||||
'target_family_situation',
|
||||
'target_social_relations',
|
||||
'target_health_status',
|
||||
'target_personality',
|
||||
'target_tolerance',
|
||||
'target_issue_severity',
|
||||
'target_other_issues_possibility',
|
||||
'target_previous_investigation',
|
||||
'target_negative_events',
|
||||
'target_other_situation',
|
||||
'risk_level'
|
||||
]
|
||||
|
||||
FILE_NAME = '谈话前安全风险评估表'
|
||||
TEMPLATE_CODE = 'PRE_INTERVIEW_RISK_ASSESSMENT'
|
||||
|
||||
def check_database():
|
||||
"""检查数据库中的数据"""
|
||||
print("="*80)
|
||||
print("1. 数据库检查")
|
||||
print("="*80)
|
||||
|
||||
try:
|
||||
conn = pymysql.connect(**DB_CONFIG)
|
||||
cursor = conn.cursor(pymysql.cursors.DictCursor)
|
||||
|
||||
# 检查文件配置
|
||||
print("\n[1.1] 文件配置检查:")
|
||||
cursor.execute("""
|
||||
SELECT id, name, file_path, input_data, state
|
||||
FROM f_polic_file_config
|
||||
WHERE tenant_id = %s AND name = %s
|
||||
""", (TENANT_ID, FILE_NAME))
|
||||
file_config = cursor.fetchone()
|
||||
|
||||
if file_config:
|
||||
print(f" ✓ 文件配置存在: {file_config['name']}")
|
||||
print(f" - ID: {file_config['id']}")
|
||||
print(f" - 文件路径: {file_config['file_path']}")
|
||||
print(f" - 状态: {'启用' if file_config['state'] == 1 else '未启用'}")
|
||||
|
||||
# 检查input_data中的template_code
|
||||
try:
|
||||
input_data = json.loads(file_config['input_data']) if file_config['input_data'] else {}
|
||||
if input_data.get('template_code') == TEMPLATE_CODE:
|
||||
print(f" - 模板编码: ✓ {TEMPLATE_CODE}")
|
||||
else:
|
||||
print(f" - 模板编码: ✗ 期望 {TEMPLATE_CODE}, 实际 {input_data.get('template_code')}")
|
||||
except Exception:
|
||||
print(f" - 模板编码: ✗ input_data格式错误")
|
||||
|
||||
file_config_id = file_config['id']
|
||||
else:
|
||||
print(f" ✗ 文件配置不存在: {FILE_NAME}")
|
||||
file_config_id = None
|
||||
|
||||
# 检查字段
|
||||
print("\n[1.2] 字段检查:")
|
||||
placeholders = ','.join(['%s'] * len(PRE_INTERVIEW_FIELDS))
|
||||
cursor.execute(f"""
|
||||
SELECT id, name, filed_code, field_type, state
|
||||
FROM f_polic_field
|
||||
WHERE tenant_id = %s
|
||||
AND filed_code IN ({placeholders})
|
||||
ORDER BY filed_code
|
||||
""", [TENANT_ID] + PRE_INTERVIEW_FIELDS)
|
||||
fields = cursor.fetchall()
|
||||
|
||||
found_field_codes = [f['filed_code'] for f in fields]
|
||||
print(f" 找到 {len(fields)}/{len(PRE_INTERVIEW_FIELDS)} 个字段")
|
||||
|
||||
missing_fields = set(PRE_INTERVIEW_FIELDS) - set(found_field_codes)
|
||||
if missing_fields:
|
||||
print(f" ✗ 缺失字段 ({len(missing_fields)} 个): {', '.join(sorted(missing_fields))}")
|
||||
else:
|
||||
print(f" ✓ 所有字段都已存在")
|
||||
|
||||
# 检查关联关系
|
||||
if file_config_id:
|
||||
print("\n[1.3] 关联关系检查:")
|
||||
cursor.execute("""
|
||||
SELECT COUNT(*) as count
|
||||
FROM f_polic_file_field ff
|
||||
JOIN f_polic_field f ON ff.filed_id = f.id
|
||||
WHERE ff.tenant_id = %s AND ff.file_id = %s
|
||||
AND f.filed_code IN ({})
|
||||
""".format(placeholders), [TENANT_ID, file_config_id] + PRE_INTERVIEW_FIELDS)
|
||||
relation_count = cursor.fetchone()['count']
|
||||
|
||||
if relation_count == len(PRE_INTERVIEW_FIELDS):
|
||||
print(f" ✓ 所有字段都已正确关联 ({relation_count}/{len(PRE_INTERVIEW_FIELDS)})")
|
||||
else:
|
||||
print(f" ✗ 关联关系不完整 ({relation_count}/{len(PRE_INTERVIEW_FIELDS)})")
|
||||
|
||||
conn.close()
|
||||
return {
|
||||
'file_config_exists': file_config is not None,
|
||||
'fields_count': len(fields),
|
||||
'expected_fields_count': len(PRE_INTERVIEW_FIELDS),
|
||||
'missing_fields': list(missing_fields),
|
||||
'relations_count': relation_count if file_config_id else 0
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
print(f" ✗ 数据库检查失败: {e}")
|
||||
return None
|
||||
|
||||
def check_field_service():
|
||||
"""检查field_service.py是否支持新模板"""
|
||||
print("\n" + "="*80)
|
||||
print("2. 接口服务检查")
|
||||
print("="*80)
|
||||
|
||||
field_service_path = Path('services/field_service.py')
|
||||
document_service_path = Path('services/document_service.py')
|
||||
|
||||
issues = []
|
||||
|
||||
# 检查field_service.py
|
||||
print("\n[2.1] field_service.py检查:")
|
||||
if field_service_path.exists():
|
||||
content = field_service_path.read_text(encoding='utf-8')
|
||||
|
||||
# 检查get_fields_by_business_type方法
|
||||
if 'get_fields_by_business_type' in content:
|
||||
# 检查是否硬编码了模板名称
|
||||
if "fc.name = '初步核实审批表'" in content:
|
||||
print(" ✗ get_fields_by_business_type方法硬编码了模板名称")
|
||||
print(" → 已修复:现在根据business_type动态查询")
|
||||
issues.append('field_service_hardcoded')
|
||||
else:
|
||||
print(" ✓ get_fields_by_business_type方法支持动态查询")
|
||||
|
||||
# 检查是否支持从input_data JSON中解析template_code
|
||||
if 'json.loads' in content and 'input_data' in content:
|
||||
print(" ✓ 支持从input_data JSON中解析business_type")
|
||||
else:
|
||||
print(" ⚠ 未找到从input_data JSON解析的逻辑")
|
||||
else:
|
||||
print(" ✗ field_service.py文件不存在")
|
||||
issues.append('field_service_missing')
|
||||
|
||||
# 检查document_service.py
|
||||
print("\n[2.2] document_service.py检查:")
|
||||
if document_service_path.exists():
|
||||
content = document_service_path.read_text(encoding='utf-8')
|
||||
|
||||
# 检查get_file_config_by_template_code方法
|
||||
if 'get_file_config_by_template_code' in content:
|
||||
if 'json.loads' in content and 'input_data' in content:
|
||||
print(" ✓ get_file_config_by_template_code方法支持从input_data JSON中查找template_code")
|
||||
else:
|
||||
print(" ✗ get_file_config_by_template_code方法可能不支持从JSON中查找")
|
||||
issues.append('document_service_json_parse')
|
||||
else:
|
||||
print(" ✗ 未找到get_file_config_by_template_code方法")
|
||||
issues.append('document_service_method_missing')
|
||||
else:
|
||||
print(" ✗ document_service.py文件不存在")
|
||||
issues.append('document_service_missing')
|
||||
|
||||
return issues
|
||||
|
||||
def check_test_page():
|
||||
"""检查测试页面"""
|
||||
print("\n" + "="*80)
|
||||
print("3. 测试页面检查")
|
||||
print("="*80)
|
||||
|
||||
index_html_path = Path('static/index.html')
|
||||
|
||||
if not index_html_path.exists():
|
||||
print(" ✗ static/index.html文件不存在")
|
||||
return ['test_page_missing']
|
||||
|
||||
content = index_html_path.read_text(encoding='utf-8')
|
||||
issues = []
|
||||
|
||||
print("\n[3.1] 测试页面内容检查:")
|
||||
|
||||
# 检查是否包含新模板的示例
|
||||
if 'PRE_INTERVIEW_RISK_ASSESSMENT' in content:
|
||||
print(" ✓ 包含谈话前安全风险评估表的模板编码示例")
|
||||
else:
|
||||
print(" ⚠ 未包含谈话前安全风险评估表的模板编码示例")
|
||||
print(" → 建议:测试页面是通用的,用户可手动输入模板编码")
|
||||
|
||||
# 检查是否硬编码了旧模板
|
||||
if "初步核实审批表" in content:
|
||||
print(" ⚠ 包含旧模板'初步核实审批表'的示例(这是正常的,作为默认示例)")
|
||||
|
||||
# 检查字段输入是否支持动态添加
|
||||
if 'addOutputField' in content and 'addInputField' in content:
|
||||
print(" ✓ 支持动态添加输入和输出字段")
|
||||
|
||||
return issues
|
||||
|
||||
def check_swagger():
|
||||
"""检查Swagger文档"""
|
||||
print("\n" + "="*80)
|
||||
print("4. Swagger文档检查")
|
||||
print("="*80)
|
||||
|
||||
app_py_path = Path('app.py')
|
||||
|
||||
if not app_py_path.exists():
|
||||
print(" ✗ app.py文件不存在")
|
||||
return ['app_missing']
|
||||
|
||||
content = app_py_path.read_text(encoding='utf-8')
|
||||
issues = []
|
||||
|
||||
print("\n[4.1] Swagger配置检查:")
|
||||
|
||||
# 检查Swagger是否配置
|
||||
if 'Swagger' in content and 'swagger_template' in content:
|
||||
print(" ✓ Swagger已配置")
|
||||
else:
|
||||
print(" ✗ Swagger未配置")
|
||||
issues.append('swagger_not_configured')
|
||||
|
||||
# 检查接口文档是否完整
|
||||
print("\n[4.2] 接口文档检查:")
|
||||
|
||||
endpoints = [
|
||||
('/ai/extract', 'AI字段提取接口'),
|
||||
('/api/fields', '获取字段配置接口'),
|
||||
('/ai/generate-document', '文档生成接口')
|
||||
]
|
||||
|
||||
for endpoint, description in endpoints:
|
||||
if endpoint in content:
|
||||
# 检查是否有Swagger文档注释
|
||||
if '---' in content and description in content:
|
||||
print(f" ✓ {endpoint} ({description}) 有Swagger文档")
|
||||
else:
|
||||
print(f" ⚠ {endpoint} ({description}) 可能缺少Swagger文档注释")
|
||||
else:
|
||||
print(f" ✗ {endpoint} ({description}) 不存在")
|
||||
issues.append(f'endpoint_missing_{endpoint}')
|
||||
|
||||
# Swagger文档是动态生成的,不需要硬编码模板信息
|
||||
print("\n[4.3] 模板信息检查:")
|
||||
print(" ✓ Swagger文档是动态生成的,不需要硬编码模板信息")
|
||||
print(" → 接口支持所有模板,通过templateCode参数指定")
|
||||
|
||||
return issues
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
print("\n" + "="*80)
|
||||
print("数据结构、接口、测试页面和Swagger同步状态检查")
|
||||
print("="*80)
|
||||
print(f"\n检查目标: {FILE_NAME} (模板编码: {TEMPLATE_CODE})")
|
||||
print(f"检查时间: {__import__('datetime').datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
|
||||
|
||||
# 执行各项检查
|
||||
db_result = check_database()
|
||||
service_issues = check_field_service()
|
||||
test_page_issues = check_test_page()
|
||||
swagger_issues = check_swagger()
|
||||
|
||||
# 汇总
|
||||
print("\n" + "="*80)
|
||||
print("检查结果汇总")
|
||||
print("="*80)
|
||||
|
||||
all_issues = service_issues + test_page_issues + swagger_issues
|
||||
|
||||
if db_result:
|
||||
print(f"\n数据库状态:")
|
||||
print(f" - 文件配置: {'✓ 存在' if db_result['file_config_exists'] else '✗ 不存在'}")
|
||||
print(f" - 字段数量: {db_result['fields_count']}/{db_result['expected_fields_count']}")
|
||||
if db_result['missing_fields']:
|
||||
print(f" - 缺失字段: {', '.join(db_result['missing_fields'])}")
|
||||
print(f" - 关联关系: {db_result['relations_count']}/{db_result['expected_fields_count']}")
|
||||
|
||||
if all_issues:
|
||||
print(f"\n发现的问题 ({len(all_issues)} 个):")
|
||||
for issue in all_issues:
|
||||
print(f" - {issue}")
|
||||
else:
|
||||
print("\n✓ 未发现需要修复的问题")
|
||||
|
||||
print("\n" + "="*80)
|
||||
print("检查完成")
|
||||
print("="*80)
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
156
check_template_placeholders.py
Normal file
@ -0,0 +1,156 @@
|
||||
"""
|
||||
检查template_finish文件夹下的模板文件占位符是否可以被正确识别
|
||||
"""
|
||||
import os
|
||||
import re
|
||||
from pathlib import Path
|
||||
from docx import Document
|
||||
from collections import defaultdict
|
||||
|
||||
|
||||
def extract_placeholders_from_docx(file_path):
|
||||
"""
|
||||
从docx文件中提取所有占位符
|
||||
|
||||
Args:
|
||||
file_path: docx文件路径
|
||||
|
||||
Returns:
|
||||
占位符列表,格式: ['field_code1', 'field_code2', ...]
|
||||
"""
|
||||
placeholders = set()
|
||||
pattern = r'\{\{([^}]+)\}\}' # 匹配 {{field_code}} 格式
|
||||
|
||||
try:
|
||||
doc = Document(file_path)
|
||||
|
||||
# 从段落中提取占位符
|
||||
for paragraph in doc.paragraphs:
|
||||
text = paragraph.text
|
||||
matches = re.findall(pattern, text)
|
||||
for match in matches:
|
||||
placeholders.add(match.strip())
|
||||
|
||||
# 从表格中提取占位符
|
||||
for table in doc.tables:
|
||||
for row in table.rows:
|
||||
for cell in row.cells:
|
||||
for paragraph in cell.paragraphs:
|
||||
text = paragraph.text
|
||||
matches = re.findall(pattern, text)
|
||||
for match in matches:
|
||||
placeholders.add(match.strip())
|
||||
|
||||
except Exception as e:
|
||||
print(f" 错误: 读取文件失败 - {str(e)}")
|
||||
return []
|
||||
|
||||
return sorted(list(placeholders))
|
||||
|
||||
|
||||
def check_templates_in_directory(base_dir):
|
||||
"""
|
||||
检查目录下所有模板文件的占位符
|
||||
|
||||
Args:
|
||||
base_dir: 模板文件根目录
|
||||
"""
|
||||
base_path = Path(base_dir)
|
||||
if not base_path.exists():
|
||||
print(f"错误: 目录不存在 - {base_dir}")
|
||||
return
|
||||
|
||||
# 统计信息
|
||||
total_files = 0
|
||||
valid_files = 0
|
||||
invalid_files = 0
|
||||
all_placeholders = defaultdict(set) # 文件路径 -> 占位符集合
|
||||
all_unique_placeholders = set() # 所有唯一的占位符
|
||||
|
||||
print("=" * 80)
|
||||
print("模板文件占位符检查报告")
|
||||
print("=" * 80)
|
||||
print()
|
||||
|
||||
# 遍历所有docx文件
|
||||
for docx_file in base_path.rglob("*.docx"):
|
||||
# 跳过临时文件(以~$开头的文件)
|
||||
if docx_file.name.startswith("~$"):
|
||||
continue
|
||||
|
||||
total_files += 1
|
||||
relative_path = docx_file.relative_to(base_path)
|
||||
|
||||
print(f"[{total_files}] 检查文件: {relative_path}")
|
||||
|
||||
# 提取占位符
|
||||
placeholders = extract_placeholders_from_docx(str(docx_file))
|
||||
|
||||
if placeholders:
|
||||
valid_files += 1
|
||||
all_placeholders[str(relative_path)] = placeholders
|
||||
all_unique_placeholders.update(placeholders)
|
||||
|
||||
print(f" ✓ 找到 {len(placeholders)} 个占位符:")
|
||||
for i, placeholder in enumerate(placeholders, 1):
|
||||
print(f" {i}. {{{{ {placeholder} }}}}")
|
||||
else:
|
||||
invalid_files += 1
|
||||
print(f" ⚠ 未找到占位符")
|
||||
|
||||
print()
|
||||
|
||||
# 打印汇总信息
|
||||
print("=" * 80)
|
||||
print("检查汇总")
|
||||
print("=" * 80)
|
||||
print(f"总文件数: {total_files}")
|
||||
print(f"包含占位符的文件: {valid_files}")
|
||||
print(f"未找到占位符的文件: {invalid_files}")
|
||||
print(f"唯一占位符总数: {len(all_unique_placeholders)}")
|
||||
print()
|
||||
|
||||
# 打印所有唯一占位符
|
||||
if all_unique_placeholders:
|
||||
print("所有唯一占位符列表:")
|
||||
for i, placeholder in enumerate(sorted(all_unique_placeholders), 1):
|
||||
print(f" {i}. {{{{ {placeholder} }}}}")
|
||||
print()
|
||||
|
||||
# 打印每个文件的占位符详情
|
||||
print("=" * 80)
|
||||
print("各文件占位符详情")
|
||||
print("=" * 80)
|
||||
for file_path, placeholders in sorted(all_placeholders.items()):
|
||||
print(f"\n文件: {file_path}")
|
||||
print(f"占位符数量: {len(placeholders)}")
|
||||
for placeholder in placeholders:
|
||||
print(f" - {{{{ {placeholder} }}}}")
|
||||
|
||||
# 返回结果供其他脚本使用
|
||||
return {
|
||||
'total_files': total_files,
|
||||
'valid_files': valid_files,
|
||||
'invalid_files': invalid_files,
|
||||
'all_placeholders': dict(all_placeholders),
|
||||
'unique_placeholders': sorted(all_unique_placeholders)
|
||||
}
|
||||
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
template_dir = os.path.join(os.path.dirname(__file__), 'template_finish')
|
||||
|
||||
print(f"检查目录: {template_dir}")
|
||||
print()
|
||||
|
||||
result = check_templates_in_directory(template_dir)
|
||||
|
||||
if result:
|
||||
print("\n" + "=" * 80)
|
||||
print("检查完成!")
|
||||
print("=" * 80)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
15
config/field_defaults.json
Normal file
@ -0,0 +1,15 @@
|
||||
{
|
||||
"field_defaults": {
|
||||
"target_family_situation": "家庭关系和谐稳定",
|
||||
"target_social_relations": "社会交往较多,人机关系基本正常",
|
||||
"target_health_status": "良好",
|
||||
"target_personality": "开朗",
|
||||
"target_tolerance": "较强",
|
||||
"target_issue_severity": "较轻",
|
||||
"target_other_issues_possibility": "较小",
|
||||
"target_previous_investigation": "无",
|
||||
"target_negative_events": "无",
|
||||
"target_other_situation": "无",
|
||||
"risk_level": "低"
|
||||
}
|
||||
}
|
||||
@ -83,6 +83,22 @@
|
||||
"如果是手机号,提取11位数字",
|
||||
"如果是座机,包含区号和号码"
|
||||
]
|
||||
},
|
||||
"target_work_basic_info": {
|
||||
"description": "被核查人员工作基本情况",
|
||||
"rules": [
|
||||
"必须严格按照以下格式规范化输出:",
|
||||
"格式:XXX,男,汉族,19XX年X月出生,山西XX人,XX学历,19XX年X月参加工作,20XX年X月加入中国共产党。19XX年X月至20XX年X月,先后在XXXX工作;20XX年X月至20XX年X月,任XXXXX;20XX年X月至20XX年X月,任XXXX;20XX年X月至今,任XXXXX。",
|
||||
"第一部分(基本信息):姓名,性别,民族,出生年月,籍贯,学历,参加工作时间,入党时间",
|
||||
"第二部分(工作经历):按时间顺序列出工作经历,使用分号分隔",
|
||||
"工作经历格式:19XX年X月至20XX年X月,任XXXXX(或:先后在XXXX工作)",
|
||||
"最后一段工作经历使用\"至今\"表示当前职位",
|
||||
"如果信息不完整,只输出能够提取到的部分,保持格式规范",
|
||||
"日期格式统一为\"19XX年X月\"或\"20XX年X月\",月份为1-12的数字,不补零",
|
||||
"籍贯格式:省份+市/县,如\"山西太原\"、\"山西XX\"",
|
||||
"学历使用标准表述:本科、大专、高中、中专、研究生等",
|
||||
"政治面貌部分:如果是中共党员,写\"加入中国共产党\";如果不是,省略此部分"
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
209
docx处理说明.md
Normal file
@ -0,0 +1,209 @@
|
||||
# .docx 文件处理说明
|
||||
|
||||
## 脚本说明
|
||||
|
||||
### process_templates_docx_only.py
|
||||
|
||||
这是一个专门处理已转换为 .docx 格式的模板文档的脚本。
|
||||
|
||||
**特点:**
|
||||
- ✅ 只处理 .docx 文件,跳过 .doc 文件
|
||||
- ✅ 不需要转换功能,直接处理已转换的文档
|
||||
- ✅ 自动识别文档类型
|
||||
- ✅ 智能添加占位符
|
||||
- ✅ 保持原有目录结构
|
||||
|
||||
## 使用步骤
|
||||
|
||||
### 1. 确保文件已转换
|
||||
|
||||
确保所有需要处理的文档都已经转换为 .docx 格式,并放在 `模板/原始模板` 目录下。
|
||||
|
||||
脚本会自动查找所有子目录中的 .docx 文件,包括:
|
||||
- `模板/原始模板/2-初核模版/1.初核请示/批量改格式_20251207182627/` 目录下的文件
|
||||
- 其他任何子目录中的 .docx 文件
|
||||
|
||||
### 2. 运行脚本
|
||||
|
||||
```bash
|
||||
python process_templates_docx_only.py
|
||||
```
|
||||
|
||||
### 3. 检查结果
|
||||
|
||||
脚本会:
|
||||
- 扫描所有 .docx 文件
|
||||
- 识别文档类型
|
||||
- 添加占位符
|
||||
- 保存到 `模板` 文件夹(保持目录结构)
|
||||
|
||||
## 输出说明
|
||||
|
||||
### 处理成功
|
||||
```
|
||||
处理: 1.请示报告卡(XXX)_转自DOC.docx
|
||||
类型: REPORT_CARD
|
||||
输入: 模板\原始模板\2-初核模版\1.初核请示\批量改格式_20251207182627\1.请示报告卡(XXX)_转自DOC.docx
|
||||
输出: 模板\2-初核模版\1.初核请示\1.请示报告卡(XXX).docx
|
||||
处理: 1.请示报告卡(XXX)_转自DOC.docx
|
||||
✓ 处理成功,替换了 3 处占位符
|
||||
```
|
||||
|
||||
### 无法识别类型
|
||||
```
|
||||
⚠ 无法识别文档类型: 某个文件.docx
|
||||
路径: 模板\原始模板\...\某个文件.docx
|
||||
```
|
||||
|
||||
### 处理失败
|
||||
```
|
||||
✗ 处理失败: [错误信息]
|
||||
```
|
||||
|
||||
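## File naming rules

The script cleans up file names automatically:
- Removes the `_转自DOC` suffix
- Removes placeholder markers such as `(XXX)`, `(XXX)`, and `XXX`
- Keeps the core part of the original file name

Note that both examples below keep the `(XXX)` marker, so the second rule does not appear to apply to these output names.

**Examples:**
- `1.请示报告卡(XXX)_转自DOC.docx` → `1.请示报告卡(XXX).docx`
- `2.初步核实审批表(XXX)_转自DOC.docx` → `2.初步核实审批表(XXX).docx`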
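
The suffix cleanup shown in the examples can be sketched as below. `clean_output_name` is a hypothetical helper, not the script's actual function, and it implements only the `_转自DOC` rule because that is the only behavior the examples demonstrate.

```python
import re
from pathlib import Path


def clean_output_name(filename):
    """Drop a trailing "_转自DOC" conversion marker from the file stem."""
    path = Path(filename)
    stem = re.sub(r"_转自DOC$", "", path.stem)
    return stem + path.suffix
```

Anchoring the pattern with `$` keeps the marker intact if it ever appears in the middle of a legitimate file name.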

## Placeholder logic

The script recognizes the following patterns and inserts placeholders:

1. **Field name: concrete value** → **Field name: {{field_code}}**
   - Example: `被核查人姓名: 张三` → `被核查人姓名: {{target_name}}`

2. **Field name: XXX/待填** → **Field name: {{field_code}}**
   - Example: `被核查人姓名: XXX` → `被核查人姓名: {{target_name}}`

3. **Fields in tables**: fields inside table cells are handled the same way
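
Patterns 1 and 2 above can be sketched with a single regular expression per field label. The `FIELD_LABELS` mapping and `apply_label_rules` function are hypothetical: the script's real mapping table and matching rules are not shown in this document, and the set of characters that ends a value is an assumption.

```python
import re

# Hypothetical label-to-field-code mapping, for illustration only.
FIELD_LABELS = {
    "被核查人姓名": "target_name",
}


def apply_label_rules(text, labels=FIELD_LABELS):
    """Replace the value after "label:" with "{{field_code}}"."""
    for label, code in labels.items():
        # Label, optional spaces, ASCII or full-width colon, then the value.
        # The lookahead leaves values that are already placeholders untouched.
        pattern = re.compile(
            r"(" + re.escape(label) + r"\s*[::]\s*)(?!\{\{)([^\s,,;;。]+)"
        )
        text = pattern.sub(lambda m: m.group(1) + "{{" + code + "}}", text)
    return text
```

The same function could be applied to the text of each table cell to cover pattern 3.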
|
||||
|
||||
## 支持的文档类型
|
||||
|
||||
脚本支持以下文档类型:
|
||||
|
||||
1. 请示报告卡 (REPORT_CARD)
|
||||
2. 初步核实审批表 (PRELIMINARY_VERIFICATION_APPROVAL)
|
||||
3. 初核方案 / 附件初核方案 (INVESTIGATION_PLAN)
|
||||
4. 谈话通知书 (NOTIFICATION_LETTER)
|
||||
5. 谈话笔录 (INTERVIEW_RECORD)
|
||||
6. 谈话询问对象情况摸底调查30问 (INVESTIGATION_30_QUESTIONS)
|
||||
7. 被谈话人权利义务告知书 (RIGHTS_OBLIGATIONS_NOTICE)
|
||||
8. 点对点交接单 (HANDOVER_FORM)
|
||||
9. 陪送交接单 (ESCORT_HANDOVER_FORM)
|
||||
10. 保密承诺书 (CONFIDENTIALITY_COMMITMENT)
|
||||
11. 办案人员-办案安全保密承诺书 (INVESTIGATOR_CONFIDENTIALITY_COMMITMENT)
|
||||
12. 请示报告卡(初核报告结论) (REPORT_CARD_CONCLUSION)
|
||||
13. 初核情况报告 (INVESTIGATION_REPORT)
|
||||
14. 谈话审批表 (INTERVIEW_APPROVAL_FORM)
|
||||
15. 谈话前安全风险评估表 (PRE_INTERVIEW_RISK_ASSESSMENT)
|
||||
16. 谈话方案 (INTERVIEW_PLAN)
|
||||
17. 谈话后安全风险评估表 (POST_INTERVIEW_RISK_ASSESSMENT)
|
||||
|
||||
## 文件结构
|
||||
|
||||
处理后的文件结构:
|
||||
|
||||
```
|
||||
模板/
|
||||
├── 2-初核模版/
|
||||
│ ├── 1.初核请示/
|
||||
│ │ ├── 1.请示报告卡(XXX).docx ← 处理后的文件
|
||||
│ │ ├── 2.初步核实审批表(XXX).docx
|
||||
│ │ └── 3.附件初核方案(XXX).docx
|
||||
│ └── ...
|
||||
└── ...
|
||||
```
|
||||
|
||||
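## Notes

1. **Only .docx files are processed**
   - The script skips .doc files automatically
   - If .doc files still need processing, convert them to .docx first

2. **File locations**
   - Source files: under `模板/原始模板/` (including all subdirectories)
   - Output files: under `模板/` (relative directory structure preserved)

3. **Placeholder check**
   - Processed templates need manual review
   - Make sure the placeholder format is correct: `{{field_code}}`
   - Make sure each placeholder sits in a sensible position

4. **File name cleanup**
   - The script automatically strips conversion markers from file names
   - If a file name is not what you expect, rename it by hand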
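
The placeholder check in note 3 can be partly automated. The sketch below works on plain text (for example the text of one paragraph) and assumes that field codes are ASCII identifiers such as `target_name`; that assumption, and the `check_placeholders` name, are introduced here for illustration.

```python
import re

# A well-formed placeholder: {{ identifier }} with optional inner spaces.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")
# Anything wrapped in double braces, well-formed or not.
ANY_BRACES_RE = re.compile(r"\{\{(.*?)\}\}")


def check_placeholders(text):
    """Return (valid field codes, malformed placeholder contents)."""
    valid = sorted(set(PLACEHOLDER_RE.findall(text)))
    malformed = sorted(
        raw for raw in ANY_BRACES_RE.findall(text)
        if not PLACEHOLDER_RE.fullmatch("{{" + raw + "}}")
    )
    return valid, malformed
```

Anything reported as malformed is a candidate for manual correction before the template is uploaded.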
|
||||
|
||||
## 常见问题
|
||||
|
||||
### Q1: 某些文件没有被处理
|
||||
|
||||
**A:** 检查:
|
||||
1. 文件是否是 .docx 格式(不是 .doc)
|
||||
2. 文件是否在 `模板/原始模板` 目录下
|
||||
3. 文档类型是否被识别(查看输出日志)
|
||||
|
||||
### Q2: 占位符没有添加
|
||||
|
||||
**A:** 可能原因:
|
||||
1. 文档中字段名称与映射表不匹配
|
||||
2. 字段值已经是占位符格式
|
||||
3. 字段值为空或特殊字符
|
||||
|
||||
**解决方案:**
|
||||
- 检查文档内容
|
||||
- 手动添加占位符
|
||||
- 检查字段名称是否正确
|
||||
|
||||
### Q3: 文件名不正确
|
||||
|
||||
**A:** 脚本会自动清理文件名,如果不符合预期:
|
||||
- 可以手动重命名输出文件
|
||||
- 或者修改脚本中的文件名清理逻辑
|
||||
|
||||
## Differences from process_templates.py

| Feature | process_templates.py | process_templates_docx_only.py |
|------|---------------------|-------------------------------|
| Handles .doc files | ✅ Yes (requires conversion) | ❌ No (skipped) |
| Handles .docx files | ✅ Yes | ✅ Yes |
| Requires Word/COM | ✅ Yes (for conversion) | ❌ No |
| Conversion | ✅ Yes | ❌ No |
| Best for | Original .doc files | Already-converted .docx files |

## 推荐使用场景
|
||||
|
||||
**使用 process_templates_docx_only.py 当:**
|
||||
- ✅ 所有文件已经手动转换为 .docx 格式
|
||||
- ✅ 不想处理 .doc 文件转换
|
||||
- ✅ 只需要处理已转换的文档
|
||||
|
||||
**使用 process_templates.py 当:**
|
||||
- ✅ 还有 .doc 文件需要自动转换
|
||||
- ✅ 系统已安装 Microsoft Word 和 pywin32
|
||||
- ✅ 需要完整的自动化流程
|
||||
|
||||
## 下一步
|
||||
|
||||
处理完成后:
|
||||
|
||||
1. **检查生成的模板**
|
||||
- 打开处理后的模板文件
|
||||
- 检查占位符是否正确添加
|
||||
- 手动调整不正确的占位符
|
||||
|
||||
2. **运行初始化脚本**
|
||||
```bash
|
||||
python init_all_templates.py
|
||||
```
|
||||
- 上传模板到 MinIO
|
||||
- 更新数据库配置
|
||||
|
||||
3. **测试文档生成**
|
||||
- 使用 API 测试文档生成功能
|
||||
- 确认模板可以正常使用
|
||||
148
doc转换说明.md
Normal file
@ -0,0 +1,148 @@
|
||||
# .doc 文件转换说明
|
||||
|
||||
## 问题说明
|
||||
|
||||
`process_templates.py` 脚本需要将 `.doc` 文件转换为 `.docx` 格式才能处理。如果转换失败,通常是因为:
|
||||
|
||||
1. **未安装 pywin32** - Python 无法访问 Windows COM 组件
|
||||
2. **未安装 Microsoft Word** - 系统没有 Word 应用程序
|
||||
3. **Word 无法访问** - Word 被其他程序占用或权限问题
|
||||
|
||||
## 解决方案
|
||||
|
||||
### 方案1:使用批处理脚本自动转换(推荐)
|
||||
|
||||
运行提供的批处理脚本:
|
||||
|
||||
```bash
|
||||
批量转换doc到docx.bat
|
||||
```
|
||||
|
||||
这个脚本会:
|
||||
- 自动查找所有 `.doc` 文件
|
||||
- 使用 Microsoft Word 批量转换
|
||||
- 保存到对应的目录结构
|
||||
|
||||
**要求:**
|
||||
- 已安装 Microsoft Word(不是 WPS)
|
||||
- Windows 系统
|
||||
|
||||
### 方案2:手动转换(最可靠)
|
||||
|
||||
1. **使用 Microsoft Word 打开文件**
|
||||
- 双击 `.doc` 文件,用 Word 打开
|
||||
- 或者右键 → 打开方式 → Microsoft Word
|
||||
|
||||
2. **另存为 .docx 格式**
|
||||
- 文件 → 另存为
|
||||
- 文件类型选择:`Word 文档 (*.docx)`
|
||||
- 保存到 `模板` 文件夹的对应位置
|
||||
|
||||
3. **批量转换**
|
||||
- 在 Word 中打开多个文件
|
||||
- 使用宏或脚本批量转换(需要 VBA 知识)
|
||||
|
||||
### 方案3:安装 pywin32(如果已安装 Word)
|
||||
|
||||
```bash
|
||||
pip install pywin32
|
||||
```
|
||||
|
||||
然后重新运行 `process_templates.py`
|
||||
|
||||
**注意:** 即使安装了 pywin32,如果系统没有安装 Microsoft Word,转换仍然会失败。
|
||||
|
||||
### 方案4:使用在线转换工具
|
||||
|
||||
1. 使用在线转换工具(如 Zamzar、CloudConvert 等)
|
||||
2. 上传 `.doc` 文件
|
||||
3. 下载转换后的 `.docx` 文件
|
||||
4. 保存到 `模板` 文件夹的对应位置
|
||||
|
||||
## 转换后的文件结构
|
||||
|
||||
转换后的文件应该保存在 `模板` 文件夹下,保持原有的目录结构:
|
||||
|
||||
```
|
||||
模板/
|
||||
├── 2-初核模版/
|
||||
│ ├── 1.初核请示/
|
||||
│ │ ├── 1.请示报告卡(XXX).docx ← 转换后的文件
|
||||
│ │ ├── 2.初步核实审批表(XXX).docx
|
||||
│ │ └── 3.附件初核方案(XXX).docx
|
||||
│ └── ...
|
||||
└── ...
|
||||
```
|
||||
|
||||
## Verifying the conversion

After converting, check the following:

1. **The files exist**
   ```bat
   REM List the converted files
   dir /s /b 模板\*.docx
   ```

2. **The files open**
   - Try opening the converted files in Word
   - Confirm that the content is complete

3. **Re-run the processing script**
   ```bash
   python process_templates.py
   ```
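
The first check can also be scripted. The sketch below lists `.doc` files that have no `.docx` counterpart at the same relative path under the output folder. `find_unconverted` is a hypothetical helper, and it assumes converted files keep both the original file name and the original relative directory, which this document recommends but does not guarantee.

```python
from pathlib import Path


def find_unconverted(source_dir, target_dir):
    """Return .doc files under source_dir with no matching .docx in target_dir."""
    source = Path(source_dir)
    target = Path(target_dir)
    missing = []
    for doc in sorted(source.rglob("*")):
        if not doc.is_file() or doc.suffix.lower() != ".doc":
            continue
        if doc.name.startswith("~$"):  # skip Word lock/temp files
            continue
        expected = target / doc.relative_to(source).with_suffix(".docx")
        if not expected.exists():
            missing.append(doc)
    return missing
```

For the layout described here, it might be called as `find_unconverted("模板/原始模板", "模板")`.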
|
||||
|
||||
## 常见问题
|
||||
|
||||
### Q1: 提示 "未找到 Microsoft Word"
|
||||
|
||||
**A:** 确保已安装 Microsoft Word,而不是 WPS Office。脚本需要 Microsoft Word 的 COM 接口。
|
||||
|
||||
### Q2: 转换后文件损坏
|
||||
|
||||
**A:**
|
||||
- 检查原始文件是否完整
|
||||
- 尝试手动用 Word 打开并另存为
|
||||
- 检查文件权限
|
||||
|
||||
### Q3: 转换速度慢
|
||||
|
||||
**A:**
|
||||
- 这是正常现象,Word 转换需要时间
|
||||
- 可以分批转换,先转换重要的文件
|
||||
|
||||
### Q4: 某些文件转换失败
|
||||
|
||||
**A:**
|
||||
- 检查文件是否被其他程序占用
|
||||
- 尝试手动转换这些文件
|
||||
- 检查文件是否损坏
|
||||
|
||||
## 推荐流程
|
||||
|
||||
1. **先尝试批处理脚本**
|
||||
```bash
|
||||
批量转换doc到docx.bat
|
||||
```
|
||||
|
||||
2. **如果批处理脚本失败,手动转换**
|
||||
- 打开 Word
|
||||
- 批量打开 `.doc` 文件
|
||||
- 逐个另存为 `.docx`
|
||||
|
||||
3. **验证转换结果**
|
||||
- 检查文件是否都在正确位置
|
||||
- 尝试打开几个文件确认内容
|
||||
|
||||
4. **运行处理脚本**
|
||||
```bash
|
||||
python process_templates.py
|
||||
```
|
||||
|
||||
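## Notes

1. **Back up the originals** - back up the original `.doc` files before converting
2. **Keep the directory structure** - converted files should keep the original directory structure
3. **File naming** - keep file names clear and easy to identify
4. **Check the content** - verify that each file's content is complete after conversion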
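
Notes 1 and 2 can be combined in a small backup step that copies the `.doc` files while keeping their relative paths. This is a sketch, not part of the repository's scripts: `backup_doc_files` and both folder arguments are hypothetical, and the backup folder is assumed to live outside the source tree.

```python
import shutil
from pathlib import Path
from typing import List


def backup_doc_files(source_root: Path, backup_root: Path) -> List[Path]:
    """Copy every .doc file under source_root into backup_root, preserving relative paths."""
    copied = []
    for doc_path in sorted(source_root.rglob("*.doc")):
        target = backup_root / doc_path.relative_to(source_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(doc_path, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

Because the relative paths are preserved, a backed-up file can be copied straight back to its original location if a conversion goes wrong.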
383
init_all_templates.py
Normal file
@ -0,0 +1,383 @@
"""
Initialize all templates in the database and MinIO.
Walks every template file under the templates folder, uploads it to MinIO, and updates the database.
"""
import os
import json
import pymysql
from minio import Minio
from minio.error import S3Error
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional

# MinIO connection settings.
# Credentials are read from the environment instead of being hard-coded.
# The variable names MINIO_ACCESS_KEY and MINIO_SECRET_KEY are assumptions; adjust them to your deployment.
MINIO_CONFIG = {
    'endpoint': 'minio.datacubeworld.com:9000',
    'access_key': os.getenv('MINIO_ACCESS_KEY'),
    'secret_key': os.getenv('MINIO_SECRET_KEY'),
    'secure': True  # use HTTPS
}

# Database connection settings (DB_PASSWORD is likewise an assumed variable name)
DB_CONFIG = {
    'host': '152.136.177.240',
    'port': 5012,
    'user': 'finyx',
    'password': os.getenv('DB_PASSWORD', ''),
    'database': 'finyx',
    'charset': 'utf8mb4'
}

# Fixed values
TENANT_ID = 615873064429507639
CREATED_BY = 655162080928945152
UPDATED_BY = 655162080928945152
CURRENT_TIME = datetime.now()

# Project root directory
PROJECT_ROOT = Path(__file__).parent
TEMPLATES_DIR = PROJECT_ROOT / "template_finish"
BUCKET_NAME = 'finyx'

# 文档类型映射(根据完整文件名识别,保持原文件名不变)
|
||||
# 每个文件名都是独立的模板,使用完整文件名作为key
|
||||
DOCUMENT_TYPE_MAPPING = {
|
||||
"1.请示报告卡(XXX)": {
|
||||
"template_code": "REPORT_CARD",
|
||||
"name": "1.请示报告卡(XXX)",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"2.初步核实审批表(XXX)": {
|
||||
"template_code": "PRELIMINARY_VERIFICATION_APPROVAL",
|
||||
"name": "2.初步核实审批表(XXX)",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"3.附件初核方案(XXX)": {
|
||||
"template_code": "INVESTIGATION_PLAN",
|
||||
"name": "3.附件初核方案(XXX)",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"谈话通知书第一联": {
|
||||
"template_code": "NOTIFICATION_LETTER_1",
|
||||
"name": "谈话通知书第一联",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"谈话通知书第二联": {
|
||||
"template_code": "NOTIFICATION_LETTER_2",
|
||||
"name": "谈话通知书第二联",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"谈话通知书第三联": {
|
||||
"template_code": "NOTIFICATION_LETTER_3",
|
||||
"name": "谈话通知书第三联",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"1.请示报告卡(初核谈话)": {
|
||||
"template_code": "REPORT_CARD_INTERVIEW",
|
||||
"name": "1.请示报告卡(初核谈话)",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"2谈话审批表": {
|
||||
"template_code": "INTERVIEW_APPROVAL_FORM",
|
||||
"name": "2谈话审批表",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"3.谈话前安全风险评估表": {
|
||||
"template_code": "PRE_INTERVIEW_RISK_ASSESSMENT",
|
||||
"name": "3.谈话前安全风险评估表",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"4.谈话方案": {
|
||||
"template_code": "INTERVIEW_PLAN",
|
||||
"name": "4.谈话方案",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"5.谈话后安全风险评估表": {
|
||||
"template_code": "POST_INTERVIEW_RISK_ASSESSMENT",
|
||||
"name": "5.谈话后安全风险评估表",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"1.谈话笔录": {
|
||||
"template_code": "INTERVIEW_RECORD",
|
||||
"name": "1.谈话笔录",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"2.谈话询问对象情况摸底调查30问": {
|
||||
"template_code": "INVESTIGATION_30_QUESTIONS",
|
||||
"name": "2.谈话询问对象情况摸底调查30问",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"3.被谈话人权利义务告知书": {
|
||||
"template_code": "RIGHTS_OBLIGATIONS_NOTICE",
|
||||
"name": "3.被谈话人权利义务告知书",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"4.点对点交接单": {
|
||||
"template_code": "HANDOVER_FORM",
|
||||
"name": "4.点对点交接单",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"4.点对点交接单2": {
|
||||
"template_code": "HANDOVER_FORM_2",
|
||||
"name": "4.点对点交接单2",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"5.陪送交接单(新)": {
|
||||
"template_code": "ESCORT_HANDOVER_FORM",
|
||||
"name": "5.陪送交接单(新)",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"6.1保密承诺书(谈话对象使用-非中共党员用)": {
|
||||
"template_code": "CONFIDENTIALITY_COMMITMENT_NON_PARTY",
|
||||
"name": "6.1保密承诺书(谈话对象使用-非中共党员用)",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"6.2保密承诺书(谈话对象使用-中共党员用)": {
|
||||
"template_code": "CONFIDENTIALITY_COMMITMENT_PARTY",
|
||||
"name": "6.2保密承诺书(谈话对象使用-中共党员用)",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"7.办案人员-办案安全保密承诺书": {
|
||||
"template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT",
|
||||
"name": "7.办案人员-办案安全保密承诺书",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"8-1请示报告卡(初核报告结论) ": {
|
||||
"template_code": "REPORT_CARD_CONCLUSION",
|
||||
"name": "8-1请示报告卡(初核报告结论) ",
|
||||
"business_type": "INVESTIGATION"
|
||||
},
|
||||
"8.XXX初核情况报告": {
|
||||
"template_code": "INVESTIGATION_REPORT",
|
||||
"name": "8.XXX初核情况报告",
|
||||
"business_type": "INVESTIGATION"
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
def generate_id():
    """Generate an ID from a millisecond timestamp plus a random number (imitating a snowflake ID)."""
    import time
    import random
    timestamp = int(time.time() * 1000)
    random_part = random.randint(100000, 999999)
    return timestamp * 1000 + random_part


def identify_document_type(file_name: str) -> Optional[Dict]:
    """
    Identify the document type from the full file name (the name is used unchanged).

    Args:
        file_name: file name including its extension (`Path(...).stem` strips it)

    Returns:
        The document type configuration, or None if the name is not recognized
    """
    # Take the file name without its extension, otherwise unchanged
    base_name = Path(file_name).stem

    # Exact match on the full file name
    if base_name in DOCUMENT_TYPE_MAPPING:
        return DOCUMENT_TYPE_MAPPING[base_name]

    # No exact match: return None (no normalization or fuzzy matching)
    return None


def upload_to_minio(file_path: Path) -> str:
    """
    Upload a file to MinIO.

    Args:
        file_path: local file path

    Returns:
        The object's path in MinIO, relative to the bucket
    """
    try:
        # Create the MinIO client
        client = Minio(
            MINIO_CONFIG['endpoint'],
            access_key=MINIO_CONFIG['access_key'],
            secret_key=MINIO_CONFIG['secret_key'],
            secure=MINIO_CONFIG['secure']
        )

        # Check that the bucket exists
        found = client.bucket_exists(BUCKET_NAME)
        if not found:
            raise Exception(f"Bucket '{BUCKET_NAME}' does not exist; create it first")

        # Build the MinIO object path
        now = datetime.now()
        object_name = f'{TENANT_ID}/TEMPLATE/{now.year}/{now.month:02d}/{file_path.name}'

        # Upload the file
        client.fput_object(
            BUCKET_NAME,
            object_name,
            str(file_path),
            content_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document'
        )

        # Return the relative path (with a leading /)
        return f"/{object_name}"

    except S3Error as e:
        raise Exception(f"MinIO error: {e}") from e
    except Exception as e:
        raise Exception(f"Error while uploading the file: {e}") from e


def get_or_create_file_config(conn, doc_config: Dict, file_path: str) -> int:
    """
    Get or create a file configuration record.

    Args:
        conn: database connection
        doc_config: document configuration
        file_path: file path in MinIO

    Returns:
        The file configuration ID
    """
    cursor = conn.cursor()

    try:
        # Check whether a record already exists
        select_sql = """
            SELECT id FROM f_polic_file_config
            WHERE tenant_id = %s AND template_code = %s
        """
        cursor.execute(select_sql, (TENANT_ID, doc_config['template_code']))
        existing = cursor.fetchone()

        if existing:
            file_config_id = existing[0]
            # Update the file path
            update_sql = """
                UPDATE f_polic_file_config
                SET file_path = %s, updated_time = %s, updated_by = %s
                WHERE id = %s
            """
            cursor.execute(update_sql, (file_path, CURRENT_TIME, UPDATED_BY, file_config_id))
            conn.commit()
            return file_config_id
        else:
            # Create a new record
            file_config_id = generate_id()
            input_data = json.dumps({
                'template_code': doc_config['template_code'],
                'business_type': doc_config['business_type']
            }, ensure_ascii=False)

            insert_sql = """
                INSERT INTO f_polic_file_config
                (id, tenant_id, parent_id, name, input_data, file_path, template_code,
                created_time, created_by, updated_time, updated_by, state)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
            """
            cursor.execute(insert_sql, (
                file_config_id,
                TENANT_ID,
                None,  # parent_id
                doc_config['name'],
                input_data,
                file_path,
                doc_config['template_code'],
                CURRENT_TIME,
                CREATED_BY,
                CURRENT_TIME,
                CREATED_BY,
                1  # state: 1 = enabled
            ))
            conn.commit()
            return file_config_id

    finally:
        cursor.close()


def process_all_templates():
    """
    Process every template file: upload it to MinIO and update the database.
    """
    print("="*80)
    print("Initializing all templates")
    print("="*80)

    if not TEMPLATES_DIR.exists():
        print(f"Error: template directory does not exist: {TEMPLATES_DIR}")
        return

    # Connect to the database
    try:
        conn = pymysql.connect(**DB_CONFIG)
        print("✓ Database connection established\n")
    except Exception as e:
        print(f"✗ Database connection failed: {e}")
        return

    # Counters
    processed_count = 0
    skipped_count = 0
    failed_count = 0

    # Walk all .docx files
    for root, dirs, files in os.walk(TEMPLATES_DIR):
        for file in files:
            # Only .docx files are processed
            if not file.endswith('.docx'):
                continue

            file_path = Path(root) / file

            # Identify the document type
            doc_config = identify_document_type(file)

            if not doc_config:
                print(f"\n⚠ Unrecognized document type: {file}")
                print(f"  Path: {file_path}")
                skipped_count += 1
                continue

            print(f"\nProcessing: {file}")
            print(f"  Type: {doc_config.get('template_code', 'UNKNOWN')}")
            print(f"  Name: {doc_config.get('name', 'UNKNOWN')}")

            try:
                # Upload to MinIO
                print("  Uploading to MinIO...")
                minio_path = upload_to_minio(file_path)
                print(f"  ✓ MinIO path: {minio_path}")

                # Update the database
                print("  Updating the database...")
                file_config_id = get_or_create_file_config(conn, doc_config, minio_path)
                print(f"  ✓ File config ID: {file_config_id}")

                processed_count += 1
                print("  ✓ Done")

            except Exception as e:
                failed_count += 1
                print(f"  ✗ Failed: {e}")
                import traceback
                traceback.print_exc()

    # Close the database connection
    conn.close()

    # Print the summary
    print("\n" + "="*80)
    print("Initialization finished")
    print("="*80)
    print(f"Processed: {processed_count} file(s)")
    print(f"Skipped: {skipped_count} file(s)")
    print(f"Failed: {failed_count} file(s)")


if __name__ == '__main__':
    process_all_templates()
307
init_pre_interview_risk_assessment_fields.py
Normal file
@ -0,0 +1,307 @@
"""
Field-data initialization script for the pre-interview safety risk assessment form.
Adds the risk-assessment fields and links them to the "谈话前安全风险评估表" template.
"""
import os
import pymysql
from datetime import datetime
import uuid

# Database connection settings.
# The password is read from the environment instead of being hard-coded.
# The variable name DB_PASSWORD is an assumption; adjust it to your deployment.
DB_CONFIG = {
    'host': '152.136.177.240',
    'port': 5012,
    'user': 'finyx',
    'password': os.getenv('DB_PASSWORD', ''),
    'database': 'finyx',
    'charset': 'utf8mb4'
}

# Fixed values
TENANT_ID = 615873064429507639  # taken from existing data
CREATED_BY = 655162080928945152  # taken from existing data
CURRENT_TIME = datetime.now()

# 字段配置(带默认值)
|
||||
FIELDS = [
|
||||
{
|
||||
'name': '被核查人员家庭情况',
|
||||
'field_code': 'target_family_situation',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员家庭情况',
|
||||
'default_value': '家庭关系和谐稳定'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员社会关系',
|
||||
'field_code': 'target_social_relations',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员社会关系',
|
||||
        'default_value': '社会交往较多,人际关系基本正常'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员健康状况',
|
||||
'field_code': 'target_health_status',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员健康状况',
|
||||
'default_value': '良好'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员性格特征',
|
||||
'field_code': 'target_personality',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员性格特征',
|
||||
'default_value': '开朗'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员承受能力',
|
||||
'field_code': 'target_tolerance',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员承受能力',
|
||||
'default_value': '较强'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员涉及问题严重程度',
|
||||
'field_code': 'target_issue_severity',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员涉及问题严重程度',
|
||||
'default_value': '较轻'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员涉及其他问题的可能性',
|
||||
'field_code': 'target_other_issues_possibility',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员涉及其他问题的可能性',
|
||||
'default_value': '较小'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员此前被审查情况',
|
||||
'field_code': 'target_previous_investigation',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员此前被审查情况',
|
||||
'default_value': '无'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员社会负面事件',
|
||||
'field_code': 'target_negative_events',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员社会负面事件',
|
||||
'default_value': '无'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员其他情况',
|
||||
'field_code': 'target_other_situation',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员其他情况',
|
||||
'default_value': '无'
|
||||
},
|
||||
{
|
||||
'name': '风险等级',
|
||||
'field_code': 'risk_level',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '风险等级',
|
||||
'default_value': '低'
|
||||
}
|
||||
]
|
||||
|
||||
# 文件配置
|
||||
FILE_CONFIG = {
|
||||
'name': '谈话前安全风险评估表',
|
||||
'template_code': 'PRE_INTERVIEW_RISK_ASSESSMENT',
|
||||
'file_path': '/templates/谈话前安全风险评估表模板.docx', # MinIO相对路径
|
||||
'business_type': 'INVESTIGATION', # 调查核实
|
||||
'parent_id': None # 顶级分类,可以根据实际情况设置
|
||||
}
|
||||
|
||||
|
||||
def generate_id():
    """Generate an ID from a millisecond timestamp plus a random number (imitating a snowflake ID)."""
    import time
    import random
    timestamp = int(time.time() * 1000)
    random_part = random.randint(100000, 999999)
    return timestamp * 1000 + random_part


def init_fields(conn):
    """Initialize the field records."""
    cursor = conn.cursor()
    field_ids = {}

    print("="*60)
    print("Initializing field data...")
    print("="*60)

    for field in FIELDS:
        # Check whether the field already exists
        check_sql = """
            SELECT id FROM f_polic_field
            WHERE tenant_id = %s AND filed_code = %s
        """
        cursor.execute(check_sql, (TENANT_ID, field['field_code']))
        existing = cursor.fetchone()

        if existing:
            field_id = existing[0]
            print(f"Field '{field['name']}' (code: {field['field_code']}) already exists, ID: {field_id}")
        else:
            field_id = generate_id()
            insert_sql = """
                INSERT INTO f_polic_field
                (id, tenant_id, name, filed_code, field_type, created_time, created_by, updated_time, updated_by, state)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
            """
            cursor.execute(insert_sql, (
                field_id,
                TENANT_ID,
                field['name'],
                field['field_code'],
                field['field_type'],
                CURRENT_TIME,
                CREATED_BY,
                CURRENT_TIME,
                CREATED_BY,
                0  # state: 0 = disabled, 1 = enabled
            ))
            print(f"✓ Created field: {field['name']} (code: {field['field_code']}), default: {field['default_value']}, ID: {field_id}")

        field_ids[field['field_code']] = field_id

    conn.commit()
    return field_ids


def init_file_config(conn):
    """Initialize the file configuration."""
    cursor = conn.cursor()

    print("\n" + "="*60)
    print("Initializing the file configuration...")
    print("="*60)

    # Check whether the file configuration already exists
    check_sql = """
        SELECT id FROM f_polic_file_config
        WHERE tenant_id = %s AND name = %s
    """
    cursor.execute(check_sql, (TENANT_ID, FILE_CONFIG['name']))
    existing = cursor.fetchone()

    if existing:
        file_config_id = existing[0]
        print(f"File config '{FILE_CONFIG['name']}' already exists, ID: {file_config_id}")
    else:
        file_config_id = generate_id()
        insert_sql = """
            INSERT INTO f_polic_file_config
            (id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
        """
        # input_data stores the template code and business type as JSON
        import json
        input_data = json.dumps({
            'template_code': FILE_CONFIG['template_code'],
            'business_type': FILE_CONFIG['business_type']
        }, ensure_ascii=False)

        cursor.execute(insert_sql, (
            file_config_id,
            TENANT_ID,
            FILE_CONFIG['parent_id'],
            FILE_CONFIG['name'],
            input_data,
            FILE_CONFIG['file_path'],
            CURRENT_TIME,
            CREATED_BY,
            CURRENT_TIME,
            CREATED_BY,
            1  # state: 1 = enabled
        ))
        print(f"✓ Created file config: {FILE_CONFIG['name']}, ID: {file_config_id}")

    conn.commit()
    return file_config_id


def init_file_field_relations(conn, file_config_id, field_ids):
    """Initialize the links between the file and its fields."""
    cursor = conn.cursor()

    print("\n" + "="*60)
    print("Linking the file to its fields...")
    print("="*60)

    # Link output fields only (field_type=2)
    output_fields = {k: v for k, v in field_ids.items()
                     if any(f['field_code'] == k and f['field_type'] == 2 for f in FIELDS)}

    for field_code, field_id in output_fields.items():
        # Check whether the link already exists
        check_sql = """
            SELECT id FROM f_polic_file_field
            WHERE tenant_id = %s AND filed_id = %s AND file_id = %s
        """
        cursor.execute(check_sql, (TENANT_ID, field_id, file_config_id))
        existing = cursor.fetchone()

        if existing:
            print(f"Link already exists: file ID {file_config_id} <-> field ID {field_id} ({field_code})")
        else:
            insert_sql = """
                INSERT INTO f_polic_file_field
                (tenant_id, filed_id, file_id, created_time, created_by, updated_time, updated_by, state)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
            """
            cursor.execute(insert_sql, (
                TENANT_ID,
                field_id,
                file_config_id,
                CURRENT_TIME,
                CREATED_BY,
                CURRENT_TIME,
                CREATED_BY,
                0  # state: 0 = disabled, 1 = enabled
            ))
            field_name = next(f['name'] for f in FIELDS if f['field_code'] == field_code)
            print(f"✓ Linked: {field_name} ({field_code})")

    conn.commit()


def main():
    """Entry point."""
    try:
        # Connect to the database
        conn = pymysql.connect(**DB_CONFIG)
        print("Database connection established!\n")

        # Initialize the fields
        field_ids = init_fields(conn)

        # Initialize the file configuration
        file_config_id = init_file_config(conn)

        # Link the file to its fields
        init_file_field_relations(conn, file_config_id, field_ids)

        print("\n" + "="*60)
        print("Initialization finished!")
        print("="*60)
        print(f"\nFile config ID: {file_config_id}")
        # Both counts include records that already existed before this run
        print(f"Fields created or already present: {len(field_ids)}")
        print(f"Links created or already present: {len([f for f in FIELDS if f['field_type'] == 2])}")

        # Print the configured default values
        print("\nField default values:")
        for field in FIELDS:
            if field['field_type'] == 2:
                print(f"  - {field['name']} ({field['field_code']}): {field['default_value']}")

        conn.close()

    except Exception as e:
        print(f"\nError: {e}")
        import traceback
        traceback.print_exc()


if __name__ == '__main__':
    main()
@ -36,6 +36,18 @@ FIELDS = [
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员单位及职务(包括兼职)'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员单位',
|
||||
'field_code': 'target_organization',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员单位'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员职务',
|
||||
'field_code': 'target_position',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员职务'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员性别',
|
||||
'field_code': 'target_gender',
|
||||
@ -48,6 +60,18 @@ FIELDS = [
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员出生年月(YYYYMM格式,不需要日)'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员年龄',
|
||||
'field_code': 'target_age',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员年龄(数字,单位:岁)'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员文化程度',
|
||||
'field_code': 'target_education_level',
|
||||
'field_type': 2, # 输出字段
|
||||
'description': '被核查人员文化程度(如:本科、大专、高中等)'
|
||||
},
|
||||
{
|
||||
'name': '被核查人员政治面貌',
|
||||
'field_code': 'target_political_status',
|
||||
|
||||
738
process_templates.py
Normal file
@ -0,0 +1,738 @@
"""
Process the original template documents and add placeholders automatically.
Uses the placeholder-to-field mapping table to identify each document's type and add the matching placeholders.
Uses an AI large language model to analyze document content and locate replaceable positions.
"""
import os
import re
from pathlib import Path
from typing import Dict, List, Optional
import json

try:
    from docx import Document
    from docx.shared import Pt
except ImportError:
    print("Error: install python-docx first: pip install python-docx")
    raise SystemExit(1)

# Try to import the AI helper
try:
    from template_ai_helper import TemplateAIHelper, get_available_fields_for_document
    HAS_AI_HELPER = True
except ImportError:
    HAS_AI_HELPER = False
    print("Warning: could not import the AI helper; falling back to basic mode (no AI analysis)")

# Try to import win32com for .doc conversion (Windows only)
HAS_WIN32COM = False
HAS_PYTHONCOM = False
try:
    import win32com.client
    HAS_WIN32COM = True
    try:
        import pythoncom
        HAS_PYTHONCOM = True
    except ImportError:
        pass
except ImportError:
    pass

if not HAS_WIN32COM:
    print("="*60)
    print("Warning: pywin32 is not installed; .doc files cannot be converted automatically")
    print("="*60)
    print("Options:")
    print("  1. Install pywin32: pip install pywin32")
    print("  2. Or convert all .doc files to .docx manually")
    print("  3. Re-run this script after converting")
    print("="*60)

# Project root directory
PROJECT_ROOT = Path(__file__).parent
ORIGINAL_TEMPLATES_DIR = PROJECT_ROOT / "模板" / "原始模板"
OUTPUT_TEMPLATES_DIR = PROJECT_ROOT / "模板"
FIELD_MAPPING_FILE = PROJECT_ROOT / "占位符与字段对照表.md"

# 文档类型映射(根据文件名识别)
|
||||
DOCUMENT_TYPE_MAPPING = {
|
||||
"请示报告卡": {
|
||||
"template_code": "REPORT_CARD",
|
||||
"fields": ["target_name", "target_organization_and_position", "report_card_request_time"],
|
||||
"input_fields": ["clue_info"]
|
||||
},
|
||||
"初步核实审批表": {
|
||||
"template_code": "PRELIMINARY_VERIFICATION_APPROVAL",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth", "target_political_status", "target_professional_rank",
|
||||
"clue_source", "target_issue_description", "department_opinion", "filler_name"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"初核方案": {
|
||||
"template_code": "INVESTIGATION_PLAN",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_work_basic_info",
|
||||
"target_issue_description", "investigation_unit_name", "investigation_team_leader_name",
|
||||
"investigation_team_member_names", "investigation_location"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"谈话通知书": {
|
||||
"template_code": "NOTIFICATION_LETTER",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_id_number",
|
||||
"appointment_time", "appointment_location", "approval_time",
|
||||
"handling_department", "handler_name", "notification_time", "notification_location"
|
||||
],
|
||||
"input_fields": ["target_basic_info_clue"]
|
||||
},
|
||||
"谈话笔录": {
|
||||
"template_code": "INTERVIEW_RECORD",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"谈话询问对象情况摸底调查30问": {
|
||||
"template_code": "INVESTIGATION_30_QUESTIONS",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"被谈话人权利义务告知书": {
|
||||
"template_code": "RIGHTS_OBLIGATIONS_NOTICE",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"点对点交接单": {
|
||||
"template_code": "HANDOVER_FORM",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"陪送交接单": {
|
||||
"template_code": "ESCORT_HANDOVER_FORM",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"保密承诺书": {
|
||||
"template_code": "CONFIDENTIALITY_COMMITMENT",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"办案人员-办案安全保密承诺书": {
|
||||
"template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"请示报告卡(初核报告结论)": {
|
||||
"template_code": "REPORT_CARD_CONCLUSION",
|
||||
"fields": [
|
||||
"investigation_team_code", "target_name", "target_problem_description", "target_attitude"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"初核情况报告": {
|
||||
"template_code": "INVESTIGATION_REPORT",
|
||||
"fields": [
|
||||
"target_name", "commission_name", "target_work_basic_info",
|
||||
"target_issue_description", "target_problem_description", "target_organization_and_position"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"谈话审批表": {
|
||||
"template_code": "INTERVIEW_APPROVAL_FORM",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"谈话前安全风险评估表": {
|
||||
"template_code": "PRE_INTERVIEW_RISK_ASSESSMENT",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"谈话方案": {
|
||||
"template_code": "INTERVIEW_PLAN",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"谈话后安全风险评估表": {
|
||||
"template_code": "POST_INTERVIEW_RISK_ASSESSMENT",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
}
|
||||
}
|
||||
|
||||
# 字段名称到字段编码的映射(用于智能识别)
|
||||
FIELD_NAME_TO_CODE = {
|
||||
"被核查人姓名": "target_name",
|
||||
"被核查人员单位及职务": "target_organization_and_position",
|
||||
"被核查人员性别": "target_gender",
|
||||
"被核查人员出生年月": "target_date_of_birth",
|
||||
"被核查人员出生年月日": "target_date_of_birth_full",
|
||||
"被核查人员政治面貌": "target_political_status",
|
||||
"被核查人员职级": "target_professional_rank",
|
||||
"被核查人员身份证号": "target_id_number",
|
||||
"被核查人员身份证件及号码": "target_id_number",
|
||||
"被核查人员住址": "target_address",
|
||||
"被核查人员户籍住址": "target_registered_address",
|
||||
"被核查人员联系方式": "target_contact",
|
||||
"被核查人员籍贯": "target_place_of_origin",
|
||||
"被核查人员民族": "target_ethnicity",
|
||||
"线索来源": "clue_source",
|
||||
"主要问题线索": "target_issue_description",
|
||||
"被核查人问题描述": "target_problem_description",
|
||||
"被核查人员工作基本情况": "target_work_basic_info",
|
||||
"核查单位名称": "investigation_unit_name",
|
||||
"核查组组长姓名": "investigation_team_leader_name",
|
||||
"核查组成员姓名": "investigation_team_member_names",
|
||||
"核查地点": "investigation_location",
|
||||
"核查组代号": "investigation_team_code",
|
||||
"应到时间": "appointment_time",
|
||||
"应到地点": "appointment_location",
|
||||
"批准时间": "approval_time",
|
||||
"承办部门": "handling_department",
|
||||
"承办人": "handler_name",
|
||||
"谈话通知时间": "notification_time",
|
||||
"谈话通知地点": "notification_location",
|
||||
"请示报告卡请示时间": "report_card_request_time",
|
||||
"初步核实审批表承办部门意见": "department_opinion",
|
||||
"初步核实审批表填表人": "filler_name",
|
||||
"被核查人员本人认识和态度": "target_attitude",
|
||||
"纪委名称": "commission_name"
|
||||
}
|
||||
|
||||
|
||||
def convert_doc_to_docx(doc_path: Path) -> Optional[Path]:
|
||||
"""
|
||||
将.doc文件转换为.docx格式(Windows系统使用win32com)
|
||||
|
||||
Args:
|
||||
doc_path: .doc文件路径
|
||||
|
||||
Returns:
|
||||
转换后的.docx文件路径,如果失败返回None
|
||||
"""
|
||||
if not HAS_WIN32COM:
|
||||
print(f" 警告: 未安装 pywin32,无法转换 {doc_path.name}")
|
||||
print(f" 解决方案: pip install pywin32")
|
||||
print(f" 或者: 请手动将 {doc_path.name} 转换为 .docx 格式")
|
||||
return None
|
||||
|
||||
word = None
|
||||
doc = None
|
||||
|
||||
try:
|
||||
# 初始化COM(如果可用)
|
||||
if HAS_PYTHONCOM:
|
||||
pythoncom.CoInitialize()
|
||||
|
||||
word = win32com.client.Dispatch("Word.Application")
|
||||
word.Visible = False
|
||||
word.DisplayAlerts = 0 # 不显示警告
|
||||
|
||||
docx_path = doc_path.with_suffix('.docx')
|
||||
|
||||
# 检查源文件是否存在
|
||||
if not doc_path.exists():
|
||||
print(f" ✗ 错误: 源文件不存在: {doc_path}")
|
||||
if word:
|
||||
word.Quit()
|
||||
return None
|
||||
|
||||
# 打开.doc文件(使用绝对路径)
|
||||
abs_doc_path = str(doc_path.absolute())
|
||||
abs_docx_path = str(docx_path.absolute())
|
||||
|
||||
print(f" 正在转换...")
|
||||
print(f" 源: {doc_path.name}")
|
||||
print(f" 目标: {docx_path.name}")
|
||||
|
||||
# 打开文档
|
||||
doc = word.Documents.Open(
|
||||
abs_doc_path,
|
||||
ReadOnly=True,
|
||||
ConfirmConversions=False,
|
||||
AddToRecentFiles=False
|
||||
)
|
||||
|
||||
# 另存为.docx格式 (16 = wdFormatXMLDocument)
|
||||
doc.SaveAs2(
|
||||
abs_docx_path,
|
||||
FileFormat=16 # wdFormatXMLDocument
|
||||
)
|
||||
|
||||
# 关闭文档
|
||||
doc.Close(False) # False表示不保存更改
|
||||
doc = None
|
||||
|
||||
# 退出Word
|
||||
word.Quit()
|
||||
word = None
|
||||
|
||||
# 检查转换后的文件是否存在
|
||||
if docx_path.exists() and docx_path.stat().st_size > 0:
|
||||
file_size = docx_path.stat().st_size
|
||||
print(f" ✓ 转换成功 ({file_size} 字节)")
|
||||
return docx_path
|
||||
else:
|
||||
print(f" ✗ 转换失败: 目标文件不存在或为空")
|
||||
return None
|
||||
|
||||
except Exception as e:
|
||||
error_msg = str(e)
|
||||
error_type = type(e).__name__
|
||||
print(f" ✗ 转换失败: {error_type}: {error_msg}")
|
||||
|
||||
        # Release resources
        try:
            if doc:
                doc.Close(False)
        except Exception:
            pass
        try:
            if word:
                word.Quit()
        except Exception:
            pass
|
||||
|
||||
# 提供更详细的错误信息和解决方案
|
||||
print(f" 诊断信息:")
|
||||
if "Word.Application" in error_msg or "COM" in error_msg or "CreateObject" in error_msg:
|
||||
print(f" - 可能原因: Microsoft Word 未安装或无法访问")
|
||||
print(f" - 解决方案:")
|
||||
print(f" 1. 确保已安装 Microsoft Word(不是 WPS)")
|
||||
print(f" 2. 手动将 .doc 文件转换为 .docx 格式")
|
||||
print(f" 3. 使用 Word 打开文件,另存为 .docx 格式")
|
||||
elif "pywin32" in error_msg.lower() or "win32com" in error_msg.lower():
|
||||
print(f" - 解决方案: pip install pywin32")
|
||||
elif "权限" in error_msg or "Permission" in error_msg:
|
||||
print(f" - 可能原因: 文件被其他程序占用或权限不足")
|
||||
print(f" - 解决方案: 关闭文件,检查文件权限")
|
||||
else:
|
||||
print(f" - 请检查错误信息并手动转换文件")
|
||||
|
||||
return None
|
||||
    finally:
        # Uninitialize COM
        if HAS_PYTHONCOM:
            try:
                pythoncom.CoUninitialize()
            except Exception:
                pass
|
||||
|
||||
|
||||
def identify_document_type(file_name: str) -> Optional[Dict]:
    """
    Identify the document type from the file name.

    Args:
        file_name: File name.

    Returns:
        The document type configuration, or None if the type cannot be identified.
    """
    # Strip the extension and common placeholder suffixes
    base_name = Path(file_name).stem
    base_name = base_name.replace("(XXX)", "").replace("(XXX)", "").replace("XXX", "")
    base_name = base_name.strip()

    # Try longer type names first so that a more specific name is not
    # shadowed by a shorter name that it contains.
    for doc_type in sorted(DOCUMENT_TYPE_MAPPING, key=len, reverse=True):
        if doc_type in base_name:
            return DOCUMENT_TYPE_MAPPING[doc_type]

    return None
|
||||
|
||||
|
||||
def find_placeholder_positions(text: str, field_name: str, field_code: str) -> List[tuple]:
    """
    Find positions in the text that may need to be replaced with a placeholder.

    Args:
        text: Text content.
        field_name: Field name (the label that precedes the value).
        field_code: Field code used inside the placeholder.

    Returns:
        A list of (start, end, replacement_text) tuples.
    """
    positions = []

    # Capture whatever follows "<field name>:" up to the end of the line
    pattern = rf"{re.escape(field_name)}[::]\s*([^\n\r]+)"

    for match in re.finditer(pattern, text):
        value = match.group(1).strip()
        # Skip empty values and values that are already placeholders
        if value and not value.startswith("{{"):
            # Skip common dummy values
            if value not in ["XXX", "xxx", "-", "——", "待填", "待填写"]:
                positions.append((
                    match.start(1),
                    match.end(1),
                    f"{{{{{field_code}}}}}"
                ))

    return positions
|
||||
|
||||
|
||||
def replace_text_in_runs(runs, old_text: str, new_text: str) -> bool:
    """
    Replace the first occurrence of old_text in a sequence of runs.

    Args:
        runs: List of text runs.
        old_text: Text to replace.
        new_text: Replacement text.

    Returns:
        True if a replacement was made, False otherwise.
    """
    if not old_text:
        return False

    full_text = ''.join(run.text for run in runs)
    start = full_text.find(old_text)
    if start == -1:
        return False
    end = start + len(old_text)

    # Walk the runs, tracking each run's offset in the joined text.
    # The first run that overlaps the match receives new_text; the matched
    # characters are removed from every run the match spans.
    current_pos = 0
    replaced = False
    for run in runs:
        run_text = run.text
        run_start = current_pos
        run_end = run_start + len(run_text)
        current_pos = run_end

        if run_end <= start or run_start >= end:
            continue

        prefix = run_text[:max(start - run_start, 0)]
        suffix = run_text[end - run_start:]
        if not replaced:
            run.text = prefix + new_text + suffix
            replaced = True
        else:
            run.text = prefix + suffix

    return replaced
|
||||
|
||||
|
||||
def apply_ai_replacements(text: str, ai_replacements: List[Dict]) -> str:
    """
    Apply the replacement suggestions identified by the AI.

    Args:
        text: Original text.
        ai_replacements: List of replacement suggestions from the AI.

    Returns:
        The text after replacement.
    """
    result_text = text

    # Sort by confidence so that high-confidence replacements are applied first
    sorted_replacements = sorted(ai_replacements, key=lambda x: x.get('confidence', 0), reverse=True)

    for replacement in sorted_replacements:
        original = replacement.get('original_text', '')
        replacement_text = replacement.get('replacement', '')
        confidence = replacement.get('confidence', 0)

        # Only apply replacements with confidence above 0.7
        if confidence > 0.7 and original and replacement_text:
            # str.replace works on literal text, so no regex escaping is needed.
            # Replace only the first occurrence to avoid repeated substitution.
            if original in result_text:
                result_text = result_text.replace(original, replacement_text, 1)

    return result_text
|
||||
|
||||
|
||||
def process_document(input_path: Path, output_path: Path, doc_config: Dict, use_ai: bool = True) -> bool:
|
||||
"""
|
||||
处理单个文档,添加占位符
|
||||
|
||||
Args:
|
||||
input_path: 输入文件路径
|
||||
output_path: 输出文件路径
|
||||
doc_config: 文档配置
|
||||
use_ai: 是否使用AI分析(默认True)
|
||||
|
||||
Returns:
|
||||
是否处理成功
|
||||
"""
|
||||
try:
|
||||
# 如果是.doc文件,先转换为.docx
|
||||
if input_path.suffix.lower() == '.doc':
|
||||
print(f" 转换 .doc 到 .docx: {input_path.name}")
|
||||
docx_path = convert_doc_to_docx(input_path)
|
||||
if not docx_path or not docx_path.exists():
|
||||
print(f" ⚠ 跳过: 无法转换 {input_path.name}")
|
||||
return False
|
||||
input_path = docx_path
|
||||
|
||||
# 初始化AI助手(如果可用)
|
||||
ai_helper = None
|
||||
available_fields = []
|
||||
if use_ai and HAS_AI_HELPER:
|
||||
try:
|
||||
ai_helper = TemplateAIHelper()
|
||||
available_fields = get_available_fields_for_document(doc_config, FIELD_NAME_TO_CODE)
|
||||
print(f" ✓ AI分析已启用")
|
||||
except Exception as e:
|
||||
print(f" ⚠ AI分析不可用: {e},将使用基础模式")
|
||||
ai_helper = None
|
||||
|
||||
# 打开文档
|
||||
doc = Document(str(input_path))
|
||||
|
||||
# 统计替换次数
|
||||
replacement_count = 0
|
||||
ai_replacement_count = 0
|
||||
|
||||
# 处理段落中的占位符
|
||||
for para_idx, paragraph in enumerate(doc.paragraphs):
|
||||
if not paragraph.text:
|
||||
continue
|
||||
|
||||
text = paragraph.text
|
||||
original_text = text
|
||||
|
||||
# 首先使用AI分析(如果可用)
|
||||
if ai_helper and available_fields:
|
||||
try:
|
||||
doc_type = doc_config.get('template_code', '未知')
|
||||
ai_replacements = ai_helper.analyze_paragraph(
|
||||
text,
|
||||
available_fields,
|
||||
doc_type
|
||||
)
|
||||
|
||||
if ai_replacements:
|
||||
# 应用AI识别的替换
|
||||
text = apply_ai_replacements(text, ai_replacements)
|
||||
if text != original_text:
|
||||
ai_replacement_count += len(ai_replacements)
|
||||
except Exception as e:
|
||||
print(f" ⚠ 段落 {para_idx+1} AI分析失败: {e}")
|
||||
|
||||
# 然后使用规则匹配(作为补充)
|
||||
for field_code in doc_config.get('fields', []):
|
||||
# 查找字段名称
|
||||
for field_name, code in FIELD_NAME_TO_CODE.items():
|
||||
if code == field_code:
|
||||
# 模式1: 字段名称: XXX 或 字段名称: 具体值
|
||||
pattern1 = rf"({re.escape(field_name)}[::]\s*)([^\n\r{{]+?)(\s|$|\n|\r|,|。)"
|
||||
def replace_func1(match):
|
||||
value = match.group(2).strip()
|
||||
# 如果值不是占位符格式,且不是空值,则替换
|
||||
if value and not value.startswith("{{") and value not in ["——", "—", "-", ""]:
|
||||
return f"{match.group(1)}{{{{{field_code}}}}}{match.group(3)}"
|
||||
return match.group(0)
|
||||
text = re.sub(pattern1, replace_func1, text)
|
||||
|
||||
# 模式2: 直接替换常见的占位符(XXX)
|
||||
pattern2 = rf"({re.escape(field_name)}[::]\s*)(XXX|xxx|待填|待填写)"
|
||||
text = re.sub(pattern2, rf"\1{{{{{field_code}}}}}", text)
|
||||
break
|
||||
|
||||
if text != original_text:
|
||||
# 替换整个段落文本
|
||||
paragraph.clear()
|
||||
paragraph.add_run(text)
|
||||
replacement_count += 1
|
||||
|
||||
# 处理表格中的占位符
|
||||
for table_idx, table in enumerate(doc.tables):
|
||||
for row_idx, row in enumerate(table.rows):
|
||||
for col_idx, cell in enumerate(row.cells):
|
||||
for paragraph in cell.paragraphs:
|
||||
if not paragraph.text:
|
||||
continue
|
||||
|
||||
text = paragraph.text
|
||||
original_text = text
|
||||
|
||||
# 首先使用AI分析(如果可用)
|
||||
if ai_helper and available_fields:
|
||||
try:
|
||||
doc_type = doc_config.get('template_code', '未知')
|
||||
ai_replacements = ai_helper.analyze_table_cell(
|
||||
text,
|
||||
available_fields,
|
||||
doc_type,
|
||||
row_idx,
|
||||
col_idx
|
||||
)
|
||||
|
||||
if ai_replacements:
|
||||
# 应用AI识别的替换
|
||||
text = apply_ai_replacements(text, ai_replacements)
|
||||
if text != original_text:
|
||||
ai_replacement_count += len(ai_replacements)
|
||||
except Exception as e:
|
||||
pass # 静默失败,继续使用规则匹配
|
||||
|
||||
# 然后使用规则匹配(作为补充)
|
||||
for field_code in doc_config.get('fields', []):
|
||||
for field_name, code in FIELD_NAME_TO_CODE.items():
|
||||
if code == field_code:
|
||||
# 模式1: 字段名称: XXX 或 字段名称: 具体值
|
||||
pattern1 = rf"({re.escape(field_name)}[::]\s*)([^\n\r{{]+?)(\s|$|\n|\r|,|。)"
|
||||
def replace_func1(match):
|
||||
value = match.group(2).strip()
|
||||
if value and not value.startswith("{{") and value not in ["——", "—", "-", ""]:
|
||||
return f"{match.group(1)}{{{{{field_code}}}}}{match.group(3)}"
|
||||
return match.group(0)
|
||||
text = re.sub(pattern1, replace_func1, text)
|
||||
|
||||
# 模式2: 直接替换常见的占位符(XXX)
|
||||
pattern2 = rf"({re.escape(field_name)}[::]\s*)(XXX|xxx|待填|待填写)"
|
||||
text = re.sub(pattern2, rf"\1{{{{{field_code}}}}}", text)
|
||||
break
|
||||
|
||||
if text != original_text:
|
||||
paragraph.clear()
|
||||
paragraph.add_run(text)
|
||||
replacement_count += 1
|
||||
|
||||
# 确保输出目录存在
|
||||
output_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# 保存文档
|
||||
doc.save(str(output_path))
|
||||
|
||||
# 输出统计信息
|
||||
if replacement_count > 0 or ai_replacement_count > 0:
|
||||
msg = f" ✓ 处理成功"
|
||||
if ai_replacement_count > 0:
|
||||
msg += f",AI识别 {ai_replacement_count} 处"
|
||||
if replacement_count > 0:
|
||||
msg += f",规则匹配 {replacement_count} 处"
|
||||
print(msg)
|
||||
else:
|
||||
print(f" ⚠ 处理完成,但未找到需要替换的内容(可能已包含占位符)")
|
||||
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f" ✗ 处理失败: {e}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return False
|
||||
|
||||
|
||||
def process_all_templates():
|
||||
"""
|
||||
处理所有原始模板文件
|
||||
"""
|
||||
print("="*80)
|
||||
print("开始处理原始模板文档")
|
||||
print("="*80)
|
||||
|
||||
if not ORIGINAL_TEMPLATES_DIR.exists():
|
||||
print(f"错误: 原始模板目录不存在: {ORIGINAL_TEMPLATES_DIR}")
|
||||
return
|
||||
|
||||
# 统计信息
|
||||
processed_count = 0
|
||||
skipped_count = 0
|
||||
failed_count = 0
|
||||
|
||||
# 遍历所有文件
|
||||
for root, dirs, files in os.walk(ORIGINAL_TEMPLATES_DIR):
|
||||
for file in files:
|
||||
            # Only process .doc and .docx files (case-insensitive)
            if not file.lower().endswith(('.doc', '.docx')):
                continue
|
||||
|
||||
input_path = Path(root) / file
|
||||
|
||||
# 识别文档类型
|
||||
doc_config = identify_document_type(file)
|
||||
|
||||
if not doc_config:
|
||||
print(f"\n⚠ 无法识别文档类型: {file}")
|
||||
print(f" 路径: {input_path}")
|
||||
skipped_count += 1
|
||||
continue
|
||||
|
||||
# 生成输出路径(保持相对目录结构)
|
||||
relative_path = input_path.relative_to(ORIGINAL_TEMPLATES_DIR)
|
||||
output_path = OUTPUT_TEMPLATES_DIR / relative_path.parent / f"{Path(file).stem}.docx"
|
||||
|
||||
print(f"\n处理: {file}")
|
||||
print(f" 类型: {doc_config.get('template_code', 'UNKNOWN')}")
|
||||
print(f" 输出: {output_path}")
|
||||
|
||||
# 处理文档(使用AI分析)
|
||||
if process_document(input_path, output_path, doc_config, use_ai=True):
|
||||
processed_count += 1
|
||||
else:
|
||||
failed_count += 1
|
||||
|
||||
# 输出统计信息
|
||||
print("\n" + "="*80)
|
||||
print("处理完成")
|
||||
print("="*80)
|
||||
print(f"成功处理: {processed_count} 个文件")
|
||||
print(f"跳过: {skipped_count} 个文件")
|
||||
print(f"失败: {failed_count} 个文件")
|
||||
print(f"\n处理后的模板保存在: {OUTPUT_TEMPLATES_DIR}")
|
||||
print("\n请检查生成的模板文件,确认占位符是否正确添加。")
|
||||
print("如有需要,请手动调整占位符位置。")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
process_all_templates()
|
||||
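The rule-based pass in `process_document` rewrites `field name: value` into `field name: {{field_code}}`. The following is a minimal standalone sketch of that substitution, not the script's own code. It assumes the label is followed by a full-width or ASCII colon and that the value ends at whitespace, a comma, a full stop, or the end of the text. `replace_labelled_value` and `BLANK_VALUES` are hypothetical names introduced for illustration.

```python
import re

# Values that mean "left blank" and should stay untouched (mirrors the script's list).
BLANK_VALUES = {"——", "—", "-", ""}


def replace_labelled_value(text: str, field_name: str, field_code: str) -> str:
    """Replace the value after "<field_name>:" with a {{field_code}} placeholder."""
    placeholder = "{{" + field_code + "}}"
    # Group 1: label, colon and spacing. Group 2: the value, which may not
    # contain "{" so that existing placeholders are skipped. Group 3: terminator.
    pattern = re.compile(
        rf"({re.escape(field_name)}[::]\s*)([^\n\r{{]+?)(\s|$|,|,|。)"
    )

    def _sub(match: re.Match) -> str:
        value = match.group(2).strip()
        if value in BLANK_VALUES:
            return match.group(0)
        return f"{match.group(1)}{placeholder}{match.group(3)}"

    return pattern.sub(_sub, text)
```

Because the value group is lazy and stops at the first whitespace, a value that contains a space would only be partly replaced; such templates would still need a manual check.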
589
process_templates_docx_only.py
Normal file
@ -0,0 +1,589 @@
|
||||
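"""
Process converted .docx template documents and add placeholders automatically.
This script only handles documents that were already converted to .docx by hand; it skips the .doc conversion step.
It identifies the document type and adds the matching placeholders according to the placeholder-to-field mapping table.
It uses a large language model to analyze the document content and find replaceable positions.
"""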
|
||||
import os
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
try:
    from docx import Document
except ImportError:
    print("Error: please install python-docx first: pip install python-docx")
    raise SystemExit(1)
|
||||
|
||||
# 尝试导入AI辅助工具
|
||||
try:
|
||||
from template_ai_helper import TemplateAIHelper, get_available_fields_for_document
|
||||
HAS_AI_HELPER = True
|
||||
except ImportError:
|
||||
HAS_AI_HELPER = False
|
||||
print("警告: 无法导入AI辅助工具,将使用基础模式(不使用AI分析)")
|
||||
|
||||
# 项目根目录
|
||||
PROJECT_ROOT = Path(__file__).parent
|
||||
ORIGINAL_TEMPLATES_DIR = PROJECT_ROOT / "模板" / "原始模板"
|
||||
OUTPUT_TEMPLATES_DIR = PROJECT_ROOT / "模板"
|
||||
FIELD_MAPPING_FILE = PROJECT_ROOT / "占位符与字段对照表.md"
|
||||
|
||||
# 文档类型映射(根据文件名识别)
|
||||
DOCUMENT_TYPE_MAPPING = {
|
||||
"请示报告卡": {
|
||||
"template_code": "REPORT_CARD",
|
||||
"fields": ["target_name", "target_organization_and_position", "report_card_request_time"],
|
||||
"input_fields": ["clue_info"]
|
||||
},
|
||||
"初步核实审批表": {
|
||||
"template_code": "PRELIMINARY_VERIFICATION_APPROVAL",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth", "target_political_status", "target_professional_rank",
|
||||
"clue_source", "target_issue_description", "department_opinion", "filler_name"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"初核方案": {
|
||||
"template_code": "INVESTIGATION_PLAN",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_work_basic_info",
|
||||
"target_issue_description", "investigation_unit_name", "investigation_team_leader_name",
|
||||
"investigation_team_member_names", "investigation_location"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"附件初核方案": {
|
||||
"template_code": "INVESTIGATION_PLAN",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_work_basic_info",
|
||||
"target_issue_description", "investigation_unit_name", "investigation_team_leader_name",
|
||||
"investigation_team_member_names", "investigation_location"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"谈话通知书": {
|
||||
"template_code": "NOTIFICATION_LETTER",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_id_number",
|
||||
"appointment_time", "appointment_location", "approval_time",
|
||||
"handling_department", "handler_name", "notification_time", "notification_location"
|
||||
],
|
||||
"input_fields": ["target_basic_info_clue"]
|
||||
},
|
||||
"谈话笔录": {
|
||||
"template_code": "INTERVIEW_RECORD",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"谈话询问对象情况摸底调查30问": {
|
||||
"template_code": "INVESTIGATION_30_QUESTIONS",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"被谈话人权利义务告知书": {
|
||||
"template_code": "RIGHTS_OBLIGATIONS_NOTICE",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"点对点交接单": {
|
||||
"template_code": "HANDOVER_FORM",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"陪送交接单": {
|
||||
"template_code": "ESCORT_HANDOVER_FORM",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"保密承诺书": {
|
||||
"template_code": "CONFIDENTIALITY_COMMITMENT",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"办案人员-办案安全保密承诺书": {
|
||||
"template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"请示报告卡(初核报告结论)": {
|
||||
"template_code": "REPORT_CARD_CONCLUSION",
|
||||
"fields": [
|
||||
"investigation_team_code", "target_name", "target_problem_description", "target_attitude"
|
||||
],
|
||||
"input_fields": []
|
||||
},
|
||||
"初核情况报告": {
|
||||
"template_code": "INVESTIGATION_REPORT",
|
||||
"fields": [
|
||||
"target_name", "commission_name", "target_work_basic_info",
|
||||
"target_issue_description", "target_problem_description", "target_organization_and_position"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"谈话审批表": {
|
||||
"template_code": "INTERVIEW_APPROVAL_FORM",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"谈话前安全风险评估表": {
|
||||
"template_code": "PRE_INTERVIEW_RISK_ASSESSMENT",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"谈话方案": {
|
||||
"template_code": "INTERVIEW_PLAN",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
},
|
||||
"谈话后安全风险评估表": {
|
||||
"template_code": "POST_INTERVIEW_RISK_ASSESSMENT",
|
||||
"fields": [
|
||||
"target_name", "target_organization_and_position", "target_gender",
|
||||
"target_date_of_birth_full", "target_political_status", "target_address",
|
||||
"target_registered_address", "target_contact", "target_place_of_origin",
|
||||
"target_ethnicity", "target_id_number", "investigation_team_code"
|
||||
],
|
||||
"input_fields": ["clue_info", "target_basic_info_clue"]
|
||||
}
|
||||
}
|
||||
|
||||
# 字段名称到字段编码的映射(用于智能识别)
|
||||
FIELD_NAME_TO_CODE = {
|
||||
"被核查人姓名": "target_name",
|
||||
"被核查人员单位及职务": "target_organization_and_position",
|
||||
"被核查人员性别": "target_gender",
|
||||
"被核查人员出生年月": "target_date_of_birth",
|
||||
"被核查人员出生年月日": "target_date_of_birth_full",
|
||||
"被核查人员政治面貌": "target_political_status",
|
||||
"被核查人员职级": "target_professional_rank",
|
||||
"被核查人员身份证号": "target_id_number",
|
||||
"被核查人员身份证件及号码": "target_id_number",
|
||||
"被核查人员住址": "target_address",
|
||||
"被核查人员户籍住址": "target_registered_address",
|
||||
"被核查人员联系方式": "target_contact",
|
||||
"被核查人员籍贯": "target_place_of_origin",
|
||||
"被核查人员民族": "target_ethnicity",
|
||||
"线索来源": "clue_source",
|
||||
"主要问题线索": "target_issue_description",
|
||||
"被核查人问题描述": "target_problem_description",
|
||||
"被核查人员工作基本情况": "target_work_basic_info",
|
||||
"核查单位名称": "investigation_unit_name",
|
||||
"核查组组长姓名": "investigation_team_leader_name",
|
||||
"核查组成员姓名": "investigation_team_member_names",
|
||||
"核查地点": "investigation_location",
|
||||
"核查组代号": "investigation_team_code",
|
||||
"应到时间": "appointment_time",
|
||||
"应到地点": "appointment_location",
|
||||
"批准时间": "approval_time",
|
||||
"承办部门": "handling_department",
|
||||
"承办人": "handler_name",
|
||||
"谈话通知时间": "notification_time",
|
||||
"谈话通知地点": "notification_location",
|
||||
"请示报告卡请示时间": "report_card_request_time",
|
||||
"初步核实审批表承办部门意见": "department_opinion",
|
||||
"初步核实审批表填表人": "filler_name",
|
||||
"被核查人员本人认识和态度": "target_attitude",
|
||||
"纪委名称": "commission_name"
|
||||
}
|
||||
|
||||
|
||||
def identify_document_type(file_name: str) -> Optional[Dict]:
    """
    Identify the document type from the file name.

    Args:
        file_name: File name.

    Returns:
        The document type configuration, or None if the type cannot be identified.
    """
    # Strip the extension and common suffixes
    base_name = Path(file_name).stem
    base_name = base_name.replace("(XXX)", "").replace("(XXX)", "").replace("XXX", "")
    base_name = base_name.replace("_转自DOC", "").replace("转自DOC", "")
    base_name = base_name.replace("模板", "").strip()

    # Try longer type names first so that a specific name such as
    # "附件初核方案" is not shadowed by the shorter "初核方案".
    doc_types = sorted(DOCUMENT_TYPE_MAPPING, key=len, reverse=True)

    # Exact (substring) match
    for doc_type in doc_types:
        if doc_type in base_name:
            return DOCUMENT_TYPE_MAPPING[doc_type]

    # Fall back to partial matching on the parts of each type name
    for doc_type in doc_types:
        keywords = doc_type.replace("(", " ").replace(")", " ").replace("(", " ").replace(")", " ").split()
        if any(keyword in base_name for keyword in keywords if len(keyword) > 1):
            return DOCUMENT_TYPE_MAPPING[doc_type]

    return None
|
||||
|
||||
|
||||
def apply_ai_replacements(text: str, ai_replacements: List[Dict]) -> str:
    """
    Apply the replacement suggestions identified by the AI.

    Args:
        text: Original text.
        ai_replacements: List of replacement suggestions from the AI.

    Returns:
        The text after replacement.
    """
    result_text = text

    # Sort by confidence so that high-confidence replacements are applied first
    sorted_replacements = sorted(ai_replacements, key=lambda x: x.get('confidence', 0), reverse=True)

    for replacement in sorted_replacements:
        original = replacement.get('original_text', '')
        replacement_text = replacement.get('replacement', '')
        confidence = replacement.get('confidence', 0)

        # Only apply replacements with confidence above 0.7
        if confidence > 0.7 and original and replacement_text:
            # str.replace works on literal text, so no regex escaping is needed.
            # Replace only the first occurrence to avoid repeated substitution.
            if original in result_text:
                result_text = result_text.replace(original, replacement_text, 1)

    return result_text
|
||||
|
||||
|
||||
def process_document(input_path: Path, output_path: Path, doc_config: Dict, use_ai: bool = True) -> bool:
|
||||
"""
|
||||
处理单个文档,添加占位符
|
||||
|
||||
Args:
|
||||
input_path: 输入文件路径(.docx格式)
|
||||
output_path: 输出文件路径
|
||||
doc_config: 文档配置
|
||||
use_ai: 是否使用AI分析(默认True)
|
||||
|
||||
Returns:
|
||||
是否处理成功
|
||||
"""
|
||||
try:
|
||||
# 只处理 .docx 文件
|
||||
if input_path.suffix.lower() != '.docx':
|
||||
print(f" ⚠ 跳过: 不是 .docx 文件 ({input_path.suffix})")
|
||||
return False
|
||||
|
||||
# 检查文件是否存在
|
||||
if not input_path.exists():
|
||||
print(f" ✗ 错误: 文件不存在: {input_path}")
|
||||
return False
|
||||
|
||||
print(f" 处理: {input_path.name}")
|
||||
|
||||
# 初始化AI助手(如果可用)
|
||||
ai_helper = None
|
||||
available_fields = []
|
||||
if use_ai and HAS_AI_HELPER:
|
||||
try:
|
||||
print(f" [初始化] 正在初始化AI助手...")
|
||||
ai_helper = TemplateAIHelper()
|
||||
|
||||
# 测试API连接
|
||||
if not ai_helper.test_api_connection():
|
||||
print(f" [初始化] ⚠ API连接测试失败,将使用基础模式")
|
||||
ai_helper = None
|
||||
else:
|
||||
available_fields = get_available_fields_for_document(doc_config, FIELD_NAME_TO_CODE)
|
||||
print(f" [初始化] ✓ AI分析已启用(可用字段: {len(available_fields)} 个)")
|
||||
except Exception as e:
|
||||
print(f" [初始化] ⚠ AI分析不可用: {e},将使用基础模式")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
ai_helper = None
|
||||
|
||||
# 打开文档
|
||||
print(f" [读取] 正在打开文档...")
|
||||
doc = Document(str(input_path))
|
||||
|
||||
# 统计信息
|
||||
total_paragraphs = len([p for p in doc.paragraphs if p.text.strip()])
|
||||
total_tables = len(doc.tables)
|
||||
total_cells = sum(len(table.rows) * len(table.rows[0].cells) if table.rows else 0 for table in doc.tables)
|
||||
|
||||
print(f" [统计] 文档包含: {total_paragraphs} 个段落, {total_tables} 个表格, 约 {total_cells} 个单元格")
|
||||
|
||||
# 统计替换次数
|
||||
replacement_count = 0
|
||||
ai_replacement_count = 0
|
||||
|
||||
# 处理段落中的占位符
|
||||
print(f" [处理] 开始处理段落...")
|
||||
for para_idx, paragraph in enumerate(doc.paragraphs):
|
||||
if not paragraph.text:
|
||||
continue
|
||||
|
||||
text = paragraph.text
|
||||
original_text = text
|
||||
|
||||
# 首先使用AI分析(如果可用)
|
||||
if ai_helper and available_fields:
|
||||
try:
|
||||
doc_type = doc_config.get('template_code', '未知')
|
||||
if para_idx % 10 == 0: # 每10个段落输出一次进度
|
||||
print(f" [进度] 处理段落 {para_idx+1}/{total_paragraphs}...")
|
||||
|
||||
ai_replacements = ai_helper.analyze_paragraph(
|
||||
text,
|
||||
available_fields,
|
||||
doc_type
|
||||
)
|
||||
|
||||
if ai_replacements:
|
||||
# 应用AI识别的替换
|
||||
text = apply_ai_replacements(text, ai_replacements)
|
||||
if text != original_text:
|
||||
ai_replacement_count += len(ai_replacements)
|
||||
print(f" [AI] 段落 {para_idx+1} 应用了 {len(ai_replacements)} 个替换")
|
||||
except Exception as e:
|
||||
print(f" [AI] ⚠ 段落 {para_idx+1} AI分析失败: {e}")
|
||||
|
||||
# 然后使用规则匹配(作为补充)
|
||||
for field_code in doc_config.get('fields', []):
|
||||
# 查找字段名称
|
||||
for field_name, code in FIELD_NAME_TO_CODE.items():
|
||||
if code == field_code:
|
||||
# 模式1: 字段名称: XXX 或 字段名称: 具体值
|
||||
pattern1 = rf"({re.escape(field_name)}[::]\s*)([^\n\r{{]+?)(\s|$|\n|\r|,|。)"
|
||||
def replace_func1(match):
|
||||
value = match.group(2).strip()
|
||||
# 如果值不是占位符格式,且不是空值,则替换
|
||||
if value and not value.startswith("{{") and value not in ["——", "—", "-", ""]:
|
||||
return f"{match.group(1)}{{{{{field_code}}}}}{match.group(3)}"
|
||||
return match.group(0)
|
||||
text = re.sub(pattern1, replace_func1, text)
|
||||
|
||||
# 模式2: 直接替换常见的占位符(XXX)
|
||||
pattern2 = rf"({re.escape(field_name)}[::]\s*)(XXX|xxx|待填|待填写)"
|
||||
text = re.sub(pattern2, rf"\1{{{{{field_code}}}}}", text)
|
||||
break
|
||||
|
||||
if text != original_text:
|
||||
# 替换整个段落文本
|
||||
paragraph.clear()
|
||||
paragraph.add_run(text)
|
||||
replacement_count += 1
|
||||
|
||||
# 处理表格中的占位符
|
||||
print(f" [处理] 开始处理表格...")
|
||||
for table_idx, table in enumerate(doc.tables):
|
||||
if table_idx % 5 == 0: # 每5个表格输出一次进度
|
||||
print(f" [进度] 处理表格 {table_idx+1}/{total_tables}...")
|
||||
for row_idx, row in enumerate(table.rows):
|
||||
for col_idx, cell in enumerate(row.cells):
|
||||
for paragraph in cell.paragraphs:
|
||||
if not paragraph.text:
|
||||
continue
|
||||
|
||||
text = paragraph.text
|
||||
original_text = text
|
||||
|
||||
# 首先使用AI分析(如果可用)
|
||||
if ai_helper and available_fields:
|
||||
try:
|
||||
doc_type = doc_config.get('template_code', '未知')
|
||||
ai_replacements = ai_helper.analyze_table_cell(
|
||||
text,
|
||||
available_fields,
|
||||
doc_type,
|
||||
row_idx,
|
||||
col_idx
|
||||
)
|
||||
|
||||
if ai_replacements:
|
||||
# 应用AI识别的替换
|
||||
text = apply_ai_replacements(text, ai_replacements)
|
||||
if text != original_text:
|
||||
ai_replacement_count += len(ai_replacements)
|
||||
except Exception as e:
|
||||
pass # 静默失败,继续使用规则匹配
|
||||
|
||||
# 然后使用规则匹配(作为补充)
|
||||
for field_code in doc_config.get('fields', []):
|
||||
for field_name, code in FIELD_NAME_TO_CODE.items():
|
||||
if code == field_code:
|
||||
# 模式1: 字段名称: XXX 或 字段名称: 具体值
|
||||
pattern1 = rf"({re.escape(field_name)}[::]\s*)([^\n\r{{]+?)(\s|$|\n|\r|,|。)"
|
||||
def replace_func1(match):
|
||||
value = match.group(2).strip()
|
||||
if value and not value.startswith("{{") and value not in ["——", "—", "-", ""]:
|
||||
return f"{match.group(1)}{{{{{field_code}}}}}{match.group(3)}"
|
||||
return match.group(0)
|
||||
text = re.sub(pattern1, replace_func1, text)
|
||||
|
||||
# 模式2: 直接替换常见的占位符(XXX)
|
||||
pattern2 = rf"({re.escape(field_name)}[::]\s*)(XXX|xxx|待填|待填写)"
|
||||
text = re.sub(pattern2, rf"\1{{{{{field_code}}}}}", text)
|
||||
break
|
||||
|
||||
if text != original_text:
|
||||
paragraph.clear()
|
||||
paragraph.add_run(text)
|
||||
replacement_count += 1
|
||||
|
||||
# 确保输出目录存在
|
||||
print(f" [保存] 正在保存文档...")
|
||||
output_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# 保存文档
|
||||
doc.save(str(output_path))
|
||||
print(f" [保存] ✓ 文档已保存到: {output_path}")
|
||||
|
||||
if replacement_count > 0 or ai_replacement_count > 0:
|
||||
msg = f" ✓ 处理成功"
|
||||
if ai_replacement_count > 0:
|
||||
msg += f",AI识别 {ai_replacement_count} 处"
|
||||
if replacement_count > 0:
|
||||
msg += f",规则匹配 {replacement_count} 处"
|
||||
print(msg)
|
||||
else:
|
||||
print(f" ⚠ 处理完成,但未找到需要替换的内容(可能已包含占位符)")
|
||||
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f" ✗ 处理失败: {e}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return False
|
||||
|
||||
|
||||
def process_all_templates():
|
||||
"""
|
||||
处理所有已转换的 .docx 模板文件
|
||||
"""
|
||||
print("="*80)
|
||||
print("处理已转换的 .docx 模板文档(跳过 .doc 转换)")
|
||||
print("="*80)
|
||||
print()
|
||||
|
||||
if not ORIGINAL_TEMPLATES_DIR.exists():
|
||||
print(f"错误: 原始模板目录不存在: {ORIGINAL_TEMPLATES_DIR}")
|
||||
return
|
||||
|
||||
# 统计信息
|
||||
processed_count = 0
|
||||
skipped_count = 0
|
||||
failed_count = 0
|
||||
|
||||
# 统计总文件数
|
||||
all_files = []
|
||||
for root, dirs, files in os.walk(ORIGINAL_TEMPLATES_DIR):
|
||||
for file in files:
|
||||
if file.endswith('.docx'):
|
||||
all_files.append(Path(root) / file)
|
||||
|
||||
total_files = len(all_files)
|
||||
print(f"找到 {total_files} 个 .docx 文件需要处理\n")
|
||||
|
||||
# 遍历所有文件,只处理 .docx 文件
|
||||
file_index = 0
|
||||
for root, dirs, files in os.walk(ORIGINAL_TEMPLATES_DIR):
|
||||
for file in files:
|
||||
# 只处理 .docx 文件,跳过 .doc 文件
|
||||
if not file.endswith('.docx'):
|
||||
continue
|
||||
|
||||
file_index += 1
|
||||
input_path = Path(root) / file
|
||||
|
||||
# 识别文档类型
|
||||
doc_config = identify_document_type(file)
|
||||
|
||||
if not doc_config:
|
||||
print(f"\n⚠ 无法识别文档类型: {file}")
|
||||
print(f" 路径: {input_path}")
|
||||
skipped_count += 1
|
||||
continue
|
||||
|
||||
# 生成输出路径(保持相对目录结构)
|
||||
relative_path = input_path.relative_to(ORIGINAL_TEMPLATES_DIR)
|
||||
# 清理文件名(移除转换标记)
|
||||
clean_name = Path(file).stem
|
||||
clean_name = clean_name.replace("_转自DOC", "").replace("转自DOC", "")
|
||||
clean_name = clean_name.replace("(XXX)", "").replace("(XXX)", "").replace("XXX", "")
|
||||
output_path = OUTPUT_TEMPLATES_DIR / relative_path.parent / f"{clean_name}.docx"
|
||||
|
||||
print(f"\n{'='*80}")
|
||||
print(f"[{file_index}/{total_files}] 处理: {file}")
|
||||
print(f"{'='*80}")
|
||||
print(f" 类型: {doc_config.get('template_code', 'UNKNOWN')}")
|
||||
print(f" 输入: {input_path}")
|
||||
print(f" 输出: {output_path}")
|
||||
|
||||
# 处理文档(使用AI分析)
|
||||
if process_document(input_path, output_path, doc_config, use_ai=True):
|
||||
processed_count += 1
|
||||
else:
|
||||
failed_count += 1
|
||||
|
||||
# 输出统计信息
|
||||
print("\n" + "="*80)
|
||||
print("处理完成")
|
||||
print("="*80)
|
||||
print(f"成功处理: {processed_count} 个文件")
|
||||
print(f"跳过: {skipped_count} 个文件(无法识别类型)")
|
||||
print(f"失败: {failed_count} 个文件")
|
||||
print(f"\n处理后的模板保存在: {OUTPUT_TEMPLATES_DIR}")
|
||||
print("\n请检查生成的模板文件,确认占位符是否正确添加。")
|
||||
print("如有需要,请手动调整占位符位置。")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
process_all_templates()
|
||||
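The rule-based fallback above (a field label followed by a filler value) can be sketched in isolation as follows. This is a minimal illustration, not the script's actual code: the helper name `replace_labelled_fillers` and the constant `FILLERS` are hypothetical, and the mapping passed in is assumed to have the same shape as `field_name_to_code`.

```python
import re

# Filler tokens treated as "to be filled in" (hypothetical constant).
# Longer alternatives come first so "待填写" is not partially matched as "待填".
FILLERS = r"待填写|待填|XXX|xxx"


def replace_labelled_fillers(text: str, field_name_to_code: dict) -> str:
    """Replace '<label>: <filler>' with '<label>: {{field_code}}' for each known label."""
    for field_name, field_code in field_name_to_code.items():
        # Accept both the ASCII and the full-width colon after the label.
        pattern = rf"({re.escape(field_name)}[::]\s*)(?:{FILLERS})"
        # A function replacement sidesteps backslash handling in the replacement string.
        text = re.sub(
            pattern,
            lambda m, code=field_code: f"{m.group(1)}{{{{{code}}}}}",
            text,
        )
    return text


print(replace_labelled_fillers("被核查人姓名: XXX", {"被核查人姓名": "target_name"}))
```

Using a callable as the replacement is a design choice for the sketch only; it keeps the placeholder braces out of the regex replacement syntax.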
351
register_templates_to_db.py
Normal file
@ -0,0 +1,351 @@
|
||||
"""
|
||||
批量将template_finish文件夹下的模板文件注册到数据库并上传到MinIO
|
||||
"""
|
||||
import os
|
||||
import re
|
||||
import json
|
||||
import pymysql
|
||||
from minio import Minio
|
||||
from minio.error import S3Error
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from docx import Document
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
|
||||
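# MinIO connection settings.
# Credentials are read from the environment rather than hardcoded in the repository;
# the environment variable names used here are illustrative assumptions.
MINIO_CONFIG = {
    'endpoint': os.getenv('MINIO_ENDPOINT', 'minio.datacubeworld.com:9000'),
    'access_key': os.getenv('MINIO_ACCESS_KEY'),
    'secret_key': os.getenv('MINIO_SECRET_KEY'),
    'secure': True
}

# Database connection settings (same approach; variable names are assumptions).
DB_CONFIG = {
    'host': os.getenv('DB_HOST', '152.136.177.240'),
    'port': int(os.getenv('DB_PORT', '5012')),
    'user': os.getenv('DB_USER', 'finyx'),
    'password': os.getenv('DB_PASSWORD'),
    'database': os.getenv('DB_NAME', 'finyx'),
    'charset': 'utf8mb4'
}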
|
||||
|
||||
# 固定值
|
||||
TENANT_ID = 615873064429507639
|
||||
CREATED_BY = 655162080928945152
|
||||
UPDATED_BY = 655162080928945152
|
||||
CURRENT_TIME = datetime.now()
|
||||
BUCKET_NAME = 'finyx'
|
||||
|
||||
|
||||
def generate_id():
|
||||
"""生成ID"""
|
||||
import time
|
||||
import random
|
||||
timestamp = int(time.time() * 1000)
|
||||
random_part = random.randint(100000, 999999)
|
||||
return timestamp * 1000 + random_part
|
||||
|
||||
|
||||
def extract_placeholders_from_docx(file_path):
|
||||
"""从docx文件中提取所有占位符"""
|
||||
placeholders = set()
|
||||
pattern = r'\{\{([^}]+)\}\}'
|
||||
|
||||
try:
|
||||
doc = Document(file_path)
|
||||
|
||||
# 从段落中提取占位符
|
||||
for paragraph in doc.paragraphs:
|
||||
text = paragraph.text
|
||||
matches = re.findall(pattern, text)
|
||||
for match in matches:
|
||||
placeholders.add(match.strip())
|
||||
|
||||
# 从表格中提取占位符
|
||||
for table in doc.tables:
|
||||
for row in table.rows:
|
||||
for cell in row.cells:
|
||||
for paragraph in cell.paragraphs:
|
||||
text = paragraph.text
|
||||
matches = re.findall(pattern, text)
|
||||
for match in matches:
|
||||
placeholders.add(match.strip())
|
||||
|
||||
except Exception as e:
|
||||
print(f" 错误: 读取文件失败 - {str(e)}")
|
||||
return []
|
||||
|
||||
return sorted(list(placeholders))
|
||||
|
||||
|
||||
def generate_template_code(file_name: str, relative_path: str) -> str:
|
||||
"""
|
||||
根据文件名和路径生成模板编码
|
||||
|
||||
例如:
|
||||
- "2.初步核实审批表(XXX).docx" -> "PRELIMINARY_VERIFICATION_APPROVAL"
|
||||
- "1.请示报告卡(XXX).docx" -> "REQUEST_REPORT_CARD"
|
||||
"""
|
||||
# Extract the base name (drop the extension and any bracketed content)
base_name = Path(file_name).stem
base_name = re.sub(r'(.*?)', '', base_name)  # drop content in full-width (Chinese) brackets
base_name = re.sub(r'\(.*?\)', '', base_name)  # drop content in ASCII brackets
base_name = base_name.strip().rstrip('(').rstrip('(')
|
||||
|
||||
# 去掉数字前缀
|
||||
base_name = re.sub(r'^\d+[\.\-]?', '', base_name).strip()
|
||||
|
||||
# 生成编码:转换为大写,中文字符映射
|
||||
code_mapping = {
|
||||
'请示报告卡': 'REQUEST_REPORT_CARD',
|
||||
'初步核实审批表': 'PRELIMINARY_VERIFICATION_APPROVAL',
|
||||
'附件初核方案': 'PRELIMINARY_VERIFICATION_PLAN',
|
||||
'谈话通知书': 'INTERVIEW_NOTICE',
|
||||
'谈话审批表': 'INTERVIEW_APPROVAL',
|
||||
'谈话前安全风险评估表': 'PRE_INTERVIEW_RISK_ASSESSMENT',
|
||||
'谈话方案': 'INTERVIEW_PLAN',
|
||||
'谈话后安全风险评估表': 'POST_INTERVIEW_RISK_ASSESSMENT',
|
||||
'谈话笔录': 'INTERVIEW_RECORD',
|
||||
'谈话询问对象情况摸底调查30问': 'INTERVIEW_OBJECT_INVESTIGATION',
|
||||
'被谈话人权利义务告知书': 'INTERVIEWEE_RIGHTS_OBLIGATIONS_NOTICE',
|
||||
'点对点交接单': 'POINT_TO_POINT_HANDOVER',
|
||||
'陪送交接单': 'ESCORT_HANDOVER',
|
||||
'保密承诺书': 'CONFIDENTIALITY_COMMITMENT',
|
||||
'办案人员-办案安全保密承诺书': 'CASE_OFFICER_SECURITY_COMMITMENT',
|
||||
'请示报告卡(初核谈话)': 'REQUEST_REPORT_CARD_INTERVIEW',
|
||||
'请示报告卡(初核报告结论)': 'REQUEST_REPORT_CARD_CONCLUSION',
|
||||
'XXX初核情况报告': 'PRELIMINARY_VERIFICATION_REPORT'
|
||||
}
|
||||
|
||||
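# Look up the mapping, trying longer (more specific) keys first so that, for example,
# '办案人员-办案安全保密承诺书' is not shadowed by the shorter '保密承诺书'
for key, code in sorted(code_mapping.items(), key=lambda item: len(item[0]), reverse=True):
    if key in base_name: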
|
||||
# 如果是谈话通知书,可能需要区分第几联
|
||||
if '谈话通知书' in base_name:
|
||||
if '第一联' in base_name:
|
||||
return 'INTERVIEW_NOTICE_FIRST'
|
||||
elif '第二联' in base_name:
|
||||
return 'INTERVIEW_NOTICE_SECOND'
|
||||
elif '第三联' in base_name:
|
||||
return 'INTERVIEW_NOTICE_THIRD'
|
||||
# 如果是保密承诺书,区分是否党员
|
||||
if '保密承诺书' in base_name:
|
||||
if '非中共党员' in base_name or '非党员' in base_name:
|
||||
return 'CONFIDENTIALITY_COMMITMENT_NON_PARTY'
|
||||
elif '中共党员' in base_name or '党员' in base_name:
|
||||
return 'CONFIDENTIALITY_COMMITMENT_PARTY'
|
||||
return code
|
||||
|
||||
# 如果没有匹配,使用通用规则生成
|
||||
# 将中文转换为拼音首字母(简化处理,实际应使用pypinyin)
|
||||
# 这里先使用简化规则
|
||||
code = base_name.upper()
|
||||
code = re.sub(r'[^\w]', '_', code)
|
||||
code = re.sub(r'_+', '_', code).strip('_')
|
||||
|
||||
return code if code else f'TEMPLATE_{generate_id() % 1000000}'
|
||||
|
||||
|
||||
def upload_to_minio(client: Minio, file_path: str, template_name: str) -> str:
|
||||
"""上传文件到MinIO"""
|
||||
try:
|
||||
now = datetime.now()
|
||||
object_name = f'{TENANT_ID}/TEMPLATE/{now.year}/{now.month:02d}/{template_name}'
|
||||
|
||||
client.fput_object(
|
||||
BUCKET_NAME,
|
||||
object_name,
|
||||
file_path,
|
||||
content_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document'
|
||||
)
|
||||
|
||||
return f"/{object_name}"
|
||||
|
||||
except Exception as e:
|
||||
raise Exception(f"上传到MinIO失败: {str(e)}")
|
||||
|
||||
|
||||
def register_template_to_db(conn, template_info: Dict) -> int:
|
||||
"""注册模板到数据库"""
|
||||
cursor = conn.cursor()
|
||||
|
||||
try:
|
||||
# 检查是否已存在
|
||||
check_sql = """
|
||||
SELECT id FROM f_polic_file_config
|
||||
WHERE tenant_id = %s AND name = %s
|
||||
"""
|
||||
cursor.execute(check_sql, (TENANT_ID, template_info['name']))
|
||||
existing = cursor.fetchone()
|
||||
|
||||
if existing:
|
||||
file_config_id = existing[0]
|
||||
# 更新现有记录
|
||||
update_sql = """
|
||||
UPDATE f_polic_file_config
|
||||
SET file_path = %s, input_data = %s, updated_time = %s, updated_by = %s, state = 1
|
||||
WHERE id = %s AND tenant_id = %s
|
||||
"""
|
||||
cursor.execute(update_sql, (
|
||||
template_info['file_path'],
|
||||
template_info['input_data'],
|
||||
CURRENT_TIME,
|
||||
UPDATED_BY,
|
||||
file_config_id,
|
||||
TENANT_ID
|
||||
))
|
||||
print(f" ✓ 更新文件配置: {template_info['name']}, ID: {file_config_id}")
|
||||
else:
|
||||
file_config_id = generate_id()
|
||||
insert_sql = """
|
||||
INSERT INTO f_polic_file_config
|
||||
(id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state)
|
||||
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
|
||||
"""
|
||||
cursor.execute(insert_sql, (
|
||||
file_config_id,
|
||||
TENANT_ID,
|
||||
template_info.get('parent_id'),
|
||||
template_info['name'],
|
||||
template_info['input_data'],
|
||||
template_info['file_path'],
|
||||
CURRENT_TIME,
|
||||
CREATED_BY,
|
||||
CURRENT_TIME,
|
||||
CREATED_BY,
|
||||
1 # state: 1表示启用
|
||||
))
|
||||
print(f" ✓ 创建文件配置: {template_info['name']}, ID: {file_config_id}")
|
||||
|
||||
conn.commit()
|
||||
return file_config_id
|
||||
|
||||
except Exception as e:
|
||||
conn.rollback()
|
||||
raise Exception(f"注册到数据库失败: {str(e)}")
|
||||
|
||||
|
||||
def process_templates_in_directory(base_dir: str):
|
||||
"""处理目录下所有模板文件"""
|
||||
base_path = Path(base_dir)
|
||||
if not base_path.exists():
|
||||
print(f"错误: 目录不存在 - {base_dir}")
|
||||
return
|
||||
|
||||
# 连接数据库和MinIO
|
||||
print("=" * 80)
|
||||
print("连接数据库和MinIO...")
|
||||
print("=" * 80)
|
||||
|
||||
conn = pymysql.connect(**DB_CONFIG)
|
||||
minio_client = Minio(
|
||||
MINIO_CONFIG['endpoint'],
|
||||
access_key=MINIO_CONFIG['access_key'],
|
||||
secret_key=MINIO_CONFIG['secret_key'],
|
||||
secure=MINIO_CONFIG['secure']
|
||||
)
|
||||
|
||||
# Check that the bucket exists; close the database connection before returning early
if not minio_client.bucket_exists(BUCKET_NAME):
    print(f"错误: 存储桶 '{BUCKET_NAME}' 不存在")
    conn.close()
    return
|
||||
|
||||
print(f"✓ 存储桶 '{BUCKET_NAME}' 已存在\n")
|
||||
|
||||
# 处理结果
|
||||
processed_count = 0
|
||||
success_count = 0
|
||||
failed_count = 0
|
||||
failed_files = []
|
||||
|
||||
# 遍历所有docx文件
|
||||
print("=" * 80)
|
||||
print("开始处理模板文件...")
|
||||
print("=" * 80)
|
||||
print()
|
||||
|
||||
for docx_file in sorted(base_path.rglob("*.docx")):
|
||||
# 跳过临时文件
|
||||
if docx_file.name.startswith("~$"):
|
||||
continue
|
||||
|
||||
processed_count += 1
|
||||
relative_path = docx_file.relative_to(base_path)
|
||||
|
||||
print(f"[{processed_count}] 处理: {relative_path}")
|
||||
|
||||
try:
|
||||
# 提取占位符
|
||||
placeholders = extract_placeholders_from_docx(str(docx_file))
|
||||
print(f" 占位符数量: {len(placeholders)}")
|
||||
|
||||
# 生成模板编码和名称
|
||||
template_code = generate_template_code(docx_file.name, str(relative_path))
|
||||
template_name = docx_file.name
|
||||
|
||||
# 上传到MinIO
|
||||
print(f" 正在上传到MinIO...")
|
||||
file_path = upload_to_minio(minio_client, str(docx_file), template_name)
|
||||
print(f" ✓ 上传成功: {file_path}")
|
||||
|
||||
# 准备数据库记录
|
||||
input_data = json.dumps({
|
||||
'template_code': template_code,
|
||||
'business_type': 'INVESTIGATION', # 默认为调查核实
|
||||
'placeholders': placeholders # 保存占位符列表供参考
|
||||
}, ensure_ascii=False)
|
||||
|
||||
template_info = {
|
||||
'name': template_name.replace('.docx', ''), # 去掉扩展名作为名称
|
||||
'template_code': template_code,
|
||||
'file_path': file_path,
|
||||
'input_data': input_data,
|
||||
'parent_id': None
|
||||
}
|
||||
|
||||
# 注册到数据库
|
||||
print(f" 正在注册到数据库...")
|
||||
file_config_id = register_template_to_db(conn, template_info)
|
||||
print(f" ✓ 注册成功,配置ID: {file_config_id}")
|
||||
|
||||
success_count += 1
|
||||
print()
|
||||
|
||||
except Exception as e:
|
||||
failed_count += 1
|
||||
failed_files.append((str(relative_path), str(e)))
|
||||
print(f" ✗ 处理失败: {str(e)}\n")
|
||||
|
||||
# 关闭连接
|
||||
conn.close()
|
||||
|
||||
# 打印汇总
|
||||
print("=" * 80)
|
||||
print("处理汇总")
|
||||
print("=" * 80)
|
||||
print(f"总文件数: {processed_count}")
|
||||
print(f"成功: {success_count}")
|
||||
print(f"失败: {failed_count}")
|
||||
|
||||
if failed_files:
|
||||
print("\n失败的文件:")
|
||||
for file_path, error in failed_files:
|
||||
print(f" - {file_path}: {error}")
|
||||
|
||||
print("\n处理完成!")
|
||||
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
template_dir = os.path.join(os.path.dirname(__file__), 'template_finish')
|
||||
|
||||
print(f"模板目录: {template_dir}")
|
||||
print()
|
||||
|
||||
process_templates_in_directory(template_dir)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
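The placeholder scan in `extract_placeholders_from_docx` can be tried out without a .docx file. The sketch below applies the same `{{...}}` regular expression to plain strings; the helper name `extract_placeholders` and the sample texts are hypothetical, and in the real script the strings would come from paragraphs and table cells.

```python
import re

# Same pattern as in the script: anything between "{{" and "}}" that contains no "}".
PLACEHOLDER_PATTERN = re.compile(r"\{\{([^}]+)\}\}")


def extract_placeholders(texts):
    """Collect the distinct placeholder names found in an iterable of strings."""
    found = set()
    for text in texts:
        for match in PLACEHOLDER_PATTERN.findall(text):
            found.add(match.strip())  # tolerate "{{ name }}" with inner spaces
    return sorted(found)


sample = ["姓名:{{target_name}}", "{{ target_gender }},{{target_name}}"]
print(extract_placeholders(sample))
```

Collecting into a set before sorting gives a stable, duplicate-free list, which is convenient when the result is stored as JSON.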
@ -60,29 +60,35 @@ class DocumentService:
|
||||
Returns:
|
||||
文件配置信息,包含: id, name, file_path, template_code
|
||||
"""
|
||||
import json
|
||||
conn = self.get_connection()
|
||||
cursor = conn.cursor(pymysql.cursors.DictCursor)
|
||||
|
||||
try:
|
||||
# 查询文件配置(使用template_code字段)
|
||||
# 查询文件配置(template_code存储在input_data的JSON字段中)
|
||||
sql = """
|
||||
SELECT id, name, file_path, template_code
|
||||
SELECT id, name, file_path, input_data
|
||||
FROM f_polic_file_config
|
||||
WHERE tenant_id = %s
|
||||
AND template_code = %s
|
||||
AND state = 1
|
||||
LIMIT 1
|
||||
"""
|
||||
cursor.execute(sql, (self.tenant_id, template_code))
|
||||
config = cursor.fetchone()
|
||||
cursor.execute(sql, (self.tenant_id,))
|
||||
configs = cursor.fetchall()
|
||||
|
||||
# 从input_data的JSON中查找匹配的template_code
|
||||
for config in configs:
|
||||
try:
|
||||
input_data = json.loads(config['input_data']) if config['input_data'] else {}
|
||||
if input_data.get('template_code') == template_code:
|
||||
return {
|
||||
'id': config['id'],
|
||||
'name': config['name'],
|
||||
'file_path': config['file_path'],
|
||||
'template_code': template_code
|
||||
}
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
continue
|
||||
|
||||
if config:
|
||||
return {
|
||||
'id': config['id'],
|
||||
'name': config['name'],
|
||||
'file_path': config['file_path'],
|
||||
'template_code': config.get('template_code', template_code)
|
||||
}
|
||||
return None
|
||||
|
||||
finally:
|
||||
|
||||
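The lookup introduced in this hunk, matching `template_code` inside the `input_data` JSON column instead of a dedicated column, can be sketched as a pure function over already-fetched rows. The function name `find_config_by_template_code` and the sample rows are hypothetical; the row keys follow the SELECT shown above.

```python
import json
from typing import Dict, Iterable, Optional


def find_config_by_template_code(configs: Iterable[Dict], template_code: str) -> Optional[Dict]:
    """Return the first row whose input_data JSON carries the given template_code."""
    for config in configs:
        raw = config.get("input_data")
        try:
            input_data = json.loads(raw) if raw else {}
        except (json.JSONDecodeError, TypeError):
            continue  # skip rows with malformed JSON
        if isinstance(input_data, dict) and input_data.get("template_code") == template_code:
            return {
                "id": config["id"],
                "name": config["name"],
                "file_path": config["file_path"],
                "template_code": template_code,
            }
    return None


rows = [
    {"id": 1, "name": "broken", "file_path": "/a.docx", "input_data": "{not json"},
    {"id": 2, "name": "请示报告卡", "file_path": "/b.docx",
     "input_data": json.dumps({"template_code": "REQUEST_REPORT_CARD"})},
]
print(find_config_by_template_code(rows, "REQUEST_REPORT_CARD"))
```

Filtering in Python means every active row for the tenant is fetched on each call; that is probably acceptable for a small configuration table, but it is a trade-off of keeping the code inside a JSON column.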
@ -24,6 +24,9 @@ class FieldService:
|
||||
|
||||
# 加载提示词配置文件
|
||||
self.prompt_config = self._load_prompt_config()
|
||||
|
||||
# 加载字段默认值配置
|
||||
self.field_defaults = self._load_field_defaults()
|
||||
|
||||
def _load_prompt_config(self) -> Dict:
|
||||
"""
|
||||
@ -79,6 +82,40 @@ class FieldService:
|
||||
}
|
||||
}
|
||||
|
||||
def _load_field_defaults(self) -> Dict:
|
||||
"""
|
||||
加载字段默认值配置文件
|
||||
|
||||
Returns:
|
||||
字段默认值字典
|
||||
"""
|
||||
current_dir = Path(__file__).parent
|
||||
project_root = current_dir.parent
|
||||
config_path = project_root / 'config' / 'field_defaults.json'
|
||||
|
||||
try:
|
||||
with open(config_path, 'r', encoding='utf-8') as f:
|
||||
config = json.load(f)
|
||||
return config.get('field_defaults', {})
|
||||
except FileNotFoundError:
|
||||
print(f"警告: 默认值配置文件 {config_path} 不存在,使用空默认值")
|
||||
return {}
|
||||
except json.JSONDecodeError as e:
|
||||
print(f"错误: 默认值配置文件 {config_path} JSON格式错误: {e}")
|
||||
return {}
|
||||
|
||||
def get_field_default_value(self, field_code: str) -> Optional[str]:
|
||||
"""
|
||||
获取字段的默认值
|
||||
|
||||
Args:
|
||||
field_code: 字段编码
|
||||
|
||||
Returns:
|
||||
默认值字符串,如果不存在则返回None
|
||||
"""
|
||||
return self.field_defaults.get(field_code)
|
||||
|
||||
def get_connection(self):
|
||||
"""获取数据库连接"""
|
||||
return pymysql.connect(**self.db_config)
|
||||
@ -171,7 +208,14 @@ class FieldService:
|
||||
"""
|
||||
获取业务类型的所有字段(包括输入和输出字段)
|
||||
用于测试页面展示
|
||||
|
||||
Args:
|
||||
business_type: 业务类型,如 'INVESTIGATION'
|
||||
|
||||
Returns:
|
||||
包含input_fields和output_fields的字典
|
||||
"""
|
||||
import json
|
||||
conn = self.get_connection()
|
||||
cursor = conn.cursor(pymysql.cursors.DictCursor)
|
||||
|
||||
@ -189,6 +233,7 @@ class FieldService:
|
||||
input_fields = cursor.fetchall()
|
||||
|
||||
# 获取输出字段(field_type=2)
|
||||
# 根据business_type从input_data的JSON中查找匹配的文件配置
|
||||
sql_output = """
|
||||
SELECT f.id, f.name, f.filed_code as field_code, f.field_type
|
||||
FROM f_polic_field f
|
||||
@ -196,11 +241,49 @@ class FieldService:
|
||||
INNER JOIN f_polic_file_config fc ON ff.file_id = fc.id
|
||||
WHERE f.tenant_id = %s
|
||||
AND f.field_type = 2
|
||||
AND fc.name = '初步核实审批表'
|
||||
AND fc.state = 1
|
||||
ORDER BY f.id
|
||||
"""
|
||||
cursor.execute(sql_output, (self.tenant_id,))
|
||||
output_fields = cursor.fetchall()
|
||||
all_output_fields = cursor.fetchall()
|
||||
|
||||
# 根据business_type过滤输出字段
|
||||
# 需要查询文件配置的input_data来匹配business_type
|
||||
sql_file_configs = """
|
||||
SELECT id, name, input_data
|
||||
FROM f_polic_file_config
|
||||
WHERE tenant_id = %s
|
||||
AND state = 1
|
||||
"""
|
||||
cursor.execute(sql_file_configs, (self.tenant_id,))
|
||||
file_configs = cursor.fetchall()
|
||||
|
||||
# 找到匹配business_type的文件配置ID列表
|
||||
matching_file_ids = []
|
||||
for fc in file_configs:
|
||||
try:
|
||||
input_data = json.loads(fc['input_data']) if fc['input_data'] else {}
|
||||
if input_data.get('business_type') == business_type:
|
||||
matching_file_ids.append(fc['id'])
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
continue
|
||||
|
||||
# 过滤输出字段:只返回匹配的文件配置关联的字段
|
||||
output_fields = []
|
||||
if matching_file_ids:
|
||||
# 获取这些文件配置关联的字段
|
||||
placeholders = ','.join(['%s'] * len(matching_file_ids))
|
||||
sql_filtered = f"""
|
||||
SELECT DISTINCT f.id, f.name, f.filed_code as field_code, f.field_type
|
||||
FROM f_polic_field f
|
||||
INNER JOIN f_polic_file_field ff ON f.id = ff.filed_id
|
||||
WHERE f.tenant_id = %s
|
||||
AND f.field_type = 2
|
||||
AND ff.file_id IN ({placeholders})
|
||||
ORDER BY f.id
|
||||
"""
|
||||
cursor.execute(sql_filtered, [self.tenant_id] + matching_file_ids)
|
||||
output_fields = cursor.fetchall()
|
||||
|
||||
return {
|
||||
'input_fields': [
|
||||
@ -253,16 +336,28 @@ class FieldService:
|
||||
field_value=field_value
|
||||
) + "\n"
|
||||
|
||||
# 构建输出字段说明
|
||||
# 构建输出字段说明(包含字段特定规则)
|
||||
output_field_format = formatting.get('output_field_format', '- {field_name} (字段编码: {field_code})')
|
||||
field_specific_rules = self.prompt_config.get('field_specific_rules', {})
|
||||
output_fields_desc = ""
|
||||
for field in output_fields:
|
||||
field_name = field['name']
|
||||
field_code = field['field_code']
|
||||
output_fields_desc += output_field_format.format(
|
||||
field_desc = output_field_format.format(
|
||||
field_name=field_name,
|
||||
field_code=field_code
|
||||
) + "\n"
|
||||
)
|
||||
|
||||
# 如果字段有特定规则,添加到说明中
|
||||
if field_code in field_specific_rules:
|
||||
field_rule = field_specific_rules[field_code]
|
||||
field_desc += f"\n 说明:{field_rule.get('description', '')}"
|
||||
if 'rules' in field_rule and field_rule['rules']:
|
||||
field_desc += "\n 特殊要求:"
|
||||
for rule in field_rule['rules']:
|
||||
field_desc += f"\n - {rule}"
|
||||
|
||||
output_fields_desc += field_desc + "\n"
|
||||
|
||||
# 构建JSON格式示例
|
||||
json_example = {}
|
||||
|
||||
@ -381,13 +381,23 @@
|
||||
// ==================== 解析接口相关 ====================
|
||||
|
||||
function initExtractTab() {
|
||||
// 初始化默认输入字段
|
||||
addInputField('clue_info', '被举报用户名称是张三,年龄30岁,某公司总经理,男性,1980年5月出生,中共党员,正处级');
|
||||
// 初始化默认输入字段(虚拟测试数据)
|
||||
addInputField('clue_info', '被举报用户名称是张三,年龄44岁,某公司总经理,男性,1980年5月出生,本科文化程度,中共党员,正处级。主要问题线索:违反国家计划生育有关政策规定,于2010年10月生育二胎。线索来源:群众举报。');
|
||||
addInputField('target_basic_info_clue', '被核查人员工作基本情况:张三,男,1980年5月生,本科文化,中共党员,现为某公司总经理,正处级。');
|
||||
|
||||
// 初始化默认输出字段
|
||||
// 初始化默认输出字段(包含完整的字段列表)
|
||||
addOutputField('target_name');
|
||||
addOutputField('target_gender');
|
||||
addOutputField('target_age');
|
||||
addOutputField('target_date_of_birth');
|
||||
addOutputField('target_organization_and_position');
|
||||
addOutputField('target_organization');
|
||||
addOutputField('target_position');
|
||||
addOutputField('target_education_level');
|
||||
addOutputField('target_political_status');
|
||||
addOutputField('target_professional_rank');
|
||||
addOutputField('clue_source');
|
||||
addOutputField('target_issue_description');
|
||||
}
|
||||
|
||||
function addInputField(fieldCode = 'clue_info', fieldValue = '') {
|
||||
@ -539,13 +549,25 @@
|
||||
// ==================== 文档生成接口相关 ====================
|
||||
|
||||
function initGenerateTab() {
|
||||
// 初始化默认字段
|
||||
// 初始化默认字段(完整的虚拟测试数据)
|
||||
addGenerateField('target_name', '张三');
|
||||
addGenerateField('target_gender', '男');
|
||||
addGenerateField('target_age', '44');
|
||||
addGenerateField('target_date_of_birth', '198005');
|
||||
addGenerateField('target_organization_and_position', '某公司总经理');
|
||||
addGenerateField('target_organization', '某公司');
|
||||
addGenerateField('target_position', '总经理');
|
||||
addGenerateField('target_education_level', '本科');
|
||||
addGenerateField('target_political_status', '中共党员');
|
||||
addGenerateField('target_professional_rank', '正处级');
|
||||
addGenerateField('clue_source', '群众举报');
|
||||
addGenerateField('target_issue_description', '违反国家计划生育有关政策规定,于2010年10月生育二胎。');
|
||||
addGenerateField('department_opinion', '建议进行初步核实');
|
||||
addGenerateField('filler_name', '李四');
|
||||
|
||||
// 初始化默认文件
|
||||
// 初始化默认文件(包含多个模板用于测试)
|
||||
addFileItem(1, '初步核实审批表.doc', 'PRELIMINARY_VERIFICATION_APPROVAL');
|
||||
addFileItem(2, '请示报告卡.doc', 'REQUEST_REPORT_CARD');
|
||||
}
|
||||
|
||||
function addGenerateField(fieldCode = '', fieldValue = '') {
|
||||
|
||||
304
template_ai_helper.py
Normal file
@ -0,0 +1,304 @@
|
||||
"""
|
||||
模板AI辅助工具 - 使用AI智能分析文档内容并识别占位符替换位置
|
||||
"""
|
||||
import os
|
||||
import json
|
||||
import requests
|
||||
from typing import Dict, List, Optional, Tuple
|
||||
from dotenv import load_dotenv
|
||||
|
||||
# 加载环境变量
|
||||
load_dotenv()
|
||||
|
||||
|
||||
class TemplateAIHelper:
|
||||
"""模板AI辅助类,用于智能分析文档内容"""
|
||||
|
||||
def __init__(self):
|
||||
self.api_key = os.getenv('SILICONFLOW_API_KEY')
|
||||
self.model = os.getenv('SILICONFLOW_MODEL', 'deepseek-ai/DeepSeek-V3.2-Exp')
|
||||
self.api_url = "https://api.siliconflow.cn/v1/chat/completions"
|
||||
|
||||
if not self.api_key:
|
||||
raise Exception("未配置 SILICONFLOW_API_KEY,请在 .env 文件中设置")
|
||||
|
||||
def test_api_connection(self) -> bool:
|
||||
"""
|
||||
测试API连接是否正常
|
||||
|
||||
Returns:
|
||||
是否连接成功
|
||||
"""
|
||||
try:
|
||||
print(" [测试] 正在测试硅基流动API连接...")
|
||||
test_payload = {
|
||||
"model": self.model,
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "测试"
|
||||
}
|
||||
],
|
||||
"max_tokens": 10
|
||||
}
|
||||
|
||||
headers = {
|
||||
"Authorization": f"Bearer {self.api_key}",
|
||||
"Content-Type": "application/json"
|
||||
}
|
||||
|
||||
response = requests.post(
|
||||
self.api_url,
|
||||
json=test_payload,
|
||||
headers=headers,
|
||||
timeout=10
|
||||
)
|
||||
|
||||
if response.status_code == 200:
|
||||
print(" [测试] ✓ API连接正常")
|
||||
return True
|
||||
else:
|
||||
print(f" [测试] ✗ API连接失败: {response.status_code} - {response.text[:200]}")
|
||||
return False
|
||||
except Exception as e:
|
||||
print(f" [测试] ✗ API连接测试失败: {e}")
|
||||
return False
|
||||
|
||||
def analyze_document_content(
|
||||
self,
|
||||
document_text: str,
|
||||
available_fields: List[Dict],
|
||||
document_type: str = "未知"
|
||||
) -> List[Dict]:
|
||||
"""
|
||||
分析文档内容,识别需要替换为占位符的位置
|
||||
|
||||
Args:
|
||||
document_text: 文档文本内容
|
||||
available_fields: 可用字段列表,格式: [{"field_code": "xxx", "field_name": "xxx", "description": "xxx"}]
|
||||
document_type: 文档类型
|
||||
|
||||
Returns:
|
||||
替换建议列表,格式: [
|
||||
{
|
||||
"original_text": "原始文本",
|
||||
"replacement": "{{field_code}}",
|
||||
"field_code": "field_code",
|
||||
"field_name": "字段名称",
|
||||
"confidence": 0.9,
|
||||
"position": "段落/表格"
|
||||
}
|
||||
]
|
||||
"""
|
||||
try:
|
||||
print(f" [AI] 正在分析文本内容(长度: {len(document_text)} 字符)...")
|
||||
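# Build the field list shown to the model: one "name ({{code}}): description" line per field.
# The field code is interpolated from each field; the literal text "field_code" is not emitted.
fields_info = "\n".join([
    f"- {field['field_name']} ({{{{{field['field_code']}}}}}): {field.get('description', '')}"
    for field in available_fields
])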
|
||||
|
||||
# 构建提示词
|
||||
prompt = f"""你是一个专业的文档模板分析助手。请分析以下文档内容,识别所有可以替换为占位符的位置。
|
||||
|
||||
文档类型:{document_type}
|
||||
|
||||
可用字段列表:
|
||||
{fields_info}
|
||||
|
||||
文档内容:
|
||||
{document_text}
|
||||
|
||||
请仔细分析文档内容,识别以下类型的可替换内容:
|
||||
1. 明确的字段值(如姓名、单位、职务等)
|
||||
2. 示例值(如"XXX"、"张三"、"某公司"等)
|
||||
3. 组合字段(如"山西XXXX集团有限公司(职务+姓名)"应替换为对应的占位符组合)
|
||||
4. 日期、时间等格式化的值
|
||||
5. 任何看起来是示例或占位符的内容
|
||||
|
||||
对于组合字段,如果包含多个字段信息,请使用多个占位符的组合,例如:
|
||||
- "山西XXXX集团有限公司(职务+姓名)" → "{{{{target_organization_and_position}}}}({{{{target_name}}}})"
|
||||
- "张三,男,1980年5月" → "{{{{target_name}}}},{{{{target_gender}}}},{{{{target_date_of_birth}}}}"
|
||||
|
||||
请以JSON格式返回分析结果,格式如下:
|
||||
{{
|
||||
"replacements": [
|
||||
{{
|
||||
"original_text": "原始文本内容",
|
||||
"replacement": "{{{{field_code}}}}或组合占位符",
|
||||
"field_code": "字段编码(如果是组合,用逗号分隔)",
|
||||
"field_name": "字段名称",
|
||||
"confidence": 0.9,
|
||||
"position": "位置描述(如:第X段、表格第X行第X列)",
|
||||
"reason": "替换原因"
|
||||
}}
|
||||
]
|
||||
}}
|
||||
|
||||
只返回JSON,不要其他说明文字。"""
|
||||
|
||||
# 调用AI API
|
||||
payload = {
|
||||
"model": self.model,
|
||||
"messages": [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "你是一个专业的文档模板分析助手,能够准确识别文档中需要替换为占位符的内容。请严格按照JSON格式返回结果,不要添加任何解释性文字。"
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": prompt
|
||||
}
|
||||
],
|
||||
"temperature": 0.2,
|
||||
"max_tokens": 4000
|
||||
}
|
||||
|
||||
headers = {
|
||||
"Authorization": f"Bearer {self.api_key}",
|
||||
"Content-Type": "application/json"
|
||||
}
|
||||
|
||||
print(f" [AI] 正在调用API...")
|
||||
response = requests.post(
|
||||
self.api_url,
|
||||
json=payload,
|
||||
headers=headers,
|
||||
timeout=60
|
||||
)
|
||||
|
||||
print(f" [AI] API响应状态: {response.status_code}")
|
||||
|
||||
if response.status_code != 200:
|
||||
error_msg = response.text[:500] if len(response.text) > 500 else response.text
|
||||
raise Exception(f"API调用失败: {response.status_code} - {error_msg}")
|
||||
|
||||
result = response.json()
|
||||
print(f" [AI] API调用成功,正在解析响应...")
|
||||
|
||||
# 提取AI返回的内容
|
||||
if 'choices' in result and len(result['choices']) > 0:
|
||||
content = result['choices'][0]['message']['content']
|
||||
|
||||
# 尝试解析JSON
|
||||
try:
|
||||
# 如果返回的是代码块,提取JSON部分
|
||||
if '```json' in content:
|
||||
json_start = content.find('```json') + 7
|
||||
json_end = content.find('```', json_start)
|
||||
content = content[json_start:json_end].strip()
|
||||
elif '```' in content:
|
||||
json_start = content.find('```') + 3
|
||||
json_end = content.find('```', json_start)
|
||||
content = content[json_start:json_end].strip()
|
||||
|
||||
parsed_result = json.loads(content)
|
||||
replacements = parsed_result.get('replacements', [])
|
||||
print(f" [AI] ✓ 分析完成,识别到 {len(replacements)} 个替换建议")
|
||||
return replacements
|
||||
except json.JSONDecodeError as e:
|
||||
print(f" [AI] ⚠ JSON解析失败: {e}")
|
||||
print(f" [AI] 原始响应内容: {content[:200]}...")
|
||||
# 如果JSON解析失败,返回空列表
|
||||
return []
|
||||
else:
|
||||
raise Exception("API返回格式异常")
|
||||
|
||||
except requests.exceptions.Timeout:
|
||||
print(f" [AI] ✗ 请求超时(60秒),跳过AI分析")
|
||||
return []
|
||||
except Exception as e:
|
||||
print(f" [AI] ✗ 分析失败: {e}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return []
|
||||
|
||||
def analyze_paragraph(
|
||||
self,
|
||||
paragraph_text: str,
|
||||
available_fields: List[Dict],
|
||||
document_type: str = "未知"
|
||||
) -> List[Dict]:
|
||||
"""
|
||||
分析单个段落,识别需要替换的位置
|
||||
|
||||
Args:
|
||||
paragraph_text: 段落文本
|
||||
available_fields: 可用字段列表
|
||||
document_type: 文档类型
|
||||
|
||||
Returns:
|
||||
替换建议列表
|
||||
"""
|
||||
# 跳过空内容或太短的内容(少于10个字符)
|
||||
if not paragraph_text or len(paragraph_text.strip()) < 10:
|
||||
return []
|
||||
|
||||
# 如果文本已经包含占位符,跳过
|
||||
if '{{' in paragraph_text and '}}' in paragraph_text:
|
||||
return []
|
||||
|
||||
return self.analyze_document_content(paragraph_text, available_fields, document_type)
|
||||
|
||||
def analyze_table_cell(
|
||||
self,
|
||||
cell_text: str,
|
||||
available_fields: List[Dict],
|
||||
document_type: str = "未知",
|
||||
row: int = 0,
|
||||
col: int = 0
|
||||
) -> List[Dict]:
|
||||
"""
|
||||
分析表格单元格,识别需要替换的位置
|
||||
|
||||
Args:
|
||||
cell_text: 单元格文本
|
||||
available_fields: 可用字段列表
|
||||
document_type: 文档类型
|
||||
row: 行号
|
||||
col: 列号
|
||||
|
||||
Returns:
|
||||
替换建议列表
|
||||
"""
|
||||
if not cell_text or len(cell_text.strip()) < 3:
|
||||
return []
|
||||
|
||||
replacements = self.analyze_document_content(cell_text, available_fields, document_type)
|
||||
|
||||
# 添加位置信息
|
||||
for replacement in replacements:
|
||||
replacement['position'] = f"表格第{row+1}行第{col+1}列"
|
||||
|
||||
return replacements
|
||||
|
||||
|
||||
def get_available_fields_for_document(doc_config: Dict, field_name_to_code: Dict) -> List[Dict]:
|
||||
"""
|
||||
根据文档配置获取可用字段列表
|
||||
|
||||
Args:
|
||||
doc_config: 文档配置
|
||||
field_name_to_code: 字段名称到编码的映射
|
||||
|
||||
Returns:
|
||||
可用字段列表
|
||||
"""
|
||||
available_fields = []
|
||||
|
||||
for field_code in doc_config.get('fields', []):
|
||||
# 查找字段名称
|
||||
field_name = None
|
||||
for name, code in field_name_to_code.items():
|
||||
if code == field_code:
|
||||
field_name = name
|
||||
break
|
||||
|
||||
if field_name:
|
||||
available_fields.append({
|
||||
'field_code': field_code,
|
||||
'field_name': field_name,
|
||||
'description': f"{field_name}字段"
|
||||
})
|
||||
|
||||
return available_fields
|
||||
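The response handling in `analyze_document_content` has to cope with replies that wrap the JSON in a Markdown code fence. A minimal sketch of that step is given below. The names `FENCE`, `FENCE_PATTERN`, and `parse_replacements` are hypothetical, and unlike the method above the sketch treats a reply with a missing closing fence as unparseable rather than slicing it.

```python
import json
import re

# Three backticks, built programmatically so this example can itself sit inside a fence.
FENCE = "`" * 3
FENCE_PATTERN = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def parse_replacements(content: str) -> list:
    """Extract the 'replacements' list from a model reply, fenced or not."""
    match = FENCE_PATTERN.search(content)
    payload = match.group(1) if match else content
    try:
        parsed = json.loads(payload.strip())
    except json.JSONDecodeError:
        return []  # same fallback as the method above: no suggestions
    if not isinstance(parsed, dict):
        return []
    replacements = parsed.get("replacements", [])
    return replacements if isinstance(replacements, list) else []


reply = FENCE + 'json\n{"replacements": [{"field_code": "target_name"}]}\n' + FENCE
print(parse_replacements(reply))
```

Returning an empty list on any parse problem lets the caller fall back to rule-based matching without special-casing errors.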
BIN
template_finish/2-初核模版/1.初核请示/1.请示报告卡(XXX).docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/1.初核请示/2.初步核实审批表(XXX).docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/1.初核请示/3.附件初核方案(XXX).docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/谈话通知书/谈话通知书第一联.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/谈话通知书/谈话通知书第三联.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/谈话通知书/谈话通知书第二联.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话审批/1.请示报告卡(初核谈话).docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话审批/2谈话审批表.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话审批/3.谈话前安全风险评估表.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话审批/4.谈话方案.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话审批/5.谈话后安全风险评估表.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话审批/~$谈话前安全风险评估表.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话审批/~$谈话后安全风险评估表.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话流程/1.谈话笔录.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话流程/2.谈话询问对象情况摸底调查30问.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话流程/3.被谈话人权利义务告知书.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话流程/4.点对点交接单.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话流程/5.陪送交接单(新).docx
Normal file
Binary file not shown.
Binary file not shown.
Binary file not shown.
BIN
template_finish/2-初核模版/2.谈话审批/走读式谈话流程/7.办案人员-办案安全保密承诺书.docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/3.初核结论/8-1请示报告卡(初核报告结论) .docx
Normal file
Binary file not shown.
BIN
template_finish/2-初核模版/3.初核结论/8.XXX初核情况报告.docx
Normal file
Binary file not shown.
97
test_ai_api.py
Normal file
@@ -0,0 +1,97 @@
"""
Test the SiliconFlow API connection
"""
import os
from dotenv import load_dotenv
import requests

# Load environment variables
load_dotenv()

def test_siliconflow_api():
    """Test the SiliconFlow API connection"""
    print("="*80)
    print("Testing the SiliconFlow API connection")
    print("="*80)
    print()

    # Read configuration
    api_key = os.getenv('SILICONFLOW_API_KEY')
    model = os.getenv('SILICONFLOW_MODEL', 'deepseek-ai/DeepSeek-V3.2-Exp')
    api_url = "https://api.siliconflow.cn/v1/chat/completions"

    # Check configuration
    print("1. Checking configuration...")
    if not api_key:
        print(" ✗ Error: SILICONFLOW_API_KEY not found")
        print(" Set it in the .env file: SILICONFLOW_API_KEY=<your API key>")
        return False

    print(f" ✓ API key: {api_key[:10]}...{api_key[-5:]}")
    print(f" ✓ Model: {model}")
    print(f" ✓ API URL: {api_url}")
    print()

    # Test the API call
    print("2. Testing the API call...")
    try:
        payload = {
            "model": model,
            "messages": [
                {
                    "role": "user",
                    "content": "Please reply 'test successful'"
                }
            ],
            "max_tokens": 20
        }

        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

        print(" Sending request...")
        response = requests.post(
            api_url,
            json=payload,
            headers=headers,
            timeout=30
        )

        print(f" Response status code: {response.status_code}")

        if response.status_code == 200:
            result = response.json()
            if 'choices' in result and len(result['choices']) > 0:
                content = result['choices'][0]['message']['content']
                print(" ✓ API call succeeded!")
                print(f" Response content: {content}")
                return True
            else:
                print(f" ✗ Unexpected API response format: {result}")
                return False
        else:
            print(" ✗ API call failed")
            print(f" Status code: {response.status_code}")
            print(f" Error message: {response.text[:500]}")
            return False

    except requests.exceptions.Timeout:
        print(" ✗ Request timed out (30 seconds)")
        return False
    except Exception as e:
        print(f" ✗ Request failed: {e}")
        import traceback
        traceback.print_exc()
        return False

if __name__ == '__main__':
    success = test_siliconflow_api()
    print()
    print("="*80)
    if success:
        print("✓ API test passed; the AI analysis feature can be used")
    else:
        print("✗ API test failed; check the configuration and try again")
    print("="*80)
261
test_api_endpoints.py
Normal file
@@ -0,0 +1,261 @@
"""
Test whether the API endpoints parse data and generate documents correctly
"""
import requests
import json
from typing import Dict, Any


# Base URL of the API
BASE_URL = "http://localhost:7500"


def test_extract_api():
    """Test the AI extraction endpoint"""
    print("=" * 80)
    print("Testing the AI extraction endpoint (/ai/extract)")
    print("=" * 80)

    url = f"{BASE_URL}/ai/extract"

    # Test data (values stay in Chinese because the service extracts from Chinese text)
    request_data = {
        "inputData": [
            {
                "fieldCode": "clue_info",
                "fieldValue": "被举报用户名称是张三,年龄44岁,某公司总经理,男性,1980年5月出生,本科文化程度,中共党员,正处级。主要问题线索:违反国家计划生育有关政策规定,于2010年10月生育二胎。线索来源:群众举报。"
            },
            {
                "fieldCode": "target_basic_info_clue",
                "fieldValue": "被核查人员工作基本情况:张三,男,1980年5月生,本科文化,中共党员,现为某公司总经理,正处级。"
            }
        ],
        "outputData": [
            {"fieldCode": "target_name"},
            {"fieldCode": "target_gender"},
            {"fieldCode": "target_age"},
            {"fieldCode": "target_date_of_birth"},
            {"fieldCode": "target_organization_and_position"},
            {"fieldCode": "target_organization"},
            {"fieldCode": "target_position"},
            {"fieldCode": "target_education_level"},
            {"fieldCode": "target_political_status"},
            {"fieldCode": "target_professional_rank"},
            {"fieldCode": "clue_source"},
            {"fieldCode": "target_issue_description"}
        ]
    }

    print("\nRequest data:")
    print(json.dumps(request_data, ensure_ascii=False, indent=2))
    print("\nSending request...")

    try:
        response = requests.post(url, json=request_data, timeout=60)

        print(f"\nResponse status code: {response.status_code}")

        if response.status_code == 200:
            result = response.json()
            print("\nResponse data:")
            print(json.dumps(result, ensure_ascii=False, indent=2))

            if result.get('isSuccess'):
                print("\n✓ Extraction succeeded!")
                if result.get('data', {}).get('outData'):
                    print("\nExtracted fields:")
                    for item in result['data']['outData']:
                        value = item.get('fieldValue', '(empty)')
                        print(f" - {item.get('fieldCode')}: {value}")
                return True
            else:
                print(f"\n✗ Extraction failed: {result.get('errorMsg', 'unknown error')}")
                return False
        else:
            print(f"\n✗ HTTP error: {response.status_code}")
            print(f"Response body: {response.text}")
            return False

    except requests.exceptions.Timeout:
        print("\n✗ Request timed out (60 seconds)")
        return False
    except requests.exceptions.ConnectionError:
        print(f"\n✗ Connection error: cannot connect to {BASE_URL}")
        print("Make sure the service is running: python app.py")
        return False
    except Exception as e:
        print(f"\n✗ An error occurred: {str(e)}")
        return False


def test_generate_document_api():
    """Test the document generation endpoint"""
    print("\n" + "=" * 80)
    print("Testing the document generation endpoint (/ai/generate-document)")
    print("=" * 80)

    url = f"{BASE_URL}/ai/generate-document"

    # Test data
    request_data = {
        "inputData": [
            {"fieldCode": "target_name", "fieldValue": "张三"},
            {"fieldCode": "target_gender", "fieldValue": "男"},
            {"fieldCode": "target_age", "fieldValue": "44"},
            {"fieldCode": "target_date_of_birth", "fieldValue": "198005"},
            {"fieldCode": "target_organization_and_position", "fieldValue": "某公司总经理"},
            {"fieldCode": "target_organization", "fieldValue": "某公司"},
            {"fieldCode": "target_position", "fieldValue": "总经理"},
            {"fieldCode": "target_education_level", "fieldValue": "本科"},
            {"fieldCode": "target_political_status", "fieldValue": "中共党员"},
            {"fieldCode": "target_professional_rank", "fieldValue": "正处级"},
            {"fieldCode": "clue_source", "fieldValue": "群众举报"},
            {"fieldCode": "target_issue_description", "fieldValue": "违反国家计划生育有关政策规定,于2010年10月生育二胎。"},
            {"fieldCode": "department_opinion", "fieldValue": "建议进行初步核实"},
            {"fieldCode": "filler_name", "fieldValue": "李四"}
        ],
        "fpolicFieldParamFileList": [
            {
                "fileId": 1,
                "fileName": "初步核实审批表.doc",
                "templateCode": "PRELIMINARY_VERIFICATION_APPROVAL"
            }
        ]
    }

    print("\nRequest data:")
    print(json.dumps(request_data, ensure_ascii=False, indent=2))
    print("\nSending request...")

    try:
        response = requests.post(url, json=request_data, timeout=120)

        print(f"\nResponse status code: {response.status_code}")

        if response.status_code == 200:
            result = response.json()
            print("\nResponse data:")
            print(json.dumps(result, ensure_ascii=False, indent=2))

            if result.get('isSuccess'):
                print("\n✓ Document generated successfully!")
                if result.get('data'):
                    data = result['data']
                    print(f"\nDocument ID: {data.get('documentId', 'N/A')}")
                    print(f"Document name: {data.get('documentName', 'N/A')}")

                    if data.get('fpolicFieldParamFileList'):
                        print("\nGenerated files:")
                        for file_info in data['fpolicFieldParamFileList']:
                            print(f" - File ID: {file_info.get('fileId')}")
                            print(f" File name: {file_info.get('fileName')}")
                            print(f" File path: {file_info.get('filePath')}")
                return True
            else:
                print(f"\n✗ Document generation failed: {result.get('errorMsg', 'unknown error')}")
                return False
        else:
            print(f"\n✗ HTTP error: {response.status_code}")
            print(f"Response body: {response.text}")
            return False

    except requests.exceptions.Timeout:
        print("\n✗ Request timed out (120 seconds)")
        return False
    except requests.exceptions.ConnectionError:
        print(f"\n✗ Connection error: cannot connect to {BASE_URL}")
        print("Make sure the service is running: python app.py")
        return False
    except Exception as e:
        print(f"\n✗ An error occurred: {str(e)}")
        return False


def test_fields_api():
    """Test the field configuration endpoint"""
    print("\n" + "=" * 80)
    print("Testing the field configuration endpoint (/api/fields)")
    print("=" * 80)

    url = f"{BASE_URL}/api/fields"
    params = {"businessType": "INVESTIGATION"}

    # Print the query parameters separately instead of appending the raw dict to the URL
    print(f"\nRequest URL: {url}, params: {params}")
    print("Sending request...")

    try:
        response = requests.get(url, params=params, timeout=30)

        print(f"\nResponse status code: {response.status_code}")

        if response.status_code == 200:
            result = response.json()
            print("\nResponse data:")
            print(json.dumps(result, ensure_ascii=False, indent=2))

            if result.get('isSuccess'):
                print("\n✓ Field configuration retrieved successfully!")
                if result.get('data', {}).get('fields'):
                    fields = result['data']['fields']
                    input_count = len(fields.get('input_fields', []))
                    output_count = len(fields.get('output_fields', []))
                    print(f"\nInput fields: {input_count}")
                    print(f"Output fields: {output_count}")
                return True
            else:
                print(f"\n✗ Failed to retrieve field configuration: {result.get('errorMsg', 'unknown error')}")
                return False
        else:
            print(f"\n✗ HTTP error: {response.status_code}")
            print(f"Response body: {response.text}")
            return False

    except requests.exceptions.ConnectionError:
        print(f"\n✗ Connection error: cannot connect to {BASE_URL}")
        print("Make sure the service is running: python app.py")
        return False
    except Exception as e:
        print(f"\n✗ An error occurred: {str(e)}")
        return False


def main():
    """Main entry point"""
    print("API endpoint test tool")
    print("=" * 80)
    print(f"Target: {BASE_URL}")
    print("=" * 80)

    results = []

    # Test the field configuration endpoint
    results.append(("Field configuration endpoint", test_fields_api()))

    # Test the AI extraction endpoint
    results.append(("AI extraction endpoint", test_extract_api()))

    # Test the document generation endpoint
    results.append(("Document generation endpoint", test_generate_document_api()))

    # Print the summary
    print("\n" + "=" * 80)
    print("Test summary")
    print("=" * 80)

    for name, success in results:
        status = "✓ passed" if success else "✗ failed"
        print(f"{name}: {status}")

    total = len(results)
    passed = sum(1 for _, success in results if success)

    print(f"\nTotal: {passed}/{total} passed")

    if passed == total:
        print("\n✓ All tests passed!")
    else:
        print(f"\n⚠ {total - passed} test(s) failed; check the error messages above")


if __name__ == '__main__':
    main()
@@ -46,9 +46,10 @@ cursor.execute("""
     SELECT id, name, filed_code, field_type, state
     FROM f_polic_field
     WHERE tenant_id = %s
-    AND filed_code IN ('target_name', 'target_organization_and_position', 'target_gender',
-                       'target_date_of_birth', 'target_political_status', 'target_professional_rank',
-                       'clue_source', 'target_issue_description', 'department_opinion', 'filler_name')
+    AND filed_code IN ('target_name', 'target_organization_and_position', 'target_organization', 'target_position',
+                       'target_gender', 'target_date_of_birth', 'target_age', 'target_education_level',
+                       'target_political_status', 'target_professional_rank', 'clue_source', 'target_issue_description',
+                       'department_opinion', 'filler_name')
     ORDER BY field_type, name
 """, (TENANT_ID,))
 fields = cursor.fetchall()

134
verify_pre_interview_risk_assessment_data.py
Normal file
@@ -0,0 +1,134 @@
"""
Verify that the pre-interview security risk assessment form data was written to the database correctly
"""
import pymysql

DB_CONFIG = {
|
||||
'host': '152.136.177.240',
|
||||
'port': 5012,
|
||||
'user': 'finyx',
|
||||
'password': '6QsGK6MpePZDE57Z',
|
||||
'database': 'finyx',
|
||||
'charset': 'utf8mb4'
|
||||
}

TENANT_ID = 615873064429507639

# Expected field codes (taken from init_pre_interview_risk_assessment_fields.py)
EXPECTED_FIELDS = [
    'target_family_situation',
    'target_social_relations',
    'target_health_status',
    'target_personality',
    'target_tolerance',
    'target_issue_severity',
    'target_other_issues_possibility',
    'target_previous_investigation',
    'target_negative_events',
    'target_other_situation',
    'risk_level'
]

# File name as stored in the database (kept in Chinese because it is a lookup key)
FILE_NAME = '谈话前安全风险评估表'
TEMPLATE_CODE = 'PRE_INTERVIEW_RISK_ASSESSMENT'

try:
    conn = pymysql.connect(**DB_CONFIG)
    cursor = conn.cursor(pymysql.cursors.DictCursor)
except Exception as e:
    print(f"Database connection failed: {e}")
    raise SystemExit(1)

print("="*60)
print("Verifying data for the pre-interview security risk assessment form")
print("="*60)

# 1. Check the file configuration
print("\n1. File configuration (f_polic_file_config):")
cursor.execute("""
    SELECT id, name, file_path, input_data, state
    FROM f_polic_file_config
    WHERE tenant_id = %s AND name = %s
""", (TENANT_ID, FILE_NAME))
file_config = cursor.fetchone()
if file_config:
    print(f" ✓ ID: {file_config['id']}")
    print(f" ✓ Name: {file_config['name']}")
    print(f" ✓ File path: {file_config['file_path']}")
    print(f" ✓ Input data: {file_config['input_data']}")
    print(f" ✓ State: {file_config['state']}")

    # Check that input_data contains the correct template_code
    import json
    try:
        input_data = json.loads(file_config['input_data'])
        if input_data.get('template_code') == TEMPLATE_CODE:
            print(f" ✓ Template code is correct: {TEMPLATE_CODE}")
        else:
            print(f" ✗ Template code mismatch! Expected: {TEMPLATE_CODE}, actual: {input_data.get('template_code')}")
    except (TypeError, ValueError, AttributeError):
        # Catch only parsing problems (NULL column, invalid JSON, non-object JSON)
        print(" ✗ input_data is malformed and cannot be parsed")

    file_config_id = file_config['id']
else:
    print(" ✗ File configuration not found")
    file_config_id = None

# 2. Check the related fields
print("\n2. Related fields (f_polic_field):")
placeholders = ','.join(['%s'] * len(EXPECTED_FIELDS))
cursor.execute(f"""
    SELECT id, name, filed_code, field_type, state
    FROM f_polic_field
    WHERE tenant_id = %s
    AND filed_code IN ({placeholders})
    ORDER BY field_type, name
""", [TENANT_ID] + EXPECTED_FIELDS)
fields = cursor.fetchall()
found_field_codes = [f['filed_code'] for f in fields]
print(f" Found {len(fields)} fields:")
for field in fields:
    field_type_str = "output field" if field['field_type'] == 2 else "input field"
    state_str = "enabled" if field['state'] == 1 else "disabled"
    print(f" - {field['name']} ({field['filed_code']}) [{field_type_str}] [state: {state_str}]")

# Check for missing fields
missing_fields = set(EXPECTED_FIELDS) - set(found_field_codes)
if missing_fields:
    print(f"\n ✗ Missing fields ({len(missing_fields)}):")
    for field_code in missing_fields:
        print(f" - {field_code}")
else:
    print("\n ✓ All expected fields exist")

# 3. Check the file-field associations
if file_config_id:
    print("\n3. File-field associations (f_polic_file_field):")
    cursor.execute("""
        SELECT ff.id, f.name, f.filed_code, ff.state
        FROM f_polic_file_field ff
        JOIN f_polic_field f ON ff.filed_id = f.id
        WHERE ff.tenant_id = %s AND ff.file_id = %s
        ORDER BY f.filed_code
    """, (TENANT_ID, file_config_id))
    relations = cursor.fetchall()
    print(f" Found {len(relations)} associations:")
    for rel in relations:
        state_str = "enabled" if rel['state'] == 1 else "disabled"
        print(f" - {rel['name']} ({rel['filed_code']}) [state: {state_str}]")

    # Check that the associations are complete
    related_field_codes = [r['filed_code'] for r in relations]
    missing_relations = set(EXPECTED_FIELDS) - set(related_field_codes)
    if missing_relations:
        print(f"\n ✗ Missing associations ({len(missing_relations)}):")
        for field_code in missing_relations:
            print(f" - {field_code}")
    else:
        print("\n ✓ All fields are associated with the file configuration")

print("\n" + "="*60)
print("Verification complete!")
print("="*60)

conn.close()
@ -30,14 +30,18 @@
|
||||
|
||||
### 3.1 输出字段(用于填充模板)
|
||||
|
||||
根据Excel数据字段汇总表和Word模板分析,设计了以下10个输出字段:
|
||||
根据Excel数据字段汇总表和Word模板分析,设计了以下14个输出字段:
|
||||
|
||||
| 字段名称 | 字段编码 (field_code) | 说明 | 示例 |
|
||||
|---------|---------------------|------|------|
|
||||
| 被核查人姓名 | target_name | 被核查人姓名 | 张三 |
|
||||
| 被核查人员单位及职务 | target_organization_and_position | 被核查人员单位及职务(包括兼职) | 某公司总经理 |
|
||||
| 被核查人员单位 | target_organization | 被核查人员单位 | 某公司 |
|
||||
| 被核查人员职务 | target_position | 被核查人员职务 | 总经理 |
|
||||
| 被核查人员性别 | target_gender | 被核查人员性别(男/女,不用男性和女性) | 男 |
|
||||
| 被核查人员出生年月 | target_date_of_birth | 被核查人员出生年月(YYYYMM格式,不需要日) | 198005 |
|
||||
| 被核查人员年龄 | target_age | 被核查人员年龄(数字,单位:岁) | 44 |
|
||||
| 被核查人员文化程度 | target_education_level | 被核查人员文化程度(如:本科、大专、高中等) | 本科 |
|
||||
| 被核查人员政治面貌 | target_political_status | 被核查人员政治面貌(中共党员、群众等) | 中共党员 |
|
||||
| 被核查人员职级 | target_professional_rank | 被核查人员职级(如:正处级) | 正处级 |
|
||||
| 线索来源 | clue_source | 线索来源 | - |
|
||||
@ -58,8 +62,12 @@ Word模板中使用的占位符格式为 `{{field_code}}`,与字段编码的
|
||||
|
||||
- `{{target_name}}` → 被核查人姓名
|
||||
- `{{target_organization_and_position}}` → 被核查人员单位及职务
|
||||
- `{{target_organization}}` → 被核查人员单位
|
||||
- `{{target_position}}` → 被核查人员职务
|
||||
- `{{target_gender}}` → 被核查人员性别
|
||||
- `{{target_date_of_birth}}` → 被核查人员出生年月
|
||||
- `{{target_age}}` → 被核查人员年龄
|
||||
- `{{target_education_level}}` → 被核查人员文化程度
|
||||
- `{{target_political_status}}` → 被核查人员政治面貌
|
||||
- `{{target_professional_rank}}` → 被核查人员职级
|
||||
- `{{target_issue_description}}` → 主要问题线索
|
||||
@ -94,7 +102,7 @@ python init_preliminary_verification_fields.py
|
||||
```
|
||||
|
||||
脚本功能:
|
||||
1. 创建12个字段记录(10个输出字段 + 2个输入字段)
|
||||
1. 创建16个字段记录(14个输出字段 + 2个输入字段)
|
||||
2. 创建文件配置记录
|
||||
3. 建立文件和字段的关联关系(仅关联输出字段)
|
||||
|
||||
@ -141,6 +149,14 @@ python verify_data.py
|
||||
"fieldCode": "target_organization_and_position",
|
||||
"fieldValue": "某公司总经理"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_organization",
|
||||
"fieldValue": "某公司"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_position",
|
||||
"fieldValue": "总经理"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_gender",
|
||||
"fieldValue": "男"
|
||||
@ -149,6 +165,14 @@ python verify_data.py
|
||||
"fieldCode": "target_date_of_birth",
|
||||
"fieldValue": "198005"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_age",
|
||||
"fieldValue": "44"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_education_level",
|
||||
"fieldValue": "本科"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_political_status",
|
||||
"fieldValue": "中共党员"
|
||||
|
||||
51
占位符与字段对照表.md
@ -31,8 +31,12 @@ Word模板中使用以下格式作为占位符:
|
||||
|---------|-----------------|------|------|
|
||||
| 被核查人姓名 | `{{target_name}}` | 被核查人姓名 | 张三 |
|
||||
| 被核查人员单位及职务 | `{{target_organization_and_position}}` | 被核查人员单位及职务(包括兼职) | 某公司总经理 |
|
||||
| 被核查人员单位 | `{{target_organization}}` | 被核查人员单位 | 某公司 |
|
||||
| 被核查人员职务 | `{{target_position}}` | 被核查人员职务 | 总经理 |
|
||||
| 被核查人员性别 | `{{target_gender}}` | 被核查人员性别(男/女,不用男性和女性) | 男 |
|
||||
| 被核查人员出生年月 | `{{target_date_of_birth}}` | 被核查人员出生年月(YYYYMM格式,不需要日) | 198005 |
|
||||
| 被核查人员年龄 | `{{target_age}}` | 被核查人员年龄(数字,单位:岁) | 44 |
|
||||
| 被核查人员文化程度 | `{{target_education_level}}` | 被核查人员文化程度(如:本科、大专、高中等) | 本科 |
|
||||
| 被核查人员政治面貌 | `{{target_political_status}}` | 被核查人员政治面貌(中共党员、群众等) | 中共党员 |
|
||||
| 被核查人员职级 | `{{target_professional_rank}}` | 被核查人员职级(如:正处级) | 正处级 |
|
||||
| 线索来源 | `{{clue_source}}` | 线索来源 | - |
|
||||
@ -149,6 +153,35 @@ Word模板中使用以下格式作为占位符:
|
||||
|
||||
---
|
||||
|
||||
## 六-1、谈话前安全风险评估表 (PRE_INTERVIEW_RISK_ASSESSMENT)
|
||||
|
||||
### 输入字段
|
||||
|
||||
| 字段名称 | 字段编码 (占位符) | 说明 |
|
||||
|---------|-----------------|------|
|
||||
| 线索信息 | `{{clue_info}}` | 线索信息(用于AI解析) |
|
||||
| 被核查人员工作基本情况线索 | `{{target_basic_info_clue}}` | 被核查人员工作基本情况线索(用于AI解析) |
|
||||
|
||||
### 输出字段(占位符,带默认值)
|
||||
|
||||
| 字段名称 | 字段编码 (占位符) | 说明 | 默认值 |
|
||||
|---------|-----------------|------|--------|
|
||||
| 被核查人员家庭情况 | `{{target_family_situation}}` | 被核查人员家庭情况 | 家庭关系和谐稳定 |
|
||||
| 被核查人员社会关系 | `{{target_social_relations}}` | 被核查人员社会关系 | 社会交往较多,人际关系基本正常 |
|
||||
| 被核查人员健康状况 | `{{target_health_status}}` | 被核查人员健康状况 | 良好 |
|
||||
| 被核查人员性格特征 | `{{target_personality}}` | 被核查人员性格特征 | 开朗 |
|
||||
| 被核查人员承受能力 | `{{target_tolerance}}` | 被核查人员承受能力 | 较强 |
|
||||
| 被核查人员涉及问题严重程度 | `{{target_issue_severity}}` | 被核查人员涉及问题严重程度 | 较轻 |
|
||||
| 被核查人员涉及其他问题的可能性 | `{{target_other_issues_possibility}}` | 被核查人员涉及其他问题的可能性 | 较小 |
|
||||
| 被核查人员此前被审查情况 | `{{target_previous_investigation}}` | 被核查人员此前被审查情况 | 无 |
|
||||
| 被核查人员社会负面事件 | `{{target_negative_events}}` | 被核查人员社会负面事件 | 无 |
|
||||
| 被核查人员其他情况 | `{{target_other_situation}}` | 被核查人员其他情况 | 无 |
|
||||
| 风险等级 | `{{risk_level}}` | 风险等级 | 低 |
|
||||
|
||||
**注意**:如果AI未提取到字段值,系统返回空字符串。默认值信息提供给前端,前端可根据业务需求决定是否应用。
|
||||
|
||||
---
|
||||
|
||||
## 七、请示报告卡(初核报告结论)
|
||||
|
||||
### 输出字段(占位符)
|
||||
@ -228,9 +261,13 @@ Word模板中使用以下格式作为占位符:
|
||||
|
||||
- `{{target_name}}` - 被核查人姓名
|
||||
- `{{target_organization_and_position}}` - 被核查人员单位及职务
|
||||
- `{{target_organization}}` - 被核查人员单位
|
||||
- `{{target_position}}` - 被核查人员职务
|
||||
- `{{target_gender}}` - 被核查人员性别
|
||||
- `{{target_date_of_birth}}` - 被核查人员出生年月
|
||||
- `{{target_date_of_birth_full}}` - 被核查人员出生年月日
|
||||
- `{{target_age}}` - 被核查人员年龄
|
||||
- `{{target_education_level}}` - 被核查人员文化程度
|
||||
- `{{target_political_status}}` - 被核查人员政治面貌
|
||||
- `{{target_professional_rank}}` - 被核查人员职级
|
||||
- `{{target_id_number}}` - 被核查人员身份证号
|
||||
@ -260,6 +297,20 @@ Word模板中使用以下格式作为占位符:
|
||||
- `{{investigation_team_member_names}}` - 核查组成员姓名
|
||||
- `{{investigation_location}}` - 核查地点
|
||||
|
||||
### 风险评估相关字段
|
||||
|
||||
- `{{target_family_situation}}` - 被核查人员家庭情况(默认值:家庭关系和谐稳定)
|
||||
- `{{target_social_relations}}` - 被核查人员社会关系(默认值:社会交往较多,人际关系基本正常)
|
||||
- `{{target_health_status}}` - 被核查人员健康状况(默认值:良好)
|
||||
- `{{target_personality}}` - 被核查人员性格特征(默认值:开朗)
|
||||
- `{{target_tolerance}}` - 被核查人员承受能力(默认值:较强)
|
||||
- `{{target_issue_severity}}` - 被核查人员涉及问题严重程度(默认值:较轻)
|
||||
- `{{target_other_issues_possibility}}` - 被核查人员涉及其他问题的可能性(默认值:较小)
|
||||
- `{{target_previous_investigation}}` - 被核查人员此前被审查情况(默认值:无)
|
||||
- `{{target_negative_events}}` - 被核查人员社会负面事件(默认值:无)
|
||||
- `{{target_other_situation}}` - 被核查人员其他情况(默认值:无)
|
||||
- `{{risk_level}}` - 风险等级(默认值:低)
|
||||
|
||||
### 其他字段
|
||||
|
||||
根据具体模板需求,还可能包含其他字段,请参考各模板的详细说明。
|
||||
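The `{{field_code}}` placeholder convention listed above can be illustrated with a small rendering sketch. This is not the project's template engine: `render_placeholders` is a hypothetical helper that works on plain strings, and it assumes that a field with no value is rendered as an empty string. In a real .docx file a placeholder may be split across several runs, which this string-level sketch does not handle.

```python
import re

# Matches placeholders of the form {{field_code}}
PLACEHOLDER_RE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_placeholders(text, values):
    """Replace each {{field_code}} in text with values[field_code].

    Hypothetical helper for illustration; unknown field codes become "".
    """
    return PLACEHOLDER_RE.sub(lambda m: str(values.get(m.group(1), "")), text)


if __name__ == "__main__":
    line = "{{target_name}},{{target_gender}},{{target_age}}岁"
    values = {"target_name": "张三", "target_gender": "男", "target_age": "44"}
    print(render_placeholders(line, values))
```

Keeping the placeholder name identical to the field code means the same dictionary returned by the extraction step can be passed straight to the renderer.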
|
||||
112
字段与提示词规则说明.md
@ -91,8 +91,12 @@ sql = """
|
||||
|---------|---------------------|------|---------|
|
||||
| 被核查人姓名 | target_name | 被核查人姓名 | `f_polic_field` 表 |
|
||||
| 被核查人员单位及职务 | target_organization_and_position | 被核查人员单位及职务(包括兼职) | `f_polic_field` 表 |
|
||||
| 被核查人员单位 | target_organization | 被核查人员单位 | `f_polic_field` 表 |
|
||||
| 被核查人员职务 | target_position | 被核查人员职务 | `f_polic_field` 表 |
|
||||
| 被核查人员性别 | target_gender | 被核查人员性别(男/女) | `f_polic_field` 表 |
|
||||
| 被核查人员出生年月 | target_date_of_birth | 被核查人员出生年月(YYYYMM格式) | `f_polic_field` 表 |
|
||||
| 被核查人员年龄 | target_age | 被核查人员年龄(数字,单位:岁) | `f_polic_field` 表 |
|
||||
| 被核查人员文化程度 | target_education_level | 被核查人员文化程度(如:本科、大专、高中等) | `f_polic_field` 表 |
|
||||
| 被核查人员政治面貌 | target_political_status | 被核查人员政治面貌(中共党员、群众等) | `f_polic_field` 表 |
|
||||
| 被核查人员职级 | target_professional_rank | 被核查人员职级(如:正处级) | `f_polic_field` 表 |
|
||||
| 线索来源 | clue_source | 线索来源 | `f_polic_field` 表 |
|
||||
@ -100,15 +104,82 @@ sql = """
|
||||
| 初步核实审批表承办部门意见 | department_opinion | 初步核实审批表承办部门意见 | `f_polic_field` 表 |
|
||||
| 初步核实审批表填表人 | filler_name | 初步核实审批表填表人 | `f_polic_field` 表 |
|
||||
|
||||
## 四、提示词规则(Prompt)存储位置
|
||||
### 3.3 谈话前安全风险评估表字段(带默认值)
|
||||
|
||||
### 4.1 存储方式
|
||||
| 字段名称 | 字段编码 (field_code) | 说明 | 默认值 | 存储位置 |
|
||||
|---------|---------------------|------|--------|---------|
|
||||
| 被核查人员家庭情况 | target_family_situation | 被核查人员家庭情况 | 家庭关系和谐稳定 | `f_polic_field` 表 |
|
||||
| 被核查人员社会关系 | target_social_relations | 被核查人员社会关系 | 社会交往较多,人际关系基本正常 | `f_polic_field` 表 |
|
||||
| 被核查人员健康状况 | target_health_status | 被核查人员健康状况 | 良好 | `f_polic_field` 表 |
|
||||
| 被核查人员性格特征 | target_personality | 被核查人员性格特征 | 开朗 | `f_polic_field` 表 |
|
||||
| 被核查人员承受能力 | target_tolerance | 被核查人员承受能力 | 较强 | `f_polic_field` 表 |
|
||||
| 被核查人员涉及问题严重程度 | target_issue_severity | 被核查人员涉及问题严重程度 | 较轻 | `f_polic_field` 表 |
|
||||
| 被核查人员涉及其他问题的可能性 | target_other_issues_possibility | 被核查人员涉及其他问题的可能性 | 较小 | `f_polic_field` 表 |
|
||||
| 被核查人员此前被审查情况 | target_previous_investigation | 被核查人员此前被审查情况 | 无 | `f_polic_field` 表 |
|
||||
| 被核查人员社会负面事件 | target_negative_events | 被核查人员社会负面事件 | 无 | `f_polic_field` 表 |
|
||||
| 被核查人员其他情况 | target_other_situation | 被核查人员其他情况 | 无 | `f_polic_field` 表 |
|
||||
| 风险等级 | risk_level | 风险等级 | 低 | `f_polic_field` 表 |
|
||||
|
||||
## 四、字段默认值机制
|
||||
|
||||
### 4.1 默认值配置存储
|
||||
|
||||
**字段默认值存储在配置文件中,方便快速修改和调整。**
|
||||
|
||||
存储位置:`config/field_defaults.json`
|
||||
|
||||
### 4.2 默认值说明
|
||||
|
||||
**重要说明**:系统在AI提取阶段不会自动应用默认值。如果AI未提取到字段值,系统会返回空字符串。
|
||||
|
||||
默认值信息提供给前端开发人员,前端可以根据业务需求决定是否在界面上显示默认值或应用默认值。
|
||||
|
||||
### 4.2.1 默认值应用规则(前端参考)
|
||||
|
||||
1. **AI提取阶段**:系统首先尝试使用AI从输入文本中提取字段值
|
||||
2. **空值处理**:如果AI提取的字段值为空或未提取到,系统返回空字符串
|
||||
3. **前端应用**:前端可以根据业务需求,在用户界面中显示默认值提示,或允许用户选择应用默认值
|
||||
4. **优先级**:AI提取的值优先于默认值
|
||||
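The rules above can be sketched from the consumer's side as follows. This is an illustration under stated assumptions, not the system's code: `apply_defaults` is a hypothetical helper, and the defaults dictionary is assumed to mirror the `field_defaults` object in `config/field_defaults.json`.

```python
import json


def apply_defaults(extracted, defaults):
    """Fill empty extracted values with defaults; AI-extracted values take priority.

    Hypothetical helper: the backend itself returns empty strings and leaves
    this decision to the caller.
    """
    merged = dict(extracted)
    for code, default in defaults.items():
        value = merged.get(code)
        if value is None or str(value).strip() == "":
            merged[code] = default
    return merged


if __name__ == "__main__":
    config = json.loads('{"field_defaults": {"risk_level": "低", "target_health_status": "良好"}}')
    extracted = {"risk_level": "", "target_health_status": "一般"}
    print(apply_defaults(extracted, config["field_defaults"]))
```

Because the merge happens on a copy, the raw extraction result stays available, so a UI can still show which values came from the AI and which are defaults.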
|
||||
### 4.3 默认值配置示例
|
||||
|
||||
```json
|
||||
{
|
||||
"field_defaults": {
|
||||
"target_family_situation": "家庭关系和谐稳定",
|
||||
"target_social_relations": "社会交往较多,人际关系基本正常",
|
||||
"target_health_status": "良好",
|
||||
"target_personality": "开朗",
|
||||
"target_tolerance": "较强",
|
||||
"target_issue_severity": "较轻",
|
||||
"target_other_issues_possibility": "较小",
|
||||
"target_previous_investigation": "无",
|
||||
"target_negative_events": "无",
|
||||
"target_other_situation": "无",
|
||||
"risk_level": "低"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.4 默认值相关代码位置
|
||||
|
||||
默认值相关代码在以下位置(供前端参考,系统不会自动应用):
|
||||
|
||||
1. **默认值加载**:`services/field_service.py` 的 `_load_field_defaults` 方法
|
||||
2. **默认值获取**:`services/field_service.py` 的 `get_field_default_value` 方法(前端可通过API获取)
|
||||
3. **默认值配置**:`config/field_defaults.json` 配置文件
|
||||
|
||||
**注意**:系统在 `/ai/extract` 接口中不会自动应用默认值,如果AI未提取到值,返回空字符串。
|
||||
|
||||
## 五、提示词规则(Prompt)存储位置
|
||||
|
||||
### 5.1 存储方式
|
||||
|
||||
**提示词规则存储在配置文件中,方便快速修改和调整。**
|
||||
|
||||
存储位置:`config/prompt_config.json`
|
||||
|
||||
### 4.2 配置文件说明
|
||||
### 5.2 配置文件说明
|
||||
|
||||
配置文件采用 JSON 格式,包含以下部分:
|
||||
- `prompt_template`: 提示词模板结构(介绍文本、标签、要求列表等)
|
||||
@ -117,7 +188,7 @@ sql = """
|
||||
|
||||
详细配置说明请参考:`config/README.md`
|
||||
|
||||
### 4.3 提示词构建逻辑
|
||||
### 5.3 提示词构建逻辑
|
||||
|
||||
提示词由以下部分组成:
|
||||
|
||||
@ -173,7 +244,7 @@ sql = """
|
||||
"""
|
||||
```
|
||||
|
||||
### 4.4 完整提示词示例
|
||||
### 5.4 完整提示词示例
|
||||
|
||||
假设输入数据为:
|
||||
```json
|
||||
@ -183,7 +254,7 @@ sql = """
|
||||
}
|
||||
```
|
||||
|
||||
输出字段为10个字段,生成的提示词如下:
|
||||
输出字段为14个字段,生成的提示词如下:
|
||||
|
||||
```
|
||||
请从以下输入文本中提取结构化信息。
|
||||
@ -194,8 +265,12 @@ clue_info: 被举报用户名称是张三,年龄30岁,某公司总经理
|
||||
需要提取的字段:
|
||||
- 被核查人姓名 (字段编码: target_name)
|
||||
- 被核查人员单位及职务 (字段编码: target_organization_and_position)
|
||||
- 被核查人员单位 (字段编码: target_organization)
|
||||
- 被核查人员职务 (字段编码: target_position)
|
||||
- 被核查人员性别 (字段编码: target_gender)
|
||||
- 被核查人员出生年月 (字段编码: target_date_of_birth)
|
||||
- 被核查人员年龄 (字段编码: target_age)
|
||||
- 被核查人员文化程度 (字段编码: target_education_level)
|
||||
- 被核查人员政治面貌 (字段编码: target_political_status)
|
||||
- 被核查人员职级 (字段编码: target_professional_rank)
|
||||
- 线索来源 (字段编码: clue_source)
|
||||
@ -207,8 +282,12 @@ clue_info: 被举报用户名称是张三,年龄30岁,某公司总经理
|
||||
{
|
||||
"target_name": "",
|
||||
"target_organization_and_position": "",
|
||||
"target_organization": "",
|
||||
"target_position": "",
|
||||
"target_gender": "",
|
||||
"target_date_of_birth": "",
|
||||
"target_age": "",
|
||||
"target_education_level": "",
|
||||
"target_political_status": "",
|
||||
"target_professional_rank": "",
|
||||
"clue_source": "",
|
||||
@ -226,9 +305,9 @@ clue_info: 被举报用户名称是张三,年龄30岁,某公司总经理
|
||||
6. 只返回JSON对象,不要包含markdown代码块标记
|
||||
```
|
||||
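Requirement 6 in the prompt asks for a bare JSON object, but a model reply may still arrive wrapped in a markdown code fence. A defensive parser could look like the sketch below. `parse_model_reply` is a hypothetical name and is not taken from `services/ai_service.py`. Returning an empty string for every missing field is an assumption, chosen to match the documented "return an empty string" behavior.

```python
import json
import re

# Build the fence marker programmatically so this example contains no literal fence
FENCE = "`" * 3
FENCE_RE = re.compile(
    r"^" + FENCE + r"[A-Za-z]*\s*\n(.*?)\n?" + FENCE + r"\s*$", re.DOTALL
)


def parse_model_reply(reply, field_codes):
    """Parse a model reply into {field_code: str}, tolerating a markdown fence."""
    text = reply.strip()
    match = FENCE_RE.match(text)
    if match:
        text = match.group(1).strip()
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for code in field_codes:
        value = data.get(code)
        result[code] = "" if value is None else str(value)
    return result


if __name__ == "__main__":
    reply = FENCE + 'json\n{"target_name": "张三", "target_age": 44}\n' + FENCE
    print(parse_model_reply(reply, ["target_name", "target_age", "clue_source"]))
```

Restricting the result to the requested field codes also drops any extra keys the model may have added.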
|
||||
## 五、字段与提示词规则的对应关系
|
||||
## 六、字段与提示词规则的对应关系
|
||||
|
||||
### 5.1 对应关系图
|
||||
### 6.1 对应关系图
|
||||
|
||||
```
|
||||
业务类型 (businessType: "INVESTIGATION")
|
||||
@ -244,22 +323,22 @@ clue_info: 被举报用户名称是张三,年龄30岁,某公司总经理
|
||||
调用AI服务 (services/ai_service.py)
|
||||
```
|
||||
|
||||
### 5.2 关键代码位置
|
||||
### 6.2 关键代码位置
|
||||
|
||||
1. **字段查询**:`services/field_service.py` 的 `get_output_fields_by_business_type` 方法
|
||||
2. **提示词构建**:`services/field_service.py` 的 `build_extract_prompt` 方法
|
||||
3. **AI调用**:`services/ai_service.py` 的 `extract_fields` 方法
|
||||
4. **API接口**:`app.py` 的 `/ai/extract` 路由
|
||||
|
||||
## 六、系统限制与扩展说明
|
||||
## 七、系统限制与扩展说明
|
||||
|
||||
### 6.1 当前限制
|
||||
### 7.1 当前限制
|
||||
|
||||
1. **业务类型支持**:目前只支持 `businessType: "INVESTIGATION"`(调查核实)
|
||||
2. **文件模板支持**:目前只支持"初步核实审批表"
|
||||
3. **提示词规则**:提示词规则存储在 `config/prompt_config.json` 配置文件中,可通过配置修改,无需改动代码
|
||||
|
||||
### 6.2 扩展方式
|
||||
### 7.2 扩展方式
|
||||
|
||||
#### 添加新的业务类型
|
||||
|
||||
@ -285,7 +364,7 @@ clue_info: 被举报用户名称是张三,年龄30岁,某公司总经理
|
||||
|
||||
可以在配置文件的 `business_type_rules` 中为不同业务类型添加特定的提取规则说明。
|
||||
|
||||
## 七、相关文件
|
||||
## 八、相关文件
|
||||
|
||||
- `services/field_service.py` - 字段服务和提示词构建逻辑
|
||||
- `services/ai_service.py` - AI服务调用逻辑
|
||||
@ -293,10 +372,11 @@ clue_info: 被举报用户名称是张三,年龄30岁,某公司总经理
|
||||
- `init_preliminary_verification_fields.py` - 字段初始化脚本
|
||||
- `初步核实审批表数据设计说明.md` - 字段设计文档
|
||||
|
||||
## 八、总结
|
||||
## 九、总结
|
||||
|
||||
1. **字段存储**:所有字段定义存储在MySQL数据库的 `f_polic_field` 表中
|
||||
2. **提示词规则**:提示词规则存储在 `config/prompt_config.json` 配置文件中,方便快速修改和调整
|
||||
3. **对应关系**:通过 `business_type` → 文件配置 → 输出字段 → 从配置文件读取规则 → 动态构建提示词的流程建立对应关系
|
||||
4. **扩展性**:当前系统设计支持扩展新的业务类型和字段,提示词规则可通过配置文件灵活调整,无需修改代码
|
||||
3. **字段默认值**:字段默认值存储在 `config/field_defaults.json` 配置文件中;如果AI未提取到值,系统返回空字符串,不会自动应用默认值,由前端决定是否使用默认值
|
||||
4. **对应关系**:通过 `business_type` → 文件配置 → 输出字段 → 从配置文件读取规则 → 动态构建提示词的流程建立对应关系
|
||||
5. **扩展性**:当前系统设计支持扩展新的业务类型和字段,提示词规则和默认值可通过配置文件灵活调整,无需修改代码
|
||||
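The flow summarized above (output fields, then rules, then a dynamically built prompt) can be sketched as below. This is a simplified illustration and not the actual `build_extract_prompt` method: the function name `build_prompt_sketch` and its parameters are assumptions, the requirement list read from `config/prompt_config.json` is left out, and only the opening sentence, the field list format, and the empty JSON skeleton follow the prompt example shown earlier in this document.

```python
import json


def build_prompt_sketch(input_data, output_fields):
    """Assemble an extraction prompt (hypothetical, simplified layout).

    input_data:    list of {"fieldCode": ..., "fieldValue": ...}
    output_fields: list of {"name": ..., "field_code": ...}
    """
    lines = ["请从以下输入文本中提取结构化信息。", ""]
    # One "code: value" line per input item
    for item in input_data:
        lines.append(f"{item['fieldCode']}: {item['fieldValue']}")
    lines += ["", "需要提取的字段:"]
    # One bullet per output field, naming both the label and the field code
    for field in output_fields:
        lines.append(f"- {field['name']} (字段编码: {field['field_code']})")
    # Empty JSON skeleton the model is expected to fill in
    skeleton = {field["field_code"]: "" for field in output_fields}
    lines += ["", json.dumps(skeleton, ensure_ascii=False, indent=2)]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_prompt_sketch(
        [{"fieldCode": "clue_info", "fieldValue": "张三,男,某公司总经理"}],
        [{"name": "被核查人姓名", "field_code": "target_name"}],
    )
    print(prompt)
```

Deriving the JSON skeleton from the same field list as the bullets keeps the two parts of the prompt from drifting apart when fields are added.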
|
||||
|
||||
54
批量转换doc到docx.bat
Normal file
@ -0,0 +1,54 @@
|
||||
@echo off
|
||||
chcp 65001 >nul
|
||||
echo ========================================
|
||||
echo 批量转换 .doc 文件到 .docx 格式
|
||||
echo ========================================
|
||||
echo.
|
||||
|
||||
REM 检查是否安装了 Word
|
||||
where winword.exe >nul 2>&1
|
||||
if %errorlevel% neq 0 (
|
||||
echo 错误: 未找到 Microsoft Word
|
||||
echo 请确保已安装 Microsoft Word
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
echo 正在查找 .doc 文件...
|
||||
echo.
|
||||
|
||||
REM 设置源目录和目标目录
|
||||
set "SOURCE_DIR=模板\原始模板"
|
||||
set "TARGET_DIR=模板"
|
||||
|
||||
REM 使用 PowerShell 脚本进行转换
|
||||
powershell -ExecutionPolicy Bypass -Command ^
|
||||
"$sourceDir = '%SOURCE_DIR%'; ^
|
||||
$targetDir = '%TARGET_DIR%'; ^
|
||||
$word = New-Object -ComObject Word.Application; ^
|
||||
$word.Visible = $false; ^
|
||||
Get-ChildItem -Path $sourceDir -Filter '*.doc' -Recurse | ForEach-Object { ^
|
||||
$docxPath = $_.FullName -replace '\.doc$', '.docx'; ^
|
||||
$relativePath = $_.FullName.Replace((Resolve-Path $sourceDir).Path + '\', ''); ^
|
||||
$targetPath = Join-Path $targetDir $relativePath; ^
|
||||
$targetPath = $targetPath -replace '\.doc$', '.docx'; ^
|
||||
$targetFolder = Split-Path $targetPath -Parent; ^
|
||||
if (-not (Test-Path $targetFolder)) { New-Item -ItemType Directory -Path $targetFolder -Force | Out-Null }; ^
|
||||
Write-Host \"转换: $($_.Name) -> $(Split-Path $targetPath -Leaf)\"; ^
|
||||
try { ^
|
||||
$doc = $word.Documents.Open($_.FullName, $false, $true); ^
|
||||
$doc.SaveAs2($targetPath, 16); ^
|
||||
$doc.Close($false); ^
|
||||
Write-Host \" ✓ 成功\" -ForegroundColor Green; ^
|
||||
} catch { ^
|
||||
Write-Host \" ✗ 失败: $_\" -ForegroundColor Red; ^
|
||||
} ^
|
||||
}; ^
|
||||
$word.Quit(); ^
|
||||
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null"
|
||||
|
||||
echo.
|
||||
echo ========================================
|
||||
echo 转换完成!
|
||||
echo ========================================
|
||||
pause
|
||||
231
数据同步检查报告.md
Normal file
@ -0,0 +1,231 @@
|
||||
# 数据结构、接口、测试页面和Swagger同步检查报告
|
||||
|
||||
## 检查时间
|
||||
2024年12月(当前时间)
|
||||
|
||||
## 检查目标
|
||||
验证"谈话前安全风险评估表"(模板编码:`PRE_INTERVIEW_RISK_ASSESSMENT`)的数据结构、接口、测试页面和Swagger是否已正确同步更新。
|
||||
|
||||
---
|
||||
|
||||
## 一、数据库检查
|
||||
|
||||
### 1.1 文件配置
|
||||
- **表名**: `f_polic_file_config`
|
||||
- **检查项**:
|
||||
- 文件配置是否存在
|
||||
- `input_data` JSON字段中是否包含正确的`template_code`
|
||||
- 文件路径是否正确
|
||||
|
||||
### 1.2 字段数据
|
||||
- **表名**: `f_polic_field`
|
||||
- **期望字段** (11个输出字段):
|
||||
1. `target_family_situation` - 被核查人员家庭情况
|
||||
2. `target_social_relations` - 被核查人员社会关系
|
||||
3. `target_health_status` - 被核查人员健康状况
|
||||
4. `target_personality` - 被核查人员性格特征
|
||||
5. `target_tolerance` - 被核查人员承受能力
|
||||
6. `target_issue_severity` - 被核查人员涉及问题严重程度
|
||||
7. `target_other_issues_possibility` - 被核查人员涉及其他问题的可能性
|
||||
8. `target_previous_investigation` - 被核查人员此前被审查情况
|
||||
9. `target_negative_events` - 被核查人员社会负面事件
|
||||
10. `target_other_situation` - 被核查人员其他情况
|
||||
11. `risk_level` - 风险等级
|
||||
|
||||
### 1.3 关联关系
|
||||
- **表名**: `f_polic_file_field`
|
||||
- **检查项**: 所有11个字段是否都已正确关联到文件配置
|
||||
|
||||
**验证脚本**: `verify_pre_interview_risk_assessment_data.py`
|
||||
|
||||
---
|
||||
|
||||
## 二、接口服务检查
|
||||
|
||||
### 2.1 field_service.py ✅ Fixed

**Problem**: The `get_fields_by_business_type` method hard-coded the template name `'初步核实审批表'`, so the fields of the new template could not be queried.

**Fix**:
- ✅ Query dynamically by `business_type`, read from the `input_data` JSON field
- ✅ Return every file configuration that matches `business_type`, together with its associated fields
- ✅ The template name is no longer hard-coded

**Before**:
```python
AND fc.name = '初步核实审批表'
```

**After**:
```python
# Parse business_type from the input_data JSON and match dynamically
sql_file_configs = """
    SELECT id, name, input_data
    FROM f_polic_file_config
    WHERE tenant_id = %s AND state = 1
"""
# Then filter the file configuration IDs by business_type
```
|
||||
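The filtering step that follows the query can be sketched as below. This is a minimal illustration, assuming the rows come back as dictionaries (for example from a dict-style cursor) and that `input_data` holds a JSON string with a `business_type` key. The function name `filter_configs_by_business_type` is hypothetical and is not taken from `field_service.py`.

```python
import json


def filter_configs_by_business_type(rows, business_type):
    """Return the ids of file configs whose input_data JSON matches business_type.

    `rows` is assumed to be a list of dicts with "id" and "input_data" keys.
    """
    matched_ids = []
    for row in rows:
        raw = row.get("input_data")
        if not raw:
            continue  # no JSON payload stored for this config
        try:
            data = json.loads(raw)
        except ValueError:
            continue  # skip rows whose input_data is not valid JSON
        if isinstance(data, dict) and data.get("business_type") == business_type:
            matched_ids.append(row["id"])
    return matched_ids
```

Filtering in Python keeps the SQL independent of the database's JSON functions, at the cost of loading every active configuration for the tenant.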
|
||||
### 2.2 document_service.py ✅ Fixed

**Problem**: The `get_file_config_by_template_code` method queried a `template_code` column that does not exist; `template_code` is actually stored inside the `input_data` JSON field.

**Fix**:
- ✅ Parse `template_code` from the `input_data` JSON field
- ✅ Look up the file configuration by `template_code`

**Before**:
```python
sql = """
    SELECT id, name, file_path, template_code
    FROM f_polic_file_config
    WHERE tenant_id = %s
    AND template_code = %s  # ❌ column does not exist
"""
```

**After**:
```python
# Query all file configurations, then parse template_code from the input_data JSON
sql = """
    SELECT id, name, file_path, input_data
    FROM f_polic_file_config
    WHERE tenant_id = %s AND state = 1
"""
# Iterate over the rows and find the one whose JSON contains the matching template_code
```
|
||||
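The lookup over the query result might look like the sketch below. It assumes dictionary rows with an `input_data` JSON string; the helper name `find_config_by_template_code` is hypothetical and only illustrates the idea, not the actual code in `document_service.py`.

```python
import json


def find_config_by_template_code(rows, template_code):
    """Return the first row whose input_data JSON carries the given template_code."""
    for row in rows:
        try:
            # Treat a missing or empty input_data as an empty JSON object
            data = json.loads(row.get("input_data") or "{}")
        except ValueError:
            continue  # malformed JSON: ignore this row
        if isinstance(data, dict) and data.get("template_code") == template_code:
            return row
    return None  # no configuration matches
```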
|
||||
---
|
||||
|
||||
## 三、测试页面检查
|
||||
|
||||
### 3.1 static/index.html
|
||||
|
||||
**状态**: ✅ 通用设计,无需硬编码更新
|
||||
|
||||
**说明**:
|
||||
- 测试页面采用动态字段添加机制
|
||||
- 用户可以通过手动输入字段编码来测试任何模板
|
||||
- 默认示例使用"初步核实审批表"是合理的(作为示例)
|
||||
|
||||
**建议** (可选):
|
||||
- 可以添加模板选择器,让用户选择不同的模板并自动加载对应的字段
|
||||
- 当前设计已经足够灵活,支持所有模板
|
||||
|
||||
---
|
||||
|
||||
## 四、Swagger文档检查
|
||||
|
||||
### 4.1 Swagger配置 ✅ 正常
|
||||
|
||||
**状态**: ✅ Swagger已正确配置,文档是动态生成的
|
||||
|
||||
**说明**:
|
||||
- Swagger文档通过Flasgger自动生成
|
||||
- 接口文档中不包含硬编码的模板信息
|
||||
- 所有接口都通过参数(如`templateCode`)支持动态模板
|
||||
|
||||
**接口列表**:
|
||||
1. ✅ `/ai/extract` - AI字段提取接口(有完整Swagger文档)
|
||||
2. ✅ `/api/fields` - 获取字段配置接口(有完整Swagger文档)
|
||||
3. ✅ `/ai/generate-document` - 文档生成接口(有完整Swagger文档)
|
||||
|
||||
**结论**: Swagger文档无需更新,因为它是通用的,支持所有模板。
|
||||
|
||||
---
|
||||
|
||||
## 五、修复总结
|
||||
|
||||
### ✅ 已完成的修复
|
||||
|
||||
1. **field_service.py**
|
||||
- ✅ 修复`get_fields_by_business_type`方法,支持动态查询多个模板
|
||||
- ✅ 从`input_data` JSON中解析`business_type`
|
||||
|
||||
2. **document_service.py**
|
||||
- ✅ 修复`get_file_config_by_template_code`方法
|
||||
- ✅ 从`input_data` JSON中解析`template_code`
|
||||
|
||||
### ⚠️ 需要验证的项目
|
||||
|
||||
1. **数据库数据**
|
||||
- 需要运行`init_pre_interview_risk_assessment_fields.py`确保数据已初始化
|
||||
- 需要运行`verify_pre_interview_risk_assessment_data.py`验证数据完整性
|
||||
|
||||
2. **接口测试**
|
||||
- 测试`/api/fields?businessType=INVESTIGATION`是否返回新模板的字段
|
||||
- 测试`/ai/extract`接口是否能正确提取新模板的字段
|
||||
- 测试`/ai/generate-document`接口是否能正确生成新模板的文档
|
||||
|
||||
---
|
||||
|
||||
## 六、验证步骤
|
||||
|
||||
### 步骤1: 验证数据库数据
|
||||
```bash
|
||||
python verify_pre_interview_risk_assessment_data.py
|
||||
```
|
||||
|
||||
### 步骤2: 验证接口
|
||||
```bash
|
||||
# 启动服务
|
||||
python app.py
|
||||
|
||||
# 测试字段查询接口
|
||||
curl "http://localhost:7500/api/fields?businessType=INVESTIGATION"
|
||||
|
||||
# 测试AI提取接口(使用新模板的字段)
|
||||
curl -X POST http://localhost:7500/ai/extract \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"inputData": [{"fieldCode": "clue_info", "fieldValue": "..."}],
|
||||
"outputData": [{"fieldCode": "risk_level"}, {"fieldCode": "target_family_situation"}]
|
||||
}'
|
||||
```
|
||||
|
||||
### 步骤3: 验证测试页面
|
||||
1. 访问 `http://localhost:7500/`
|
||||
2. 手动添加新模板的字段编码进行测试
|
||||
3. 验证解析和生成功能
|
||||
|
||||
### 步骤4: 验证Swagger文档
|
||||
1. 访问 `http://localhost:7500/api-docs`
|
||||
2. 检查接口文档是否完整
|
||||
3. 测试接口是否正常工作
|
||||
|
||||
---
|
||||
|
||||
## 七、结论
|
||||
|
||||
### ✅ 接口和代码层面
|
||||
- ✅ `field_service.py`已修复,支持动态查询多个模板
|
||||
- ✅ `document_service.py`已修复,支持从JSON中查找模板
|
||||
- ✅ Swagger文档无需更新(通用设计)
|
||||
|
||||
### ⚠️ 需要确认
|
||||
- ⚠️ 数据库数据是否已正确初始化(需要运行初始化脚本)
|
||||
- ⚠️ 接口功能是否正常工作(需要实际测试)
|
||||
|
||||
### 📝 建议
|
||||
1. 运行`init_pre_interview_risk_assessment_fields.py`确保数据已初始化
|
||||
2. 运行`verify_pre_interview_risk_assessment_data.py`验证数据完整性
|
||||
3. 进行接口功能测试,确保新模板可以正常使用
|
||||
|
||||
---
|
||||
|
||||
## 八、相关文件
|
||||
|
||||
- **初始化脚本**: `init_pre_interview_risk_assessment_fields.py`
|
||||
- **验证脚本**: `verify_pre_interview_risk_assessment_data.py`
|
||||
- **综合检查脚本**: `check_data_sync_status.py`
|
||||
- **服务文件**:
|
||||
- `services/field_service.py` ✅ 已修复
|
||||
- `services/document_service.py` ✅ 已修复
|
||||
- **接口文件**: `app.py` ✅ Swagger已配置
|
||||
- **测试页面**: `static/index.html` ✅ 通用设计
|
||||
|
||||
---
|
||||
|
||||
**报告生成时间**: 2024年12月
|
||||
**检查人员**: AI助手
|
||||
BIN
模板/原始模板/2-初核模版/1.初核请示/1.请示报告卡(XXX).docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/1.初核请示/2.初步核实审批表(XXX).docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/1.初核请示/3.附件初核方案(XXX).docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/谈话通知书/谈话通知书第一联.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/谈话通知书/谈话通知书第三联.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/谈话通知书/谈话通知书第二联.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话审批/1.请示报告卡(初核谈话).docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话审批/2谈话审批表.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话审批/3.谈话前安全风险评估表.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话审批/4.谈话方案.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话审批/5.谈话后安全风险评估表.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话流程/1.谈话笔录.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话流程/2.谈话询问对象情况摸底调查30问.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话流程/3.被谈话人权利义务告知书.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话流程/4.点对点交接单.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话流程/4.点对点交接单2.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话流程/5.陪送交接单(新).docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话流程/6.1保密承诺书(谈话对象使用-非中共党员用).docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话流程/6.2保密承诺书(谈话对象使用-中共党员用).docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/2.谈话审批/走读式谈话流程/7.办案人员-办案安全保密承诺书.docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/3.初核结论/8-1请示报告卡(初核报告结论) .docx
Normal file
Binary file not shown.
BIN
模板/原始模板/2-初核模版/3.初核结论/8.XXX初核情况报告.docx
Normal file
Binary file not shown.
153
模板处理完成说明.md
Normal file
@ -0,0 +1,153 @@
|
||||
# 模板处理脚本完成说明
|
||||
|
||||
## 已完成的工作
|
||||
|
||||
### 1. 创建了两个主要脚本
|
||||
|
||||
#### `process_templates.py` - 模板占位符处理脚本
|
||||
- ✅ 自动扫描 `模板/原始模板` 目录下的所有 `.doc` 和 `.docx` 文件
|
||||
- ✅ 根据文件名智能识别文档类型(支持17种文档类型)
|
||||
- ✅ 自动将 `.doc` 文件转换为 `.docx` 格式(需要 Windows + pywin32)
|
||||
- ✅ 根据占位符与字段对照表,智能识别字段名称并添加占位符
|
||||
- ✅ 处理后的模板保存到 `模板` 文件夹,保持原有目录结构
|
||||
|
||||
#### `init_all_templates.py` - 模板初始化脚本
|
||||
- ✅ 扫描 `模板` 目录下的所有 `.docx` 文件(排除原始模板目录)
|
||||
- ✅ 自动识别文档类型和模板编码
|
||||
- ✅ 上传模板文件到 MinIO 服务器
|
||||
- ✅ 在数据库中创建或更新文件配置记录(使用 `template_code` 字段)
|
||||
|
||||
### 2. 支持的文档类型
|
||||
|
||||
脚本支持以下17种文档类型:
|
||||
|
||||
1. 请示报告卡 (REPORT_CARD)
|
||||
2. 初步核实审批表 (PRELIMINARY_VERIFICATION_APPROVAL)
|
||||
3. 初核方案 (INVESTIGATION_PLAN)
|
||||
4. 谈话通知书 (NOTIFICATION_LETTER)
|
||||
5. 谈话笔录 (INTERVIEW_RECORD)
|
||||
6. 谈话询问对象情况摸底调查30问 (INVESTIGATION_30_QUESTIONS)
|
||||
7. 被谈话人权利义务告知书 (RIGHTS_OBLIGATIONS_NOTICE)
|
||||
8. 点对点交接单 (HANDOVER_FORM)
|
||||
9. 陪送交接单 (ESCORT_HANDOVER_FORM)
|
||||
10. 保密承诺书 (CONFIDENTIALITY_COMMITMENT)
|
||||
11. 办案人员-办案安全保密承诺书 (INVESTIGATOR_CONFIDENTIALITY_COMMITMENT)
|
||||
12. 请示报告卡(初核报告结论) (REPORT_CARD_CONCLUSION)
|
||||
13. 初核情况报告 (INVESTIGATION_REPORT)
|
||||
14. 谈话审批表 (INTERVIEW_APPROVAL_FORM)
|
||||
15. 谈话前安全风险评估表 (PRE_INTERVIEW_RISK_ASSESSMENT)
|
||||
16. 谈话方案 (INTERVIEW_PLAN)
|
||||
17. 谈话后安全风险评估表 (POST_INTERVIEW_RISK_ASSESSMENT)
|
||||
|
||||
### 3. 占位符处理逻辑
|
||||
|
||||
脚本会智能识别以下模式并添加占位符:
|
||||
|
||||
1. **字段名称: 具体值** → **字段名称: {{field_code}}**
|
||||
- 例如:`被核查人姓名: 张三` → `被核查人姓名: {{target_name}}`
|
||||
|
||||
2. **字段名称: XXX/待填** → **字段名称: {{field_code}}**
|
||||
- 例如:`被核查人姓名: XXX` → `被核查人姓名: {{target_name}}`
|
||||
|
||||
3. **表格中的字段**:同样处理表格单元格中的字段
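The label-based replacement patterns listed above can be sketched as follows. This is only an illustration under stated assumptions: the `FIELD_LABELS` mapping is hypothetical (the real mapping is described in `占位符与字段对照表.md`), and the sketch replaces everything after the colon up to the end of the line, which is simpler than what `process_templates.py` may actually do.

```python
import re

# Hypothetical label-to-field-code mapping, for illustration only.
FIELD_LABELS = {
    "被核查人姓名": "target_name",
    "被核查人员性别": "target_gender",
}


def add_placeholders(text, labels=FIELD_LABELS):
    """Replace the value that follows a known field label with {{field_code}}."""
    for label, code in labels.items():
        # Label, optional spaces, a half-width or full-width colon, then the rest of the line
        pattern = re.compile(re.escape(label) + r"\s*([::])\s*[^\n]*")
        text = pattern.sub(
            lambda m, lab=label, c=code: f"{lab}{m.group(1)} {{{{{c}}}}}",
            text,
        )
    return text
```

Because a line that already contains the placeholder is rewritten to the same text, running the function twice gives the same result. The same function could be applied to table cell text as well as paragraph text.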
|
||||
|
||||
### 4. 使用步骤
|
||||
|
||||
#### 第一步:处理原始模板
|
||||
```bash
|
||||
python process_templates.py
|
||||
```
|
||||
|
||||
**输出:**
|
||||
- 处理后的模板保存在 `模板` 文件夹下
|
||||
- 保持原有的目录结构
|
||||
- 所有文件统一为 `.docx` 格式
|
||||
|
||||
**注意:**
|
||||
- 处理后的模板需要人工检查,确认占位符位置是否正确
|
||||
- 如有需要,可以手动调整占位符位置
|
||||
|
||||
#### 第二步:初始化模板到系统
|
||||
```bash
|
||||
python init_all_templates.py
|
||||
```
|
||||
|
||||
**功能:**
|
||||
- 上传所有模板到 MinIO
|
||||
- 在数据库中创建或更新文件配置记录
|
||||
- 使用 `template_code` 字段存储模板编码
|
||||
|
||||
**输出:**
|
||||
- 显示处理进度和结果
|
||||
- 统计成功、跳过、失败的文件数量
|
||||
|
||||
## 文件结构
|
||||
|
||||
```
|
||||
项目根目录/
|
||||
├── process_templates.py # 模板处理脚本
|
||||
├── init_all_templates.py # 初始化脚本
|
||||
├── 模板处理说明.md # 详细使用说明
|
||||
├── 模板/
|
||||
│ ├── 初步核实审批表模板.docx # 已处理的模板(示例)
|
||||
│ └── 原始模板/ # 原始模板(不处理)
|
||||
│ └── 2-初核模版/
|
||||
│ └── ...
|
||||
└── 占位符与字段对照表.md # 字段对照表
|
||||
```
|
||||
|
||||
## 注意事项
|
||||
|
||||
1. **依赖要求:**
|
||||
- `python-docx` - 处理 Word 文档
|
||||
- `pywin32` - 转换 .doc 文件(仅 Windows,可选)
|
||||
- `pymysql` - 数据库操作
|
||||
- `minio` - MinIO 客户端
|
||||
|
||||
2. **.doc 文件处理:**
|
||||
- 如果未安装 `pywin32`,需要手动将 `.doc` 文件转换为 `.docx` 格式
|
||||
- 或者安装:`pip install pywin32`
|
||||
|
||||
3. **占位符检查:**
|
||||
- 处理后的模板需要人工检查
|
||||
- 确保占位符格式正确:`{{field_code}}`
|
||||
- 确保占位符位置合理
|
||||
|
||||
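4. **Database configuration:**
   - Make sure the database connection settings are correct
   - Make sure every `f_polic_file_config` record carries `template_code` in its `input_data` JSON field (the table has no dedicated `template_code` column)
   - Make sure the account has sufficient privileges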
|
||||
|
||||
5. **MinIO 配置:**
|
||||
- 确保 MinIO 连接配置正确
|
||||
- 确保存储桶 `finyx` 存在
|
||||
- 确保有上传权限
|
||||
|
||||
## 后续工作
|
||||
|
||||
1. **检查生成的模板:**
|
||||
- 打开处理后的模板文件
|
||||
- 检查占位符是否正确添加
|
||||
- 手动调整不正确的占位符
|
||||
|
||||
2. **运行初始化脚本:**
|
||||
- 确认所有模板检查无误后
|
||||
- 运行 `init_all_templates.py`
|
||||
- 检查输出信息,确认所有模板已成功上传
|
||||
|
||||
3. **测试文档生成:**
|
||||
- 使用 API 测试文档生成功能
|
||||
- 确认模板可以正常使用
|
||||
- 检查生成的文档是否正确填充
|
||||
|
||||
## 技术支持
|
||||
|
||||
如有问题,请检查:
|
||||
1. 脚本输出日志
|
||||
2. 数据库和 MinIO 连接状态
|
||||
3. 文件路径和权限
|
||||
4. 占位符格式是否正确
|
||||
|
||||
参考文档:
|
||||
- `模板处理说明.md` - 详细使用说明
|
||||
- `占位符与字段对照表.md` - 字段对照表
|
||||
179
模板处理说明.md
Normal file
@ -0,0 +1,179 @@
|
||||
# 模板处理说明
|
||||
|
||||
本文档说明如何使用脚本处理原始模板文档,自动添加占位符并初始化到数据库和MinIO。
|
||||
|
||||
## 脚本说明
|
||||
|
||||
### 1. process_templates.py - 模板占位符处理脚本
|
||||
|
||||
**功能:**
|
||||
- 自动扫描 `模板/原始模板` 目录下的所有 `.doc` 和 `.docx` 文件
|
||||
- 根据文件名识别文档类型
|
||||
- 根据占位符与字段对照表,智能识别并添加占位符
|
||||
- 将处理后的模板保存到 `模板` 文件夹下(保持目录结构)
|
||||
|
||||
**使用方法:**
|
||||
```bash
|
||||
python process_templates.py
|
||||
```
|
||||
|
||||
**处理流程:**
|
||||
1. 扫描原始模板目录
|
||||
2. 识别每个文档的类型(根据文件名匹配)
|
||||
3. 如果是 `.doc` 文件,自动转换为 `.docx` 格式(需要 Windows 系统和 pywin32)
|
||||
4. 在文档中查找字段名称,并在其后添加占位符(格式:`{{field_code}}`)
|
||||
5. 保存处理后的模板到输出目录
|
||||
|
||||
**注意事项:**
|
||||
- 如果系统未安装 `pywin32`,`.doc` 文件无法自动转换,需要手动转换为 `.docx` 格式
|
||||
- 脚本会智能识别字段名称后的内容,并替换为占位符
|
||||
- 常见的占位符(如 "XXX"、"待填" 等)会被自动替换
|
||||
- 处理后的模板需要人工检查,确认占位符位置是否正确
|
||||
|
||||
### 2. init_all_templates.py - 模板初始化脚本
|
||||
|
||||
**功能:**
|
||||
- 扫描 `模板` 目录下的所有 `.docx` 文件(排除原始模板目录)
|
||||
- 上传模板文件到 MinIO 服务器
|
||||
- 在数据库中创建或更新文件配置记录
|
||||
|
||||
**使用方法:**
|
||||
```bash
|
||||
python init_all_templates.py
|
||||
```
|
||||
|
||||
**处理流程:**
|
||||
1. 连接数据库和 MinIO
|
||||
2. 扫描模板目录下的所有 `.docx` 文件
|
||||
3. 根据文件名识别文档类型和模板编码
|
||||
4. 上传文件到 MinIO(路径:`/租户ID/TEMPLATE/年/月/文件名.docx`)
|
||||
5. 在数据库中创建或更新文件配置记录
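The MinIO object path from step 4 could be built as in the sketch below. The helper name `build_object_path` is hypothetical, and zero-padding the month to two digits is an assumption; the path pattern in step 4 does not say how the month is formatted.

```python
from datetime import date


def build_object_path(tenant_id, filename, today=None):
    """Build /<tenant id>/TEMPLATE/<year>/<month>/<filename> for a template upload."""
    today = today or date.today()
    # Assumption: the month is zero-padded (e.g. "03"); adjust if the real layout differs
    return f"/{tenant_id}/TEMPLATE/{today.year}/{today.month:02d}/{filename}"
```

Passing the date explicitly, as in `build_object_path(1001, "谈话笔录.docx", date(2024, 12, 7))`, makes the result reproducible in tests.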
|
||||
|
||||
**注意事项:**
|
||||
- 确保数据库和 MinIO 连接配置正确
|
||||
- 如果文件配置已存在,会更新文件路径
|
||||
- 如果文件配置不存在,会创建新记录
|
||||
|
||||
## 使用步骤
|
||||
|
||||
### 第一步:处理原始模板
|
||||
|
||||
1. 确保所有原始模板文件已放在 `模板/原始模板` 目录下
|
||||
2. 运行处理脚本:
|
||||
```bash
|
||||
python process_templates.py
|
||||
```
|
||||
3. 检查生成的模板文件,确认占位符是否正确添加
|
||||
4. 如有需要,手动调整占位符位置
|
||||
|
||||
### 第二步:初始化模板到系统
|
||||
|
||||
1. 确认所有模板文件已处理完成并检查无误
|
||||
2. 运行初始化脚本:
|
||||
```bash
|
||||
python init_all_templates.py
|
||||
```
|
||||
3. 检查输出信息,确认所有模板已成功上传和配置
|
||||
|
||||
## 文档类型识别
|
||||
|
||||
脚本会根据文件名自动识别文档类型,支持的文档类型包括:
|
||||
|
||||
| 文档名称 | 模板编码 | 说明 |
|
||||
|---------|---------|------|
|
||||
| 请示报告卡 | REPORT_CARD | 初核请示相关 |
|
||||
| 初步核实审批表 | PRELIMINARY_VERIFICATION_APPROVAL | 初步核实审批 |
|
||||
| 初核方案 | INVESTIGATION_PLAN | 初核方案 |
|
||||
| 谈话通知书 | NOTIFICATION_LETTER | 谈话通知 |
|
||||
| 谈话笔录 | INTERVIEW_RECORD | 谈话记录 |
|
||||
| 谈话询问对象情况摸底调查30问 | INVESTIGATION_30_QUESTIONS | 摸底调查 |
|
||||
| 被谈话人权利义务告知书 | RIGHTS_OBLIGATIONS_NOTICE | 权利义务告知 |
|
||||
| 点对点交接单 | HANDOVER_FORM | 交接单 |
|
||||
| 陪送交接单 | ESCORT_HANDOVER_FORM | 陪送交接 |
|
||||
| 保密承诺书 | CONFIDENTIALITY_COMMITMENT | 保密承诺 |
|
||||
| 办案人员-办案安全保密承诺书 | INVESTIGATOR_CONFIDENTIALITY_COMMITMENT | 办案人员承诺 |
|
||||
| 请示报告卡(初核报告结论) | REPORT_CARD_CONCLUSION | 初核结论 |
|
||||
| 初核情况报告 | INVESTIGATION_REPORT | 初核报告 |
|
||||
| 谈话审批表 | INTERVIEW_APPROVAL_FORM | 谈话审批 |
|
||||
| 谈话前安全风险评估表 | PRE_INTERVIEW_RISK_ASSESSMENT | 风险评估 |
|
||||
| 谈话方案 | INTERVIEW_PLAN | 谈话方案 |
|
||||
| 谈话后安全风险评估表 | POST_INTERVIEW_RISK_ASSESSMENT | 风险评估 |
|
||||
|
||||
## 占位符格式
|
||||
|
||||
所有占位符使用以下格式:
|
||||
```
|
||||
{{field_code}}
|
||||
```
|
||||
|
||||
例如:
|
||||
- `{{target_name}}` - 被核查人姓名
|
||||
- `{{target_organization_and_position}}` - 被核查人员单位及职务
|
||||
- `{{target_gender}}` - 被核查人员性别
|
||||
|
||||
完整的字段列表请参考 `占位符与字段对照表.md`。
|
||||
|
||||
## 常见问题
|
||||
|
||||
### Q1: .doc 文件无法转换怎么办?
|
||||
|
||||
**A:** 如果系统未安装 `pywin32`,可以:
|
||||
1. 安装 pywin32: `pip install pywin32`
|
||||
2. 或者手动将 `.doc` 文件转换为 `.docx` 格式
|
||||
|
||||
### Q2: 占位符位置不正确怎么办?
|
||||
|
||||
**A:** 脚本会自动识别字段名称并添加占位符,但可能不够精确。建议:
|
||||
1. 检查生成的模板文件
|
||||
2. 手动调整占位符位置
|
||||
3. 确保占位符格式正确:`{{field_code}}`
|
||||
|
||||
### Q3: 文档类型无法识别怎么办?
|
||||
|
||||
**A:** 如果文档类型无法识别:
|
||||
1. 检查文件名是否包含文档类型关键词
|
||||
2. 可以在 `process_templates.py` 中的 `DOCUMENT_TYPE_MAPPING` 添加新的映射
|
||||
3. 或者手动指定文档类型
|
||||
|
||||
### Q4: 上传到 MinIO 失败怎么办?
|
||||
|
||||
**A:** 检查:
|
||||
1. MinIO 连接配置是否正确
|
||||
2. 存储桶是否存在
|
||||
3. 网络连接是否正常
|
||||
4. 文件路径是否正确
|
||||
|
||||
### Q5: 数据库更新失败怎么办?
|
||||
|
||||
**A:** 检查:
|
||||
1. 数据库连接配置是否正确
|
||||
2. 数据库表结构是否正确
|
||||
3. 租户ID是否正确
|
||||
4. 是否有足够的权限
|
||||
|
||||
## 文件结构
|
||||
|
||||
处理后的文件结构:
|
||||
```
|
||||
模板/
|
||||
├── 初步核实审批表模板.docx # 已处理的模板
|
||||
├── 原始模板/ # 原始模板(不处理)
|
||||
│ └── 2-初核模版/
|
||||
│ └── ...
|
||||
└── [其他已处理的模板文件]
|
||||
```
|
||||
|
||||
## 注意事项
|
||||
|
||||
1. **备份原始文件**:处理前建议备份原始模板文件
|
||||
2. **检查占位符**:处理后的模板需要人工检查,确认占位符位置正确
|
||||
3. **测试生成**:初始化后,建议测试文档生成功能,确认模板可用
|
||||
4. **版本控制**:建议使用版本控制管理模板文件
|
||||
|
||||
## 技术支持
|
||||
|
||||
如有问题,请检查:
|
||||
1. 脚本输出日志
|
||||
2. 数据库和 MinIO 连接状态
|
||||
3. 文件路径和权限
|
||||
4. 占位符格式是否正确
|
||||
211
模板检查和注册使用说明.md
Normal file
@ -0,0 +1,211 @@
|
||||
# 模板检查和注册使用说明
|
||||
|
||||
## 一、检查模板文件占位符
|
||||
|
||||
### 1.1 运行检查脚本
|
||||
|
||||
```bash
|
||||
python check_template_placeholders.py
|
||||
```
|
||||
|
||||
### 1.2 功能说明
|
||||
|
||||
- 扫描 `template_finish` 文件夹下的所有 `.docx` 模板文件
|
||||
- 提取每个文件中的占位符(格式:`{{field_code}}`)
|
||||
- 统计占位符信息
|
||||
- 生成检查报告
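Placeholder extraction can be sketched with a regular expression, as below. The function works on plain strings; with `python-docx` those strings would typically come from `paragraph.text` and `cell.text`. The name `extract_placeholders` is hypothetical and the sketch is not the code of `check_template_placeholders.py`.

```python
import re

# Matches {{field_code}}, tolerating spaces inside the braces
PLACEHOLDER_RE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def extract_placeholders(paragraph_texts):
    """Return the unique field codes found in the texts, in first-seen order."""
    found = []
    for text in paragraph_texts:
        for code in PLACEHOLDER_RE.findall(text):
            if code not in found:
                found.append(code)
    return found
```

Reading the full paragraph text rather than individual runs matters here, because Word may split one placeholder across several runs.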
|
||||
|
||||
### 1.3 输出内容
|
||||
|
||||
- 总文件数
|
||||
- 包含占位符的文件数量
|
||||
- 未找到占位符的文件数量
|
||||
- 所有唯一的占位符列表
|
||||
- 每个文件的详细占位符信息
|
||||
|
||||
---
|
||||
|
||||
## 二、批量注册模板到数据库并上传到MinIO
|
||||
|
||||
### 2.1 运行注册脚本
|
||||
|
||||
```bash
|
||||
python register_templates_to_db.py
|
||||
```
|
||||
|
||||
### 2.2 功能说明
|
||||
|
||||
- 扫描 `template_finish` 文件夹下的所有 `.docx` 模板文件
|
||||
- 提取每个文件的占位符
|
||||
- 生成模板编码(template_code)
|
||||
- 上传文件到MinIO
|
||||
- 注册模板信息到数据库
|
||||
|
||||
### 2.3 模板编码规则
|
||||
|
||||
脚本会根据文件名自动生成模板编码,例如:
|
||||
|
||||
| 文件名 | 模板编码 |
|
||||
|--------|----------|
|
||||
| 初步核实审批表(XXX).docx | PRELIMINARY_VERIFICATION_APPROVAL |
|
||||
| 请示报告卡(XXX).docx | REQUEST_REPORT_CARD |
|
||||
| 谈话通知书第一联.docx | INTERVIEW_NOTICE_FIRST |
|
||||
| 谈话笔录.docx | INTERVIEW_RECORD |
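One possible way to derive a template code from a file name is keyword matching, sketched below. The keyword list only mirrors the four example rows in the table above; the matching strategy and the name `template_code_for` are assumptions, not the documented behavior of `register_templates_to_db.py`.

```python
import re

# Hypothetical keyword table built from the example rows above.
KEYWORD_TO_CODE = [
    ("初步核实审批表", "PRELIMINARY_VERIFICATION_APPROVAL"),
    ("请示报告卡", "REQUEST_REPORT_CARD"),
    ("谈话通知书第一联", "INTERVIEW_NOTICE_FIRST"),
    ("谈话笔录", "INTERVIEW_RECORD"),
]


def template_code_for(filename):
    """Return the template code for a file name, or None if no keyword matches."""
    stem = re.sub(r"\.docx$", "", filename, flags=re.IGNORECASE)
    for keyword, code in KEYWORD_TO_CODE:
        if keyword in stem:
            return code
    return None
```

Substring matching tolerates numeric prefixes and suffixes such as `(XXX)` in the file name.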
|
||||
|
||||
### 2.4 数据库记录
|
||||
|
||||
每个模板会在 `f_polic_file_config` 表中创建一条记录,包含:
|
||||
- `name`: 模板名称(文件名去掉扩展名)
|
||||
- `input_data`: JSON格式,包含 `template_code`、`business_type` 和 `placeholders`
|
||||
- `file_path`: MinIO中的相对路径
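The record described above can be assembled as in the following sketch. Only the three listed columns are shown; the function name `build_file_config_record` is hypothetical, and the remaining columns (ids, tenant, timestamps, state) are left out on purpose.

```python
import json


def build_file_config_record(filename, template_code, business_type, placeholders, file_path):
    """Assemble the name / input_data / file_path values for one template."""
    name = filename.rsplit(".", 1)[0]  # file name without its extension
    input_data = json.dumps(
        {
            "template_code": template_code,
            "business_type": business_type,
            "placeholders": sorted(set(placeholders)),  # de-duplicated field codes
        },
        ensure_ascii=False,  # keep Chinese text readable in the stored JSON
    )
    return {"name": name, "input_data": input_data, "file_path": file_path}
```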
|
||||
|
||||
---
|
||||
|
||||
## 三、测试API接口
|
||||
|
||||
### 3.1 启动服务
|
||||
|
||||
```bash
|
||||
python app.py
|
||||
```
|
||||
|
||||
服务默认运行在 `http://localhost:7500`
|
||||
|
||||
### 3.2 运行接口测试脚本
|
||||
|
||||
```bash
|
||||
python test_api_endpoints.py
|
||||
```
|
||||
|
||||
### 3.3 测试内容
|
||||
|
||||
测试脚本会依次测试以下接口:
|
||||
|
||||
1. **字段配置接口** (`/api/fields`)
|
||||
- 测试获取字段配置
|
||||
|
||||
2. **AI解析接口** (`/ai/extract`)
|
||||
- 测试从输入文本中提取结构化字段
|
||||
|
||||
3. **文档生成接口** (`/ai/generate-document`)
|
||||
- 测试根据模板和数据生成文档
|
||||
|
||||
### 3.4 测试数据
|
||||
|
||||
测试脚本使用虚拟测试数据:
|
||||
|
||||
**输入数据示例:**
|
||||
- `clue_info`: "被举报用户名称是张三,年龄44岁..."
|
||||
- `target_basic_info_clue`: "被核查人员工作基本情况..."
|
||||
|
||||
**输出字段:**
|
||||
- `target_name`, `target_gender`, `target_age`, `target_date_of_birth` 等
|
||||
|
||||
---
|
||||
|
||||
## 四、使用测试页面
|
||||
|
||||
### 4.1 访问测试页面
|
||||
|
||||
打开浏览器访问:`http://localhost:7500`
|
||||
|
||||
### 4.2 功能说明
|
||||
|
||||
测试页面包含两个标签页:
|
||||
|
||||
1. **AI解析接口**
|
||||
- 已预填输入字段和输出字段
|
||||
- 点击"开始解析"按钮进行测试
|
||||
|
||||
2. **文档生成接口**
|
||||
- 已预填字段数据(包含完整的虚拟测试数据)
|
||||
- 已预填文件列表
|
||||
- 点击"生成文档"按钮进行测试
|
||||
|
||||
### 4.3 虚拟测试数据
|
||||
|
||||
测试页面已预填以下虚拟数据:
|
||||
|
||||
**输入字段(AI解析用):**
|
||||
- `clue_info`: 完整的线索信息
|
||||
- `target_basic_info_clue`: 被核查人员基本情况
|
||||
|
||||
**输出字段(AI解析用):**
|
||||
- 12个常用输出字段
|
||||
|
||||
**字段数据(文档生成用):**
|
||||
- 14个完整字段的测试数据
|
||||
- 包括:姓名、性别、年龄、出生年月、单位、职务、文化程度、政治面貌、职级、线索来源、问题描述等
|
||||
|
||||
**文件列表:**
|
||||
- 初步核实审批表 (PRELIMINARY_VERIFICATION_APPROVAL)
|
||||
- 请示报告卡 (REQUEST_REPORT_CARD)
|
||||
|
||||
---
|
||||
|
||||
## 五、执行顺序建议
|
||||
|
||||
1. **检查模板占位符**
|
||||
```bash
|
||||
python check_template_placeholders.py
|
||||
```
|
||||
确保模板文件中的占位符格式正确
|
||||
|
||||
2. **注册模板到数据库**
|
||||
```bash
|
||||
python register_templates_to_db.py
|
||||
```
|
||||
批量注册模板并上传到MinIO
|
||||
|
||||
3. **启动服务**
|
||||
```bash
|
||||
python app.py
|
||||
```
|
||||
|
||||
4. **测试接口**
|
||||
```bash
|
||||
python test_api_endpoints.py
|
||||
```
|
||||
或者在浏览器中访问测试页面进行交互式测试
|
||||
|
||||
---
|
||||
|
||||
## 六、注意事项
|
||||
|
||||
1. **占位符格式**:必须使用 `{{field_code}}` 格式
|
||||
2. **模板编码**:确保模板编码与数据库中的配置一致
|
||||
3. **字段编码**:占位符中的字段编码必须与数据库中的字段编码匹配
|
||||
4. **文件路径**:确保MinIO存储桶存在且有写入权限
|
||||
5. **数据库连接**:确保数据库连接配置正确
|
||||
|
||||
---
|
||||
|
||||
## 七、常见问题
|
||||
|
||||
### Q1: 检查脚本没有输出?
|
||||
|
||||
**A:** 确保:
|
||||
- `template_finish` 目录存在
|
||||
- 目录下有 `.docx` 文件
|
||||
- 已安装 `python-docx` 库:`pip install python-docx`
|
||||
|
||||
### Q2: 上传到MinIO失败?
|
||||
|
||||
**A:** 检查:
|
||||
- MinIO连接配置是否正确
|
||||
- 存储桶是否存在
|
||||
- 网络连接是否正常
|
||||
|
||||
### Q3: 注册到数据库失败?
|
||||
|
||||
**A:** 检查:
|
||||
- 数据库连接配置
|
||||
- 数据库表结构是否正确
|
||||
- 字段编码是否冲突
|
||||
|
||||
### Q4: 接口测试失败?
|
||||
|
||||
**A:** 确保:
|
||||
- 服务已启动
|
||||
- 端口7500未被占用
|
||||
- 数据库和MinIO连接正常
|
||||
337
谈话前安全风险评估表数据设计说明.md
Normal file
@ -0,0 +1,337 @@
|
||||
# 谈话前安全风险评估表数据设计说明
|
||||
|
||||
## 一、设计概述
|
||||
|
||||
本文档说明"谈话前安全风险评估表"模板相关的数据字段设计,以及如何将数据存入数据库。
|
||||
|
||||
## 2. Data Table Structure

### 2.1 Related tables

1. **f_polic_file_config** - file template configuration table
   - Stores the basic information of each document template
   - Columns: id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state

2. **f_polic_field** - field table
   - Stores the field definitions
   - Columns: id, tenant_id, name, filed_code, field_type, created_time, created_by, updated_time, updated_by, state
   - Note: the column is named `filed_code` in the table (a misspelling of `field_code` that must be used as-is)

3. **f_polic_file_field** - file/field association table
   - Stores the associations between file templates and fields
   - Columns: id, tenant_id, filed_id, file_id, created_time, created_by, updated_time, updated_by, state

### 2.2 Field types

- `field_type = 1`: input field (raw data passed to the AI for parsing)
- `field_type = 2`: output field (structured data produced by AI parsing, used to fill the template)
|
||||
|
||||
## 三、谈话前安全风险评估表字段设计
|
||||
|
||||
### 3.1 输出字段(用于填充模板)
|
||||
|
||||
根据风险评估需求,设计了以下11个输出字段,所有字段都配置了默认值:
|
||||
|
||||
| 字段名称 | 字段编码 (field_code) | 说明 | 默认值 | 示例 |
|
||||
|---------|---------------------|------|--------|------|
|
||||
| 被核查人员家庭情况 | target_family_situation | 被核查人员家庭情况 | 家庭关系和谐稳定 | 家庭关系和谐稳定 |
|
||||
| 被核查人员社会关系 | target_social_relations | 被核查人员社会关系 | 社会交往较多,人机关系基本正常 | 社会交往较多,人机关系基本正常 |
|
||||
| 被核查人员健康状况 | target_health_status | 被核查人员健康状况 | 良好 | 良好 |
|
||||
| 被核查人员性格特征 | target_personality | 被核查人员性格特征 | 开朗 | 开朗 |
|
||||
| 被核查人员承受能力 | target_tolerance | 被核查人员承受能力 | 较强 | 较强 |
|
||||
| 被核查人员涉及问题严重程度 | target_issue_severity | 被核查人员涉及问题严重程度 | 较轻 | 较轻 |
|
||||
| 被核查人员涉及其他问题的可能性 | target_other_issues_possibility | 被核查人员涉及其他问题的可能性 | 较小 | 较小 |
|
||||
| 被核查人员此前被审查情况 | target_previous_investigation | 被核查人员此前被审查情况 | 无 | 无 |
|
||||
| 被核查人员社会负面事件 | target_negative_events | 被核查人员社会负面事件 | 无 | 无 |
|
||||
| 被核查人员其他情况 | target_other_situation | 被核查人员其他情况 | 无 | 无 |
|
||||
| 风险等级 | risk_level | 风险等级 | 低 | 低 |
|
||||
|
||||
### 3.2 输入字段(用于AI解析)
|
||||
|
||||
该模板可以使用通用的输入字段:
|
||||
|
||||
| 字段名称 | 字段编码 (field_code) | 说明 |
|
||||
|---------|---------------------|------|
|
||||
| 线索信息 | clue_info | 线索信息(用于AI解析) |
|
||||
| 被核查人员工作基本情况线索 | target_basic_info_clue | 被核查人员工作基本情况线索(用于AI解析) |
|
||||
|
||||
### 3.3 Word模板占位符映射
|
||||
|
||||
Word模板中使用的占位符格式为 `{{field_code}}`,与字段编码的对应关系:
|
||||
|
||||
- `{{target_family_situation}}` → 被核查人员家庭情况
|
||||
- `{{target_social_relations}}` → 被核查人员社会关系
|
||||
- `{{target_health_status}}` → 被核查人员健康状况
|
||||
- `{{target_personality}}` → 被核查人员性格特征
|
||||
- `{{target_tolerance}}` → 被核查人员承受能力
|
||||
- `{{target_issue_severity}}` → 被核查人员涉及问题严重程度
|
||||
- `{{target_other_issues_possibility}}` → 被核查人员涉及其他问题的可能性
|
||||
- `{{target_previous_investigation}}` → 被核查人员此前被审查情况
|
||||
- `{{target_negative_events}}` → 被核查人员社会负面事件
|
||||
- `{{target_other_situation}}` → 被核查人员其他情况
|
||||
- `{{risk_level}}` → 风险等级
|
||||
|
||||
## 四、文件配置信息
|
||||
|
||||
### 4.1 文件配置记录
|
||||
|
||||
- **名称**: 谈话前安全风险评估表
|
||||
- **模板编码 (template_code)**: `PRE_INTERVIEW_RISK_ASSESSMENT`
|
||||
- **业务类型 (business_type)**: `INVESTIGATION`(调查核实)
|
||||
- **文件路径 (file_path)**: `/templates/谈话前安全风险评估表模板.docx`(MinIO相对路径)
|
||||
|
||||
### 4.2 配置存储
|
||||
|
||||
文件配置的 `input_data` 字段存储JSON格式数据:
|
||||
```json
|
||||
{
|
||||
"template_code": "PRE_INTERVIEW_RISK_ASSESSMENT",
|
||||
"business_type": "INVESTIGATION"
|
||||
}
|
||||
```
|
||||
|
||||
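## 5. Default Value Mechanism

### 5.1 Default value configuration

Every risk-assessment field has a default value, stored in the `config/field_defaults.json` configuration file.

### 5.2 How default values are used

**Important**: The system does not apply default values automatically during the AI extraction stage. If the AI extracts no value for a field, the system returns an empty string.

The default values are provided for front-end developers, who can decide, based on business requirements, whether to display them in the UI or apply them.

### 5.3 Default value rules (front-end reference)

1. **AI extraction stage**: if the AI extracts a value for a field from the input text, the extracted value is used
2. **Empty values**: if the extracted value is empty or nothing was extracted, the system returns an empty string
3. **Front-end use**: depending on business requirements, the front end may show the default value as a hint or let the user choose to apply it
4. **Priority**: a value extracted by the AI takes precedence over the default value

### 5.4 Default value configuration location

Default value configuration file: `config/field_defaults.json`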
|
||||
|
||||
```json
|
||||
{
|
||||
"field_defaults": {
|
||||
"target_family_situation": "家庭关系和谐稳定",
|
||||
"target_social_relations": "社会交往较多,人机关系基本正常",
|
||||
"target_health_status": "良好",
|
||||
"target_personality": "开朗",
|
||||
"target_tolerance": "较强",
|
||||
"target_issue_severity": "较轻",
|
||||
"target_other_issues_possibility": "较小",
|
||||
"target_previous_investigation": "无",
|
||||
"target_negative_events": "无",
|
||||
"target_other_situation": "无",
|
||||
"risk_level": "低"
|
||||
}
|
||||
}
|
||||
```
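The rules in section 5.3 can be sketched as a small merge step on the consumer side. This is an illustration only: the helper name `apply_defaults` is hypothetical, the inline JSON is a shortened copy of the configuration above, and whether defaults are applied at all is a front-end decision.

```python
import json

# Shortened copy of config/field_defaults.json, inlined to keep the sketch self-contained
DEFAULTS_JSON = '{"field_defaults": {"target_health_status": "良好", "risk_level": "低"}}'


def apply_defaults(out_data, defaults):
    """Fill empty fieldValue entries with defaults; extracted values take precedence."""
    merged = []
    for item in out_data:
        value = item.get("fieldValue") or defaults.get(item["fieldCode"], "")
        merged.append({"fieldCode": item["fieldCode"], "fieldValue": value})
    return merged


defaults = json.loads(DEFAULTS_JSON)["field_defaults"]
```

A field without a configured default simply stays an empty string, which matches what the extraction interface returns.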
|
||||
|
||||
## 六、使用说明
|
||||
|
||||
### 6.1 初始化脚本
|
||||
|
||||
使用 `init_pre_interview_risk_assessment_fields.py` 脚本初始化数据:
|
||||
|
||||
```bash
|
||||
python init_pre_interview_risk_assessment_fields.py
|
||||
```
|
||||
|
||||
脚本功能:
|
||||
1. 创建11个字段记录(输出字段)
|
||||
2. 创建文件配置记录
|
||||
3. 建立文件和字段的关联关系
|
||||
|
||||
### 6.2 接口调用示例
|
||||
|
||||
#### 1. AI解析接口 (`/ai/extract`)
|
||||
|
||||
```json
|
||||
{
|
||||
"businessType": "INVESTIGATION",
|
||||
"inputData": [
|
||||
{
|
||||
"fieldCode": "clue_info",
|
||||
"fieldValue": "被举报用户名称是张三,年龄30岁"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_basic_info_clue",
|
||||
"fieldValue": "张三,男,汉族,1980年5月出生,山西太原人,本科学历,2000年参加工作,2005年加入中国共产党。"
|
||||
}
|
||||
],
|
||||
"outputData": [
|
||||
{"fieldCode": "target_family_situation"},
|
||||
{"fieldCode": "target_social_relations"},
|
||||
{"fieldCode": "target_health_status"},
|
||||
{"fieldCode": "target_personality"},
|
||||
{"fieldCode": "target_tolerance"},
|
||||
{"fieldCode": "target_issue_severity"},
|
||||
{"fieldCode": "target_other_issues_possibility"},
|
||||
{"fieldCode": "target_previous_investigation"},
|
||||
{"fieldCode": "target_negative_events"},
|
||||
{"fieldCode": "target_other_situation"},
|
||||
{"fieldCode": "risk_level"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**返回结果**(如果AI未提取到值,返回空字符串,前端可根据需要应用默认值):
|
||||
|
||||
```json
|
||||
{
|
||||
"code": 0,
|
||||
"data": {
|
||||
"outData": [
|
||||
{
|
||||
"fieldCode": "target_family_situation",
|
||||
"fieldValue": ""
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_social_relations",
|
||||
"fieldValue": ""
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_health_status",
|
||||
"fieldValue": ""
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_personality",
|
||||
"fieldValue": ""
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_tolerance",
|
||||
"fieldValue": ""
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_issue_severity",
|
||||
"fieldValue": ""
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_other_issues_possibility",
|
||||
"fieldValue": ""
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_previous_investigation",
|
||||
"fieldValue": ""
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_negative_events",
|
||||
"fieldValue": ""
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_other_situation",
|
||||
"fieldValue": ""
|
||||
},
|
||||
{
|
||||
"fieldCode": "risk_level",
|
||||
"fieldValue": ""
|
||||
}
|
||||
]
|
||||
},
|
||||
"msg": "ok",
|
||||
"isSuccess": true
|
||||
}
|
||||
```
|
||||
|
||||
#### 2. 文档生成接口 (`/ai/generate-document`)
|
||||
|
||||
```json
|
||||
{
|
||||
"templateCode": "PRE_INTERVIEW_RISK_ASSESSMENT",
|
||||
"businessType": "INVESTIGATION",
|
||||
"inputData": [
|
||||
{
|
||||
"fieldCode": "target_family_situation",
|
||||
"fieldValue": "家庭关系和谐稳定"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_social_relations",
|
||||
"fieldValue": "社会交往较多,人机关系基本正常"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_health_status",
|
||||
"fieldValue": "良好"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_personality",
|
||||
"fieldValue": "开朗"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_tolerance",
|
||||
"fieldValue": "较强"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_issue_severity",
|
||||
"fieldValue": "较轻"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_other_issues_possibility",
|
||||
"fieldValue": "较小"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_previous_investigation",
|
||||
"fieldValue": "无"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_negative_events",
|
||||
"fieldValue": "无"
|
||||
},
|
||||
{
|
||||
"fieldCode": "target_other_situation",
|
||||
"fieldValue": "无"
|
||||
},
|
||||
{
|
||||
"fieldCode": "risk_level",
|
||||
"fieldValue": "低"
|
||||
}
|
||||
],
|
||||
"fpolicFieldParamFileList": [
|
||||
{
|
||||
"fileId": 1764656918061150,
|
||||
"fileName": "谈话前安全风险评估表.doc"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
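How the `inputData` entries map onto the `{{field_code}}` placeholders can be sketched on plain text, as below. This is a simplified model of the substitution, not the service's actual document generation: the function name `fill_placeholders` is hypothetical, and a real Word template would also need run and table handling.

```python
import re


def fill_placeholders(template_text, input_data):
    """Replace each {{field_code}} with the matching fieldValue from input_data."""
    values = {item["fieldCode"]: item.get("fieldValue", "") for item in input_data}

    def replace(match):
        # Leave placeholders without a supplied value untouched
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template_text)
```

Leaving unknown placeholders in place makes missing fields easy to spot in the generated document.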
|
||||
|
||||
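## 7. Notes

1. **Field code consistency**: Make sure each `{{field_code}}` placeholder in the Word template exactly matches the `filed_code` value in the database.

2. **Default value mechanism**:
   - The system does not apply default values automatically during the AI extraction stage
   - If the AI extracts a value, that value is used
   - If the AI extracts nothing or the value is empty, the system returns an empty string
   - The default values are provided to the front end, which decides whether to apply them based on business requirements

3. **Field types**:
   - Input fields (field_type=1) receive the raw data entered by the user, for the AI to parse
   - Output fields (field_type=2) hold the structured data produced by AI parsing and are used to fill the template

4. **File path**: `file_path` in the `f_polic_file_config` table stores a path relative to MinIO, not an absolute path.

5. **State column**:
   - `state = 0` means disabled
   - `state = 1` means enabled

6. **Deduplication**: The initialization script checks whether the data already exists, to avoid creating duplicates.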
|
||||
|
||||
## 八、数据库连接信息
|
||||
|
||||
- **IP地址**: 152.136.177.240
|
||||
- **端口**: 5012
|
||||
- **用户名**: finyx
|
||||
- **密码**: 6QsGK6MpePZDE57Z
|
||||
- **数据库名称**: finyx
|
||||
|
||||
## 九、相关文件
|
||||
|
||||
- `init_pre_interview_risk_assessment_fields.py` - 数据初始化脚本
|
||||
- `config/field_defaults.json` - 字段默认值配置文件
|
||||
- `services/field_service.py` - 字段服务(包含默认值逻辑)
|
||||
- `app.py` - API接口(包含默认值应用逻辑)
|
||||
- `模板/谈话前安全风险评估表模板.docx` - Word模板文件
|
||||