Enhance the post-processing logic to extract missing gender and age fields from the raw input text, improving the accuracy and completeness of data inference.

This commit is contained in:
python 2025-12-10 09:37:37 +08:00
parent 24fdfdea4c
commit 6871c2e803
30 changed files with 7426 additions and 5 deletions


@ -0,0 +1,221 @@
# Initialize Template Tree Structure - Usage Guide
## Overview
The `init_template_tree_from_directory.py` script **completely resets** the template data in the `f_polic_file_config` table, recreating all records from the `template_finish` directory structure and establishing the correct tree hierarchy.
## ⚠️ Important Warning
**This operation deletes all template data for the current tenant!**
This includes:
- All records in the `f_polic_file_config` table
- The related association records in the `f_polic_file_field` table
Everything is then rebuilt from the `template_finish` directory structure.
**Always back up the database before running this script!**
## Features
1. **Full rebuild**: deletes the old data and recreates it from the directory structure
2. **Tree structure**: automatically establishes the correct parent_id hierarchy
3. **File upload**: optionally uploads files to MinIO
4. **Safety confirmations**: multiple confirmation prompts guard against accidental runs
5. **Dry-run mode**: preview first, then execute
## Directory Structure Requirements
The script scans the `template_finish` directory and expects the following layout:
```
template_finish/
└── 2-初核模版/                      (level-1 directory)
    ├── 1.初核请示/                  (level-2 directory)
    │   ├── 1.请示报告卡XXX.docx
    │   ├── 2.初步核实审批表XXX.docx
    │   └── 3.附件初核方案(XXX).docx
    ├── 2.谈话审批/                  (level-2 directory)
    │   ├── 谈话通知书/              (level-3 directory)
    │   │   ├── 谈话通知书第一联.docx
    │   │   ├── 谈话通知书第二联.docx
    │   │   └── 谈话通知书第三联.docx
    │   ├── 走读式谈话审批/          (level-3 directory)
    │   │   ├── 1.请示报告卡(初核谈话).docx
    │   │   ├── 2谈话审批表.docx
    │   │   └── ...
    │   └── 走读式谈话流程/          (level-3 directory)
    │       ├── 1.谈话笔录.docx
    │       └── ...
    └── 3.初核结论/                  (level-2 directory)
        ├── 8-1请示报告卡初核报告结论 .docx
        └── 8.XXX初核情况报告.docx
```
## Usage
### Basic Usage
```bash
python init_template_tree_from_directory.py
```
### Execution Flow
1. **Warning**: displays the operation warning
2. **First confirmation**: type `yes` to continue
3. **Directory scan**: automatically scans the `template_finish` directory
4. **Preview**: shows a preview of the directory structure
5. **Upload choice**: choose whether to upload files to MinIO
6. **Simulated delete**: shows the data that would be deleted
7. **Simulated create**: shows the nodes that would be created
8. **Final confirmation**: type `yes` again to perform the actual update
9. **Delete**: removes the old data
10. **Create**: writes the new data
### Interactive Prompts
```
Confirm continue? (yes/no, default no): yes
Upload files to MinIO? (yes/no, default yes): yes
Confirm actual update? (yes/no, default no): yes
```
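A yes/no prompt with a default, like the ones above, can be written as a small helper. This is an illustrative sketch, not the script's actual code; the helper name and wording are assumptions:

```python
def confirm(prompt: str, default: bool = False) -> bool:
    """Ask a yes/no question; an empty answer falls back to the default."""
    suffix = "yes" if default else "no"
    answer = input(f"{prompt} (yes/no, default {suffix}): ").strip().lower()
    if not answer:
        return default
    return answer in ("y", "yes")
```

Treating only an explicit `yes`/`y` as confirmation keeps destructive operations opt-in.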
## Processing Logic
### 1. Delete Old Data
- First delete the association records in the `f_polic_file_field` table
- Then delete the configuration records in the `f_polic_file_config` table
- Only data for the current tenant (`tenant_id = 615873064429507639`) is deleted
### 2. Create New Data
Nodes are created in hierarchy order:
1. **Directory nodes**:
   - No `template_code` field
   - `input_data` is NULL
   - `file_path` is NULL
2. **File nodes**:
   - Include a `template_code` (looked up in `DOCUMENT_TYPE_MAPPING`)
   - `input_data` holds the configuration as JSON
   - `file_path` is the MinIO path (if the file was uploaded)
### 3. Tree Relationships
- Level-1 directory: `parent_id = NULL`
- Level-2 directory: `parent_id = ID of the level-1 directory`
- Level-3 directory: `parent_id = ID of the level-2 directory`
- File: `parent_id = ID of the containing directory`
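These rules boil down to: every node's parent_id is the generated ID of the directory that contains it, so parents must be created before children. A minimal sketch of that assignment (illustrative only; the sequential IDs and helper name are made up, the real script generates snowflake-style IDs):

```python
def assign_parent_ids(nodes):
    """nodes: list of (name, parent_name) tuples in parent-before-child order.

    Returns {name: parent_id}, with None for top-level nodes.
    """
    ids = {}      # name -> generated ID
    result = {}   # name -> parent_id
    next_id = 1
    for name, parent_name in nodes:
        ids[name] = next_id
        result[name] = ids[parent_name] if parent_name else None
        next_id += 1
    return result

tree = [
    ("2-初核模版", None),               # level-1 directory -> parent_id NULL
    ("1.初核请示", "2-初核模版"),        # level-2 directory -> level-1 ID
    ("1.请示报告卡XXX", "1.初核请示"),   # file -> containing directory's ID
]
parents = assign_parent_ids(tree)
```

Processing in parent-before-child order guarantees the parent's ID is already known when a child is created.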
## Template Recognition
The script identifies file types through the `DOCUMENT_TYPE_MAPPING` dictionary:
- Matches on the file name (without extension)
- Extracts `template_code` and `business_type`
- If a file cannot be recognized, `template_code` is set to the empty string
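The lookup can be sketched as follows — a minimal illustration with a single mapping entry; the real dictionary in the script defines many more, and the fallback tuple shape here is an assumption:

```python
from pathlib import Path

# One representative entry; the real script's DOCUMENT_TYPE_MAPPING is larger.
DOCUMENT_TYPE_MAPPING = {
    "谈话通知书第一联": {
        "template_code": "NOTIFICATION_LETTER_1",
        "business_type": "INVESTIGATION",
    },
}

def resolve_template(file_name):
    """Match on the stem (name without extension); unknown files get ''."""
    cfg = DOCUMENT_TYPE_MAPPING.get(Path(file_name).stem)
    if cfg is None:
        return "", None
    return cfg["template_code"], cfg["business_type"]
```

Matching on `Path(...).stem` means `谈话通知书第一联.docx` and `谈话通知书第一联` resolve identically.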
## File Upload
If you choose to upload files to MinIO:
- Path format: `/{tenant_id}/TEMPLATE/{year}/{month}/(unknown)`
- Example: `/615873064429507639/TEMPLATE/2025/12/1.请示报告卡XXX.docx`
- An upload failure does not abort the run, but `file_path` will be NULL
## Sample Output
```
================================================================================
Initialize template tree structure (full rebuild from the directory structure)
================================================================================
⚠️ Warning: this operation will delete all template data for the current tenant!
Confirm continue? (yes/no, default no): yes
✓ Database connection established
Scanning directory structure...
Found 28 nodes
  Directories: 7
  Files: 21
Running simulated delete...
[SIMULATED] Would delete 113 association records
[SIMULATED] Would delete 34 configuration records
Running simulated create...
✓ [SIMULATED] Create directory: 2-初核模版 (ID: ...)
✓ [SIMULATED] Create file: 1.请示报告卡XXX (ID: ...) [parent: ...] [code: REPORT_CARD]
...
Confirm actual update? (yes/no, default no): yes
Running actual delete...
✓ Deleted 113 association records
✓ Deleted 34 configuration records
Running actual create...
✓ Create directory: 2-初核模版 (ID: ...)
✓ Create file: 1.请示报告卡XXX (ID: ...) [parent: ...] [code: REPORT_CARD]
...
✓ Done! Created 28 nodes in total
```
## Verifying the Result
After the script finishes, check the result with the verification script:
```bash
python verify_tree_structure.py
```
## Notes
1. **Back up the database**: always back up before running
2. **Check the directory structure**: make sure the `template_finish` layout is correct
3. **Files present**: make sure all `.docx` files exist
4. **MinIO connectivity**: if uploading files, make sure MinIO is reachable
5. **Irreversible**: the delete is irreversible; proceed with care
## Troubleshooting
### Issue 1: template_code must not be NULL
**Cause**: the table schema declares template_code NOT NULL.
**Fix**: already handled by the script — directory nodes do not insert template_code, and file nodes fall back to an empty string.
### Issue 2: File upload fails
**Cause**: MinIO connection problems or a missing file.
**Fix**:
- Check the MinIO configuration
- Check that the file exists
- An upload failure does not abort the run; the file can be uploaded manually later
### Issue 3: Wrong parent/child relationships
**Cause**: directory scan ordering problems.
**Fix**: the script processes nodes in hierarchy order, so parent nodes are always created before their children.
## Related Scripts
- `update_template_tree.py` - updates parent_id on existing data (deletes nothing)
- `verify_tree_structure.py` - verifies the tree structure
- `check_existing_data.py` - inspects the existing data
## Contact
If something goes wrong, check:
1. The database connection settings
2. Whether the directory structure is correct
3. Whether all files exist
4. Whether the MinIO configuration is correct


@ -0,0 +1,293 @@
# Template Tree Structure Update - Usage Guide
## Overview
This tool updates the `parent_id` field in the `f_polic_file_config` table according to the `template_finish` directory structure, establishing the correct tree hierarchy.
## Current Database State
Based on inspection, the database currently holds:
- **Total records**: 32
- **With parent_id**: 2
- **Without parent_id**: 30
The main records that need updating include:
- 初步核实审批表
- 请示报告卡 (all variants)
- 初核方案
- 谈话通知书 (第一联, 第二联, 第三联)
- XXX初核情况报告
- Files related to 走读式谈话审批
- Files related to 走读式谈话流程
- etc.
## Scripts
### 1. `check_existing_data.py` - Inspect Existing Data
**Purpose**: view the existing records and analyze which ones lack a parent_id
**Usage**:
```bash
python check_existing_data.py
```
**Output**:
- Lists all records without a parent_id
- Shows records with a parent_id and their tree relationships
---
### 2. `improved_match_and_update.py` - Improved Match Analysis
**Purpose**: analyze the directory structure and the database with the improved matching logic, and produce a match report
**Highlights**:
- **Three-level matching strategy**:
  1. **Exact template_code match** (highest priority)
  2. **Exact name match**
  3. **Normalized name match** (fuzzy match after stripping numbering and brackets)
**Usage**:
```bash
python improved_match_and_update.py
```
**Output**:
- A match report (showing which records matched and which need to be created)
- Optionally, a generated SQL update script
---
### 3. `update_template_tree.py` - Interactive Update Tool (Recommended)
**Purpose**: a complete update tool with preview, confirmation, and execution
**Highlights**:
- Uses the improved matching logic
- Supports preview (dry-run) mode
- Interactive confirmation
- Updates automatically in hierarchy order
- Safe transaction handling
**Usage**:
```bash
python update_template_tree.py
```
**Execution flow**:
1. Scan the directory structure
2. Fetch the existing database data
3. Plan the tree structure (using the improved matching logic)
4. Show an update preview
5. Ask whether to proceed (type `yes`)
6. Run a simulated update
7. Confirm again to run the actual update
---
### 4. `analyze_and_update_template_tree.py` - Generate SQL Script
**Purpose**: analyze and generate an SQL update script (does not touch the database directly)
**Usage**:
```bash
python analyze_and_update_template_tree.py
```
**Output**:
- `update_template_tree.sql` - the SQL update script
**When to use**:
- Production environments
- When a DBA needs to review the changes
- When the script must be executed manually
---
### 5. `verify_tree_structure.py` - Verify the Result
**Purpose**: verify that the updated tree structure is correct
**Usage**:
```bash
python verify_tree_structure.py
```
**Output**:
- A visualization of the tree structure
- Statistics
- Parent/child relationship checks
---
## Matching Logic
### Three-Level Matching Strategy
1. **Exact template_code match** (highest priority)
   - Exact match on the `template_code` field
   - Example: `REPORT_CARD` matches `REPORT_CARD`
2. **Exact name match**
   - Exact match on the `name` field
   - Example: `"1.请示报告卡XXX"` matches `"1.请示报告卡XXX"`
3. **Normalized name match** (fuzzy match)
   - Strips the leading numbering (e.g. `"1."`, `"2."`, `"8-1"`)
   - Strips brackets and their contents (e.g. `"XXX"`, `"(初核谈话)"`)
   - Example: `"1.请示报告卡XXX"` → `"请示报告卡"` → matches `"请示报告卡"`
### Matching Examples
| Name in directory structure | Name in database | Match method |
|----------------|--------------|---------|
| `1.请示报告卡XXX` | `请示报告卡` | template_code: `REPORT_CARD` |
| `2.初步核实审批表XXX` | `初步核实审批表` | template_code: `PRELIMINARY_VERIFICATION_APPROVAL` |
| `谈话通知书第一联` | `谈话通知书第一联` | exact name match |
| `走读式谈话审批` | `走读式谈话审批` | exact name match |
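The normalization step (strategy 3) can be sketched roughly like this — an illustrative regex, not the script's exact implementation; the real stripping rules may differ in detail:

```python
import re

def normalize_name(name: str) -> str:
    """Strip leading numbering (e.g. '1.', '8-1') and bracketed content."""
    s = re.sub(r'^[0-9][0-9.\-]*\.?', '', name)   # leading numbering prefix
    s = re.sub(r'[((][^))]*[))]', '', s)          # ASCII or full-width brackets + contents
    return s.strip()
```

Handling both ASCII `()` and full-width `()` matters here, since the template file names mix the two.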
## Planned Tree Structure
Based on the `template_finish` directory structure, the planned hierarchy is:
```
2-初核模版 (level-1 directory)
├── 1.初核请示 (level-2 directory)
│   ├── 1.请示报告卡XXX.docx
│   ├── 2.初步核实审批表XXX.docx
│   └── 3.附件初核方案(XXX).docx
├── 2.谈话审批 (level-2 directory)
│   ├── 谈话通知书 (level-3 directory)
│   │   ├── 谈话通知书第一联.docx
│   │   ├── 谈话通知书第二联.docx
│   │   └── 谈话通知书第三联.docx
│   ├── 走读式谈话审批 (level-3 directory)
│   │   ├── 1.请示报告卡(初核谈话).docx
│   │   ├── 2谈话审批表.docx
│   │   ├── 3.谈话前安全风险评估表.docx
│   │   ├── 4.谈话方案.docx
│   │   └── 5.谈话后安全风险评估表.docx
│   └── 走读式谈话流程 (level-3 directory)
│       ├── 1.谈话笔录.docx
│       ├── 2.谈话询问对象情况摸底调查30问.docx
│       ├── 3.被谈话人权利义务告知书.docx
│       ├── 4.点对点交接单.docx
│       ├── 5.陪送交接单(新).docx
│       ├── 6.1保密承诺书(谈话对象使用-非中共党员用).docx
│       ├── 6.2保密承诺书(谈话对象使用-中共党员用).docx
│       └── 7.办案人员-办案安全保密承诺书.docx
└── 3.初核结论 (level-2 directory)
    ├── 8-1请示报告卡初核报告结论 .docx
    └── 8.XXX初核情况报告.docx
```
## Execution Steps
### Recommended Flow (Interactive Tool)
1. **Inspect the existing data**
   ```bash
   python check_existing_data.py
   ```
2. **Run the update tool**
   ```bash
   python update_template_tree.py
   ```
3. **Review the preview**
   - Check the match results
   - Confirm the update plan
4. **Confirm execution**
   - Type `yes` to confirm
   - Confirm again to run the actual update
5. **Verify the result**
   ```bash
   python verify_tree_structure.py
   ```
### Alternative Flow (SQL Script)
1. **Generate the SQL script**
   ```bash
   python improved_match_and_update.py
   # or
   python analyze_and_update_template_tree.py
   ```
2. **Review the SQL script**
   ```bash
   # inspect update_template_tree.sql
   ```
3. **Back up the database** (important!)
4. **Execute the SQL script**
   ```sql
   -- run in a MySQL client
   source update_template_tree.sql;
   ```
5. **Verify the result**
   ```bash
   python verify_tree_structure.py
   ```
## Notes
1. **Back up the database**: always back up before updating
2. **Check the matches**: confirm the match results are correct
3. **Hierarchy order**: updates run in hierarchy order, so parent nodes are handled before their children
4. **Re-runnable**: the script can be run repeatedly; records that are already correct are skipped
5. **Directory nodes**: missing directory nodes are created automatically
## Match Results
According to the latest analysis:
- ✅ **Matched**: 26 records
- ⚠️ **To create**: 2 records (directory nodes)
  - `2-初核模版` (level-1 directory)
  - `1.初核请示` (level-2 directory)
All file records matched existing database records correctly.
## Troubleshooting
### Issue 1: Some records fail to match
**Cause**: the name or template_code does not match.
**Fix**:
- Check the `DOCUMENT_TYPE_MAPPING` dictionary
- Confirm the `template_code` values in the database are correct
- Use `check_existing_data.py` to inspect the actual database data
### Issue 2: A record matches the wrong row
**Cause**: normalized name matching picked the wrong candidate.
**Fix**:
- Check the match report and confirm the match method
- If the template_code match failed, verify the template_code in the database
- Adjust the matching logic manually if necessary
### Issue 3: parent_id update fails
**Cause**: the parent node ID does not exist or the hierarchy is wrong.
**Fix**:
- Use `verify_tree_structure.py` to validate parent/child relationships
- Check the generated SQL script to confirm the parent node IDs
## Contact
If something goes wrong, check:
1. Whether the database connection settings are correct
2. Whether the directory structure matches expectations
3. Whether the database records are complete
4. Whether template_code values are set correctly


@ -0,0 +1,582 @@
"""
分析和修复字段编码问题
1. 分析f_polic_file_field表中的重复项
2. 检查f_polic_field表中的中文field_code
3. 根据占位符与字段对照表更新field_code
4. 合并重复项并更新关联表
"""
import os
import json
import pymysql
import re
from typing import Dict, List, Optional, Tuple
from datetime import datetime
from pathlib import Path
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
CREATED_BY = 655162080928945152
UPDATED_BY = 655162080928945152
CURRENT_TIME = datetime.now()
# 从占位符与字段对照表文档中提取的字段映射
# 格式: {字段名称: field_code}
FIELD_NAME_TO_CODE_MAPPING = {
    # Basic information fields
    '被核查人姓名': 'target_name',
    '被核查人员单位及职务': 'target_organization_and_position',
    '被核查人员单位': 'target_organization',
    '被核查人员职务': 'target_position',
    '被核查人员性别': 'target_gender',
    '被核查人员出生年月': 'target_date_of_birth',
    '被核查人员出生年月日': 'target_date_of_birth_full',
    '被核查人员年龄': 'target_age',
    '被核查人员文化程度': 'target_education_level',
    '被核查人员政治面貌': 'target_political_status',
    '被核查人员职级': 'target_professional_rank',
    '被核查人员身份证号': 'target_id_number',
    '被核查人员身份证件及号码': 'target_id_number',
    '被核查人员住址': 'target_address',
    '被核查人员户籍住址': 'target_registered_address',
    '被核查人员联系方式': 'target_contact',
    '被核查人员籍贯': 'target_place_of_origin',
    '被核查人员民族': 'target_ethnicity',
    # Issue-related fields
    '线索来源': 'clue_source',
    '主要问题线索': 'target_issue_description',
    '被核查人问题描述': 'target_problem_description',
    # Approval-related fields
    '初步核实审批表承办部门意见': 'department_opinion',
    '初步核实审批表填表人': 'filler_name',
    '批准时间': 'approval_time',
    # Investigation-related fields
    '核查单位名称': 'investigation_unit_name',
    '核查组代号': 'investigation_team_code',
    '核查组组长姓名': 'investigation_team_leader_name',
    '核查组成员姓名': 'investigation_team_member_names',
    '核查地点': 'investigation_location',
    # Risk-assessment fields
    '被核查人员家庭情况': 'target_family_situation',
    '被核查人员社会关系': 'target_social_relations',
    '被核查人员健康状况': 'target_health_status',
    '被核查人员性格特征': 'target_personality',
    '被核查人员承受能力': 'target_tolerance',
    '被核查人员涉及问题严重程度': 'target_issue_severity',
    '被核查人员涉及其他问题的可能性': 'target_other_issues_possibility',
    '被核查人员此前被审查情况': 'target_previous_investigation',
    '被核查人员社会负面事件': 'target_negative_events',
    '被核查人员其他情况': 'target_other_situation',
    '风险等级': 'risk_level',
    # Other fields
    '线索信息': 'clue_info',
    '被核查人员工作基本情况线索': 'target_basic_info_clue',
    '被核查人员工作基本情况': 'target_work_basic_info',
    '请示报告卡请示时间': 'report_card_request_time',
    '应到时间': 'appointment_time',
    '应到地点': 'appointment_location',
    '承办部门': 'handling_department',
    '承办人': 'handler_name',
    '谈话通知时间': 'notification_time',
    '谈话通知地点': 'notification_location',
    '被核查人员本人认识和态度': 'target_attitude',
    '纪委名称': 'commission_name',
}
def is_chinese(text: str) -> bool:
    """Return True if the string contains any Chinese characters."""
    if not text:
        return False
    return bool(re.search(r'[\u4e00-\u9fff]', text))
def analyze_f_polic_field(conn) -> Dict:
    """Analyze the f_polic_field table: find Chinese field_code values and duplicates."""
    cursor = conn.cursor(pymysql.cursors.DictCursor)
    print("\n" + "=" * 80)
    print("1. Analyzing the f_polic_field table")
    print("=" * 80)

    # Fetch all fields
    cursor.execute("""
        SELECT id, name, filed_code, field_type, state
        FROM f_polic_field
        WHERE tenant_id = %s
        ORDER BY name, filed_code
    """, (TENANT_ID,))
    fields = cursor.fetchall()
    print(f"\nFound {len(fields)} field records in total")

    # Find Chinese field_code values
    chinese_field_codes = []
    for field in fields:
        if is_chinese(field['filed_code']):
            chinese_field_codes.append(field)
    print(f"\nFound {len(chinese_field_codes)} Chinese field_code values:")
    for field in chinese_field_codes:
        print(f"  - ID: {field['id']}, name: {field['name']}, field_code: {field['filed_code']}")

    # Find duplicate field names
    name_to_fields = {}
    for field in fields:
        name = field['name']
        if name not in name_to_fields:
            name_to_fields[name] = []
        name_to_fields[name].append(field)
    duplicates = {name: fields_list for name, fields_list in name_to_fields.items()
                  if len(fields_list) > 1}
    print(f"\nFound {len(duplicates)} duplicate field names:")
    for name, fields_list in duplicates.items():
        print(f"\n  Field name: {name} ({len(fields_list)} records)")
        for field in fields_list:
            print(f"    - ID: {field['id']}, field_code: {field['filed_code']}, "
                  f"field_type: {field['field_type']}, state: {field['state']}")

    # Find duplicate field_code values
    code_to_fields = {}
    for field in fields:
        code = field['filed_code']
        if code not in code_to_fields:
            code_to_fields[code] = []
        code_to_fields[code].append(field)
    duplicate_codes = {code: fields_list for code, fields_list in code_to_fields.items()
                       if len(fields_list) > 1}
    print(f"\nFound {len(duplicate_codes)} duplicate field_code values:")
    for code, fields_list in duplicate_codes.items():
        print(f"\n  field_code: {code} ({len(fields_list)} records)")
        for field in fields_list:
            print(f"    - ID: {field['id']}, name: {field['name']}, "
                  f"field_type: {field['field_type']}, state: {field['state']}")

    return {
        'all_fields': fields,
        'chinese_field_codes': chinese_field_codes,
        'duplicate_names': duplicates,
        'duplicate_codes': duplicate_codes
    }
def analyze_f_polic_file_field(conn) -> Dict:
    """Analyze the f_polic_file_field table and find duplicates."""
    cursor = conn.cursor(pymysql.cursors.DictCursor)
    print("\n" + "=" * 80)
    print("2. Analyzing the f_polic_file_field table")
    print("=" * 80)

    # Fetch all association records
    cursor.execute("""
        SELECT fff.id, fff.file_id, fff.filed_id,
               fc.name as file_name, f.name as field_name, f.filed_code
        FROM f_polic_file_field fff
        LEFT JOIN f_polic_file_config fc ON fff.file_id = fc.id
        LEFT JOIN f_polic_field f ON fff.filed_id = f.id
        WHERE fff.tenant_id = %s
        ORDER BY fff.file_id, fff.filed_id
    """, (TENANT_ID,))
    relations = cursor.fetchall()
    print(f"\nFound {len(relations)} association records in total")

    # Find duplicate associations (same file_id and filed_id)
    relation_key_to_records = {}
    for rel in relations:
        key = (rel['file_id'], rel['filed_id'])
        if key not in relation_key_to_records:
            relation_key_to_records[key] = []
        relation_key_to_records[key].append(rel)
    duplicates = {key: records for key, records in relation_key_to_records.items()
                  if len(records) > 1}
    print(f"\nFound {len(duplicates)} duplicate associations:")
    for (file_id, filed_id), records in duplicates.items():
        print(f"\n  file ID: {file_id}, field ID: {filed_id} ({len(records)} records)")
        for record in records:
            print(f"    - association ID: {record['id']}, file: {record['file_name']}, "
                  f"field: {record['field_name']} ({record['filed_code']})")

    # Count associations that use a Chinese field_code
    chinese_relations = [rel for rel in relations if rel['filed_code'] and is_chinese(rel['filed_code'])]
    print(f"\nFound {len(chinese_relations)} associations using a Chinese field_code:")
    for rel in chinese_relations[:10]:  # show the first 10 only
        print(f"  - file: {rel['file_name']}, field: {rel['field_name']}, "
              f"field_code: {rel['filed_code']}")
    if len(chinese_relations) > 10:
        print(f"  ... and {len(chinese_relations) - 10} more")

    return {
        'all_relations': relations,
        'duplicate_relations': duplicates,
        'chinese_relations': chinese_relations
    }
def get_correct_field_code(field_name: str, current_code: str) -> Optional[str]:
    """Look up the correct field_code for a field name."""
    # Check the mapping table first
    if field_name in FIELD_NAME_TO_CODE_MAPPING:
        return FIELD_NAME_TO_CODE_MAPPING[field_name]
    # Keep the current code if it is already English and well-formed
    if current_code and not is_chinese(current_code) and re.match(r'^[a-z_]+$', current_code):
        return current_code
    return None
def fix_f_polic_field(conn, dry_run: bool = True) -> Dict:
    """Fix problems in the f_polic_field table."""
    cursor = conn.cursor(pymysql.cursors.DictCursor)
    print("\n" + "=" * 80)
    print("3. Fixing the f_polic_field table")
    print("=" * 80)
    if dry_run:
        print("\n[DRY RUN mode - the database will not be modified]")

    # Fetch all fields
    cursor.execute("""
        SELECT id, name, filed_code, field_type, state
        FROM f_polic_field
        WHERE tenant_id = %s
    """, (TENANT_ID,))
    fields = cursor.fetchall()

    updates = []
    merges = []

    # Group fields by name to find duplicates that need merging
    name_to_fields = {}
    for field in fields:
        name = field['name']
        if name not in name_to_fields:
            name_to_fields[name] = []
        name_to_fields[name].append(field)

    # Process each field name
    for field_name, field_list in name_to_fields.items():
        if len(field_list) == 1:
            # Single field: check whether its field_code needs updating
            field = field_list[0]
            correct_code = get_correct_field_code(field['name'], field['filed_code'])
            if correct_code and correct_code != field['filed_code']:
                updates.append({
                    'id': field['id'],
                    'name': field['name'],
                    'old_code': field['filed_code'],
                    'new_code': correct_code,
                    'field_type': field['field_type']
                })
        else:
            # Multiple fields with the same name: merge them.
            # Pick the best field_code.
            best_field = None
            best_code = None
            for field in field_list:
                correct_code = get_correct_field_code(field['name'], field['filed_code'])
                if correct_code:
                    if not best_field or (field['state'] == 1 and best_field['state'] == 0):
                        best_field = field
                        best_code = correct_code
            # If no best field was found, pick the first enabled one, or simply the first
            if not best_field:
                enabled_fields = [f for f in field_list if f['state'] == 1]
                best_field = enabled_fields[0] if enabled_fields else field_list[0]
                best_code = get_correct_field_code(best_field['name'], best_field['filed_code'])
            if not best_code:
                # Derive a code from the name
                best_code = field_name.lower().replace('被核查人员', 'target_').replace('被核查人', 'target_')
                best_code = re.sub(r'[^\w]', '_', best_code)
                best_code = re.sub(r'_+', '_', best_code).strip('_')
            # Decide which field to keep and which to remove
            keep_field = best_field
            remove_fields = [f for f in field_list if f['id'] != keep_field['id']]
            # Update the kept field's field_code
            if best_code and best_code != keep_field['filed_code']:
                updates.append({
                    'id': keep_field['id'],
                    'name': keep_field['name'],
                    'old_code': keep_field['filed_code'],
                    'new_code': best_code,
                    'field_type': keep_field['field_type']
                })
            merges.append({
                'keep_field_id': keep_field['id'],
                'keep_field_name': keep_field['name'],
                'keep_field_code': best_code or keep_field['filed_code'],
                'remove_field_ids': [f['id'] for f in remove_fields],
                'remove_fields': remove_fields
            })

    # Show the update plan
    print(f"\n{len(updates)} field_code values need updating:")
    for update in updates:
        print(f"  - ID: {update['id']}, name: {update['name']}, "
              f"{update['old_code']} -> {update['new_code']}")
    print(f"\n{len(merges)} groups of duplicate fields need merging:")
    for merge in merges:
        print(f"\n  Keep field: ID={merge['keep_field_id']}, name={merge['keep_field_name']}, "
              f"field_code={merge['keep_field_code']}")
        print(f"  Remove {len(merge['remove_field_ids'])} fields:")
        for remove_field in merge['remove_fields']:
            print(f"    - ID: {remove_field['id']}, field_code: {remove_field['filed_code']}, "
                  f"field_type: {remove_field['field_type']}, state: {remove_field['state']}")

    # Apply the updates
    if not dry_run:
        print("\nApplying updates...")
        # 1. Update field_code values first
        for update in updates:
            cursor.execute("""
                UPDATE f_polic_field
                SET filed_code = %s, updated_time = %s, updated_by = %s
                WHERE id = %s
            """, (update['new_code'], CURRENT_TIME, UPDATED_BY, update['id']))
            print(f"  ✓ Updated field ID {update['id']}: {update['old_code']} -> {update['new_code']}")
        # 2. Merge duplicates: re-point the association table first, then delete the duplicates
        for merge in merges:
            keep_id = merge['keep_field_id']
            for remove_id in merge['remove_field_ids']:
                # Re-point associations in f_polic_file_field
                cursor.execute("""
                    UPDATE f_polic_file_field
                    SET filed_id = %s, updated_time = %s, updated_by = %s
                    WHERE filed_id = %s AND tenant_id = %s
                """, (keep_id, CURRENT_TIME, UPDATED_BY, remove_id, TENANT_ID))
                # Delete the duplicate field record
                cursor.execute("""
                    DELETE FROM f_polic_field
                    WHERE id = %s AND tenant_id = %s
                """, (remove_id, TENANT_ID))
            print(f"  ✓ Merged fields: kept ID {keep_id}, removed {len(merge['remove_field_ids'])} duplicates")
        conn.commit()
        print("\n✓ Update complete")
    else:
        print("\n[DRY RUN] None of the above was actually executed")

    return {
        'updates': updates,
        'merges': merges
    }
def fix_f_polic_file_field(conn, dry_run: bool = True) -> Dict:
    """Remove duplicate rows from the f_polic_file_field table."""
    cursor = conn.cursor(pymysql.cursors.DictCursor)
    print("\n" + "=" * 80)
    print("4. Fixing the f_polic_file_field table")
    print("=" * 80)
    if dry_run:
        print("\n[DRY RUN mode - the database will not be modified]")

    # Find duplicate associations
    cursor.execute("""
        SELECT file_id, filed_id, COUNT(*) as count, GROUP_CONCAT(id) as ids
        FROM f_polic_file_field
        WHERE tenant_id = %s
        GROUP BY file_id, filed_id
        HAVING count > 1
    """, (TENANT_ID,))
    duplicates = cursor.fetchall()
    print(f"\nFound {len(duplicates)} groups of duplicate associations")

    deletes = []
    for dup in duplicates:
        file_id = dup['file_id']
        filed_id = dup['filed_id']
        ids = [int(id_str) for id_str in dup['ids'].split(',')]
        # Keep the first record, remove the rest
        keep_id = ids[0]
        remove_ids = ids[1:]
        deletes.append({
            'file_id': file_id,
            'filed_id': filed_id,
            'keep_id': keep_id,
            'remove_ids': remove_ids
        })
        print(f"\n  file ID: {file_id}, field ID: {filed_id}")
        print(f"    keep association ID: {keep_id}")
        print(f"    remove association IDs: {', '.join(map(str, remove_ids))}")

    # Apply the deletions
    if not dry_run:
        print("\nDeleting duplicate associations...")
        for delete in deletes:
            for remove_id in delete['remove_ids']:
                cursor.execute("""
                    DELETE FROM f_polic_file_field
                    WHERE id = %s AND tenant_id = %s
                """, (remove_id, TENANT_ID))
            print(f"  ✓ Removed duplicates for file ID {delete['file_id']} / field ID {delete['filed_id']}")
        conn.commit()
        print("\n✓ Deletion complete")
    else:
        print("\n[DRY RUN] None of the above was actually executed")

    return {
        'deletes': deletes
    }
def check_other_tables(conn):
    """Check other tables that might be affected."""
    cursor = conn.cursor(pymysql.cursors.DictCursor)
    print("\n" + "=" * 80)
    print("5. Checking other related tables")
    print("=" * 80)

    # Check the f_polic_task table
    print("\nChecking the f_polic_task table...")
    try:
        cursor.execute("""
            SELECT COUNT(*) as count
            FROM f_polic_task
            WHERE tenant_id = %s
        """, (TENANT_ID,))
        task_count = cursor.fetchone()['count']
        print(f"  Found {task_count} task records")
        # Look for columns that might reference field IDs
        cursor.execute("DESCRIBE f_polic_task")
        columns = [col['Field'] for col in cursor.fetchall()]
        print(f"  Table columns: {', '.join(columns)}")
        # Columns that may reference f_polic_field
        field_refs = [col for col in columns if 'field' in col.lower() or 'filed' in col.lower()]
        if field_refs:
            print(f"  Columns that may reference fields: {', '.join(field_refs)}")
    except Exception as e:
        print(f"  Error while checking f_polic_task: {e}")

    # Check the f_polic_file table
    print("\nChecking the f_polic_file table...")
    try:
        cursor.execute("""
            SELECT COUNT(*) as count
            FROM f_polic_file
            WHERE tenant_id = %s
        """, (TENANT_ID,))
        file_count = cursor.fetchone()['count']
        print(f"  Found {file_count} file records")
        cursor.execute("DESCRIBE f_polic_file")
        columns = [col['Field'] for col in cursor.fetchall()]
        print(f"  Table columns: {', '.join(columns)}")
    except Exception as e:
        print(f"  Error while checking f_polic_file: {e}")
def main():
    """Entry point."""
    print("=" * 80)
    print("Field-code analysis and repair tool")
    print("=" * 80)
    try:
        conn = pymysql.connect(**DB_CONFIG)
        # 1. Analyze the f_polic_field table
        field_analysis = analyze_f_polic_field(conn)
        # 2. Analyze the f_polic_file_field table
        relation_analysis = analyze_f_polic_file_field(conn)
        # 3. Check other tables
        check_other_tables(conn)
        # 4. Ask whether to run the fix
        print("\n" + "=" * 80)
        print("Analysis complete")
        print("=" * 80)
        print("\nRun the fix?")
        print("1. DRY RUN first (does not modify the database)")
        print("2. Run the fix directly (modifies the database)")
        print("3. View the analysis only, do not fix")
        choice = input("\nChoose (1/2/3, default 1): ").strip() or "1"
        if choice == "1":
            # DRY RUN
            print("\n" + "=" * 80)
            print("Running DRY RUN...")
            print("=" * 80)
            fix_f_polic_field(conn, dry_run=True)
            fix_f_polic_file_field(conn, dry_run=True)
            print("\n" + "=" * 80)
            confirm = input("DRY RUN complete. Run the actual fix? (y/n, default n): ").strip().lower()
            if confirm == 'y':
                print("\nRunning the actual fix...")
                fix_f_polic_field(conn, dry_run=False)
                fix_f_polic_file_field(conn, dry_run=False)
                print("\n✓ Fix complete!")
        elif choice == "2":
            # Run directly
            print("\n" + "=" * 80)
            print("Running the fix...")
            print("=" * 80)
            fix_f_polic_field(conn, dry_run=False)
            fix_f_polic_file_field(conn, dry_run=False)
            print("\n✓ Fix complete!")
        else:
            print("\nViewed the analysis only; no fix was applied")
        conn.close()
    except Exception as e:
        print(f"\n✗ Execution failed: {e}")
        import traceback
        traceback.print_exc()

if __name__ == '__main__':
    main()


@ -0,0 +1,555 @@
"""
分析和更新模板树状结构
根据 template_finish 目录结构规划树状层级并更新数据库中的 parent_id 字段
"""
import os
import json
import pymysql
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from datetime import datetime
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
CREATED_BY = 655162080928945152
UPDATED_BY = 655162080928945152
CURRENT_TIME = datetime.now()
# 项目根目录
PROJECT_ROOT = Path(__file__).parent
TEMPLATES_DIR = PROJECT_ROOT / "template_finish"
# 从 init_all_templates.py 复制的文档类型映射
DOCUMENT_TYPE_MAPPING = {
    "1.请示报告卡XXX": {
        "template_code": "REPORT_CARD",
        "name": "1.请示报告卡XXX",
        "business_type": "INVESTIGATION"
    },
    "2.初步核实审批表XXX": {
        "template_code": "PRELIMINARY_VERIFICATION_APPROVAL",
        "name": "2.初步核实审批表XXX",
        "business_type": "INVESTIGATION"
    },
    "3.附件初核方案(XXX)": {
        "template_code": "INVESTIGATION_PLAN",
        "name": "3.附件初核方案(XXX)",
        "business_type": "INVESTIGATION"
    },
    "谈话通知书第一联": {
        "template_code": "NOTIFICATION_LETTER_1",
        "name": "谈话通知书第一联",
        "business_type": "INVESTIGATION"
    },
    "谈话通知书第二联": {
        "template_code": "NOTIFICATION_LETTER_2",
        "name": "谈话通知书第二联",
        "business_type": "INVESTIGATION"
    },
    "谈话通知书第三联": {
        "template_code": "NOTIFICATION_LETTER_3",
        "name": "谈话通知书第三联",
        "business_type": "INVESTIGATION"
    },
    "1.请示报告卡(初核谈话)": {
        "template_code": "REPORT_CARD_INTERVIEW",
        "name": "1.请示报告卡(初核谈话)",
        "business_type": "INVESTIGATION"
    },
    "2谈话审批表": {
        "template_code": "INTERVIEW_APPROVAL_FORM",
        "name": "2谈话审批表",
        "business_type": "INVESTIGATION"
    },
    "3.谈话前安全风险评估表": {
        "template_code": "PRE_INTERVIEW_RISK_ASSESSMENT",
        "name": "3.谈话前安全风险评估表",
        "business_type": "INVESTIGATION"
    },
    "4.谈话方案": {
        "template_code": "INTERVIEW_PLAN",
        "name": "4.谈话方案",
        "business_type": "INVESTIGATION"
    },
    "5.谈话后安全风险评估表": {
        "template_code": "POST_INTERVIEW_RISK_ASSESSMENT",
        "name": "5.谈话后安全风险评估表",
        "business_type": "INVESTIGATION"
    },
    "1.谈话笔录": {
        "template_code": "INTERVIEW_RECORD",
        "name": "1.谈话笔录",
        "business_type": "INVESTIGATION"
    },
    "2.谈话询问对象情况摸底调查30问": {
        "template_code": "INVESTIGATION_30_QUESTIONS",
        "name": "2.谈话询问对象情况摸底调查30问",
        "business_type": "INVESTIGATION"
    },
    "3.被谈话人权利义务告知书": {
        "template_code": "RIGHTS_OBLIGATIONS_NOTICE",
        "name": "3.被谈话人权利义务告知书",
        "business_type": "INVESTIGATION"
    },
    "4.点对点交接单": {
        "template_code": "HANDOVER_FORM",
        "name": "4.点对点交接单",
        "business_type": "INVESTIGATION"
    },
    "4.点对点交接单2": {
        "template_code": "HANDOVER_FORM_2",
        "name": "4.点对点交接单2",
        "business_type": "INVESTIGATION"
    },
    "5.陪送交接单(新)": {
        "template_code": "ESCORT_HANDOVER_FORM",
        "name": "5.陪送交接单(新)",
        "business_type": "INVESTIGATION"
    },
    "6.1保密承诺书(谈话对象使用-非中共党员用)": {
        "template_code": "CONFIDENTIALITY_COMMITMENT_NON_PARTY",
        "name": "6.1保密承诺书(谈话对象使用-非中共党员用)",
        "business_type": "INVESTIGATION"
    },
    "6.2保密承诺书(谈话对象使用-中共党员用)": {
        "template_code": "CONFIDENTIALITY_COMMITMENT_PARTY",
        "name": "6.2保密承诺书(谈话对象使用-中共党员用)",
        "business_type": "INVESTIGATION"
    },
    "7.办案人员-办案安全保密承诺书": {
        "template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT",
        "name": "7.办案人员-办案安全保密承诺书",
        "business_type": "INVESTIGATION"
    },
    "8-1请示报告卡初核报告结论 ": {
        "template_code": "REPORT_CARD_CONCLUSION",
        "name": "8-1请示报告卡初核报告结论 ",
        "business_type": "INVESTIGATION"
    },
    "8.XXX初核情况报告": {
        "template_code": "INVESTIGATION_REPORT",
        "name": "8.XXX初核情况报告",
        "business_type": "INVESTIGATION"
    }
}
def generate_id():
    """Generate an ID (timestamp plus a random part, loosely mimicking a snowflake ID)."""
    import time
    import random
    timestamp = int(time.time() * 1000)
    random_part = random.randint(100000, 999999)
    # Shift the timestamp by six digits so the six-digit random part
    # cannot bleed into the timestamp digits (the original * 1000 left
    # only three digits of room and corrupted the timestamp portion).
    return timestamp * 1000000 + random_part

def identify_document_type(file_name: str) -> Optional[Dict]:
    """Identify the document type from the full file name."""
    base_name = Path(file_name).stem
    if base_name in DOCUMENT_TYPE_MAPPING:
        return DOCUMENT_TYPE_MAPPING[base_name]
    return None
def scan_directory_structure(base_dir: Path) -> Dict:
    """
    Scan the directory structure and build the tree hierarchy.

    Returns:
        A dict describing the directory and file hierarchy.
    """
    structure = {
        'directories': {},  # {path: {'name': ..., 'parent': ..., 'level': ...}}
        'files': {}         # {file_path: {'name': ..., 'parent': ..., 'template_code': ...}}
    }

    def process_path(path: Path, parent_path: Optional[str] = None, level: int = 0):
        """Recursively process a path."""
        if path.is_file() and path.suffix == '.docx':
            # Handle a file
            file_name = path.stem
            doc_config = identify_document_type(file_name)
            structure['files'][str(path)] = {
                'name': file_name,
                'parent': parent_path,
                'level': level,
                'template_code': doc_config['template_code'] if doc_config else None,
                'full_path': str(path)
            }
        elif path.is_dir():
            # Handle a directory
            dir_name = path.name
            structure['directories'][str(path)] = {
                'name': dir_name,
                'parent': parent_path,
                'level': level
            }
            # Recurse into subdirectories and files
            for child in sorted(path.iterdir()):
                if child.name != '__pycache__':
                    process_path(child, str(path), level + 1)

    # Start scanning from the given root (was hard-coded to TEMPLATES_DIR,
    # ignoring the base_dir parameter)
    if base_dir.exists():
        for item in sorted(base_dir.iterdir()):
            if item.name != '__pycache__':
                process_path(item, None, 0)
    return structure
def get_existing_data(conn) -> Dict:
    """
    Fetch the existing data from the database.

    Returns:
        {
            'by_id': {id: {...}},
            'by_name': {name: {...}},
            'by_template_code': {template_code: {...}}
        }
    """
    cursor = conn.cursor(pymysql.cursors.DictCursor)
    sql = """
        SELECT id, name, parent_id, template_code, input_data, file_path, state
        FROM f_polic_file_config
        WHERE tenant_id = %s
    """
    cursor.execute(sql, (TENANT_ID,))
    configs = cursor.fetchall()

    result = {
        'by_id': {},
        'by_name': {},
        'by_template_code': {}
    }
    for config in configs:
        config_id = config['id']
        config_name = config['name']
        # Try to extract template_code from input_data
        template_code = config.get('template_code')
        if not template_code and config.get('input_data'):
            try:
                input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data']
                if isinstance(input_data, dict):
                    template_code = input_data.get('template_code')
            except (ValueError, TypeError):
                pass
        result['by_id'][config_id] = config
        result['by_name'][config_name] = config
        if template_code:
            # If the same template_code already exists, keep the first one
            if template_code not in result['by_template_code']:
                result['by_template_code'][template_code] = config
    cursor.close()
    return result
def analyze_structure():
    """Analyze the directory structure and the database data."""
    print("=" * 80)
    print("Analyzing the template directory structure and database data")
    print("=" * 80)

    # Connect to the database
    try:
        conn = pymysql.connect(**DB_CONFIG)
        print("✓ Database connection established\n")
    except Exception as e:
        print(f"✗ Database connection failed: {e}")
        return None, None

    # Scan the directory structure
    print("Scanning directory structure...")
    dir_structure = scan_directory_structure(TEMPLATES_DIR)
    print(f"  Found {len(dir_structure['directories'])} directories")
    print(f"  Found {len(dir_structure['files'])} files\n")

    # Fetch the existing database data
    print("Fetching existing database data...")
    existing_data = get_existing_data(conn)
    print(f"  The database holds {len(existing_data['by_id'])} records\n")

    # Find records without a parent_id
    print("Analyzing records without a parent_id...")
    missing_parent = []
    for config in existing_data['by_id'].values():
        if config.get('parent_id') is None:
            missing_parent.append(config)
    print(f"  {len(missing_parent)} records lack a parent_id\n")

    conn.close()
    return dir_structure, existing_data
def plan_tree_structure(dir_structure: Dict, existing_data: Dict) -> List[Dict]:
    """
    Plan the tree structure.

    Returns:
        A list of update-plan entries, each of the form:
        {
            'type': 'directory' | 'file',
            'name': ...,
            'parent_name': ...,
            'level': ...,
            'action': 'create' | 'update',
            'config_id': ... (for updates),
            'template_code': ... (for files)
        }
    """
    plan = []
    # Sort directories by level
    directories = sorted(dir_structure['directories'].items(),
                         key=lambda x: (x[1]['level'], x[0]))
    # Sort files by level
    files = sorted(dir_structure['files'].items(),
                   key=lambda x: (x[1]['level'], x[0]))
    # Map directory paths to config IDs, used to look up parent directory IDs
    dir_id_map = {}  # {dir_path: config_id}

    # Process directories (in level order)
    for dir_path, dir_info in directories:
        dir_name = dir_info['name']
        parent_path = dir_info['parent']
        level = dir_info['level']
        # Look up the parent directory ID
        parent_id = None
        if parent_path:
            parent_id = dir_id_map.get(parent_path)
        # Check whether the record already exists in the database
        existing = existing_data['by_name'].get(dir_name)
        if existing:
            # Update the existing record
            plan.append({
                'type': 'directory',
                'name': dir_name,
                'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
                'parent_id': parent_id,
                'level': level,
                'action': 'update',
                'config_id': existing['id'],
                'current_parent_id': existing.get('parent_id')
            })
            dir_id_map[dir_path] = existing['id']
        else:
            # Create a new record (directory node)
            new_id = generate_id()
            plan.append({
                'type': 'directory',
                'name': dir_name,
                'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
                'parent_id': parent_id,
                'level': level,
                'action': 'create',
                'config_id': new_id,
                'current_parent_id': None
            })
            dir_id_map[dir_path] = new_id

    # Process files
    for file_path, file_info in files:
        file_name = file_info['name']
        parent_path = file_info['parent']
        level = file_info['level']
        template_code = file_info['template_code']
        # Look up the parent directory ID
        parent_id = dir_id_map.get(parent_path) if parent_path else None
        # Look up the database record (by template_code or by name)
        existing = None
        if template_code:
            existing = existing_data['by_template_code'].get(template_code)
        if not existing:
            existing = existing_data['by_name'].get(file_name)
        if existing:
            # Update the existing record
            plan.append({
                'type': 'file',
                'name': file_name,
                'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
                'parent_id': parent_id,
                'level': level,
                'action': 'update',
                'config_id': existing['id'],
                'template_code': template_code,
                'current_parent_id': existing.get('parent_id')
            })
        else:
            # Create a new record (file node)
            new_id = generate_id()
            plan.append({
                'type': 'file',
                'name': file_name,
                'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
                'parent_id': parent_id,
                'level': level,
                'action': 'create',
                'config_id': new_id,
                'template_code': template_code,
                'current_parent_id': None
            })
    return plan
def generate_update_sql(plan: List[Dict], output_file: str = 'update_template_tree.sql'):
"""生成更新SQL脚本"""
sql_lines = [
"-- 模板树状结构更新脚本",
f"-- 生成时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
"-- 注意:执行前请备份数据库!",
"",
"USE finyx;",
"",
"START TRANSACTION;",
""
]
# 按层级分组
by_level = {}
for item in plan:
level = item['level']
if level not in by_level:
by_level[level] = []
by_level[level].append(item)
# 按层级顺序处理(从顶层到底层)
for level in sorted(by_level.keys()):
sql_lines.append(f"-- ===== 层级 {level} =====")
sql_lines.append("")
for item in by_level[level]:
if item['action'] == 'create':
# 创建新记录
if item['type'] == 'directory':
sql_lines.append(f"-- 创建目录节点: {item['name']}")
sql_lines.append(f"INSERT INTO f_polic_file_config")
sql_lines.append(f" (id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state)")
parent_id_sql = f"{item['parent_id']}" if item['parent_id'] else "NULL"
sql_lines.append(f"VALUES ({item['config_id']}, {TENANT_ID}, {parent_id_sql}, '{item['name']}', NULL, NULL, NOW(), {CREATED_BY}, NOW(), {UPDATED_BY}, 1);")
else:
# 文件节点(需要 template_code
sql_lines.append(f"-- 创建文件节点: {item['name']}")
input_data = json.dumps({
'template_code': item.get('template_code', ''),
'business_type': 'INVESTIGATION'
}, ensure_ascii=False).replace("'", "''")
sql_lines.append(f"INSERT INTO f_polic_file_config")
sql_lines.append(f" (id, tenant_id, parent_id, name, input_data, file_path, template_code, created_time, created_by, updated_time, updated_by, state)")
parent_id_sql = f"{item['parent_id']}" if item['parent_id'] else "NULL"
template_code_sql = f"'{item.get('template_code', '')}'" if item.get('template_code') else "NULL"
sql_lines.append(f"VALUES ({item['config_id']}, {TENANT_ID}, {parent_id_sql}, '{item['name']}', '{input_data}', NULL, {template_code_sql}, NOW(), {CREATED_BY}, NOW(), {UPDATED_BY}, 1);")
sql_lines.append("")
else:
# 更新现有记录
current_parent = item.get('current_parent_id')
new_parent = item.get('parent_id')
if current_parent != new_parent:
sql_lines.append(f"-- 更新: {item['name']} (parent_id: {current_parent} -> {new_parent})")
parent_id_sql = f"{new_parent}" if new_parent else "NULL"
sql_lines.append(f"UPDATE f_polic_file_config")
sql_lines.append(f"SET parent_id = {parent_id_sql}, updated_time = NOW(), updated_by = {UPDATED_BY}")
sql_lines.append(f"WHERE id = {item['config_id']} AND tenant_id = {TENANT_ID};")
sql_lines.append("")
sql_lines.append("COMMIT;")
sql_lines.append("")
sql_lines.append("-- 更新完成")
# 写入文件
with open(output_file, 'w', encoding='utf-8') as f:
f.write('\n'.join(sql_lines))
print(f"✓ SQL脚本已生成: {output_file}")
return output_file
def print_analysis_report(dir_structure: Dict, existing_data: Dict, plan: List[Dict]):
"""打印分析报告"""
print("\n" + "="*80)
print("分析报告")
print("="*80)
print(f"\n目录结构:")
print(f" - 目录数量: {len(dir_structure['directories'])}")
print(f" - 文件数量: {len(dir_structure['files'])}")
print(f"\n数据库现状:")
print(f" - 总记录数: {len(existing_data['by_id'])}")
missing_parent = sum(1 for c in existing_data['by_id'].values() if c.get('parent_id') is None)
print(f" - 缺少 parent_id 的记录: {missing_parent}")
print(f"\n更新计划:")
create_count = sum(1 for p in plan if p['action'] == 'create')
update_count = sum(1 for p in plan if p['action'] == 'update')
print(f" - 需要创建: {create_count}")
print(f" - 需要更新: {update_count}")
print(f"\n层级分布:")
by_level = {}
for item in plan:
level = item['level']
by_level[level] = by_level.get(level, 0) + 1
for level in sorted(by_level.keys()):
print(f" - 层级 {level}: {by_level[level]} 个节点")
print("\n" + "="*80)
def main():
"""主函数"""
# 分析
dir_structure, existing_data = analyze_structure()
if not dir_structure or not existing_data:
return
# 规划树状结构
print("规划树状结构...")
plan = plan_tree_structure(dir_structure, existing_data)
print(f" 生成 {len(plan)} 个更新计划\n")
# 打印报告
print_analysis_report(dir_structure, existing_data, plan)
# 生成SQL脚本
print("\n生成SQL更新脚本...")
sql_file = generate_update_sql(plan)
print("\n" + "="*80)
print("分析完成!")
print("="*80)
print(f"\n请检查生成的SQL脚本: {sql_file}")
print("确认无误后,可以执行该脚本更新数据库。")
print("\n注意:执行前请备份数据库!")
if __name__ == '__main__':
main()
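The shallowest-first sort in `plan_tree_structure` is what lets `dir_id_map` resolve every child's `parent_id` in a single pass: a directory is always planned (and given an id) before any of its children. A minimal sketch of that pattern — the paths and the id counter below are illustrative, not real data:

```python
# Parent resolution via shallowest-first ordering: each directory gets an
# id, and children look their parent's id up in dir_id_map afterwards.
dirs = {
    "2-初核模版": {"name": "2-初核模版", "parent": None, "level": 1},
    "2-初核模版/1.初核请示": {"name": "1.初核请示", "parent": "2-初核模版", "level": 2},
}
dir_id_map = {}
plan = []
next_id = 100  # stand-in for generate_id()
for path, info in sorted(dirs.items(), key=lambda x: (x[1]["level"], x[0])):
    parent_id = dir_id_map.get(info["parent"]) if info["parent"] else None
    plan.append({"name": info["name"], "parent_id": parent_id, "id": next_id})
    dir_id_map[path] = next_id
    next_id += 1

# the child's parent_id now points at the id assigned to its parent
print(plan[1]["parent_id"] == plan[0]["id"])  # True
```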

backup_database.py Normal file

@ -0,0 +1,314 @@
"""
数据库备份脚本
支持使用mysqldump命令或Python直接导出SQL文件
"""
import os
import sys
import subprocess
import pymysql
from datetime import datetime
from pathlib import Path
from dotenv import load_dotenv
# 加载环境变量
load_dotenv()
class DatabaseBackup:
"""数据库备份类"""
def __init__(self):
"""初始化数据库配置"""
self.db_config = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
# 备份文件存储目录
self.backup_dir = Path('backups')
self.backup_dir.mkdir(exist_ok=True)
def backup_with_mysqldump(self, output_file=None, compress=False):
"""
使用mysqldump命令备份数据库推荐方式
Args:
output_file: 输出文件路径如果为None则自动生成
compress: 是否压缩备份文件
Returns:
备份文件路径
"""
# 生成备份文件名
if output_file is None:
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
output_file = self.backup_dir / f"backup_{self.db_config['database']}_{timestamp}.sql"
output_file = Path(output_file)
# 构建mysqldump命令
cmd = [
'mysqldump',
f"--host={self.db_config['host']}",
f"--port={self.db_config['port']}",
f"--user={self.db_config['user']}",
f"--password={self.db_config['password']}",
'--single-transaction', # 保证数据一致性
'--routines', # 包含存储过程和函数
'--triggers', # 包含触发器
'--events', # 包含事件
'--add-drop-table', # 添加DROP TABLE语句
'--default-character-set=utf8mb4', # 设置字符集
self.db_config['database']
]
try:
print(f"开始备份数据库 {self.db_config['database']}...")
print(f"备份文件: {output_file}")
# 执行备份命令
with open(output_file, 'w', encoding='utf-8') as f:
result = subprocess.run(
cmd,
stdout=f,
stderr=subprocess.PIPE,
text=True
)
if result.returncode != 0:
# text=True already decoded stderr to str; calling .decode() here would raise
error_msg = result.stderr if result.stderr else '未知错误'
raise Exception(f"mysqldump执行失败: {error_msg}")
# 检查文件大小
file_size = output_file.stat().st_size
print(f"备份完成!文件大小: {file_size / 1024 / 1024:.2f} MB")
# 如果需要压缩
if compress:
compressed_file = self._compress_file(output_file)
print(f"压缩完成: {compressed_file}")
return str(compressed_file)
return str(output_file)
except FileNotFoundError:
print("错误: 未找到mysqldump命令请确保MySQL客户端已安装并在PATH中")
print("尝试使用Python方式备份...")
return self.backup_with_python(output_file)
except Exception as e:
print(f"备份失败: {str(e)}")
raise
def backup_with_python(self, output_file=None):
"""
使用Python直接连接数据库备份备用方式
Args:
output_file: 输出文件路径如果为None则自动生成
Returns:
备份文件路径
"""
if output_file is None:
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
output_file = self.backup_dir / f"backup_{self.db_config['database']}_{timestamp}.sql"
output_file = Path(output_file)
try:
print(f"开始使用Python方式备份数据库 {self.db_config['database']}...")
print(f"备份文件: {output_file}")
# 连接数据库
connection = pymysql.connect(**self.db_config)
cursor = connection.cursor()
with open(output_file, 'w', encoding='utf-8') as f:
# 写入文件头
f.write(f"-- MySQL数据库备份\n")
f.write(f"-- 数据库: {self.db_config['database']}\n")
f.write(f"-- 备份时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
f.write(f"-- 主机: {self.db_config['host']}:{self.db_config['port']}\n")
f.write("--\n\n")
f.write(f"SET NAMES utf8mb4;\n")
f.write(f"SET FOREIGN_KEY_CHECKS=0;\n\n")
# 获取所有表
cursor.execute("SHOW TABLES")
tables = [table[0] for table in cursor.fetchall()]
print(f"找到 {len(tables)} 个表")
# 备份每个表
for table in tables:
print(f"备份表: {table}")
# 获取表结构
cursor.execute(f"SHOW CREATE TABLE `{table}`")
create_table_sql = cursor.fetchone()[1]
f.write(f"-- ----------------------------\n")
f.write(f"-- 表结构: {table}\n")
f.write(f"-- ----------------------------\n")
f.write(f"DROP TABLE IF EXISTS `{table}`;\n")
f.write(f"{create_table_sql};\n\n")
# 获取表数据
cursor.execute(f"SELECT * FROM `{table}`")
rows = cursor.fetchall()
if rows:
# 获取列名
cursor.execute(f"DESCRIBE `{table}`")
columns = [col[0] for col in cursor.fetchall()]
f.write(f"-- ----------------------------\n")
f.write(f"-- 表数据: {table}\n")
f.write(f"-- ----------------------------\n")
# 分批写入数据
batch_size = 1000
for i in range(0, len(rows), batch_size):
batch = rows[i:i+batch_size]
values_list = []
for row in batch:
values = []
for value in row:
if value is None:
values.append('NULL')
elif isinstance(value, (int, float)):
values.append(str(value))
elif isinstance(value, (bytes, bytearray)):
# 二进制列输出为十六进制字面量,避免字符转义问题
values.append('0x' + value.hex() if value else "''")
else:
# 转义特殊字符
escaped_value = str(value).replace('\\', '\\\\').replace("'", "\\'")
values.append(f"'{escaped_value}'")
values_list.append(f"({', '.join(values)})")
columns_str = ', '.join([f"`{col}`" for col in columns])
values_str = ',\n'.join(values_list)
f.write(f"INSERT INTO `{table}` ({columns_str}) VALUES\n")
f.write(f"{values_str};\n\n")
print(f" 完成: {len(rows)} 条记录")
f.write("SET FOREIGN_KEY_CHECKS=1;\n")
cursor.close()
connection.close()
# 检查文件大小
file_size = output_file.stat().st_size
print(f"备份完成!文件大小: {file_size / 1024 / 1024:.2f} MB")
return str(output_file)
except Exception as e:
print(f"备份失败: {str(e)}")
raise
def _compress_file(self, file_path):
"""
压缩备份文件
Args:
file_path: 文件路径
Returns:
压缩后的文件路径
"""
import gzip
file_path = Path(file_path)
compressed_path = file_path.with_suffix('.sql.gz')
with open(file_path, 'rb') as f_in:
with gzip.open(compressed_path, 'wb') as f_out:
f_out.writelines(f_in)
# 删除原文件
file_path.unlink()
return compressed_path
def list_backups(self):
"""
列出所有备份文件
Returns:
备份文件列表
"""
backups = []
for file in sorted(self.backup_dir.glob('backup_*.sql*'), reverse=True):
file_info = {
'filename': file.name,
'path': str(file),
'size': file.stat().st_size,
'size_mb': file.stat().st_size / 1024 / 1024,
'modified': datetime.fromtimestamp(file.stat().st_mtime)
}
backups.append(file_info)
return backups
def main():
"""主函数"""
import argparse
parser = argparse.ArgumentParser(description='数据库备份工具')
parser.add_argument('--method', choices=['mysqldump', 'python', 'auto'],
default='auto', help='备份方法 (默认: auto)')
parser.add_argument('--output', '-o', help='输出文件路径')
parser.add_argument('--compress', '-c', action='store_true',
help='压缩备份文件')
parser.add_argument('--list', '-l', action='store_true',
help='列出所有备份文件')
args = parser.parse_args()
backup = DatabaseBackup()
# 列出备份文件
if args.list:
backups = backup.list_backups()
if backups:
print(f"\n找到 {len(backups)} 个备份文件:\n")
print(f"{'文件名':<50} {'大小(MB)':<15} {'修改时间':<20}")
print("-" * 85)
for b in backups:
print(f"{b['filename']:<50} {b['size_mb']:<15.2f} {b['modified'].strftime('%Y-%m-%d %H:%M:%S'):<20}")
else:
print("未找到备份文件")
return
# 执行备份
try:
if args.method == 'mysqldump':
backup_file = backup.backup_with_mysqldump(args.output, args.compress)
elif args.method == 'python':
backup_file = backup.backup_with_python(args.output)
else: # auto
try:
backup_file = backup.backup_with_mysqldump(args.output, args.compress)
except Exception:
print("\nmysqldump方式失败切换到Python方式...")
backup_file = backup.backup_with_python(args.output)
print(f"\n备份成功!")
print(f"备份文件: {backup_file}")
except Exception as e:
print(f"\n备份失败: {str(e)}")
sys.exit(1)
if __name__ == '__main__':
main()
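`backup_with_python` serializes each cell into a SQL literal inline; the same logic can be factored into a small helper (a sketch — `sql_literal` is not part of the script itself):

```python
def sql_literal(value):
    """Render one cell the way backup_with_python does: NULL for None,
    bare numerals, and backslash/quote escaping for everything else."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"

row = (1, None, "O'Brien")
print(", ".join(sql_literal(v) for v in row))  # 1, NULL, 'O\'Brien'
```

Note this MySQL-style `\'` escaping assumes `NO_BACKSLASH_ESCAPES` is off on the server.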


check_existing_data.py Normal file

@ -0,0 +1,105 @@
"""
检查数据库中的现有数据确认匹配情况
"""
import os
import json
import pymysql
from pathlib import Path
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
def check_existing_data():
"""检查数据库中的现有数据"""
print("="*80)
print("检查数据库中的现有数据")
print("="*80)
try:
conn = pymysql.connect(**DB_CONFIG)
cursor = conn.cursor(pymysql.cursors.DictCursor)
# 查询所有记录
sql = """
SELECT id, name, parent_id, template_code, input_data, file_path, state
FROM f_polic_file_config
WHERE tenant_id = %s
ORDER BY name
"""
cursor.execute(sql, (TENANT_ID,))
configs = cursor.fetchall()
print(f"\n共找到 {len(configs)} 条记录\n")
# 按 parent_id 分组统计
with_parent = []
without_parent = []
for config in configs:
# 尝试从 input_data 中提取 template_code
template_code = config.get('template_code')
if not template_code and config.get('input_data'):
try:
input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data']
if isinstance(input_data, dict):
template_code = input_data.get('template_code')
except (TypeError, ValueError):
pass
config['extracted_template_code'] = template_code
if config.get('parent_id'):
with_parent.append(config)
else:
without_parent.append(config)
print(f"有 parent_id 的记录: {len(with_parent)}")
print(f"无 parent_id 的记录: {len(without_parent)}\n")
# 显示无 parent_id 的记录
print("="*80)
print("无 parent_id 的记录列表:")
print("="*80)
for i, config in enumerate(without_parent, 1):
print(f"\n{i}. {config['name']}")
print(f" ID: {config['id']}")
print(f" template_code: {config.get('extracted_template_code') or config.get('template_code') or ''}")
print(f" file_path: {config.get('file_path', '')}")
print(f" state: {config.get('state')}")
# 显示有 parent_id 的记录(树状结构)
print("\n" + "="*80)
print("有 parent_id 的记录(树状结构):")
print("="*80)
# 构建ID到名称的映射
id_to_name = {config['id']: config['name'] for config in configs}
for config in with_parent:
parent_name = id_to_name.get(config['parent_id'], f"ID:{config['parent_id']}")
print(f"\n{config['name']}")
print(f" ID: {config['id']}")
print(f" 父节点: {parent_name} (ID: {config['parent_id']})")
print(f" template_code: {config.get('extracted_template_code') or config.get('template_code') or ''}")
cursor.close()
conn.close()
except Exception as e:
print(f"错误: {e}")
import traceback
traceback.print_exc()
if __name__ == '__main__':
check_existing_data()
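The `input_data` parsing above has to tolerate a JSON string, an already-decoded dict, or malformed content. Factored out, the pattern looks like this (sketch; the helper name and the sample code value are ours):

```python
import json

def extract_template_code(input_data):
    """Return template_code from input_data, which may be a JSON string,
    an already-decoded dict, or malformed/absent."""
    if not input_data:
        return None
    try:
        data = json.loads(input_data) if isinstance(input_data, str) else input_data
    except (TypeError, ValueError):
        return None
    if isinstance(data, dict):
        return data.get("template_code")
    return None

print(extract_template_code('{"template_code": "QSBGK"}'))  # QSBGK
```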

check_remaining_fields.py Normal file

@ -0,0 +1,131 @@
"""
检查剩余的未处理字段并生成合适的field_code
"""
import os
import pymysql
import re
from typing import Dict, List
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
def is_chinese(text: str) -> bool:
"""判断字符串是否包含中文字符"""
if not text:
return False
return bool(re.search(r'[\u4e00-\u9fff]', text))
def generate_field_code(field_name: str) -> str:
"""根据字段名称生成field_code"""
# 移除常见前缀
name = field_name.replace('被核查人员', 'target_').replace('被核查人', 'target_')
# 转换为小写并替换特殊字符
code = name.lower()
code = re.sub(r'[^\w\u4e00-\u9fff]', '_', code)
code = re.sub(r'_+', '_', code).strip('_')
# 如果还是中文,尝试更智能的转换
if is_chinese(code):
# 简单的拼音映射(这里只是示例,实际应该使用拼音库)
# 暂时使用更简单的规则
code = field_name.lower()
code = code.replace('被核查人员', 'target_')
code = code.replace('被核查人', 'target_')
code = code.replace('谈话', 'interview_')
code = code.replace('审批', 'approval_')
code = code.replace('核查', 'investigation_')
code = code.replace('人员', '')
code = code.replace('时间', '_time')
code = code.replace('地点', '_location')
code = code.replace('部门', '_department')
code = code.replace('姓名', '_name')
code = code.replace('号码', '_number')
code = code.replace('情况', '_situation')
code = code.replace('问题', '_issue')
code = code.replace('描述', '_description')
code = re.sub(r'[^\w]', '_', code)
code = re.sub(r'_+', '_', code).strip('_')
return code
def check_remaining_fields():
"""检查剩余的未处理字段"""
conn = pymysql.connect(**DB_CONFIG)
cursor = conn.cursor(pymysql.cursors.DictCursor)
print("="*80)
print("检查剩余的未处理字段")
print("="*80)
# 查询所有包含中文field_code的字段
cursor.execute("""
SELECT id, name, filed_code, field_type, state
FROM f_polic_field
WHERE tenant_id = %s AND (
-- MySQL 8 (ICU) regex; backslashes doubled so \u survives the SQL string parser
filed_code REGEXP '[\\\\u4e00-\\\\u9fff]'
OR filed_code IS NULL
OR filed_code = ''
)
ORDER BY name
""", (TENANT_ID,))
fields = cursor.fetchall()
print(f"\n找到 {len(fields)} 个仍需要处理的字段:\n")
suggestions = []
for field in fields:
suggested_code = generate_field_code(field['name'])
suggestions.append({
'id': field['id'],
'name': field['name'],
'current_code': field['filed_code'],
'suggested_code': suggested_code,
'field_type': field['field_type']
})
print(f" ID: {field['id']}")
print(f" 名称: {field['name']}")
print(f" 当前field_code: {field['filed_code']}")
print(f" 建议field_code: {suggested_code}")
print(f" field_type: {field['field_type']}")
print()
# 询问是否更新
if suggestions:
print("="*80)
choice = input("是否更新这些字段的field_code(y/n默认n): ").strip().lower()
if choice == 'y':
print("\n开始更新...")
for sug in suggestions:
cursor.execute("""
UPDATE f_polic_field
SET filed_code = %s, updated_time = NOW(), updated_by = %s
WHERE id = %s
""", (sug['suggested_code'], 655162080928945152, sug['id']))
print(f" ✓ 更新字段 ID {sug['id']}: {sug['name']} -> {sug['suggested_code']}")
conn.commit()
print("\n✓ 更新完成")
else:
print("未执行更新")
cursor.close()
conn.close()
if __name__ == '__main__':
check_remaining_fields()
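The regex pipeline at the end of `generate_field_code` (strip non-word characters, collapse runs of underscores, trim the edges) does most of the cleanup work. Isolated as a sketch:

```python
import re

def normalize_code(raw: str) -> str:
    """Collapse a raw candidate into clean snake_case, mirroring the
    final two regex passes in generate_field_code."""
    code = re.sub(r'[^\w]', '_', raw.lower())   # non-word chars -> _
    code = re.sub(r'_+', '_', code).strip('_')  # collapse and trim
    return code

print(normalize_code("Target--Name (v2)"))  # target_name_v2
```

Note that `\w` in Python 3 matches CJK characters too, which is why purely Chinese names pass through this stage unchanged and need the mapping table instead.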


@ -0,0 +1,260 @@
"""
修复缺少字段关联的模板
为有 template_code 但没有字段关联的文件节点补充字段关联
"""
import os
import json
import pymysql
from typing import Dict, List
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
CREATED_BY = 655162080928945152
UPDATED_BY = 655162080928945152
def generate_id():
"""生成ID毫秒时间戳 + 6位随机数"""
import time
import random
timestamp = int(time.time() * 1000)
random_part = random.randint(100000, 999999)
# 预留完整6位给随机数避免相邻毫秒生成的ID区间互相重叠
return timestamp * 1000000 + random_part
def get_templates_without_relations(conn):
"""获取没有字段关联的文件节点"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
sql = """
SELECT
fc.id,
fc.name,
fc.template_code,
fc.input_data,
COUNT(ff.id) as relation_count
FROM f_polic_file_config fc
LEFT JOIN f_polic_file_field ff ON fc.id = ff.file_id AND ff.tenant_id = fc.tenant_id
WHERE fc.tenant_id = %s
AND fc.template_code IS NOT NULL
AND fc.template_code != ''
GROUP BY fc.id, fc.name, fc.template_code, fc.input_data
HAVING relation_count = 0
ORDER BY fc.name
"""
cursor.execute(sql, (TENANT_ID,))
templates = cursor.fetchall()
cursor.close()
return templates
def get_fields_by_code(conn):
"""获取所有字段,按字段编码索引"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
sql = """
SELECT id, name, filed_code, field_type
FROM f_polic_field
WHERE tenant_id = %s
"""
cursor.execute(sql, (TENANT_ID,))
fields = cursor.fetchall()
result = {
'by_code': {},
'by_name': {}
}
for field in fields:
field_code = field['filed_code']
field_name = field['name']
result['by_code'][field_code] = field
result['by_name'][field_name] = field
cursor.close()
return result
def extract_fields_from_input_data(input_data: str) -> List[str]:
"""从 input_data 中提取字段编码列表"""
try:
data = json.loads(input_data) if isinstance(input_data, str) else input_data
if isinstance(data, dict):
return data.get('input_fields', [])
except (TypeError, ValueError):
pass
return []
def create_field_relations(conn, file_id: int, field_codes: List[str], field_type: int,
db_fields: Dict, dry_run: bool = True):
"""创建字段关联关系"""
cursor = conn.cursor()
try:
created_count = 0
for field_code in field_codes:
field = db_fields['by_code'].get(field_code)
if not field:
print(f" ⚠ 字段不存在: {field_code}")
continue
if field['field_type'] != field_type:
print(f" ⚠ 字段类型不匹配: {field_code} (期望 {field_type}, 实际 {field['field_type']})")
continue
if not dry_run:
# 检查是否已存在
check_sql = """
SELECT id FROM f_polic_file_field
WHERE tenant_id = %s AND file_id = %s AND filed_id = %s
"""
cursor.execute(check_sql, (TENANT_ID, file_id, field['id']))
existing = cursor.fetchone()
if not existing:
relation_id = generate_id()
insert_sql = """
INSERT INTO f_polic_file_field
(id, tenant_id, file_id, filed_id, created_time, created_by, updated_time, updated_by, state)
VALUES (%s, %s, %s, %s, NOW(), %s, NOW(), %s, %s)
"""
cursor.execute(insert_sql, (
relation_id, TENANT_ID, file_id, field['id'],
CREATED_BY, UPDATED_BY, 1
))
created_count += 1
print(f" ✓ 创建关联: {field['name']} ({field_code})")
else:
created_count += 1
print(f" [模拟] 将创建关联: {field_code}")
if not dry_run:
conn.commit()
return created_count
finally:
cursor.close()
def main():
"""主函数"""
print("="*80)
print("修复缺少字段关联的模板")
print("="*80)
try:
conn = pymysql.connect(**DB_CONFIG)
print("✓ 数据库连接成功\n")
except Exception as e:
print(f"✗ 数据库连接失败: {e}")
return
try:
# 获取没有字段关联的模板
print("查找缺少字段关联的模板...")
templates = get_templates_without_relations(conn)
print(f" 找到 {len(templates)} 个缺少字段关联的文件节点\n")
if not templates:
print("✓ 所有文件节点都有字段关联,无需修复")
return
# 获取所有字段
print("获取字段定义...")
db_fields = get_fields_by_code(conn)
print(f" 找到 {len(db_fields['by_code'])} 个字段\n")
# 显示需要修复的模板
print("需要修复的模板:")
for template in templates:
print(f" - {template['name']} (code: {template['template_code']})")
# 尝试从 input_data 中提取字段
print("\n" + "="*80)
print("分析并修复")
print("="*80)
fixable_count = 0
unfixable_count = 0
for template in templates:
print(f"\n处理: {template['name']}")
print(f" template_code: {template['template_code']}")
input_data = template.get('input_data')
if not input_data:
print(" ⚠ 没有 input_data无法自动修复")
unfixable_count += 1
continue
# 从 input_data 中提取输入字段
input_fields = extract_fields_from_input_data(input_data)
if not input_fields:
print(" ⚠ input_data 中没有 input_fields无法自动修复")
unfixable_count += 1
continue
print(f" 找到 {len(input_fields)} 个输入字段")
fixable_count += 1
# 创建输入字段关联
print(" 创建输入字段关联...")
created = create_field_relations(conn, template['id'], input_fields, 1, db_fields, dry_run=True)
print(f" 将创建 {created} 个输入字段关联")
print("\n" + "="*80)
print("统计")
print("="*80)
print(f" 可修复: {fixable_count}")
print(f" 无法自动修复: {unfixable_count}")
# 询问是否执行
if fixable_count > 0:
print("\n" + "="*80)
response = input("\n是否执行修复?(yes/no默认no): ").strip().lower()
if response == 'yes':
print("\n执行修复...")
for template in templates:
input_data = template.get('input_data')
if not input_data:
continue
input_fields = extract_fields_from_input_data(input_data)
if not input_fields:
continue
print(f"\n修复: {template['name']}")
create_field_relations(conn, template['id'], input_fields, 1, db_fields, dry_run=False)
print("\n" + "="*80)
print("✓ 修复完成!")
print("="*80)
else:
print("\n已取消修复")
else:
print("\n没有可以自动修复的模板")
finally:
conn.close()
print("\n数据库连接已关闭")
if __name__ == '__main__':
main()
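The `LEFT JOIN … GROUP BY … HAVING relation_count = 0` query is the core of the orphan detection: config rows with no matching relation row survive the join with a zero count. The same shape can be tried against an in-memory SQLite database (sketch with toy table names, not the real schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE file_config (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE file_field (id INTEGER PRIMARY KEY, file_id INTEGER);
INSERT INTO file_config VALUES (1, 'with relation'), (2, 'orphan');
INSERT INTO file_field VALUES (10, 1);
""")
rows = con.execute("""
SELECT fc.id, fc.name, COUNT(ff.id) AS relation_count
FROM file_config fc
LEFT JOIN file_field ff ON fc.id = ff.file_id
GROUP BY fc.id, fc.name
HAVING relation_count = 0
""").fetchall()
print(rows)  # [(2, 'orphan', 0)]
```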


@ -0,0 +1,201 @@
"""
只修复真正包含中文的field_code字段
"""
import os
import pymysql
import re
from typing import Dict
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
UPDATED_BY = 655162080928945152
# 字段名称到field_code的映射针对剩余的中文字段
FIELD_MAPPING = {
# 谈话相关字段
'拟谈话地点': 'proposed_interview_location',
'拟谈话时间': 'proposed_interview_time',
'谈话事由': 'interview_reason',
'谈话人': 'interviewer',
'谈话人员-安全员': 'interview_personnel_safety_officer',
'谈话人员-组长': 'interview_personnel_leader',
'谈话人员-谈话人员': 'interview_personnel',
'谈话前安全风险评估结果': 'pre_interview_risk_assessment_result',
'谈话地点': 'interview_location',
'谈话次数': 'interview_count',
# 被核查人员相关字段
'被核查人单位及职务': 'target_organization_and_position', # 注意:这个和"被核查人员单位及职务"应该是同一个
'被核查人员交代问题程度': 'target_confession_level',
'被核查人员减压后的表现': 'target_behavior_after_relief',
'被核查人员学历': 'target_education', # 注意:这个和"被核查人员文化程度"可能不同
'被核查人员工作履历': 'target_work_history',
'被核查人员思想负担程度': 'target_mental_burden_level',
'被核查人员职业': 'target_occupation',
'被核查人员谈话中的表现': 'target_behavior_during_interview',
'被核查人员问题严重程度': 'target_issue_severity_level',
'被核查人员风险等级': 'target_risk_level',
'被核查人基本情况': 'target_basic_info',
# 其他字段
'补空人员': 'backup_personnel',
'记录人': 'recorder',
'评估意见': 'assessment_opinion',
}
def is_chinese(text: str) -> bool:
"""判断字符串是否完全或主要包含中文字符"""
if not text:
return False
# 中文字符占比超过30%即认为是中文
chinese_chars = len(re.findall(r'[\u4e00-\u9fff]', text))
total_chars = len(text)
if total_chars == 0:
return False
return chinese_chars / total_chars > 0.3
def fix_chinese_fields(dry_run: bool = True):
"""修复包含中文的field_code字段"""
conn = pymysql.connect(**DB_CONFIG)
cursor = conn.cursor(pymysql.cursors.DictCursor)
print("="*80)
print("修复包含中文的field_code字段")
print("="*80)
if dry_run:
print("\n[DRY RUN模式 - 不会实际修改数据库]")
# 查询所有字段
cursor.execute("""
SELECT id, name, filed_code, field_type, state
FROM f_polic_field
WHERE tenant_id = %s
ORDER BY name
""", (TENANT_ID,))
all_fields = cursor.fetchall()
# 找出field_code包含中文的字段
chinese_fields = []
for field in all_fields:
if field['filed_code'] and is_chinese(field['filed_code']):
chinese_fields.append(field)
print(f"\n找到 {len(chinese_fields)} 个field_code包含中文的字段:\n")
updates = []
for field in chinese_fields:
field_name = field['name']
new_code = FIELD_MAPPING.get(field_name)
if not new_code:
# 如果没有映射生成一个基于名称的code
new_code = field_name.lower()
new_code = new_code.replace('被核查人员', 'target_').replace('被核查人', 'target_')
new_code = new_code.replace('谈话', 'interview_')
new_code = new_code.replace('人员', '')
new_code = new_code.replace('时间', '_time')
new_code = new_code.replace('地点', '_location')
new_code = new_code.replace('问题', '_issue')
new_code = new_code.replace('情况', '_situation')
new_code = new_code.replace('程度', '_level')
new_code = new_code.replace('表现', '_behavior')
new_code = new_code.replace('等级', '_level')
new_code = new_code.replace('履历', '_history')
new_code = new_code.replace('学历', '_education')
new_code = new_code.replace('职业', '_occupation')
new_code = new_code.replace('事由', '_reason')
new_code = new_code.replace('次数', '_count')
new_code = new_code.replace('结果', '_result')
new_code = new_code.replace('意见', '_opinion')
new_code = re.sub(r'[^\w]', '_', new_code)
new_code = re.sub(r'_+', '_', new_code).strip('_')
new_code = new_code.replace('__', '_')
updates.append({
'id': field['id'],
'name': field_name,
'old_code': field['filed_code'],
'new_code': new_code,
'field_type': field['field_type']
})
print(f" ID: {field['id']}")
print(f" 名称: {field_name}")
print(f" 当前field_code: {field['filed_code']}")
print(f" 新field_code: {new_code}")
print()
# 检查是否有重复的new_code
code_to_fields = {}
for update in updates:
code = update['new_code']
if code not in code_to_fields:
code_to_fields[code] = []
code_to_fields[code].append(update)
duplicate_codes = {code: fields_list for code, fields_list in code_to_fields.items()
if len(fields_list) > 1}
if duplicate_codes:
print("\n⚠ 警告以下field_code会重复:")
for code, fields_list in duplicate_codes.items():
print(f" field_code: {code}")
for field in fields_list:
print(f" - ID: {field['id']}, 名称: {field['name']}")
print()
# 执行更新
if not dry_run:
print("开始执行更新...\n")
for update in updates:
cursor.execute("""
UPDATE f_polic_field
SET filed_code = %s, updated_time = NOW(), updated_by = %s
WHERE id = %s
""", (update['new_code'], UPDATED_BY, update['id']))
print(f" ✓ 更新字段 ID {update['id']}: {update['name']}")
print(f" {update['old_code']} -> {update['new_code']}")
conn.commit()
print("\n✓ 更新完成")
else:
print("[DRY RUN] 以上操作不会实际执行")
cursor.close()
conn.close()
return updates
if __name__ == '__main__':
print("是否执行修复?")
print("1. DRY RUN不实际修改数据库")
print("2. 直接执行修复(会修改数据库)")
choice = input("\n请选择 (1/2默认1): ").strip() or "1"
if choice == "2":
print("\n执行实际修复...")
fix_chinese_fields(dry_run=False)
else:
print("\n执行DRY RUN...")
updates = fix_chinese_fields(dry_run=True)
if updates:
confirm = input("\nDRY RUN完成。是否执行实际修复(y/n默认n): ").strip().lower()
if confirm == 'y':
print("\n执行实际修复...")
fix_chinese_fields(dry_run=False)
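The ratio heuristic in `is_chinese` — treat a code as Chinese once CJK characters exceed 30% of its length — can be exercised on its own (a sketch; the helper name is ours):

```python
import re

def chinese_ratio(text: str) -> float:
    """Fraction of characters in the CJK Unified Ideographs block."""
    if not text:
        return 0.0
    return len(re.findall(r'[\u4e00-\u9fff]', text)) / len(text)

print(chinese_ratio("target_谈话"))  # 2 of 9 characters, ~0.22: below threshold
print(chinese_ratio("谈话地点"))      # 1.0: flagged for remapping
```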


@ -0,0 +1,191 @@
"""
修复剩余的中文field_code字段
为这些字段生成合适的英文field_code
"""
import os
import pymysql
import re
from typing import Dict
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
UPDATED_BY = 655162080928945152
# 字段名称到field_code的映射针对剩余的中文字段
FIELD_MAPPING = {
# 谈话相关字段
'拟谈话地点': 'proposed_interview_location',
'拟谈话时间': 'proposed_interview_time',
'谈话事由': 'interview_reason',
'谈话人': 'interviewer',
'谈话人员-安全员': 'interview_personnel_safety_officer',
'谈话人员-组长': 'interview_personnel_leader',
'谈话人员-谈话人员': 'interview_personnel',
'谈话前安全风险评估结果': 'pre_interview_risk_assessment_result',
'谈话地点': 'interview_location',
'谈话次数': 'interview_count',
# 被核查人员相关字段
'被核查人单位及职务': 'target_organization_and_position', # 注意:这个和"被核查人员单位及职务"应该是同一个
'被核查人员交代问题程度': 'target_confession_level',
'被核查人员减压后的表现': 'target_behavior_after_relief',
'被核查人员学历': 'target_education', # 注意:这个和"被核查人员文化程度"可能不同
'被核查人员工作履历': 'target_work_history',
'被核查人员思想负担程度': 'target_mental_burden_level',
'被核查人员职业': 'target_occupation',
'被核查人员谈话中的表现': 'target_behavior_during_interview',
'被核查人员问题严重程度': 'target_issue_severity_level',
'被核查人员风险等级': 'target_risk_level',
'被核查人基本情况': 'target_basic_info',
# 其他字段
'补空人员': 'backup_personnel',
'记录人': 'recorder',
'评估意见': 'assessment_opinion',
}
def is_chinese(text: str) -> bool:
"""判断字符串是否包含中文字符"""
if not text:
return False
return bool(re.search(r'[\u4e00-\u9fff]', text))
def fix_remaining_fields(dry_run: bool = True):
"""修复剩余的中文field_code字段"""
conn = pymysql.connect(**DB_CONFIG)
cursor = conn.cursor(pymysql.cursors.DictCursor)
print("="*80)
print("修复剩余的中文field_code字段")
print("="*80)
if dry_run:
print("\n[DRY RUN模式 - 不会实际修改数据库]")
# 查询所有包含中文field_code的字段
cursor.execute("""
SELECT id, name, filed_code, field_type, state
FROM f_polic_field
-- MySQL 8 (ICU) regex; backslashes doubled so \u survives the SQL string parser
WHERE tenant_id = %s AND filed_code REGEXP '[\\\\u4e00-\\\\u9fff]'
ORDER BY name
""", (TENANT_ID,))
fields = cursor.fetchall()
print(f"\n找到 {len(fields)} 个需要修复的字段:\n")
updates = []
for field in fields:
field_name = field['name']
new_code = FIELD_MAPPING.get(field_name)
if not new_code:
# 如果没有映射生成一个基于名称的code
new_code = field_name.lower()
new_code = new_code.replace('被核查人员', 'target_').replace('被核查人', 'target_')
new_code = new_code.replace('谈话', 'interview_')
new_code = new_code.replace('人员', '')
new_code = new_code.replace('时间', '_time')
new_code = new_code.replace('地点', '_location')
new_code = new_code.replace('问题', '_issue')
new_code = new_code.replace('情况', '_situation')
new_code = new_code.replace('程度', '_level')
new_code = new_code.replace('表现', '_behavior')
new_code = new_code.replace('等级', '_level')
new_code = new_code.replace('履历', '_history')
new_code = new_code.replace('学历', '_education')
new_code = new_code.replace('职业', '_occupation')
new_code = new_code.replace('事由', '_reason')
new_code = new_code.replace('次数', '_count')
new_code = new_code.replace('结果', '_result')
new_code = new_code.replace('意见', '_opinion')
new_code = re.sub(r'[^\w]', '_', new_code)
new_code = re.sub(r'_+', '_', new_code).strip('_')
new_code = new_code.replace('__', '_')
updates.append({
'id': field['id'],
'name': field_name,
'old_code': field['filed_code'],
'new_code': new_code,
'field_type': field['field_type']
})
print(f" ID: {field['id']}")
print(f" 名称: {field_name}")
print(f" 当前field_code: {field['filed_code']}")
print(f" 新field_code: {new_code}")
print()
# 检查是否有重复的new_code
code_to_fields = {}
for update in updates:
code = update['new_code']
if code not in code_to_fields:
code_to_fields[code] = []
code_to_fields[code].append(update)
duplicate_codes = {code: fields_list for code, fields_list in code_to_fields.items()
if len(fields_list) > 1}
if duplicate_codes:
print("\n⚠ 警告以下field_code会重复:")
for code, fields_list in duplicate_codes.items():
print(f" field_code: {code}")
for field in fields_list:
print(f" - ID: {field['id']}, 名称: {field['name']}")
print()
# 执行更新
if not dry_run:
print("开始执行更新...\n")
for update in updates:
cursor.execute("""
UPDATE f_polic_field
SET filed_code = %s, updated_time = NOW(), updated_by = %s
WHERE id = %s
""", (update['new_code'], UPDATED_BY, update['id']))
print(f" ✓ 更新字段 ID {update['id']}: {update['name']}")
print(f" {update['old_code']} -> {update['new_code']}")
conn.commit()
print("\n✓ 更新完成")
else:
print("[DRY RUN] 以上操作不会实际执行")
cursor.close()
conn.close()
return updates
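上面的回退式 field_code 生成逻辑可以抽成一个可独立验证的小函数(示意,只覆盖部分关键词映射,与脚本中的完整替换表不同):

```python
import re

def fallback_field_code(field_name: str) -> str:
    # 简化示意:按"长词优先"做中文关键词替换,再清洗非法字符
    mapping = [
        ('被核查人员', 'target_'), ('被核查人', 'target_'),
        ('谈话', 'interview_'), ('时间', '_time'),
        ('地点', '_location'), ('职业', '_occupation'),
    ]
    code = field_name.lower()
    for zh, en in mapping:
        code = code.replace(zh, en)
    code = re.sub(r'[^\w]', '_', code)          # 非字母数字下划线 -> _
    return re.sub(r'_+', '_', code).strip('_')  # 合并下划线并去掉首尾下划线

print(fallback_field_code('谈话时间'))      # interview_time
print(fallback_field_code('被核查人职业'))  # target_occupation
```

注意替换顺序:'被核查人员' 必须排在 '被核查人' 之前,否则长词永远不会命中。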
if __name__ == '__main__':
print("是否执行修复?")
    print("1. DRY RUN(不实际修改数据库)")
print("2. 直接执行修复(会修改数据库)")
    choice = input("\n请选择 (1/2,默认1): ").strip() or "1"
if choice == "2":
print("\n执行实际修复...")
fix_remaining_fields(dry_run=False)
else:
print("\n执行DRY RUN...")
updates = fix_remaining_fields(dry_run=True)
if updates:
            confirm = input("\nDRY RUN完成。是否执行实际修复?(y/n,默认n): ").strip().lower()
if confirm == 'y':
print("\n执行实际修复...")
fix_remaining_fields(dry_run=False)

generate_download_urls.py Normal file

@ -0,0 +1,117 @@
"""
为指定的文件路径生成 MinIO 预签名下载 URL
"""
from minio import Minio
from datetime import timedelta
# MinIO连接配置
MINIO_CONFIG = {
'endpoint': 'minio.datacubeworld.com:9000',
'access_key': 'JOLXFXny3avFSzB0uRA5',
'secret_key': 'G1BR8jStNfovkfH5ou39EmPl34E4l7dGrnd3Cz0I',
'secure': True
}
BUCKET_NAME = 'finyx'
# 文件相对路径列表
FILE_PATHS = [
'/615873064429507639/20251209170434/初步核实审批表_张三.docx',
'/615873064429507639/20251209170434/请示报告卡_张三.docx'
]
def generate_download_urls():
"""为文件路径列表生成下载 URL"""
print("="*80)
print("生成 MinIO 下载链接")
print("="*80)
try:
# 创建MinIO客户端
client = Minio(
MINIO_CONFIG['endpoint'],
access_key=MINIO_CONFIG['access_key'],
secret_key=MINIO_CONFIG['secret_key'],
secure=MINIO_CONFIG['secure']
)
print(f"\n存储桶: {BUCKET_NAME}")
print(f"端点: {MINIO_CONFIG['endpoint']}")
print(f"使用HTTPS: {MINIO_CONFIG['secure']}\n")
results = []
for file_path in FILE_PATHS:
# 去掉开头的斜杠,得到对象名称
object_name = file_path.lstrip('/')
print("-"*80)
print(f"文件: {file_path}")
print(f"对象名称: {object_name}")
try:
# 检查文件是否存在
stat = client.stat_object(BUCKET_NAME, object_name)
print(f"✓ 文件存在")
print(f" 文件大小: {stat.size:,} 字节")
print(f" 最后修改: {stat.last_modified}")
# 生成预签名URL7天有效期
url = client.presigned_get_object(
BUCKET_NAME,
object_name,
expires=timedelta(days=7)
)
print(f"✓ 预签名URL生成成功7天有效")
print(f"\n下载链接:")
print(f"{url}\n")
results.append({
'file_path': file_path,
'object_name': object_name,
'url': url,
'size': stat.size,
'exists': True
})
except Exception as e:
print(f"✗ 错误: {e}\n")
results.append({
'file_path': file_path,
'object_name': object_name,
'url': None,
'exists': False,
'error': str(e)
})
# 输出汇总
print("\n" + "="*80)
print("下载链接汇总")
print("="*80)
for i, result in enumerate(results, 1):
print(f"\n{i}. {result['file_path']}")
if result['exists']:
print(f" ✓ 文件存在")
print(f" 下载链接: {result['url']}")
else:
print(f" ✗ 文件不存在或无法访问")
if 'error' in result:
print(f" 错误: {result['error']}")
print("\n" + "="*80)
print("完成")
print("="*80)
return results
except Exception as e:
print(f"\n✗ 连接MinIO失败: {e}")
import traceback
traceback.print_exc()
return None
if __name__ == '__main__':
generate_download_urls()


@ -0,0 +1,478 @@
"""
改进的匹配和更新脚本
增强匹配逻辑,能够匹配数据库中的已有数据
"""
import os
import json
import pymysql
import re
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from datetime import datetime
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
CREATED_BY = 655162080928945152
UPDATED_BY = 655162080928945152
# 项目根目录
PROJECT_ROOT = Path(__file__).parent
TEMPLATES_DIR = PROJECT_ROOT / "template_finish"
# 文档类型映射
DOCUMENT_TYPE_MAPPING = {
"1.请示报告卡XXX": {
"template_code": "REPORT_CARD",
"name": "1.请示报告卡XXX",
"business_type": "INVESTIGATION"
},
"2.初步核实审批表XXX": {
"template_code": "PRELIMINARY_VERIFICATION_APPROVAL",
"name": "2.初步核实审批表XXX",
"business_type": "INVESTIGATION"
},
"3.附件初核方案(XXX)": {
"template_code": "INVESTIGATION_PLAN",
"name": "3.附件初核方案(XXX)",
"business_type": "INVESTIGATION"
},
"谈话通知书第一联": {
"template_code": "NOTIFICATION_LETTER_1",
"name": "谈话通知书第一联",
"business_type": "INVESTIGATION"
},
"谈话通知书第二联": {
"template_code": "NOTIFICATION_LETTER_2",
"name": "谈话通知书第二联",
"business_type": "INVESTIGATION"
},
"谈话通知书第三联": {
"template_code": "NOTIFICATION_LETTER_3",
"name": "谈话通知书第三联",
"business_type": "INVESTIGATION"
},
"1.请示报告卡(初核谈话)": {
"template_code": "REPORT_CARD_INTERVIEW",
"name": "1.请示报告卡(初核谈话)",
"business_type": "INVESTIGATION"
},
"2谈话审批表": {
"template_code": "INTERVIEW_APPROVAL_FORM",
"name": "2谈话审批表",
"business_type": "INVESTIGATION"
},
"3.谈话前安全风险评估表": {
"template_code": "PRE_INTERVIEW_RISK_ASSESSMENT",
"name": "3.谈话前安全风险评估表",
"business_type": "INVESTIGATION"
},
"4.谈话方案": {
"template_code": "INTERVIEW_PLAN",
"name": "4.谈话方案",
"business_type": "INVESTIGATION"
},
"5.谈话后安全风险评估表": {
"template_code": "POST_INTERVIEW_RISK_ASSESSMENT",
"name": "5.谈话后安全风险评估表",
"business_type": "INVESTIGATION"
},
"1.谈话笔录": {
"template_code": "INTERVIEW_RECORD",
"name": "1.谈话笔录",
"business_type": "INVESTIGATION"
},
"2.谈话询问对象情况摸底调查30问": {
"template_code": "INVESTIGATION_30_QUESTIONS",
"name": "2.谈话询问对象情况摸底调查30问",
"business_type": "INVESTIGATION"
},
"3.被谈话人权利义务告知书": {
"template_code": "RIGHTS_OBLIGATIONS_NOTICE",
"name": "3.被谈话人权利义务告知书",
"business_type": "INVESTIGATION"
},
"4.点对点交接单": {
"template_code": "HANDOVER_FORM",
"name": "4.点对点交接单",
"business_type": "INVESTIGATION"
},
"5.陪送交接单(新)": {
"template_code": "ESCORT_HANDOVER_FORM",
"name": "5.陪送交接单(新)",
"business_type": "INVESTIGATION"
},
"6.1保密承诺书(谈话对象使用-非中共党员用)": {
"template_code": "CONFIDENTIALITY_COMMITMENT_NON_PARTY",
"name": "6.1保密承诺书(谈话对象使用-非中共党员用)",
"business_type": "INVESTIGATION"
},
"6.2保密承诺书(谈话对象使用-中共党员用)": {
"template_code": "CONFIDENTIALITY_COMMITMENT_PARTY",
"name": "6.2保密承诺书(谈话对象使用-中共党员用)",
"business_type": "INVESTIGATION"
},
"7.办案人员-办案安全保密承诺书": {
"template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT",
"name": "7.办案人员-办案安全保密承诺书",
"business_type": "INVESTIGATION"
},
"8-1请示报告卡初核报告结论 ": {
"template_code": "REPORT_CARD_CONCLUSION",
"name": "8-1请示报告卡初核报告结论 ",
"business_type": "INVESTIGATION"
},
"8.XXX初核情况报告": {
"template_code": "INVESTIGATION_REPORT",
"name": "8.XXX初核情况报告",
"business_type": "INVESTIGATION"
}
}
def normalize_name(name: str) -> str:
"""标准化名称,用于模糊匹配"""
    # 去掉开头的编号(如 "1."、"2."、"6.1"、"8-1" 等,含多段编号)
    name = re.sub(r'^\d+([\.\-]\d+)*[\.\-]?\s*', '', name)
    # 去掉中英文括号及其内容(如 "(XXX)"、"(初核谈话)" 等)
    name = re.sub(r'[((].*?[))]', '', name)
    # 去掉首尾空白
    name = name.strip()
return name
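标准化的效果可以用下面的自包含片段验证(示意,前缀正则覆盖 "8-1"、"6.1" 这类多段编号,与上文实现同一思路):

```python
import re

def normalize(name: str) -> str:
    # 去编号前缀(含多段编号)、去中英文括号内容、去首尾空白
    name = re.sub(r'^\d+([\.\-]\d+)*[\.\-]?\s*', '', name)
    name = re.sub(r'[((].*?[))]', '', name)
    return name.strip()

print(normalize('1.请示报告卡(初核谈话)'))       # 请示报告卡
print(normalize('8-1请示报告卡初核报告结论 '))    # 请示报告卡初核报告结论
```

编号和括号备注不同的两个名称,标准化后即可互相命中,这正是"标准化名称匹配"这一级的依据。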
def generate_id():
"""生成ID"""
import time
import random
timestamp = int(time.time() * 1000)
random_part = random.randint(100000, 999999)
return timestamp * 1000 + random_part
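这里的 ID 方案有一个隐患:`timestamp * 1000` 只留出 3 位十进制空间,而 random_part 是 6 位数,相邻毫秒生成的 ID 区间会互相重叠,存在碰撞可能。一个把随机位控制在乘数范围内的变体(示意,非原脚本实现):

```python
import random
import time

def generate_id_safe(ts_ms=None):
    # 乘数 1_000_000 与 6 位随机数匹配,不同毫秒的 ID 区间不再重叠
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)
    return ts_ms * 1_000_000 + random.randint(0, 999_999)

# 较晚毫秒生成的 ID 一定更大(ID 对时间戳保持单调)
print(generate_id_safe(2) > generate_id_safe(1))  # True
```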
def identify_document_type(file_name: str) -> Optional[Dict]:
"""根据完整文件名识别文档类型"""
base_name = Path(file_name).stem
if base_name in DOCUMENT_TYPE_MAPPING:
return DOCUMENT_TYPE_MAPPING[base_name]
return None
def scan_directory_structure(base_dir: Path) -> Dict:
"""扫描目录结构,构建树状层级"""
structure = {
'directories': {},
'files': {}
}
def process_path(path: Path, parent_path: Optional[str] = None, level: int = 0):
"""递归处理路径"""
if path.is_file() and path.suffix == '.docx':
file_name = path.stem
doc_config = identify_document_type(file_name)
structure['files'][str(path)] = {
'name': file_name,
'parent': parent_path,
'level': level,
'template_code': doc_config['template_code'] if doc_config else None,
'full_path': str(path),
'normalized_name': normalize_name(file_name)
}
elif path.is_dir():
dir_name = path.name
structure['directories'][str(path)] = {
'name': dir_name,
'parent': parent_path,
'level': level,
'normalized_name': normalize_name(dir_name)
}
for child in sorted(path.iterdir()):
if child.name != '__pycache__':
process_path(child, str(path), level + 1)
if TEMPLATES_DIR.exists():
for item in sorted(TEMPLATES_DIR.iterdir()):
if item.name != '__pycache__':
process_path(item, None, 0)
return structure
def get_existing_data(conn) -> Dict:
"""获取数据库中的现有数据,增强匹配能力"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
sql = """
SELECT id, name, parent_id, template_code, input_data, file_path, state
FROM f_polic_file_config
WHERE tenant_id = %s
"""
cursor.execute(sql, (TENANT_ID,))
configs = cursor.fetchall()
result = {
'by_id': {},
'by_name': {},
'by_template_code': {},
'by_normalized_name': {} # 新增:标准化名称索引
}
for config in configs:
config_id = config['id']
config_name = config['name']
# 提取 template_code
template_code = config.get('template_code')
if not template_code and config.get('input_data'):
try:
input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data']
if isinstance(input_data, dict):
template_code = input_data.get('template_code')
            except (json.JSONDecodeError, TypeError, ValueError):
                pass
config['extracted_template_code'] = template_code
config['normalized_name'] = normalize_name(config_name)
result['by_id'][config_id] = config
result['by_name'][config_name] = config
if template_code:
if template_code not in result['by_template_code']:
result['by_template_code'][template_code] = config
# 标准化名称索引(可能有多个记录匹配同一个标准化名称)
normalized = config['normalized_name']
if normalized not in result['by_normalized_name']:
result['by_normalized_name'][normalized] = []
result['by_normalized_name'][normalized].append(config)
cursor.close()
return result
def find_matching_config(file_info: Dict, existing_data: Dict) -> Optional[Dict]:
"""
查找匹配的数据库记录
优先级1. template_code 精确匹配 2. 名称精确匹配 3. 标准化名称匹配
"""
template_code = file_info.get('template_code')
file_name = file_info['name']
normalized_name = file_info.get('normalized_name', normalize_name(file_name))
# 优先级1: template_code 精确匹配
if template_code:
matched = existing_data['by_template_code'].get(template_code)
if matched:
return matched
# 优先级2: 名称精确匹配
matched = existing_data['by_name'].get(file_name)
if matched:
return matched
# 优先级3: 标准化名称匹配
candidates = existing_data['by_normalized_name'].get(normalized_name, [])
if candidates:
# 如果有多个候选,优先选择有正确 template_code 的
for candidate in candidates:
if candidate.get('extracted_template_code') == template_code:
return candidate
# 否则返回第一个
return candidates[0]
return None
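三级匹配的优先级可以用一组极简的内存索引演示(示意,字段名相对上文 existing_data 的结构做了简化):

```python
def pick(item, by_code, by_name, by_norm):
    # 优先级:template_code 精确 > 名称精确 > 标准化名称
    code = item.get('code')
    if code and code in by_code:
        return by_code[code]
    if item['name'] in by_name:
        return by_name[item['name']]
    cands = by_norm.get(item['norm'], [])
    for c in cands:                      # 多个候选时优先 code 一致的
        if c.get('code') == code:
            return c
    return cands[0] if cands else None

db_row = {'id': 1, 'name': '请示报告卡(旧)', 'code': 'REPORT_CARD', 'norm': '请示报告卡'}
indexes = ({'REPORT_CARD': db_row}, {db_row['name']: db_row}, {'请示报告卡': [db_row]})
# 文件名与库中名称不同,但 template_code 相同,仍命中同一条记录
print(pick({'name': '1.请示报告卡XXX', 'code': 'REPORT_CARD', 'norm': '请示报告卡'}, *indexes)['id'])  # 1
```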
def plan_tree_structure(dir_structure: Dict, existing_data: Dict) -> List[Dict]:
"""规划树状结构,使用改进的匹配逻辑"""
plan = []
directories = sorted(dir_structure['directories'].items(),
key=lambda x: (x[1]['level'], x[0]))
files = sorted(dir_structure['files'].items(),
key=lambda x: (x[1]['level'], x[0]))
dir_id_map = {}
# 处理目录
for dir_path, dir_info in directories:
dir_name = dir_info['name']
parent_path = dir_info['parent']
level = dir_info['level']
parent_id = None
if parent_path:
parent_id = dir_id_map.get(parent_path)
# 查找匹配的数据库记录
matched = find_matching_config(dir_info, existing_data)
if matched:
plan.append({
'type': 'directory',
'name': dir_name,
'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
'parent_id': parent_id,
'level': level,
'action': 'update',
'config_id': matched['id'],
'current_parent_id': matched.get('parent_id'),
'matched_by': 'existing'
})
dir_id_map[dir_path] = matched['id']
else:
new_id = generate_id()
plan.append({
'type': 'directory',
'name': dir_name,
'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
'parent_id': parent_id,
'level': level,
'action': 'create',
'config_id': new_id,
'current_parent_id': None,
'matched_by': 'new'
})
dir_id_map[dir_path] = new_id
# 处理文件
for file_path, file_info in files:
file_name = file_info['name']
parent_path = file_info['parent']
level = file_info['level']
template_code = file_info['template_code']
parent_id = dir_id_map.get(parent_path) if parent_path else None
# 查找匹配的数据库记录
matched = find_matching_config(file_info, existing_data)
if matched:
plan.append({
'type': 'file',
'name': file_name,
'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
'parent_id': parent_id,
'level': level,
'action': 'update',
'config_id': matched['id'],
'template_code': template_code,
'current_parent_id': matched.get('parent_id'),
'matched_by': 'existing'
})
else:
new_id = generate_id()
plan.append({
'type': 'file',
'name': file_name,
'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
'parent_id': parent_id,
'level': level,
'action': 'create',
'config_id': new_id,
'template_code': template_code,
'current_parent_id': None,
'matched_by': 'new'
})
return plan
def print_matching_report(plan: List[Dict]):
"""打印匹配报告"""
print("\n" + "="*80)
print("匹配报告")
print("="*80)
matched = [p for p in plan if p.get('matched_by') == 'existing']
unmatched = [p for p in plan if p.get('matched_by') == 'new']
print(f"\n已匹配的记录: {len(matched)}")
print(f"未匹配的记录(将创建): {len(unmatched)}\n")
if unmatched:
print("未匹配的记录列表:")
for item in unmatched:
print(f" - {item['name']} ({item['type']})")
print("\n匹配详情:")
by_level = {}
for item in plan:
level = item['level']
if level not in by_level:
by_level[level] = []
by_level[level].append(item)
for level in sorted(by_level.keys()):
        print(f"\n【层级 {level}】")
for item in by_level[level]:
indent = " " * level
match_status = "" if item.get('matched_by') == 'existing' else ""
print(f"{indent}{match_status} {item['name']} (ID: {item['config_id']})")
if item.get('parent_name'):
print(f"{indent} 父节点: {item['parent_name']}")
if item['action'] == 'update':
current = item.get('current_parent_id', 'None')
new = item.get('parent_id', 'None')
if current != new:
print(f"{indent} parent_id: {current}{new}")
def main():
"""主函数"""
print("="*80)
print("改进的模板树状结构分析和更新")
print("="*80)
try:
conn = pymysql.connect(**DB_CONFIG)
print("✓ 数据库连接成功\n")
except Exception as e:
print(f"✗ 数据库连接失败: {e}")
return
try:
print("扫描目录结构...")
dir_structure = scan_directory_structure(TEMPLATES_DIR)
print(f" 找到 {len(dir_structure['directories'])} 个目录")
print(f" 找到 {len(dir_structure['files'])} 个文件\n")
print("获取数据库现有数据...")
existing_data = get_existing_data(conn)
print(f" 数据库中有 {len(existing_data['by_id'])} 条记录\n")
print("规划树状结构(使用改进的匹配逻辑)...")
plan = plan_tree_structure(dir_structure, existing_data)
print(f" 生成 {len(plan)} 个更新计划\n")
print_matching_report(plan)
# 询问是否继续
print("\n" + "="*80)
        response = input("\n是否生成更新SQL脚本?(yes/no,默认no): ").strip().lower()
if response == 'yes':
from analyze_and_update_template_tree import generate_update_sql
sql_file = generate_update_sql(plan)
print(f"\n✓ SQL脚本已生成: {sql_file}")
else:
print("\n已取消")
finally:
conn.close()
if __name__ == '__main__':
main()


@ -0,0 +1,544 @@
"""
从 template_finish 目录初始化模板树状结构:
删除旧数据,根据目录结构完全重建
"""
import os
import json
import pymysql
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from datetime import datetime
from minio import Minio
from minio.error import S3Error
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
# MinIO连接配置
MINIO_CONFIG = {
'endpoint': 'minio.datacubeworld.com:9000',
'access_key': 'JOLXFXny3avFSzB0uRA5',
'secret_key': 'G1BR8jStNfovkfH5ou39EmPl34E4l7dGrnd3Cz0I',
'secure': True
}
TENANT_ID = 615873064429507639
CREATED_BY = 655162080928945152
UPDATED_BY = 655162080928945152
BUCKET_NAME = 'finyx'
# 项目根目录
PROJECT_ROOT = Path(__file__).parent
TEMPLATES_DIR = PROJECT_ROOT / "template_finish"
# 文档类型映射
DOCUMENT_TYPE_MAPPING = {
"1.请示报告卡XXX": {
"template_code": "REPORT_CARD",
"name": "1.请示报告卡XXX",
"business_type": "INVESTIGATION"
},
"2.初步核实审批表XXX": {
"template_code": "PRELIMINARY_VERIFICATION_APPROVAL",
"name": "2.初步核实审批表XXX",
"business_type": "INVESTIGATION"
},
"3.附件初核方案(XXX)": {
"template_code": "INVESTIGATION_PLAN",
"name": "3.附件初核方案(XXX)",
"business_type": "INVESTIGATION"
},
"谈话通知书第一联": {
"template_code": "NOTIFICATION_LETTER_1",
"name": "谈话通知书第一联",
"business_type": "INVESTIGATION"
},
"谈话通知书第二联": {
"template_code": "NOTIFICATION_LETTER_2",
"name": "谈话通知书第二联",
"business_type": "INVESTIGATION"
},
"谈话通知书第三联": {
"template_code": "NOTIFICATION_LETTER_3",
"name": "谈话通知书第三联",
"business_type": "INVESTIGATION"
},
"1.请示报告卡(初核谈话)": {
"template_code": "REPORT_CARD_INTERVIEW",
"name": "1.请示报告卡(初核谈话)",
"business_type": "INVESTIGATION"
},
"2谈话审批表": {
"template_code": "INTERVIEW_APPROVAL_FORM",
"name": "2谈话审批表",
"business_type": "INVESTIGATION"
},
"3.谈话前安全风险评估表": {
"template_code": "PRE_INTERVIEW_RISK_ASSESSMENT",
"name": "3.谈话前安全风险评估表",
"business_type": "INVESTIGATION"
},
"4.谈话方案": {
"template_code": "INTERVIEW_PLAN",
"name": "4.谈话方案",
"business_type": "INVESTIGATION"
},
"5.谈话后安全风险评估表": {
"template_code": "POST_INTERVIEW_RISK_ASSESSMENT",
"name": "5.谈话后安全风险评估表",
"business_type": "INVESTIGATION"
},
"1.谈话笔录": {
"template_code": "INTERVIEW_RECORD",
"name": "1.谈话笔录",
"business_type": "INVESTIGATION"
},
"2.谈话询问对象情况摸底调查30问": {
"template_code": "INVESTIGATION_30_QUESTIONS",
"name": "2.谈话询问对象情况摸底调查30问",
"business_type": "INVESTIGATION"
},
"3.被谈话人权利义务告知书": {
"template_code": "RIGHTS_OBLIGATIONS_NOTICE",
"name": "3.被谈话人权利义务告知书",
"business_type": "INVESTIGATION"
},
"4.点对点交接单": {
"template_code": "HANDOVER_FORM",
"name": "4.点对点交接单",
"business_type": "INVESTIGATION"
},
"5.陪送交接单(新)": {
"template_code": "ESCORT_HANDOVER_FORM",
"name": "5.陪送交接单(新)",
"business_type": "INVESTIGATION"
},
"6.1保密承诺书(谈话对象使用-非中共党员用)": {
"template_code": "CONFIDENTIALITY_COMMITMENT_NON_PARTY",
"name": "6.1保密承诺书(谈话对象使用-非中共党员用)",
"business_type": "INVESTIGATION"
},
"6.2保密承诺书(谈话对象使用-中共党员用)": {
"template_code": "CONFIDENTIALITY_COMMITMENT_PARTY",
"name": "6.2保密承诺书(谈话对象使用-中共党员用)",
"business_type": "INVESTIGATION"
},
"7.办案人员-办案安全保密承诺书": {
"template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT",
"name": "7.办案人员-办案安全保密承诺书",
"business_type": "INVESTIGATION"
},
"8-1请示报告卡初核报告结论 ": {
"template_code": "REPORT_CARD_CONCLUSION",
"name": "8-1请示报告卡初核报告结论 ",
"business_type": "INVESTIGATION"
},
"8.XXX初核情况报告": {
"template_code": "INVESTIGATION_REPORT",
"name": "8.XXX初核情况报告",
"business_type": "INVESTIGATION"
}
}
def generate_id():
"""生成ID"""
import time
import random
timestamp = int(time.time() * 1000)
random_part = random.randint(100000, 999999)
return timestamp * 1000 + random_part
def identify_document_type(file_name: str) -> Optional[Dict]:
"""根据完整文件名识别文档类型"""
base_name = Path(file_name).stem
if base_name in DOCUMENT_TYPE_MAPPING:
return DOCUMENT_TYPE_MAPPING[base_name]
return None
def upload_to_minio(file_path: Path) -> str:
"""上传文件到MinIO"""
try:
client = Minio(
MINIO_CONFIG['endpoint'],
access_key=MINIO_CONFIG['access_key'],
secret_key=MINIO_CONFIG['secret_key'],
secure=MINIO_CONFIG['secure']
)
found = client.bucket_exists(BUCKET_NAME)
if not found:
raise Exception(f"存储桶 '{BUCKET_NAME}' 不存在,请先创建")
now = datetime.now()
object_name = f'{TENANT_ID}/TEMPLATE/{now.year}/{now.month:02d}/{file_path.name}'
client.fput_object(
BUCKET_NAME,
object_name,
str(file_path),
content_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document'
)
return f"/{object_name}"
except S3Error as e:
raise Exception(f"MinIO错误: {e}")
except Exception as e:
raise Exception(f"上传文件时发生错误: {e}")
def scan_directory_structure(base_dir: Path) -> List[Dict]:
"""
    扫描目录结构,返回按层级排序的节点列表
    每个节点包含 type、name、path、parent_path、level、template_code、file_path
"""
nodes = []
def process_path(path: Path, parent_path: Optional[str] = None, level: int = 0):
"""递归处理路径"""
if path.is_file() and path.suffix == '.docx':
file_name = path.stem
doc_config = identify_document_type(file_name)
nodes.append({
'type': 'file',
'name': file_name,
'path': str(path),
'parent_path': parent_path,
'level': level,
'template_code': doc_config['template_code'] if doc_config else None,
'doc_config': doc_config,
'file_path': path
})
elif path.is_dir():
dir_name = path.name
nodes.append({
'type': 'directory',
'name': dir_name,
'path': str(path),
'parent_path': parent_path,
'level': level,
'template_code': None,
'doc_config': None,
'file_path': None
})
for child in sorted(path.iterdir()):
if child.name != '__pycache__':
process_path(child, str(path), level + 1)
if TEMPLATES_DIR.exists():
for item in sorted(TEMPLATES_DIR.iterdir()):
if item.name != '__pycache__':
process_path(item, None, 0)
# 按层级排序
return sorted(nodes, key=lambda x: (x['level'], x['path']))
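按 (level, path) 排序保证父目录一定排在子节点之前,后续建树时从 path_to_id 取父 ID 必然已存在(示意):

```python
nodes = [
    {'level': 2, 'path': '2-初核模版/1.初核请示/1.请示报告卡XXX.docx'},
    {'level': 0, 'path': '2-初核模版'},
    {'level': 1, 'path': '2-初核模版/1.初核请示'},
]
# 先按层级、同层再按路径排序:父节点总在子节点之前
ordered = sorted(nodes, key=lambda x: (x['level'], x['path']))
print([n['level'] for n in ordered])  # [0, 1, 2]
```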
def delete_old_data(conn, dry_run: bool = True):
"""删除旧数据"""
cursor = conn.cursor()
try:
print("\n" + "="*80)
print("删除旧数据")
print("="*80)
# 1. 先删除关联表 f_polic_file_field
print("\n1. 删除 f_polic_file_field 关联记录...")
if not dry_run:
# 先获取所有相关的 file_id
select_file_ids_sql = """
SELECT id FROM f_polic_file_config
WHERE tenant_id = %s
"""
cursor.execute(select_file_ids_sql, (TENANT_ID,))
file_ids = [row[0] for row in cursor.fetchall()]
if file_ids:
# 使用占位符构建SQL
placeholders = ','.join(['%s'] * len(file_ids))
delete_file_field_sql = f"""
DELETE FROM f_polic_file_field
WHERE tenant_id = %s AND file_id IN ({placeholders})
"""
cursor.execute(delete_file_field_sql, [TENANT_ID] + file_ids)
deleted_count = cursor.rowcount
print(f" ✓ 删除了 {deleted_count} 条关联记录")
else:
print(" ✓ 没有需要删除的关联记录")
else:
# 模拟模式:只统计
count_sql = """
SELECT COUNT(*) FROM f_polic_file_field
WHERE tenant_id = %s AND file_id IN (
SELECT id FROM f_polic_file_config WHERE tenant_id = %s
)
"""
cursor.execute(count_sql, (TENANT_ID, TENANT_ID))
count = cursor.fetchone()[0]
print(f" [模拟] 将删除 {count} 条关联记录")
# 2. 删除 f_polic_file_config 记录
print("\n2. 删除 f_polic_file_config 记录...")
delete_config_sql = """
DELETE FROM f_polic_file_config
WHERE tenant_id = %s
"""
if not dry_run:
cursor.execute(delete_config_sql, (TENANT_ID,))
deleted_count = cursor.rowcount
print(f" ✓ 删除了 {deleted_count} 条配置记录")
conn.commit()
else:
count_sql = "SELECT COUNT(*) FROM f_polic_file_config WHERE tenant_id = %s"
cursor.execute(count_sql, (TENANT_ID,))
count = cursor.fetchone()[0]
print(f" [模拟] 将删除 {count} 条配置记录")
return True
except Exception as e:
if not dry_run:
conn.rollback()
print(f" ✗ 删除失败: {e}")
raise
finally:
cursor.close()
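批量删除用的 IN 占位符拼接方式,关键在于占位符个数与参数列表长度严格一致,由驱动完成转义而不是手工拼 SQL(示意):

```python
file_ids = [101, 102, 103]
tenant_id = 615873064429507639

# 每个 file_id 对应一个 %s 占位符
placeholders = ','.join(['%s'] * len(file_ids))
sql = f"DELETE FROM f_polic_file_field WHERE tenant_id = %s AND file_id IN ({placeholders})"
params = [tenant_id] + file_ids

print(sql)          # ... IN (%s,%s,%s)
print(len(params))  # 4
```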
def create_tree_structure(conn, nodes: List[Dict], upload_files: bool = True, dry_run: bool = True):
"""创建树状结构"""
cursor = conn.cursor()
try:
if not dry_run:
conn.autocommit(False)
print("\n" + "="*80)
print("创建树状结构")
print("="*80)
# 创建路径到ID的映射
path_to_id = {}
created_count = 0
updated_count = 0
# 按层级顺序处理
for node in nodes:
node_path = node['path']
node_name = node['name']
parent_path = node['parent_path']
level = node['level']
# 获取父节点ID
parent_id = path_to_id.get(parent_path) if parent_path else None
if node['type'] == 'directory':
# 创建目录节点
node_id = generate_id()
path_to_id[node_path] = node_id
if not dry_run:
# 目录节点不包含 template_code 字段
insert_sql = """
INSERT INTO f_polic_file_config
(id, tenant_id, parent_id, name, input_data, file_path,
created_time, created_by, updated_time, updated_by, state)
VALUES (%s, %s, %s, %s, %s, %s, NOW(), %s, NOW(), %s, %s)
"""
cursor.execute(insert_sql, (
node_id,
TENANT_ID,
parent_id,
node_name,
None,
None,
CREATED_BY,
UPDATED_BY,
1
))
indent = " " * level
parent_info = f" [父: {path_to_id.get(parent_path, 'None')}]" if parent_path else ""
print(f"{indent}{'[模拟]' if dry_run else ''}创建目录: {node_name} (ID: {node_id}){parent_info}")
created_count += 1
else:
# 创建文件节点
node_id = generate_id()
path_to_id[node_path] = node_id
doc_config = node.get('doc_config')
template_code = node.get('template_code')
file_path_obj = node.get('file_path')
# 上传文件到MinIO如果需要
minio_path = None
if upload_files and file_path_obj and file_path_obj.exists():
try:
if not dry_run:
minio_path = upload_to_minio(file_path_obj)
else:
                        now = datetime.now()
                        minio_path = f"/{TENANT_ID}/TEMPLATE/{now.year}/{now.month:02d}/{file_path_obj.name}"
print(f" {'[模拟]' if dry_run else ''}上传文件: {file_path_obj.name}{minio_path}")
except Exception as e:
print(f" ⚠ 上传文件失败: {e}")
# 继续执行使用None作为路径
# 构建 input_data
input_data = None
if doc_config:
input_data = json.dumps({
'template_code': doc_config['template_code'],
'business_type': doc_config['business_type']
}, ensure_ascii=False)
if not dry_run:
                # 如果 template_code 为 None,使用空字符串
template_code_value = template_code if template_code else ''
insert_sql = """
INSERT INTO f_polic_file_config
(id, tenant_id, parent_id, name, input_data, file_path, template_code,
created_time, created_by, updated_time, updated_by, state)
VALUES (%s, %s, %s, %s, %s, %s, %s, NOW(), %s, NOW(), %s, %s)
"""
cursor.execute(insert_sql, (
node_id,
TENANT_ID,
parent_id,
node_name,
input_data,
minio_path,
template_code_value,
CREATED_BY,
UPDATED_BY,
1
))
indent = " " * level
parent_info = f" [父: {path_to_id.get(parent_path, 'None')}]" if parent_path else ""
template_info = f" [code: {template_code}]" if template_code else ""
print(f"{indent}{'[模拟]' if dry_run else ''}创建文件: {node_name} (ID: {node_id}){parent_info}{template_info}")
created_count += 1
if not dry_run:
conn.commit()
print(f"\n✓ 创建完成!共创建 {created_count} 个节点")
else:
print(f"\n[模拟模式] 将创建 {created_count} 个节点")
return path_to_id
except Exception as e:
if not dry_run:
conn.rollback()
print(f"\n✗ 创建失败: {e}")
import traceback
traceback.print_exc()
raise
finally:
cursor.close()
def main():
"""主函数"""
print("="*80)
print("初始化模板树状结构(从目录结构完全重建)")
print("="*80)
print("\n⚠️ 警告:此操作将删除当前租户的所有模板数据!")
print(" 包括:")
print(" - f_polic_file_config 表中的所有记录")
print(" - f_polic_file_field 表中的相关关联记录")
print(" 然后根据 template_finish 目录结构完全重建")
# 确认
print("\n" + "="*80)
    confirm1 = input("\n确认继续?(yes/no,默认no): ").strip().lower()
if confirm1 != 'yes':
print("已取消")
return
# 连接数据库
try:
conn = pymysql.connect(**DB_CONFIG)
print("✓ 数据库连接成功")
except Exception as e:
print(f"✗ 数据库连接失败: {e}")
return
try:
# 扫描目录结构
print("\n扫描目录结构...")
nodes = scan_directory_structure(TEMPLATES_DIR)
print(f" 找到 {len(nodes)} 个节点")
print(f" 其中目录: {len([n for n in nodes if n['type'] == 'directory'])}")
print(f" 其中文件: {len([n for n in nodes if n['type'] == 'file'])}")
# 显示预览
print("\n目录结构预览:")
for node in nodes[:10]: # 只显示前10个
indent = " " * node['level']
type_icon = "📁" if node['type'] == 'directory' else "📄"
print(f"{indent}{type_icon} {node['name']}")
if len(nodes) > 10:
print(f" ... 还有 {len(nodes) - 10} 个节点")
# 询问是否上传文件
print("\n" + "="*80)
        upload_files = input("\n是否上传文件到MinIO?(yes/no,默认yes): ").strip().lower()
        upload_files = upload_files != 'no'
# 先执行模拟删除
print("\n执行模拟删除...")
delete_old_data(conn, dry_run=True)
# 再执行模拟创建
print("\n执行模拟创建...")
create_tree_structure(conn, nodes, upload_files=upload_files, dry_run=True)
# 最终确认
print("\n" + "="*80)
        confirm2 = input("\n确认执行实际更新?(yes/no,默认no): ").strip().lower()
if confirm2 != 'yes':
print("已取消")
return
# 执行实际删除
print("\n执行实际删除...")
delete_old_data(conn, dry_run=False)
# 执行实际创建
print("\n执行实际创建...")
create_tree_structure(conn, nodes, upload_files=upload_files, dry_run=False)
print("\n" + "="*80)
print("初始化完成!")
print("="*80)
except Exception as e:
print(f"\n✗ 初始化失败: {e}")
import traceback
traceback.print_exc()
finally:
conn.close()
print("\n数据库连接已关闭")
if __name__ == '__main__':
main()

restore_database.py Normal file

@ -0,0 +1,340 @@
"""
数据库恢复脚本
从SQL备份文件恢复数据库
"""
import os
import sys
import subprocess
import pymysql
from pathlib import Path
from dotenv import load_dotenv
import gzip
# 加载环境变量
load_dotenv()
class DatabaseRestore:
"""数据库恢复类"""
def __init__(self):
"""初始化数据库配置"""
self.db_config = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
def restore_with_mysql(self, backup_file, drop_database=False):
"""
        使用 mysql 命令恢复数据库(推荐方式)
Args:
backup_file: 备份文件路径
            drop_database: 是否先删除数据库(危险操作)
Returns:
是否成功
"""
backup_file = Path(backup_file)
if not backup_file.exists():
raise FileNotFoundError(f"备份文件不存在: {backup_file}")
# 如果是压缩文件,先解压
sql_file = backup_file
temp_file = None
if backup_file.suffix == '.gz':
print(f"检测到压缩文件,正在解压...")
temp_file = backup_file.with_suffix('')
            with gzip.open(backup_file, 'rb') as f_in, open(temp_file, 'wb') as f_out:
                # 分块拷贝,避免把大备份文件一次性读入内存
                while True:
                    chunk = f_in.read(1024 * 1024)
                    if not chunk:
                        break
                    f_out.write(chunk)
sql_file = temp_file
print(f"解压完成: {sql_file}")
try:
print(f"开始恢复数据库 {self.db_config['database']}...")
print(f"备份文件: {backup_file}")
# 如果指定删除数据库
if drop_database:
print("警告: 将删除现有数据库!")
confirm = input("确认继续? (yes/no): ")
if confirm.lower() != 'yes':
print("已取消恢复操作")
return False
# 删除数据库
self._drop_database()
# 构建mysql命令
cmd = [
'mysql',
f"--host={self.db_config['host']}",
f"--port={self.db_config['port']}",
f"--user={self.db_config['user']}",
f"--password={self.db_config['password']}",
'--default-character-set=utf8mb4',
self.db_config['database']
]
# 执行恢复命令
with open(sql_file, 'r', encoding='utf-8') as f:
result = subprocess.run(
cmd,
stdin=f,
stderr=subprocess.PIPE,
text=True
)
if result.returncode != 0:
                # text=True 时 stderr 已是 str,不能再 decode
                error_msg = result.stderr or '未知错误'
raise Exception(f"mysql执行失败: {error_msg}")
print("恢复完成!")
return True
except FileNotFoundError:
            print("错误: 未找到 mysql 命令,请确保 MySQL 客户端已安装并在 PATH 中")
print("尝试使用Python方式恢复...")
return self.restore_with_python(backup_file, drop_database)
except Exception as e:
print(f"恢复失败: {str(e)}")
raise
finally:
# 清理临时解压文件
if temp_file and temp_file.exists():
temp_file.unlink()
def restore_with_python(self, backup_file, drop_database=False):
"""
        使用 Python 直接连接数据库恢复(备用方式)
Args:
backup_file: 备份文件路径
            drop_database: 是否先删除数据库(危险操作)
Returns:
是否成功
"""
backup_file = Path(backup_file)
if not backup_file.exists():
raise FileNotFoundError(f"备份文件不存在: {backup_file}")
# 如果是压缩文件,先解压
sql_file = backup_file
temp_file = None
if backup_file.suffix == '.gz':
print(f"检测到压缩文件,正在解压...")
temp_file = backup_file.with_suffix('')
            with gzip.open(backup_file, 'rb') as f_in, open(temp_file, 'wb') as f_out:
                # 分块拷贝,避免把大备份文件一次性读入内存
                while True:
                    chunk = f_in.read(1024 * 1024)
                    if not chunk:
                        break
                    f_out.write(chunk)
sql_file = temp_file
print(f"解压完成: {sql_file}")
try:
print(f"开始使用Python方式恢复数据库 {self.db_config['database']}...")
print(f"备份文件: {backup_file}")
# 如果指定删除数据库
if drop_database:
print("警告: 将删除现有数据库!")
confirm = input("确认继续? (yes/no): ")
if confirm.lower() != 'yes':
print("已取消恢复操作")
return False
# 删除数据库
self._drop_database()
# 连接数据库
connection = pymysql.connect(**self.db_config)
cursor = connection.cursor()
# 读取SQL文件
print("读取SQL文件...")
with open(sql_file, 'r', encoding='utf-8') as f:
sql_content = f.read()
            # 分割SQL语句:按分号分割,但要注意字符串中的分号
print("执行SQL语句...")
statements = self._split_sql_statements(sql_content)
total = len(statements)
print(f"{total} 条SQL语句")
# 执行每条SQL语句
for i, statement in enumerate(statements, 1):
statement = statement.strip()
if not statement or statement.startswith('--'):
continue
try:
cursor.execute(statement)
if i % 100 == 0:
print(f"进度: {i}/{total} ({i*100//total}%)")
except Exception as e:
# 某些错误可以忽略(如表已存在等)
error_msg = str(e).lower()
if 'already exists' in error_msg or 'duplicate' in error_msg:
continue
print(f"警告: 执行SQL语句时出错 (第{i}条): {str(e)}")
print(f"SQL: {statement[:100]}...")
# 提交事务
connection.commit()
cursor.close()
connection.close()
print("恢复完成!")
return True
except Exception as e:
print(f"恢复失败: {str(e)}")
raise
finally:
# 清理临时解压文件
if temp_file and temp_file.exists():
temp_file.unlink()
def _split_sql_statements(self, sql_content):
"""
        分割SQL语句,处理字符串中的分号
Args:
sql_content: SQL内容
Returns:
SQL语句列表
"""
statements = []
current_statement = []
in_string = False
string_char = None
i = 0
while i < len(sql_content):
char = sql_content[i]
# 检测字符串开始/结束
if char in ("'", '"', '`') and (i == 0 or sql_content[i-1] != '\\'):
if not in_string:
in_string = True
string_char = char
elif char == string_char:
in_string = False
string_char = None
current_statement.append(char)
# 如果不在字符串中且遇到分号,分割语句
if not in_string and char == ';':
statement = ''.join(current_statement).strip()
if statement:
statements.append(statement)
current_statement = []
i += 1
# 添加最后一条语句
if current_statement:
statement = ''.join(current_statement).strip()
if statement:
statements.append(statement)
return statements
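分割逻辑的核心是一个引号状态机:只有在引号外遇到分号才切分语句。一个可独立运行的精简版(示意,与上文实现同一思路):

```python
def split_sql(sql: str):
    stmts, cur, quote = [], [], None
    for i, ch in enumerate(sql):
        # 进入/退出字符串字面量('、"、` 三种引号,忽略反斜杠转义的引号)
        if ch in ("'", '"', '`') and (i == 0 or sql[i - 1] != '\\'):
            if quote is None:
                quote = ch
            elif ch == quote:
                quote = None
        cur.append(ch)
        if quote is None and ch == ';':  # 引号外的分号才是语句结束
            s = ''.join(cur).strip()
            if s:
                stmts.append(s)
            cur = []
    tail = ''.join(cur).strip()          # 末尾可能有未以分号结尾的语句
    if tail:
        stmts.append(tail)
    return stmts

print(split_sql("INSERT INTO t VALUES ('a;b'); UPDATE t SET x=1"))
```

字符串 `'a;b'` 内的分号不会触发切分,得到两条完整语句。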
def _drop_database(self):
"""删除数据库(危险操作)"""
try:
# 连接到MySQL服务器不指定数据库
config = self.db_config.copy()
config.pop('database')
connection = pymysql.connect(**config)
cursor = connection.cursor()
cursor.execute(f"DROP DATABASE IF EXISTS `{self.db_config['database']}`")
cursor.execute(f"CREATE DATABASE `{self.db_config['database']}` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci")
connection.commit()
cursor.close()
connection.close()
print(f"数据库 {self.db_config['database']} 已删除并重新创建")
except Exception as e:
raise Exception(f"删除数据库失败: {str(e)}")
def test_connection(self):
"""测试数据库连接"""
try:
connection = pymysql.connect(**self.db_config)
cursor = connection.cursor()
cursor.execute("SELECT VERSION()")
version = cursor.fetchone()[0]
cursor.close()
connection.close()
print(f"数据库连接成功,MySQL版本: {version}")
return True
except Exception as e:
print(f"数据库连接失败: {str(e)}")
return False
def main():
"""主函数"""
import argparse
parser = argparse.ArgumentParser(description='数据库恢复工具')
parser.add_argument('backup_file', help='备份文件路径')
parser.add_argument('--method', choices=['mysql', 'python', 'auto'],
default='auto', help='恢复方法 (默认: auto)')
parser.add_argument('--drop-db', action='store_true',
help='恢复前删除现有数据库(危险操作)')
parser.add_argument('--test', action='store_true',
help='仅测试数据库连接')
args = parser.parse_args()
restore = DatabaseRestore()
# 测试连接
if args.test:
restore.test_connection()
return
# 执行恢复
try:
if args.method == 'mysql':
success = restore.restore_with_mysql(args.backup_file, args.drop_db)
elif args.method == 'python':
success = restore.restore_with_python(args.backup_file, args.drop_db)
else: # auto
try:
success = restore.restore_with_mysql(args.backup_file, args.drop_db)
except Exception as e:
print(f"\nmysql方式失败({e}),切换到Python方式...")
success = restore.restore_with_python(args.backup_file, args.drop_db)
if success:
print("\n恢复成功!")
else:
print("\n恢复失败!")
sys.exit(1)
except Exception as e:
print(f"\n恢复失败: {str(e)}")
sys.exit(1)
if __name__ == '__main__':
main()


@@ -0,0 +1,122 @@
"""
回滚错误的更新,恢复被错误修改的字段
"""
import os
import pymysql
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
UPDATED_BY = 655162080928945152
# 需要恢复的字段映射(字段ID -> 正确的field_code)
ROLLBACK_MAPPING = {
# 这些字段被错误地从英文改成了中文,需要恢复
1764656917410273: 'target_issue_description',
1764656918032031: 'filler_name',
1764656917418979: 'department_opinion',
1764836032906561: 'appointment_location',
1764836032488198: 'appointment_time',
1764836033052889: 'approval_time',
1764836032655678: 'handler_name',
1764836033342084: 'handling_department',
1764836033240593: 'investigation_unit_name',
1764836033018470: 'investigation_location',
1764836033274278: 'investigation_team_code',
1764836033094781: 'investigation_team_member_names',
1764836033176386: 'investigation_team_leader_name',
1764836033500799: 'commission_name',
1764656917384058: 'clue_info',
1764656917861268: 'clue_source',
1764836032538308: 'target_address',
1764836033565636: 'target_health_status',
1764836033332970: 'target_other_situation',
1764656917299164: 'target_date_of_birth',
1764836033269146: 'target_date_of_birth_full',
1765151880445876: 'target_organization',
1764656917367205: 'target_organization_and_position',
1764836033405778: 'target_family_situation',
1764836033162748: 'target_work_basic_info',
1764656917996367: 'target_basic_info_clue',
1764836032997850: 'target_age',
1764656917561689: 'target_gender',
1764836032855869: 'target_personality',
1764836032893680: 'target_registered_address',
1764836033603501: 'target_tolerance',
1764656917185956: 'target_political_status',
1764836033786057: 'target_attitude',
1764836033587951: 'target_previous_investigation',
1764836032951705: 'target_ethnicity',
1764836033280024: 'target_other_issues_possibility',
1764836033458872: 'target_issue_severity',
1764836032929811: 'target_social_relations',
1764836033618877: 'target_negative_events',
1764836032926994: 'target_place_of_origin',
1765151880304552: 'target_position',
1764656917802442: 'target_professional_rank',
1764836032817243: 'target_contact',
1764836032902356: 'target_id_number',
1764836032913357: 'target_id_number',
1764656917073644: 'target_name',
1764836033571266: 'target_problem_description',
1764836032827460: 'report_card_request_time',
1764836032694865: 'notification_location',
1764836032909732: 'notification_time',
1764836033451248: 'risk_level',
}
def rollback():
"""回滚错误的更新"""
conn = pymysql.connect(**DB_CONFIG)
cursor = conn.cursor(pymysql.cursors.DictCursor)
print("="*80)
print("回滚错误的字段更新")
print("="*80)
print(f"\n需要恢复 {len(ROLLBACK_MAPPING)} 个字段\n")
# 先查询当前状态
for field_id, correct_code in ROLLBACK_MAPPING.items():
cursor.execute("""
SELECT id, name, filed_code
FROM f_polic_field
WHERE id = %s AND tenant_id = %s
""", (field_id, TENANT_ID))
field = cursor.fetchone()
if field:
print(f" ID: {field_id}")
print(f" 名称: {field['name']}")
print(f" 当前field_code: {field['filed_code']}")
print(f" 恢复为: {correct_code}")
print()
# 执行回滚
print("开始执行回滚...\n")
for field_id, correct_code in ROLLBACK_MAPPING.items():
cursor.execute("""
UPDATE f_polic_field
SET filed_code = %s, updated_time = NOW(), updated_by = %s
WHERE id = %s AND tenant_id = %s
""", (correct_code, UPDATED_BY, field_id, TENANT_ID))
print(f" ✓ 恢复字段 ID {field_id}: {correct_code}")
conn.commit()
print("\n✓ 回滚完成")
cursor.close()
conn.close()
if __name__ == '__main__':
rollback()


@@ -362,8 +362,8 @@ class AIService:
for key in ['target_name', 'target_gender', 'target_age', 'target_date_of_birth']:
if key in normalized_data:
print(f"[AI服务] 日期格式化后 {key} = '{normalized_data[key]}'")
# 后处理:从已有信息推断缺失字段
normalized_data = self._post_process_inferred_fields(normalized_data, output_fields)
# 后处理:从已有信息推断缺失字段(传入原始prompt,以便从输入文本中提取)
normalized_data = self._post_process_inferred_fields(normalized_data, output_fields, prompt)
# 打印后处理后的关键字段
for key in ['target_name', 'target_gender', 'target_age', 'target_date_of_birth', 'target_organization', 'target_position']:
if key in normalized_data:
@@ -429,7 +429,7 @@
print(f"[AI服务] 使用jsonrepair最后修复成功提取到 {len(extracted_data)} 个字段")
normalized_data = self._normalize_field_names(extracted_data, output_fields)
normalized_data = self._normalize_date_formats(normalized_data, output_fields)
normalized_data = self._post_process_inferred_fields(normalized_data, output_fields)
normalized_data = self._post_process_inferred_fields(normalized_data, output_fields, prompt)
# 记录对话
if self.ai_logger:
self.ai_logger.log_conversation(
@@ -1233,13 +1233,14 @@ class AIService:
# 如果无法解析,返回原值
return date_str
def _post_process_inferred_fields(self, data: Dict, output_fields: List[Dict]) -> Dict:
def _post_process_inferred_fields(self, data: Dict, output_fields: List[Dict], prompt: str = None) -> Dict:
"""
后处理:从已有信息推断缺失字段
后处理:从已有信息推断缺失字段;如果字段缺失,尝试从原始输入文本中提取
Args:
data: 提取的数据字典
output_fields: 输出字段列表
prompt: 原始提示词(包含输入文本),用于从原始输入中提取缺失字段
Returns:
后处理后的数据字典
@@ -1249,11 +1250,25 @@
# 1. 从出生年月计算年龄
if 'target_age' in field_code_map and (not data.get('target_age') or data.get('target_age') == ''):
# 首先尝试从已有数据中计算
if 'target_date_of_birth' in data and data.get('target_date_of_birth'):
age = self._calculate_age_from_birth_date(data['target_date_of_birth'])
if age:
data['target_age'] = str(age)
print(f"[AI服务] 后处理:从出生年月 '{data['target_date_of_birth']}' 计算年龄: {age}")
# 如果还没有,尝试从原始输入文本中直接提取年龄
if (not data.get('target_age') or data.get('target_age') == '') and prompt:
input_text_match = re.search(r'输入文本[::]\s*\n(.*?)(?:\n\n需要提取的字段|$)', prompt, re.DOTALL)
if input_text_match:
input_text = input_text_match.group(1)
# 匹配年龄模式:年龄44岁、44岁、年龄44 等
age_match = re.search(r'年龄\s*(\d+)\s*岁|(\d+)\s*岁|年龄\s*(\d+)', input_text)
if age_match:
age = age_match.group(1) or age_match.group(2) or age_match.group(3)
if age:
data['target_age'] = str(age)
print(f"[AI服务] 后处理:从原始输入文本中提取年龄: {age}")
# 2. 从单位及职务中拆分单位和职务
if 'target_organization_and_position' in data and data.get('target_organization_and_position'):
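The age branch above can be checked standalone. A sketch reusing the same two regexes (the prompt layout, with an 输入文本 section followed by 需要提取的字段, is assumed from the surrounding code):

```python
import re

def extract_age(prompt):
    """Pull an age like "44岁" out of the 输入文本 section of a prompt
    (same patterns as the post-processing branch above)."""
    m = re.search(r'输入文本[::]\s*\n(.*?)(?:\n\n需要提取的字段|$)', prompt, re.DOTALL)
    if not m:
        return None
    input_text = m.group(1)
    # Try "年龄44岁", bare "44岁", then "年龄44", in that order
    age_match = re.search(r'年龄\s*(\d+)\s*岁|(\d+)\s*岁|年龄\s*(\d+)', input_text)
    if age_match:
        return age_match.group(1) or age_match.group(2) or age_match.group(3)
    return None

prompt = "请提取信息。\n输入文本:\n张某,男,44岁,某单位职员。\n\n需要提取的字段:年龄"
print(extract_age(prompt))  # 44
```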
@ -1299,6 +1314,25 @@ class AIService:
data['target_gender'] = '女'
print(f"[AI服务] 后处理:从字段 '{key}' 中推断性别: 女")
break
# 如果仍然没有,尝试从原始输入文本(prompt)中提取
if (not data.get('target_gender') or data.get('target_gender') == '') and prompt:
# 从prompt中提取输入文本部分(通常在"输入文本:"之后)
input_text_match = re.search(r'输入文本[::]\s*\n(.*?)(?:\n\n需要提取的字段|$)', prompt, re.DOTALL)
if input_text_match:
input_text = input_text_match.group(1)
# 匹配性别关键词:男、女(注意 \b 在相邻汉字之间无边界,汉字属于 \w,故改用包含判断)
if '男' in input_text and '女' not in input_text:
data['target_gender'] = '男'
print(f"[AI服务] 后处理:从原始输入文本中提取性别: 男")
elif '女' in input_text and '男' not in input_text:
data['target_gender'] = '女'
print(f"[AI服务] 后处理:从原始输入文本中提取性别: 女")
else:
gender_match = re.search(r'[,,]\s*([男女])\s*[,,]', input_text)
if gender_match:
data['target_gender'] = gender_match.group(1)
print(f"[AI服务] 后处理:从原始输入文本中提取性别: {gender_match.group(1)}")
# 4. 从工作基本情况中提取职级如果target_professional_rank为空
if 'target_professional_rank' in field_code_map and (not data.get('target_professional_rank') or data.get('target_professional_rank') == ''):
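Word boundaries are unreliable for this kind of matching: in Python 3, CJK characters count as `\w`, so `\b男\b` has no boundary between adjacent Chinese characters (it fails inside 一男子, though it still fires next to punctuation). A standalone sketch of a containment-based fallback chain with the same precedence:

```python
import re

def extract_gender(input_text):
    """Infer 男/女 from free text: containment checks first, then a
    comma-delimited fallback when both characters appear."""
    # CJK chars are \w in Python 3, so plain containment replaces \b checks
    if '男' in input_text and '女' not in input_text:
        return '男'
    if '女' in input_text and '男' not in input_text:
        return '女'
    # Both present: look for a delimited ",男," style field (either comma width)
    m = re.search(r'[,,]\s*([男女])\s*[,,]', input_text)
    return m.group(1) if m else None

print(extract_gender('张某,男,44岁'))          # 男
print(extract_gender('张某,男,其妻王某女士'))  # 男(命中逗号分隔模式)
```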


@@ -0,0 +1,552 @@
"""
根据Excel数据设计文档,同步更新模板的input_data、template_code和字段关联关系
"""
import os
import json
import pymysql
import pandas as pd
from pathlib import Path
from typing import Dict, List, Optional, Set
from datetime import datetime
from collections import defaultdict
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
CREATED_BY = 655162080928945152
UPDATED_BY = 655162080928945152
# Excel文件路径
EXCEL_FILE = '技术文档/智慧监督项目模板数据结构设计表-20251125-一凡标注.xlsx'
# 模板名称映射(Excel中的名称 -> 数据库中的名称)
TEMPLATE_NAME_MAPPING = {
'请示报告卡': '1.请示报告卡XXX',
'初步核实审批表': '2.初步核实审批表XXX',
'初核方案': '3.附件初核方案(XXX)',
'谈话通知书': '谈话通知书',
'谈话通知书第一联': '谈话通知书第一联',
'谈话通知书第二联': '谈话通知书第二联',
'谈话通知书第三联': '谈话通知书第三联',
'走读式谈话审批': '走读式谈话审批',
'走读式谈话流程': '走读式谈话流程',
'请示报告卡(初核报告结论)': '8-1请示报告卡初核报告结论 ',
'XXX初核情况报告': '8.XXX初核情况报告',
}
# 模板编码映射(Excel中的名称 -> template_code)
TEMPLATE_CODE_MAPPING = {
'请示报告卡': 'REPORT_CARD',
'初步核实审批表': 'PRELIMINARY_VERIFICATION_APPROVAL',
'初核方案': 'INVESTIGATION_PLAN',
'谈话通知书第一联': 'NOTIFICATION_LETTER_1',
'谈话通知书第二联': 'NOTIFICATION_LETTER_2',
'谈话通知书第三联': 'NOTIFICATION_LETTER_3',
'请示报告卡(初核报告结论)': 'REPORT_CARD_CONCLUSION',
'XXX初核情况报告': 'INVESTIGATION_REPORT',
}
# 字段名称到字段编码的映射
FIELD_NAME_TO_CODE_MAP = {
# 输入字段
'线索信息': 'clue_info',
'被核查人员工作基本情况线索': 'target_basic_info_clue',
# 输出字段 - 基本信息
'被核查人姓名': 'target_name',
'被核查人员单位及职务': 'target_organization_and_position',
'被核查人员性别': 'target_gender',
'被核查人员出生年月': 'target_date_of_birth',
'被核查人员出生年月日': 'target_date_of_birth_full',
'被核查人员政治面貌': 'target_political_status',
'被核查人员职级': 'target_professional_rank',
'被核查人员单位': 'target_organization',
'被核查人员职务': 'target_position',
# 输出字段 - 其他信息
'线索来源': 'clue_source',
'主要问题线索': 'target_issue_description',
'初步核实审批表承办部门意见': 'department_opinion',
'初步核实审批表填表人': 'filler_name',
'请示报告卡请示时间': 'report_card_request_time',
'被核查人员身份证件及号码': 'target_id_number',
'被核查人员身份证号': 'target_id_number',
'应到时间': 'appointment_time',
'应到地点': 'appointment_location',
'批准时间': 'approval_time',
'承办部门': 'handling_department',
'承办人': 'handler_name',
'谈话通知时间': 'notification_time',
'谈话通知地点': 'notification_location',
'被核查人员住址': 'target_address',
'被核查人员户籍住址': 'target_registered_address',
'被核查人员联系方式': 'target_contact',
'被核查人员籍贯': 'target_place_of_origin',
'被核查人员民族': 'target_ethnicity',
'被核查人员工作基本情况': 'target_work_basic_info',
'核查单位名称': 'investigation_unit_name',
'核查组组长姓名': 'investigation_team_leader_name',
'核查组成员姓名': 'investigation_team_member_names',
'核查地点': 'investigation_location',
}
def generate_id():
"""生成ID(毫秒时间戳 * 10^6 + 6位随机数;乘数与随机位数一致,避免相邻毫秒的ID区间重叠)"""
import time
import random
timestamp = int(time.time() * 1000)
random_part = random.randint(100000, 999999)
return timestamp * 1000000 + random_part
def normalize_template_name(name: str) -> str:
"""标准化模板名称,用于匹配"""
import re
# 去掉开头的编号和括号内容
name = re.sub(r'^\d+[\.\-]\s*', '', name)
name = re.sub(r'[((].*?[))]', '', name)
name = name.strip()
return name
def parse_excel_data() -> Dict:
"""解析Excel文件提取模板和字段的关联关系"""
print("="*80)
print("解析Excel数据设计文档")
print("="*80)
if not Path(EXCEL_FILE).exists():
print(f"✗ Excel文件不存在: {EXCEL_FILE}")
return None
try:
df = pd.read_excel(EXCEL_FILE)
print(f"✓ 成功读取Excel文件,{len(df)} 行数据\n")
templates = defaultdict(lambda: {
'template_name': '',
'template_code': '',
'input_fields': [],
'output_fields': []
})
current_template = None
current_input_field = None
for idx, row in df.iterrows():
level1 = row.get('一级分类')
level2 = row.get('二级分类')
level3 = row.get('三级分类')
input_field = row.get('输入数据字段')
output_field = row.get('输出数据字段')
# 处理二级分类(模板名称)
if pd.notna(level2) and level2:
current_template = str(level2).strip()
# 获取模板编码
template_code = TEMPLATE_CODE_MAPPING.get(current_template, '')
if not template_code:
# 如果没有映射,尝试生成
template_code = current_template.upper().replace(' ', '_')
templates[current_template]['template_name'] = current_template
templates[current_template]['template_code'] = template_code
current_input_field = None # 重置输入字段
print(f" 模板: {current_template} (code: {template_code})")
# 处理三级分类(子模板,如谈话通知书第一联)
if pd.notna(level3) and level3:
current_template = str(level3).strip()
template_code = TEMPLATE_CODE_MAPPING.get(current_template, '')
if not template_code:
template_code = current_template.upper().replace(' ', '_')
templates[current_template]['template_name'] = current_template
templates[current_template]['template_code'] = template_code
current_input_field = None
print(f" 子模板: {current_template} (code: {template_code})")
# 处理输入字段
if pd.notna(input_field) and input_field:
input_field_name = str(input_field).strip()
if input_field_name != current_input_field:
current_input_field = input_field_name
field_code = FIELD_NAME_TO_CODE_MAP.get(input_field_name, input_field_name.lower().replace(' ', '_'))
if current_template:
templates[current_template]['input_fields'].append({
'name': input_field_name,
'field_code': field_code
})
# 处理输出字段
if pd.notna(output_field) and output_field:
output_field_name = str(output_field).strip()
field_code = FIELD_NAME_TO_CODE_MAP.get(output_field_name, output_field_name.lower().replace(' ', '_'))
if current_template:
templates[current_template]['output_fields'].append({
'name': output_field_name,
'field_code': field_code
})
# 去重
for template_name, template_info in templates.items():
# 输入字段去重
seen_input = set()
unique_input = []
for field in template_info['input_fields']:
key = field['field_code']
if key not in seen_input:
seen_input.add(key)
unique_input.append(field)
template_info['input_fields'] = unique_input
# 输出字段去重
seen_output = set()
unique_output = []
for field in template_info['output_fields']:
key = field['field_code']
if key not in seen_output:
seen_output.add(key)
unique_output.append(field)
template_info['output_fields'] = unique_output
print(f"\n✓ 解析完成,共 {len(templates)} 个模板")
for template_name, template_info in templates.items():
print(f" - {template_name}: {len(template_info['input_fields'])} 个输入字段, {len(template_info['output_fields'])} 个输出字段")
return dict(templates)
except Exception as e:
print(f"✗ 解析Excel文件失败: {e}")
import traceback
traceback.print_exc()
return None
def get_database_templates(conn) -> Dict:
"""获取数据库中的模板配置"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
sql = """
SELECT id, name, template_code, input_data, parent_id
FROM f_polic_file_config
WHERE tenant_id = %s
"""
cursor.execute(sql, (TENANT_ID,))
configs = cursor.fetchall()
result = {}
for config in configs:
name = config['name']
result[name] = config
# 也添加标准化名称的映射
normalized = normalize_template_name(name)
if normalized not in result:
result[normalized] = config
cursor.close()
return result
def get_database_fields(conn) -> Dict:
"""获取数据库中的字段定义"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
sql = """
SELECT id, name, filed_code, field_type
FROM f_polic_field
WHERE tenant_id = %s
"""
cursor.execute(sql, (TENANT_ID,))
fields = cursor.fetchall()
result = {
'by_code': {},
'by_name': {}
}
for field in fields:
field_code = field['filed_code']
field_name = field['name']
result['by_code'][field_code] = field
result['by_name'][field_name] = field
cursor.close()
return result
def find_matching_template(excel_template_name: str, db_templates: Dict) -> Optional[Dict]:
"""查找匹配的数据库模板"""
# 1. 精确匹配
if excel_template_name in db_templates:
return db_templates[excel_template_name]
# 2. 通过映射表匹配
mapped_name = TEMPLATE_NAME_MAPPING.get(excel_template_name)
if mapped_name and mapped_name in db_templates:
return db_templates[mapped_name]
# 3. 标准化名称匹配
normalized = normalize_template_name(excel_template_name)
if normalized in db_templates:
return db_templates[normalized]
# 4. 模糊匹配
for db_name, db_config in db_templates.items():
if normalized in normalize_template_name(db_name) or normalize_template_name(db_name) in normalized:
return db_config
return None
def update_template_config(conn, template_id: int, template_code: str, input_fields: List[Dict], dry_run: bool = True):
"""更新模板配置的input_data和template_code"""
cursor = conn.cursor()
try:
# 构建input_data
input_data = {
'template_code': template_code,
'business_type': 'INVESTIGATION',
'input_fields': [f['field_code'] for f in input_fields]
}
input_data_json = json.dumps(input_data, ensure_ascii=False)
if not dry_run:
update_sql = """
UPDATE f_polic_file_config
SET template_code = %s, input_data = %s, updated_time = NOW(), updated_by = %s
WHERE id = %s AND tenant_id = %s
"""
cursor.execute(update_sql, (template_code, input_data_json, UPDATED_BY, template_id, TENANT_ID))
conn.commit()
print(f" ✓ 更新模板配置")
else:
print(f" [模拟] 将更新模板配置: template_code={template_code}")
finally:
cursor.close()
def update_template_field_relations(conn, template_id: int, input_fields: List[Dict], output_fields: List[Dict],
db_fields: Dict, dry_run: bool = True):
"""更新模板和字段的关联关系"""
cursor = conn.cursor()
try:
# 先删除旧的关联关系
if not dry_run:
delete_sql = """
DELETE FROM f_polic_file_field
WHERE tenant_id = %s AND file_id = %s
"""
cursor.execute(delete_sql, (TENANT_ID, template_id))
# 创建新的关联关系
relations_created = 0
# 关联输入字段field_type=1
for field_info in input_fields:
field_code = field_info['field_code']
field = db_fields['by_code'].get(field_code)
if not field:
print(f" ⚠ 输入字段不存在: {field_code}")
continue
if field['field_type'] != 1:
print(f" ⚠ 字段类型不匹配: {field_code} (期望输入字段,实际为输出字段)")
continue
if not dry_run:
# 检查是否已存在
check_sql = """
SELECT id FROM f_polic_file_field
WHERE tenant_id = %s AND file_id = %s AND filed_id = %s
"""
cursor.execute(check_sql, (TENANT_ID, template_id, field['id']))
existing = cursor.fetchone()
if not existing:
relation_id = generate_id()
insert_sql = """
INSERT INTO f_polic_file_field
(id, tenant_id, file_id, filed_id, created_time, created_by, updated_time, updated_by, state)
VALUES (%s, %s, %s, %s, NOW(), %s, NOW(), %s, %s)
"""
cursor.execute(insert_sql, (
relation_id, TENANT_ID, template_id, field['id'],
CREATED_BY, UPDATED_BY, 1
))
relations_created += 1
else:
relations_created += 1
# 关联输出字段field_type=2
for field_info in output_fields:
field_code = field_info['field_code']
field = db_fields['by_code'].get(field_code)
if not field:
# 尝试通过名称匹配
field_name = field_info['name']
field = db_fields['by_name'].get(field_name)
if not field:
print(f" ⚠ 输出字段不存在: {field_code} ({field_info['name']})")
continue
if field['field_type'] != 2:
print(f" ⚠ 字段类型不匹配: {field_code} (期望输出字段,实际为输入字段)")
continue
if not dry_run:
# 检查是否已存在
check_sql = """
SELECT id FROM f_polic_file_field
WHERE tenant_id = %s AND file_id = %s AND filed_id = %s
"""
cursor.execute(check_sql, (TENANT_ID, template_id, field['id']))
existing = cursor.fetchone()
if not existing:
relation_id = generate_id()
insert_sql = """
INSERT INTO f_polic_file_field
(id, tenant_id, file_id, filed_id, created_time, created_by, updated_time, updated_by, state)
VALUES (%s, %s, %s, %s, NOW(), %s, NOW(), %s, %s)
"""
cursor.execute(insert_sql, (
relation_id, TENANT_ID, template_id, field['id'],
CREATED_BY, UPDATED_BY, 1
))
relations_created += 1
else:
relations_created += 1
if not dry_run:
conn.commit()
print(f" ✓ 创建 {relations_created} 个字段关联关系")
else:
print(f" [模拟] 将创建 {relations_created} 个字段关联关系")
finally:
cursor.close()
def main():
"""主函数"""
print("="*80)
print("同步模板字段信息(根据Excel数据设计文档)")
print("="*80)
# 解析Excel
excel_data = parse_excel_data()
if not excel_data:
return
# 连接数据库
try:
conn = pymysql.connect(**DB_CONFIG)
print("\n✓ 数据库连接成功")
except Exception as e:
print(f"\n✗ 数据库连接失败: {e}")
return
try:
# 获取数据库中的模板和字段
print("\n获取数据库中的模板和字段...")
db_templates = get_database_templates(conn)
db_fields = get_database_fields(conn)
print(f" 数据库中有 {len(db_templates)} 个模板")
print(f" 数据库中有 {len(db_fields['by_code'])} 个字段")
# 匹配和更新
print("\n" + "="*80)
print("匹配模板并更新配置")
print("="*80)
matched_count = 0
unmatched_templates = []
for excel_template_name, template_info in excel_data.items():
print(f"\n处理模板: {excel_template_name}")
# 查找匹配的数据库模板
db_template = find_matching_template(excel_template_name, db_templates)
if not db_template:
print(f" ✗ 未找到匹配的数据库模板")
unmatched_templates.append(excel_template_name)
continue
print(f" ✓ 匹配到数据库模板: {db_template['name']} (ID: {db_template['id']})")
matched_count += 1
# 更新模板配置
template_code = template_info['template_code']
input_fields = template_info['input_fields']
output_fields = template_info['output_fields']
print(f" 模板编码: {template_code}")
print(f" 输入字段: {len(input_fields)}")
print(f" 输出字段: {len(output_fields)}")
# 先执行模拟更新
print(" [模拟模式]")
update_template_config(conn, db_template['id'], template_code, input_fields, dry_run=True)
update_template_field_relations(conn, db_template['id'], input_fields, output_fields, db_fields, dry_run=True)
# 显示统计
print("\n" + "="*80)
print("统计信息")
print("="*80)
print(f"Excel中的模板数: {len(excel_data)}")
print(f"成功匹配: {matched_count}")
print(f"未匹配: {len(unmatched_templates)}")
if unmatched_templates:
print("\n未匹配的模板:")
for template in unmatched_templates:
print(f" - {template}")
# 询问是否执行实际更新
print("\n" + "="*80)
response = input("\n是否执行实际更新?(yes/no,默认no): ").strip().lower()
if response == 'yes':
print("\n执行实际更新...")
for excel_template_name, template_info in excel_data.items():
db_template = find_matching_template(excel_template_name, db_templates)
if db_template:
print(f"\n更新: {db_template['name']}")
update_template_config(conn, db_template['id'], template_info['template_code'],
template_info['input_fields'], dry_run=False)
update_template_field_relations(conn, db_template['id'],
template_info['input_fields'],
template_info['output_fields'],
db_fields, dry_run=False)
print("\n" + "="*80)
print("✓ 同步完成!")
print("="*80)
else:
print("\n已取消更新")
finally:
conn.close()
print("\n数据库连接已关闭")
if __name__ == '__main__':
main()

update_template_tree.py Normal file

@@ -0,0 +1,618 @@
"""
更新模板树状结构
根据 template_finish 目录结构更新数据库中的 parent_id 字段
"""
import os
import json
import pymysql
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from datetime import datetime
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
CREATED_BY = 655162080928945152
UPDATED_BY = 655162080928945152
# 项目根目录
PROJECT_ROOT = Path(__file__).parent
TEMPLATES_DIR = PROJECT_ROOT / "template_finish"
# 从 init_all_templates.py 复制的文档类型映射
DOCUMENT_TYPE_MAPPING = {
"1.请示报告卡XXX": {
"template_code": "REPORT_CARD",
"name": "1.请示报告卡XXX",
"business_type": "INVESTIGATION"
},
"2.初步核实审批表XXX": {
"template_code": "PRELIMINARY_VERIFICATION_APPROVAL",
"name": "2.初步核实审批表XXX",
"business_type": "INVESTIGATION"
},
"3.附件初核方案(XXX)": {
"template_code": "INVESTIGATION_PLAN",
"name": "3.附件初核方案(XXX)",
"business_type": "INVESTIGATION"
},
"谈话通知书第一联": {
"template_code": "NOTIFICATION_LETTER_1",
"name": "谈话通知书第一联",
"business_type": "INVESTIGATION"
},
"谈话通知书第二联": {
"template_code": "NOTIFICATION_LETTER_2",
"name": "谈话通知书第二联",
"business_type": "INVESTIGATION"
},
"谈话通知书第三联": {
"template_code": "NOTIFICATION_LETTER_3",
"name": "谈话通知书第三联",
"business_type": "INVESTIGATION"
},
"1.请示报告卡(初核谈话)": {
"template_code": "REPORT_CARD_INTERVIEW",
"name": "1.请示报告卡(初核谈话)",
"business_type": "INVESTIGATION"
},
"2谈话审批表": {
"template_code": "INTERVIEW_APPROVAL_FORM",
"name": "2谈话审批表",
"business_type": "INVESTIGATION"
},
"3.谈话前安全风险评估表": {
"template_code": "PRE_INTERVIEW_RISK_ASSESSMENT",
"name": "3.谈话前安全风险评估表",
"business_type": "INVESTIGATION"
},
"4.谈话方案": {
"template_code": "INTERVIEW_PLAN",
"name": "4.谈话方案",
"business_type": "INVESTIGATION"
},
"5.谈话后安全风险评估表": {
"template_code": "POST_INTERVIEW_RISK_ASSESSMENT",
"name": "5.谈话后安全风险评估表",
"business_type": "INVESTIGATION"
},
"1.谈话笔录": {
"template_code": "INTERVIEW_RECORD",
"name": "1.谈话笔录",
"business_type": "INVESTIGATION"
},
"2.谈话询问对象情况摸底调查30问": {
"template_code": "INVESTIGATION_30_QUESTIONS",
"name": "2.谈话询问对象情况摸底调查30问",
"business_type": "INVESTIGATION"
},
"3.被谈话人权利义务告知书": {
"template_code": "RIGHTS_OBLIGATIONS_NOTICE",
"name": "3.被谈话人权利义务告知书",
"business_type": "INVESTIGATION"
},
"4.点对点交接单": {
"template_code": "HANDOVER_FORM",
"name": "4.点对点交接单",
"business_type": "INVESTIGATION"
},
"4.点对点交接单2": {
"template_code": "HANDOVER_FORM_2",
"name": "4.点对点交接单2",
"business_type": "INVESTIGATION"
},
"5.陪送交接单(新)": {
"template_code": "ESCORT_HANDOVER_FORM",
"name": "5.陪送交接单(新)",
"business_type": "INVESTIGATION"
},
"6.1保密承诺书(谈话对象使用-非中共党员用)": {
"template_code": "CONFIDENTIALITY_COMMITMENT_NON_PARTY",
"name": "6.1保密承诺书(谈话对象使用-非中共党员用)",
"business_type": "INVESTIGATION"
},
"6.2保密承诺书(谈话对象使用-中共党员用)": {
"template_code": "CONFIDENTIALITY_COMMITMENT_PARTY",
"name": "6.2保密承诺书(谈话对象使用-中共党员用)",
"business_type": "INVESTIGATION"
},
"7.办案人员-办案安全保密承诺书": {
"template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT",
"name": "7.办案人员-办案安全保密承诺书",
"business_type": "INVESTIGATION"
},
"8-1请示报告卡初核报告结论 ": {
"template_code": "REPORT_CARD_CONCLUSION",
"name": "8-1请示报告卡初核报告结论 ",
"business_type": "INVESTIGATION"
},
"8.XXX初核情况报告": {
"template_code": "INVESTIGATION_REPORT",
"name": "8.XXX初核情况报告",
"business_type": "INVESTIGATION"
}
}
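Since the matching logic indexes records by `template_code` and keeps only the first per code, a duplicate code in a mapping of this shape silently shadows later entries. A quick sanity check (sketch; the sample data is invented):

```python
from collections import Counter

def duplicate_codes(mapping):
    """Return template_code values that appear more than once."""
    counts = Counter(v['template_code'] for v in mapping.values())
    return [code for code, n in counts.items() if n > 1]

sample = {
    'a': {'template_code': 'REPORT_CARD'},
    'b': {'template_code': 'REPORT_CARD'},  # shadowed duplicate
    'c': {'template_code': 'INVESTIGATION_PLAN'},
}
print(duplicate_codes(sample))  # ['REPORT_CARD']
```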
def generate_id():
"""生成ID(毫秒时间戳 * 10^6 + 6位随机数,模拟雪花算法;乘数与随机位数一致,避免相邻毫秒的ID区间重叠)"""
import time
import random
timestamp = int(time.time() * 1000)
random_part = random.randint(100000, 999999)
return timestamp * 1000000 + random_part
def normalize_name(name: str) -> str:
"""标准化名称,用于模糊匹配"""
import re
# 去掉开头的编号(如 "1."、"2."、"8-1" 等)
name = re.sub(r'^\d+[\.\-]\s*', '', name)
# 去掉括号及其内容(如 "XXX"、"(初核谈话)" 等)
name = re.sub(r'[((].*?[))]', '', name)
# 去掉空格和特殊字符
name = name.strip()
return name
def identify_document_type(file_name: str) -> Optional[Dict]:
"""根据完整文件名识别文档类型"""
base_name = Path(file_name).stem
if base_name in DOCUMENT_TYPE_MAPPING:
return DOCUMENT_TYPE_MAPPING[base_name]
return None
def scan_directory_structure(base_dir: Path) -> Dict:
"""扫描目录结构,构建树状层级"""
structure = {
'directories': {}, # {path: {'name': ..., 'parent': ..., 'level': ...}}
'files': {} # {file_path: {'name': ..., 'parent': ..., 'template_code': ...}}
}
def process_path(path: Path, parent_path: Optional[str] = None, level: int = 0):
"""递归处理路径"""
if path.is_file() and path.suffix == '.docx':
# 处理文件
file_name = path.stem
doc_config = identify_document_type(file_name)
structure['files'][str(path)] = {
'name': file_name,
'parent': parent_path,
'level': level,
'template_code': doc_config['template_code'] if doc_config else None,
'full_path': str(path),
'normalized_name': normalize_name(file_name)
}
elif path.is_dir():
# 处理目录
dir_name = path.name
structure['directories'][str(path)] = {
'name': dir_name,
'parent': parent_path,
'level': level,
'normalized_name': normalize_name(dir_name)
}
# 递归处理子目录和文件
for child in sorted(path.iterdir()):
if child.name != '__pycache__':
process_path(child, str(path), level + 1)
# 从根目录开始扫描
if TEMPLATES_DIR.exists():
for item in sorted(TEMPLATES_DIR.iterdir()):
if item.name != '__pycache__':
process_path(item, None, 0)
return structure
def find_matching_config(file_info: Dict, existing_data: Dict) -> Optional[Dict]:
"""
查找匹配的数据库记录
优先级:1. template_code 精确匹配 2. 名称精确匹配 3. 标准化名称匹配
"""
template_code = file_info.get('template_code')
file_name = file_info['name']
normalized_name = file_info.get('normalized_name', normalize_name(file_name))
# 优先级1: template_code 精确匹配
if template_code:
matched = existing_data['by_template_code'].get(template_code)
if matched:
return matched
# 优先级2: 名称精确匹配
matched = existing_data['by_name'].get(file_name)
if matched:
return matched
# 优先级3: 标准化名称匹配
candidates = existing_data['by_normalized_name'].get(normalized_name, [])
if candidates:
# 如果有多个候选,优先选择有正确 template_code 的
for candidate in candidates:
if candidate.get('extracted_template_code') == template_code:
return candidate
# 否则返回第一个
return candidates[0]
return None
def get_existing_data(conn) -> Dict:
"""获取数据库中的现有数据"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
sql = """
SELECT id, name, parent_id, template_code, input_data, file_path, state
FROM f_polic_file_config
WHERE tenant_id = %s
"""
cursor.execute(sql, (TENANT_ID,))
configs = cursor.fetchall()
result = {
'by_id': {},
'by_name': {},
'by_template_code': {},
'by_normalized_name': {} # 新增:标准化名称索引
}
for config in configs:
config_id = config['id']
config_name = config['name']
# 尝试从 input_data 中提取 template_code
template_code = config.get('template_code')
if not template_code and config.get('input_data'):
try:
input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data']
if isinstance(input_data, dict):
template_code = input_data.get('template_code')
except (json.JSONDecodeError, TypeError):
pass
config['extracted_template_code'] = template_code
config['normalized_name'] = normalize_name(config_name)
result['by_id'][config_id] = config
result['by_name'][config_name] = config
if template_code:
# 如果已存在相同 template_code保留第一个
if template_code not in result['by_template_code']:
result['by_template_code'][template_code] = config
# 标准化名称索引(可能有多个记录匹配同一个标准化名称)
normalized = config['normalized_name']
if normalized not in result['by_normalized_name']:
result['by_normalized_name'][normalized] = []
result['by_normalized_name'][normalized].append(config)
cursor.close()
return result
def plan_tree_structure(dir_structure: Dict, existing_data: Dict) -> List[Dict]:
"""规划树状结构"""
plan = []
# 按层级排序目录
directories = sorted(dir_structure['directories'].items(),
key=lambda x: (x[1]['level'], x[0]))
# 按层级排序文件
files = sorted(dir_structure['files'].items(),
key=lambda x: (x[1]['level'], x[0]))
# 创建目录映射用于查找父目录ID
dir_id_map = {} # {dir_path: config_id}
# 处理目录(按层级顺序)
for dir_path, dir_info in directories:
dir_name = dir_info['name']
parent_path = dir_info['parent']
level = dir_info['level']
# 查找父目录ID
parent_id = None
if parent_path:
parent_id = dir_id_map.get(parent_path)
# 查找匹配的数据库记录(使用改进的匹配逻辑)
existing = find_matching_config(dir_info, existing_data)
if existing:
# 使用现有记录
plan.append({
'type': 'directory',
'name': dir_name,
'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
'parent_id': parent_id,
'level': level,
'action': 'update',
'config_id': existing['id'],
'current_parent_id': existing.get('parent_id')
})
dir_id_map[dir_path] = existing['id']
else:
# 创建新记录(目录节点)
new_id = generate_id()
plan.append({
'type': 'directory',
'name': dir_name,
'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
'parent_id': parent_id,
'level': level,
'action': 'create',
'config_id': new_id,
'current_parent_id': None
})
dir_id_map[dir_path] = new_id
# 处理文件
for file_path, file_info in files:
file_name = file_info['name']
parent_path = file_info['parent']
level = file_info['level']
template_code = file_info['template_code']
# 查找父目录ID
parent_id = dir_id_map.get(parent_path) if parent_path else None
# 查找匹配的数据库记录(使用改进的匹配逻辑)
existing = find_matching_config(file_info, existing_data)
if existing:
# 更新现有记录
plan.append({
'type': 'file',
'name': file_name,
'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
'parent_id': parent_id,
'level': level,
'action': 'update',
'config_id': existing['id'],
'template_code': template_code,
'current_parent_id': existing.get('parent_id')
})
else:
# 创建新记录(文件节点)- 这种情况应该很少,因为文件应该已经在数据库中
new_id = generate_id()
plan.append({
'type': 'file',
'name': file_name,
'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
'parent_id': parent_id,
'level': level,
'action': 'create',
'config_id': new_id,
'template_code': template_code,
'current_parent_id': None
})
return plan
def print_preview(plan: List[Dict]):
"""打印更新预览"""
print("\n" + "="*80)
print("更新预览")
print("="*80)
# 按层级分组
by_level = {}
for item in plan:
level = item['level']
if level not in by_level:
by_level[level] = []
by_level[level].append(item)
# 按层级顺序显示
for level in sorted(by_level.keys()):
print(f"\n【层级 {level}】")
for item in by_level[level]:
indent = " " * level
if item['action'] == 'create':
print(f"{indent}+ 创建: {item['name']} (ID: {item['config_id']})")
if item['parent_name']:
print(f"{indent} 父节点: {item['parent_name']}")
else:
current = item.get('current_parent_id', 'None')
new = item.get('parent_id', 'None')
if current != new:
print(f"{indent}→ 更新: {item['name']} (ID: {item['config_id']})")
print(f"{indent} parent_id: {current}{new}")
if item['parent_name']:
print(f"{indent} 父节点: {item['parent_name']}")
else:
print(f"{indent}✓ 无需更新: {item['name']} (parent_id 已正确)")
def execute_update(conn, plan: List[Dict], dry_run: bool = True):
"""执行更新"""
cursor = conn.cursor()
try:
if not dry_run:
conn.autocommit(False)
# 按层级分组
by_level = {}
for item in plan:
level = item['level']
if level not in by_level:
by_level[level] = []
by_level[level].append(item)
create_count = 0
update_count = 0
skip_count = 0
# 按层级顺序处理(从顶层到底层)
for level in sorted(by_level.keys()):
for item in by_level[level]:
if item['action'] == 'create':
# 创建新记录
if not dry_run:
if item['type'] == 'directory':
insert_sql = """
INSERT INTO f_polic_file_config
(id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state)
VALUES (%s, %s, %s, %s, %s, %s, NOW(), %s, NOW(), %s, %s)
"""
cursor.execute(insert_sql, (
item['config_id'],
TENANT_ID,
item['parent_id'],
item['name'],
None,
None,
CREATED_BY,
UPDATED_BY,
1
))
else:
# 文件节点
input_data = json.dumps({
'template_code': item.get('template_code', ''),
'business_type': 'INVESTIGATION'
}, ensure_ascii=False)
insert_sql = """
INSERT INTO f_polic_file_config
(id, tenant_id, parent_id, name, input_data, file_path, template_code, created_time, created_by, updated_time, updated_by, state)
VALUES (%s, %s, %s, %s, %s, %s, %s, NOW(), %s, NOW(), %s, %s)
"""
cursor.execute(insert_sql, (
item['config_id'],
TENANT_ID,
item['parent_id'],
item['name'],
input_data,
None,
item.get('template_code'),
CREATED_BY,
UPDATED_BY,
1
))
create_count += 1
print(f"{'[模拟]' if dry_run else ''}创建: {item['name']}")
else:
# 更新现有记录
current_parent = item.get('current_parent_id')
new_parent = item.get('parent_id')
if current_parent != new_parent:
if not dry_run:
update_sql = """
UPDATE f_polic_file_config
SET parent_id = %s, updated_time = NOW(), updated_by = %s
WHERE id = %s AND tenant_id = %s
"""
cursor.execute(update_sql, (
new_parent,
UPDATED_BY,
item['config_id'],
TENANT_ID
))
update_count += 1
print(f"{'[模拟]' if dry_run else ''}更新: {item['name']} (parent_id: {current_parent}{new_parent})")
else:
skip_count += 1
if not dry_run:
conn.commit()
print(f"\n✓ 更新完成!")
else:
print(f"\n[模拟模式] 未实际执行更新")
print(f"\n统计:")
print(f" - 创建: {create_count}")
print(f" - 更新: {update_count}")
print(f" - 跳过: {skip_count}")
except Exception as e:
if not dry_run:
conn.rollback()
print(f"\n✗ 更新失败: {e}")
import traceback
traceback.print_exc()
raise
finally:
cursor.close()
def main():
"""主函数"""
print("="*80)
print("更新模板树状结构")
print("="*80)
# 连接数据库
try:
conn = pymysql.connect(**DB_CONFIG)
print("✓ 数据库连接成功\n")
except Exception as e:
print(f"✗ 数据库连接失败: {e}")
return
try:
# 扫描目录结构
print("扫描目录结构...")
dir_structure = scan_directory_structure(TEMPLATES_DIR)
print(f" 找到 {len(dir_structure['directories'])} 个目录")
print(f" 找到 {len(dir_structure['files'])} 个文件\n")
# 获取数据库现有数据
print("获取数据库现有数据...")
existing_data = get_existing_data(conn)
print(f" 数据库中有 {len(existing_data['by_id'])} 条记录\n")
# 规划树状结构
print("规划树状结构...")
plan = plan_tree_structure(dir_structure, existing_data)
print(f" 生成 {len(plan)} 个更新计划\n")
# 打印预览
print_preview(plan)
# 询问是否执行
print("\n" + "="*80)
response = input("\n是否执行更新?(yes/no默认no): ").strip().lower()
if response == 'yes':
# 先执行一次模拟
print("\n执行模拟更新...")
execute_update(conn, plan, dry_run=True)
# 再次确认
print("\n" + "="*80)
confirm = input("\n确认执行实际更新?(yes/no默认no): ").strip().lower()
if confirm == 'yes':
print("\n执行实际更新...")
execute_update(conn, plan, dry_run=False)
else:
print("\n已取消更新")
else:
print("\n已取消更新")
finally:
conn.close()
print("\n数据库连接已关闭")
if __name__ == '__main__':
main()
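补充说明(非原脚本内容)`plan_tree_structure` 与 `execute_update` 都依赖"按 level 升序、先父后子"这一不变式:任何子节点被处理时,其父目录必然已写入 `dir_id_map`。下面是一个假设性的最小示意(函数名与数据结构均为演示用,并非原脚本 API

```python
# 示意:按层级排序后一次遍历即可解析 parent_id与 plan_tree_structure 的做法一致
def demo_level_order(entries):
    """entries: [(path, parent_path, level)] -> 按层级顺序生成 {path: id} 映射。

    按 (level, path) 升序处理,保证父目录的 ID 先于子目录写入映射,
    否则说明层级数据有误,直接抛错。
    """
    id_map = {}
    next_id = 1000
    for path, parent, level in sorted(entries, key=lambda e: (e[2], e[0])):
        if parent is not None and parent not in id_map:
            raise ValueError(f"父目录 {parent} 尚未创建,层级数据有误")
        id_map[path] = next_id
        next_id += 1
    return id_map
```

若目录扫描结果中存在"子先于父"的脏数据,这个不变式检查会在规划阶段而不是写库阶段暴露问题。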

update_template_tree.sql Normal file

@ -0,0 +1,159 @@
-- 模板树状结构更新脚本
-- 生成时间: 2025-12-09 17:39:51
-- 注意:执行前请备份数据库!
USE finyx;
START TRANSACTION;
-- ===== 层级 0 =====
-- 创建目录节点: 2-初核模版
INSERT INTO f_polic_file_config
(id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state)
VALUES (1765273080357704, 615873064429507639, NULL, '2-初核模版', NULL, NULL, NOW(), 655162080928945152, NOW(), 655162080928945152, 1);
-- ===== 层级 1 =====
-- 创建目录节点: 1.初核请示
INSERT INTO f_polic_file_config
(id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state)
VALUES (1765273080719940, 615873064429507639, 1765273080357704, '1.初核请示', NULL, NULL, NOW(), 655162080928945152, NOW(), 655162080928945152, 1);
-- 更新: 2.谈话审批 (parent_id: None -> 1765273080357704)
UPDATE f_polic_file_config
SET parent_id = 1765273080357704, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 704825582342212610 AND tenant_id = 615873064429507639;
-- 更新: 3.初核结论 (parent_id: None -> 1765273080357704)
UPDATE f_polic_file_config
SET parent_id = 1765273080357704, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 704825582342212611 AND tenant_id = 615873064429507639;
-- ===== 层级 2 =====
-- 更新: 谈话通知书 (parent_id: None -> 704825582342212610)
UPDATE f_polic_file_config
SET parent_id = 704825582342212610, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1764836033451564 AND tenant_id = 615873064429507639;
-- 更新: 走读式谈话审批 (parent_id: None -> 704825582342212610)
UPDATE f_polic_file_config
SET parent_id = 704825582342212610, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1764836034070056 AND tenant_id = 615873064429507639;
-- 更新: 走读式谈话流程 (parent_id: None -> 704825582342212610)
UPDATE f_polic_file_config
SET parent_id = 704825582342212610, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1764836034052009 AND tenant_id = 615873064429507639;
-- 更新: 1.请示报告卡XXX (parent_id: None -> 1765273080719940)
UPDATE f_polic_file_config
SET parent_id = 1765273080719940, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1764836033251691 AND tenant_id = 615873064429507639;
-- 更新: 2.初步核实审批表XXX (parent_id: None -> 1765273080719940)
UPDATE f_polic_file_config
SET parent_id = 1765273080719940, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1764656918061150 AND tenant_id = 615873064429507639;
-- 更新: 3.附件初核方案(XXX) (parent_id: None -> 1765273080719940)
UPDATE f_polic_file_config
SET parent_id = 1765273080719940, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242273284972 AND tenant_id = 615873064429507639;
-- 更新: 8-1请示报告卡初核报告结论 (parent_id: None -> 704825582342212611)
UPDATE f_polic_file_config
SET parent_id = 704825582342212611, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242278419277 AND tenant_id = 615873064429507639;
-- 更新: 8.XXX初核情况报告 (parent_id: None -> 704825582342212611)
UPDATE f_polic_file_config
SET parent_id = 704825582342212611, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242278832792 AND tenant_id = 615873064429507639;
-- ===== 层级 3 =====
-- 更新: 谈话通知书第一联 (parent_id: None -> 1764836033451564)
UPDATE f_polic_file_config
SET parent_id = 1764836033451564, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242274101483 AND tenant_id = 615873064429507639;
-- 更新: 谈话通知书第三联 (parent_id: None -> 1764836033451564)
UPDATE f_polic_file_config
SET parent_id = 1764836033451564, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242274109904 AND tenant_id = 615873064429507639;
-- 更新: 谈话通知书第二联 (parent_id: None -> 1764836033451564)
UPDATE f_polic_file_config
SET parent_id = 1764836033451564, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242273898117 AND tenant_id = 615873064429507639;
-- 更新: 1.请示报告卡(初核谈话) (parent_id: None -> 1764836034070056)
UPDATE f_polic_file_config
SET parent_id = 1764836034070056, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242274961528 AND tenant_id = 615873064429507639;
-- 更新: 2谈话审批表 (parent_id: None -> 1764836034070056)
UPDATE f_polic_file_config
SET parent_id = 1764836034070056, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242275071133 AND tenant_id = 615873064429507639;
-- 更新: 3.谈话前安全风险评估表 (parent_id: None -> 1764836034070056)
UPDATE f_polic_file_config
SET parent_id = 1764836034070056, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242275362306 AND tenant_id = 615873064429507639;
-- 更新: 4.谈话方案 (parent_id: None -> 1764836034070056)
UPDATE f_polic_file_config
SET parent_id = 1764836034070056, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242275716334 AND tenant_id = 615873064429507639;
-- 更新: 5.谈话后安全风险评估表 (parent_id: None -> 1764836034070056)
UPDATE f_polic_file_config
SET parent_id = 1764836034070056, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242275780395 AND tenant_id = 615873064429507639;
-- 更新: 1.谈话笔录 (parent_id: None -> 1764836034052009)
UPDATE f_polic_file_config
SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242276549299 AND tenant_id = 615873064429507639;
-- 更新: 2.谈话询问对象情况摸底调查30问 (parent_id: None -> 1764836034052009)
UPDATE f_polic_file_config
SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242276522490 AND tenant_id = 615873064429507639;
-- 更新: 3.被谈话人权利义务告知书 (parent_id: None -> 1764836034052009)
UPDATE f_polic_file_config
SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242277165087 AND tenant_id = 615873064429507639;
-- 更新: 4.点对点交接单 (parent_id: None -> 1764836034052009)
UPDATE f_polic_file_config
SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242276709614 AND tenant_id = 615873064429507639;
-- 更新: 5.陪送交接单(新) (parent_id: None -> 1764836034052009)
UPDATE f_polic_file_config
SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242277149374 AND tenant_id = 615873064429507639;
-- 更新: 6.1保密承诺书(谈话对象使用-非中共党员用) (parent_id: None -> 1764836034052009)
UPDATE f_polic_file_config
SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242277776686 AND tenant_id = 615873064429507639;
-- 更新: 6.2保密承诺书(谈话对象使用-中共党员用) (parent_id: None -> 1764836034052009)
UPDATE f_polic_file_config
SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242277897239 AND tenant_id = 615873064429507639;
-- 更新: 7.办案人员-办案安全保密承诺书 (parent_id: None -> 1764836034052009)
UPDATE f_polic_file_config
SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152
WHERE id = 1765242278111656 AND tenant_id = 615873064429507639;
COMMIT;
-- 更新完成
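补充说明(非提交文件内容)pymysql 的 `cursor.execute` 一次只能执行一条语句,若想用 Python 执行上面这类带 `--` 注释的多语句脚本,可以先做一次简单切分。以下是一个假设性的辅助函数(仅按分号切分,假设语句内的字符串不含分号,本脚本满足该前提):

```python
# 假设性辅助:去掉 '--' 行注释后按分号切分 SQL 脚本,再逐条执行
def split_sql_script(script: str):
    """返回脚本中的非空语句列表(不含结尾分号)。"""
    lines = [l for l in script.splitlines() if not l.strip().startswith('--')]
    statements = [s.strip() for s in '\n'.join(lines).split(';')]
    return [s for s in statements if s]

def run_script(conn, script: str):
    """在一个连接上顺序执行脚本中的所有语句,最后提交。"""
    with conn.cursor() as cursor:
        for stmt in split_sql_script(script):
            cursor.execute(stmt)
    conn.commit()
```

也可以直接用 mysql 客户端执行该 .sql 文件,此函数只是方便在现有 Python 工具链中复用。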

verify_field_code_fix.py Normal file

@ -0,0 +1,148 @@
"""
验证字段编码修复结果并处理剩余的真正问题
"""
import os
import pymysql
import re
from typing import Dict, List
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
def is_chinese(text: str) -> bool:
"""判断字符串是否包含中文字符"""
if not text:
return False
return bool(re.search(r'[\u4e00-\u9fff]', text))
def verify_fix():
"""验证修复结果"""
conn = pymysql.connect(**DB_CONFIG)
cursor = conn.cursor(pymysql.cursors.DictCursor)
print("="*80)
print("验证字段编码修复结果")
print("="*80)
# 查询所有字段
cursor.execute("""
SELECT id, name, filed_code, field_type, state
FROM f_polic_field
WHERE tenant_id = %s
ORDER BY name
""", (TENANT_ID,))
fields = cursor.fetchall()
# 找出仍然包含中文的field_code
chinese_fields = []
for field in fields:
if field['filed_code'] and is_chinese(field['filed_code']):
chinese_fields.append(field)
print(f"\n总共 {len(fields)} 个字段")
print(f"仍有 {len(chinese_fields)} 个字段的field_code包含中文:\n")
if chinese_fields:
for field in chinese_fields:
print(f" ID: {field['id']}")
print(f" 名称: {field['name']}")
print(f" field_code: {field['filed_code']}")
print(f" field_type: {field['field_type']}")
print()
# 检查重复的字段名称
name_to_fields = {}
for field in fields:
name = field['name']
if name not in name_to_fields:
name_to_fields[name] = []
name_to_fields[name].append(field)
duplicates = {name: fields_list for name, fields_list in name_to_fields.items()
if len(fields_list) > 1}
print(f"\n仍有 {len(duplicates)} 个重复的字段名称:\n")
for name, fields_list in duplicates.items():
print(f" 字段名称: {name} (共 {len(fields_list)} 条记录)")
for field in fields_list:
print(f" - ID: {field['id']}, field_code: {field['filed_code']}, "
f"field_type: {field['field_type']}, state: {field['state']}")
print()
# 检查f_polic_file_field表中的关联关系
print("="*80)
print("检查 f_polic_file_field 表")
print("="*80)
cursor.execute("""
SELECT fff.id, fff.file_id, fff.filed_id,
fc.name as file_name, f.name as field_name, f.filed_code
FROM f_polic_file_field fff
LEFT JOIN f_polic_file_config fc ON fff.file_id = fc.id
LEFT JOIN f_polic_field f ON fff.filed_id = f.id
WHERE fff.tenant_id = %s AND f.filed_code IS NOT NULL
ORDER BY fff.file_id, fff.filed_id
""", (TENANT_ID,))
relations = cursor.fetchall()
# 检查是否有重复的关联关系
relation_keys = {}
for rel in relations:
key = (rel['file_id'], rel['filed_id'])
if key not in relation_keys:
relation_keys[key] = []
relation_keys[key].append(rel)
duplicate_relations = {key: records for key, records in relation_keys.items()
if len(records) > 1}
print(f"\n总共 {len(relations)} 个关联关系")
print(f"发现 {len(duplicate_relations)} 个重复的关联关系")
# 检查使用中文field_code的关联关系
chinese_relations = [rel for rel in relations
if rel['filed_code'] and is_chinese(rel['filed_code'])]
print(f"使用中文field_code的关联关系: {len(chinese_relations)}")
if chinese_relations:
print("\n前10个使用中文field_code的关联关系:")
for rel in chinese_relations[:10]:
print(f" - 文件: {rel['file_name']}, 字段: {rel['field_name']}, "
f"field_code: {rel['filed_code']}")
cursor.close()
conn.close()
return {
'total_fields': len(fields),
'chinese_fields': len(chinese_fields),
'duplicate_names': len(duplicates),
'duplicate_relations': len(duplicate_relations),
'chinese_relations': len(chinese_relations)
}
if __name__ == '__main__':
result = verify_fix()
print("\n" + "="*80)
print("验证完成")
print("="*80)
print(f"总字段数: {result['total_fields']}")
print(f"中文field_code字段数: {result['chinese_fields']}")
print(f"重复字段名称数: {result['duplicate_names']}")
print(f"重复关联关系数: {result['duplicate_relations']}")
print(f"使用中文field_code的关联关系数: {result['chinese_relations']}")
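补充示意(非原文件内容)`verify_fix` 中两处"按键分组、只保留出现多次的组"的重复检测逻辑,可以抽成一个小工具函数,便于复用与测试。以下为假设性示意:

```python
from collections import defaultdict

# 通用的重复项检测:按 key 函数分组,只保留成员数大于 1 的组
def find_duplicates(items, key):
    """返回 {键: [记录, ...]},仅包含重复的组,保持首次出现的顺序。"""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return {k: v for k, v in groups.items() if len(v) > 1}
```

字段名重复用 `key=lambda f: f['name']`,关联关系重复用 `key=lambda r: (r['file_id'], r['filed_id'])` 即可。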


@ -0,0 +1,345 @@
"""
验证模板字段同步结果
检查 input_data、template_code 和字段关联关系是否正确
"""
import os
import json
import pymysql
from typing import Dict, List
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
def verify_template_configs(conn):
"""验证模板配置的 input_data 和 template_code"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
print("="*80)
print("验证模板配置")
print("="*80)
sql = """
SELECT id, name, template_code, input_data, parent_id
FROM f_polic_file_config
WHERE tenant_id = %s
ORDER BY parent_id, name
"""
cursor.execute(sql, (TENANT_ID,))
configs = cursor.fetchall()
print(f"\n{len(configs)} 个模板配置\n")
# 统计
has_template_code = 0
has_input_data = 0
has_both = 0
missing_both = 0
# 文件节点(有 template_code 的)
file_nodes = []
# 目录节点(没有 template_code 的)
dir_nodes = []
for config in configs:
template_code = config.get('template_code')
input_data = config.get('input_data')
if template_code:
has_template_code += 1
file_nodes.append(config)
else:
dir_nodes.append(config)
if input_data:
has_input_data += 1
try:
input_data_dict = json.loads(input_data) if isinstance(input_data, str) else input_data
if isinstance(input_data_dict, dict) and input_data_dict.get('template_code'):
has_both += 1
except (json.JSONDecodeError, TypeError):
pass
if not template_code and not input_data:
missing_both += 1
print("统计信息:")
print(f" 文件节点(有 template_code: {len(file_nodes)}")
print(f" 目录节点(无 template_code: {len(dir_nodes)}")
print(f" 有 input_data: {has_input_data}")
print(f" 同时有 template_code 和 input_data: {has_both}")
print(f" 两者都没有: {missing_both}")
# 检查文件节点的 input_data
print("\n文件节点 input_data 检查:")
missing_input_data = []
for config in file_nodes:
input_data = config.get('input_data')
if not input_data:
missing_input_data.append(config)
else:
try:
input_data_dict = json.loads(input_data) if isinstance(input_data, str) else input_data
if not isinstance(input_data_dict, dict) or 'template_code' not in input_data_dict:
missing_input_data.append(config)
except (json.JSONDecodeError, TypeError):
missing_input_data.append(config)
if missing_input_data:
print(f" ⚠ 有 {len(missing_input_data)} 个文件节点缺少或格式错误的 input_data:")
for config in missing_input_data[:10]: # 只显示前10个
print(f" - {config['name']} (ID: {config['id']})")
if len(missing_input_data) > 10:
print(f" ... 还有 {len(missing_input_data) - 10}")
else:
print(" ✓ 所有文件节点都有正确的 input_data")
cursor.close()
return {
'total': len(configs),
'file_nodes': len(file_nodes),
'dir_nodes': len(dir_nodes),
'has_input_data': has_input_data,
'has_both': has_both,
'missing_input_data': len(missing_input_data)
}
def verify_field_relations(conn):
"""验证字段关联关系"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
print("\n" + "="*80)
print("验证字段关联关系")
print("="*80)
# 获取所有文件节点的字段关联
sql = """
SELECT
fc.id as file_id,
fc.name as file_name,
fc.template_code,
COUNT(ff.id) as field_count,
SUM(CASE WHEN f.field_type = 1 THEN 1 ELSE 0 END) as input_field_count,
SUM(CASE WHEN f.field_type = 2 THEN 1 ELSE 0 END) as output_field_count
FROM f_polic_file_config fc
LEFT JOIN f_polic_file_field ff ON fc.id = ff.file_id AND ff.tenant_id = fc.tenant_id
LEFT JOIN f_polic_field f ON ff.filed_id = f.id AND f.tenant_id = fc.tenant_id
WHERE fc.tenant_id = %s AND fc.template_code IS NOT NULL
GROUP BY fc.id, fc.name, fc.template_code
ORDER BY fc.name
"""
cursor.execute(sql, (TENANT_ID,))
relations = cursor.fetchall()
print(f"\n{len(relations)} 个文件节点有字段关联\n")
# 统计
has_relations = 0
no_relations = 0
has_input_fields = 0
has_output_fields = 0
no_relation_templates = []
for rel in relations:
field_count = rel['field_count'] or 0
input_count = rel['input_field_count'] or 0
output_count = rel['output_field_count'] or 0
if field_count > 0:
has_relations += 1
if input_count > 0:
has_input_fields += 1
if output_count > 0:
has_output_fields += 1
else:
no_relations += 1
no_relation_templates.append(rel)
print("统计信息:")
print(f" 有字段关联: {has_relations}")
print(f" 无字段关联: {no_relations}")
print(f" 有输入字段: {has_input_fields}")
print(f" 有输出字段: {has_output_fields}")
if no_relation_templates:
print(f"\n ⚠ 有 {len(no_relation_templates)} 个文件节点没有字段关联:")
for rel in no_relation_templates[:10]:
print(f" - {rel['file_name']} (code: {rel['template_code']})")
if len(no_relation_templates) > 10:
print(f" ... 还有 {len(no_relation_templates) - 10}")
else:
print("\n ✓ 所有文件节点都有字段关联")
# 显示详细的关联信息前10个
print("\n字段关联详情前10个")
for rel in relations[:10]:
print(f"\n {rel['file_name']} (code: {rel['template_code']})")
print(f" 总字段数: {rel['field_count']}")
print(f" 输入字段: {rel['input_field_count']}")
print(f" 输出字段: {rel['output_field_count']}")
cursor.close()
return {
'total': len(relations),
'has_relations': has_relations,
'no_relations': no_relations,
'has_input_fields': has_input_fields,
'has_output_fields': has_output_fields
}
def verify_input_data_structure(conn):
"""验证 input_data 的结构"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
print("\n" + "="*80)
print("验证 input_data 结构")
print("="*80)
sql = """
SELECT id, name, template_code, input_data
FROM f_polic_file_config
WHERE tenant_id = %s AND template_code IS NOT NULL AND input_data IS NOT NULL
"""
cursor.execute(sql, (TENANT_ID,))
configs = cursor.fetchall()
print(f"\n检查 {len(configs)} 个有 input_data 的文件节点\n")
correct_structure = 0
incorrect_structure = 0
incorrect_items = []
for config in configs:
try:
input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data']
if not isinstance(input_data, dict):
incorrect_structure += 1
incorrect_items.append({
'name': config['name'],
'reason': 'input_data 不是字典格式'
})
continue
# 检查必需字段
required_fields = ['template_code', 'business_type']
missing_fields = [f for f in required_fields if f not in input_data]
if missing_fields:
incorrect_structure += 1
incorrect_items.append({
'name': config['name'],
'reason': f'缺少字段: {", ".join(missing_fields)}'
})
continue
# 检查 template_code 是否匹配
if input_data.get('template_code') != config.get('template_code'):
incorrect_structure += 1
incorrect_items.append({
'name': config['name'],
'reason': f"template_code 不匹配: input_data中为 '{input_data.get('template_code')}', 字段中为 '{config.get('template_code')}'"
})
continue
correct_structure += 1
except json.JSONDecodeError as e:
incorrect_structure += 1
incorrect_items.append({
'name': config['name'],
'reason': f'JSON解析错误: {str(e)}'
})
except Exception as e:
incorrect_structure += 1
incorrect_items.append({
'name': config['name'],
'reason': f'其他错误: {str(e)}'
})
print(f" 结构正确: {correct_structure}")
print(f" 结构错误: {incorrect_structure}")
if incorrect_items:
print("\n 错误详情:")
for item in incorrect_items[:10]:
print(f" - {item['name']}: {item['reason']}")
if len(incorrect_items) > 10:
print(f" ... 还有 {len(incorrect_items) - 10} 个错误")
else:
print("\n ✓ 所有 input_data 结构都正确")
cursor.close()
return {
'correct': correct_structure,
'incorrect': incorrect_structure
}
def main():
"""主函数"""
print("="*80)
print("验证模板字段同步结果")
print("="*80)
try:
conn = pymysql.connect(**DB_CONFIG)
print("✓ 数据库连接成功\n")
except Exception as e:
print(f"✗ 数据库连接失败: {e}")
return
try:
# 验证模板配置
config_stats = verify_template_configs(conn)
# 验证字段关联
relation_stats = verify_field_relations(conn)
# 验证 input_data 结构
input_data_stats = verify_input_data_structure(conn)
# 总结
print("\n" + "="*80)
print("验证总结")
print("="*80)
print(f"模板配置:")
print(f" - 总模板数: {config_stats['total']}")
print(f" - 文件节点: {config_stats['file_nodes']}")
print(f" - 缺少 input_data: {config_stats['missing_input_data']}")
print(f"\n字段关联:")
print(f" - 有字段关联: {relation_stats['has_relations']}")
print(f" - 无字段关联: {relation_stats['no_relations']}")
print(f"\ninput_data 结构:")
print(f" - 正确: {input_data_stats['correct']}")
print(f" - 错误: {input_data_stats['incorrect']}")
# 总体评估
print("\n" + "="*80)
if (config_stats['missing_input_data'] == 0 and
relation_stats['no_relations'] == 0 and
input_data_stats['incorrect'] == 0):
print("✓ 所有验证通过!同步成功!")
else:
print("⚠ 发现一些问题,请检查上述详情")
finally:
conn.close()
print("\n数据库连接已关闭")
if __name__ == '__main__':
main()
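补充示意(非原文件内容)`verify_input_data_structure` 里散落的几条校验规则(必须是 JSON 字典、必须含 `template_code` 与 `business_type`、且 `template_code` 须与列值一致)可以归纳为一个纯函数,便于单独测试。以下为假设性示意:

```python
import json

# 返回 None 表示校验通过,否则返回与原脚本一致措辞的错误原因
def check_input_data(raw, expected_template_code):
    try:
        data = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError as e:
        return f"JSON解析错误: {e}"
    if not isinstance(data, dict):
        return "input_data 不是字典格式"
    missing = [f for f in ('template_code', 'business_type') if f not in data]
    if missing:
        return f"缺少字段: {', '.join(missing)}"
    if data['template_code'] != expected_template_code:
        return "template_code 不匹配"
    return None
```

这样数据库遍历只负责取数,判定逻辑可以脱离数据库单独回归。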

verify_tree_structure.py Normal file

@ -0,0 +1,169 @@
"""
验证树状结构更新结果
"""
import os
import json
import pymysql
from typing import Dict, List
# 数据库连接配置
DB_CONFIG = {
'host': os.getenv('DB_HOST', '152.136.177.240'),
'port': int(os.getenv('DB_PORT', 5012)),
'user': os.getenv('DB_USER', 'finyx'),
'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
'database': os.getenv('DB_NAME', 'finyx'),
'charset': 'utf8mb4'
}
TENANT_ID = 615873064429507639
def print_tree_structure(conn):
"""打印树状结构"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
sql = """
SELECT id, name, parent_id, template_code, input_data, state
FROM f_polic_file_config
WHERE tenant_id = %s
ORDER BY parent_id, name
"""
cursor.execute(sql, (TENANT_ID,))
configs = cursor.fetchall()
# 构建ID到配置的映射
id_to_config = {config['id']: config for config in configs}
# 找出根节点parent_id为NULL
root_nodes = [config for config in configs if config.get('parent_id') is None]
def print_node(config, indent=0, visited=None):
"""递归打印节点"""
if visited is None:
visited = set()
if config['id'] in visited:
return
visited.add(config['id'])
prefix = " " * indent
parent_info = ""
if config.get('parent_id'):
parent_name = id_to_config.get(config['parent_id'], {}).get('name', f"ID:{config['parent_id']}")
parent_info = f" [父: {parent_name}]"
template_code = config.get('template_code')
if not template_code and config.get('input_data'):
try:
input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data']
if isinstance(input_data, dict):
template_code = input_data.get('template_code')
except (json.JSONDecodeError, TypeError):
pass
template_info = f" [code: {template_code}]" if template_code else ""
state_info = " [启用]" if config.get('state') == 1 else " [未启用]"
print(f"{prefix}├─ {config['name']}{parent_info}{template_info}{state_info}")
# 打印子节点
children = [c for c in configs if c.get('parent_id') == config['id']]
for child in sorted(children, key=lambda x: x['name']):
print_node(child, indent + 1, visited)
print("="*80)
print("树状结构")
print("="*80)
for root in sorted(root_nodes, key=lambda x: x['name']):
print_node(root)
print()
# 统计信息
print("="*80)
print("统计信息")
print("="*80)
print(f"总记录数: {len(configs)}")
print(f"根节点数: {len(root_nodes)}")
print(f"有父节点的记录: {len([c for c in configs if c.get('parent_id')])}")
print(f"无父节点的记录: {len([c for c in configs if not c.get('parent_id')])}")
cursor.close()
def verify_parent_relationships(conn):
"""验证父子关系"""
cursor = conn.cursor(pymysql.cursors.DictCursor)
sql = """
SELECT id, name, parent_id
FROM f_polic_file_config
WHERE tenant_id = %s AND parent_id IS NOT NULL
"""
cursor.execute(sql, (TENANT_ID,))
configs = cursor.fetchall()
print("\n" + "="*80)
print("验证父子关系")
print("="*80)
errors = []
for config in configs:
parent_id = config['parent_id']
check_sql = """
SELECT id, name FROM f_polic_file_config
WHERE id = %s AND tenant_id = %s
"""
cursor.execute(check_sql, (parent_id, TENANT_ID))
parent = cursor.fetchone()
if not parent:
errors.append({
'child': config['name'],
'child_id': config['id'],
'parent_id': parent_id,
'error': '父节点不存在'
})
if errors:
print(f"\n✗ 发现 {len(errors)} 个错误:")
for error in errors:
print(f" - {error['child']} (ID: {error['child_id']})")
print(f" 父节点ID {error['parent_id']} 不存在")
else:
print("\n✓ 所有父子关系验证通过")
cursor.close()
return len(errors) == 0
def main():
"""主函数"""
print("="*80)
print("验证树状结构")
print("="*80)
try:
conn = pymysql.connect(**DB_CONFIG)
print("✓ 数据库连接成功\n")
except Exception as e:
print(f"✗ 数据库连接失败: {e}")
return
try:
print_tree_structure(conn)
verify_parent_relationships(conn)
except Exception as e:
print(f"✗ 错误: {e}")
import traceback
traceback.print_exc()
finally:
conn.close()
if __name__ == '__main__':
main()
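补充示意(非原文件内容)`verify_parent_relationships` 对每条记录单独发一次父节点查询,记录多时开销较大;由于记录已全部取回,可以一次性收集全部 ID 后在内存中比对。以下为假设性的纯函数示意:

```python
# 集合比对版孤儿检测:一次遍历找出父节点不存在的记录
def find_orphans(configs):
    """configs: [{'id': ..., 'name': ..., 'parent_id': ...}] -> 父节点缺失的记录列表。"""
    ids = {c['id'] for c in configs}
    return [c for c in configs
            if c.get('parent_id') is not None and c['parent_id'] not in ids]
```

与逐行 SQL 查询语义等价(同一租户的记录集合内比对),但只需要一条 SELECT。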

同步结果总结.md Normal file

@ -0,0 +1,152 @@
# 模板字段同步结果总结
## 执行时间
根据验证脚本执行结果生成
## 同步状态概览
### ✅ 成功同步的部分
1. **模板配置 (f_polic_file_config)**
- ✓ 所有 23 个文件节点都有正确的 `template_code`
- ✓ 所有 23 个文件节点都有正确的 `input_data`
- ✓ 所有 `input_data` 结构都正确,包含:
- `template_code`: 模板编码
- `business_type`: 业务类型INVESTIGATION
- `input_fields`: 输入字段列表(部分模板)
2. **字段关联 (f_polic_file_field)**
- ✓ 19 个文件节点有完整的字段关联
- ✓ 17 个文件节点有输入字段关联
- ✓ 19 个文件节点有输出字段关联
### ⚠️ 需要关注的部分
1. **缺少字段关联的节点9个**
**目录节点6个,前 3 个为正常情况,后 3 个带有 template_code需要检查**
- `1.初核请示` - 目录节点
- `2-初核模版` - 根目录节点
- `3.初核结论` - 目录节点
- `谈话通知书` - 目录节点(但 template_code 不为空,可能需要检查)
- `走读式谈话审批` - 目录节点(但 template_code 不为空,可能需要检查)
- `走读式谈话流程` - 目录节点(但 template_code 不为空,可能需要检查)
**文件节点3个- 需要检查:**
- `1.请示报告卡(初核谈话)` - template_code 为空,可能是匹配问题
- `2谈话审批表` - 有 template_code (INTERVIEW_APPROVAL_FORM),但无字段关联
- `6.1保密承诺书(谈话对象使用-非中共党员用)` - template_code 为空
## 详细统计
### 模板配置统计
- 总模板数: 28
- 文件节点: 23
- 目录节点: 5
- 有 input_data: 23
- 同时有 template_code 和 input_data: 23
- 缺少 input_data: 0
### 字段关联统计
- 有字段关联: 19 个
- 无字段关联: 9 个(其中 5 个是目录节点)
- 有输入字段: 17 个
- 有输出字段: 19 个
### input_data 结构验证
- 结构正确: 23 个
- 结构错误: 0 个
## 已同步的模板列表
根据验证结果,以下模板已成功同步:
1. `1.请示报告卡XXX` - 4个字段关联1输入+3输出
2. `2.初步核实审批表XXX` - 12个字段关联2输入+10输出
3. `3.附件初核方案(XXX)` - 10个字段关联2输入+8输出
4. `谈话通知书第一联` - 字段关联
5. `谈话通知书第二联` - 字段关联
6. `谈话通知书第三联` - 字段关联
7. `1.谈话笔录` - 8个字段关联1输入+7输出
8. `2.谈话询问对象情况摸底调查30问` - 11个字段关联1输入+10输出
9. `3.被谈话人权利义务告知书` - 字段关联
10. `4.点对点交接单` - 字段关联
11. `5.陪送交接单(新)` - 字段关联
12. `6.2保密承诺书(谈话对象使用-中共党员用)` - 字段关联
13. `7.办案人员-办案安全保密承诺书` - 字段关联
14. `2.谈话审批` - 13个字段关联2输入+11输出
15. `3.谈话前安全风险评估表` - 18个字段关联1输入+17输出
16. `4.谈话方案` - 8个字段关联1输入+7输出
17. `5.谈话后安全风险评估表` - 17个字段关联1输入+16输出
18. `8-1请示报告卡初核报告结论` - 5个字段关联1输入+4输出
19. `8.XXX初核情况报告` - 8个字段关联2输入+6输出
## 需要手动处理的问题
### 1. 目录节点的 template_code
以下目录节点有 template_code但按照设计应该是 NULL
- `谈话通知书` (code: 谈话通知书)
- `走读式谈话审批` (code: 走读式谈话审批)
- `走读式谈话流程` (code: 走读式谈话流程)
**建议处理:**
- 如果这些确实是目录节点,应该将 template_code 设置为 NULL
- 如果这些是文件节点,需要补充字段关联
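若确认上述三个节点确为目录,可以参考下面的假设性脚本片段将其 template_code 清空执行前请先按名称核对实际记录ID 与名称需以数据库查询结果为准,此处仅演示 SQL 的拼装与参数化方式):

```python
# 假设性示例:构造清空目录节点 template_code 的参数化 UPDATE 语句
def build_clear_template_code_sql(table='f_polic_file_config'):
    return (f"UPDATE {table} SET template_code = NULL, updated_time = NOW() "
            f"WHERE tenant_id = %s AND name IN (%s, %s, %s)")

# 使用方式(示意)cursor.execute(build_clear_template_code_sql(),
#     (TENANT_ID, '谈话通知书', '走读式谈话审批', '走读式谈话流程'))
```

参数化写法避免了将中文名称直接拼进 SQL 字符串带来的转义问题。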
### 2. 缺少字段关联的文件节点
以下文件节点有 template_code 但没有字段关联:
- `2谈话审批表` (code: INTERVIEW_APPROVAL_FORM)
**可能原因:**
- Excel 中对应的模板名称不匹配
- 字段定义不存在
- 需要手动检查并补充
### 3. template_code 为空的文件节点
以下文件节点应该是文件但 template_code 为空:
- `1.请示报告卡(初核谈话)`
- `6.1保密承诺书(谈话对象使用-非中共党员用)`
**可能原因:**
- Excel 中名称不匹配
- 需要手动检查并补充 template_code
## 建议的后续操作
1. **检查目录节点**
- 确认 `谈话通知书``走读式谈话审批``走读式谈话流程` 是目录还是文件
- 如果是目录,将 template_code 设置为 NULL
2. **补充缺失的字段关联**
- 检查 `2谈话审批表` 在 Excel 中的定义
- 确认字段是否存在
- 手动补充字段关联
3. **修复 template_code**
- 检查 `1.请示报告卡(初核谈话)``6.1保密承诺书(谈话对象使用-非中共党员用)` 的 template_code
- 根据 Excel 文档补充正确的 template_code
## 验证命令
运行以下命令验证同步结果:
```bash
python verify_template_fields_sync.py
```
## 总结
✅ **主要同步工作已完成**
- 23 个文件节点的 input_data 和 template_code 已正确同步
- 19 个文件节点有完整的字段关联
- input_data 结构全部正确
⚠️ **需要手动处理**
- 4 个文件节点缺少字段关联(需要检查 Excel 定义)
- 3 个目录节点有 template_code可能需要清理
总体同步成功率:**约 83%** (19/23 文件节点有完整关联)

备份数据库.bat Normal file

@ -0,0 +1,24 @@
@echo off
chcp 65001 >nul
echo ========================================
echo 数据库备份工具
echo ========================================
echo.
REM 检查Python是否安装
python --version >nul 2>&1
if errorlevel 1 (
echo 错误: 未找到Python请先安装Python
pause
exit /b 1
)
REM 执行备份
python backup_database.py --compress
echo.
echo ========================================
echo 备份完成!
echo ========================================
pause

字段编码修复总结.md Normal file

@ -0,0 +1,182 @@
# 字段编码修复总结
## 修复日期
2025-01-XX
## 修复目标
1. 分析并修复 `f_polic_field` 表中的中文 `field_code` 问题
2. 合并 `f_polic_file_field` 表中的重复项
3. 确保所有 `field_code` 与《占位符与字段对照表》文档中的英文名称对应
## 发现的问题
### 1. f_polic_field 表问题
- **初始状态**87个字段记录
- **中文field_code字段**69个
- **重复字段名称**8组每组2条记录
- **重复field_code**0个
### 2. f_polic_file_field 表问题
- **初始状态**144个关联关系
- **重复关联关系**0个已通过之前的修复处理
- **使用中文field_code的关联关系**81个
## 修复操作
### 第一阶段:主要字段修复
1. **更新37个字段的field_code**将中文field_code更新为英文field_code
2. **合并8组重复字段**
- 主要问题线索
- 初步核实审批表填表人
- 初步核实审批表承办部门意见
- 线索来源
- 被核查人员出生年月
- 被核查人员性别
- 被核查人员政治面貌
- 被核查人员职级
### 第二阶段:剩余字段修复
修复了24个剩余的中文field_code字段包括
- 谈话相关字段(拟谈话地点、拟谈话时间、谈话事由等)
- 被核查人员相关字段(被核查人员学历、工作履历、职业等)
- 其他字段(补空人员、记录人、评估意见等)
## 修复结果
### 最终状态
- **总字段数**79个
- **中文field_code字段数**4个系统字段保留
- 年龄 (ID: 704553856941259783)
- 用户 (ID: 704553856941259782)
- 用户名称 (ID: 704553856941259780)
- 用户名称1 (ID: 704553856941259781)
- **重复字段名称数**0个
- **重复关联关系数**0个
- **使用中文field_code的关联关系数**0个
### 字段映射对照
#### 基本信息字段
- `target_name` - 被核查人姓名
- `target_organization_and_position` - 被核查人员单位及职务 / 被核查人单位及职务
- `target_organization` - 被核查人员单位
- `target_position` - 被核查人员职务
- `target_gender` - 被核查人员性别
- `target_date_of_birth` - 被核查人员出生年月
- `target_date_of_birth_full` - 被核查人员出生年月日
- `target_age` - 被核查人员年龄
- `target_education_level` - 被核查人员文化程度
- `target_political_status` - 被核查人员政治面貌
- `target_professional_rank` - 被核查人员职级
- `target_id_number` - 被核查人员身份证号 / 被核查人员身份证件及号码
- `target_address` - 被核查人员住址
- `target_registered_address` - 被核查人员户籍住址
- `target_contact` - 被核查人员联系方式
- `target_place_of_origin` - 被核查人员籍贯
- `target_ethnicity` - 被核查人员民族
#### 问题相关字段
- `clue_source` - 线索来源
- `target_issue_description` - 主要问题线索
- `target_problem_description` - 被核查人问题描述
#### 审批相关字段
- `department_opinion` - 初步核实审批表承办部门意见
- `filler_name` - 初步核实审批表填表人
- `approval_time` - 批准时间
#### 核查相关字段
- `investigation_unit_name` - 核查单位名称
- `investigation_team_code` - 核查组代号
- `investigation_team_leader_name` - 核查组组长姓名
- `investigation_team_member_names` - 核查组成员姓名
- `investigation_location` - 核查地点
#### 风险评估相关字段
- `target_family_situation` - 被核查人员家庭情况
- `target_social_relations` - 被核查人员社会关系
- `target_health_status` - 被核查人员健康状况
- `target_personality` - 被核查人员性格特征
- `target_tolerance` - 被核查人员承受能力
- `target_issue_severity` - 被核查人员涉及问题严重程度
- `target_other_issues_possibility` - 被核查人员涉及其他问题的可能性
- `target_previous_investigation` - 被核查人员此前被审查情况
- `target_negative_events` - 被核查人员社会负面事件
- `target_other_situation` - 被核查人员其他情况
- `risk_level` - 风险等级
#### 谈话相关字段(新增)
- `proposed_interview_location` - 拟谈话地点
- `proposed_interview_time` - 拟谈话时间
- `interview_reason` - 谈话事由
- `interviewer` - 谈话人
- `interview_personnel_safety_officer` - 谈话人员-安全员
- `interview_personnel_leader` - 谈话人员-组长
- `interview_personnel` - 谈话人员-谈话人员
- `pre_interview_risk_assessment_result` - 谈话前安全风险评估结果
- `interview_location` - 谈话地点
- `interview_count` - 谈话次数
#### 其他新增字段
- `target_education` - 被核查人员学历
- `target_work_history` - 被核查人员工作履历
- `target_occupation` - 被核查人员职业
- `target_confession_level` - 被核查人员交代问题程度
- `target_behavior_after_relief` - 被核查人员减压后的表现
- `target_mental_burden_level` - 被核查人员思想负担程度
- `target_behavior_during_interview` - 被核查人员谈话中的表现
- `target_issue_severity_level` - 被核查人员问题严重程度
- `target_risk_level` - 被核查人员风险等级
- `target_basic_info` - 被核查人基本情况
- `backup_personnel` - 补空人员
- `recorder` - 记录人
- `assessment_opinion` - 评估意见
## 关联表检查
### f_polic_file_field 表
- ✅ 无重复关联关系
- ✅ 所有关联关系使用的field_code均为英文
### f_polic_task 表
- 检查了表结构未发现直接引用字段ID的列
- 表字段:id, tenant_id, task_name, input_data, output_data, task_status, created_time, created_by, updated_time, updated_by, state
### f_polic_file 表
- 检查了表结构
- 表字段:id, tenant_id, task_id, file_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state
- 未发现需要更新的关联关系
## 使用的脚本
1. **analyze_and_fix_field_code_issues.py** - 主要分析和修复脚本
2. **verify_field_code_fix.py** - 验证修复结果
3. **fix_only_chinese_field_codes.py** - 修复剩余的中文field_code
4. **rollback_incorrect_updates.py** - 回滚错误的更新(已使用)
## 注意事项
1. **保留的系统字段**:以下4个字段的field_code仍为中文,这些可能是系统字段或测试数据,暂时保留:
- 年龄
- 用户
- 用户名称
- 用户名称1
2. **字段合并**:在合并重复字段时,系统自动更新了 `f_polic_file_field` 表中的关联关系,将删除字段的关联关系指向保留的字段。
3. **数据一致性**:所有修复操作都确保了数据的一致性,关联表已同步更新。
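字段合并中"先重定向关联、再删除重复字段"这一步,可以用如下纯函数示意(仅生成SQL语句;`keep_id`、`drop_id` 为假设的参数名,实际实现见 analyze_and_fix_field_code_issues.py):

```python
def merge_duplicate_field_sql(keep_id: int, drop_id: int) -> list:
    """示意:合并一组重复字段时需要执行的SQL。
    先把 f_polic_file_field 中指向重复字段的关联改指保留字段,
    再删除 f_polic_field 中的重复字段记录,避免悬空引用。"""
    return [
        f"UPDATE f_polic_file_field SET field_id = {keep_id} "
        f"WHERE field_id = {drop_id};",
        f"DELETE FROM f_polic_field WHERE id = {drop_id};",
    ]
```

顺序很重要:必须先更新关联关系,再删除字段,否则关联表会指向不存在的字段。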
## 后续建议
1. 如果"年龄"、"用户"等字段是业务字段建议为其设置合适的英文field_code
2. 定期检查是否有新的中文field_code字段产生
3. 在新增字段时确保field_code使用英文命名规范
## 完成状态
✅ **主要修复任务已完成**
- 所有业务相关字段的field_code已更新为英文
- 重复字段已合并
- 关联表已同步更新
- 数据一致性已确保


@@ -0,0 +1,147 @@
# 性别和年龄字段缺失问题深度修复
## 问题描述
测试数据中明明有"男性"、"男"、"年龄44岁"等明确信息,但解析结果中`target_gender`和`target_age`都是空。
## 根本原因分析
### 问题1后处理逻辑无法访问原始输入文本
**问题**
- 后处理函数`_post_process_inferred_fields`只能访问模型返回的JSON解析结果`data`
- 如果模型根本没有提取这些字段,后处理也无法从原始输入文本中提取
- 后处理逻辑只能从已提取的数据中推断,无法访问原始prompt
**影响**
- 即使原始输入文本中明确有"男性"、"年龄44岁"等信息
- 如果模型没有提取,后处理也无法补充
### 问题2模型可能没有正确提取
虽然我们强化了system prompt,但模型可能仍然:
- 忽略了某些字段
- 返回了空值
- 字段名错误导致规范化失败
## 修复方案
### 1. 增强后处理逻辑,支持从原始输入文本提取 ✅
**修改位置**:`services/ai_service.py` 第1236-1350行
**改进内容**:
1. **修改函数签名**,增加`prompt`参数:
```python
def _post_process_inferred_fields(self, data: Dict, output_fields: List[Dict], prompt: str = None) -> Dict:
```
2. **从原始输入文本中提取性别**
```python
# 如果仍然没有,尝试从原始输入文本(prompt)中提取
if (not data.get('target_gender') or data.get('target_gender') == '') and prompt:
    # 从prompt中提取输入文本部分(通常在"输入文本:"之后)
    input_text_match = re.search(r'输入文本[:]\s*\n(.*?)(?:\n\n需要提取的字段|$)', prompt, re.DOTALL)
    if input_text_match:
        input_text = input_text_match.group(1)
        # 匹配性别关键词:男性、女性、男、女等
        if re.search(r'\b男性\b|\b男\b', input_text) and not re.search(r'\b女性\b|\b女\b', input_text):
            data['target_gender'] = '男'
        elif re.search(r'\b女性\b|\b女\b', input_text) and not re.search(r'\b男性\b|\b男\b', input_text):
            data['target_gender'] = '女'
        elif re.search(r'[,]\s*([男女])\s*[,]', input_text):
            gender_match = re.search(r'[,]\s*([男女])\s*[,]', input_text)
            if gender_match:
                data['target_gender'] = gender_match.group(1)
```
3. **从原始输入文本中提取年龄**
```python
# 如果还没有,尝试从原始输入文本中直接提取年龄
if (not data.get('target_age') or data.get('target_age') == '') and prompt:
    input_text_match = re.search(r'输入文本[:]\s*\n(.*?)(?:\n\n需要提取的字段|$)', prompt, re.DOTALL)
    if input_text_match:
        input_text = input_text_match.group(1)
        # 匹配年龄模式:年龄44岁、44岁、年龄44等
        age_match = re.search(r'年龄\s*(\d+)\s*岁|(\d+)\s*岁|年龄\s*(\d+)', input_text)
        if age_match:
            age = age_match.group(1) or age_match.group(2) or age_match.group(3)
            if age:
                data['target_age'] = str(age)
```
4. **更新所有调用点**,传入`prompt`参数:
```python
# 修改前
normalized_data = self._post_process_inferred_fields(normalized_data, output_fields)
# 修改后
normalized_data = self._post_process_inferred_fields(normalized_data, output_fields, prompt)
```
### 2. 提取逻辑的优先级
后处理逻辑按以下优先级提取字段:
**对于性别(target_gender)**:
1. 从`target_work_basic_info`中提取(匹配`XXX...`模式)
2. 从所有已提取的文本字段中查找(使用正则表达式)
3. **从原始输入文本中提取**(新增)
**对于年龄(target_age)**:
1. 从`target_date_of_birth`计算(根据出生年月和当前年份)
2. **从原始输入文本中直接提取**(新增,匹配"年龄44岁"等模式)
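上述提取逻辑可以抽成一个可独立测试的纯函数来验证正则效果(示意实现,函数名 `extract_gender_and_age` 为假设;为便于演示,这里不使用 `\b` 边界,而是用"只出现一种性别关键词才填充"的判断):

```python
import re

def extract_gender_and_age(input_text: str) -> dict:
    """示意:从原始输入文本中提取性别和年龄。"""
    result = {}
    # 性别:仅当文本中只出现一种性别字时才填充,避免误判
    has_male = bool(re.search(r'男', input_text))
    has_female = bool(re.search(r'女', input_text))
    if has_male and not has_female:
        result['target_gender'] = '男'
    elif has_female and not has_male:
        result['target_gender'] = '女'
    # 年龄:匹配"年龄44岁"、"44岁"、"年龄44"等表述
    age_match = re.search(r'年龄\s*(\d+)\s*岁|(\d+)\s*岁|年龄\s*(\d+)', input_text)
    if age_match:
        age = age_match.group(1) or age_match.group(2) or age_match.group(3)
        result['target_age'] = str(age)
    return result
```

用法示例:`extract_gender_and_age("张某,男性,年龄44岁")` 会返回性别"男"和年龄"44"。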
## 预期效果
1. **提高字段提取成功率**
- 即使模型没有提取,后处理也能从原始输入文本中提取
- 多层保障确保关键字段不会为空
2. **增强容错能力**
- 不依赖模型的提取准确性
- 即使模型返回空值,也能从原始输入中补充
3. **提高数据完整性**
- 确保性别、年龄等关键字段有值
- 减少空值的情况
## 测试建议
1. **功能测试**
- 使用包含"男性"、"年龄44岁"的测试数据
- 验证后处理是否能从原始输入文本中提取
- 检查日志输出,确认提取来源
2. **边界测试**
- 测试性别信息在不同位置的情况
- 测试年龄的不同表述方式("44岁"、"年龄44"、"年龄44岁"等)
- 测试模型返回空值的情况
3. **日志检查**
- 查看日志中的"后处理"信息
- 确认是从哪个来源提取的字段
- 验证提取逻辑是否正确执行
## 调试建议
如果问题仍然存在,可以:
1. **检查日志输出**
- 查看`[AI服务] 后处理:从原始输入文本中提取...`的日志
- 确认prompt是否正确传入
- 确认正则表达式是否匹配成功
2. **手动测试正则表达式**
- 测试`r'输入文本[:]\s*\n(.*?)(?:\n\n需要提取的字段|$)'`是否能正确提取输入文本
- 测试性别和年龄的正则表达式是否能匹配
3. **检查prompt格式**
- 确认prompt中确实包含"输入文本:"标签
- 确认输入文本的格式是否符合预期
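可以先在交互环境中验证输入文本提取的正则(下面的 prompt 样例为假设构造;正则改用 `[::]` 以同时兼容全/半角冒号,这一点是示例中的调整):

```python
import re

# 假设的prompt样例,模拟"输入文本:"标签后跟正文的格式
prompt_sample = (
    "请从以下文本中提取字段。\n"
    "输入文本:\n"
    "张某,男性,年龄44岁。\n"
    "\n"
    "需要提取的字段:target_gender, target_age"
)

# 提取"输入文本:"与"需要提取的字段"之间的正文
match = re.search(r'输入文本[::]\s*\n(.*?)(?:\n\n需要提取的字段|$)',
                  prompt_sample, re.DOTALL)
input_text = match.group(1) if match else None
```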
## 总结
通过增强后处理逻辑让它能够访问原始输入文本prompt即使模型没有正确提取字段也能从原始输入中补充。这提供了多层保障确保关键字段不会为空。

41
恢复数据库.bat Normal file

@@ -0,0 +1,41 @@
@echo off
chcp 65001 >nul
echo ========================================
echo 数据库恢复工具
echo ========================================
echo.
echo 警告: 恢复操作会覆盖现有数据!
echo.
REM 检查Python是否安装
python --version >nul 2>&1
if errorlevel 1 (
echo 错误: 未找到Python请先安装Python
pause
exit /b 1
)
REM 检查是否提供了备份文件路径
if "%~1"=="" (
echo 用法: 恢复数据库.bat [备份文件路径]
echo.
echo 示例:
echo 恢复数据库.bat backups\backup_finyx_20241205_120000.sql
echo 恢复数据库.bat backups\backup_finyx_20241205_120000.sql.gz
echo.
echo 可用的备份文件:
python backup_database.py --list
echo.
pause
exit /b 1
)
REM 执行恢复
python restore_database.py "%~1"
echo.
echo ========================================
echo 恢复完成!
echo ========================================
pause


@@ -0,0 +1,216 @@
# 数据库备份和恢复工具使用说明
## 概述
本项目提供了两个Python脚本,用于MySQL数据库的备份和恢复:
- `backup_database.py` - 数据库备份脚本
- `restore_database.py` - 数据库恢复脚本
## 功能特性
### 备份功能
- ✅ 支持使用 `mysqldump` 命令备份(推荐,速度快)
- ✅ 支持使用 Python 直接连接备份(备用方案)
- ✅ 自动检测可用方法auto模式
- ✅ 支持压缩备份文件(.sql.gz格式)
- ✅ 备份包含表结构、数据、存储过程、触发器、事件等
- ✅ 自动生成带时间戳的备份文件名
- ✅ 列出所有备份文件
### 恢复功能
- ✅ 支持使用 `mysql` 命令恢复(推荐,速度快)
- ✅ 支持使用 Python 直接连接恢复(备用方案)
- ✅ 自动检测可用方法auto模式
- ✅ 支持恢复压缩的备份文件(.sql.gz格式)
- ✅ 可选择恢复前删除现有数据库
- ✅ 测试数据库连接功能
## 环境要求
- Python 3.6+
- pymysql 库(已包含在 requirements.txt 中)
- MySQL客户端工具可选用于mysqldump/mysql命令
- 数据库连接配置(通过环境变量或默认配置)
## 安装依赖
```bash
pip install pymysql python-dotenv
```
## 使用方法
### 1. 数据库备份
#### 基本用法(自动选择方法)
```bash
python backup_database.py
```
#### 指定备份方法
```bash
# 使用mysqldump命令备份
python backup_database.py --method mysqldump
# 使用Python方式备份
python backup_database.py --method python
```
#### 指定输出文件
```bash
python backup_database.py --output backups/my_backup.sql
```
#### 压缩备份文件
```bash
python backup_database.py --compress
```
#### 列出所有备份文件
```bash
python backup_database.py --list
```
#### 完整示例
```bash
# 使用mysqldump备份并压缩
python backup_database.py --method mysqldump --compress --output backups/finyx_backup.sql.gz
```
### 2. 数据库恢复
#### 基本用法(自动选择方法)
```bash
python restore_database.py backups/backup_finyx_20241205_120000.sql
```
#### 指定恢复方法
```bash
# 使用mysql命令恢复
python restore_database.py backups/backup.sql --method mysql
# 使用Python方式恢复
python restore_database.py backups/backup.sql --method python
```
#### 恢复压缩的备份文件
```bash
python restore_database.py backups/backup.sql.gz
```
#### 恢复前删除现有数据库(危险操作)
```bash
python restore_database.py backups/backup.sql --drop-db
```
#### 测试数据库连接
```bash
python restore_database.py --test
```
#### 完整示例
```bash
# 恢复压缩的备份文件,恢复前删除现有数据库
python restore_database.py backups/backup.sql.gz --drop-db --method mysql
```
## 备份文件存储
- 默认备份目录:`backups/`
- 备份文件命名格式:`backup_{数据库名}_{时间戳}.sql`
- 压缩文件格式:`backup_{数据库名}_{时间戳}.sql.gz`
- 时间戳格式:`YYYYMMDD_HHMMSS`
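按上述命名约定生成备份文件路径的逻辑大致如下(示意,函数名为假设,实际实现见 backup_database.py):

```python
from datetime import datetime
from pathlib import Path

def make_backup_path(db_name: str, backup_dir: str = "backups",
                     compress: bool = False) -> Path:
    """示意:生成 backup_{数据库名}_{时间戳}.sql[.gz] 形式的备份路径"""
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")  # 时间戳格式 YYYYMMDD_HHMMSS
    suffix = ".sql.gz" if compress else ".sql"
    return Path(backup_dir) / f"backup_{db_name}_{ts}{suffix}"
```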
## 数据库配置
脚本会自动从以下位置读取数据库配置:
1. **环境变量**(优先):
- `DB_HOST` - 数据库主机(默认: 152.136.177.240)
- `DB_PORT` - 数据库端口(默认: 5012)
- `DB_USER` - 数据库用户名(默认: finyx)
- `DB_PASSWORD` - 数据库密码(默认: 6QsGK6MpePZDE57Z)
- `DB_NAME` - 数据库名称(默认: finyx)
2. **.env文件**
在项目根目录创建 `.env` 文件:
```env
DB_HOST=152.136.177.240
DB_PORT=5012
DB_USER=finyx
DB_PASSWORD=6QsGK6MpePZDE57Z
DB_NAME=finyx
```
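"环境变量优先、缺省回退默认值"的读取逻辑可以示意如下(函数名为假设;`.env` 文件由 python-dotenv 在脚本启动时加载进环境变量):

```python
import os

def load_db_config() -> dict:
    """示意:从环境变量读取数据库配置,未设置时回退到文档中的默认值"""
    return {
        "host": os.getenv("DB_HOST", "152.136.177.240"),
        "port": int(os.getenv("DB_PORT", "5012")),
        "user": os.getenv("DB_USER", "finyx"),
        "password": os.getenv("DB_PASSWORD", "6QsGK6MpePZDE57Z"),
        "database": os.getenv("DB_NAME", "finyx"),
    }
```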
## 注意事项
### 备份注意事项
1. ⚠️ 备份大数据库时可能需要较长时间,请耐心等待
2. ⚠️ 确保有足够的磁盘空间存储备份文件
3. ⚠️ 建议定期备份,并保留多个备份版本
4. ⚠️ 生产环境建议使用压缩备份以节省空间
### 恢复注意事项
1. ⚠️ **恢复操作会覆盖现有数据,请谨慎操作!**
2. ⚠️ 恢复前建议先备份当前数据库
3. ⚠️ 使用 `--drop-db` 选项会删除整个数据库,请确认后再操作
4. ⚠️ 恢复大数据库时可能需要较长时间
5. ⚠️ 恢复过程中请勿中断,否则可能导致数据不一致
## 常见问题
### Q1: 提示找不到 mysqldump 命令?
**A:** 确保MySQL客户端已安装并在系统PATH中。如果未安装,脚本会自动切换到Python方式备份。
### Q2: 备份文件太大怎么办?
**A:** 使用 `--compress` 选项压缩备份文件通常可以节省50-80%的空间。
### Q3: 恢复时提示表已存在错误?
**A:** 使用 `--drop-db` 选项先删除数据库再恢复,或者手动删除相关表。
### Q4: 如何定时自动备份?
**A:** 可以使用操作系统的定时任务功能如Windows的计划任务、Linux的cron
```bash
# Linux crontab示例每天凌晨2点备份
0 2 * * * cd /path/to/project && python backup_database.py --compress
```
### Q5: 备份文件可以恢复到其他数据库吗?
**A:** 可以,修改环境变量中的 `DB_NAME` 或直接编辑备份文件中的数据库名称。
## 示例场景
### 场景1: 日常备份
```bash
# 每天自动备份并压缩
python backup_database.py --compress
```
### 场景2: 迁移数据库
```bash
# 1. 备份源数据库
python backup_database.py --output migration_backup.sql
# 2. 修改配置指向目标数据库
# 3. 恢复备份到目标数据库
python restore_database.py migration_backup.sql --drop-db
```
### 场景3: 数据恢复
```bash
# 1. 查看可用备份
python backup_database.py --list
# 2. 恢复指定备份
python restore_database.py backups/backup_finyx_20241205_120000.sql
```
## 技术支持
如有问题,请检查:
1. 数据库连接配置是否正确
2. 数据库服务是否正常运行
3. 是否有足够的磁盘空间
4. 是否有数据库操作权限


@@ -0,0 +1,180 @@
# 模板树状结构更新说明
## 概述
根据 `template_finish` 目录结构,更新数据库 `f_polic_file_config` 表中的 `parent_id` 字段,建立树状层级结构。
## 目录结构示例
```
template_finish/
└── 2-初核模版/ (一级)
├── 1.初核请示/ (二级)
│ ├── 1.请示报告卡XXX.docx
│ ├── 2.初步核实审批表XXX.docx
│ └── 3.附件初核方案(XXX).docx
├── 2.谈话审批/ (二级)
│ ├── 谈话通知书/ (三级)
│ │ ├── 谈话通知书第一联.docx
│ │ ├── 谈话通知书第二联.docx
│ │ └── 谈话通知书第三联.docx
│ ├── 走读式谈话审批/ (三级)
│ │ ├── 1.请示报告卡(初核谈话).docx
│ │ ├── 2谈话审批表.docx
│ │ └── ...
│ └── 走读式谈话流程/ (三级)
│ ├── 1.谈话笔录.docx
│ └── ...
└── 3.初核结论/ (二级)
├── 8-1请示报告卡初核报告结论 .docx
└── 8.XXX初核情况报告.docx
```
## 脚本说明
### 1. analyze_and_update_template_tree.py
**功能:** 分析目录结构和数据库数据,生成 SQL 更新脚本
**使用方法:**
```bash
python analyze_and_update_template_tree.py
```
**输出:**
- 分析报告(控制台输出)
- `update_template_tree.sql` - SQL 更新脚本
**特点:**
- 只生成 SQL 脚本,不直接修改数据库
- 可以手动检查 SQL 脚本后再执行
### 2. update_template_tree.py
**功能:** 分析并直接更新数据库(带预览和确认)
**使用方法:**
```bash
python update_template_tree.py
```
**特点:**
- 交互式操作,先预览再确认
- 支持模拟模式dry-run
- 自动按层级顺序更新
- 更安全的更新流程
## 更新逻辑
1. **目录节点**:根据目录名称匹配数据库记录,如果不存在则创建
2. **文件节点**:优先通过 `template_code` 匹配,其次通过文件名匹配
3. **层级关系**:按照目录结构的层级关系设置 `parent_id`
- 一级目录:`parent_id = NULL`
- 二级目录:`parent_id = 一级目录的ID`
- 三级目录:`parent_id = 二级目录的ID`
- 文件:`parent_id = 所在目录的ID`
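上述层级规则可以简化为"每个节点的父节点就是路径中的上一级目录",用一个纯函数示意(函数名为假设;实际脚本还需处理同名节点与数据库记录的匹配):

```python
from pathlib import PurePosixPath

def build_parent_map(relative_paths):
    """示意:根据相对路径列表推导 节点名 -> 父节点名 的映射。
    一级目录的父节点为 None,对应数据库里 parent_id = NULL。"""
    parent_map = {}
    for path in relative_paths:
        parts = PurePosixPath(path).parts
        for i, name in enumerate(parts):
            parent_map[name] = parts[i - 1] if i > 0 else None
    return parent_map
```

例如 `build_parent_map(["2-初核模版/1.初核请示/1.请示报告卡XXX.docx"])` 会把一级目录映射到 `None`,二级目录映射到一级目录,文件映射到所在目录。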
## 执行步骤
### 方法一:使用 SQL 脚本(推荐用于生产环境)
1. 运行分析脚本:
```bash
python analyze_and_update_template_tree.py
```
2. 检查生成的 SQL 脚本:
```bash
# 查看 update_template_tree.sql
```
3. 备份数据库(重要!)
4. 执行 SQL 脚本:
```sql
-- 在 MySQL 客户端中执行
source update_template_tree.sql;
```
### 方法二:使用 Python 脚本(推荐用于测试环境)
1. 运行更新脚本:
```bash
python update_template_tree.py
```
2. 查看预览信息
3. 输入 `yes` 确认执行
4. 再次确认执行实际更新
## 注意事项
1. **备份数据库**:执行更新前务必备份数据库
2. **检查匹配**:确保目录和文件名与数据库中的记录能够正确匹配
3. **层级顺序**:更新会按照层级顺序执行,确保父节点先于子节点创建/更新
4. **重复执行**:脚本支持重复执行,已正确设置 `parent_id` 的记录会被跳过
## 数据库表结构
`f_polic_file_config` 表的关键字段:
- `id`: 主键
- `tenant_id`: 租户ID(固定值 615873064429507639)
- `parent_id`: 父节点IDNULL 表示根节点)
- `name`: 名称
- `template_code`: 模板编码(文件节点使用)
- `input_data`: JSON格式的配置数据
- `file_path`: MinIO文件路径
## 问题排查
### 问题1某些文件无法匹配
**原因:** 文件名或 `template_code` 不匹配
**解决:** 检查 `DOCUMENT_TYPE_MAPPING` 字典,确保文件名映射正确
### 问题2目录节点重复创建
**原因:** 数据库中已存在同名目录节点,但脚本未正确匹配
**解决:** 检查数据库中的记录,确保名称完全一致(包括空格和标点)
### 问题3parent_id 更新失败
**原因:** 父节点ID不存在或层级关系错误
**解决:** 检查生成的 SQL 脚本,确认父节点ID是否正确
## 验证更新结果
执行更新后,可以使用以下 SQL 查询验证:
```sql
-- 查看树状结构
SELECT
id,
name,
parent_id,
template_code,
(SELECT name FROM f_polic_file_config p2 WHERE p2.id = p1.parent_id) as parent_name
FROM f_polic_file_config p1
WHERE tenant_id = 615873064429507639
ORDER BY parent_id, name;
-- 查看缺少 parent_id 的记录(应该只有根节点)
SELECT id, name, parent_id
FROM f_polic_file_config
WHERE tenant_id = 615873064429507639
AND parent_id IS NULL
AND name NOT LIKE '%-%'; -- 排除一级目录
```
## 联系信息
如有问题,请检查:
1. 数据库连接配置是否正确
2. 目录结构是否与预期一致
3. 数据库中的记录是否完整