diff --git a/README_初始化模板树状结构.md b/README_初始化模板树状结构.md new file mode 100644 index 0000000..ffded21 --- /dev/null +++ b/README_初始化模板树状结构.md @@ -0,0 +1,221 @@ +# 初始化模板树状结构 - 使用说明 + +## 概述 + +`init_template_tree_from_directory.py` 脚本用于**完全重置** `f_polic_file_config` 表中的模板数据,根据 `template_finish` 目录结构重新创建所有记录,建立正确的树状层级关系。 + +## ⚠️ 重要警告 + +**此操作会删除当前租户的所有模板数据!** + +包括: +- `f_polic_file_config` 表中的所有记录 +- `f_polic_file_field` 表中的相关关联记录 + +然后根据 `template_finish` 目录结构完全重建。 + +**执行前请务必备份数据库!** + +## 功能特点 + +1. **完全重建**: 删除旧数据,根据目录结构重新创建 +2. **树状结构**: 自动建立正确的 parent_id 层级关系 +3. **文件上传**: 可选择是否上传文件到 MinIO +4. **安全确认**: 多重确认机制,防止误操作 +5. **模拟模式**: 先预览再执行,确保安全 + +## 目录结构要求 + +脚本会扫描 `template_finish` 目录,期望的结构如下: + +``` +template_finish/ +└── 2-初核模版/ (一级目录) + ├── 1.初核请示/ (二级目录) + │ ├── 1.请示报告卡(XXX).docx + │ ├── 2.初步核实审批表(XXX).docx + │ └── 3.附件初核方案(XXX).docx + ├── 2.谈话审批/ (二级目录) + │ ├── 谈话通知书/ (三级目录) + │ │ ├── 谈话通知书第一联.docx + │ │ ├── 谈话通知书第二联.docx + │ │ └── 谈话通知书第三联.docx + │ ├── 走读式谈话审批/ (三级目录) + │ │ ├── 1.请示报告卡(初核谈话).docx + │ │ ├── 2谈话审批表.docx + │ │ └── ... + │ └── 走读式谈话流程/ (三级目录) + │ ├── 1.谈话笔录.docx + │ └── ... + └── 3.初核结论/ (二级目录) + ├── 8-1请示报告卡(初核报告结论) .docx + └── 8.XXX初核情况报告.docx +``` + +## 使用方法 + +### 基本使用 + +```bash +python init_template_tree_from_directory.py +``` + +### 执行流程 + +1. **警告提示**: 显示操作警告 +2. **第一次确认**: 输入 `yes` 继续 +3. **扫描目录**: 自动扫描 `template_finish` 目录 +4. **显示预览**: 显示目录结构预览 +5. **选择上传**: 选择是否上传文件到 MinIO +6. **模拟删除**: 显示将删除的数据 +7. **模拟创建**: 显示将创建的节点 +8. **最终确认**: 再次输入 `yes` 执行实际更新 +9. **执行删除**: 删除旧数据 +10. **执行创建**: 创建新数据 + +### 交互式提示 + +``` +确认继续?(yes/no,默认no): yes +是否上传文件到MinIO?(yes/no,默认yes): yes +确认执行实际更新?(yes/no,默认no): yes +``` + +## 处理逻辑 + +### 1. 删除旧数据 + +- 先删除 `f_polic_file_field` 表中的关联记录 +- 再删除 `f_polic_file_config` 表中的配置记录 +- 只删除当前租户(`tenant_id = 615873064429507639`)的数据 + +### 2. 创建新数据 + +按层级顺序创建: + +1. **目录节点**: + - 不包含 `template_code` 字段 + - `input_data` 为 NULL + - `file_path` 为 NULL + +2. 
**文件节点**: + - 包含 `template_code`(从 `DOCUMENT_TYPE_MAPPING` 获取) + - `input_data` 包含 JSON 格式的配置 + - `file_path` 为 MinIO 路径(如果上传了文件) + +### 3. 树状关系 + +- 一级目录: `parent_id = NULL` +- 二级目录: `parent_id = 一级目录的ID` +- 三级目录: `parent_id = 二级目录的ID` +- 文件: `parent_id = 所在目录的ID` + +## 模板识别 + +脚本通过 `DOCUMENT_TYPE_MAPPING` 字典识别文件类型: + +- 匹配文件名(不含扩展名) +- 提取 `template_code` 和 `business_type` +- 如果无法识别,`template_code` 为空字符串 + +## 文件上传 + +如果选择上传文件到 MinIO: + +- 文件路径格式: `/{tenant_id}/TEMPLATE/{year}/{month}/{filename}` +- 例如: `/615873064429507639/TEMPLATE/2025/12/1.请示报告卡(XXX).docx` +- 上传失败不会中断流程,但 `file_path` 将为 NULL + +## 输出示例 + +``` +================================================================================ +初始化模板树状结构(从目录结构完全重建) +================================================================================ + +⚠️ 警告:此操作将删除当前租户的所有模板数据! + +确认继续?(yes/no,默认no): yes +✓ 数据库连接成功 + +扫描目录结构... + 找到 28 个节点 + 其中目录: 7 个 + 其中文件: 21 个 + +执行模拟删除... + [模拟] 将删除 113 条关联记录 + [模拟] 将删除 34 条配置记录 + +执行模拟创建... + ✓ [模拟]创建目录: 2-初核模版 (ID: ...) + ✓ [模拟]创建文件: 1.请示报告卡(XXX) (ID: ...) [父: ...] [code: REPORT_CARD] + ... + +确认执行实际更新?(yes/no,默认no): yes + +执行实际删除... + ✓ 删除了 113 条关联记录 + ✓ 删除了 34 条配置记录 + +执行实际创建... + ✓ 创建目录: 2-初核模版 (ID: ...) + ✓ 创建文件: 1.请示报告卡(XXX) (ID: ...) [父: ...] [code: REPORT_CARD] + ... + +✓ 创建完成!共创建 28 个节点 +``` + +## 验证结果 + +执行完成后,可以使用验证脚本检查结果: + +```bash +python verify_tree_structure.py +``` + +## 注意事项 + +1. **备份数据库**: 执行前务必备份数据库 +2. **确认目录结构**: 确保 `template_finish` 目录结构正确 +3. **文件存在**: 确保所有 `.docx` 文件都存在 +4. **MinIO 连接**: 如果选择上传文件,确保 MinIO 连接正常 +5. 
**不可逆操作**: 删除操作不可逆,请谨慎执行 + +## 故障排查 + +### 问题1: template_code 不能为 NULL + +**原因**: 数据库表结构要求 template_code 不能为 NULL + +**解决**: 脚本已处理,目录节点不插入 template_code,文件节点使用空字符串 + +### 问题2: 文件上传失败 + +**原因**: MinIO 连接问题或文件不存在 + +**解决**: +- 检查 MinIO 配置 +- 检查文件是否存在 +- 上传失败不会中断流程,可以后续手动上传 + +### 问题3: 父子关系错误 + +**原因**: 目录结构扫描顺序问题 + +**解决**: 脚本已按层级顺序处理,确保父节点先于子节点创建 + +## 相关脚本 + +- `update_template_tree.py` - 更新现有数据的 parent_id(不删除数据) +- `verify_tree_structure.py` - 验证树状结构 +- `check_existing_data.py` - 检查现有数据 + +## 联系信息 + +如有问题,请检查: +1. 数据库连接配置 +2. 目录结构是否正确 +3. 文件是否都存在 +4. MinIO 配置是否正确 + diff --git a/README_模板树状结构更新.md b/README_模板树状结构更新.md new file mode 100644 index 0000000..b73554e --- /dev/null +++ b/README_模板树状结构更新.md @@ -0,0 +1,293 @@ +# 模板树状结构更新 - 使用说明 + +## 概述 + +本工具用于根据 `template_finish` 目录结构,更新数据库 `f_polic_file_config` 表中的 `parent_id` 字段,建立正确的树状层级结构。 + +## 数据库现状分析 + +根据检查,数据库中现有: +- **总记录数**: 32 条 +- **有 parent_id**: 2 条 +- **无 parent_id**: 30 条 + +需要更新的主要记录包括: +- 初步核实审批表 +- 请示报告卡(各种类型) +- 初核方案 +- 谈话通知书(第一联、第二联、第三联) +- XXX初核情况报告 +- 走读式谈话审批相关文件 +- 走读式谈话流程相关文件 +- 等等... + +## 脚本说明 + +### 1. `check_existing_data.py` - 检查现有数据 + +**功能**: 查看数据库中的现有记录,分析缺少 parent_id 的情况 + +**使用方法**: +```bash +python check_existing_data.py +``` + +**输出**: +- 列出所有无 parent_id 的记录 +- 显示有 parent_id 的记录及其树状关系 + +--- + +### 2. `improved_match_and_update.py` - 改进的匹配分析 + +**功能**: 使用改进的匹配逻辑分析目录结构和数据库,生成匹配报告 + +**特点**: +- **三级匹配策略**: + 1. **template_code 精确匹配**(最高优先级) + 2. **名称精确匹配** + 3. **标准化名称匹配**(去掉编号和括号后的模糊匹配) + +**使用方法**: +```bash +python improved_match_and_update.py +``` + +**输出**: +- 匹配报告(显示哪些记录已匹配,哪些需要创建) +- 可选择性生成 SQL 更新脚本 + +--- + +### 3. `update_template_tree.py` - 交互式更新工具(推荐) + +**功能**: 完整的更新工具,包含预览、确认和执行功能 + +**特点**: +- 使用改进的匹配逻辑 +- 支持预览模式(dry-run) +- 交互式确认 +- 按层级顺序自动更新 +- 安全的事务处理 + +**使用方法**: +```bash +python update_template_tree.py +``` + +**执行流程**: +1. 扫描目录结构 +2. 获取数据库现有数据 +3. 规划树状结构(使用改进的匹配逻辑) +4. 显示更新预览 +5. 询问是否执行(输入 `yes`) +6. 执行模拟更新 +7. 再次确认执行实际更新 + +--- + +### 4. 
`analyze_and_update_template_tree.py` - 生成 SQL 脚本 + +**功能**: 分析并生成 SQL 更新脚本(不直接修改数据库) + +**使用方法**: +```bash +python analyze_and_update_template_tree.py +``` + +**输出**: +- `update_template_tree.sql` - SQL 更新脚本 + +**适用场景**: +- 生产环境 +- 需要 DBA 审核的场景 +- 需要手动执行的场景 + +--- + +### 5. `verify_tree_structure.py` - 验证更新结果 + +**功能**: 验证更新后的树状结构是否正确 + +**使用方法**: +```bash +python verify_tree_structure.py +``` + +**输出**: +- 树状结构可视化 +- 统计信息 +- 父子关系验证 + +--- + +## 匹配逻辑说明 + +### 三级匹配策略 + +1. **template_code 精确匹配**(最高优先级) + - 通过 `template_code` 字段精确匹配 + - 例如: `REPORT_CARD` 匹配 `REPORT_CARD` + +2. **名称精确匹配** + - 通过 `name` 字段精确匹配 + - 例如: `"1.请示报告卡(XXX)"` 匹配 `"1.请示报告卡(XXX)"` + +3. **标准化名称匹配**(模糊匹配) + - 去掉开头的编号(如 `"1."`、`"2."`、`"8-1"`) + - 去掉括号及其内容(如 `"(XXX)"`、`"(初核谈话)"`) + - 例如: `"1.请示报告卡(XXX)"` → `"请示报告卡"` → 匹配 `"请示报告卡"` + +### 匹配示例 + +| 目录结构中的名称 | 数据库中的名称 | 匹配方式 | +|----------------|--------------|---------| +| `1.请示报告卡(XXX)` | `请示报告卡` | template_code: `REPORT_CARD` | +| `2.初步核实审批表(XXX)` | `初步核实审批表` | template_code: `PRELIMINARY_VERIFICATION_APPROVAL` | +| `谈话通知书第一联` | `谈话通知书第一联` | 名称精确匹配 | +| `走读式谈话审批` | `走读式谈话审批` | 名称精确匹配 | + +## 树状结构规划 + +根据 `template_finish` 目录结构,规划的层级关系如下: + +``` +2-初核模版 (一级目录) +├── 1.初核请示 (二级目录) +│ ├── 1.请示报告卡(XXX).docx +│ ├── 2.初步核实审批表(XXX).docx +│ └── 3.附件初核方案(XXX).docx +├── 2.谈话审批 (二级目录) +│ ├── 谈话通知书 (三级目录) +│ │ ├── 谈话通知书第一联.docx +│ │ ├── 谈话通知书第二联.docx +│ │ └── 谈话通知书第三联.docx +│ ├── 走读式谈话审批 (三级目录) +│ │ ├── 1.请示报告卡(初核谈话).docx +│ │ ├── 2谈话审批表.docx +│ │ ├── 3.谈话前安全风险评估表.docx +│ │ ├── 4.谈话方案.docx +│ │ └── 5.谈话后安全风险评估表.docx +│ └── 走读式谈话流程 (三级目录) +│ ├── 1.谈话笔录.docx +│ ├── 2.谈话询问对象情况摸底调查30问.docx +│ ├── 3.被谈话人权利义务告知书.docx +│ ├── 4.点对点交接单.docx +│ ├── 5.陪送交接单(新).docx +│ ├── 6.1保密承诺书(谈话对象使用-非中共党员用).docx +│ ├── 6.2保密承诺书(谈话对象使用-中共党员用).docx +│ └── 7.办案人员-办案安全保密承诺书.docx +└── 3.初核结论 (二级目录) + ├── 8-1请示报告卡(初核报告结论) .docx + └── 8.XXX初核情况报告.docx +``` + +## 执行步骤 + +### 推荐流程(使用交互式工具) + +1. **检查现有数据** + ```bash + python check_existing_data.py + ``` + +2. 
**运行更新工具** + ```bash + python update_template_tree.py + ``` + +3. **查看预览信息** + - 检查匹配情况 + - 确认更新计划 + +4. **确认执行** + - 输入 `yes` 确认 + - 再次确认执行实际更新 + +5. **验证结果** + ```bash + python verify_tree_structure.py + ``` + +### 备选流程(使用 SQL 脚本) + +1. **生成 SQL 脚本** + ```bash + python improved_match_and_update.py + # 或 + python analyze_and_update_template_tree.py + ``` + +2. **检查 SQL 脚本** + ```bash + # 查看 update_template_tree.sql + ``` + +3. **备份数据库**(重要!) + +4. **执行 SQL 脚本** + ```sql + -- 在 MySQL 客户端中执行 + source update_template_tree.sql; + ``` + +5. **验证结果** + ```bash + python verify_tree_structure.py + ``` + +## 注意事项 + +1. **备份数据库**: 执行更新前务必备份数据库 +2. **检查匹配**: 确认匹配结果是否正确 +3. **层级顺序**: 更新会按照层级顺序执行,确保父节点先于子节点 +4. **重复执行**: 脚本支持重复执行,已正确设置的记录会被跳过 +5. **目录节点**: 如果目录节点不存在,脚本会自动创建 + +## 匹配结果 + +根据最新分析,匹配情况如下: + +- ✅ **已匹配**: 26 条记录 +- ⚠️ **需创建**: 2 条记录(目录节点) + - `2-初核模版` (一级目录) + - `1.初核请示` (二级目录) + +所有文件记录都已正确匹配到数据库中的现有记录。 + +## 问题排查 + +### 问题1: 某些记录无法匹配 + +**原因**: 名称或 template_code 不匹配 + +**解决**: +- 检查 `DOCUMENT_TYPE_MAPPING` 字典 +- 确认数据库中的 `template_code` 是否正确 +- 使用 `check_existing_data.py` 查看数据库中的实际数据 + +### 问题2: 匹配到错误的记录 + +**原因**: 标准化名称匹配时选择了错误的候选 + +**解决**: +- 检查匹配报告,确认匹配方式 +- 如果 template_code 匹配失败,检查数据库中的 template_code 是否正确 +- 可以手动调整匹配逻辑 + +### 问题3: parent_id 更新失败 + +**原因**: 父节点ID不存在或层级关系错误 + +**解决**: +- 使用 `verify_tree_structure.py` 验证父子关系 +- 检查生成的 SQL 脚本,确认父节点ID是否正确 + +## 联系信息 + +如有问题,请检查: +1. 数据库连接配置是否正确 +2. 目录结构是否与预期一致 +3. 数据库中的记录是否完整 +4. template_code 是否正确设置 + diff --git a/analyze_and_fix_field_code_issues.py b/analyze_and_fix_field_code_issues.py new file mode 100644 index 0000000..70ba85c --- /dev/null +++ b/analyze_and_fix_field_code_issues.py @@ -0,0 +1,582 @@ +""" +分析和修复字段编码问题 +1. 分析f_polic_file_field表中的重复项 +2. 检查f_polic_field表中的中文field_code +3. 根据占位符与字段对照表更新field_code +4. 
合并重复项并更新关联表 +""" +import os +import json +import pymysql +import re +from typing import Dict, List, Optional, Tuple +from datetime import datetime +from pathlib import Path + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 +CREATED_BY = 655162080928945152 +UPDATED_BY = 655162080928945152 +CURRENT_TIME = datetime.now() + +# 从占位符与字段对照表文档中提取的字段映射 +# 格式: {字段名称: field_code} +FIELD_NAME_TO_CODE_MAPPING = { + # 基本信息字段 + '被核查人姓名': 'target_name', + '被核查人员单位及职务': 'target_organization_and_position', + '被核查人员单位': 'target_organization', + '被核查人员职务': 'target_position', + '被核查人员性别': 'target_gender', + '被核查人员出生年月': 'target_date_of_birth', + '被核查人员出生年月日': 'target_date_of_birth_full', + '被核查人员年龄': 'target_age', + '被核查人员文化程度': 'target_education_level', + '被核查人员政治面貌': 'target_political_status', + '被核查人员职级': 'target_professional_rank', + '被核查人员身份证号': 'target_id_number', + '被核查人员身份证件及号码': 'target_id_number', + '被核查人员住址': 'target_address', + '被核查人员户籍住址': 'target_registered_address', + '被核查人员联系方式': 'target_contact', + '被核查人员籍贯': 'target_place_of_origin', + '被核查人员民族': 'target_ethnicity', + + # 问题相关字段 + '线索来源': 'clue_source', + '主要问题线索': 'target_issue_description', + '被核查人问题描述': 'target_problem_description', + + # 审批相关字段 + '初步核实审批表承办部门意见': 'department_opinion', + '初步核实审批表填表人': 'filler_name', + '批准时间': 'approval_time', + + # 核查相关字段 + '核查单位名称': 'investigation_unit_name', + '核查组代号': 'investigation_team_code', + '核查组组长姓名': 'investigation_team_leader_name', + '核查组成员姓名': 'investigation_team_member_names', + '核查地点': 'investigation_location', + + # 风险评估相关字段 + '被核查人员家庭情况': 'target_family_situation', + '被核查人员社会关系': 'target_social_relations', + '被核查人员健康状况': 'target_health_status', + '被核查人员性格特征': 'target_personality', + 
'被核查人员承受能力': 'target_tolerance', + '被核查人员涉及问题严重程度': 'target_issue_severity', + '被核查人员涉及其他问题的可能性': 'target_other_issues_possibility', + '被核查人员此前被审查情况': 'target_previous_investigation', + '被核查人员社会负面事件': 'target_negative_events', + '被核查人员其他情况': 'target_other_situation', + '风险等级': 'risk_level', + + # 其他字段 + '线索信息': 'clue_info', + '被核查人员工作基本情况线索': 'target_basic_info_clue', + '被核查人员工作基本情况': 'target_work_basic_info', + '请示报告卡请示时间': 'report_card_request_time', + '应到时间': 'appointment_time', + '应到地点': 'appointment_location', + '承办部门': 'handling_department', + '承办人': 'handler_name', + '谈话通知时间': 'notification_time', + '谈话通知地点': 'notification_location', + '被核查人员本人认识和态度': 'target_attitude', + '纪委名称': 'commission_name', +} + + +def is_chinese(text: str) -> bool: + """判断字符串是否包含中文字符""" + if not text: + return False + return bool(re.search(r'[\u4e00-\u9fff]', text)) + + +def analyze_f_polic_field(conn) -> Dict: + """分析f_polic_field表,找出中文field_code和重复项""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("\n" + "="*80) + print("1. 
分析 f_polic_field 表") + print("="*80) + + # 查询所有字段 + cursor.execute(""" + SELECT id, name, filed_code, field_type, state + FROM f_polic_field + WHERE tenant_id = %s + ORDER BY name, filed_code + """, (TENANT_ID,)) + + fields = cursor.fetchall() + print(f"\n总共找到 {len(fields)} 个字段记录") + + # 找出中文field_code + chinese_field_codes = [] + for field in fields: + if is_chinese(field['filed_code']): + chinese_field_codes.append(field) + + print(f"\n发现 {len(chinese_field_codes)} 个中文field_code:") + for field in chinese_field_codes: + print(f" - ID: {field['id']}, 名称: {field['name']}, field_code: {field['filed_code']}") + + # 找出重复的字段名称 + name_to_fields = {} + for field in fields: + name = field['name'] + if name not in name_to_fields: + name_to_fields[name] = [] + name_to_fields[name].append(field) + + duplicates = {name: fields_list for name, fields_list in name_to_fields.items() + if len(fields_list) > 1} + + print(f"\n发现 {len(duplicates)} 个重复的字段名称:") + for name, fields_list in duplicates.items(): + print(f"\n 字段名称: {name} (共 {len(fields_list)} 条记录)") + for field in fields_list: + print(f" - ID: {field['id']}, field_code: {field['filed_code']}, " + f"field_type: {field['field_type']}, state: {field['state']}") + + # 找出重复的field_code + code_to_fields = {} + for field in fields: + code = field['filed_code'] + if code not in code_to_fields: + code_to_fields[code] = [] + code_to_fields[code].append(field) + + duplicate_codes = {code: fields_list for code, fields_list in code_to_fields.items() + if len(fields_list) > 1} + + print(f"\n发现 {len(duplicate_codes)} 个重复的field_code:") + for code, fields_list in duplicate_codes.items(): + print(f"\n field_code: {code} (共 {len(fields_list)} 条记录)") + for field in fields_list: + print(f" - ID: {field['id']}, 名称: {field['name']}, " + f"field_type: {field['field_type']}, state: {field['state']}") + + return { + 'all_fields': fields, + 'chinese_field_codes': chinese_field_codes, + 'duplicate_names': duplicates, + 'duplicate_codes': duplicate_codes 
+ } + + +def analyze_f_polic_file_field(conn) -> Dict: + """分析f_polic_file_field表,找出重复项""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("\n" + "="*80) + print("2. 分析 f_polic_file_field 表") + print("="*80) + + # 查询所有关联关系 + cursor.execute(""" + SELECT fff.id, fff.file_id, fff.filed_id, + fc.name as file_name, f.name as field_name, f.filed_code + FROM f_polic_file_field fff + LEFT JOIN f_polic_file_config fc ON fff.file_id = fc.id + LEFT JOIN f_polic_field f ON fff.filed_id = f.id + WHERE fff.tenant_id = %s + ORDER BY fff.file_id, fff.filed_id + """, (TENANT_ID,)) + + relations = cursor.fetchall() + print(f"\n总共找到 {len(relations)} 个关联关系") + + # 找出重复的关联关系(相同的file_id和filed_id) + relation_key_to_records = {} + for rel in relations: + key = (rel['file_id'], rel['filed_id']) + if key not in relation_key_to_records: + relation_key_to_records[key] = [] + relation_key_to_records[key].append(rel) + + duplicates = {key: records for key, records in relation_key_to_records.items() + if len(records) > 1} + + print(f"\n发现 {len(duplicates)} 个重复的关联关系:") + for (file_id, filed_id), records in duplicates.items(): + print(f"\n 文件ID: {file_id}, 字段ID: {filed_id} (共 {len(records)} 条记录)") + for record in records: + print(f" - 关联ID: {record['id']}, 文件: {record['file_name']}, " + f"字段: {record['field_name']} ({record['filed_code']})") + + # 统计使用中文field_code的关联关系 + chinese_relations = [rel for rel in relations if rel['filed_code'] and is_chinese(rel['filed_code'])] + + print(f"\n发现 {len(chinese_relations)} 个使用中文field_code的关联关系:") + for rel in chinese_relations[:10]: # 只显示前10个 + print(f" - 文件: {rel['file_name']}, 字段: {rel['field_name']}, " + f"field_code: {rel['filed_code']}") + if len(chinese_relations) > 10: + print(f" ... 
还有 {len(chinese_relations) - 10} 个") + + return { + 'all_relations': relations, + 'duplicate_relations': duplicates, + 'chinese_relations': chinese_relations + } + + +def get_correct_field_code(field_name: str, current_code: str) -> Optional[str]: + """根据字段名称获取正确的field_code""" + # 首先从映射表中查找 + if field_name in FIELD_NAME_TO_CODE_MAPPING: + return FIELD_NAME_TO_CODE_MAPPING[field_name] + + # 如果当前code已经是英文且符合规范,保留 + if current_code and not is_chinese(current_code) and re.match(r'^[a-z_]+$', current_code): + return current_code + + return None + + +def fix_f_polic_field(conn, dry_run: bool = True) -> Dict: + """修复f_polic_field表中的问题""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("\n" + "="*80) + print("3. 修复 f_polic_field 表") + print("="*80) + + if dry_run: + print("\n[DRY RUN模式 - 不会实际修改数据库]") + + # 获取所有字段 + cursor.execute(""" + SELECT id, name, filed_code, field_type, state + FROM f_polic_field + WHERE tenant_id = %s + """, (TENANT_ID,)) + + fields = cursor.fetchall() + + updates = [] + merges = [] + + # 按字段名称分组,找出需要合并的重复项 + name_to_fields = {} + for field in fields: + name = field['name'] + if name not in name_to_fields: + name_to_fields[name] = [] + name_to_fields[name].append(field) + + # 处理每个字段名称 + for field_name, field_list in name_to_fields.items(): + if len(field_list) == 1: + # 单个字段,检查是否需要更新field_code + field = field_list[0] + correct_code = get_correct_field_code(field['name'], field['filed_code']) + + if correct_code and correct_code != field['filed_code']: + updates.append({ + 'id': field['id'], + 'name': field['name'], + 'old_code': field['filed_code'], + 'new_code': correct_code, + 'field_type': field['field_type'] + }) + else: + # 多个字段,需要合并 + # 找出最佳的field_code + best_field = None + best_code = None + + for field in field_list: + correct_code = get_correct_field_code(field['name'], field['filed_code']) + if correct_code: + if not best_field or (field['state'] == 1 and best_field['state'] == 0): + best_field = field + best_code = 
correct_code + + # 如果没找到最佳字段,选择第一个启用的,或者第一个 + if not best_field: + enabled_fields = [f for f in field_list if f['state'] == 1] + best_field = enabled_fields[0] if enabled_fields else field_list[0] + best_code = get_correct_field_code(best_field['name'], best_field['filed_code']) + if not best_code: + # 生成一个基于名称的code + best_code = field_name.lower().replace('被核查人员', 'target_').replace('被核查人', 'target_') + best_code = re.sub(r'[^\w]', '_', best_code) + best_code = re.sub(r'_+', '_', best_code).strip('_') + + # 确定要保留的字段和要删除的字段 + keep_field = best_field + remove_fields = [f for f in field_list if f['id'] != keep_field['id']] + + # 更新保留字段的field_code + if best_code and best_code != keep_field['filed_code']: + updates.append({ + 'id': keep_field['id'], + 'name': keep_field['name'], + 'old_code': keep_field['filed_code'], + 'new_code': best_code, + 'field_type': keep_field['field_type'] + }) + + merges.append({ + 'keep_field_id': keep_field['id'], + 'keep_field_name': keep_field['name'], + 'keep_field_code': best_code or keep_field['filed_code'], + 'remove_field_ids': [f['id'] for f in remove_fields], + 'remove_fields': remove_fields + }) + + # 显示更新计划 + print(f"\n需要更新 {len(updates)} 个字段的field_code:") + for update in updates: + print(f" - ID: {update['id']}, 名称: {update['name']}, " + f"{update['old_code']} -> {update['new_code']}") + + print(f"\n需要合并 {len(merges)} 组重复字段:") + for merge in merges: + print(f"\n 保留字段: ID={merge['keep_field_id']}, 名称={merge['keep_field_name']}, " + f"field_code={merge['keep_field_code']}") + print(f" 删除字段: {len(merge['remove_field_ids'])} 个") + for remove_field in merge['remove_fields']: + print(f" - ID: {remove_field['id']}, field_code: {remove_field['filed_code']}, " + f"field_type: {remove_field['field_type']}, state: {remove_field['state']}") + + # 执行更新 + if not dry_run: + print("\n开始执行更新...") + + # 1. 
先更新field_code + for update in updates: + cursor.execute(""" + UPDATE f_polic_field + SET filed_code = %s, updated_time = %s, updated_by = %s + WHERE id = %s + """, (update['new_code'], CURRENT_TIME, UPDATED_BY, update['id'])) + print(f" ✓ 更新字段 ID {update['id']}: {update['old_code']} -> {update['new_code']}") + + # 2. 合并重复字段:先更新关联表,再删除重复字段 + for merge in merges: + keep_id = merge['keep_field_id'] + for remove_id in merge['remove_field_ids']: + # 更新f_polic_file_field表中的关联 + cursor.execute(""" + UPDATE f_polic_file_field + SET filed_id = %s, updated_time = %s, updated_by = %s + WHERE filed_id = %s AND tenant_id = %s + """, (keep_id, CURRENT_TIME, UPDATED_BY, remove_id, TENANT_ID)) + + # 删除重复的字段记录 + cursor.execute(""" + DELETE FROM f_polic_field + WHERE id = %s AND tenant_id = %s + """, (remove_id, TENANT_ID)) + + print(f" ✓ 合并字段: 保留 ID {keep_id}, 删除 {len(merge['remove_field_ids'])} 个重复字段") + + conn.commit() + print("\n✓ 更新完成") + else: + print("\n[DRY RUN] 以上操作不会实际执行") + + return { + 'updates': updates, + 'merges': merges + } + + +def fix_f_polic_file_field(conn, dry_run: bool = True) -> Dict: + """修复f_polic_file_field表中的重复项""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("\n" + "="*80) + print("4. 
修复 f_polic_file_field 表") + print("="*80) + + if dry_run: + print("\n[DRY RUN模式 - 不会实际修改数据库]") + + # 找出重复的关联关系 + cursor.execute(""" + SELECT file_id, filed_id, COUNT(*) as count, GROUP_CONCAT(id) as ids + FROM f_polic_file_field + WHERE tenant_id = %s + GROUP BY file_id, filed_id + HAVING count > 1 + """, (TENANT_ID,)) + + duplicates = cursor.fetchall() + + print(f"\n发现 {len(duplicates)} 组重复的关联关系") + + deletes = [] + + for dup in duplicates: + file_id = dup['file_id'] + filed_id = dup['filed_id'] + ids = [int(id_str) for id_str in dup['ids'].split(',')] + + # 保留第一个,删除其他的 + keep_id = ids[0] + remove_ids = ids[1:] + + deletes.append({ + 'file_id': file_id, + 'filed_id': filed_id, + 'keep_id': keep_id, + 'remove_ids': remove_ids + }) + + print(f"\n 文件ID: {file_id}, 字段ID: {filed_id}") + print(f" 保留关联ID: {keep_id}") + print(f" 删除关联ID: {', '.join(map(str, remove_ids))}") + + # 执行删除 + if not dry_run: + print("\n开始删除重复的关联关系...") + for delete in deletes: + for remove_id in delete['remove_ids']: + cursor.execute(""" + DELETE FROM f_polic_file_field + WHERE id = %s AND tenant_id = %s + """, (remove_id, TENANT_ID)) + print(f" ✓ 删除文件ID {delete['file_id']} 和字段ID {delete['filed_id']} 的重复关联") + + conn.commit() + print("\n✓ 删除完成") + else: + print("\n[DRY RUN] 以上操作不会实际执行") + + return { + 'deletes': deletes + } + + +def check_other_tables(conn): + """检查其他可能受影响的表""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("\n" + "="*80) + print("5. 
检查其他关联表") + print("="*80) + + # 检查f_polic_task表 + print("\n检查 f_polic_task 表...") + try: + cursor.execute(""" + SELECT COUNT(*) as count + FROM f_polic_task + WHERE tenant_id = %s + """, (TENANT_ID,)) + task_count = cursor.fetchone()['count'] + print(f" 找到 {task_count} 个任务记录") + + # 检查是否有引用字段ID的列 + cursor.execute("DESCRIBE f_polic_task") + columns = [col['Field'] for col in cursor.fetchall()] + print(f" 表字段: {', '.join(columns)}") + + # 检查是否有引用f_polic_field的字段 + field_refs = [col for col in columns if 'field' in col.lower() or 'filed' in col.lower()] + if field_refs: + print(f" 可能引用字段的列: {', '.join(field_refs)}") + except Exception as e: + print(f" 检查f_polic_task表时出错: {e}") + + # 检查f_polic_file表 + print("\n检查 f_polic_file 表...") + try: + cursor.execute(""" + SELECT COUNT(*) as count + FROM f_polic_file + WHERE tenant_id = %s + """, (TENANT_ID,)) + file_count = cursor.fetchone()['count'] + print(f" 找到 {file_count} 个文件记录") + + cursor.execute("DESCRIBE f_polic_file") + columns = [col['Field'] for col in cursor.fetchall()] + print(f" 表字段: {', '.join(columns)}") + except Exception as e: + print(f" 检查f_polic_file表时出错: {e}") + + +def main(): + """主函数""" + print("="*80) + print("字段编码问题分析和修复工具") + print("="*80) + + try: + conn = pymysql.connect(**DB_CONFIG) + + # 1. 分析f_polic_field表 + field_analysis = analyze_f_polic_field(conn) + + # 2. 分析f_polic_file_field表 + relation_analysis = analyze_f_polic_file_field(conn) + + # 3. 检查其他表 + check_other_tables(conn) + + # 4. 询问是否执行修复 + print("\n" + "="*80) + print("分析完成") + print("="*80) + + print("\n是否执行修复?") + print("1. 先执行DRY RUN(不实际修改数据库)") + print("2. 直接执行修复(会修改数据库)") + print("3. 
仅查看分析结果,不执行修复")
+
+        choice = input("\n请选择 (1/2/3,默认1): ").strip() or "1"
+
+        if choice == "1":
+            # DRY RUN
+            print("\n" + "="*80)
+            print("执行DRY RUN...")
+            print("="*80)
+            fix_f_polic_field(conn, dry_run=True)
+            fix_f_polic_file_field(conn, dry_run=True)
+
+            print("\n" + "="*80)
+            confirm = input("DRY RUN完成。是否执行实际修复?(y/n,默认n): ").strip().lower()
+            if confirm == 'y':
+                print("\n执行实际修复...")
+                fix_f_polic_field(conn, dry_run=False)
+                fix_f_polic_file_field(conn, dry_run=False)
+                print("\n✓ 修复完成!")
+        elif choice == "2":
+            # 直接执行
+            print("\n" + "="*80)
+            print("执行修复...")
+            print("="*80)
+            fix_f_polic_field(conn, dry_run=False)
+            fix_f_polic_file_field(conn, dry_run=False)
+            print("\n✓ 修复完成!")
+        else:
+            print("\n仅查看分析结果,未执行修复")
+
+        conn.close()
+
+    except Exception as e:
+        print(f"\n✗ 执行失败: {e}")
+        import traceback
+        traceback.print_exc()
+
+
+if __name__ == '__main__':
+    main()
+
diff --git a/analyze_and_update_template_tree.py b/analyze_and_update_template_tree.py
new file mode 100644
index 0000000..7f093d8
--- /dev/null
+++ b/analyze_and_update_template_tree.py
@@ -0,0 +1,555 @@
+"""
+分析和更新模板树状结构
+根据 template_finish 目录结构规划树状层级,并更新数据库中的 parent_id 字段
+"""
+import os
+import json
+import pymysql
+from pathlib import Path
+from typing import Dict, List, Optional, Tuple
+from datetime import datetime
+
+# 数据库连接配置
+DB_CONFIG = {
+    'host': os.getenv('DB_HOST', '152.136.177.240'),
+    'port': int(os.getenv('DB_PORT', 5012)),
+    'user': os.getenv('DB_USER', 'finyx'),
+    'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'),
+    'database': os.getenv('DB_NAME', 'finyx'),
+    'charset': 'utf8mb4'
+}
+
+TENANT_ID = 615873064429507639
+CREATED_BY = 655162080928945152
+UPDATED_BY = 655162080928945152
+CURRENT_TIME = datetime.now()
+
+# 项目根目录
+PROJECT_ROOT = Path(__file__).parent
+TEMPLATES_DIR = PROJECT_ROOT / "template_finish"
+
+# 从 init_all_templates.py 复制的文档类型映射
+DOCUMENT_TYPE_MAPPING = {
+    "1.请示报告卡(XXX)": {
+        "template_code": "REPORT_CARD",
+        "name": "1.请示报告卡(XXX)",
+        "business_type": "INVESTIGATION"
+    },
+    "2.初步核实审批表(XXX)": {
+        "template_code": "PRELIMINARY_VERIFICATION_APPROVAL",
+        "name": "2.初步核实审批表(XXX)",
+        "business_type": "INVESTIGATION"
+    },
+    "3.附件初核方案(XXX)": {
+        "template_code": "INVESTIGATION_PLAN",
+        "name": "3.附件初核方案(XXX)",
+        "business_type": "INVESTIGATION"
+    },
+    "谈话通知书第一联": {
+        "template_code": "NOTIFICATION_LETTER_1",
+        "name": "谈话通知书第一联",
+        "business_type": "INVESTIGATION"
+    },
+    "谈话通知书第二联": {
+        "template_code": "NOTIFICATION_LETTER_2",
+        "name": "谈话通知书第二联",
+        "business_type": "INVESTIGATION"
+    },
+    "谈话通知书第三联": {
+        "template_code": "NOTIFICATION_LETTER_3",
+        "name": "谈话通知书第三联",
+        "business_type": "INVESTIGATION"
+    },
+    "1.请示报告卡(初核谈话)": {
+        "template_code": "REPORT_CARD_INTERVIEW",
+        "name": "1.请示报告卡(初核谈话)",
+        "business_type": "INVESTIGATION"
+    },
+    "2谈话审批表": {
+        "template_code": "INTERVIEW_APPROVAL_FORM",
+        "name": "2谈话审批表",
+        "business_type": "INVESTIGATION"
+    },
+    "3.谈话前安全风险评估表": {
+        "template_code": "PRE_INTERVIEW_RISK_ASSESSMENT",
+        "name": "3.谈话前安全风险评估表",
+        "business_type": "INVESTIGATION"
+    },
+    "4.谈话方案": {
+        "template_code": "INTERVIEW_PLAN",
+        "name": "4.谈话方案",
+        "business_type": "INVESTIGATION"
+    },
+    "5.谈话后安全风险评估表": {
+        "template_code": "POST_INTERVIEW_RISK_ASSESSMENT",
+        "name": "5.谈话后安全风险评估表",
+        "business_type": "INVESTIGATION"
+    },
+    "1.谈话笔录": {
+        "template_code": "INTERVIEW_RECORD",
+        "name": "1.谈话笔录",
+        "business_type": "INVESTIGATION"
+    },
+    "2.谈话询问对象情况摸底调查30问": {
+        "template_code": "INVESTIGATION_30_QUESTIONS",
+        "name": "2.谈话询问对象情况摸底调查30问",
+        "business_type": "INVESTIGATION"
+    },
+    "3.被谈话人权利义务告知书": {
+        "template_code": "RIGHTS_OBLIGATIONS_NOTICE",
+        "name": "3.被谈话人权利义务告知书",
+        "business_type": "INVESTIGATION"
+    },
+    "4.点对点交接单": {
+        "template_code": "HANDOVER_FORM",
+        "name": "4.点对点交接单",
+        "business_type": "INVESTIGATION"
+    },
+    "4.点对点交接单2": {
+        "template_code": "HANDOVER_FORM_2",
+        "name": "4.点对点交接单2",
+        "business_type": "INVESTIGATION"
+    },
+    "5.陪送交接单(新)": {
+        "template_code": "ESCORT_HANDOVER_FORM",
+        "name": "5.陪送交接单(新)",
+        "business_type": "INVESTIGATION"
+    },
+    "6.1保密承诺书(谈话对象使用-非中共党员用)": {
+        "template_code": "CONFIDENTIALITY_COMMITMENT_NON_PARTY",
+        "name": "6.1保密承诺书(谈话对象使用-非中共党员用)",
+        "business_type": "INVESTIGATION"
+    },
+    "6.2保密承诺书(谈话对象使用-中共党员用)": {
+        "template_code": "CONFIDENTIALITY_COMMITMENT_PARTY",
+        "name": "6.2保密承诺书(谈话对象使用-中共党员用)",
+        "business_type": "INVESTIGATION"
+    },
+    "7.办案人员-办案安全保密承诺书": {
+        "template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT",
+        "name": "7.办案人员-办案安全保密承诺书",
+        "business_type": "INVESTIGATION"
+    },
+    "8-1请示报告卡(初核报告结论) ": {
+        "template_code": "REPORT_CARD_CONCLUSION",
+        "name": "8-1请示报告卡(初核报告结论) ",
+        "business_type": "INVESTIGATION"
+    },
+    "8.XXX初核情况报告": {
+        "template_code": "INVESTIGATION_REPORT",
+        "name": "8.XXX初核情况报告",
+        "business_type": "INVESTIGATION"
+    }
+}
+
+
+def generate_id():
+    """生成ID(使用时间戳+随机数的方式,模拟雪花算法)"""
+    import time
+    import random
+    timestamp = int(time.time() * 1000)
+    random_part = random.randint(100000, 999999)
+    return timestamp * 1000 + random_part
+
+
+def identify_document_type(file_name: str) -> Optional[Dict]:
+    """根据完整文件名识别文档类型"""
+    base_name = Path(file_name).stem
+    if base_name in DOCUMENT_TYPE_MAPPING:
+        return DOCUMENT_TYPE_MAPPING[base_name]
+    return None
+
+
+def scan_directory_structure(base_dir: Path) -> Dict:
+    """
+    扫描目录结构,构建树状层级
+
+    Returns:
+        包含目录和文件层级结构的字典
+    """
+    structure = {
+        'directories': {},  # {path: {'name': ..., 'parent': ..., 'level': ...}}
+        'files': {}  # {file_path: {'name': ..., 'parent': ..., 'template_code': ...}}
+    }
+
+    def process_path(path: Path, parent_path: Optional[str] = None, level: int = 0):
+        """递归处理路径"""
+        if path.is_file() and path.suffix == '.docx':
+            # 处理文件
+            file_name = path.stem
+            doc_config = identify_document_type(path.name)  # 传入完整文件名:对 stem 再取一次 stem 会把 "1.请示报告卡(XXX)" 截成 "1",导致映射查找失败
+
+            structure['files'][str(path)] = {
+                'name': file_name,
+                'parent': parent_path,
+                'level': level,
+                'template_code': doc_config['template_code'] if doc_config else None,
+                'full_path': str(path)
+            }
+        elif path.is_dir():
+            # 处理目录
+            dir_name = path.name
+            structure['directories'][str(path)] = {
+                'name': dir_name,
+                'parent': parent_path,
+                'level': level
+            }
+
+            # 递归处理子目录和文件
+            for child in sorted(path.iterdir()):
+                if child.name != '__pycache__':
+                    process_path(child, str(path), level + 1)
+
+    # 从根目录开始扫描(使用传入的 base_dir,而不是全局的 TEMPLATES_DIR)
+    if base_dir.exists():
+        for item in sorted(base_dir.iterdir()):
+            if item.name != '__pycache__':
+                process_path(item, None, 0)
+
+    return structure
+
+
+def get_existing_data(conn) -> Dict:
+    """
+    获取数据库中的现有数据
+
+    Returns:
+        {
+            'by_id': {id: {...}},
+            'by_name': {name: {...}},
+            'by_template_code': {template_code: {...}}
+        }
+    """
+    cursor = conn.cursor(pymysql.cursors.DictCursor)
+
+    sql = """
+        SELECT id, name, parent_id, template_code, input_data, file_path, state
+        FROM f_polic_file_config
+        WHERE tenant_id = %s
+    """
+    cursor.execute(sql, (TENANT_ID,))
+    configs = cursor.fetchall()
+
+    result = {
+        'by_id': {},
+        'by_name': {},
+        'by_template_code': {}
+    }
+
+    for config in configs:
+        config_id = config['id']
+        config_name = config['name']
+
+        # 尝试从 input_data 中提取 template_code
+        template_code = config.get('template_code')
+        if not template_code and config.get('input_data'):
+            try:
+                input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data']
+                if isinstance(input_data, dict):
+                    template_code = input_data.get('template_code')
+            except (ValueError, TypeError):  # 只忽略 JSON 解析错误,避免裸 except 吞掉 KeyboardInterrupt 等异常
+                pass
+
+        result['by_id'][config_id] = config
+        result['by_name'][config_name] = config
+
+        if template_code:
+            # 如果已存在相同 template_code,保留第一个
+            if template_code not in result['by_template_code']:
+                result['by_template_code'][template_code] = config
+
+    cursor.close()
+    return result
+
+
+def analyze_structure():
+    """分析目录结构和数据库数据"""
+    print("="*80)
+    print("分析模板目录结构和数据库数据")
+    print("="*80)
+
+    # 连接数据库
+    try:
+        conn = pymysql.connect(**DB_CONFIG)
+        print("✓ 数据库连接成功\n")
+    except Exception as e:
+        print(f"✗ 数据库连接失败: {e}")
+        return None, None
+
+    # 扫描目录结构
+    print("扫描目录结构...")
+    dir_structure = scan_directory_structure(TEMPLATES_DIR)
+    print(f" 找到 {len(dir_structure['directories'])} 个目录")
+    print(f" 找到 {len(dir_structure['files'])} 个文件\n")
+
+    # 获取数据库现有数据
+    print("获取数据库现有数据...")
+    existing_data = get_existing_data(conn)
+    print(f" 数据库中有 {len(existing_data['by_id'])} 条记录\n")
+
+    # 分析缺少 parent_id 的记录
+    print("分析缺少 parent_id 的记录...")
+    missing_parent = []
+    for config in existing_data['by_id'].values():
+        if config.get('parent_id') is None:
+            missing_parent.append(config)
+    print(f" 有 {len(missing_parent)} 条记录缺少 parent_id\n")
+
+    conn.close()
+    return dir_structure, existing_data
+
+
+def plan_tree_structure(dir_structure: Dict, existing_data: Dict) -> List[Dict]:
+    """
+    规划树状结构
+
+    Returns:
+        更新计划列表,每个元素包含:
+        {
+            'type': 'directory' | 'file',
+            'name': ...,
+            'parent_name': ...,
+            'level': ...,
+            'action': 'create' | 'update',
+            'config_id': ... (如果是更新),
+            'template_code': ... (如果是文件)
+        }
+    """
+    plan = []
+
+    # 按层级排序目录
+    directories = sorted(dir_structure['directories'].items(),
+                         key=lambda x: (x[1]['level'], x[0]))
+
+    # 按层级排序文件
+    files = sorted(dir_structure['files'].items(),
+                   key=lambda x: (x[1]['level'], x[0]))
+
+    # 创建目录映射(用于查找父目录ID)
+    dir_id_map = {}  # {dir_path: config_id}
+
+    # 处理目录(按层级顺序)
+    for dir_path, dir_info in directories:
+        dir_name = dir_info['name']
+        parent_path = dir_info['parent']
+        level = dir_info['level']
+
+        # 查找父目录ID
+        parent_id = None
+        if parent_path:
+            parent_id = dir_id_map.get(parent_path)
+
+        # 检查数据库中是否已存在
+        existing = existing_data['by_name'].get(dir_name)
+
+        if existing:
+            # 更新现有记录
+            plan.append({
+                'type': 'directory',
+                'name': dir_name,
+                'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
+                'parent_id': parent_id,
+                'level': level,
+                'action': 'update',
+                'config_id': existing['id'],
+                'current_parent_id': existing.get('parent_id')
+            })
+            dir_id_map[dir_path] = existing['id']
+        else:
+            # 创建新记录(目录节点)
+            new_id = generate_id()
+            plan.append({
+                'type': 'directory',
+                'name': dir_name,
+                'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
+                'parent_id': parent_id,
+                'level': level,
+                'action': 'create',
+                'config_id': new_id,
+                'current_parent_id': None
+            })
+            dir_id_map[dir_path] = new_id
+
+    # 处理文件
+    for file_path, file_info in files:
+        file_name = file_info['name']
+        parent_path = file_info['parent']
+        level = file_info['level']
+        template_code = file_info['template_code']
+
+        # 查找父目录ID
+        parent_id = dir_id_map.get(parent_path) if parent_path else None
+
+        # 查找数据库中的记录(通过 template_code 或 name)
+        existing = None
+        if template_code:
+            existing = existing_data['by_template_code'].get(template_code)
+        if not existing:
+            existing = existing_data['by_name'].get(file_name)
+
+        if existing:
+            # 更新现有记录
+            plan.append({
+                'type': 'file',
+                'name': file_name,
+                'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
+                'parent_id': parent_id,
+                'level': level,
+                'action': 'update',
+                'config_id': existing['id'],
+                'template_code': template_code,
+                'current_parent_id': existing.get('parent_id')
+            })
+        else:
+            # 创建新记录(文件节点)
+            new_id = generate_id()
+            plan.append({
+                'type': 'file',
+                'name': file_name,
+                'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None,
+                'parent_id': parent_id,
+                'level': level,
+                'action': 'create',
+                'config_id': new_id,
+                'template_code': template_code,
+                'current_parent_id': None
+            })
+
+    return plan
+
+
+def generate_update_sql(plan: List[Dict], output_file: str = 'update_template_tree.sql'):
+    """生成更新SQL脚本"""
+    sql_lines = [
+        "-- 模板树状结构更新脚本",
+        f"-- 生成时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
+        "-- 注意:执行前请备份数据库!",
+        "",
+        "USE finyx;",
+        "",
+        "START TRANSACTION;",
+        ""
+    ]
+
+    # 按层级分组
+    by_level = {}
+    for item in plan:
+        level = item['level']
+        if level not in by_level:
+            by_level[level] = []
+
by_level[level].append(item) + + # 按层级顺序处理(从顶层到底层) + for level in sorted(by_level.keys()): + sql_lines.append(f"-- ===== 层级 {level} =====") + sql_lines.append("") + + for item in by_level[level]: + if item['action'] == 'create': + # 创建新记录 + if item['type'] == 'directory': + sql_lines.append(f"-- 创建目录节点: {item['name']}") + sql_lines.append(f"INSERT INTO f_polic_file_config") + sql_lines.append(f" (id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state)") + parent_id_sql = f"{item['parent_id']}" if item['parent_id'] else "NULL" + sql_lines.append(f"VALUES ({item['config_id']}, {TENANT_ID}, {parent_id_sql}, '{item['name']}', NULL, NULL, NOW(), {CREATED_BY}, NOW(), {UPDATED_BY}, 1);") + else: + # 文件节点(需要 template_code) + sql_lines.append(f"-- 创建文件节点: {item['name']}") + input_data = json.dumps({ + 'template_code': item.get('template_code', ''), + 'business_type': 'INVESTIGATION' + }, ensure_ascii=False).replace("'", "''") + sql_lines.append(f"INSERT INTO f_polic_file_config") + sql_lines.append(f" (id, tenant_id, parent_id, name, input_data, file_path, template_code, created_time, created_by, updated_time, updated_by, state)") + parent_id_sql = f"{item['parent_id']}" if item['parent_id'] else "NULL" + template_code_sql = f"'{item.get('template_code', '')}'" if item.get('template_code') else "NULL" + sql_lines.append(f"VALUES ({item['config_id']}, {TENANT_ID}, {parent_id_sql}, '{item['name']}', '{input_data}', NULL, {template_code_sql}, NOW(), {CREATED_BY}, NOW(), {UPDATED_BY}, 1);") + sql_lines.append("") + else: + # 更新现有记录 + current_parent = item.get('current_parent_id') + new_parent = item.get('parent_id') + + if current_parent != new_parent: + sql_lines.append(f"-- 更新: {item['name']} (parent_id: {current_parent} -> {new_parent})") + parent_id_sql = f"{new_parent}" if new_parent else "NULL" + sql_lines.append(f"UPDATE f_polic_file_config") + sql_lines.append(f"SET parent_id = {parent_id_sql}, updated_time = 
NOW(), updated_by = {UPDATED_BY}") + sql_lines.append(f"WHERE id = {item['config_id']} AND tenant_id = {TENANT_ID};") + sql_lines.append("") + + sql_lines.append("COMMIT;") + sql_lines.append("") + sql_lines.append("-- 更新完成") + + # 写入文件 + with open(output_file, 'w', encoding='utf-8') as f: + f.write('\n'.join(sql_lines)) + + print(f"✓ SQL脚本已生成: {output_file}") + return output_file + + +def print_analysis_report(dir_structure: Dict, existing_data: Dict, plan: List[Dict]): + """打印分析报告""" + print("\n" + "="*80) + print("分析报告") + print("="*80) + + print(f"\n目录结构:") + print(f" - 目录数量: {len(dir_structure['directories'])}") + print(f" - 文件数量: {len(dir_structure['files'])}") + + print(f"\n数据库现状:") + print(f" - 总记录数: {len(existing_data['by_id'])}") + missing_parent = sum(1 for c in existing_data['by_id'].values() if c.get('parent_id') is None) + print(f" - 缺少 parent_id 的记录: {missing_parent}") + + print(f"\n更新计划:") + create_count = sum(1 for p in plan if p['action'] == 'create') + update_count = sum(1 for p in plan if p['action'] == 'update') + print(f" - 需要创建: {create_count} 条") + print(f" - 需要更新: {update_count} 条") + + print(f"\n层级分布:") + by_level = {} + for item in plan: + level = item['level'] + by_level[level] = by_level.get(level, 0) + 1 + for level in sorted(by_level.keys()): + print(f" - 层级 {level}: {by_level[level]} 个节点") + + print("\n" + "="*80) + + +def main(): + """主函数""" + # 分析 + dir_structure, existing_data = analyze_structure() + if not dir_structure or not existing_data: + return + + # 规划树状结构 + print("规划树状结构...") + plan = plan_tree_structure(dir_structure, existing_data) + print(f" 生成 {len(plan)} 个更新计划\n") + + # 打印报告 + print_analysis_report(dir_structure, existing_data, plan) + + # 生成SQL脚本 + print("\n生成SQL更新脚本...") + sql_file = generate_update_sql(plan) + + print("\n" + "="*80) + print("分析完成!") + print("="*80) + print(f"\n请检查生成的SQL脚本: {sql_file}") + print("确认无误后,可以执行该脚本更新数据库。") + print("\n注意:执行前请备份数据库!") + + +if __name__ == '__main__': + main() + diff --git 
a/backup_database.py b/backup_database.py new file mode 100644 index 0000000..eeab3d5 --- /dev/null +++ b/backup_database.py @@ -0,0 +1,314 @@ +""" +Database backup script +Supports backing up with the mysqldump command or exporting a SQL file directly from Python +""" +import os +import sys +import subprocess +import pymysql +from datetime import datetime +from pathlib import Path +from dotenv import load_dotenv + +# Load environment variables +load_dotenv() + + +class DatabaseBackup: + """Database backup helper""" + + def __init__(self): + """Initialize the database configuration""" + self.db_config = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' + } + + # Directory where backup files are stored + self.backup_dir = Path('backups') + self.backup_dir.mkdir(exist_ok=True) + + def backup_with_mysqldump(self, output_file=None, compress=False): + """ + Back up the database with the mysqldump command (recommended) + + Args: + output_file: output file path; generated automatically when None + compress: whether to compress the backup file + + Returns: + Path of the backup file + """ + # Generate the backup file name + if output_file is None: + timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') + output_file = self.backup_dir / f"backup_{self.db_config['database']}_{timestamp}.sql" + + output_file = Path(output_file) + + # Build the mysqldump command + cmd = [ + 'mysqldump', + f"--host={self.db_config['host']}", + f"--port={self.db_config['port']}", + f"--user={self.db_config['user']}", + f"--password={self.db_config['password']}", + '--single-transaction', # consistent snapshot of the data + '--routines', # include stored procedures and functions + '--triggers', # include triggers + '--events', # include events + '--add-drop-table', # emit DROP TABLE statements + '--default-character-set=utf8mb4', # set the character set + self.db_config['database'] + ] + + try: + print(f"开始备份数据库 {self.db_config['database']}...") + print(f"备份文件: {output_file}") + + # Run the backup command + with open(output_file, 'w', encoding='utf-8') as f: + result = subprocess.run( + cmd, + stdout=f, + stderr=subprocess.PIPE, + text=True + ) + + if result.returncode != 0: + error_msg = result.stderr 
if result.stderr else '未知错误' + raise Exception(f"mysqldump执行失败: {error_msg}") + + # 检查文件大小 + file_size = output_file.stat().st_size + print(f"备份完成!文件大小: {file_size / 1024 / 1024:.2f} MB") + + # 如果需要压缩 + if compress: + compressed_file = self._compress_file(output_file) + print(f"压缩完成: {compressed_file}") + return str(compressed_file) + + return str(output_file) + + except FileNotFoundError: + print("错误: 未找到mysqldump命令,请确保MySQL客户端已安装并在PATH中") + print("尝试使用Python方式备份...") + return self.backup_with_python(output_file) + except Exception as e: + print(f"备份失败: {str(e)}") + raise + + def backup_with_python(self, output_file=None): + """ + 使用Python直接连接数据库备份(备用方式) + + Args: + output_file: 输出文件路径,如果为None则自动生成 + + Returns: + 备份文件路径 + """ + if output_file is None: + timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') + output_file = self.backup_dir / f"backup_{self.db_config['database']}_{timestamp}.sql" + + output_file = Path(output_file) + + try: + print(f"开始使用Python方式备份数据库 {self.db_config['database']}...") + print(f"备份文件: {output_file}") + + # 连接数据库 + connection = pymysql.connect(**self.db_config) + cursor = connection.cursor() + + with open(output_file, 'w', encoding='utf-8') as f: + # 写入文件头 + f.write(f"-- MySQL数据库备份\n") + f.write(f"-- 数据库: {self.db_config['database']}\n") + f.write(f"-- 备份时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n") + f.write(f"-- 主机: {self.db_config['host']}:{self.db_config['port']}\n") + f.write("--\n\n") + f.write(f"SET NAMES utf8mb4;\n") + f.write(f"SET FOREIGN_KEY_CHECKS=0;\n\n") + + # 获取所有表 + cursor.execute("SHOW TABLES") + tables = [table[0] for table in cursor.fetchall()] + + print(f"找到 {len(tables)} 个表") + + # 备份每个表 + for table in tables: + print(f"备份表: {table}") + + # 获取表结构 + cursor.execute(f"SHOW CREATE TABLE `{table}`") + create_table_sql = cursor.fetchone()[1] + + f.write(f"-- ----------------------------\n") + f.write(f"-- 表结构: {table}\n") + f.write(f"-- ----------------------------\n") + f.write(f"DROP TABLE IF EXISTS 
`{table}`;\n") + f.write(f"{create_table_sql};\n\n") + + # 获取表数据 + cursor.execute(f"SELECT * FROM `{table}`") + rows = cursor.fetchall() + + if rows: + # 获取列名 + cursor.execute(f"DESCRIBE `{table}`") + columns = [col[0] for col in cursor.fetchall()] + + f.write(f"-- ----------------------------\n") + f.write(f"-- 表数据: {table}\n") + f.write(f"-- ----------------------------\n") + + # 分批写入数据 + batch_size = 1000 + for i in range(0, len(rows), batch_size): + batch = rows[i:i+batch_size] + values_list = [] + + for row in batch: + values = [] + for value in row: + if value is None: + values.append('NULL') + elif isinstance(value, (int, float)): + values.append(str(value)) + else: + # 转义特殊字符 + escaped_value = str(value).replace('\\', '\\\\').replace("'", "\\'") + values.append(f"'{escaped_value}'") + + values_list.append(f"({', '.join(values)})") + + columns_str = ', '.join([f"`{col}`" for col in columns]) + values_str = ',\n'.join(values_list) + + f.write(f"INSERT INTO `{table}` ({columns_str}) VALUES\n") + f.write(f"{values_str};\n\n") + + print(f" 完成: {len(rows)} 条记录") + + f.write("SET FOREIGN_KEY_CHECKS=1;\n") + + cursor.close() + connection.close() + + # 检查文件大小 + file_size = output_file.stat().st_size + print(f"备份完成!文件大小: {file_size / 1024 / 1024:.2f} MB") + + return str(output_file) + + except Exception as e: + print(f"备份失败: {str(e)}") + raise + + def _compress_file(self, file_path): + """ + 压缩备份文件 + + Args: + file_path: 文件路径 + + Returns: + 压缩后的文件路径 + """ + import gzip + + file_path = Path(file_path) + compressed_path = file_path.with_suffix('.sql.gz') + + with open(file_path, 'rb') as f_in: + with gzip.open(compressed_path, 'wb') as f_out: + f_out.writelines(f_in) + + # 删除原文件 + file_path.unlink() + + return compressed_path + + def list_backups(self): + """ + 列出所有备份文件 + + Returns: + 备份文件列表 + """ + backups = [] + for file in sorted(self.backup_dir.glob('backup_*.sql*'), reverse=True): + file_info = { + 'filename': file.name, + 'path': str(file), + 'size': 
file.stat().st_size, + 'size_mb': file.stat().st_size / 1024 / 1024, + 'modified': datetime.fromtimestamp(file.stat().st_mtime) + } + backups.append(file_info) + + return backups + + +def main(): + """Entry point""" + import argparse + + parser = argparse.ArgumentParser(description='数据库备份工具') + parser.add_argument('--method', choices=['mysqldump', 'python', 'auto'], + default='auto', help='备份方法 (默认: auto)') + parser.add_argument('--output', '-o', help='输出文件路径') + parser.add_argument('--compress', '-c', action='store_true', + help='压缩备份文件') + parser.add_argument('--list', '-l', action='store_true', + help='列出所有备份文件') + + args = parser.parse_args() + + backup = DatabaseBackup() + + # List the backup files + if args.list: + backups = backup.list_backups() + if backups: + print(f"\n找到 {len(backups)} 个备份文件:\n") + print(f"{'文件名':<50} {'大小(MB)':<15} {'修改时间':<20}") + print("-" * 85) + for b in backups: + print(f"{b['filename']:<50} {b['size_mb']:<15.2f} {b['modified'].strftime('%Y-%m-%d %H:%M:%S'):<20}") + else: + print("未找到备份文件") + return + + # Run the backup + try: + if args.method == 'mysqldump': + backup_file = backup.backup_with_mysqldump(args.output, args.compress) + elif args.method == 'python': + backup_file = backup.backup_with_python(args.output) + else: # auto + try: + backup_file = backup.backup_with_mysqldump(args.output, args.compress) + except Exception: + print("\nmysqldump方式失败,切换到Python方式...") + backup_file = backup.backup_with_python(args.output) + + print(f"\n备份成功!") + print(f"备份文件: {backup_file}") + + except Exception as e: + print(f"\n备份失败: {str(e)}") + sys.exit(1) + + +if __name__ == '__main__': + main() + diff --git a/backups/backup_finyx_20251209_170604.zip b/backups/backup_finyx_20251209_170604.zip new file mode 100644 index 0000000..c7ad820 Binary files /dev/null and b/backups/backup_finyx_20251209_170604.zip differ diff --git a/check_existing_data.py b/check_existing_data.py new file mode 100644 index 0000000..5755ace --- /dev/null +++ b/check_existing_data.py @@ -0,0 +1,105 @@ +""" 
+检查数据库中的现有数据,确认匹配情况 +""" +import os +import json +import pymysql +from pathlib import Path + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 + +def check_existing_data(): + """检查数据库中的现有数据""" + print("="*80) + print("检查数据库中的现有数据") + print("="*80) + + try: + conn = pymysql.connect(**DB_CONFIG) + cursor = conn.cursor(pymysql.cursors.DictCursor) + + # 查询所有记录 + sql = """ + SELECT id, name, parent_id, template_code, input_data, file_path, state + FROM f_polic_file_config + WHERE tenant_id = %s + ORDER BY name + """ + cursor.execute(sql, (TENANT_ID,)) + configs = cursor.fetchall() + + print(f"\n共找到 {len(configs)} 条记录\n") + + # 按 parent_id 分组统计 + with_parent = [] + without_parent = [] + + for config in configs: + # 尝试从 input_data 中提取 template_code + template_code = config.get('template_code') + if not template_code and config.get('input_data'): + try: + input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data'] + if isinstance(input_data, dict): + template_code = input_data.get('template_code') + except: + pass + + config['extracted_template_code'] = template_code + + if config.get('parent_id'): + with_parent.append(config) + else: + without_parent.append(config) + + print(f"有 parent_id 的记录: {len(with_parent)} 条") + print(f"无 parent_id 的记录: {len(without_parent)} 条\n") + + # 显示无 parent_id 的记录 + print("="*80) + print("无 parent_id 的记录列表:") + print("="*80) + for i, config in enumerate(without_parent, 1): + print(f"\n{i}. 
{config['name']}") + print(f" ID: {config['id']}") + print(f" template_code: {config.get('extracted_template_code') or config.get('template_code') or '无'}") + print(f" file_path: {config.get('file_path', '无')}") + print(f" state: {config.get('state')}") + + # 显示有 parent_id 的记录(树状结构) + print("\n" + "="*80) + print("有 parent_id 的记录(树状结构):") + print("="*80) + + # 构建ID到名称的映射 + id_to_name = {config['id']: config['name'] for config in configs} + + for config in with_parent: + parent_name = id_to_name.get(config['parent_id'], f"ID:{config['parent_id']}") + print(f"\n{config['name']}") + print(f" ID: {config['id']}") + print(f" 父节点: {parent_name} (ID: {config['parent_id']})") + print(f" template_code: {config.get('extracted_template_code') or config.get('template_code') or '无'}") + + cursor.close() + conn.close() + + except Exception as e: + print(f"错误: {e}") + import traceback + traceback.print_exc() + + +if __name__ == '__main__': + check_existing_data() + diff --git a/check_remaining_fields.py b/check_remaining_fields.py new file mode 100644 index 0000000..d4a9666 --- /dev/null +++ b/check_remaining_fields.py @@ -0,0 +1,131 @@ +""" +检查剩余的未处理字段,并生成合适的field_code +""" +import os +import pymysql +import re +from typing import Dict, List + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 + +def is_chinese(text: str) -> bool: + """判断字符串是否包含中文字符""" + if not text: + return False + return bool(re.search(r'[\u4e00-\u9fff]', text)) + + +def generate_field_code(field_name: str) -> str: + """根据字段名称生成field_code""" + # 移除常见前缀 + name = field_name.replace('被核查人员', 'target_').replace('被核查人', 'target_') + + # 转换为小写并替换特殊字符 + code = name.lower() + code = re.sub(r'[^\w\u4e00-\u9fff]', '_', code) + code = re.sub(r'_+', 
'_', code).strip('_') + + # 如果还是中文,尝试更智能的转换 + if is_chinese(code): + # 简单的拼音映射(这里只是示例,实际应该使用拼音库) + # 暂时使用更简单的规则 + code = field_name.lower() + code = code.replace('被核查人员', 'target_') + code = code.replace('被核查人', 'target_') + code = code.replace('谈话', 'interview_') + code = code.replace('审批', 'approval_') + code = code.replace('核查', 'investigation_') + code = code.replace('人员', '') + code = code.replace('时间', '_time') + code = code.replace('地点', '_location') + code = code.replace('部门', '_department') + code = code.replace('姓名', '_name') + code = code.replace('号码', '_number') + code = code.replace('情况', '_situation') + code = code.replace('问题', '_issue') + code = code.replace('描述', '_description') + code = re.sub(r'[^\w]', '_', code) + code = re.sub(r'_+', '_', code).strip('_') + + return code + + +def check_remaining_fields(): + """检查剩余的未处理字段""" + conn = pymysql.connect(**DB_CONFIG) + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("="*80) + print("检查剩余的未处理字段") + print("="*80) + + # 查询所有包含中文field_code的字段 + cursor.execute(""" + SELECT id, name, filed_code, field_type, state + FROM f_polic_field + WHERE tenant_id = %s AND ( + filed_code REGEXP '[\\u4e00-\\u9fff]' + OR filed_code IS NULL + OR filed_code = '' + ) + ORDER BY name + """, (TENANT_ID,)) + + fields = cursor.fetchall() + + print(f"\n找到 {len(fields)} 个仍需要处理的字段:\n") + + suggestions = [] + for field in fields: + suggested_code = generate_field_code(field['name']) + suggestions.append({ + 'id': field['id'], + 'name': field['name'], + 'current_code': field['filed_code'], + 'suggested_code': suggested_code, + 'field_type': field['field_type'] + }) + print(f" ID: {field['id']}") + print(f" 名称: {field['name']}") + print(f" 当前field_code: {field['filed_code']}") + print(f" 建议field_code: {suggested_code}") + print(f" field_type: {field['field_type']}") + print() + + # 询问是否更新 + if suggestions: + print("="*80) + choice = input("是否更新这些字段的field_code?(y/n,默认n): ").strip().lower() + + if choice == 'y': + 
print("\n开始更新...") + for sug in suggestions: + cursor.execute(""" + UPDATE f_polic_field + SET filed_code = %s, updated_time = NOW(), updated_by = %s + WHERE id = %s + """, (sug['suggested_code'], 655162080928945152, sug['id'])) + print(f" ✓ 更新字段 ID {sug['id']}: {sug['name']} -> {sug['suggested_code']}") + + conn.commit() + print("\n✓ 更新完成") + else: + print("未执行更新") + + cursor.close() + conn.close() + + +if __name__ == '__main__': + check_remaining_fields() + diff --git a/fix_missing_field_relations.py b/fix_missing_field_relations.py new file mode 100644 index 0000000..cde3f5d --- /dev/null +++ b/fix_missing_field_relations.py @@ -0,0 +1,260 @@ +""" +修复缺少字段关联的模板 +为有 template_code 但没有字段关联的文件节点补充字段关联 +""" +import os +import json +import pymysql +from typing import Dict, List + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 +CREATED_BY = 655162080928945152 +UPDATED_BY = 655162080928945152 + + +def generate_id(): + """生成ID""" + import time + import random + timestamp = int(time.time() * 1000) + random_part = random.randint(100000, 999999) + return timestamp * 1000 + random_part + + +def get_templates_without_relations(conn): + """获取没有字段关联的文件节点""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + sql = """ + SELECT + fc.id, + fc.name, + fc.template_code, + fc.input_data, + COUNT(ff.id) as relation_count + FROM f_polic_file_config fc + LEFT JOIN f_polic_file_field ff ON fc.id = ff.file_id AND ff.tenant_id = fc.tenant_id + WHERE fc.tenant_id = %s + AND fc.template_code IS NOT NULL + AND fc.template_code != '' + GROUP BY fc.id, fc.name, fc.template_code, fc.input_data + HAVING relation_count = 0 + ORDER BY fc.name + """ + cursor.execute(sql, (TENANT_ID,)) + templates = cursor.fetchall() + + 
cursor.close() + return templates + + +def get_fields_by_code(conn): + """获取所有字段,按字段编码索引""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + sql = """ + SELECT id, name, filed_code, field_type + FROM f_polic_field + WHERE tenant_id = %s + """ + cursor.execute(sql, (TENANT_ID,)) + fields = cursor.fetchall() + + result = { + 'by_code': {}, + 'by_name': {} + } + + for field in fields: + field_code = field['filed_code'] + field_name = field['name'] + result['by_code'][field_code] = field + result['by_name'][field_name] = field + + cursor.close() + return result + + +def extract_fields_from_input_data(input_data: str) -> List[str]: + """从 input_data 中提取字段编码列表""" + try: + data = json.loads(input_data) if isinstance(input_data, str) else input_data + if isinstance(data, dict): + return data.get('input_fields', []) + except: + pass + return [] + + +def create_field_relations(conn, file_id: int, field_codes: List[str], field_type: int, + db_fields: Dict, dry_run: bool = True): + """创建字段关联关系""" + cursor = conn.cursor() + + try: + created_count = 0 + + for field_code in field_codes: + field = db_fields['by_code'].get(field_code) + + if not field: + print(f" ⚠ 字段不存在: {field_code}") + continue + + if field['field_type'] != field_type: + print(f" ⚠ 字段类型不匹配: {field_code} (期望 {field_type}, 实际 {field['field_type']})") + continue + + if not dry_run: + # 检查是否已存在 + check_sql = """ + SELECT id FROM f_polic_file_field + WHERE tenant_id = %s AND file_id = %s AND filed_id = %s + """ + cursor.execute(check_sql, (TENANT_ID, file_id, field['id'])) + existing = cursor.fetchone() + + if not existing: + relation_id = generate_id() + insert_sql = """ + INSERT INTO f_polic_file_field + (id, tenant_id, file_id, filed_id, created_time, created_by, updated_time, updated_by, state) + VALUES (%s, %s, %s, %s, NOW(), %s, NOW(), %s, %s) + """ + cursor.execute(insert_sql, ( + relation_id, TENANT_ID, file_id, field['id'], + CREATED_BY, UPDATED_BY, 1 + )) + created_count += 1 + print(f" ✓ 创建关联: 
{field['name']} ({field_code})") + else: + created_count += 1 + print(f" [模拟] 将创建关联: {field_code}") + + if not dry_run: + conn.commit() + + return created_count + + finally: + cursor.close() + + +def main(): + """主函数""" + print("="*80) + print("修复缺少字段关联的模板") + print("="*80) + + try: + conn = pymysql.connect(**DB_CONFIG) + print("✓ 数据库连接成功\n") + except Exception as e: + print(f"✗ 数据库连接失败: {e}") + return + + try: + # 获取没有字段关联的模板 + print("查找缺少字段关联的模板...") + templates = get_templates_without_relations(conn) + print(f" 找到 {len(templates)} 个缺少字段关联的文件节点\n") + + if not templates: + print("✓ 所有文件节点都有字段关联,无需修复") + return + + # 获取所有字段 + print("获取字段定义...") + db_fields = get_fields_by_code(conn) + print(f" 找到 {len(db_fields['by_code'])} 个字段\n") + + # 显示需要修复的模板 + print("需要修复的模板:") + for template in templates: + print(f" - {template['name']} (code: {template['template_code']})") + + # 尝试从 input_data 中提取字段 + print("\n" + "="*80) + print("分析并修复") + print("="*80) + + fixable_count = 0 + unfixable_count = 0 + + for template in templates: + print(f"\n处理: {template['name']}") + print(f" template_code: {template['template_code']}") + + input_data = template.get('input_data') + if not input_data: + print(" ⚠ 没有 input_data,无法自动修复") + unfixable_count += 1 + continue + + # 从 input_data 中提取输入字段 + input_fields = extract_fields_from_input_data(input_data) + + if not input_fields: + print(" ⚠ input_data 中没有 input_fields,无法自动修复") + unfixable_count += 1 + continue + + print(f" 找到 {len(input_fields)} 个输入字段") + fixable_count += 1 + + # 创建输入字段关联 + print(" 创建输入字段关联...") + created = create_field_relations(conn, template['id'], input_fields, 1, db_fields, dry_run=True) + print(f" 将创建 {created} 个输入字段关联") + + print("\n" + "="*80) + print("统计") + print("="*80) + print(f" 可修复: {fixable_count} 个") + print(f" 无法自动修复: {unfixable_count} 个") + + # 询问是否执行 + if fixable_count > 0: + print("\n" + "="*80) + response = input("\n是否执行修复?(yes/no,默认no): ").strip().lower() + + if response == 'yes': + print("\n执行修复...") + 
for template in templates: + input_data = template.get('input_data') + if not input_data: + continue + + input_fields = extract_fields_from_input_data(input_data) + if not input_fields: + continue + + print(f"\n修复: {template['name']}") + create_field_relations(conn, template['id'], input_fields, 1, db_fields, dry_run=False) + + print("\n" + "="*80) + print("✓ 修复完成!") + print("="*80) + else: + print("\n已取消修复") + else: + print("\n没有可以自动修复的模板") + + finally: + conn.close() + print("\n数据库连接已关闭") + + +if __name__ == '__main__': + main() + diff --git a/fix_only_chinese_field_codes.py b/fix_only_chinese_field_codes.py new file mode 100644 index 0000000..923efd2 --- /dev/null +++ b/fix_only_chinese_field_codes.py @@ -0,0 +1,201 @@ +""" +只修复真正包含中文的field_code字段 +""" +import os +import pymysql +import re +from typing import Dict + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 +UPDATED_BY = 655162080928945152 + +# 字段名称到field_code的映射(针对剩余的中文字段) +FIELD_MAPPING = { + # 谈话相关字段 + '拟谈话地点': 'proposed_interview_location', + '拟谈话时间': 'proposed_interview_time', + '谈话事由': 'interview_reason', + '谈话人': 'interviewer', + '谈话人员-安全员': 'interview_personnel_safety_officer', + '谈话人员-组长': 'interview_personnel_leader', + '谈话人员-谈话人员': 'interview_personnel', + '谈话前安全风险评估结果': 'pre_interview_risk_assessment_result', + '谈话地点': 'interview_location', + '谈话次数': 'interview_count', + + # 被核查人员相关字段 + '被核查人单位及职务': 'target_organization_and_position', # 注意:这个和"被核查人员单位及职务"应该是同一个 + '被核查人员交代问题程度': 'target_confession_level', + '被核查人员减压后的表现': 'target_behavior_after_relief', + '被核查人员学历': 'target_education', # 注意:这个和"被核查人员文化程度"可能不同 + '被核查人员工作履历': 'target_work_history', + '被核查人员思想负担程度': 'target_mental_burden_level', + '被核查人员职业': 
'target_occupation', + '被核查人员谈话中的表现': 'target_behavior_during_interview', + '被核查人员问题严重程度': 'target_issue_severity_level', + '被核查人员风险等级': 'target_risk_level', + '被核查人基本情况': 'target_basic_info', + + # Other fields + '补空人员': 'backup_personnel', + '记录人': 'recorder', + '评估意见': 'assessment_opinion', +} + +def is_chinese(text: str) -> bool: + """Return True when the string is wholly or largely made up of Chinese characters""" + if not text: + return False + # Count the CJK characters and compare their share against the 30% threshold below + chinese_chars = len(re.findall(r'[\u4e00-\u9fff]', text)) + total_chars = len(text) + if total_chars == 0: + return False + return chinese_chars / total_chars > 0.3 # treated as Chinese when more than 30% of the characters are Chinese + + +def fix_chinese_fields(dry_run: bool = True): + """Fix field_code values that contain Chinese""" + conn = pymysql.connect(**DB_CONFIG) + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("="*80) + print("修复包含中文的field_code字段") + print("="*80) + + if dry_run: + print("\n[DRY RUN模式 - 不会实际修改数据库]") + + # Query all fields + cursor.execute(""" + SELECT id, name, filed_code, field_type, state + FROM f_polic_field + WHERE tenant_id = %s + ORDER BY name + """, (TENANT_ID,)) + + all_fields = cursor.fetchall() + + # Pick out the fields whose field_code contains Chinese + chinese_fields = [] + for field in all_fields: + if field['filed_code'] and is_chinese(field['filed_code']): + chinese_fields.append(field) + + print(f"\n找到 {len(chinese_fields)} 个field_code包含中文的字段:\n") + + updates = [] + for field in chinese_fields: + field_name = field['name'] + new_code = FIELD_MAPPING.get(field_name) + + if not new_code: + # No mapping entry: derive a code from the field name + new_code = field_name.lower() + new_code = new_code.replace('被核查人员', 'target_').replace('被核查人', 'target_') + new_code = new_code.replace('谈话', 'interview_') + new_code = new_code.replace('人员', '') + new_code = new_code.replace('时间', '_time') + new_code = new_code.replace('地点', '_location') + new_code = new_code.replace('问题', '_issue') + new_code = new_code.replace('情况', '_situation') + new_code = new_code.replace('程度', '_level') + new_code = new_code.replace('表现', '_behavior') + new_code = 
new_code.replace('等级', '_level') + new_code = new_code.replace('履历', '_history') + new_code = new_code.replace('学历', '_education') + new_code = new_code.replace('职业', '_occupation') + new_code = new_code.replace('事由', '_reason') + new_code = new_code.replace('次数', '_count') + new_code = new_code.replace('结果', '_result') + new_code = new_code.replace('意见', '_opinion') + new_code = re.sub(r'[^\w]', '_', new_code) + new_code = re.sub(r'_+', '_', new_code).strip('_') + new_code = new_code.replace('__', '_') + + updates.append({ + 'id': field['id'], + 'name': field_name, + 'old_code': field['filed_code'], + 'new_code': new_code, + 'field_type': field['field_type'] + }) + + print(f" ID: {field['id']}") + print(f" 名称: {field_name}") + print(f" 当前field_code: {field['filed_code']}") + print(f" 新field_code: {new_code}") + print() + + # 检查是否有重复的new_code + code_to_fields = {} + for update in updates: + code = update['new_code'] + if code not in code_to_fields: + code_to_fields[code] = [] + code_to_fields[code].append(update) + + duplicate_codes = {code: fields_list for code, fields_list in code_to_fields.items() + if len(fields_list) > 1} + + if duplicate_codes: + print("\n⚠ 警告:以下field_code会重复:") + for code, fields_list in duplicate_codes.items(): + print(f" field_code: {code}") + for field in fields_list: + print(f" - ID: {field['id']}, 名称: {field['name']}") + print() + + # 执行更新 + if not dry_run: + print("开始执行更新...\n") + for update in updates: + cursor.execute(""" + UPDATE f_polic_field + SET filed_code = %s, updated_time = NOW(), updated_by = %s + WHERE id = %s + """, (update['new_code'], UPDATED_BY, update['id'])) + print(f" ✓ 更新字段 ID {update['id']}: {update['name']}") + print(f" {update['old_code']} -> {update['new_code']}") + + conn.commit() + print("\n✓ 更新完成") + else: + print("[DRY RUN] 以上操作不会实际执行") + + cursor.close() + conn.close() + + return updates + + +if __name__ == '__main__': + print("是否执行修复?") + print("1. DRY RUN(不实际修改数据库)") + print("2. 
直接执行修复(会修改数据库)") + + choice = input("\n请选择 (1/2,默认1): ").strip() or "1" + + if choice == "2": + print("\n执行实际修复...") + fix_chinese_fields(dry_run=False) + else: + print("\n执行DRY RUN...") + updates = fix_chinese_fields(dry_run=True) + + if updates: + confirm = input("\nDRY RUN完成。是否执行实际修复?(y/n,默认n): ").strip().lower() + if confirm == 'y': + print("\n执行实际修复...") + fix_chinese_fields(dry_run=False) + diff --git a/fix_remaining_chinese_fields.py b/fix_remaining_chinese_fields.py new file mode 100644 index 0000000..682ab73 --- /dev/null +++ b/fix_remaining_chinese_fields.py @@ -0,0 +1,191 @@ +""" +修复剩余的中文field_code字段 +为这些字段生成合适的英文field_code +""" +import os +import pymysql +import re +from typing import Dict + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 +UPDATED_BY = 655162080928945152 + +# 字段名称到field_code的映射(针对剩余的中文字段) +FIELD_MAPPING = { + # 谈话相关字段 + '拟谈话地点': 'proposed_interview_location', + '拟谈话时间': 'proposed_interview_time', + '谈话事由': 'interview_reason', + '谈话人': 'interviewer', + '谈话人员-安全员': 'interview_personnel_safety_officer', + '谈话人员-组长': 'interview_personnel_leader', + '谈话人员-谈话人员': 'interview_personnel', + '谈话前安全风险评估结果': 'pre_interview_risk_assessment_result', + '谈话地点': 'interview_location', + '谈话次数': 'interview_count', + + # 被核查人员相关字段 + '被核查人单位及职务': 'target_organization_and_position', # 注意:这个和"被核查人员单位及职务"应该是同一个 + '被核查人员交代问题程度': 'target_confession_level', + '被核查人员减压后的表现': 'target_behavior_after_relief', + '被核查人员学历': 'target_education', # 注意:这个和"被核查人员文化程度"可能不同 + '被核查人员工作履历': 'target_work_history', + '被核查人员思想负担程度': 'target_mental_burden_level', + '被核查人员职业': 'target_occupation', + '被核查人员谈话中的表现': 'target_behavior_during_interview', + '被核查人员问题严重程度': 'target_issue_severity_level', + 
'被核查人员风险等级': 'target_risk_level', + '被核查人基本情况': 'target_basic_info', + + # Other fields + '补空人员': 'backup_personnel', + '记录人': 'recorder', + '评估意见': 'assessment_opinion', +} + +def is_chinese(text: str) -> bool: + """Return True when text contains at least one Chinese character (U+4E00-U+9FFF).""" + if not text: + return False + return bool(re.search(r'[\u4e00-\u9fff]', text)) + + +def fix_remaining_fields(dry_run: bool = True): + """Replace the remaining Chinese field_code values with English codes.""" + conn = pymysql.connect(**DB_CONFIG) + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("="*80) + print("修复剩余的中文field_code字段") + print("="*80) + + if dry_run: + print("\n[DRY RUN模式 - 不会实际修改数据库]") + + # Fetch every field for this tenant and filter in Python with is_chinese(): + # a REGEXP using \u escapes behaves differently across MySQL versions and SQL modes. + cursor.execute(""" + SELECT id, name, filed_code, field_type, state + FROM f_polic_field + WHERE tenant_id = %s + ORDER BY name + """, (TENANT_ID,)) + + fields = [f for f in cursor.fetchall() if is_chinese(f['filed_code'])] + + print(f"\n找到 {len(fields)} 个需要修复的字段:\n") + + updates = [] + for field in fields: + field_name = field['name'] + new_code = FIELD_MAPPING.get(field_name) + + if not new_code: + # No mapping entry: derive a code from the field name + new_code = field_name.lower() + new_code = new_code.replace('被核查人员', 'target_').replace('被核查人', 'target_') + new_code = new_code.replace('谈话', 'interview_') + new_code = new_code.replace('人员', '') + new_code = new_code.replace('时间', '_time') + new_code = new_code.replace('地点', '_location') + new_code = new_code.replace('问题', '_issue') + new_code = new_code.replace('情况', '_situation') + new_code = new_code.replace('程度', '_level') + new_code = new_code.replace('表现', '_behavior') + new_code = new_code.replace('等级', '_level') + new_code = new_code.replace('履历', '_history') + new_code = new_code.replace('学历', '_education') + new_code = new_code.replace('职业', '_occupation') + new_code = new_code.replace('事由', '_reason') + new_code = new_code.replace('次数', '_count') + new_code = new_code.replace('结果', '_result') + new_code = new_code.replace('意见', '_opinion') + new_code = re.sub(r'[^\w]', '_', new_code) + new_code = re.sub(r'_+', '_', 
new_code).strip('_') + new_code = new_code.replace('__', '_') + + updates.append({ + 'id': field['id'], + 'name': field_name, + 'old_code': field['filed_code'], + 'new_code': new_code, + 'field_type': field['field_type'] + }) + + print(f" ID: {field['id']}") + print(f" 名称: {field_name}") + print(f" 当前field_code: {field['filed_code']}") + print(f" 新field_code: {new_code}") + print() + + # 检查是否有重复的new_code + code_to_fields = {} + for update in updates: + code = update['new_code'] + if code not in code_to_fields: + code_to_fields[code] = [] + code_to_fields[code].append(update) + + duplicate_codes = {code: fields_list for code, fields_list in code_to_fields.items() + if len(fields_list) > 1} + + if duplicate_codes: + print("\n⚠ 警告:以下field_code会重复:") + for code, fields_list in duplicate_codes.items(): + print(f" field_code: {code}") + for field in fields_list: + print(f" - ID: {field['id']}, 名称: {field['name']}") + print() + + # 执行更新 + if not dry_run: + print("开始执行更新...\n") + for update in updates: + cursor.execute(""" + UPDATE f_polic_field + SET filed_code = %s, updated_time = NOW(), updated_by = %s + WHERE id = %s + """, (update['new_code'], UPDATED_BY, update['id'])) + print(f" ✓ 更新字段 ID {update['id']}: {update['name']}") + print(f" {update['old_code']} -> {update['new_code']}") + + conn.commit() + print("\n✓ 更新完成") + else: + print("[DRY RUN] 以上操作不会实际执行") + + cursor.close() + conn.close() + + return updates + + +if __name__ == '__main__': + print("是否执行修复?") + print("1. DRY RUN(不实际修改数据库)") + print("2. 
直接执行修复(会修改数据库)") + + choice = input("\n请选择 (1/2,默认1): ").strip() or "1" + + if choice == "2": + print("\n执行实际修复...") + fix_remaining_fields(dry_run=False) + else: + print("\n执行DRY RUN...") + updates = fix_remaining_fields(dry_run=True) + + if updates: + confirm = input("\nDRY RUN完成。是否执行实际修复?(y/n,默认n): ").strip().lower() + if confirm == 'y': + print("\n执行实际修复...") + fix_remaining_fields(dry_run=False) + diff --git a/generate_download_urls.py b/generate_download_urls.py new file mode 100644 index 0000000..7498a23 --- /dev/null +++ b/generate_download_urls.py @@ -0,0 +1,117 @@ +""" +为指定的文件路径生成 MinIO 预签名下载 URL +""" +from minio import Minio +from datetime import timedelta + +# MinIO连接配置 +MINIO_CONFIG = { + 'endpoint': 'minio.datacubeworld.com:9000', + 'access_key': 'JOLXFXny3avFSzB0uRA5', + 'secret_key': 'G1BR8jStNfovkfH5ou39EmPl34E4l7dGrnd3Cz0I', + 'secure': True +} + +BUCKET_NAME = 'finyx' + +# 文件相对路径列表 +FILE_PATHS = [ + '/615873064429507639/20251209170434/初步核实审批表_张三.docx', + '/615873064429507639/20251209170434/请示报告卡_张三.docx' +] + +def generate_download_urls(): + """为文件路径列表生成下载 URL""" + print("="*80) + print("生成 MinIO 下载链接") + print("="*80) + + try: + # 创建MinIO客户端 + client = Minio( + MINIO_CONFIG['endpoint'], + access_key=MINIO_CONFIG['access_key'], + secret_key=MINIO_CONFIG['secret_key'], + secure=MINIO_CONFIG['secure'] + ) + + print(f"\n存储桶: {BUCKET_NAME}") + print(f"端点: {MINIO_CONFIG['endpoint']}") + print(f"使用HTTPS: {MINIO_CONFIG['secure']}\n") + + results = [] + + for file_path in FILE_PATHS: + # 去掉开头的斜杠,得到对象名称 + object_name = file_path.lstrip('/') + + print("-"*80) + print(f"文件: {file_path}") + print(f"对象名称: {object_name}") + + try: + # 检查文件是否存在 + stat = client.stat_object(BUCKET_NAME, object_name) + print(f"✓ 文件存在") + print(f" 文件大小: {stat.size:,} 字节") + print(f" 最后修改: {stat.last_modified}") + + # 生成预签名URL(7天有效期) + url = client.presigned_get_object( + BUCKET_NAME, + object_name, + expires=timedelta(days=7) + ) + + print(f"✓ 预签名URL生成成功(7天有效)") + print(f"\n下载链接:") 
+ print(f"{url}\n") + + results.append({ + 'file_path': file_path, + 'object_name': object_name, + 'url': url, + 'size': stat.size, + 'exists': True + }) + + except Exception as e: + print(f"✗ 错误: {e}\n") + results.append({ + 'file_path': file_path, + 'object_name': object_name, + 'url': None, + 'exists': False, + 'error': str(e) + }) + + # 输出汇总 + print("\n" + "="*80) + print("下载链接汇总") + print("="*80) + + for i, result in enumerate(results, 1): + print(f"\n{i}. {result['file_path']}") + if result['exists']: + print(f" ✓ 文件存在") + print(f" 下载链接: {result['url']}") + else: + print(f" ✗ 文件不存在或无法访问") + if 'error' in result: + print(f" 错误: {result['error']}") + + print("\n" + "="*80) + print("完成") + print("="*80) + + return results + + except Exception as e: + print(f"\n✗ 连接MinIO失败: {e}") + import traceback + traceback.print_exc() + return None + +if __name__ == '__main__': + generate_download_urls() + diff --git a/improved_match_and_update.py b/improved_match_and_update.py new file mode 100644 index 0000000..f2d5f90 --- /dev/null +++ b/improved_match_and_update.py @@ -0,0 +1,478 @@ +""" +改进的匹配和更新脚本 +增强匹配逻辑,能够匹配数据库中的已有数据 +""" +import os +import json +import pymysql +import re +from pathlib import Path +from typing import Dict, List, Optional, Tuple +from datetime import datetime + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 +CREATED_BY = 655162080928945152 +UPDATED_BY = 655162080928945152 + +# 项目根目录 +PROJECT_ROOT = Path(__file__).parent +TEMPLATES_DIR = PROJECT_ROOT / "template_finish" + +# 文档类型映射 +DOCUMENT_TYPE_MAPPING = { + "1.请示报告卡(XXX)": { + "template_code": "REPORT_CARD", + "name": "1.请示报告卡(XXX)", + "business_type": "INVESTIGATION" + }, + "2.初步核实审批表(XXX)": { + "template_code": 
"PRELIMINARY_VERIFICATION_APPROVAL", + "name": "2.初步核实审批表(XXX)", + "business_type": "INVESTIGATION" + }, + "3.附件初核方案(XXX)": { + "template_code": "INVESTIGATION_PLAN", + "name": "3.附件初核方案(XXX)", + "business_type": "INVESTIGATION" + }, + "谈话通知书第一联": { + "template_code": "NOTIFICATION_LETTER_1", + "name": "谈话通知书第一联", + "business_type": "INVESTIGATION" + }, + "谈话通知书第二联": { + "template_code": "NOTIFICATION_LETTER_2", + "name": "谈话通知书第二联", + "business_type": "INVESTIGATION" + }, + "谈话通知书第三联": { + "template_code": "NOTIFICATION_LETTER_3", + "name": "谈话通知书第三联", + "business_type": "INVESTIGATION" + }, + "1.请示报告卡(初核谈话)": { + "template_code": "REPORT_CARD_INTERVIEW", + "name": "1.请示报告卡(初核谈话)", + "business_type": "INVESTIGATION" + }, + "2谈话审批表": { + "template_code": "INTERVIEW_APPROVAL_FORM", + "name": "2谈话审批表", + "business_type": "INVESTIGATION" + }, + "3.谈话前安全风险评估表": { + "template_code": "PRE_INTERVIEW_RISK_ASSESSMENT", + "name": "3.谈话前安全风险评估表", + "business_type": "INVESTIGATION" + }, + "4.谈话方案": { + "template_code": "INTERVIEW_PLAN", + "name": "4.谈话方案", + "business_type": "INVESTIGATION" + }, + "5.谈话后安全风险评估表": { + "template_code": "POST_INTERVIEW_RISK_ASSESSMENT", + "name": "5.谈话后安全风险评估表", + "business_type": "INVESTIGATION" + }, + "1.谈话笔录": { + "template_code": "INTERVIEW_RECORD", + "name": "1.谈话笔录", + "business_type": "INVESTIGATION" + }, + "2.谈话询问对象情况摸底调查30问": { + "template_code": "INVESTIGATION_30_QUESTIONS", + "name": "2.谈话询问对象情况摸底调查30问", + "business_type": "INVESTIGATION" + }, + "3.被谈话人权利义务告知书": { + "template_code": "RIGHTS_OBLIGATIONS_NOTICE", + "name": "3.被谈话人权利义务告知书", + "business_type": "INVESTIGATION" + }, + "4.点对点交接单": { + "template_code": "HANDOVER_FORM", + "name": "4.点对点交接单", + "business_type": "INVESTIGATION" + }, + "5.陪送交接单(新)": { + "template_code": "ESCORT_HANDOVER_FORM", + "name": "5.陪送交接单(新)", + "business_type": "INVESTIGATION" + }, + "6.1保密承诺书(谈话对象使用-非中共党员用)": { + "template_code": "CONFIDENTIALITY_COMMITMENT_NON_PARTY", + "name": "6.1保密承诺书(谈话对象使用-非中共党员用)", 
+ "business_type": "INVESTIGATION" + }, + "6.2保密承诺书(谈话对象使用-中共党员用)": { + "template_code": "CONFIDENTIALITY_COMMITMENT_PARTY", + "name": "6.2保密承诺书(谈话对象使用-中共党员用)", + "business_type": "INVESTIGATION" + }, + "7.办案人员-办案安全保密承诺书": { + "template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT", + "name": "7.办案人员-办案安全保密承诺书", + "business_type": "INVESTIGATION" + }, + "8-1请示报告卡(初核报告结论) ": { + "template_code": "REPORT_CARD_CONCLUSION", + "name": "8-1请示报告卡(初核报告结论) ", + "business_type": "INVESTIGATION" + }, + "8.XXX初核情况报告": { + "template_code": "INVESTIGATION_REPORT", + "name": "8.XXX初核情况报告", + "business_type": "INVESTIGATION" + } +} + + +def normalize_name(name: str) -> str: + """标准化名称,用于模糊匹配""" + # 去掉开头的编号(如 "1."、"2."、"8-1" 等) + name = re.sub(r'^\d+[\.\-]\s*', '', name) + # 去掉括号及其内容(如 "(XXX)"、"(初核谈话)" 等) + name = re.sub(r'[((].*?[))]', '', name) + # 去掉空格和特殊字符 + name = name.strip() + return name + + +def generate_id(): + """生成ID""" + import time + import random + timestamp = int(time.time() * 1000) + random_part = random.randint(100000, 999999) + return timestamp * 1000 + random_part + + +def identify_document_type(file_name: str) -> Optional[Dict]: + """根据完整文件名识别文档类型""" + base_name = Path(file_name).stem + if base_name in DOCUMENT_TYPE_MAPPING: + return DOCUMENT_TYPE_MAPPING[base_name] + return None + + +def scan_directory_structure(base_dir: Path) -> Dict: + """扫描目录结构,构建树状层级""" + structure = { + 'directories': {}, + 'files': {} + } + + def process_path(path: Path, parent_path: Optional[str] = None, level: int = 0): + """递归处理路径""" + if path.is_file() and path.suffix == '.docx': + file_name = path.stem + doc_config = identify_document_type(file_name) + + structure['files'][str(path)] = { + 'name': file_name, + 'parent': parent_path, + 'level': level, + 'template_code': doc_config['template_code'] if doc_config else None, + 'full_path': str(path), + 'normalized_name': normalize_name(file_name) + } + elif path.is_dir(): + dir_name = path.name + structure['directories'][str(path)] 
= { + 'name': dir_name, + 'parent': parent_path, + 'level': level, + 'normalized_name': normalize_name(dir_name) + } + + for child in sorted(path.iterdir()): + if child.name != '__pycache__': + process_path(child, str(path), level + 1) + + # Walk the directory passed by the caller rather than the module-level constant + if base_dir.exists(): + for item in sorted(base_dir.iterdir()): + if item.name != '__pycache__': + process_path(item, None, 0) + + return structure + + +def get_existing_data(conn) -> Dict: + """Load this tenant's existing rows and index them by id, name, template_code and normalized name.""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + sql = """ + SELECT id, name, parent_id, template_code, input_data, file_path, state + FROM f_polic_file_config + WHERE tenant_id = %s + """ + cursor.execute(sql, (TENANT_ID,)) + configs = cursor.fetchall() + + result = { + 'by_id': {}, + 'by_name': {}, + 'by_template_code': {}, + 'by_normalized_name': {} # index keyed by normalized name + } + + for config in configs: + config_id = config['id'] + config_name = config['name'] + + # Take template_code from the column, falling back to the input_data JSON + template_code = config.get('template_code') + if not template_code and config.get('input_data'): + try: + input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data'] + if isinstance(input_data, dict): + template_code = input_data.get('template_code') + except (ValueError, TypeError): + # Malformed input_data: leave template_code unset + pass + + config['extracted_template_code'] = template_code + config['normalized_name'] = normalize_name(config_name) + + result['by_id'][config_id] = config + result['by_name'][config_name] = config + + if template_code: + if template_code not in result['by_template_code']: + result['by_template_code'][template_code] = config + + # Several rows may share one normalized name, so keep a list per key + normalized = config['normalized_name'] + if normalized not in result['by_normalized_name']: + result['by_normalized_name'][normalized] = [] + result['by_normalized_name'][normalized].append(config) + + cursor.close() + return result + + +def find_matching_config(file_info: Dict, existing_data: Dict) -> Optional[Dict]: + """ + 查找匹配的数据库记录 + 优先级:1. 
template_code 精确匹配 2. 名称精确匹配 3. 标准化名称匹配 + """ + template_code = file_info.get('template_code') + file_name = file_info['name'] + normalized_name = file_info.get('normalized_name', normalize_name(file_name)) + + # 优先级1: template_code 精确匹配 + if template_code: + matched = existing_data['by_template_code'].get(template_code) + if matched: + return matched + + # 优先级2: 名称精确匹配 + matched = existing_data['by_name'].get(file_name) + if matched: + return matched + + # 优先级3: 标准化名称匹配 + candidates = existing_data['by_normalized_name'].get(normalized_name, []) + if candidates: + # 如果有多个候选,优先选择有正确 template_code 的 + for candidate in candidates: + if candidate.get('extracted_template_code') == template_code: + return candidate + # 否则返回第一个 + return candidates[0] + + return None + + +def plan_tree_structure(dir_structure: Dict, existing_data: Dict) -> List[Dict]: + """规划树状结构,使用改进的匹配逻辑""" + plan = [] + + directories = sorted(dir_structure['directories'].items(), + key=lambda x: (x[1]['level'], x[0])) + files = sorted(dir_structure['files'].items(), + key=lambda x: (x[1]['level'], x[0])) + + dir_id_map = {} + + # 处理目录 + for dir_path, dir_info in directories: + dir_name = dir_info['name'] + parent_path = dir_info['parent'] + level = dir_info['level'] + + parent_id = None + if parent_path: + parent_id = dir_id_map.get(parent_path) + + # 查找匹配的数据库记录 + matched = find_matching_config(dir_info, existing_data) + + if matched: + plan.append({ + 'type': 'directory', + 'name': dir_name, + 'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None, + 'parent_id': parent_id, + 'level': level, + 'action': 'update', + 'config_id': matched['id'], + 'current_parent_id': matched.get('parent_id'), + 'matched_by': 'existing' + }) + dir_id_map[dir_path] = matched['id'] + else: + new_id = generate_id() + plan.append({ + 'type': 'directory', + 'name': dir_name, + 'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None, + 
'parent_id': parent_id, + 'level': level, + 'action': 'create', + 'config_id': new_id, + 'current_parent_id': None, + 'matched_by': 'new' + }) + dir_id_map[dir_path] = new_id + + # 处理文件 + for file_path, file_info in files: + file_name = file_info['name'] + parent_path = file_info['parent'] + level = file_info['level'] + template_code = file_info['template_code'] + + parent_id = dir_id_map.get(parent_path) if parent_path else None + + # 查找匹配的数据库记录 + matched = find_matching_config(file_info, existing_data) + + if matched: + plan.append({ + 'type': 'file', + 'name': file_name, + 'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None, + 'parent_id': parent_id, + 'level': level, + 'action': 'update', + 'config_id': matched['id'], + 'template_code': template_code, + 'current_parent_id': matched.get('parent_id'), + 'matched_by': 'existing' + }) + else: + new_id = generate_id() + plan.append({ + 'type': 'file', + 'name': file_name, + 'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None, + 'parent_id': parent_id, + 'level': level, + 'action': 'create', + 'config_id': new_id, + 'template_code': template_code, + 'current_parent_id': None, + 'matched_by': 'new' + }) + + return plan + + +def print_matching_report(plan: List[Dict]): + """打印匹配报告""" + print("\n" + "="*80) + print("匹配报告") + print("="*80) + + matched = [p for p in plan if p.get('matched_by') == 'existing'] + unmatched = [p for p in plan if p.get('matched_by') == 'new'] + + print(f"\n已匹配的记录: {len(matched)} 条") + print(f"未匹配的记录(将创建): {len(unmatched)} 条\n") + + if unmatched: + print("未匹配的记录列表:") + for item in unmatched: + print(f" - {item['name']} ({item['type']})") + + print("\n匹配详情:") + by_level = {} + for item in plan: + level = item['level'] + if level not in by_level: + by_level[level] = [] + by_level[level].append(item) + + for level in sorted(by_level.keys()): + print(f"\n【层级 {level}】") + for item in by_level[level]: 
+ indent = " " * level + match_status = "✓" if item.get('matched_by') == 'existing' else "✗" + print(f"{indent}{match_status} {item['name']} (ID: {item['config_id']})") + if item.get('parent_name'): + print(f"{indent} 父节点: {item['parent_name']}") + if item['action'] == 'update': + current = item.get('current_parent_id', 'None') + new = item.get('parent_id', 'None') + if current != new: + print(f"{indent} parent_id: {current} → {new}") + + +def main(): + """主函数""" + print("="*80) + print("改进的模板树状结构分析和更新") + print("="*80) + + try: + conn = pymysql.connect(**DB_CONFIG) + print("✓ 数据库连接成功\n") + except Exception as e: + print(f"✗ 数据库连接失败: {e}") + return + + try: + print("扫描目录结构...") + dir_structure = scan_directory_structure(TEMPLATES_DIR) + print(f" 找到 {len(dir_structure['directories'])} 个目录") + print(f" 找到 {len(dir_structure['files'])} 个文件\n") + + print("获取数据库现有数据...") + existing_data = get_existing_data(conn) + print(f" 数据库中有 {len(existing_data['by_id'])} 条记录\n") + + print("规划树状结构(使用改进的匹配逻辑)...") + plan = plan_tree_structure(dir_structure, existing_data) + print(f" 生成 {len(plan)} 个更新计划\n") + + print_matching_report(plan) + + # 询问是否继续 + print("\n" + "="*80) + response = input("\n是否生成更新SQL脚本?(yes/no,默认no): ").strip().lower() + + if response == 'yes': + from analyze_and_update_template_tree import generate_update_sql + sql_file = generate_update_sql(plan) + print(f"\n✓ SQL脚本已生成: {sql_file}") + else: + print("\n已取消") + + finally: + conn.close() + + +if __name__ == '__main__': + main() + diff --git a/init_template_tree_from_directory.py b/init_template_tree_from_directory.py new file mode 100644 index 0000000..7d307b5 --- /dev/null +++ b/init_template_tree_from_directory.py @@ -0,0 +1,544 @@ +""" +从 template_finish 目录初始化模板树状结构 +删除旧数据,根据目录结构完全重建 +""" +import os +import json +import pymysql +from pathlib import Path +from typing import Dict, List, Optional, Tuple +from datetime import datetime +from minio import Minio +from minio.error import S3Error + +# 数据库连接配置 +DB_CONFIG 
= { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +# MinIO连接配置 +MINIO_CONFIG = { + 'endpoint': 'minio.datacubeworld.com:9000', + 'access_key': 'JOLXFXny3avFSzB0uRA5', + 'secret_key': 'G1BR8jStNfovkfH5ou39EmPl34E4l7dGrnd3Cz0I', + 'secure': True +} + +TENANT_ID = 615873064429507639 +CREATED_BY = 655162080928945152 +UPDATED_BY = 655162080928945152 +BUCKET_NAME = 'finyx' + +# 项目根目录 +PROJECT_ROOT = Path(__file__).parent +TEMPLATES_DIR = PROJECT_ROOT / "template_finish" + +# 文档类型映射 +DOCUMENT_TYPE_MAPPING = { + "1.请示报告卡(XXX)": { + "template_code": "REPORT_CARD", + "name": "1.请示报告卡(XXX)", + "business_type": "INVESTIGATION" + }, + "2.初步核实审批表(XXX)": { + "template_code": "PRELIMINARY_VERIFICATION_APPROVAL", + "name": "2.初步核实审批表(XXX)", + "business_type": "INVESTIGATION" + }, + "3.附件初核方案(XXX)": { + "template_code": "INVESTIGATION_PLAN", + "name": "3.附件初核方案(XXX)", + "business_type": "INVESTIGATION" + }, + "谈话通知书第一联": { + "template_code": "NOTIFICATION_LETTER_1", + "name": "谈话通知书第一联", + "business_type": "INVESTIGATION" + }, + "谈话通知书第二联": { + "template_code": "NOTIFICATION_LETTER_2", + "name": "谈话通知书第二联", + "business_type": "INVESTIGATION" + }, + "谈话通知书第三联": { + "template_code": "NOTIFICATION_LETTER_3", + "name": "谈话通知书第三联", + "business_type": "INVESTIGATION" + }, + "1.请示报告卡(初核谈话)": { + "template_code": "REPORT_CARD_INTERVIEW", + "name": "1.请示报告卡(初核谈话)", + "business_type": "INVESTIGATION" + }, + "2谈话审批表": { + "template_code": "INTERVIEW_APPROVAL_FORM", + "name": "2谈话审批表", + "business_type": "INVESTIGATION" + }, + "3.谈话前安全风险评估表": { + "template_code": "PRE_INTERVIEW_RISK_ASSESSMENT", + "name": "3.谈话前安全风险评估表", + "business_type": "INVESTIGATION" + }, + "4.谈话方案": { + "template_code": "INTERVIEW_PLAN", + "name": "4.谈话方案", + "business_type": "INVESTIGATION" + }, + 
"5.谈话后安全风险评估表": { + "template_code": "POST_INTERVIEW_RISK_ASSESSMENT", + "name": "5.谈话后安全风险评估表", + "business_type": "INVESTIGATION" + }, + "1.谈话笔录": { + "template_code": "INTERVIEW_RECORD", + "name": "1.谈话笔录", + "business_type": "INVESTIGATION" + }, + "2.谈话询问对象情况摸底调查30问": { + "template_code": "INVESTIGATION_30_QUESTIONS", + "name": "2.谈话询问对象情况摸底调查30问", + "business_type": "INVESTIGATION" + }, + "3.被谈话人权利义务告知书": { + "template_code": "RIGHTS_OBLIGATIONS_NOTICE", + "name": "3.被谈话人权利义务告知书", + "business_type": "INVESTIGATION" + }, + "4.点对点交接单": { + "template_code": "HANDOVER_FORM", + "name": "4.点对点交接单", + "business_type": "INVESTIGATION" + }, + "5.陪送交接单(新)": { + "template_code": "ESCORT_HANDOVER_FORM", + "name": "5.陪送交接单(新)", + "business_type": "INVESTIGATION" + }, + "6.1保密承诺书(谈话对象使用-非中共党员用)": { + "template_code": "CONFIDENTIALITY_COMMITMENT_NON_PARTY", + "name": "6.1保密承诺书(谈话对象使用-非中共党员用)", + "business_type": "INVESTIGATION" + }, + "6.2保密承诺书(谈话对象使用-中共党员用)": { + "template_code": "CONFIDENTIALITY_COMMITMENT_PARTY", + "name": "6.2保密承诺书(谈话对象使用-中共党员用)", + "business_type": "INVESTIGATION" + }, + "7.办案人员-办案安全保密承诺书": { + "template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT", + "name": "7.办案人员-办案安全保密承诺书", + "business_type": "INVESTIGATION" + }, + "8-1请示报告卡(初核报告结论) ": { + "template_code": "REPORT_CARD_CONCLUSION", + "name": "8-1请示报告卡(初核报告结论) ", + "business_type": "INVESTIGATION" + }, + "8.XXX初核情况报告": { + "template_code": "INVESTIGATION_REPORT", + "name": "8.XXX初核情况报告", + "business_type": "INVESTIGATION" + } +} + + +def generate_id(): + """生成ID""" + import time + import random + timestamp = int(time.time() * 1000) + random_part = random.randint(100000, 999999) + return timestamp * 1000 + random_part + + +def identify_document_type(file_name: str) -> Optional[Dict]: + """根据完整文件名识别文档类型""" + base_name = Path(file_name).stem + if base_name in DOCUMENT_TYPE_MAPPING: + return DOCUMENT_TYPE_MAPPING[base_name] + return None + + +def upload_to_minio(file_path: Path) -> str: + 
"""上传文件到MinIO""" + try: + client = Minio( + MINIO_CONFIG['endpoint'], + access_key=MINIO_CONFIG['access_key'], + secret_key=MINIO_CONFIG['secret_key'], + secure=MINIO_CONFIG['secure'] + ) + + found = client.bucket_exists(BUCKET_NAME) + if not found: + raise Exception(f"存储桶 '{BUCKET_NAME}' 不存在,请先创建") + + now = datetime.now() + object_name = f'{TENANT_ID}/TEMPLATE/{now.year}/{now.month:02d}/{file_path.name}' + + client.fput_object( + BUCKET_NAME, + object_name, + str(file_path), + content_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document' + ) + + return f"/{object_name}" + + except S3Error as e: + raise Exception(f"MinIO错误: {e}") + except Exception as e: + raise Exception(f"上传文件时发生错误: {e}") + + +def scan_directory_structure(base_dir: Path) -> List[Dict]: + """ + 扫描目录结构,返回按层级排序的节点列表 + 每个节点包含:type, name, path, parent_path, level, template_code, file_path + """ + nodes = [] + + def process_path(path: Path, parent_path: Optional[str] = None, level: int = 0): + """递归处理路径""" + if path.is_file() and path.suffix == '.docx': + file_name = path.stem + doc_config = identify_document_type(file_name) + + nodes.append({ + 'type': 'file', + 'name': file_name, + 'path': str(path), + 'parent_path': parent_path, + 'level': level, + 'template_code': doc_config['template_code'] if doc_config else None, + 'doc_config': doc_config, + 'file_path': path + }) + elif path.is_dir(): + dir_name = path.name + nodes.append({ + 'type': 'directory', + 'name': dir_name, + 'path': str(path), + 'parent_path': parent_path, + 'level': level, + 'template_code': None, + 'doc_config': None, + 'file_path': None + }) + + for child in sorted(path.iterdir()): + if child.name != '__pycache__': + process_path(child, str(path), level + 1) + + if TEMPLATES_DIR.exists(): + for item in sorted(TEMPLATES_DIR.iterdir()): + if item.name != '__pycache__': + process_path(item, None, 0) + + # 按层级排序 + return sorted(nodes, key=lambda x: (x['level'], x['path'])) + + +def delete_old_data(conn, 
dry_run: bool = True): + """删除旧数据""" + cursor = conn.cursor() + + try: + print("\n" + "="*80) + print("删除旧数据") + print("="*80) + + # 1. 先删除关联表 f_polic_file_field + print("\n1. 删除 f_polic_file_field 关联记录...") + + if not dry_run: + # 先获取所有相关的 file_id + select_file_ids_sql = """ + SELECT id FROM f_polic_file_config + WHERE tenant_id = %s + """ + cursor.execute(select_file_ids_sql, (TENANT_ID,)) + file_ids = [row[0] for row in cursor.fetchall()] + + if file_ids: + # 使用占位符构建SQL + placeholders = ','.join(['%s'] * len(file_ids)) + delete_file_field_sql = f""" + DELETE FROM f_polic_file_field + WHERE tenant_id = %s AND file_id IN ({placeholders}) + """ + cursor.execute(delete_file_field_sql, [TENANT_ID] + file_ids) + deleted_count = cursor.rowcount + print(f" ✓ 删除了 {deleted_count} 条关联记录") + else: + print(" ✓ 没有需要删除的关联记录") + else: + # 模拟模式:只统计 + count_sql = """ + SELECT COUNT(*) FROM f_polic_file_field + WHERE tenant_id = %s AND file_id IN ( + SELECT id FROM f_polic_file_config WHERE tenant_id = %s + ) + """ + cursor.execute(count_sql, (TENANT_ID, TENANT_ID)) + count = cursor.fetchone()[0] + print(f" [模拟] 将删除 {count} 条关联记录") + + # 2. 删除 f_polic_file_config 记录 + print("\n2. 
删除 f_polic_file_config 记录...") + delete_config_sql = """ + DELETE FROM f_polic_file_config + WHERE tenant_id = %s + """ + + if not dry_run: + cursor.execute(delete_config_sql, (TENANT_ID,)) + deleted_count = cursor.rowcount + print(f" ✓ 删除了 {deleted_count} 条配置记录") + conn.commit() + else: + count_sql = "SELECT COUNT(*) FROM f_polic_file_config WHERE tenant_id = %s" + cursor.execute(count_sql, (TENANT_ID,)) + count = cursor.fetchone()[0] + print(f" [模拟] 将删除 {count} 条配置记录") + + return True + + except Exception as e: + if not dry_run: + conn.rollback() + print(f" ✗ 删除失败: {e}") + raise + finally: + cursor.close() + + +def create_tree_structure(conn, nodes: List[Dict], upload_files: bool = True, dry_run: bool = True): + """创建树状结构""" + cursor = conn.cursor() + + try: + if not dry_run: + conn.autocommit(False) + + print("\n" + "="*80) + print("创建树状结构") + print("="*80) + + # 创建路径到ID的映射 + path_to_id = {} + created_count = 0 + updated_count = 0 + + # 按层级顺序处理 + for node in nodes: + node_path = node['path'] + node_name = node['name'] + parent_path = node['parent_path'] + level = node['level'] + + # 获取父节点ID + parent_id = path_to_id.get(parent_path) if parent_path else None + + if node['type'] == 'directory': + # 创建目录节点 + node_id = generate_id() + path_to_id[node_path] = node_id + + if not dry_run: + # 目录节点不包含 template_code 字段 + insert_sql = """ + INSERT INTO f_polic_file_config + (id, tenant_id, parent_id, name, input_data, file_path, + created_time, created_by, updated_time, updated_by, state) + VALUES (%s, %s, %s, %s, %s, %s, NOW(), %s, NOW(), %s, %s) + """ + cursor.execute(insert_sql, ( + node_id, + TENANT_ID, + parent_id, + node_name, + None, + None, + CREATED_BY, + UPDATED_BY, + 1 + )) + + indent = " " * level + parent_info = f" [父: {path_to_id.get(parent_path, 'None')}]" if parent_path else "" + print(f"{indent}✓ {'[模拟]' if dry_run else ''}创建目录: {node_name} (ID: {node_id}){parent_info}") + created_count += 1 + + else: + # 创建文件节点 + node_id = generate_id() + 
path_to_id[node_path] = node_id + + doc_config = node.get('doc_config') + template_code = node.get('template_code') + file_path_obj = node.get('file_path') + + # Upload the file to MinIO when requested + minio_path = None + if upload_files and file_path_obj and file_path_obj.exists(): + try: + if not dry_run: + minio_path = upload_to_minio(file_path_obj) + else: + # Preview the same year/month path that upload_to_minio would produce + now = datetime.now() + minio_path = f"/{TENANT_ID}/TEMPLATE/{now.year}/{now.month:02d}/{file_path_obj.name}" + print(f" {'[模拟]' if dry_run else ''}上传文件: {file_path_obj.name} → {minio_path}") + except Exception as e: + print(f" ⚠ 上传文件失败: {e}") + # Keep going; file_path stays None for this node + + # Build input_data + input_data = None + if doc_config: + input_data = json.dumps({ + 'template_code': doc_config['template_code'], + 'business_type': doc_config['business_type'] + }, ensure_ascii=False) + + if not dry_run: + # Store an empty string when template_code is None + template_code_value = template_code if template_code else '' + + insert_sql = """ + INSERT INTO f_polic_file_config + (id, tenant_id, parent_id, name, input_data, file_path, template_code, + created_time, created_by, updated_time, updated_by, state) + VALUES (%s, %s, %s, %s, %s, %s, %s, NOW(), %s, NOW(), %s, %s) + """ + cursor.execute(insert_sql, ( + node_id, + TENANT_ID, + parent_id, + node_name, + input_data, + minio_path, + template_code_value, + CREATED_BY, + UPDATED_BY, + 1 + )) + + indent = " " * level + parent_info = f" [父: {path_to_id.get(parent_path, 'None')}]" if parent_path else "" + template_info = f" [code: {template_code}]" if template_code else "" + print(f"{indent}✓ {'[模拟]' if dry_run else ''}创建文件: {node_name} (ID: {node_id}){parent_info}{template_info}") + created_count += 1 + + if not dry_run: + conn.commit() + print(f"\n✓ 创建完成!共创建 {created_count} 个节点") + else: + print(f"\n[模拟模式] 将创建 {created_count} 个节点") + + return path_to_id + + except Exception as e: + if not dry_run: + conn.rollback() + print(f"\n✗ 创建失败: {e}") + import traceback + traceback.print_exc() + raise + finally: + cursor.close() + + +def main(): + """Entry point.""" + print("="*80) 
+ print("初始化模板树状结构(从目录结构完全重建)") + print("="*80) + print("\n⚠️ 警告:此操作将删除当前租户的所有模板数据!") + print(" 包括:") + print(" - f_polic_file_config 表中的所有记录") + print(" - f_polic_file_field 表中的相关关联记录") + print(" 然后根据 template_finish 目录结构完全重建") + + # 确认 + print("\n" + "="*80) + confirm1 = input("\n确认继续?(yes/no,默认no): ").strip().lower() + if confirm1 != 'yes': + print("已取消") + return + + # 连接数据库 + try: + conn = pymysql.connect(**DB_CONFIG) + print("✓ 数据库连接成功") + except Exception as e: + print(f"✗ 数据库连接失败: {e}") + return + + try: + # 扫描目录结构 + print("\n扫描目录结构...") + nodes = scan_directory_structure(TEMPLATES_DIR) + print(f" 找到 {len(nodes)} 个节点") + print(f" 其中目录: {len([n for n in nodes if n['type'] == 'directory'])} 个") + print(f" 其中文件: {len([n for n in nodes if n['type'] == 'file'])} 个") + + # 显示预览 + print("\n目录结构预览:") + for node in nodes[:10]: # 只显示前10个 + indent = " " * node['level'] + type_icon = "📁" if node['type'] == 'directory' else "📄" + print(f"{indent}{type_icon} {node['name']}") + if len(nodes) > 10: + print(f" ... 
还有 {len(nodes) - 10} 个节点") + + # 询问是否上传文件 + print("\n" + "="*80) + upload_files = input("\n是否上传文件到MinIO?(yes/no,默认yes): ").strip().lower() + upload_files = upload_files != 'no' + + # 先执行模拟删除 + print("\n执行模拟删除...") + delete_old_data(conn, dry_run=True) + + # 再执行模拟创建 + print("\n执行模拟创建...") + create_tree_structure(conn, nodes, upload_files=upload_files, dry_run=True) + + # 最终确认 + print("\n" + "="*80) + confirm2 = input("\n确认执行实际更新?(yes/no,默认no): ").strip().lower() + if confirm2 != 'yes': + print("已取消") + return + + # 执行实际删除 + print("\n执行实际删除...") + delete_old_data(conn, dry_run=False) + + # 执行实际创建 + print("\n执行实际创建...") + create_tree_structure(conn, nodes, upload_files=upload_files, dry_run=False) + + print("\n" + "="*80) + print("初始化完成!") + print("="*80) + + except Exception as e: + print(f"\n✗ 初始化失败: {e}") + import traceback + traceback.print_exc() + finally: + conn.close() + print("\n数据库连接已关闭") + + +if __name__ == '__main__': + main() + diff --git a/restore_database.py b/restore_database.py new file mode 100644 index 0000000..e895782 --- /dev/null +++ b/restore_database.py @@ -0,0 +1,340 @@ +""" +数据库恢复脚本 +从SQL备份文件恢复数据库 +""" +import os +import sys +import subprocess +import pymysql +from pathlib import Path +from dotenv import load_dotenv +import gzip + +# 加载环境变量 +load_dotenv() + + +class DatabaseRestore: + """数据库恢复类""" + + def __init__(self): + """初始化数据库配置""" + self.db_config = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' + } + + def restore_with_mysql(self, backup_file, drop_database=False): + """ + 使用mysql命令恢复数据库(推荐方式) + + Args: + backup_file: 备份文件路径 + drop_database: 是否先删除数据库(危险操作) + + Returns: + 是否成功 + """ + backup_file = Path(backup_file) + + if not backup_file.exists(): + raise FileNotFoundError(f"备份文件不存在: {backup_file}") + + # 如果是压缩文件,先解压 + 
sql_file = backup_file + temp_file = None + if backup_file.suffix == '.gz': + print(f"检测到压缩文件,正在解压...") + temp_file = backup_file.with_suffix('') + with gzip.open(backup_file, 'rb') as f_in: + with open(temp_file, 'wb') as f_out: + f_out.write(f_in.read()) + sql_file = temp_file + print(f"解压完成: {sql_file}") + + try: + print(f"开始恢复数据库 {self.db_config['database']}...") + print(f"备份文件: {backup_file}") + + # 如果指定删除数据库 + if drop_database: + print("警告: 将删除现有数据库!") + confirm = input("确认继续? (yes/no): ") + if confirm.lower() != 'yes': + print("已取消恢复操作") + return False + + # 删除数据库 + self._drop_database() + + # 构建mysql命令 + cmd = [ + 'mysql', + f"--host={self.db_config['host']}", + f"--port={self.db_config['port']}", + f"--user={self.db_config['user']}", + f"--password={self.db_config['password']}", + '--default-character-set=utf8mb4', + self.db_config['database'] + ] + + # 执行恢复命令 + with open(sql_file, 'r', encoding='utf-8') as f: + result = subprocess.run( + cmd, + stdin=f, + stderr=subprocess.PIPE, + text=True + ) + + if result.returncode != 0: + error_msg = result.stderr or '未知错误' + raise Exception(f"mysql执行失败: {error_msg}") + + print("恢复完成!") + return True + + except FileNotFoundError: + print("错误: 未找到mysql命令,请确保MySQL客户端已安装并在PATH中") + print("尝试使用Python方式恢复...") + return self.restore_with_python(backup_file, drop_database) + except Exception as e: + print(f"恢复失败: {str(e)}") + raise + finally: + # 清理临时解压文件 + if temp_file and temp_file.exists(): + temp_file.unlink() + + def restore_with_python(self, backup_file, drop_database=False): + """ + 使用Python直接连接数据库恢复(备用方式) + + Args: + backup_file: 备份文件路径 + drop_database: 是否先删除数据库(危险操作) + + Returns: + 是否成功 + """ + backup_file = Path(backup_file) + + if not backup_file.exists(): + raise FileNotFoundError(f"备份文件不存在: {backup_file}") + + # 如果是压缩文件,先解压 + sql_file = backup_file + temp_file = None + if backup_file.suffix == '.gz': + print(f"检测到压缩文件,正在解压...") + temp_file = backup_file.with_suffix('') + with
gzip.open(backup_file, 'rb') as f_in: + with open(temp_file, 'wb') as f_out: + f_out.write(f_in.read()) + sql_file = temp_file + print(f"解压完成: {sql_file}") + + try: + print(f"开始使用Python方式恢复数据库 {self.db_config['database']}...") + print(f"备份文件: {backup_file}") + + # 如果指定删除数据库 + if drop_database: + print("警告: 将删除现有数据库!") + confirm = input("确认继续? (yes/no): ") + if confirm.lower() != 'yes': + print("已取消恢复操作") + return False + + # 删除数据库 + self._drop_database() + + # 连接数据库 + connection = pymysql.connect(**self.db_config) + cursor = connection.cursor() + + # 读取SQL文件 + print("读取SQL文件...") + with open(sql_file, 'r', encoding='utf-8') as f: + sql_content = f.read() + + # 分割SQL语句(按分号分割,但要注意字符串中的分号) + print("执行SQL语句...") + statements = self._split_sql_statements(sql_content) + + total = len(statements) + print(f"共 {total} 条SQL语句") + + # 执行每条SQL语句 + for i, statement in enumerate(statements, 1): + statement = statement.strip() + # Skip only statements made up entirely of blank or '--' comment lines; + # a statement that merely starts with a comment still carries SQL to run. + if not statement or all(not line.strip() or line.strip().startswith('--') for line in statement.splitlines()): + continue + + try: + cursor.execute(statement) + if i % 100 == 0: + print(f"进度: {i}/{total} ({i*100//total}%)") + except Exception as e: + # 某些错误可以忽略(如表已存在等) + error_msg = str(e).lower() + if 'already exists' in error_msg or 'duplicate' in error_msg: + continue + print(f"警告: 执行SQL语句时出错 (第{i}条): {str(e)}") + print(f"SQL: {statement[:100]}...") + + # 提交事务 + connection.commit() + + cursor.close() + connection.close() + + print("恢复完成!") + return True + + except Exception as e: + print(f"恢复失败: {str(e)}") + raise + finally: + # 清理临时解压文件 + if temp_file and temp_file.exists(): + temp_file.unlink() + + def _split_sql_statements(self, sql_content): + """ + 分割SQL语句(处理字符串中的分号) + + Args: + sql_content: SQL内容 + + Returns: + SQL语句列表 + """ + statements = [] + current_statement = [] + in_string = False + string_char = None + i = 0 + + while i < len(sql_content): + char = sql_content[i] + + # 检测字符串开始/结束 + if char in ("'", '"', '`') and (i == 0 or sql_content[i-1] != '\\'): + if not in_string: + in_string = True + string_char
= char + elif char == string_char: + in_string = False + string_char = None + + current_statement.append(char) + + # 如果不在字符串中且遇到分号,分割语句 + if not in_string and char == ';': + statement = ''.join(current_statement).strip() + if statement: + statements.append(statement) + current_statement = [] + + i += 1 + + # 添加最后一条语句 + if current_statement: + statement = ''.join(current_statement).strip() + if statement: + statements.append(statement) + + return statements + + def _drop_database(self): + """删除数据库(危险操作)""" + try: + # 连接到MySQL服务器(不指定数据库) + config = self.db_config.copy() + config.pop('database') + connection = pymysql.connect(**config) + cursor = connection.cursor() + + cursor.execute(f"DROP DATABASE IF EXISTS `{self.db_config['database']}`") + cursor.execute(f"CREATE DATABASE `{self.db_config['database']}` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci") + + connection.commit() + cursor.close() + connection.close() + + print(f"数据库 {self.db_config['database']} 已删除并重新创建") + + except Exception as e: + raise Exception(f"删除数据库失败: {str(e)}") + + def test_connection(self): + """测试数据库连接""" + try: + connection = pymysql.connect(**self.db_config) + cursor = connection.cursor() + cursor.execute("SELECT VERSION()") + version = cursor.fetchone()[0] + cursor.close() + connection.close() + + print(f"数据库连接成功!MySQL版本: {version}") + return True + except Exception as e: + print(f"数据库连接失败: {str(e)}") + return False + + +def main(): + """主函数""" + import argparse + + parser = argparse.ArgumentParser(description='数据库恢复工具') + parser.add_argument('backup_file', help='备份文件路径') + parser.add_argument('--method', choices=['mysql', 'python', 'auto'], + default='auto', help='恢复方法 (默认: auto)') + parser.add_argument('--drop-db', action='store_true', + help='恢复前删除现有数据库(危险操作)') + parser.add_argument('--test', action='store_true', + help='仅测试数据库连接') + + args = parser.parse_args() + + restore = DatabaseRestore() + + # 测试连接 + if args.test: + restore.test_connection() + return + + # 执行恢复 + try: + if 
args.method == 'mysql': + success = restore.restore_with_mysql(args.backup_file, args.drop_db) + elif args.method == 'python': + success = restore.restore_with_python(args.backup_file, args.drop_db) + else: # auto + try: + success = restore.restore_with_mysql(args.backup_file, args.drop_db) + except Exception: + print("\nmysql方式失败,切换到Python方式...") + success = restore.restore_with_python(args.backup_file, args.drop_db) + + if success: + print("\n恢复成功!") + else: + print("\n恢复失败!") + sys.exit(1) + + except Exception as e: + print(f"\n恢复失败: {str(e)}") + sys.exit(1) + + +if __name__ == '__main__': + main() + diff --git a/rollback_incorrect_updates.py b/rollback_incorrect_updates.py new file mode 100644 index 0000000..41c99c2 --- /dev/null +++ b/rollback_incorrect_updates.py @@ -0,0 +1,122 @@ +""" +回滚错误的更新,恢复被错误修改的字段 +""" +import os +import pymysql + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 +UPDATED_BY = 655162080928945152 + +# 需要恢复的字段映射(字段ID -> 正确的field_code) +ROLLBACK_MAPPING = { + # 这些字段被错误地从英文改成了中文,需要恢复 + 1764656917410273: 'target_issue_description', + 1764656918032031: 'filler_name', + 1764656917418979: 'department_opinion', + 1764836032906561: 'appointment_location', + 1764836032488198: 'appointment_time', + 1764836033052889: 'approval_time', + 1764836032655678: 'handler_name', + 1764836033342084: 'handling_department', + 1764836033240593: 'investigation_unit_name', + 1764836033018470: 'investigation_location', + 1764836033274278: 'investigation_team_code', + 1764836033094781: 'investigation_team_member_names', + 1764836033176386: 'investigation_team_leader_name', + 1764836033500799: 'commission_name', + 1764656917384058: 'clue_info', + 1764656917861268: 'clue_source', + 1764836032538308:
'target_address', + 1764836033565636: 'target_health_status', + 1764836033332970: 'target_other_situation', + 1764656917299164: 'target_date_of_birth', + 1764836033269146: 'target_date_of_birth_full', + 1765151880445876: 'target_organization', + 1764656917367205: 'target_organization_and_position', + 1764836033405778: 'target_family_situation', + 1764836033162748: 'target_work_basic_info', + 1764656917996367: 'target_basic_info_clue', + 1764836032997850: 'target_age', + 1764656917561689: 'target_gender', + 1764836032855869: 'target_personality', + 1764836032893680: 'target_registered_address', + 1764836033603501: 'target_tolerance', + 1764656917185956: 'target_political_status', + 1764836033786057: 'target_attitude', + 1764836033587951: 'target_previous_investigation', + 1764836032951705: 'target_ethnicity', + 1764836033280024: 'target_other_issues_possibility', + 1764836033458872: 'target_issue_severity', + 1764836032929811: 'target_social_relations', + 1764836033618877: 'target_negative_events', + 1764836032926994: 'target_place_of_origin', + 1765151880304552: 'target_position', + 1764656917802442: 'target_professional_rank', + 1764836032817243: 'target_contact', + 1764836032902356: 'target_id_number', + 1764836032913357: 'target_id_number', + 1764656917073644: 'target_name', + 1764836033571266: 'target_problem_description', + 1764836032827460: 'report_card_request_time', + 1764836032694865: 'notification_location', + 1764836032909732: 'notification_time', + 1764836033451248: 'risk_level', +} + +def rollback(): + """回滚错误的更新""" + conn = pymysql.connect(**DB_CONFIG) + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("="*80) + print("回滚错误的字段更新") + print("="*80) + + print(f"\n需要恢复 {len(ROLLBACK_MAPPING)} 个字段\n") + + # 先查询当前状态 + for field_id, correct_code in ROLLBACK_MAPPING.items(): + cursor.execute(""" + SELECT id, name, filed_code + FROM f_polic_field + WHERE id = %s AND tenant_id = %s + """, (field_id, TENANT_ID)) + + field = cursor.fetchone() + if 
field: + print(f" ID: {field_id}") + print(f" 名称: {field['name']}") + print(f" 当前field_code: {field['filed_code']}") + print(f" 恢复为: {correct_code}") + print() + + # 执行回滚 + print("开始执行回滚...\n") + for field_id, correct_code in ROLLBACK_MAPPING.items(): + cursor.execute(""" + UPDATE f_polic_field + SET filed_code = %s, updated_time = NOW(), updated_by = %s + WHERE id = %s AND tenant_id = %s + """, (correct_code, UPDATED_BY, field_id, TENANT_ID)) + print(f" ✓ 恢复字段 ID {field_id}: {correct_code}") + + conn.commit() + print("\n✓ 回滚完成") + + cursor.close() + conn.close() + + +if __name__ == '__main__': + rollback() + diff --git a/services/ai_service.py b/services/ai_service.py index e3981bc..20a91d7 100644 --- a/services/ai_service.py +++ b/services/ai_service.py @@ -362,8 +362,8 @@ class AIService: for key in ['target_name', 'target_gender', 'target_age', 'target_date_of_birth']: if key in normalized_data: print(f"[AI服务] 日期格式化后 {key} = '{normalized_data[key]}'") - # 后处理:从已有信息推断缺失字段 - normalized_data = self._post_process_inferred_fields(normalized_data, output_fields) + # 后处理:从已有信息推断缺失字段(传入原始prompt以便从输入文本中提取) + normalized_data = self._post_process_inferred_fields(normalized_data, output_fields, prompt) # 打印后处理后的关键字段 for key in ['target_name', 'target_gender', 'target_age', 'target_date_of_birth', 'target_organization', 'target_position']: if key in normalized_data: @@ -429,7 +429,7 @@ class AIService: print(f"[AI服务] 使用jsonrepair最后修复成功,提取到 {len(extracted_data)} 个字段") normalized_data = self._normalize_field_names(extracted_data, output_fields) normalized_data = self._normalize_date_formats(normalized_data, output_fields) - normalized_data = self._post_process_inferred_fields(normalized_data, output_fields) + normalized_data = self._post_process_inferred_fields(normalized_data, output_fields, prompt) # 记录对话 if self.ai_logger: self.ai_logger.log_conversation( @@ -1233,13 +1233,14 @@ class AIService: # 如果无法解析,返回原值 return date_str - def _post_process_inferred_fields(self, data: 
Dict, output_fields: List[Dict]) -> Dict: + def _post_process_inferred_fields(self, data: Dict, output_fields: List[Dict], prompt: str = None) -> Dict: """ - 后处理:从已有信息推断缺失字段 + 后处理:从已有信息推断缺失字段,如果字段缺失,尝试从原始输入文本中提取 Args: data: 提取的数据字典 output_fields: 输出字段列表 + prompt: 原始提示词(包含输入文本),用于从原始输入中提取缺失字段 Returns: 后处理后的数据字典 @@ -1249,11 +1250,25 @@ class AIService: # 1. 从出生年月计算年龄 if 'target_age' in field_code_map and (not data.get('target_age') or data.get('target_age') == ''): + # 首先尝试从已有数据中计算 if 'target_date_of_birth' in data and data.get('target_date_of_birth'): age = self._calculate_age_from_birth_date(data['target_date_of_birth']) if age: data['target_age'] = str(age) print(f"[AI服务] 后处理:从出生年月 '{data['target_date_of_birth']}' 计算年龄: {age}岁") + + # 如果还没有,尝试从原始输入文本中直接提取年龄 + if (not data.get('target_age') or data.get('target_age') == '') and prompt: + input_text_match = re.search(r'输入文本[::]\s*\n(.*?)(?:\n\n需要提取的字段|$)', prompt, re.DOTALL) + if input_text_match: + input_text = input_text_match.group(1) + # 匹配年龄模式:年龄44岁、44岁、年龄44等 + age_match = re.search(r'年龄\s*(\d+)\s*岁|(\d+)\s*岁|年龄\s*(\d+)', input_text) + if age_match: + age = age_match.group(1) or age_match.group(2) or age_match.group(3) + if age: + data['target_age'] = str(age) + print(f"[AI服务] 后处理:从原始输入文本中提取年龄: {age}岁") # 2. 
从单位及职务中拆分单位和职务 if 'target_organization_and_position' in data and data.get('target_organization_and_position'): @@ -1299,6 +1314,25 @@ class AIService: data['target_gender'] = '女' print(f"[AI服务] 后处理:从字段 '{key}' 中推断性别: 女") break + + # 如果仍然没有,尝试从原始输入文本(prompt)中提取 + if (not data.get('target_gender') or data.get('target_gender') == '') and prompt: + # 从prompt中提取输入文本部分(通常在"输入文本:"之后) + input_text_match = re.search(r'输入文本[::]\s*\n(.*?)(?:\n\n需要提取的字段|$)', prompt, re.DOTALL) + if input_text_match: + input_text = input_text_match.group(1) + # 匹配性别关键词:男性、女性、男、女等 + if re.search(r'\b男性\b|\b男\b', input_text) and not re.search(r'\b女性\b|\b女\b', input_text): + data['target_gender'] = '男' + print(f"[AI服务] 后处理:从原始输入文本中提取性别: 男") + elif re.search(r'\b女性\b|\b女\b', input_text) and not re.search(r'\b男性\b|\b男\b', input_text): + data['target_gender'] = '女' + print(f"[AI服务] 后处理:从原始输入文本中提取性别: 女") + elif re.search(r'[,,]\s*([男女])\s*[,,]', input_text): + gender_match = re.search(r'[,,]\s*([男女])\s*[,,]', input_text) + if gender_match: + data['target_gender'] = gender_match.group(1) + print(f"[AI服务] 后处理:从原始输入文本中提取性别: {gender_match.group(1)}") # 4. 
从工作基本情况中提取职级(如果target_professional_rank为空) if 'target_professional_rank' in field_code_map and (not data.get('target_professional_rank') or data.get('target_professional_rank') == ''): diff --git a/sync_template_fields_from_excel.py b/sync_template_fields_from_excel.py new file mode 100644 index 0000000..c2e9b9f --- /dev/null +++ b/sync_template_fields_from_excel.py @@ -0,0 +1,552 @@ +""" +根据Excel数据设计文档同步更新模板的input_data、template_code和字段关联关系 +""" +import os +import json +import pymysql +import pandas as pd +from pathlib import Path +from typing import Dict, List, Optional, Set +from datetime import datetime +from collections import defaultdict + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 +CREATED_BY = 655162080928945152 +UPDATED_BY = 655162080928945152 + +# Excel文件路径 +EXCEL_FILE = '技术文档/智慧监督项目模板数据结构设计表-20251125-一凡标注.xlsx' + +# 模板名称映射(Excel中的名称 -> 数据库中的名称) +TEMPLATE_NAME_MAPPING = { + '请示报告卡': '1.请示报告卡(XXX)', + '初步核实审批表': '2.初步核实审批表(XXX)', + '初核方案': '3.附件初核方案(XXX)', + '谈话通知书': '谈话通知书', + '谈话通知书第一联': '谈话通知书第一联', + '谈话通知书第二联': '谈话通知书第二联', + '谈话通知书第三联': '谈话通知书第三联', + '走读式谈话审批': '走读式谈话审批', + '走读式谈话流程': '走读式谈话流程', + '请示报告卡(初核报告结论)': '8-1请示报告卡(初核报告结论) ', + 'XXX初核情况报告': '8.XXX初核情况报告', +} + +# 模板编码映射(Excel中的名称 -> template_code) +TEMPLATE_CODE_MAPPING = { + '请示报告卡': 'REPORT_CARD', + '初步核实审批表': 'PRELIMINARY_VERIFICATION_APPROVAL', + '初核方案': 'INVESTIGATION_PLAN', + '谈话通知书第一联': 'NOTIFICATION_LETTER_1', + '谈话通知书第二联': 'NOTIFICATION_LETTER_2', + '谈话通知书第三联': 'NOTIFICATION_LETTER_3', + '请示报告卡(初核报告结论)': 'REPORT_CARD_CONCLUSION', + 'XXX初核情况报告': 'INVESTIGATION_REPORT', +} + +# 字段名称到字段编码的映射 +FIELD_NAME_TO_CODE_MAP = { + # 输入字段 + '线索信息': 'clue_info', + '被核查人员工作基本情况线索': 'target_basic_info_clue', + + # 输出字段 - 
基本信息 + '被核查人姓名': 'target_name', + '被核查人员单位及职务': 'target_organization_and_position', + '被核查人员性别': 'target_gender', + '被核查人员出生年月': 'target_date_of_birth', + '被核查人员出生年月日': 'target_date_of_birth_full', + '被核查人员政治面貌': 'target_political_status', + '被核查人员职级': 'target_professional_rank', + '被核查人员单位': 'target_organization', + '被核查人员职务': 'target_position', + + # 输出字段 - 其他信息 + '线索来源': 'clue_source', + '主要问题线索': 'target_issue_description', + '初步核实审批表承办部门意见': 'department_opinion', + '初步核实审批表填表人': 'filler_name', + '请示报告卡请示时间': 'report_card_request_time', + '被核查人员身份证件及号码': 'target_id_number', + '被核查人员身份证号': 'target_id_number', + '应到时间': 'appointment_time', + '应到地点': 'appointment_location', + '批准时间': 'approval_time', + '承办部门': 'handling_department', + '承办人': 'handler_name', + '谈话通知时间': 'notification_time', + '谈话通知地点': 'notification_location', + '被核查人员住址': 'target_address', + '被核查人员户籍住址': 'target_registered_address', + '被核查人员联系方式': 'target_contact', + '被核查人员籍贯': 'target_place_of_origin', + '被核查人员民族': 'target_ethnicity', + '被核查人员工作基本情况': 'target_work_basic_info', + '核查单位名称': 'investigation_unit_name', + '核查组组长姓名': 'investigation_team_leader_name', + '核查组成员姓名': 'investigation_team_member_names', + '核查地点': 'investigation_location', +} + + +def generate_id(): + """生成ID""" + import time + import random + timestamp = int(time.time() * 1000) + random_part = random.randint(100000, 999999) + return timestamp * 1000 + random_part + + +def normalize_template_name(name: str) -> str: + """标准化模板名称,用于匹配""" + import re + # 去掉开头的编号和括号内容 + name = re.sub(r'^\d+[\.\-]\s*', '', name) + name = re.sub(r'[((].*?[))]', '', name) + name = name.strip() + return name + + +def parse_excel_data() -> Dict: + """解析Excel文件,提取模板和字段的关联关系""" + print("="*80) + print("解析Excel数据设计文档") + print("="*80) + + if not Path(EXCEL_FILE).exists(): + print(f"✗ Excel文件不存在: {EXCEL_FILE}") + return None + + try: + df = pd.read_excel(EXCEL_FILE) + print(f"✓ 成功读取Excel文件,共 {len(df)} 行数据\n") + + templates = defaultdict(lambda: { + 
'template_name': '', + 'template_code': '', + 'input_fields': [], + 'output_fields': [] + }) + + current_template = None + current_input_field = None + + for idx, row in df.iterrows(): + level1 = row.get('一级分类') + level2 = row.get('二级分类') + level3 = row.get('三级分类') + input_field = row.get('输入数据字段') + output_field = row.get('输出数据字段') + + # 处理二级分类(模板名称) + if pd.notna(level2) and level2: + current_template = str(level2).strip() + # 获取模板编码 + template_code = TEMPLATE_CODE_MAPPING.get(current_template, '') + if not template_code: + # 如果没有映射,尝试生成 + template_code = current_template.upper().replace(' ', '_') + + templates[current_template]['template_name'] = current_template + templates[current_template]['template_code'] = template_code + current_input_field = None # 重置输入字段 + print(f" 模板: {current_template} (code: {template_code})") + + # 处理三级分类(子模板,如谈话通知书第一联) + if pd.notna(level3) and level3: + current_template = str(level3).strip() + template_code = TEMPLATE_CODE_MAPPING.get(current_template, '') + if not template_code: + template_code = current_template.upper().replace(' ', '_') + + templates[current_template]['template_name'] = current_template + templates[current_template]['template_code'] = template_code + current_input_field = None + print(f" 子模板: {current_template} (code: {template_code})") + + # 处理输入字段 + if pd.notna(input_field) and input_field: + input_field_name = str(input_field).strip() + if input_field_name != current_input_field: + current_input_field = input_field_name + field_code = FIELD_NAME_TO_CODE_MAP.get(input_field_name, input_field_name.lower().replace(' ', '_')) + if current_template: + templates[current_template]['input_fields'].append({ + 'name': input_field_name, + 'field_code': field_code + }) + + # 处理输出字段 + if pd.notna(output_field) and output_field: + output_field_name = str(output_field).strip() + field_code = FIELD_NAME_TO_CODE_MAP.get(output_field_name, output_field_name.lower().replace(' ', '_')) + if current_template: + 
templates[current_template]['output_fields'].append({ + 'name': output_field_name, + 'field_code': field_code + }) + + # 去重 + for template_name, template_info in templates.items(): + # 输入字段去重 + seen_input = set() + unique_input = [] + for field in template_info['input_fields']: + key = field['field_code'] + if key not in seen_input: + seen_input.add(key) + unique_input.append(field) + template_info['input_fields'] = unique_input + + # 输出字段去重 + seen_output = set() + unique_output = [] + for field in template_info['output_fields']: + key = field['field_code'] + if key not in seen_output: + seen_output.add(key) + unique_output.append(field) + template_info['output_fields'] = unique_output + + print(f"\n✓ 解析完成,共 {len(templates)} 个模板") + for template_name, template_info in templates.items(): + print(f" - {template_name}: {len(template_info['input_fields'])} 个输入字段, {len(template_info['output_fields'])} 个输出字段") + + return dict(templates) + + except Exception as e: + print(f"✗ 解析Excel文件失败: {e}") + import traceback + traceback.print_exc() + return None + + +def get_database_templates(conn) -> Dict: + """获取数据库中的模板配置""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + sql = """ + SELECT id, name, template_code, input_data, parent_id + FROM f_polic_file_config + WHERE tenant_id = %s + """ + cursor.execute(sql, (TENANT_ID,)) + configs = cursor.fetchall() + + result = {} + for config in configs: + name = config['name'] + result[name] = config + # 也添加标准化名称的映射 + normalized = normalize_template_name(name) + if normalized not in result: + result[normalized] = config + + cursor.close() + return result + + +def get_database_fields(conn) -> Dict: + """获取数据库中的字段定义""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + sql = """ + SELECT id, name, filed_code, field_type + FROM f_polic_field + WHERE tenant_id = %s + """ + cursor.execute(sql, (TENANT_ID,)) + fields = cursor.fetchall() + + result = { + 'by_code': {}, + 'by_name': {} + } + + for field in fields: + field_code = 
field['filed_code'] + field_name = field['name'] + result['by_code'][field_code] = field + result['by_name'][field_name] = field + + cursor.close() + return result + + +def find_matching_template(excel_template_name: str, db_templates: Dict) -> Optional[Dict]: + """查找匹配的数据库模板""" + # 1. 精确匹配 + if excel_template_name in db_templates: + return db_templates[excel_template_name] + + # 2. 通过映射表匹配 + mapped_name = TEMPLATE_NAME_MAPPING.get(excel_template_name) + if mapped_name and mapped_name in db_templates: + return db_templates[mapped_name] + + # 3. 标准化名称匹配 + normalized = normalize_template_name(excel_template_name) + if normalized in db_templates: + return db_templates[normalized] + + # 4. 模糊匹配 + for db_name, db_config in db_templates.items(): + if normalized in normalize_template_name(db_name) or normalize_template_name(db_name) in normalized: + return db_config + + return None + + +def update_template_config(conn, template_id: int, template_code: str, input_fields: List[Dict], dry_run: bool = True): + """更新模板配置的input_data和template_code""" + cursor = conn.cursor() + + try: + # 构建input_data + input_data = { + 'template_code': template_code, + 'business_type': 'INVESTIGATION', + 'input_fields': [f['field_code'] for f in input_fields] + } + input_data_json = json.dumps(input_data, ensure_ascii=False) + + if not dry_run: + update_sql = """ + UPDATE f_polic_file_config + SET template_code = %s, input_data = %s, updated_time = NOW(), updated_by = %s + WHERE id = %s AND tenant_id = %s + """ + cursor.execute(update_sql, (template_code, input_data_json, UPDATED_BY, template_id, TENANT_ID)) + conn.commit() + print(f" ✓ 更新模板配置") + else: + print(f" [模拟] 将更新模板配置: template_code={template_code}") + + finally: + cursor.close() + + +def update_template_field_relations(conn, template_id: int, input_fields: List[Dict], output_fields: List[Dict], + db_fields: Dict, dry_run: bool = True): + """更新模板和字段的关联关系""" + cursor = conn.cursor() + + try: + # 先删除旧的关联关系 + if not dry_run: + delete_sql = 
""" + DELETE FROM f_polic_file_field + WHERE tenant_id = %s AND file_id = %s + """ + cursor.execute(delete_sql, (TENANT_ID, template_id)) + + # 创建新的关联关系 + relations_created = 0 + + # 关联输入字段(field_type=1) + for field_info in input_fields: + field_code = field_info['field_code'] + field = db_fields['by_code'].get(field_code) + + if not field: + print(f" ⚠ 输入字段不存在: {field_code}") + continue + + if field['field_type'] != 1: + print(f" ⚠ 字段类型不匹配: {field_code} (期望输入字段,实际为输出字段)") + continue + + if not dry_run: + # 检查是否已存在 + check_sql = """ + SELECT id FROM f_polic_file_field + WHERE tenant_id = %s AND file_id = %s AND filed_id = %s + """ + cursor.execute(check_sql, (TENANT_ID, template_id, field['id'])) + existing = cursor.fetchone() + + if not existing: + relation_id = generate_id() + insert_sql = """ + INSERT INTO f_polic_file_field + (id, tenant_id, file_id, filed_id, created_time, created_by, updated_time, updated_by, state) + VALUES (%s, %s, %s, %s, NOW(), %s, NOW(), %s, %s) + """ + cursor.execute(insert_sql, ( + relation_id, TENANT_ID, template_id, field['id'], + CREATED_BY, UPDATED_BY, 1 + )) + relations_created += 1 + else: + relations_created += 1 + + # 关联输出字段(field_type=2) + for field_info in output_fields: + field_code = field_info['field_code'] + field = db_fields['by_code'].get(field_code) + + if not field: + # 尝试通过名称匹配 + field_name = field_info['name'] + field = db_fields['by_name'].get(field_name) + + if not field: + print(f" ⚠ 输出字段不存在: {field_code} ({field_info['name']})") + continue + + if field['field_type'] != 2: + print(f" ⚠ 字段类型不匹配: {field_code} (期望输出字段,实际为输入字段)") + continue + + if not dry_run: + # 检查是否已存在 + check_sql = """ + SELECT id FROM f_polic_file_field + WHERE tenant_id = %s AND file_id = %s AND filed_id = %s + """ + cursor.execute(check_sql, (TENANT_ID, template_id, field['id'])) + existing = cursor.fetchone() + + if not existing: + relation_id = generate_id() + insert_sql = """ + INSERT INTO f_polic_file_field + (id, tenant_id, file_id, 
filed_id, created_time, created_by, updated_time, updated_by, state) + VALUES (%s, %s, %s, %s, NOW(), %s, NOW(), %s, %s) + """ + cursor.execute(insert_sql, ( + relation_id, TENANT_ID, template_id, field['id'], + CREATED_BY, UPDATED_BY, 1 + )) + relations_created += 1 + else: + relations_created += 1 + + if not dry_run: + conn.commit() + print(f" ✓ 创建 {relations_created} 个字段关联关系") + else: + print(f" [模拟] 将创建 {relations_created} 个字段关联关系") + + finally: + cursor.close() + + +def main(): + """主函数""" + print("="*80) + print("同步模板字段信息(根据Excel数据设计文档)") + print("="*80) + + # 解析Excel + excel_data = parse_excel_data() + if not excel_data: + return + + # 连接数据库 + try: + conn = pymysql.connect(**DB_CONFIG) + print("\n✓ 数据库连接成功") + except Exception as e: + print(f"\n✗ 数据库连接失败: {e}") + return + + try: + # 获取数据库中的模板和字段 + print("\n获取数据库中的模板和字段...") + db_templates = get_database_templates(conn) + db_fields = get_database_fields(conn) + print(f" 数据库中有 {len(db_templates)} 个模板") + print(f" 数据库中有 {len(db_fields['by_code'])} 个字段") + + # 匹配和更新 + print("\n" + "="*80) + print("匹配模板并更新配置") + print("="*80) + + matched_count = 0 + unmatched_templates = [] + + for excel_template_name, template_info in excel_data.items(): + print(f"\n处理模板: {excel_template_name}") + + # 查找匹配的数据库模板 + db_template = find_matching_template(excel_template_name, db_templates) + + if not db_template: + print(f" ✗ 未找到匹配的数据库模板") + unmatched_templates.append(excel_template_name) + continue + + print(f" ✓ 匹配到数据库模板: {db_template['name']} (ID: {db_template['id']})") + matched_count += 1 + + # 更新模板配置 + template_code = template_info['template_code'] + input_fields = template_info['input_fields'] + output_fields = template_info['output_fields'] + + print(f" 模板编码: {template_code}") + print(f" 输入字段: {len(input_fields)} 个") + print(f" 输出字段: {len(output_fields)} 个") + + # 先执行模拟更新 + print(" [模拟模式]") + update_template_config(conn, db_template['id'], template_code, input_fields, dry_run=True) + update_template_field_relations(conn, 
db_template['id'], input_fields, output_fields, db_fields, dry_run=True) + + # 显示统计 + print("\n" + "="*80) + print("统计信息") + print("="*80) + print(f"Excel中的模板数: {len(excel_data)}") + print(f"成功匹配: {matched_count} 个") + print(f"未匹配: {len(unmatched_templates)} 个") + + if unmatched_templates: + print("\n未匹配的模板:") + for template in unmatched_templates: + print(f" - {template}") + + # 询问是否执行实际更新 + print("\n" + "="*80) + response = input("\n是否执行实际更新?(yes/no,默认no): ").strip().lower() + + if response == 'yes': + print("\n执行实际更新...") + for excel_template_name, template_info in excel_data.items(): + db_template = find_matching_template(excel_template_name, db_templates) + if db_template: + print(f"\n更新: {db_template['name']}") + update_template_config(conn, db_template['id'], template_info['template_code'], + template_info['input_fields'], dry_run=False) + update_template_field_relations(conn, db_template['id'], + template_info['input_fields'], + template_info['output_fields'], + db_fields, dry_run=False) + + print("\n" + "="*80) + print("✓ 同步完成!") + print("="*80) + else: + print("\n已取消更新") + + finally: + conn.close() + print("\n数据库连接已关闭") + + +if __name__ == '__main__': + main() + diff --git a/update_template_tree.py b/update_template_tree.py new file mode 100644 index 0000000..d8f04ee --- /dev/null +++ b/update_template_tree.py @@ -0,0 +1,618 @@ +""" +更新模板树状结构 +根据 template_finish 目录结构更新数据库中的 parent_id 字段 +""" +import os +import json +import pymysql +from pathlib import Path +from typing import Dict, List, Optional, Tuple +from datetime import datetime + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 +CREATED_BY = 655162080928945152 +UPDATED_BY = 655162080928945152 + +# 项目根目录 +PROJECT_ROOT = 
Path(__file__).parent +TEMPLATES_DIR = PROJECT_ROOT / "template_finish" + +# 从 init_all_templates.py 复制的文档类型映射 +DOCUMENT_TYPE_MAPPING = { + "1.请示报告卡(XXX)": { + "template_code": "REPORT_CARD", + "name": "1.请示报告卡(XXX)", + "business_type": "INVESTIGATION" + }, + "2.初步核实审批表(XXX)": { + "template_code": "PRELIMINARY_VERIFICATION_APPROVAL", + "name": "2.初步核实审批表(XXX)", + "business_type": "INVESTIGATION" + }, + "3.附件初核方案(XXX)": { + "template_code": "INVESTIGATION_PLAN", + "name": "3.附件初核方案(XXX)", + "business_type": "INVESTIGATION" + }, + "谈话通知书第一联": { + "template_code": "NOTIFICATION_LETTER_1", + "name": "谈话通知书第一联", + "business_type": "INVESTIGATION" + }, + "谈话通知书第二联": { + "template_code": "NOTIFICATION_LETTER_2", + "name": "谈话通知书第二联", + "business_type": "INVESTIGATION" + }, + "谈话通知书第三联": { + "template_code": "NOTIFICATION_LETTER_3", + "name": "谈话通知书第三联", + "business_type": "INVESTIGATION" + }, + "1.请示报告卡(初核谈话)": { + "template_code": "REPORT_CARD_INTERVIEW", + "name": "1.请示报告卡(初核谈话)", + "business_type": "INVESTIGATION" + }, + "2谈话审批表": { + "template_code": "INTERVIEW_APPROVAL_FORM", + "name": "2谈话审批表", + "business_type": "INVESTIGATION" + }, + "3.谈话前安全风险评估表": { + "template_code": "PRE_INTERVIEW_RISK_ASSESSMENT", + "name": "3.谈话前安全风险评估表", + "business_type": "INVESTIGATION" + }, + "4.谈话方案": { + "template_code": "INTERVIEW_PLAN", + "name": "4.谈话方案", + "business_type": "INVESTIGATION" + }, + "5.谈话后安全风险评估表": { + "template_code": "POST_INTERVIEW_RISK_ASSESSMENT", + "name": "5.谈话后安全风险评估表", + "business_type": "INVESTIGATION" + }, + "1.谈话笔录": { + "template_code": "INTERVIEW_RECORD", + "name": "1.谈话笔录", + "business_type": "INVESTIGATION" + }, + "2.谈话询问对象情况摸底调查30问": { + "template_code": "INVESTIGATION_30_QUESTIONS", + "name": "2.谈话询问对象情况摸底调查30问", + "business_type": "INVESTIGATION" + }, + "3.被谈话人权利义务告知书": { + "template_code": "RIGHTS_OBLIGATIONS_NOTICE", + "name": "3.被谈话人权利义务告知书", + "business_type": "INVESTIGATION" + }, + "4.点对点交接单": { + "template_code": "HANDOVER_FORM", + "name": 
"4.点对点交接单", + "business_type": "INVESTIGATION" + }, + "4.点对点交接单2": { + "template_code": "HANDOVER_FORM_2", + "name": "4.点对点交接单2", + "business_type": "INVESTIGATION" + }, + "5.陪送交接单(新)": { + "template_code": "ESCORT_HANDOVER_FORM", + "name": "5.陪送交接单(新)", + "business_type": "INVESTIGATION" + }, + "6.1保密承诺书(谈话对象使用-非中共党员用)": { + "template_code": "CONFIDENTIALITY_COMMITMENT_NON_PARTY", + "name": "6.1保密承诺书(谈话对象使用-非中共党员用)", + "business_type": "INVESTIGATION" + }, + "6.2保密承诺书(谈话对象使用-中共党员用)": { + "template_code": "CONFIDENTIALITY_COMMITMENT_PARTY", + "name": "6.2保密承诺书(谈话对象使用-中共党员用)", + "business_type": "INVESTIGATION" + }, + "7.办案人员-办案安全保密承诺书": { + "template_code": "INVESTIGATOR_CONFIDENTIALITY_COMMITMENT", + "name": "7.办案人员-办案安全保密承诺书", + "business_type": "INVESTIGATION" + }, + "8-1请示报告卡(初核报告结论) ": { + "template_code": "REPORT_CARD_CONCLUSION", + "name": "8-1请示报告卡(初核报告结论) ", + "business_type": "INVESTIGATION" + }, + "8.XXX初核情况报告": { + "template_code": "INVESTIGATION_REPORT", + "name": "8.XXX初核情况报告", + "business_type": "INVESTIGATION" + } +} + + +def generate_id(): + """生成ID(使用时间戳+随机数的方式,模拟雪花算法)""" + import time + import random + timestamp = int(time.time() * 1000) + random_part = random.randint(100000, 999999) + return timestamp * 1000 + random_part + + +def normalize_name(name: str) -> str: + """标准化名称,用于模糊匹配""" + import re + # 去掉开头的编号(如 "1."、"2."、"8-1" 等) + name = re.sub(r'^\d+[\.\-]\s*', '', name) + # 去掉括号及其内容(如 "(XXX)"、"(初核谈话)" 等) + name = re.sub(r'[((].*?[))]', '', name) + # 去掉空格和特殊字符 + name = name.strip() + return name + + +def identify_document_type(file_name: str) -> Optional[Dict]: + """根据完整文件名识别文档类型""" + base_name = Path(file_name).stem + if base_name in DOCUMENT_TYPE_MAPPING: + return DOCUMENT_TYPE_MAPPING[base_name] + return None + + +def scan_directory_structure(base_dir: Path) -> Dict: + """扫描目录结构,构建树状层级""" + structure = { + 'directories': {}, # {path: {'name': ..., 'parent': ..., 'level': ...}} + 'files': {} # {file_path: {'name': ..., 'parent': ..., 
'template_code': ...}} + } + + def process_path(path: Path, parent_path: Optional[str] = None, level: int = 0): + """递归处理路径""" + if path.is_file() and path.suffix == '.docx': + # 处理文件 + file_name = path.stem + # Pass the full file name (with extension): identify_document_type() takes the stem itself, + # and taking the stem a second time would truncate "1.请示报告卡(XXX)" to "1". + doc_config = identify_document_type(path.name) + + structure['files'][str(path)] = { + 'name': file_name, + 'parent': parent_path, + 'level': level, + 'template_code': doc_config['template_code'] if doc_config else None, + 'full_path': str(path), + 'normalized_name': normalize_name(file_name) + } + elif path.is_dir(): + # 处理目录 + dir_name = path.name + structure['directories'][str(path)] = { + 'name': dir_name, + 'parent': parent_path, + 'level': level, + 'normalized_name': normalize_name(dir_name) + } + + # 递归处理子目录和文件 + for child in sorted(path.iterdir()): + if child.name != '__pycache__': + process_path(child, str(path), level + 1) + + # 从根目录开始扫描 (use the base_dir argument rather than the TEMPLATES_DIR global) + if base_dir.exists(): + for item in sorted(base_dir.iterdir()): + if item.name != '__pycache__': + process_path(item, None, 0) + + return structure + + +def find_matching_config(file_info: Dict, existing_data: Dict) -> Optional[Dict]: + """ + 查找匹配的数据库记录 + 优先级:1. template_code 精确匹配 2. 名称精确匹配 3. 
标准化名称匹配 + """ + template_code = file_info.get('template_code') + file_name = file_info['name'] + normalized_name = file_info.get('normalized_name', normalize_name(file_name)) + + # 优先级1: template_code 精确匹配 + if template_code: + matched = existing_data['by_template_code'].get(template_code) + if matched: + return matched + + # 优先级2: 名称精确匹配 + matched = existing_data['by_name'].get(file_name) + if matched: + return matched + + # 优先级3: 标准化名称匹配 + candidates = existing_data['by_normalized_name'].get(normalized_name, []) + if candidates: + # 如果有多个候选,优先选择有正确 template_code 的 + for candidate in candidates: + if candidate.get('extracted_template_code') == template_code: + return candidate + # 否则返回第一个 + return candidates[0] + + return None + + +def get_existing_data(conn) -> Dict: + """获取数据库中的现有数据""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + sql = """ + SELECT id, name, parent_id, template_code, input_data, file_path, state + FROM f_polic_file_config + WHERE tenant_id = %s + """ + cursor.execute(sql, (TENANT_ID,)) + configs = cursor.fetchall() + + result = { + 'by_id': {}, + 'by_name': {}, + 'by_template_code': {}, + 'by_normalized_name': {} # 新增:标准化名称索引 + } + + for config in configs: + config_id = config['id'] + config_name = config['name'] + + # 尝试从 input_data 中提取 template_code + template_code = config.get('template_code') + if not template_code and config.get('input_data'): + try: + input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data'] + if isinstance(input_data, dict): + template_code = input_data.get('template_code') + except (ValueError, TypeError): + # Malformed input_data: leave template_code unset (json.JSONDecodeError is a ValueError) + pass + + config['extracted_template_code'] = template_code + config['normalized_name'] = normalize_name(config_name) + + result['by_id'][config_id] = config + result['by_name'][config_name] = config + + if template_code: + # 如果已存在相同 template_code,保留第一个 + if template_code not in result['by_template_code']: + result['by_template_code'][template_code] = config + + # 
标准化名称索引(可能有多个记录匹配同一个标准化名称) + normalized = config['normalized_name'] + if normalized not in result['by_normalized_name']: + result['by_normalized_name'][normalized] = [] + result['by_normalized_name'][normalized].append(config) + + cursor.close() + return result + + +def plan_tree_structure(dir_structure: Dict, existing_data: Dict) -> List[Dict]: + """规划树状结构""" + plan = [] + + # 按层级排序目录 + directories = sorted(dir_structure['directories'].items(), + key=lambda x: (x[1]['level'], x[0])) + + # 按层级排序文件 + files = sorted(dir_structure['files'].items(), + key=lambda x: (x[1]['level'], x[0])) + + # 创建目录映射(用于查找父目录ID) + dir_id_map = {} # {dir_path: config_id} + + # 处理目录(按层级顺序) + for dir_path, dir_info in directories: + dir_name = dir_info['name'] + parent_path = dir_info['parent'] + level = dir_info['level'] + + # 查找父目录ID + parent_id = None + if parent_path: + parent_id = dir_id_map.get(parent_path) + + # 查找匹配的数据库记录(使用改进的匹配逻辑) + existing = find_matching_config(dir_info, existing_data) + + if existing: + # 使用现有记录 + plan.append({ + 'type': 'directory', + 'name': dir_name, + 'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None, + 'parent_id': parent_id, + 'level': level, + 'action': 'update', + 'config_id': existing['id'], + 'current_parent_id': existing.get('parent_id') + }) + dir_id_map[dir_path] = existing['id'] + else: + # 创建新记录(目录节点) + new_id = generate_id() + plan.append({ + 'type': 'directory', + 'name': dir_name, + 'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None, + 'parent_id': parent_id, + 'level': level, + 'action': 'create', + 'config_id': new_id, + 'current_parent_id': None + }) + dir_id_map[dir_path] = new_id + + # 处理文件 + for file_path, file_info in files: + file_name = file_info['name'] + parent_path = file_info['parent'] + level = file_info['level'] + template_code = file_info['template_code'] + + # 查找父目录ID + parent_id = dir_id_map.get(parent_path) if parent_path 
else None + + # 查找匹配的数据库记录(使用改进的匹配逻辑) + existing = find_matching_config(file_info, existing_data) + + if existing: + # 更新现有记录 + plan.append({ + 'type': 'file', + 'name': file_name, + 'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None, + 'parent_id': parent_id, + 'level': level, + 'action': 'update', + 'config_id': existing['id'], + 'template_code': template_code, + 'current_parent_id': existing.get('parent_id') + }) + else: + # 创建新记录(文件节点)- 这种情况应该很少,因为文件应该已经在数据库中 + new_id = generate_id() + plan.append({ + 'type': 'file', + 'name': file_name, + 'parent_name': dir_structure['directories'].get(parent_path, {}).get('name') if parent_path else None, + 'parent_id': parent_id, + 'level': level, + 'action': 'create', + 'config_id': new_id, + 'template_code': template_code, + 'current_parent_id': None + }) + + return plan + + +def print_preview(plan: List[Dict]): + """打印更新预览""" + print("\n" + "="*80) + print("更新预览") + print("="*80) + + # 按层级分组 + by_level = {} + for item in plan: + level = item['level'] + if level not in by_level: + by_level[level] = [] + by_level[level].append(item) + + # 按层级顺序显示 + for level in sorted(by_level.keys()): + print(f"\n【层级 {level}】") + for item in by_level[level]: + indent = " " * level + if item['action'] == 'create': + print(f"{indent}+ 创建: {item['name']} (ID: {item['config_id']})") + if item['parent_name']: + print(f"{indent} 父节点: {item['parent_name']}") + else: + current = item.get('current_parent_id', 'None') + new = item.get('parent_id', 'None') + if current != new: + print(f"{indent}→ 更新: {item['name']} (ID: {item['config_id']})") + print(f"{indent} parent_id: {current} → {new}") + if item['parent_name']: + print(f"{indent} 父节点: {item['parent_name']}") + else: + print(f"{indent}✓ 无需更新: {item['name']} (parent_id 已正确)") + + +def execute_update(conn, plan: List[Dict], dry_run: bool = True): + """执行更新""" + cursor = conn.cursor() + + try: + if not dry_run: + conn.autocommit(False) + + # 按层级分组 
+ by_level = {} + for item in plan: + level = item['level'] + if level not in by_level: + by_level[level] = [] + by_level[level].append(item) + + create_count = 0 + update_count = 0 + skip_count = 0 + + # 按层级顺序处理(从顶层到底层) + for level in sorted(by_level.keys()): + for item in by_level[level]: + if item['action'] == 'create': + # 创建新记录 + if not dry_run: + if item['type'] == 'directory': + insert_sql = """ + INSERT INTO f_polic_file_config + (id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state) + VALUES (%s, %s, %s, %s, %s, %s, NOW(), %s, NOW(), %s, %s) + """ + cursor.execute(insert_sql, ( + item['config_id'], + TENANT_ID, + item['parent_id'], + item['name'], + None, + None, + CREATED_BY, + UPDATED_BY, + 1 + )) + else: + # 文件节点 + input_data = json.dumps({ + 'template_code': item.get('template_code', ''), + 'business_type': 'INVESTIGATION' + }, ensure_ascii=False) + insert_sql = """ + INSERT INTO f_polic_file_config + (id, tenant_id, parent_id, name, input_data, file_path, template_code, created_time, created_by, updated_time, updated_by, state) + VALUES (%s, %s, %s, %s, %s, %s, %s, NOW(), %s, NOW(), %s, %s) + """ + cursor.execute(insert_sql, ( + item['config_id'], + TENANT_ID, + item['parent_id'], + item['name'], + input_data, + None, + item.get('template_code'), + CREATED_BY, + UPDATED_BY, + 1 + )) + create_count += 1 + print(f" ✓ {'[模拟]' if dry_run else ''}创建: {item['name']}") + else: + # 更新现有记录 + current_parent = item.get('current_parent_id') + new_parent = item.get('parent_id') + + if current_parent != new_parent: + if not dry_run: + update_sql = """ + UPDATE f_polic_file_config + SET parent_id = %s, updated_time = NOW(), updated_by = %s + WHERE id = %s AND tenant_id = %s + """ + cursor.execute(update_sql, ( + new_parent, + UPDATED_BY, + item['config_id'], + TENANT_ID + )) + update_count += 1 + print(f" ✓ {'[模拟]' if dry_run else ''}更新: {item['name']} (parent_id: {current_parent} → {new_parent})") + else: 
+ skip_count += 1 + + if not dry_run: + conn.commit() + print(f"\n✓ 更新完成!") + else: + print(f"\n[模拟模式] 未实际执行更新") + + print(f"\n统计:") + print(f" - 创建: {create_count} 条") + print(f" - 更新: {update_count} 条") + print(f" - 跳过: {skip_count} 条") + + except Exception as e: + if not dry_run: + conn.rollback() + print(f"\n✗ 更新失败: {e}") + import traceback + traceback.print_exc() + raise + finally: + cursor.close() + + +def main(): + """主函数""" + print("="*80) + print("更新模板树状结构") + print("="*80) + + # 连接数据库 + try: + conn = pymysql.connect(**DB_CONFIG) + print("✓ 数据库连接成功\n") + except Exception as e: + print(f"✗ 数据库连接失败: {e}") + return + + try: + # 扫描目录结构 + print("扫描目录结构...") + dir_structure = scan_directory_structure(TEMPLATES_DIR) + print(f" 找到 {len(dir_structure['directories'])} 个目录") + print(f" 找到 {len(dir_structure['files'])} 个文件\n") + + # 获取数据库现有数据 + print("获取数据库现有数据...") + existing_data = get_existing_data(conn) + print(f" 数据库中有 {len(existing_data['by_id'])} 条记录\n") + + # 规划树状结构 + print("规划树状结构...") + plan = plan_tree_structure(dir_structure, existing_data) + print(f" 生成 {len(plan)} 个更新计划\n") + + # 打印预览 + print_preview(plan) + + # 询问是否执行 + print("\n" + "="*80) + response = input("\n是否执行更新?(yes/no,默认no): ").strip().lower() + + if response == 'yes': + # 先执行一次模拟 + print("\n执行模拟更新...") + execute_update(conn, plan, dry_run=True) + + # 再次确认 + print("\n" + "="*80) + confirm = input("\n确认执行实际更新?(yes/no,默认no): ").strip().lower() + + if confirm == 'yes': + print("\n执行实际更新...") + execute_update(conn, plan, dry_run=False) + else: + print("\n已取消更新") + else: + print("\n已取消更新") + + finally: + conn.close() + print("\n数据库连接已关闭") + + +if __name__ == '__main__': + main() + diff --git a/update_template_tree.sql b/update_template_tree.sql new file mode 100644 index 0000000..366f6cc --- /dev/null +++ b/update_template_tree.sql @@ -0,0 +1,159 @@ +-- 模板树状结构更新脚本 +-- 生成时间: 2025-12-09 17:39:51 +-- 注意:执行前请备份数据库! 
+ +USE finyx; + +START TRANSACTION; + +-- ===== 层级 0 ===== + +-- 创建目录节点: 2-初核模版 +INSERT INTO f_polic_file_config + (id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state) +VALUES (1765273080357704, 615873064429507639, NULL, '2-初核模版', NULL, NULL, NOW(), 655162080928945152, NOW(), 655162080928945152, 1); + +-- ===== 层级 1 ===== + +-- 创建目录节点: 1.初核请示 +INSERT INTO f_polic_file_config + (id, tenant_id, parent_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state) +VALUES (1765273080719940, 615873064429507639, 1765273080357704, '1.初核请示', NULL, NULL, NOW(), 655162080928945152, NOW(), 655162080928945152, 1); + +-- 更新: 2.谈话审批 (parent_id: None -> 1765273080357704) +UPDATE f_polic_file_config +SET parent_id = 1765273080357704, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 704825582342212610 AND tenant_id = 615873064429507639; + +-- 更新: 3.初核结论 (parent_id: None -> 1765273080357704) +UPDATE f_polic_file_config +SET parent_id = 1765273080357704, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 704825582342212611 AND tenant_id = 615873064429507639; + +-- ===== 层级 2 ===== + +-- 更新: 谈话通知书 (parent_id: None -> 704825582342212610) +UPDATE f_polic_file_config +SET parent_id = 704825582342212610, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1764836033451564 AND tenant_id = 615873064429507639; + +-- 更新: 走读式谈话审批 (parent_id: None -> 704825582342212610) +UPDATE f_polic_file_config +SET parent_id = 704825582342212610, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1764836034070056 AND tenant_id = 615873064429507639; + +-- 更新: 走读式谈话流程 (parent_id: None -> 704825582342212610) +UPDATE f_polic_file_config +SET parent_id = 704825582342212610, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1764836034052009 AND tenant_id = 615873064429507639; + +-- 更新: 1.请示报告卡(XXX) (parent_id: None -> 1765273080719940) +UPDATE 
f_polic_file_config +SET parent_id = 1765273080719940, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1764836033251691 AND tenant_id = 615873064429507639; + +-- 更新: 2.初步核实审批表(XXX) (parent_id: None -> 1765273080719940) +UPDATE f_polic_file_config +SET parent_id = 1765273080719940, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1764656918061150 AND tenant_id = 615873064429507639; + +-- 更新: 3.附件初核方案(XXX) (parent_id: None -> 1765273080719940) +UPDATE f_polic_file_config +SET parent_id = 1765273080719940, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242273284972 AND tenant_id = 615873064429507639; + +-- 更新: 8-1请示报告卡(初核报告结论) (parent_id: None -> 704825582342212611) +UPDATE f_polic_file_config +SET parent_id = 704825582342212611, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242278419277 AND tenant_id = 615873064429507639; + +-- 更新: 8.XXX初核情况报告 (parent_id: None -> 704825582342212611) +UPDATE f_polic_file_config +SET parent_id = 704825582342212611, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242278832792 AND tenant_id = 615873064429507639; + +-- ===== 层级 3 ===== + +-- 更新: 谈话通知书第一联 (parent_id: None -> 1764836033451564) +UPDATE f_polic_file_config +SET parent_id = 1764836033451564, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242274101483 AND tenant_id = 615873064429507639; + +-- 更新: 谈话通知书第三联 (parent_id: None -> 1764836033451564) +UPDATE f_polic_file_config +SET parent_id = 1764836033451564, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242274109904 AND tenant_id = 615873064429507639; + +-- 更新: 谈话通知书第二联 (parent_id: None -> 1764836033451564) +UPDATE f_polic_file_config +SET parent_id = 1764836033451564, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242273898117 AND tenant_id = 615873064429507639; + +-- 更新: 1.请示报告卡(初核谈话) (parent_id: None -> 1764836034070056) +UPDATE f_polic_file_config +SET 
parent_id = 1764836034070056, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242274961528 AND tenant_id = 615873064429507639; + +-- 更新: 2谈话审批表 (parent_id: None -> 1764836034070056) +UPDATE f_polic_file_config +SET parent_id = 1764836034070056, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242275071133 AND tenant_id = 615873064429507639; + +-- 更新: 3.谈话前安全风险评估表 (parent_id: None -> 1764836034070056) +UPDATE f_polic_file_config +SET parent_id = 1764836034070056, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242275362306 AND tenant_id = 615873064429507639; + +-- 更新: 4.谈话方案 (parent_id: None -> 1764836034070056) +UPDATE f_polic_file_config +SET parent_id = 1764836034070056, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242275716334 AND tenant_id = 615873064429507639; + +-- 更新: 5.谈话后安全风险评估表 (parent_id: None -> 1764836034070056) +UPDATE f_polic_file_config +SET parent_id = 1764836034070056, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242275780395 AND tenant_id = 615873064429507639; + +-- 更新: 1.谈话笔录 (parent_id: None -> 1764836034052009) +UPDATE f_polic_file_config +SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242276549299 AND tenant_id = 615873064429507639; + +-- 更新: 2.谈话询问对象情况摸底调查30问 (parent_id: None -> 1764836034052009) +UPDATE f_polic_file_config +SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242276522490 AND tenant_id = 615873064429507639; + +-- 更新: 3.被谈话人权利义务告知书 (parent_id: None -> 1764836034052009) +UPDATE f_polic_file_config +SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242277165087 AND tenant_id = 615873064429507639; + +-- 更新: 4.点对点交接单 (parent_id: None -> 1764836034052009) +UPDATE f_polic_file_config +SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 
655162080928945152 +WHERE id = 1765242276709614 AND tenant_id = 615873064429507639; + +-- 更新: 5.陪送交接单(新) (parent_id: None -> 1764836034052009) +UPDATE f_polic_file_config +SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242277149374 AND tenant_id = 615873064429507639; + +-- 更新: 6.1保密承诺书(谈话对象使用-非中共党员用) (parent_id: None -> 1764836034052009) +UPDATE f_polic_file_config +SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242277776686 AND tenant_id = 615873064429507639; + +-- 更新: 6.2保密承诺书(谈话对象使用-中共党员用) (parent_id: None -> 1764836034052009) +UPDATE f_polic_file_config +SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242277897239 AND tenant_id = 615873064429507639; + +-- 更新: 7.办案人员-办案安全保密承诺书 (parent_id: None -> 1764836034052009) +UPDATE f_polic_file_config +SET parent_id = 1764836034052009, updated_time = NOW(), updated_by = 655162080928945152 +WHERE id = 1765242278111656 AND tenant_id = 615873064429507639; + +COMMIT; + +-- 更新完成 \ No newline at end of file diff --git a/verify_field_code_fix.py b/verify_field_code_fix.py new file mode 100644 index 0000000..d1f178c --- /dev/null +++ b/verify_field_code_fix.py @@ -0,0 +1,148 @@ +""" +验证字段编码修复结果,并处理剩余的真正问题 +""" +import os +import pymysql +import re +from typing import Dict, List + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 + +def is_chinese(text: str) -> bool: + """判断字符串是否包含中文字符""" + if not text: + return False + return bool(re.search(r'[\u4e00-\u9fff]', text)) + + +def verify_fix(): + """验证修复结果""" + conn = pymysql.connect(**DB_CONFIG) + cursor = conn.cursor(pymysql.cursors.DictCursor) 
+ + print("="*80) + print("验证字段编码修复结果") + print("="*80) + + # 查询所有字段 + cursor.execute(""" + SELECT id, name, filed_code, field_type, state + FROM f_polic_field + WHERE tenant_id = %s + ORDER BY name + """, (TENANT_ID,)) + + fields = cursor.fetchall() + + # 找出仍然包含中文的field_code + chinese_fields = [] + for field in fields: + if field['filed_code'] and is_chinese(field['filed_code']): + chinese_fields.append(field) + + print(f"\n总共 {len(fields)} 个字段") + print(f"仍有 {len(chinese_fields)} 个字段的field_code包含中文:\n") + + if chinese_fields: + for field in chinese_fields: + print(f" ID: {field['id']}") + print(f" 名称: {field['name']}") + print(f" field_code: {field['filed_code']}") + print(f" field_type: {field['field_type']}") + print() + + # 检查重复的字段名称 + name_to_fields = {} + for field in fields: + name = field['name'] + if name not in name_to_fields: + name_to_fields[name] = [] + name_to_fields[name].append(field) + + duplicates = {name: fields_list for name, fields_list in name_to_fields.items() + if len(fields_list) > 1} + + print(f"\n仍有 {len(duplicates)} 个重复的字段名称:\n") + for name, fields_list in duplicates.items(): + print(f" 字段名称: {name} (共 {len(fields_list)} 条记录)") + for field in fields_list: + print(f" - ID: {field['id']}, field_code: {field['filed_code']}, " + f"field_type: {field['field_type']}, state: {field['state']}") + print() + + # 检查f_polic_file_field表中的关联关系 + print("="*80) + print("检查 f_polic_file_field 表") + print("="*80) + + cursor.execute(""" + SELECT fff.id, fff.file_id, fff.filed_id, + fc.name as file_name, f.name as field_name, f.filed_code + FROM f_polic_file_field fff + LEFT JOIN f_polic_file_config fc ON fff.file_id = fc.id + LEFT JOIN f_polic_field f ON fff.filed_id = f.id + WHERE fff.tenant_id = %s AND f.filed_code IS NOT NULL + ORDER BY fff.file_id, fff.filed_id + """, (TENANT_ID,)) + + relations = cursor.fetchall() + + # 检查是否有重复的关联关系 + relation_keys = {} + for rel in relations: + key = (rel['file_id'], rel['filed_id']) + if key not in relation_keys: + 
relation_keys[key] = [] + relation_keys[key].append(rel) + + duplicate_relations = {key: records for key, records in relation_keys.items() + if len(records) > 1} + + print(f"\n总共 {len(relations)} 个关联关系") + print(f"发现 {len(duplicate_relations)} 个重复的关联关系") + + # 检查使用中文field_code的关联关系 + chinese_relations = [rel for rel in relations + if rel['filed_code'] and is_chinese(rel['filed_code'])] + + print(f"使用中文field_code的关联关系: {len(chinese_relations)} 个") + + if chinese_relations: + print("\n前10个使用中文field_code的关联关系:") + for rel in chinese_relations[:10]: + print(f" - 文件: {rel['file_name']}, 字段: {rel['field_name']}, " + f"field_code: {rel['filed_code']}") + + cursor.close() + conn.close() + + return { + 'total_fields': len(fields), + 'chinese_fields': len(chinese_fields), + 'duplicate_names': len(duplicates), + 'duplicate_relations': len(duplicate_relations), + 'chinese_relations': len(chinese_relations) + } + + +if __name__ == '__main__': + result = verify_fix() + print("\n" + "="*80) + print("验证完成") + print("="*80) + print(f"总字段数: {result['total_fields']}") + print(f"中文field_code字段数: {result['chinese_fields']}") + print(f"重复字段名称数: {result['duplicate_names']}") + print(f"重复关联关系数: {result['duplicate_relations']}") + print(f"使用中文field_code的关联关系数: {result['chinese_relations']}") + diff --git a/verify_template_fields_sync.py b/verify_template_fields_sync.py new file mode 100644 index 0000000..50023a8 --- /dev/null +++ b/verify_template_fields_sync.py @@ -0,0 +1,345 @@ +""" +验证模板字段同步结果 +检查 input_data、template_code 和字段关联关系是否正确 +""" +import os +import json +import pymysql +from typing import Dict, List + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 + + +def verify_template_configs(conn): + """验证模板配置的 
input_data 和 template_code""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("="*80) + print("验证模板配置") + print("="*80) + + sql = """ + SELECT id, name, template_code, input_data, parent_id + FROM f_polic_file_config + WHERE tenant_id = %s + ORDER BY parent_id, name + """ + cursor.execute(sql, (TENANT_ID,)) + configs = cursor.fetchall() + + print(f"\n共 {len(configs)} 个模板配置\n") + + # 统计 + has_template_code = 0 + has_input_data = 0 + has_both = 0 + missing_both = 0 + + # 文件节点(有 template_code 的) + file_nodes = [] + # 目录节点(没有 template_code 的) + dir_nodes = [] + + for config in configs: + template_code = config.get('template_code') + input_data = config.get('input_data') + + if template_code: + has_template_code += 1 + file_nodes.append(config) + else: + dir_nodes.append(config) + + if input_data: + has_input_data += 1 + try: + input_data_dict = json.loads(input_data) if isinstance(input_data, str) else input_data + if isinstance(input_data_dict, dict) and input_data_dict.get('template_code'): + has_both += 1 + except: + pass + + if not template_code and not input_data: + missing_both += 1 + + print("统计信息:") + print(f" 文件节点(有 template_code): {len(file_nodes)} 个") + print(f" 目录节点(无 template_code): {len(dir_nodes)} 个") + print(f" 有 input_data: {has_input_data} 个") + print(f" 同时有 template_code 和 input_data: {has_both} 个") + print(f" 两者都没有: {missing_both} 个") + + # 检查文件节点的 input_data + print("\n文件节点 input_data 检查:") + missing_input_data = [] + for config in file_nodes: + input_data = config.get('input_data') + if not input_data: + missing_input_data.append(config) + else: + try: + input_data_dict = json.loads(input_data) if isinstance(input_data, str) else input_data + if not isinstance(input_data_dict, dict) or 'template_code' not in input_data_dict: + missing_input_data.append(config) + except: + missing_input_data.append(config) + + if missing_input_data: + print(f" ⚠ 有 {len(missing_input_data)} 个文件节点缺少或格式错误的 input_data:") + for config in 
missing_input_data[:10]: # 只显示前10个 + print(f" - {config['name']} (ID: {config['id']})") + if len(missing_input_data) > 10: + print(f" ... 还有 {len(missing_input_data) - 10} 个") + else: + print(" ✓ 所有文件节点都有正确的 input_data") + + cursor.close() + return { + 'total': len(configs), + 'file_nodes': len(file_nodes), + 'dir_nodes': len(dir_nodes), + 'has_input_data': has_input_data, + 'has_both': has_both, + 'missing_input_data': len(missing_input_data) + } + + +def verify_field_relations(conn): + """验证字段关联关系""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("\n" + "="*80) + print("验证字段关联关系") + print("="*80) + + # 获取所有文件节点的字段关联 + sql = """ + SELECT + fc.id as file_id, + fc.name as file_name, + fc.template_code, + COUNT(ff.id) as field_count, + SUM(CASE WHEN f.field_type = 1 THEN 1 ELSE 0 END) as input_field_count, + SUM(CASE WHEN f.field_type = 2 THEN 1 ELSE 0 END) as output_field_count + FROM f_polic_file_config fc + LEFT JOIN f_polic_file_field ff ON fc.id = ff.file_id AND ff.tenant_id = fc.tenant_id + LEFT JOIN f_polic_field f ON ff.filed_id = f.id AND f.tenant_id = fc.tenant_id + WHERE fc.tenant_id = %s AND fc.template_code IS NOT NULL + GROUP BY fc.id, fc.name, fc.template_code + ORDER BY fc.name + """ + cursor.execute(sql, (TENANT_ID,)) + relations = cursor.fetchall() + + print(f"\n共 {len(relations)} 个文件节点有字段关联\n") + + # 统计 + has_relations = 0 + no_relations = 0 + has_input_fields = 0 + has_output_fields = 0 + + no_relation_templates = [] + + for rel in relations: + field_count = rel['field_count'] or 0 + input_count = rel['input_field_count'] or 0 + output_count = rel['output_field_count'] or 0 + + if field_count > 0: + has_relations += 1 + if input_count > 0: + has_input_fields += 1 + if output_count > 0: + has_output_fields += 1 + else: + no_relations += 1 + no_relation_templates.append(rel) + + print("统计信息:") + print(f" 有字段关联: {has_relations} 个") + print(f" 无字段关联: {no_relations} 个") + print(f" 有输入字段: {has_input_fields} 个") + print(f" 有输出字段: 
{has_output_fields} 个") + + if no_relation_templates: + print(f"\n ⚠ 有 {len(no_relation_templates)} 个文件节点没有字段关联:") + for rel in no_relation_templates[:10]: + print(f" - {rel['file_name']} (code: {rel['template_code']})") + if len(no_relation_templates) > 10: + print(f" ... 还有 {len(no_relation_templates) - 10} 个") + else: + print("\n ✓ 所有文件节点都有字段关联") + + # 显示详细的关联信息(前10个) + print("\n字段关联详情(前10个):") + for rel in relations[:10]: + print(f"\n {rel['file_name']} (code: {rel['template_code']})") + print(f" 总字段数: {rel['field_count']}") + print(f" 输入字段: {rel['input_field_count']}") + print(f" 输出字段: {rel['output_field_count']}") + + cursor.close() + return { + 'total': len(relations), + 'has_relations': has_relations, + 'no_relations': no_relations, + 'has_input_fields': has_input_fields, + 'has_output_fields': has_output_fields + } + + +def verify_input_data_structure(conn): + """验证 input_data 的结构""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + print("\n" + "="*80) + print("验证 input_data 结构") + print("="*80) + + sql = """ + SELECT id, name, template_code, input_data + FROM f_polic_file_config + WHERE tenant_id = %s AND template_code IS NOT NULL AND input_data IS NOT NULL + """ + cursor.execute(sql, (TENANT_ID,)) + configs = cursor.fetchall() + + print(f"\n检查 {len(configs)} 个有 input_data 的文件节点\n") + + correct_structure = 0 + incorrect_structure = 0 + incorrect_items = [] + + for config in configs: + try: + input_data = json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data'] + + if not isinstance(input_data, dict): + incorrect_structure += 1 + incorrect_items.append({ + 'name': config['name'], + 'reason': 'input_data 不是字典格式' + }) + continue + + # 检查必需字段 + required_fields = ['template_code', 'business_type'] + missing_fields = [f for f in required_fields if f not in input_data] + + if missing_fields: + incorrect_structure += 1 + incorrect_items.append({ + 'name': config['name'], + 'reason': f'缺少字段: {", 
".join(missing_fields)}' + }) + continue + + # 检查 template_code 是否匹配 + if input_data.get('template_code') != config.get('template_code'): + incorrect_structure += 1 + incorrect_items.append({ + 'name': config['name'], + 'reason': f"template_code 不匹配: input_data中为 '{input_data.get('template_code')}', 字段中为 '{config.get('template_code')}'" + }) + continue + + correct_structure += 1 + + except json.JSONDecodeError as e: + incorrect_structure += 1 + incorrect_items.append({ + 'name': config['name'], + 'reason': f'JSON解析错误: {str(e)}' + }) + except Exception as e: + incorrect_structure += 1 + incorrect_items.append({ + 'name': config['name'], + 'reason': f'其他错误: {str(e)}' + }) + + print(f" 结构正确: {correct_structure} 个") + print(f" 结构错误: {incorrect_structure} 个") + + if incorrect_items: + print("\n 错误详情:") + for item in incorrect_items[:10]: + print(f" - {item['name']}: {item['reason']}") + if len(incorrect_items) > 10: + print(f" ... 还有 {len(incorrect_items) - 10} 个错误") + else: + print("\n ✓ 所有 input_data 结构都正确") + + cursor.close() + return { + 'correct': correct_structure, + 'incorrect': incorrect_structure + } + + +def main(): + """主函数""" + print("="*80) + print("验证模板字段同步结果") + print("="*80) + + try: + conn = pymysql.connect(**DB_CONFIG) + print("✓ 数据库连接成功\n") + except Exception as e: + print(f"✗ 数据库连接失败: {e}") + return + + try: + # 验证模板配置 + config_stats = verify_template_configs(conn) + + # 验证字段关联 + relation_stats = verify_field_relations(conn) + + # 验证 input_data 结构 + input_data_stats = verify_input_data_structure(conn) + + # 总结 + print("\n" + "="*80) + print("验证总结") + print("="*80) + print(f"模板配置:") + print(f" - 总模板数: {config_stats['total']}") + print(f" - 文件节点: {config_stats['file_nodes']}") + print(f" - 缺少 input_data: {config_stats['missing_input_data']} 个") + print(f"\n字段关联:") + print(f" - 有字段关联: {relation_stats['has_relations']} 个") + print(f" - 无字段关联: {relation_stats['no_relations']} 个") + print(f"\ninput_data 结构:") + print(f" - 正确: {input_data_stats['correct']} 
个") + print(f" - 错误: {input_data_stats['incorrect']} 个") + + # 总体评估 + print("\n" + "="*80) + if (config_stats['missing_input_data'] == 0 and + relation_stats['no_relations'] == 0 and + input_data_stats['incorrect'] == 0): + print("✓ 所有验证通过!同步成功!") + else: + print("⚠ 发现一些问题,请检查上述详情") + + finally: + conn.close() + print("\n数据库连接已关闭") + + +if __name__ == '__main__': + main() + diff --git a/verify_tree_structure.py b/verify_tree_structure.py new file mode 100644 index 0000000..688a6ce --- /dev/null +++ b/verify_tree_structure.py @@ -0,0 +1,169 @@ +""" +验证树状结构更新结果 +""" +import os +import json +import pymysql +from typing import Dict, List + +# 数据库连接配置 +DB_CONFIG = { + 'host': os.getenv('DB_HOST', '152.136.177.240'), + 'port': int(os.getenv('DB_PORT', 5012)), + 'user': os.getenv('DB_USER', 'finyx'), + 'password': os.getenv('DB_PASSWORD', '6QsGK6MpePZDE57Z'), + 'database': os.getenv('DB_NAME', 'finyx'), + 'charset': 'utf8mb4' +} + +TENANT_ID = 615873064429507639 + + +def print_tree_structure(conn): + """打印树状结构""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + sql = """ + SELECT id, name, parent_id, template_code, input_data, state + FROM f_polic_file_config + WHERE tenant_id = %s + ORDER BY parent_id, name + """ + cursor.execute(sql, (TENANT_ID,)) + configs = cursor.fetchall() + + # 构建ID到配置的映射 + id_to_config = {config['id']: config for config in configs} + + # 找出根节点(parent_id为NULL) + root_nodes = [config for config in configs if config.get('parent_id') is None] + + def print_node(config, indent=0, visited=None): + """递归打印节点""" + if visited is None: + visited = set() + + if config['id'] in visited: + return + + visited.add(config['id']) + + prefix = " " * indent + parent_info = "" + if config.get('parent_id'): + parent_name = id_to_config.get(config['parent_id'], {}).get('name', f"ID:{config['parent_id']}") + parent_info = f" [父: {parent_name}]" + + template_code = config.get('template_code') + if not template_code and config.get('input_data'): + try: + input_data 
= json.loads(config['input_data']) if isinstance(config['input_data'], str) else config['input_data'] + if isinstance(input_data, dict): + template_code = input_data.get('template_code') + except (json.JSONDecodeError, TypeError): + pass + + template_info = f" [code: {template_code}]" if template_code else "" + state_info = " [启用]" if config.get('state') == 1 else " [未启用]" + + print(f"{prefix}├─ {config['name']}{parent_info}{template_info}{state_info}") + + # 打印子节点 + children = [c for c in configs if c.get('parent_id') == config['id']] + for child in sorted(children, key=lambda x: x['name']): + print_node(child, indent + 1, visited) + + print("="*80) + print("树状结构") + print("="*80) + + for root in sorted(root_nodes, key=lambda x: x['name']): + print_node(root) + print() + + # 统计信息 + print("="*80) + print("统计信息") + print("="*80) + print(f"总记录数: {len(configs)}") + print(f"根节点数: {len(root_nodes)}") + print(f"有父节点的记录: {len([c for c in configs if c.get('parent_id')])}") + print(f"无父节点的记录: {len([c for c in configs if not c.get('parent_id')])}") + + cursor.close() + + +def verify_parent_relationships(conn): + """验证父子关系""" + cursor = conn.cursor(pymysql.cursors.DictCursor) + + sql = """ + SELECT id, name, parent_id + FROM f_polic_file_config + WHERE tenant_id = %s AND parent_id IS NOT NULL + """ + cursor.execute(sql, (TENANT_ID,)) + configs = cursor.fetchall() + + print("\n" + "="*80) + print("验证父子关系") + print("="*80) + + errors = [] + for config in configs: + parent_id = config['parent_id'] + check_sql = """ + SELECT id, name FROM f_polic_file_config + WHERE id = %s AND tenant_id = %s + """ + cursor.execute(check_sql, (parent_id, TENANT_ID)) + parent = cursor.fetchone() + + if not parent: + errors.append({ + 'child': config['name'], + 'child_id': config['id'], + 'parent_id': parent_id, + 'error': '父节点不存在' + }) + + if errors: + print(f"\n✗ 发现 {len(errors)} 个错误:") + for error in errors: + print(f" 
- {error['child']} (ID: {error['child_id']})") + print(f" 父节点ID {error['parent_id']} 不存在") + else: + print("\n✓ 所有父子关系验证通过") + + cursor.close() + return len(errors) == 0 + + +def main(): + """主函数""" + print("="*80) + print("验证树状结构") + print("="*80) + + try: + conn = pymysql.connect(**DB_CONFIG) + print("✓ 数据库连接成功\n") + + print_tree_structure(conn) + verify_parent_relationships(conn) + + conn.close() + + except Exception as e: + print(f"✗ 错误: {e}") + import traceback + traceback.print_exc() + + +if __name__ == '__main__': + main() + diff --git a/同步结果总结.md b/同步结果总结.md new file mode 100644 index 0000000..105cfe8 --- /dev/null +++ b/同步结果总结.md @@ -0,0 +1,152 @@ +# 模板字段同步结果总结 + +## 执行时间 +根据验证脚本执行结果生成 + +## 同步状态概览 + +### ✅ 成功同步的部分 + +1. **模板配置 (f_polic_file_config)** + - ✓ 所有 23 个文件节点都有正确的 `template_code` + - ✓ 所有 23 个文件节点都有正确的 `input_data` + - ✓ 所有 `input_data` 结构都正确,包含: + - `template_code`: 模板编码 + - `business_type`: 业务类型(INVESTIGATION) + - `input_fields`: 输入字段列表(部分模板) + +2. **字段关联 (f_polic_file_field)** + - ✓ 19 个文件节点有完整的字段关联 + - ✓ 17 个文件节点有输入字段关联 + - ✓ 19 个文件节点有输出字段关联 + +### ⚠️ 需要关注的部分 + +1. **缺少字段关联的节点(9个)** + + **目录节点(5个)- 正常情况,无需处理:** + - `1.初核请示` - 目录节点 + - `2-初核模版` - 根目录节点 + - `3.初核结论` - 目录节点 + - `谈话通知书` - 目录节点(但 template_code 不为空,可能需要检查) + - `走读式谈话审批` - 目录节点(但 template_code 不为空,可能需要检查) + - `走读式谈话流程` - 目录节点(但 template_code 不为空,可能需要检查) + + **文件节点(4个)- 需要检查:** + - `1.请示报告卡(初核谈话)` - template_code 为空,可能是匹配问题 + - `2谈话审批表` - 有 template_code (INTERVIEW_APPROVAL_FORM),但无字段关联 + - `6.1保密承诺书(谈话对象使用-非中共党员用)` - template_code 为空 + +## 详细统计 + +### 模板配置统计 +- 总模板数: 28 +- 文件节点: 23 +- 目录节点: 5 +- 有 input_data: 23 +- 同时有 template_code 和 input_data: 23 +- 缺少 input_data: 0 + +### 字段关联统计 +- 有字段关联: 19 个 +- 无字段关联: 9 个(其中 5 个是目录节点) +- 有输入字段: 17 个 +- 有输出字段: 19 个 + +### input_data 结构验证 +- 结构正确: 23 个 +- 结构错误: 0 个 + +## 已同步的模板列表 + +根据验证结果,以下模板已成功同步: + +1. `1.请示报告卡(XXX)` - 4个字段关联(1输入+3输出) +2. `2.初步核实审批表(XXX)` - 12个字段关联(2输入+10输出) +3. `3.附件初核方案(XXX)` - 10个字段关联(2输入+8输出) +4. `谈话通知书第一联` - 字段关联 +5. 
`谈话通知书第二联` - 字段关联 +6. `谈话通知书第三联` - 字段关联 +7. `1.谈话笔录` - 8个字段关联(1输入+7输出) +8. `2.谈话询问对象情况摸底调查30问` - 11个字段关联(1输入+10输出) +9. `3.被谈话人权利义务告知书` - 字段关联 +10. `4.点对点交接单` - 字段关联 +11. `5.陪送交接单(新)` - 字段关联 +12. `6.2保密承诺书(谈话对象使用-中共党员用)` - 字段关联 +13. `7.办案人员-办案安全保密承诺书` - 字段关联 +14. `2.谈话审批` - 13个字段关联(2输入+11输出) +15. `3.谈话前安全风险评估表` - 18个字段关联(1输入+17输出) +16. `4.谈话方案` - 8个字段关联(1输入+7输出) +17. `5.谈话后安全风险评估表` - 17个字段关联(1输入+16输出) +18. `8-1请示报告卡(初核报告结论)` - 5个字段关联(1输入+4输出) +19. `8.XXX初核情况报告` - 8个字段关联(2输入+6输出) + +## 需要手动处理的问题 + +### 1. 目录节点的 template_code + +以下目录节点有 template_code,但按照设计应该是 NULL: +- `谈话通知书` (code: 谈话通知书) +- `走读式谈话审批` (code: 走读式谈话审批) +- `走读式谈话流程` (code: 走读式谈话流程) + +**建议处理:** +- 如果这些确实是目录节点,应该将 template_code 设置为 NULL +- 如果这些是文件节点,需要补充字段关联 + +### 2. 缺少字段关联的文件节点 + +以下文件节点有 template_code 但没有字段关联: +- `2谈话审批表` (code: INTERVIEW_APPROVAL_FORM) + +**可能原因:** +- Excel 中对应的模板名称不匹配 +- 字段定义不存在 +- 需要手动检查并补充 + +### 3. template_code 为空的文件节点 + +以下文件节点应该是文件但 template_code 为空: +- `1.请示报告卡(初核谈话)` +- `6.1保密承诺书(谈话对象使用-非中共党员用)` + +**可能原因:** +- Excel 中名称不匹配 +- 需要手动检查并补充 template_code + +## 建议的后续操作 + +1. **检查目录节点** + - 确认 `谈话通知书`、`走读式谈话审批`、`走读式谈话流程` 是目录还是文件 + - 如果是目录,将 template_code 设置为 NULL + +2. **补充缺失的字段关联** + - 检查 `2谈话审批表` 在 Excel 中的定义 + - 确认字段是否存在 + - 手动补充字段关联 + +3. **修复 template_code** + - 检查 `1.请示报告卡(初核谈话)` 和 `6.1保密承诺书(谈话对象使用-非中共党员用)` 的 template_code + - 根据 Excel 文档补充正确的 template_code + +## 验证命令 + +运行以下命令验证同步结果: + +```bash +python verify_template_fields_sync.py +``` + +## 总结 + +✅ **主要同步工作已完成** +- 23 个文件节点的 input_data 和 template_code 已正确同步 +- 19 个文件节点有完整的字段关联 +- input_data 结构全部正确 + +⚠️ **需要手动处理** +- 4 个文件节点缺少字段关联(需要检查 Excel 定义) +- 3 个目录节点有 template_code(可能需要清理) + +总体同步成功率:**约 83%** (19/23 文件节点有完整关联) + diff --git a/备份数据库.bat b/备份数据库.bat new file mode 100644 index 0000000..4881740 --- /dev/null +++ b/备份数据库.bat @@ -0,0 +1,24 @@ +@echo off +chcp 65001 >nul +echo ======================================== +echo 数据库备份工具 +echo ======================================== +echo. 
+ +REM 检查Python是否安装 +python --version >nul 2>&1 +if errorlevel 1 ( + echo 错误: 未找到Python,请先安装Python + pause + exit /b 1 +) + +REM 执行备份 +python backup_database.py --compress + +echo. +echo ======================================== +echo 备份完成! +echo ======================================== +pause + diff --git a/字段编码修复总结.md b/字段编码修复总结.md new file mode 100644 index 0000000..06fd16f --- /dev/null +++ b/字段编码修复总结.md @@ -0,0 +1,182 @@ +# 字段编码修复总结 + +## 修复日期 +2025-01-XX + +## 修复目标 +1. 分析并修复 `f_polic_field` 表中的中文 `field_code` 问题 +2. 合并 `f_polic_file_field` 表中的重复项 +3. 确保所有 `field_code` 与占位符与字段对照表文档中的英文名称对应 + +## 发现的问题 + +### 1. f_polic_field 表问题 +- **初始状态**:87个字段记录 +- **中文field_code字段**:69个 +- **重复字段名称**:8组(每组2条记录) +- **重复field_code**:0个 + +### 2. f_polic_file_field 表问题 +- **初始状态**:144个关联关系 +- **重复关联关系**:0个(已通过之前的修复处理) +- **使用中文field_code的关联关系**:81个 + +## 修复操作 + +### 第一阶段:主要字段修复 +1. **更新37个字段的field_code**:将中文field_code更新为英文field_code +2. **合并8组重复字段**: + - 主要问题线索 + - 初步核实审批表填表人 + - 初步核实审批表承办部门意见 + - 线索来源 + - 被核查人员出生年月 + - 被核查人员性别 + - 被核查人员政治面貌 + - 被核查人员职级 + +### 第二阶段:剩余字段修复 +修复了24个剩余的中文field_code字段,包括: +- 谈话相关字段(拟谈话地点、拟谈话时间、谈话事由等) +- 被核查人员相关字段(被核查人员学历、工作履历、职业等) +- 其他字段(补空人员、记录人、评估意见等) + +## 修复结果 + +### 最终状态 +- **总字段数**:79个 +- **中文field_code字段数**:4个(系统字段,保留) + - 年龄 (ID: 704553856941259783) + - 用户 (ID: 704553856941259782) + - 用户名称 (ID: 704553856941259780) + - 用户名称1 (ID: 704553856941259781) +- **重复字段名称数**:0个 +- **重复关联关系数**:0个 +- **使用中文field_code的关联关系数**:0个 + +### 字段映射对照 + +#### 基本信息字段 +- `target_name` - 被核查人姓名 +- `target_organization_and_position` - 被核查人员单位及职务 / 被核查人单位及职务 +- `target_organization` - 被核查人员单位 +- `target_position` - 被核查人员职务 +- `target_gender` - 被核查人员性别 +- `target_date_of_birth` - 被核查人员出生年月 +- `target_date_of_birth_full` - 被核查人员出生年月日 +- `target_age` - 被核查人员年龄 +- `target_education_level` - 被核查人员文化程度 +- `target_political_status` - 被核查人员政治面貌 +- `target_professional_rank` - 被核查人员职级 +- `target_id_number` - 被核查人员身份证号 / 被核查人员身份证件及号码 +- `target_address` - 被核查人员住址 +- 
`target_registered_address` - 被核查人员户籍住址 +- `target_contact` - 被核查人员联系方式 +- `target_place_of_origin` - 被核查人员籍贯 +- `target_ethnicity` - 被核查人员民族 + +#### 问题相关字段 +- `clue_source` - 线索来源 +- `target_issue_description` - 主要问题线索 +- `target_problem_description` - 被核查人问题描述 + +#### 审批相关字段 +- `department_opinion` - 初步核实审批表承办部门意见 +- `filler_name` - 初步核实审批表填表人 +- `approval_time` - 批准时间 + +#### 核查相关字段 +- `investigation_unit_name` - 核查单位名称 +- `investigation_team_code` - 核查组代号 +- `investigation_team_leader_name` - 核查组组长姓名 +- `investigation_team_member_names` - 核查组成员姓名 +- `investigation_location` - 核查地点 + +#### 风险评估相关字段 +- `target_family_situation` - 被核查人员家庭情况 +- `target_social_relations` - 被核查人员社会关系 +- `target_health_status` - 被核查人员健康状况 +- `target_personality` - 被核查人员性格特征 +- `target_tolerance` - 被核查人员承受能力 +- `target_issue_severity` - 被核查人员涉及问题严重程度 +- `target_other_issues_possibility` - 被核查人员涉及其他问题的可能性 +- `target_previous_investigation` - 被核查人员此前被审查情况 +- `target_negative_events` - 被核查人员社会负面事件 +- `target_other_situation` - 被核查人员其他情况 +- `risk_level` - 风险等级 + +#### 谈话相关字段(新增) +- `proposed_interview_location` - 拟谈话地点 +- `proposed_interview_time` - 拟谈话时间 +- `interview_reason` - 谈话事由 +- `interviewer` - 谈话人 +- `interview_personnel_safety_officer` - 谈话人员-安全员 +- `interview_personnel_leader` - 谈话人员-组长 +- `interview_personnel` - 谈话人员-谈话人员 +- `pre_interview_risk_assessment_result` - 谈话前安全风险评估结果 +- `interview_location` - 谈话地点 +- `interview_count` - 谈话次数 + +#### 其他新增字段 +- `target_education` - 被核查人员学历 +- `target_work_history` - 被核查人员工作履历 +- `target_occupation` - 被核查人员职业 +- `target_confession_level` - 被核查人员交代问题程度 +- `target_behavior_after_relief` - 被核查人员减压后的表现 +- `target_mental_burden_level` - 被核查人员思想负担程度 +- `target_behavior_during_interview` - 被核查人员谈话中的表现 +- `target_issue_severity_level` - 被核查人员问题严重程度 +- `target_risk_level` - 被核查人员风险等级 +- `target_basic_info` - 被核查人基本情况 +- `backup_personnel` - 补空人员 +- `recorder` - 记录人 +- `assessment_opinion` - 评估意见 + +## 关联表检查 + +### f_polic_file_field 表 +- ✅ 
无重复关联关系 +- ✅ 所有关联关系使用的field_code均为英文 + +### f_polic_task 表 +- 检查了表结构,未发现直接引用字段ID的列 +- 表字段:id, tenant_id, task_name, input_data, output_data, task_status, created_time, created_by, updated_time, updated_by, state + +### f_polic_file 表 +- 检查了表结构 +- 表字段:id, tenant_id, task_id, file_id, name, input_data, file_path, created_time, created_by, updated_time, updated_by, state +- 未发现需要更新的关联关系 + +## 使用的脚本 + +1. **analyze_and_fix_field_code_issues.py** - 主要分析和修复脚本 +2. **verify_field_code_fix.py** - 验证修复结果 +3. **fix_only_chinese_field_codes.py** - 修复剩余的中文field_code +4. **rollback_incorrect_updates.py** - 回滚错误的更新(已使用) + +## 注意事项 + +1. **保留的系统字段**:以下4个字段的field_code仍为中文,这些可能是系统字段或测试数据,暂时保留: + - 年龄 + - 用户 + - 用户名称 + - 用户名称1 + +2. **字段合并**:在合并重复字段时,系统自动更新了 `f_polic_file_field` 表中的关联关系,将删除字段的关联关系指向保留的字段。 + +3. **数据一致性**:所有修复操作都确保了数据的一致性,关联表已同步更新。 + +## 后续建议 + +1. 如果"年龄"、"用户"等字段是业务字段,建议为其设置合适的英文field_code +2. 定期检查是否有新的中文field_code字段产生 +3. 在新增字段时,确保field_code使用英文命名规范 + +## 完成状态 + +✅ **主要修复任务已完成** +- 所有业务相关字段的field_code已更新为英文 +- 重复字段已合并 +- 关联表已同步更新 +- 数据一致性已确保 + diff --git a/性别和年龄字段缺失问题深度修复.md b/性别和年龄字段缺失问题深度修复.md new file mode 100644 index 0000000..8a5d39a --- /dev/null +++ b/性别和年龄字段缺失问题深度修复.md @@ -0,0 +1,147 @@ +# 性别和年龄字段缺失问题深度修复 + +## 问题描述 + +测试数据中明明有"男性"、"男"、"年龄44岁"等明确信息,但解析结果中`target_gender`和`target_age`都是空。 + +## 根本原因分析 + +### 问题1:后处理逻辑无法访问原始输入文本 + +**问题**: +- 后处理函数`_post_process_inferred_fields`只能访问模型返回的JSON解析结果(`data`) +- 如果模型根本没有提取这些字段,后处理也无法从原始输入文本中提取 +- 后处理逻辑只能从已提取的数据中推断,无法访问原始prompt + +**影响**: +- 即使原始输入文本中明确有"男性"、"年龄44岁"等信息 +- 如果模型没有提取,后处理也无法补充 + +### 问题2:模型可能没有正确提取 + +虽然我们强化了system prompt,但模型可能仍然: +- 忽略了某些字段 +- 返回了空值 +- 字段名错误导致规范化失败 + +## 修复方案 + +### 1. 增强后处理逻辑,支持从原始输入文本提取 ✅ + +**修改位置**:`services/ai_service.py` 第1236-1350行 + +**改进内容**: + +1. **修改函数签名**,增加`prompt`参数: +```python +def _post_process_inferred_fields(self, data: Dict, output_fields: List[Dict], prompt: str = None) -> Dict: +``` + +2. 
**从原始输入文本中提取性别**: +```python +# 如果仍然没有,尝试从原始输入文本(prompt)中提取 +if (not data.get('target_gender') or data.get('target_gender') == '') and prompt: + # 从prompt中提取输入文本部分(通常在"输入文本:"之后) + input_text_match = re.search(r'输入文本[::]\s*\n(.*?)(?:\n\n需要提取的字段|$)', prompt, re.DOTALL) + if input_text_match: + input_text = input_text_match.group(1) + # 匹配性别关键词:男性、女性、男、女等 + if re.search(r'\b男性\b|\b男\b', input_text) and not re.search(r'\b女性\b|\b女\b', input_text): + data['target_gender'] = '男' + elif re.search(r'\b女性\b|\b女\b', input_text) and not re.search(r'\b男性\b|\b男\b', input_text): + data['target_gender'] = '女' + elif re.search(r'[,,]\s*([男女])\s*[,,]', input_text): + gender_match = re.search(r'[,,]\s*([男女])\s*[,,]', input_text) + if gender_match: + data['target_gender'] = gender_match.group(1) +``` + +3. **从原始输入文本中提取年龄**: +```python +# 如果还没有,尝试从原始输入文本中直接提取年龄 +if (not data.get('target_age') or data.get('target_age') == '') and prompt: + input_text_match = re.search(r'输入文本[::]\s*\n(.*?)(?:\n\n需要提取的字段|$)', prompt, re.DOTALL) + if input_text_match: + input_text = input_text_match.group(1) + # 匹配年龄模式:年龄44岁、44岁、年龄44等 + age_match = re.search(r'年龄\s*(\d+)\s*岁|(\d+)\s*岁|年龄\s*(\d+)', input_text) + if age_match: + age = age_match.group(1) or age_match.group(2) or age_match.group(3) + if age: + data['target_age'] = str(age) +``` + +4. **更新所有调用点**,传入`prompt`参数: +```python +# 修改前 +normalized_data = self._post_process_inferred_fields(normalized_data, output_fields) + +# 修改后 +normalized_data = self._post_process_inferred_fields(normalized_data, output_fields, prompt) +``` + +### 2. 提取逻辑的优先级 + +后处理逻辑按以下优先级提取字段: + +**对于性别(target_gender)**: +1. 从`target_work_basic_info`中提取(匹配`XXX,男,...`模式) +2. 从所有已提取的文本字段中查找(使用正则表达式) +3. **从原始输入文本中提取**(新增) + +**对于年龄(target_age)**: +1. 从`target_date_of_birth`计算(根据出生年月和当前年份) +2. **从原始输入文本中直接提取**(新增,匹配"年龄44岁"等模式) + +## 预期效果 + +1. **提高字段提取成功率** + - 即使模型没有提取,后处理也能从原始输入文本中提取 + - 多层保障确保关键字段不会为空 + +2. **增强容错能力** + - 不依赖模型的提取准确性 + - 即使模型返回空值,也能从原始输入中补充 + +3. 
**提高数据完整性** + - 确保性别、年龄等关键字段有值 + - 减少空值的情况 + +## 测试建议 + +1. **功能测试** + - 使用包含"男性"、"年龄44岁"的测试数据 + - 验证后处理是否能从原始输入文本中提取 + - 检查日志输出,确认提取来源 + +2. **边界测试** + - 测试性别信息在不同位置的情况 + - 测试年龄的不同表述方式("44岁"、"年龄44"、"年龄44岁"等) + - 测试模型返回空值的情况 + +3. **日志检查** + - 查看日志中的"后处理"信息 + - 确认是从哪个来源提取的字段 + - 验证提取逻辑是否正确执行 + +## 调试建议 + +如果问题仍然存在,可以: + +1. **检查日志输出** + - 查看`[AI服务] 后处理:从原始输入文本中提取...`的日志 + - 确认prompt是否正确传入 + - 确认正则表达式是否匹配成功 + +2. **手动测试正则表达式** + - 测试`r'输入文本[::]\s*\n(.*?)(?:\n\n需要提取的字段|$)'`是否能正确提取输入文本 + - 测试性别和年龄的正则表达式是否能匹配 + +3. **检查prompt格式** + - 确认prompt中确实包含"输入文本:"标签 + - 确认输入文本的格式是否符合预期 + +## 总结 + +通过增强后处理逻辑,让它能够访问原始输入文本(prompt),即使模型没有正确提取字段,也能从原始输入中补充。这提供了多层保障,确保关键字段不会为空。 + diff --git a/恢复数据库.bat b/恢复数据库.bat new file mode 100644 index 0000000..cae57a8 --- /dev/null +++ b/恢复数据库.bat @@ -0,0 +1,41 @@ +@echo off +chcp 65001 >nul +echo ======================================== +echo 数据库恢复工具 +echo ======================================== +echo. +echo 警告: 恢复操作会覆盖现有数据! +echo. + +REM 检查Python是否安装 +python --version >nul 2>&1 +if errorlevel 1 ( + echo 错误: 未找到Python,请先安装Python + pause + exit /b 1 +) + +REM 检查是否提供了备份文件路径 +if "%~1"=="" ( + echo 用法: 恢复数据库.bat [备份文件路径] + echo. + echo 示例: + echo 恢复数据库.bat backups\backup_finyx_20241205_120000.sql + echo 恢复数据库.bat backups\backup_finyx_20241205_120000.sql.gz + echo. + echo 可用的备份文件: + python backup_database.py --list + echo. + pause + exit /b 1 +) + +REM 执行恢复 +python restore_database.py "%~1" + +echo. +echo ======================================== +echo 恢复完成! 
+echo ======================================== +pause + diff --git a/数据库备份恢复说明.md b/数据库备份恢复说明.md new file mode 100644 index 0000000..3c0b696 --- /dev/null +++ b/数据库备份恢复说明.md @@ -0,0 +1,216 @@ +# 数据库备份和恢复工具使用说明 + +## 概述 + +本项目提供了两个Python脚本用于MySQL数据库的备份和恢复: +- `backup_database.py` - 数据库备份脚本 +- `restore_database.py` - 数据库恢复脚本 + +## 功能特性 + +### 备份功能 +- ✅ 支持使用 `mysqldump` 命令备份(推荐,速度快) +- ✅ 支持使用 Python 直接连接备份(备用方案) +- ✅ 自动检测可用方法(auto模式) +- ✅ 支持压缩备份文件(.sql.gz格式) +- ✅ 备份包含表结构、数据、存储过程、触发器、事件等 +- ✅ 自动生成带时间戳的备份文件名 +- ✅ 列出所有备份文件 + +### 恢复功能 +- ✅ 支持使用 `mysql` 命令恢复(推荐,速度快) +- ✅ 支持使用 Python 直接连接恢复(备用方案) +- ✅ 自动检测可用方法(auto模式) +- ✅ 支持恢复压缩的备份文件(.sql.gz格式) +- ✅ 可选择恢复前删除现有数据库 +- ✅ 测试数据库连接功能 + +## 环境要求 + +- Python 3.6+ +- pymysql 库(已包含在 requirements.txt 中) +- MySQL客户端工具(可选,用于mysqldump/mysql命令) +- 数据库连接配置(通过环境变量或默认配置) + +## 安装依赖 + +```bash +pip install pymysql python-dotenv +``` + +## 使用方法 + +### 1. 数据库备份 + +#### 基本用法(自动选择方法) +```bash +python backup_database.py +``` + +#### 指定备份方法 +```bash +# 使用mysqldump命令备份 +python backup_database.py --method mysqldump + +# 使用Python方式备份 +python backup_database.py --method python +``` + +#### 指定输出文件 +```bash +python backup_database.py --output backups/my_backup.sql +``` + +#### 压缩备份文件 +```bash +python backup_database.py --compress +``` + +#### 列出所有备份文件 +```bash +python backup_database.py --list +``` + +#### 完整示例 +```bash +# 使用mysqldump备份并压缩 +python backup_database.py --method mysqldump --compress --output backups/finyx_backup.sql.gz +``` + +### 2. 
数据库恢复 + +#### 基本用法(自动选择方法) +```bash +python restore_database.py backups/backup_finyx_20241205_120000.sql +``` + +#### 指定恢复方法 +```bash +# 使用mysql命令恢复 +python restore_database.py backups/backup.sql --method mysql + +# 使用Python方式恢复 +python restore_database.py backups/backup.sql --method python +``` + +#### 恢复压缩的备份文件 +```bash +python restore_database.py backups/backup.sql.gz +``` + +#### 恢复前删除现有数据库(危险操作) +```bash +python restore_database.py backups/backup.sql --drop-db +``` + +#### 测试数据库连接 +```bash +python restore_database.py --test +``` + +#### 完整示例 +```bash +# 恢复压缩的备份文件,恢复前删除现有数据库 +python restore_database.py backups/backup.sql.gz --drop-db --method mysql +``` + +## 备份文件存储 + +- 默认备份目录:`backups/` +- 备份文件命名格式:`backup_{数据库名}_{时间戳}.sql` +- 压缩文件格式:`backup_{数据库名}_{时间戳}.sql.gz` +- 时间戳格式:`YYYYMMDD_HHMMSS` + +## 数据库配置 + +脚本会自动从以下位置读取数据库配置: + +1. **环境变量**(优先): + - `DB_HOST` - 数据库主机(默认: 152.136.177.240) + - `DB_PORT` - 数据库端口(默认: 5012) + - `DB_USER` - 数据库用户名(默认: finyx) + - `DB_PASSWORD` - 数据库密码(默认: 6QsGK6MpePZDE57Z) + - `DB_NAME` - 数据库名称(默认: finyx) + +2. **.env文件**: + 在项目根目录创建 `.env` 文件: + ```env + DB_HOST=152.136.177.240 + DB_PORT=5012 + DB_USER=finyx + DB_PASSWORD=6QsGK6MpePZDE57Z + DB_NAME=finyx + ``` + +## 注意事项 + +### 备份注意事项 +1. ⚠️ 备份大数据库时可能需要较长时间,请耐心等待 +2. ⚠️ 确保有足够的磁盘空间存储备份文件 +3. ⚠️ 建议定期备份,并保留多个备份版本 +4. ⚠️ 生产环境建议使用压缩备份以节省空间 + +### 恢复注意事项 +1. ⚠️ **恢复操作会覆盖现有数据,请谨慎操作!** +2. ⚠️ 恢复前建议先备份当前数据库 +3. ⚠️ 使用 `--drop-db` 选项会删除整个数据库,请确认后再操作 +4. ⚠️ 恢复大数据库时可能需要较长时间 +5. ⚠️ 恢复过程中请勿中断,否则可能导致数据不一致 + +## 常见问题 + +### Q1: 提示找不到 mysqldump 命令? +**A:** 确保MySQL客户端已安装并在系统PATH中。如果未安装,脚本会自动切换到Python方式备份。 + +### Q2: 备份文件太大怎么办? +**A:** 使用 `--compress` 选项压缩备份文件,通常可以节省50-80%的空间。 + +### Q3: 恢复时提示表已存在错误? +**A:** 使用 `--drop-db` 选项先删除数据库再恢复,或者手动删除相关表。 + +### Q4: 如何定时自动备份? +**A:** 可以使用操作系统的定时任务功能(如Windows的计划任务、Linux的cron): +```bash +# Linux crontab示例(每天凌晨2点备份) +0 2 * * * cd /path/to/project && python backup_database.py --compress +``` + +### Q5: 备份文件可以恢复到其他数据库吗? 
+**A:** 可以,修改环境变量中的 `DB_NAME` 或直接编辑备份文件中的数据库名称。 + +## 示例场景 + +### 场景1: 日常备份 +```bash +# 每天自动备份并压缩 +python backup_database.py --compress +``` + +### 场景2: 迁移数据库 +```bash +# 1. 备份源数据库 +python backup_database.py --output migration_backup.sql + +# 2. 修改配置指向目标数据库 + +# 3. 恢复备份到目标数据库 +python restore_database.py migration_backup.sql --drop-db +``` + +### 场景3: 数据恢复 +```bash +# 1. 查看可用备份 +python backup_database.py --list + +# 2. 恢复指定备份 +python restore_database.py backups/backup_finyx_20241205_120000.sql +``` + +## 技术支持 + +如有问题,请检查: +1. 数据库连接配置是否正确 +2. 数据库服务是否正常运行 +3. 是否有足够的磁盘空间 +4. 是否有数据库操作权限 + diff --git a/模板树状结构更新说明.md b/模板树状结构更新说明.md new file mode 100644 index 0000000..b0b2c95 --- /dev/null +++ b/模板树状结构更新说明.md @@ -0,0 +1,180 @@ +# 模板树状结构更新说明 + +## 概述 + +根据 `template_finish` 目录结构,更新数据库 `f_polic_file_config` 表中的 `parent_id` 字段,建立树状层级结构。 + +## 目录结构示例 + +``` +template_finish/ +└── 2-初核模版/ (一级) + ├── 1.初核请示/ (二级) + │ ├── 1.请示报告卡(XXX).docx + │ ├── 2.初步核实审批表(XXX).docx + │ └── 3.附件初核方案(XXX).docx + ├── 2.谈话审批/ (二级) + │ ├── 谈话通知书/ (三级) + │ │ ├── 谈话通知书第一联.docx + │ │ ├── 谈话通知书第二联.docx + │ │ └── 谈话通知书第三联.docx + │ ├── 走读式谈话审批/ (三级) + │ │ ├── 1.请示报告卡(初核谈话).docx + │ │ ├── 2谈话审批表.docx + │ │ └── ... + │ └── 走读式谈话流程/ (三级) + │ ├── 1.谈话笔录.docx + │ └── ... + └── 3.初核结论/ (二级) + ├── 8-1请示报告卡(初核报告结论) .docx + └── 8.XXX初核情况报告.docx +``` + +## 脚本说明 + +### 1. analyze_and_update_template_tree.py + +**功能:** 分析目录结构和数据库数据,生成 SQL 更新脚本 + +**使用方法:** +```bash +python analyze_and_update_template_tree.py +``` + +**输出:** +- 分析报告(控制台输出) +- `update_template_tree.sql` - SQL 更新脚本 + +**特点:** +- 只生成 SQL 脚本,不直接修改数据库 +- 可以手动检查 SQL 脚本后再执行 + +### 2. update_template_tree.py + +**功能:** 分析并直接更新数据库(带预览和确认) + +**使用方法:** +```bash +python update_template_tree.py +``` + +**特点:** +- 交互式操作,先预览再确认 +- 支持模拟模式(dry-run) +- 自动按层级顺序更新 +- 更安全的更新流程 + +## 更新逻辑 + +1. **目录节点**:根据目录名称匹配数据库记录,如果不存在则创建 +2. **文件节点**:优先通过 `template_code` 匹配,其次通过文件名匹配 +3. 
**层级关系**:按照目录结构的层级关系设置 `parent_id` + - 一级目录:`parent_id = NULL` + - 二级目录:`parent_id = 一级目录的ID` + - 三级目录:`parent_id = 二级目录的ID` + - 文件:`parent_id = 所在目录的ID` + +## 执行步骤 + +### 方法一:使用 SQL 脚本(推荐用于生产环境) + +1. 运行分析脚本: + ```bash + python analyze_and_update_template_tree.py + ``` + +2. 检查生成的 SQL 脚本: + ```bash + # 查看 update_template_tree.sql + ``` + +3. 备份数据库(重要!) + +4. 执行 SQL 脚本: + ```sql + -- 在 MySQL 客户端中执行 + source update_template_tree.sql; + ``` + +### 方法二:使用 Python 脚本(推荐用于测试环境) + +1. 运行更新脚本: + ```bash + python update_template_tree.py + ``` + +2. 查看预览信息 + +3. 输入 `yes` 确认执行 + +4. 再次确认执行实际更新 + +## 注意事项 + +1. **备份数据库**:执行更新前务必备份数据库 +2. **检查匹配**:确保目录和文件名与数据库中的记录能够正确匹配 +3. **层级顺序**:更新会按照层级顺序执行,确保父节点先于子节点创建/更新 +4. **重复执行**:脚本支持重复执行,已正确设置 `parent_id` 的记录会被跳过 + +## 数据库表结构 + +`f_polic_file_config` 表的关键字段: +- `id`: 主键 +- `tenant_id`: 租户ID(固定值:615873064429507639) +- `parent_id`: 父节点ID(NULL 表示根节点) +- `name`: 名称 +- `template_code`: 模板编码(文件节点使用) +- `input_data`: JSON格式的配置数据 +- `file_path`: MinIO文件路径 + +## 问题排查 + +### 问题1:某些文件无法匹配 + +**原因:** 文件名或 `template_code` 不匹配 + +**解决:** 检查 `DOCUMENT_TYPE_MAPPING` 字典,确保文件名映射正确 + +### 问题2:目录节点重复创建 + +**原因:** 数据库中已存在同名目录节点,但脚本未正确匹配 + +**解决:** 检查数据库中的记录,确保名称完全一致(包括空格和标点) + +### 问题3:parent_id 更新失败 + +**原因:** 父节点ID不存在或层级关系错误 + +**解决:** 检查生成的 SQL 脚本,确认父节点ID是否正确 + +## 验证更新结果 + +执行更新后,可以使用以下 SQL 查询验证: + +```sql +-- 查看树状结构 +SELECT + id, + name, + parent_id, + template_code, + (SELECT name FROM f_polic_file_config p2 WHERE p2.id = p1.parent_id) as parent_name +FROM f_polic_file_config p1 +WHERE tenant_id = 615873064429507639 +ORDER BY parent_id, name; + +-- 查看缺少 parent_id 的记录(应该只有根节点) +SELECT id, name, parent_id +FROM f_polic_file_config +WHERE tenant_id = 615873064429507639 + AND parent_id IS NULL + AND name NOT LIKE '%-%'; -- 排除一级目录 +``` + +## 联系信息 + +如有问题,请检查: +1. 数据库连接配置是否正确 +2. 目录结构是否与预期一致 +3. 数据库中的记录是否完整 +
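
The `parent_id` rules described in this document (root directories use `NULL`, every other node points at the ID of its containing directory) can also be checked in memory before or after an update. The sketch below is only an illustration under stated assumptions: `check_parent_links` and the sample rows are hypothetical and are not part of the repository scripts. It takes rows shaped like the `id`, `name`, `parent_id` columns of `f_polic_file_config` and reports rows whose parent is missing, plus rows whose parent chain loops back on itself, without touching the database.

```python
def check_parent_links(rows):
    """Check (id, name, parent_id) rows for orphans and cycles.

    Returns (orphans, cyclic): ids whose parent_id points at a row that
    does not exist, and ids whose parent chain loops back on itself.
    Hypothetical helper, not taken from the repository.
    """
    by_id = {row["id"]: row for row in rows}
    orphans, cyclic = [], []
    for row in rows:
        seen = {row["id"]}
        parent_id = row["parent_id"]
        # Walk up the parent chain until a root (None) is reached.
        while parent_id is not None:
            if parent_id not in by_id:
                # Report only the row that directly references the missing parent.
                if parent_id == row["parent_id"]:
                    orphans.append(row["id"])
                break
            if parent_id in seen:
                cyclic.append(row["id"])
                break
            seen.add(parent_id)
            parent_id = by_id[parent_id]["parent_id"]
    return orphans, cyclic


# Made-up sample data: the IDs are illustrative, not real database IDs.
sample_rows = [
    {"id": 1, "name": "2-初核模版", "parent_id": None},
    {"id": 2, "name": "1.初核请示", "parent_id": 1},
    {"id": 3, "name": "1.请示报告卡(XXX)", "parent_id": 2},
    {"id": 4, "name": "orphan.docx", "parent_id": 99},
    {"id": 5, "name": "loop-a", "parent_id": 6},
    {"id": 6, "name": "loop-b", "parent_id": 5},
]

orphans, cyclic = check_parent_links(sample_rows)
print("orphans:", orphans)  # orphans: [4]
print("cycles:", cyclic)    # cycles: [5, 6]
```

In practice the rows could come from a `DictCursor` query such as the tree query shown in the verification section. Cycle detection is an addition of this sketch, not something the verification SQL above performs.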