From 420baca63de67927f70a963ee5ad638ba4a82fe7 Mon Sep 17 00:00:00 2001
From: wuwenbo <wuwenbo@bilibili.com>
Date: Tue, 30 Jun 2026 15:22:25 +0800
Subject: [PATCH] Document AI Director optimization loop

---
 .codex/skills/th1-ai-director/SKILL.md | 101 +++++++++++++++++++++++++
 1 file changed, 101 insertions(+)
diff --git a/.codex/skills/th1-ai-director/SKILL.md b/.codex/skills/th1-ai-director/SKILL.md
index 70923ab0a..3d816aeb1 100644
--- a/.codex/skills/th1-ai-director/SKILL.md
+++ b/.codex/skills/th1-ai-director/SKILL.md
@@ -43,6 +43,14 @@ python .codex/skills/th1-ai-director/scripts/analyze_ai_director_log.py --last 5
 python .codex/skills/th1-ai-director/scripts/analyze_ai_director_log.py --json
 ```
 
+For batch-level quality work, use the batch analyzer first:
+
+```powershell
+python .codex/skills/th1-ai-director/scripts/analyze_ai_batch_quality.py
+python .codex/skills/th1-ai-director/scripts/analyze_ai_batch_quality.py --batch Unity/Logs/AI_Batch/YYYYMMDD_HHMMSS/batch_summary.json --top 12
+python .codex/skills/th1-ai-director/scripts/analyze_ai_batch_quality.py --json
+```
+
 Read the analyzer output in this order:
 
 1. Latest log path and event counts.
@@ -52,6 +60,99 @@ Read the analyzer output in this order:
 5. No-effect successful actions.
 6. Last execution rows.
 
+Read the batch analyzer output in this order:
+
+1. `Games` success/failure count and whether any run timed out or was manually interrupted.
+2. Runtime throughput: `avgGame`, `actions/sec`, `frames/sec`, `actions/turn`.
+3. Outcome shape: `avgTurn`, `avgSurvivors`, eliminations, and winners if present.
+4. Expansion: alive average city count, all-player average city count, max city count, alive `>=2` and `>=3` city ratios.
+5. Power and attrition: alive unit count, score p10/p50/p90, kills, acting-unit deaths.
+6. Action quality: no-effect actions, repeated stable actions, max actions per player turn.
+7. Decision time: average, p95, max, and top lanes/actions.
+
+Do not judge AI quality from a single interrupted or partial run. The batch analyzer filters incomplete logs when the summary only contains completed games, but final before/after claims should use completed batches with the same options.
+
+## Self-Optimization Loop
+
+Use this loop for all long-running AI intelligence work. The goal is not just "no crash"; the AI must become measurably stronger without becoming slower or more brittle.
+
+1. Establish a baseline before changing behavior.
+   - Use the same map size, player count, turn limit, difficulty, and action budgets that will be used after the patch.
+   - Save the `Unity/Logs/AI_Batch/<timestamp>/batch_summary.json` path and the batch analyzer output.
+   - If there is no usable baseline, run at least one compact baseline batch first.
+
+2. Run a bounded local batch, never an open-ended Unity session.
+   - Close the visible Unity Editor before batchmode runs.
+   - Always use explicit `-TimeoutSeconds`, `-MaxActions`, and `-MaxActionsPerPlayerTurn`.
+   - Prefer `-KeepGoing` for quality loops so one bad game still yields evidence.
+   - If Unity appears slow, inspect `unity_batch.log`, `batch_summary.json`, latest JSONL logs, and the Unity process before killing it.
+   - After killing or interrupting a batch, confirm no `Unity.exe` process remains.
+
+3. Analyze metrics before reading raw logs.
+   - Use `analyze_ai_batch_quality.py` for the batch.
+   - Use `analyze_ai_director_log.py` only after the batch report points to repeated actions, no-effect actions, outlier players, or slow lanes.
+   - If the log is too sparse to explain the issue, improve diagnostics before guessing at AI logic.
+
+4. Classify the problem type.
+   - Correctness: failed games, exceptions, null references, action timeout, action budget stop, no-effect successful actions, illegal repeated actions.
+   - Intelligence: weak expansion, low city count, poor unit count, bad attrition, obvious idle/economic waste, bad hero task timing, over-defense, under-attack.
+   - Performance: low actions/sec, high avg game runtime, high decision p95/max, too many actions per player turn, oversized JSONL output.
+   - Noise: legal repeated city upgrades, normal tactical move/attack chains, expected Steam warmup messages, incomplete logs from killed batches.
+
+5. Fix in the right layer.
+   - If the design intent is wrong, update `18-AI导演系统策划文档.md` and `19-AI导演系统逻辑语言.md` before or alongside code.
+   - If action availability is wrong for every caller, fix `CheckCan` or action semantics.
+   - If the action is wrong only for the AI, filter it out or gate it by score in AI generation or Director indexing.
+   - If scoring is wrong, adjust lane priority, target value, or cache features; do not add broad hardcoded hacks that only satisfy one replay.
+   - If performance is wrong, prefer caching, indexed lookup, action-pool pruning, and cheaper diagnostics before reducing strategic search quality.
+
+6. Re-run the same batch options and compare.
+   - A change only counts as an AI improvement when the target metric improves and no core guardrail regresses.
+   - If a performance shortcut improves p95 but hurts expansion, city count, or action quality, revert it or redesign it.
+   - Keep failed experiments out of the final patch; mention them in the final report only when they explain the chosen direction.
+
+7. Commit only coherent, verified changes.
+   - Stage AI code, docs, scripts, and diagnostics changes that belong together.
+   - Do not stage Unity auto-generated side effects such as `Unity.sln`, `ProjectSettings.asset`, or `packages-lock.json` unless they are intentionally part of the task.
+
+## Quality Metrics
+
+Primary guardrails:
+
+- `failedGames` must be `0` for normal quality claims.
+- `noEffect` should be `0`; any nonzero value needs an explanation from the raw logs.
+- `repeated` should be `0` after excluding known legal repeats such as same-turn `CityLevelUpAction:Park` consuming city upgrade points.
+- `maxActions/playerTurn` should stay well below the forced-stop budget; investigate anything above `80`.
+- No AI loop, forced AI stop, fatal exception, or unresolved null reference is acceptable.
+
+Intelligence targets for compact 17-player, 20x20, 20-turn Director batches:
+
+- Expansion should not trigger `LOW_EXPANSION`; target `aliveAvgCities >= 1.35`.
+- Second-city rate should not trigger `FEW_SECOND_CITIES`; target an alive `>=2` city ratio of at least `25%`.
+- Track max city count and alive `>=3` city ratio as snowball signals, but do not overfit to one high-roll player.
+- Unit count and score p10/p50/p90 should not collapse while expansion improves.
+- Attrition should be interpreted with context: more kills and more deaths may indicate stronger aggression, not necessarily worse play.
+
+Performance targets:
+
+- `actions/sec` should stay at or above `15` in compact batches.
+- Decision p95 above `60ms` is a warning; above `100ms`, treat it as an optimization target.
+- If p95 is high and top slow lanes are Front/Expansion/Emergency, inspect action generation, move lookup, `CheckCan`, and world-cache computation before trimming strategic behavior.
+- Large diagnostic output is acceptable for local debug, but batch analysis JSON should remain compact enough for automated comparison.
+
+Recommended batch commands:
+
+```powershell
+# Fast smoke after compile-sensitive changes.
+Tools/RunAIDirectorBatch.ps1 -Games 1 -Players 2 -Turns 1 -TimeoutSeconds 60
+
+# Compact quality loop for before/after comparison.
+Tools/RunAIDirectorBatch.ps1 -Games 3 -Players 17 -Width 20 -Height 20 -Turns 20 -TimeoutSeconds 420 -MaxActions 9000 -MaxActionsPerPlayerTurn 120 -Difficulty LUNATIC -KeepGoing
+
+# Larger confidence loop when compact metrics look good.
+Tools/RunAIDirectorBatch.ps1 -Games 5 -Players 17 -Width 30 -Height 30 -Turns 30 -TimeoutSeconds 900 -MaxActions 16000 -MaxActionsPerPlayerTurn 160 -Difficulty LUNATIC -KeepGoing
+```
+
 ## Infinite Loop Triage
 
 Classify the repeated action before patching: