From 420baca63de67927f70a963ee5ad638ba4a82fe7 Mon Sep 17 00:00:00 2001
From: wuwenbo
Date: Tue, 30 Jun 2026 15:22:25 +0800
Subject: [PATCH] Document AI Director optimization loop

---
 .codex/skills/th1-ai-director/SKILL.md | 101 +++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/.codex/skills/th1-ai-director/SKILL.md b/.codex/skills/th1-ai-director/SKILL.md
index 70923ab0a..3d816aeb1 100644
--- a/.codex/skills/th1-ai-director/SKILL.md
+++ b/.codex/skills/th1-ai-director/SKILL.md
@@ -43,6 +43,14 @@ python .codex/skills/th1-ai-director/scripts/analyze_ai_director_log.py --last 5
 python .codex/skills/th1-ai-director/scripts/analyze_ai_director_log.py --json
 ```
 
+For batch-level quality work, use the batch analyzer first:
+
+```powershell
+python .codex/skills/th1-ai-director/scripts/analyze_ai_batch_quality.py
+python .codex/skills/th1-ai-director/scripts/analyze_ai_batch_quality.py --batch Unity/Logs/AI_Batch/YYYYMMDD_HHMMSS/batch_summary.json --top 12
+python .codex/skills/th1-ai-director/scripts/analyze_ai_batch_quality.py --json
+```
+
 Read the analyzer output in this order:
 
 1. Latest log path and event counts.
@@ -52,6 +60,99 @@ Read the analyzer output in this order:
 5. No-effect successful actions.
 6. Last execution rows.
 
+Read the batch analyzer output in this order:
+
+1. `Games` success/failure count and whether any run timed out or was manually interrupted.
+2. Runtime throughput: `avgGame`, `actions/sec`, `frames/sec`, `actions/turn`.
+3. Outcome shape: `avgTurn`, `avgSurvivors`, eliminations, and winners if present.
+4. Expansion: alive average city count, all-player average city count, max city count, alive `>=2` and `>=3` city ratios.
+5. Power and attrition: alive unit count, score p10/p50/p90, kills, acting-unit deaths.
+6. Action quality: no-effect actions, repeated stable actions, max actions per player turn.
+7. Decision time: average, p95, max, and top lanes/actions.
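The reading order above can be sketched as a small helper that prints whatever metrics are present, grouped section by section. This is only an illustration: the schema of the `--json` report is not shown in this document, so the flat dictionary shape and the exact key names below are assumptions borrowed from the metric names used in this skill, and `ordered_report` is a hypothetical helper rather than part of the analyzer.

```python
# Hypothetical grouping of metric keys, following the reading order listed above.
# The real analyzer JSON may use different names or nesting.
READING_ORDER = [
    ("Games", ["games", "failedGames"]),
    ("Runtime throughput", ["avgGame", "actions/sec", "frames/sec", "actions/turn"]),
    ("Outcome shape", ["avgTurn", "avgSurvivors"]),
    ("Expansion", ["aliveAvgCities"]),
    ("Action quality", ["noEffect", "repeated", "maxActions/playerTurn"]),
]


def ordered_report(metrics: dict) -> list[str]:
    """Return one summary line per section, in reading order.

    Sections with no matching keys in `metrics` are skipped, so a sparse
    report still produces a short, ordered summary.
    """
    lines = []
    for section, keys in READING_ORDER:
        present = [f"{key}={metrics[key]}" for key in keys if key in metrics]
        if present:
            lines.append(f"{section}: " + ", ".join(present))
    return lines


if __name__ == "__main__":
    # Illustrative values only, not real batch results.
    for line in ordered_report({"avgTurn": 20, "failedGames": 0, "noEffect": 1}):
        print(line)
```

Keeping the order in a single table means the summary always starts with game success or failure, regardless of the order in which the keys appear in the input.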
+
+Do not judge AI quality from a single interrupted or partial run. The batch analyzer filters incomplete logs when the summary only contains completed games, but final before/after claims should use completed batches with the same options.
+
+## Self-Optimization Loop
+
+Use this loop for all long-running AI intelligence work. The goal is not just "no crash"; the AI must become measurably stronger without becoming slower or more brittle.
+
+1. Establish a baseline before changing behavior.
+   - Use the same map size, player count, turn limit, difficulty, and action budgets that will be used after the patch.
+   - Save the `Unity/Logs/AI_Batch//batch_summary.json` path and the batch analyzer output.
+   - If there is no usable baseline, run at least one compact baseline batch first.
+
+2. Run a bounded local batch, never an open-ended Unity session.
+   - Close the visible Unity Editor before batchmode runs.
+   - Always use explicit `-TimeoutSeconds`, `-MaxActions`, and `-MaxActionsPerPlayerTurn`.
+   - Prefer `-KeepGoing` for quality loops so one bad game still yields evidence.
+   - If Unity appears slow, inspect `unity_batch.log`, `batch_summary.json`, latest JSONL logs, and the Unity process before killing it.
+   - After killing or interrupting a batch, confirm no `Unity.exe` process remains.
+
+3. Analyze metrics before reading raw logs.
+   - Use `analyze_ai_batch_quality.py` for the batch.
+   - Use `analyze_ai_director_log.py` only after the batch report points to repeated actions, no-effect actions, outlier players, or slow lanes.
+   - If the log is too sparse to explain the issue, improve diagnostics before guessing at AI logic.
+
+4. Classify the problem type.
+   - Correctness: failed games, exceptions, null references, action timeout, action budget stop, no-effect successful actions, illegal repeated actions.
+   - Intelligence: weak expansion, low city count, poor unit count, bad attrition, obvious idle/economic waste, bad hero task timing, over-defense, under-attack.
+   - Performance: low actions/sec, high avg game runtime, high decision p95/max, too many actions per player turn, oversized JSONL output.
+   - Noise: legal repeated city upgrades, normal tactical move/attack chains, expected Steam warmup messages, incomplete logs from killed batches.
+
+5. Fix in the right layer.
+   - If the design intent is wrong, update `18-AI导演系统策划文档.md` and `19-AI导演系统逻辑语言.md` before or alongside code.
+   - If action availability is wrong for every caller, fix `CheckCan` or action semantics.
+   - If the action is only wrong for AI, filter/scoring-gate it in AI generation or Director indexing.
+   - If scoring is wrong, adjust lane priority, target value, or cache features; do not add broad hardcoded hacks that only satisfy one replay.
+   - If performance is wrong, prefer caching, indexed lookup, action-pool pruning, and cheaper diagnostics before reducing strategic search quality.
+
+6. Re-run the same batch options and compare.
+   - A change only counts as an AI improvement when the target metric improves and no core guardrail regresses.
+   - If a performance shortcut improves p95 but hurts expansion/city count/action quality, revert it or redesign it.
+   - Keep failed experiments out of the final patch; mention them in the final report only when they explain the chosen direction.
+
+7. Commit only coherent, verified changes.
+   - Stage AI code, docs, scripts, and diagnostics changes that belong together.
+   - Do not stage Unity auto-generated side effects such as `Unity.sln`, `ProjectSettings.asset`, or `packages-lock.json` unless they are intentionally part of the task.
+
+## Quality Metrics
+
+Primary guardrails:
+
+- `failedGames` must be `0` for normal quality claims.
+- `noEffect` should be `0`; any nonzero value needs raw-log explanation.
+- `repeated` should be `0` after excluding known legal repeats such as same-turn `CityLevelUpAction:Park` consuming city upgrade points.
+- `maxActions/playerTurn` should stay well below the forced-stop budget; investigate anything above `80`.
+- No AI loop, forced AI stop, fatal exception, or unresolved null reference is acceptable.
+
+Intelligence targets for compact 17-player, 20x20, 20-turn Director batches:
+
+- Expansion should not trigger `LOW_EXPANSION`; target `aliveAvgCities >= 1.35`.
+- Second-city rate should not trigger `FEW_SECOND_CITIES`; target alive `>=2` city ratio at least `25%`.
+- Track max city count and alive `>=3` city ratio as snowball signals, but do not overfit to one high-roll player.
+- Unit count and score p10/p50/p90 should not collapse while expansion improves.
+- Attrition should be interpreted with context: more kills and more deaths may indicate stronger aggression, not necessarily worse play.
+
+Performance targets:
+
+- `actions/sec` should stay at or above `15` in compact batches.
+- Decision p95 above `60ms` is a warning and above `100ms` should be treated as an optimization target.
+- If p95 is high and top slow lanes are Front/Expansion/Emergency, inspect action generation, move lookup, `CheckCan`, and world-cache computation before trimming strategic behavior.
+- Large diagnostic output is acceptable for local debug, but batch analysis JSON should remain compact enough for automated comparison.
+
+Recommended batch commands:
+
+```powershell
+# Fast smoke after compile-sensitive changes.
+Tools/RunAIDirectorBatch.ps1 -Games 1 -Players 2 -Turns 1 -TimeoutSeconds 60
+
+# Compact quality loop for before/after comparison.
+Tools/RunAIDirectorBatch.ps1 -Games 3 -Players 17 -Width 20 -Height 20 -Turns 20 -TimeoutSeconds 420 -MaxActions 9000 -MaxActionsPerPlayerTurn 120 -Difficulty LUNATIC -KeepGoing
+
+# Larger confidence loop when compact metrics look good.
+Tools/RunAIDirectorBatch.ps1 -Games 5 -Players 17 -Width 30 -Height 30 -Turns 30 -TimeoutSeconds 900 -MaxActions 16000 -MaxActionsPerPlayerTurn 160 -Difficulty LUNATIC -KeepGoing
+```
+
 ## Infinite Loop Triage
 
 Classify the repeated action before patching: