Document AI Director optimization loop

wuwenbo 2026-06-30 15:22:25 +08:00
parent 9649658e2f
commit 420baca63d


@ -43,6 +43,14 @@ python .codex/skills/th1-ai-director/scripts/analyze_ai_director_log.py --last 5
python .codex/skills/th1-ai-director/scripts/analyze_ai_director_log.py --json
```
For batch-level quality work, use the batch analyzer first:
```powershell
python .codex/skills/th1-ai-director/scripts/analyze_ai_batch_quality.py
python .codex/skills/th1-ai-director/scripts/analyze_ai_batch_quality.py --batch Unity/Logs/AI_Batch/YYYYMMDD_HHMMSS/batch_summary.json --top 12
python .codex/skills/th1-ai-director/scripts/analyze_ai_batch_quality.py --json
```
Read the analyzer output in this order:
1. Latest log path and event counts.
@ -52,6 +60,99 @@ Read the analyzer output in this order:
5. No-effect successful actions.
6. Last execution rows.
Read the batch analyzer output in this order:
1. `Games` success/failure count and whether any run timed out or was manually interrupted.
2. Runtime throughput: `avgGame`, `actions/sec`, `frames/sec`, `actions/turn`.
3. Outcome shape: `avgTurn`, `avgSurvivors`, eliminations, and winners if present.
4. Expansion: average city count among alive players, average city count across all players, max city count, and the share of alive players with `>=2` and `>=3` cities.
5. Power and attrition: alive unit count, score p10/p50/p90, kills, acting-unit deaths.
6. Action quality: no-effect actions, repeated stable actions, max actions per player turn.
7. Decision time: average, p95, max, and top lanes/actions.
Do not judge AI quality from a single interrupted or partial run. The batch analyzer filters incomplete logs when the summary only contains completed games, but final before/after claims should use completed batches with the same options.
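One way to enforce the "same options" rule before comparing two batches is to diff the recorded options first. The sketch below is illustrative only: the `options` object and every key in `OPTION_KEYS` are assumed names, not the confirmed schema of `batch_summary.json`, and both helper functions are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical option keys; the real batch_summary.json schema may differ.
OPTION_KEYS = ("players", "width", "height", "turns", "difficulty",
               "maxActions", "maxActionsPerPlayerTurn")


def option_mismatches(before: dict, after: dict) -> list[str]:
    """List the options that differ between two batch summaries."""
    before_opts = before.get("options", {})
    after_opts = after.get("options", {})
    return [
        f"{key}: {before_opts.get(key)!r} vs {after_opts.get(key)!r}"
        for key in OPTION_KEYS
        if before_opts.get(key) != after_opts.get(key)
    ]


def load_summary(path: str) -> dict:
    """Read one batch summary file (assumed to be a JSON object)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

An empty list from `option_mismatches` means the two batches are comparable on the checked keys; any entry is a reason to re-run one side rather than draw a conclusion.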
## Self-Optimization Loop
Use this loop for all long-running AI intelligence work. The goal is not just "no crash"; the AI must become measurably stronger without becoming slower or more brittle.
1. Establish a baseline before changing behavior.
- Use the same map size, player count, turn limit, difficulty, and action budgets that will be used after the patch.
- Save the `Unity/Logs/AI_Batch/<timestamp>/batch_summary.json` path and the batch analyzer output.
- If there is no usable baseline, run at least one compact baseline batch first.
2. Run a bounded local batch, never an open-ended Unity session.
- Close the visible Unity Editor before batchmode runs.
- Always use explicit `-TimeoutSeconds`, `-MaxActions`, and `-MaxActionsPerPlayerTurn`.
- Prefer `-KeepGoing` for quality loops so one bad game still yields evidence.
- If Unity appears slow, inspect `unity_batch.log`, `batch_summary.json`, latest JSONL logs, and the Unity process before killing it.
- After killing or interrupting a batch, confirm no `Unity.exe` process remains.
3. Analyze metrics before reading raw logs.
- Use `analyze_ai_batch_quality.py` for the batch.
- Use `analyze_ai_director_log.py` only after the batch report points to repeated actions, no-effect actions, outlier players, or slow lanes.
- If the log is too sparse to explain the issue, improve diagnostics before guessing at AI logic.
4. Classify the problem type.
- Correctness: failed games, exceptions, null references, action timeout, action budget stop, no-effect successful actions, illegal repeated actions.
- Intelligence: weak expansion, low city count, poor unit count, bad attrition, obvious idle/economic waste, bad hero task timing, over-defense, under-attack.
- Performance: low actions/sec, high avg game runtime, high decision p95/max, too many actions per player turn, oversized JSONL output.
- Noise: legal repeated city upgrades, normal tactical move/attack chains, expected Steam warmup messages, incomplete logs from killed batches.
5. Fix in the right layer.
- If the design intent is wrong, update `18-AI导演系统策划文档.md` and `19-AI导演系统逻辑语言.md` before or alongside code.
- If action availability is wrong for every caller, fix `CheckCan` or action semantics.
- If the action is only wrong for AI, filter it out or gate it by score in AI generation or Director indexing.
- If scoring is wrong, adjust lane priority, target value, or cache features; do not add broad hardcoded hacks that only satisfy one replay.
- If performance is wrong, prefer caching, indexed lookup, action-pool pruning, and cheaper diagnostics before reducing strategic search quality.
6. Re-run the same batch options and compare.
- A change only counts as an AI improvement when the target metric improves and no core guardrail regresses.
- If a performance shortcut improves p95 but hurts expansion/city count/action quality, revert it or redesign it.
- Keep failed experiments out of the final patch; mention them in the final report only when they explain the chosen direction.
7. Commit only coherent, verified changes.
- Stage AI code, docs, scripts, and diagnostics changes that belong together.
- Do not stage Unity auto-generated side effects such as `Unity.sln`, `ProjectSettings.asset`, or `packages-lock.json` unless they are intentionally part of the task.
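The acceptance rule in step 6 (the target metric improves and no core guardrail regresses) can be sketched as a small comparison helper. This is a minimal illustration, not the batch analyzer's logic: the metric names `decisionP95Ms`, `aliveAvgCities`, and `actionsPerSec` are assumed keys, and the choice of which metrics count as guardrails is an assumption as well.

```python
# Hypothetical metric names mapped to whether a higher value is better.
HIGHER_IS_BETTER = {
    "failedGames": False,
    "noEffect": False,
    "repeated": False,
    "decisionP95Ms": False,
    "aliveAvgCities": True,
    "actionsPerSec": True,
}


def gain(name: str, before: dict, after: dict) -> float:
    """Positive when the metric moved in the good direction."""
    delta = after[name] - before[name]
    return delta if HIGHER_IS_BETTER[name] else -delta


def counts_as_improvement(target: str, before: dict, after: dict):
    """Return (accepted, regressed_guardrails) for one before/after pair."""
    regressions = [
        name for name in HIGHER_IS_BETTER
        if name != target and gain(name, before, after) < 0
    ]
    accepted = gain(target, before, after) > 0 and not regressions
    return accepted, regressions
```

With this rule, a patch that lowers decision p95 but also lowers the average city count is rejected and the regressed metric is named, which matches the "revert it or redesign it" guidance above. A real comparison would likely need a noise tolerance, since small batches vary from run to run.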
## Quality Metrics
Primary guardrails:
- `failedGames` must be `0` for normal quality claims.
- `noEffect` should be `0`; any nonzero value needs raw-log explanation.
- `repeated` should be `0` after excluding known legal repeats such as same-turn `CityLevelUpAction:Park` consuming city upgrade points.
- `maxActions/playerTurn` should stay well below the forced-stop budget; investigate anything above `80`.
- No AI loop, forced AI stop, fatal exception, or unresolved null reference is acceptable.
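As a sketch, the numeric guardrails above can be checked mechanically before anyone reads raw logs. The dictionary keys below (`maxActionsPerPlayerTurn`, `knownLegalRepeats`) are assumed names for illustration; only the thresholds come from the list above.

```python
def guardrail_violations(metrics: dict, investigate_above: int = 80) -> list[str]:
    """Return human-readable guardrail violations for one batch report."""
    violations = []
    if metrics.get("failedGames", 0) != 0:
        violations.append("failedGames is nonzero")
    if metrics.get("noEffect", 0) != 0:
        violations.append("noEffect is nonzero; explain from raw logs")
    # Known legal repeats (hypothetical key) are excluded before judging.
    unexplained = metrics.get("repeated", 0) - metrics.get("knownLegalRepeats", 0)
    if unexplained > 0:
        violations.append(f"{unexplained} unexplained repeated actions")
    if metrics.get("maxActionsPerPlayerTurn", 0) > investigate_above:
        violations.append("max actions per player turn needs investigation")
    return violations
```

An empty result only covers the counted guardrails; AI loops, forced stops, and exceptions still have to be confirmed from the batch log.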
Intelligence targets for compact 17-player, 20x20, 20-turn Director batches:
- Expansion should not trigger `LOW_EXPANSION`; target `aliveAvgCities >= 1.35`.
- Second-city rate should not trigger `FEW_SECOND_CITIES`; target at least `25%` of alive players holding `>=2` cities.
- Track max city count and alive `>=3` city ratio as snowball signals, but do not overfit to one high-roll player.
- Unit count and score p10/p50/p90 should not collapse while expansion improves.
- Attrition should be interpreted with context: more kills and more deaths may indicate stronger aggression, not necessarily worse play.
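To make the expansion targets concrete, the sketch below computes them from a list of per-player city counts for alive players. The input shape and the function name are assumptions for illustration. The function only tests the two stated targets and does not claim to reproduce the analyzer's own `LOW_EXPANSION` or `FEW_SECOND_CITIES` thresholds.

```python
def expansion_report(alive_city_counts: list[int]) -> dict:
    """Summarize expansion for alive players against the compact-batch targets."""
    total = len(alive_city_counts)
    if total == 0:
        return {"aliveAvgCities": 0.0, "ratio2": 0.0, "ratio3": 0.0,
                "meetsAvgTarget": False, "meetsSecondCityTarget": False}
    avg = sum(alive_city_counts) / total
    ratio2 = sum(1 for c in alive_city_counts if c >= 2) / total
    ratio3 = sum(1 for c in alive_city_counts if c >= 3) / total
    return {
        "aliveAvgCities": avg,
        "ratio2": ratio2,   # share of alive players with >=2 cities
        "ratio3": ratio3,   # snowball signal only, not a pass/fail target
        "meetsAvgTarget": avg >= 1.35,
        "meetsSecondCityTarget": ratio2 >= 0.25,
    }
```

For example, four alive players holding 1, 1, 1, and 2 cities average 1.25 cities: the second-city target is met exactly at 25%, but the average target is not.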
Performance targets:
- `actions/sec` should stay at or above `15` in compact batches.
- Decision p95 above `60ms` is a warning; above `100ms`, treat it as an optimization target.
- If p95 is high and top slow lanes are Front/Expansion/Emergency, inspect action generation, move lookup, `CheckCan`, and world-cache computation before trimming strategic behavior.
- Large diagnostic output is acceptable for local debug, but batch analysis JSON should remain compact enough for automated comparison.
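The performance thresholds above can be sketched as a small classifier over decision-time samples. The nearest-rank percentile used here is an assumption; the batch analyzer may compute p95 differently, and all function names are hypothetical.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sample list (pct in 1..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def classify_p95(p95_ms: float) -> str:
    """Map decision p95 to the warning levels described above."""
    if p95_ms > 100:
        return "optimize"
    if p95_ms > 60:
        return "warning"
    return "ok"


def performance_findings(actions_per_sec: float, decision_ms: list[float]) -> list[str]:
    """Collect performance findings for one compact batch."""
    findings = []
    if actions_per_sec < 15:
        findings.append("actions/sec below 15")
    level = classify_p95(percentile_nearest_rank(decision_ms, 95))
    if level != "ok":
        findings.append(f"decision p95: {level}")
    return findings
```

Keeping the classification separate from the percentile calculation lets the same thresholds be applied to a p95 value read directly from the analyzer report.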
Recommended batch commands:
```powershell
# Fast smoke after compile-sensitive changes.
Tools/RunAIDirectorBatch.ps1 -Games 1 -Players 2 -Turns 1 -TimeoutSeconds 60
# Compact quality loop for before/after comparison.
Tools/RunAIDirectorBatch.ps1 -Games 3 -Players 17 -Width 20 -Height 20 -Turns 20 -TimeoutSeconds 420 -MaxActions 9000 -MaxActionsPerPlayerTurn 120 -Difficulty LUNATIC -KeepGoing
# Larger confidence loop when compact metrics look good.
Tools/RunAIDirectorBatch.ps1 -Games 5 -Players 17 -Width 30 -Height 30 -Turns 30 -TimeoutSeconds 900 -MaxActions 16000 -MaxActionsPerPlayerTurn 160 -Difficulty LUNATIC -KeepGoing
```
## Infinite Loop Triage
Classify the repeated action before patching: