9. API-driven self-play test harness
The automated self-play harness is a test-only layer in rts_ai::selfplay. It is intentionally
separate from the simulation core: gameplay AI is a player feature, while self-play is a
regression harness for exercising the public simulation API.
Contract. Self-play scripts may only drive the game through the Game seam in §3.1:
start_payload(), snapshot_for(player), enqueue(player, SimCommand), tick(),
alive_players(), and tick_count(). Scripts observe the same fog-filtered snapshots a client
would receive and issue ordinary domain commands. They must not mutate entities, players, map state, or
private system internals. This keeps the simulation architected for future API clients, replay
tools, and external test drivers without adding a second privileged control path.
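The seam restriction can be sketched as a trait that a script receives in place of the Game itself. This is a minimal illustration, assuming a trait shaped after the method list above (start_payload() is omitted). GameSeam, Snapshot, the AttackMove variant, FakeGame, and run_script_step are hypothetical stand-ins, not the real rts_ai::selfplay or §3.1 signatures.

```rust
// Hypothetical stand-ins for the §3.1 seam types; the real signatures may differ.
pub type PlayerId = u8;

/// Fog-filtered view: only what a client for `player` would receive.
#[derive(Clone, Debug, PartialEq)]
pub struct Snapshot {
    pub tick: u64,
    pub visible_enemy_positions: Vec<(i32, i32)>,
}

/// Toy domain command; scripts issue these exactly as a client would.
#[derive(Clone, Debug, PartialEq)]
pub enum SimCommand {
    AttackMove { x: i32, y: i32 },
}

/// The only surface a self-play script may touch (start_payload omitted).
pub trait GameSeam {
    fn snapshot_for(&self, player: PlayerId) -> Snapshot;
    fn enqueue(&mut self, player: PlayerId, cmd: SimCommand);
    fn tick(&mut self);
    fn alive_players(&self) -> Vec<PlayerId>;
    fn tick_count(&self) -> u64;
}

/// One script step: observe, decide, enqueue, advance. No entity access exists.
pub fn run_script_step(game: &mut dyn GameSeam, me: PlayerId) {
    let view = game.snapshot_for(me);
    if let Some(&(x, y)) = view.visible_enemy_positions.first() {
        game.enqueue(me, SimCommand::AttackMove { x, y });
    }
    game.tick();
}

/// Hypothetical test double: records commands instead of simulating.
#[derive(Default)]
pub struct FakeGame {
    pub ticks: u64,
    pub enemies: Vec<(i32, i32)>,
    pub queued: Vec<(PlayerId, SimCommand)>,
}

impl GameSeam for FakeGame {
    fn snapshot_for(&self, _player: PlayerId) -> Snapshot {
        Snapshot { tick: self.ticks, visible_enemy_positions: self.enemies.clone() }
    }
    fn enqueue(&mut self, player: PlayerId, cmd: SimCommand) {
        self.queued.push((player, cmd));
    }
    fn tick(&mut self) {
        self.ticks += 1;
    }
    fn alive_players(&self) -> Vec<PlayerId> {
        vec![0, 1]
    }
    fn tick_count(&self) -> u64 {
        self.ticks
    }
}
```

Because run_script_step only receives &mut dyn GameSeam, a script that tried to reach entities or private systems would not compile against it, which is the property the contract relies on.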
Command log replay. Game records every command at the authoritative apply tick, after callers
have enqueued human, scripted, or AI commands and before systems apply the pending queue.
game/replay.rs translates that wire-compatible log into SimCommands, feeds them into a fresh
Game with AI thinking disabled, and compares the resulting event stream and final per-player
snapshots. Replay and live play use the same typed command application path, so a matching replay validates both
the recorded command artifact and the deterministic simulation ordering. Entity iteration and A*
tie-breaking must remain stable; avoid hash-order-dependent simulation behavior.
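The record-at-apply-tick and replay-into-a-fresh-instance pattern can be illustrated with a toy deterministic sim. This is a sketch under stated assumptions, not game/replay.rs: ToySim, LoggedCommand, and replay are hypothetical, a "command" is just a resource delta, and there is no AI to disable because only logged commands drive the replay. A BTreeMap stands in for the stable-iteration requirement.

```rust
use std::collections::BTreeMap;

/// Hypothetical log entry: the tick a command was applied at, who sent it, what it did.
#[derive(Clone, Debug, PartialEq)]
pub struct LoggedCommand {
    pub tick: u64,
    pub player: u8,
    pub delta: i64,
}

/// Toy deterministic sim. BTreeMap gives a stable per-player iteration order,
/// unlike HashMap, whose order is not guaranteed across runs.
#[derive(Default)]
pub struct ToySim {
    now: u64,
    pending: Vec<(u8, i64)>,
    pub resources: BTreeMap<u8, i64>,
    pub log: Vec<LoggedCommand>,
    pub events: Vec<String>,
}

impl ToySim {
    /// Callers (human, scripted, or AI) enqueue; nothing is applied yet.
    pub fn enqueue(&mut self, player: u8, delta: i64) {
        self.pending.push((player, delta));
    }

    /// Advance one tick: record each pending command at this apply tick, then apply it.
    pub fn tick(&mut self) {
        self.now += 1;
        for (player, delta) in std::mem::take(&mut self.pending) {
            self.log.push(LoggedCommand { tick: self.now, player, delta });
            *self.resources.entry(player).or_insert(0) += delta;
            self.events.push(format!("t{} p{} {:+}", self.now, player, delta));
        }
    }
}

/// Rebuild a run from its log: enqueue each command just before the tick
/// that originally applied it. Assumes `log` is in recorded (tick) order.
pub fn replay(log: &[LoggedCommand], total_ticks: u64) -> ToySim {
    let mut sim = ToySim::default();
    let mut next = 0;
    for t in 1..=total_ticks {
        while next < log.len() && log[next].tick == t {
            sim.enqueue(log[next].player, log[next].delta);
            next += 1;
        }
        sim.tick();
    }
    sim
}
```

Comparing the event stream as well as final state catches divergences that happen to converge on the same totals.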
Fast/full AI split. The plain nextest invocation
cargo nextest run --config-file .config/nextest.toml --manifest-path server/Cargo.toml --profile default
keeps the self-play harness in the default gate but runs only the fast scripted coverage. Long profile-backed and real-AI self-play tests return early unless
RTS_FULL_AI_TESTS=1 is set; tests/run-all.sh --full-ai enables that mode for the full
orchestrator.
RTS_SELFPLAY_FULL=1 remains accepted as an alias for manual self-play runs. Use full AI coverage
when touching AI strategy, profile-backed self-play, replay determinism, or balance behavior that
depends on long matches.
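A sketch of the early-return gate, assuming both variables are enabled only by the exact value 1 (the real harness may accept other spellings). full_ai_enabled and full_ai_enabled_from_env are hypothetical helper names; taking a lookup closure keeps the rule testable without mutating the process environment.

```rust
/// Environment variables that turn on long self-play coverage.
/// RTS_SELFPLAY_FULL is the alias described above.
const FULL_AI_VARS: [&str; 2] = ["RTS_FULL_AI_TESTS", "RTS_SELFPLAY_FULL"];

/// Hypothetical helper: true when any gate variable is set to exactly "1" (assumption).
/// `lookup` abstracts over std::env::var so the rule can be unit-tested.
pub fn full_ai_enabled<F>(lookup: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    FULL_AI_VARS
        .iter()
        .any(|&name| lookup(name).as_deref() == Some("1"))
}

/// Production wiring: read the real process environment.
pub fn full_ai_enabled_from_env() -> bool {
    full_ai_enabled(|name| std::env::var(name).ok())
}
```

A long test would call full_ai_enabled_from_env() first and return immediately when it is false, so the default nextest gate reports it as passed without running the match.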
Profile-backed coverage. The long profile-backed tests spawn AI-profile players through the
self-play adapter and run matches headlessly under:
RTS_FULL_AI_TESTS=1 cargo nextest run --config-file .config/nextest.toml --manifest-path server/Cargo.toml --profile default
The profiles gather steel and oil, construct supply and tech structures, train Riflemen and Tanks, and launch
attack-move waves at public enemy start tiles. The self-play adapter owns harness-only state such as
pending build intents, failed build spots, and staging/attack guards needed to interpret
fog-filtered snapshots without duplicating profile strategy logic. The harness checks per-tick
invariants for invalid resources, supply overflow, malformed entity snapshots, out-of-bounds
positions, and non-finite progress values. It also enforces progress deadlines so a stuck
economy/tech/combat loop fails as a deadlock instead of timing out silently.
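A sketch of how per-tick invariant checks and a progress deadline might look. PlayerView, check_invariants, and Deadline are hypothetical, the field names are illustrative, and the concrete rules (negative stockpiles as "invalid resources", used supply above cap as "overflow") are assumptions; malformed-snapshot checks are omitted.

```rust
/// Hypothetical per-player snapshot summary; field names are illustrative.
#[derive(Debug, Clone)]
pub struct PlayerView {
    pub steel: i64,
    pub oil: i64,
    pub supply_used: u32,
    pub supply_cap: u32,
    pub build_progress: Vec<f32>,
    pub positions: Vec<(i32, i32)>,
}

/// Per-tick invariant check; returns every violation found (empty = healthy).
pub fn check_invariants(v: &PlayerView, map_w: i32, map_h: i32) -> Vec<String> {
    let mut errs = Vec::new();
    if v.steel < 0 || v.oil < 0 {
        errs.push(format!("invalid resources: steel={} oil={}", v.steel, v.oil));
    }
    if v.supply_used > v.supply_cap {
        errs.push(format!("supply overflow: {}/{}", v.supply_used, v.supply_cap));
    }
    for &(x, y) in &v.positions {
        if x < 0 || y < 0 || x >= map_w || y >= map_h {
            errs.push(format!("out of bounds: ({x}, {y})"));
        }
    }
    if v.build_progress.iter().any(|p| !p.is_finite()) {
        errs.push("non-finite progress value".to_string());
    }
    errs
}

/// Hypothetical milestone deadline: a milestone still missing after `by_tick`
/// is reported as a deadlock instead of waiting for a test timeout.
pub struct Deadline {
    pub milestone: &'static str,
    pub by_tick: u64,
    pub reached: bool,
}

impl Deadline {
    pub fn check(&self, now: u64) -> Result<(), String> {
        if !self.reached && now > self.by_tick {
            Err(format!("deadlock: '{}' not reached by tick {}", self.milestone, self.by_tick))
        } else {
            Ok(())
        }
    }
}
```

The deadline turns "nothing happened for a long time" into a named failure at a known tick, which is what distinguishes a stalled economy, tech, or combat loop from a slow but healthy match.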
Special harness scripts remain where they cover behavior that is not a normal AI strategy profile:
WorkerRushScript is an all-in worker-pull scenario, and MineOnlyScript is passive mining/fairness
coverage. These scripts are kept isolated from the canonical profile list.
Artifacts. On failure, the test writes target/selfplay-failures/<test>-<pid>-<time>/ with:
- replay.json: a normal ReplayArtifactV1 command-log replay artifact, loadable through the same replay runtime as post-match and match-history replays. Team-capable artifacts preserve players[].teamId, winnerTeamId, and team-aware final score rows; old singleton-FFA artifacts without team fields still load through compatibility defaults.
- diagnostic.json: self-play-only start payload, script decision log, event log, milestone state, and sampled snapshot summaries.
- summary.log: short human-readable failure summary and missing milestones.
The replay artifact is meant to be enough to reproduce or inspect a failing run without manually
playtesting first. Load an artifact with
/dev/replay-artifact?replay=<artifact_name> on a local server using the same Cargo target
directory. By default successful runs do not write artifacts. For manual inspection,
setting RTS_SELFPLAY_SAVE_REPLAY=1 writes a successful run to
target/selfplay-artifacts/<test>-<pid>-<time>/; setting RTS_SELFPLAY_SAVE_REPLAY=<name> uses
that explicit safe artifact name instead.
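The directory rules above can be expressed as one small decision function. This is a sketch, assuming an explicit name replaces the <test>-<pid>-<time> component under target/selfplay-artifacts/ and that a "safe" name means non-empty ASCII letters, digits, '-' and '_'. artifact_dir and is_safe_name are hypothetical, and the real harness may validate names differently.

```rust
use std::path::PathBuf;

/// Hypothetical helper: where (if anywhere) a self-play run writes its artifacts.
/// `save_replay` is the value of RTS_SELFPLAY_SAVE_REPLAY, if set.
pub fn artifact_dir(
    target: &str,
    test: &str,
    pid: u32,
    time: u64,
    failed: bool,
    save_replay: Option<&str>,
) -> Result<Option<PathBuf>, String> {
    let default_name = format!("{test}-{pid}-{time}");
    let root = PathBuf::from(target);
    if failed {
        return Ok(Some(root.join("selfplay-failures").join(default_name)));
    }
    match save_replay {
        None => Ok(None), // successful runs write nothing by default
        Some("1") => Ok(Some(root.join("selfplay-artifacts").join(default_name))),
        // Assumption: an explicit name replaces <test>-<pid>-<time> in the same parent.
        Some(name) if is_safe_name(name) => Ok(Some(root.join("selfplay-artifacts").join(name))),
        Some(name) => Err(format!("unsafe artifact name: {name:?}")),
    }
}

/// Assumption: "safe" means non-empty ASCII letters, digits, '-' or '_' (no separators).
fn is_safe_name(name: &str) -> bool {
    !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```

Rejecting separators keeps a name such as ../x from escaping the artifact directory.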
Profile matchup CLI. The ai-matchup binary is the manual fixed-horizon matchup facility for
profile-vs-profile runs. It composes the same self-play adapter and Game seam as the tests, runs
one directed match to elimination or a tick cap, optionally verifies deterministic replay, and can
write a replay artifact:
cd server
cargo run --bin ai-matchup -- rush tech
cargo run --bin ai-matchup -- saturation tech --seed 7 --ticks 20000 --json
cargo run --bin ai-matchup -- --list-profiles
Keep fast invariant-style milestone coverage in cargo nextest run; use
RTS_FULL_AI_TESTS=1 cargo nextest run --config-file .config/nextest.toml --manifest-path server/Cargo.toml --profile default
for the long regression gate and the CLI for balance exploration, seed sweeps, and strategy result
sampling.
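The elimination-or-cap loop the CLI runs can be sketched against a cut-down seam. Match, ScriptedMatch, MatchOutcome, and run_matchup are hypothetical names, and the sketch leaves out the replay verification and artifact writing that ai-matchup can also perform.

```rust
/// Hypothetical result of one directed matchup.
#[derive(Debug, PartialEq)]
pub enum MatchOutcome {
    Winner { player: u8, tick: u64 },
    NoSurvivors { tick: u64 },
    TickCap { tick: u64, alive: Vec<u8> },
}

/// The subset of the Game seam this loop needs (hypothetical trait).
pub trait Match {
    fn tick(&mut self);
    fn tick_count(&self) -> u64;
    fn alive_players(&self) -> Vec<u8>;
}

/// Advance until elimination (at most one player alive) or until `max_ticks`.
pub fn run_matchup(game: &mut dyn Match, max_ticks: u64) -> MatchOutcome {
    loop {
        let alive = game.alive_players();
        let tick = game.tick_count();
        if alive.is_empty() {
            return MatchOutcome::NoSurvivors { tick };
        }
        if alive.len() == 1 {
            return MatchOutcome::Winner { player: alive[0], tick };
        }
        if tick >= max_ticks {
            return MatchOutcome::TickCap { tick, alive };
        }
        game.tick();
    }
}

/// Toy match for illustration: player 1 is eliminated at `elim_tick`.
pub struct ScriptedMatch {
    pub now: u64,
    pub elim_tick: u64,
}

impl Match for ScriptedMatch {
    fn tick(&mut self) {
        self.now += 1;
    }
    fn tick_count(&self) -> u64 {
        self.now
    }
    fn alive_players(&self) -> Vec<u8> {
        if self.now >= self.elim_tick { vec![0] } else { vec![0, 1] }
    }
}
```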
10. Dev scenario inspection
Game-backed dev scenarios are live, no-fog watcher rooms for inspecting authored simulation situations through the normal Pixi client. Start a local server, then open the index:
open "http://localhost:<port>/dev/scenarios"
The index lists every supported launch and links to the current URL shape:
/dev/scenarios?id=<scenario_id>&unit=<unit>&count=<count>[&blocker=<unit|none>]
The handler redirects into the normal client with watchScenario=1; the client auto-joins a
reserved spectator room named:
__dev_scenario__:<scenario_id>:unit=<unit>:count=<count>[:blocker=<unit|none>]
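Because the room name is derived mechanically from the launch parameters, the mapping can be sketched as a pair of pure functions. dev_scenario_room and parse_dev_scenario_room are hypothetical helpers following the documented shape; the parser assumes no component itself contains a ':'.

```rust
/// Prefix reserved for dev scenario spectator rooms.
const PREFIX: &str = "__dev_scenario__:";

/// Hypothetical helper: build the room name for a scenario launch.
pub fn dev_scenario_room(id: &str, unit: &str, count: u32, blocker: Option<&str>) -> String {
    let mut room = format!("{}{}:unit={}:count={}", PREFIX, id, unit, count);
    if let Some(b) = blocker {
        room.push_str(&format!(":blocker={}", b));
    }
    room
}

/// Parsed launch parameters: (scenario id, unit, count, optional blocker).
pub type ScenarioLaunch = (String, String, u32, Option<String>);

/// Hypothetical inverse: returns None for non-scenario or malformed room names.
pub fn parse_dev_scenario_room(room: &str) -> Option<ScenarioLaunch> {
    let rest = room.strip_prefix(PREFIX)?;
    let mut parts = rest.split(':');
    let id = parts.next()?.to_string();
    let unit = parts.next()?.strip_prefix("unit=")?.to_string();
    let count = parts.next()?.strip_prefix("count=")?.parse::<u32>().ok()?;
    let blocker = match parts.next() {
        Some(p) => Some(p.strip_prefix("blocker=")?.to_string()),
        None => None,
    };
    if parts.next().is_some() {
        return None; // unexpected extra component
    }
    Some((id, unit, count, blocker))
}
```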
Current scenario ids:
- scout_car_snaking_corridor: movement/pathing through the snaking stone corridor.
- direct_reverse_order: one vehicle ordered directly behind its current facing.
- scout_car_wall_chokepoint: vehicle groups moving through a narrow wall gap.
- vehicle_corner_wall: vehicle groups cornering around a wall spur.
- vehicle_small_block_baseline: vehicles moving through optional small-unit blockers.
- factory_zero_gap_perpendicular: one vehicle starting flush against a factory and moving east.
- tank_trap_line_horizontal: Training Centre, engineers, one rifleman, and one vehicle for manually building a horizontal Tank Trap line before the test units try to cross.
- tank_trap_line_vertical: Training Centre, engineers, one rifleman, and one vehicle for manually building a vertical Tank Trap line before the test units try to cross.
- tank_trap_line_diagonal: Training Centre, engineers, one rifleman, and one vehicle for manually building a diagonal Tank Trap line before the test units try to cross.
- tank_trap_pathing_matrix: one dropdown-backed matrix scenario with selectable cases: friendly_vehicle_reroute, enemy_vehicle_breach, infantry_pass_through, and explicit_infantry_attack.
The watcher shows movement debug path overlays by default. Replay speed controls are reused for
dev scenarios: Pause sets the simulation speed to zero, and Step advances exactly one
authoritative tick while paused. Normal seek/reset controls are replay-only.
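One way to model the Pause/Step semantics is a per-frame tick budget. ScenarioClock is hypothetical, and the fractional-carry accumulator for non-integer speeds is an assumption rather than the server's actual scheduling. What it illustrates is that speed zero runs no ticks and each Step press while paused yields exactly one.

```rust
/// Hypothetical watcher-room clock: decides how many authoritative ticks run per frame.
pub struct ScenarioClock {
    speed: f64,         // multiplier; 0.0 means paused
    carry: f64,         // fractional ticks carried between frames (assumption)
    pending_steps: u64, // Step presses not yet executed
    pub tick: u64,      // authoritative ticks run so far
}

impl ScenarioClock {
    pub fn new() -> Self {
        Self { speed: 1.0, carry: 0.0, pending_steps: 0, tick: 0 }
    }

    /// Pause = set simulation speed to zero.
    pub fn pause(&mut self) {
        self.speed = 0.0;
        self.carry = 0.0;
    }

    /// Resuming drops any Step presses that were not executed.
    pub fn resume(&mut self, speed: f64) {
        self.speed = speed;
        self.pending_steps = 0;
    }

    pub fn is_paused(&self) -> bool {
        self.speed == 0.0
    }

    /// Step only counts while paused; each press is exactly one tick.
    pub fn step(&mut self) {
        if self.is_paused() {
            self.pending_steps += 1;
        }
    }

    /// Called once per frame with the nominal tick count for that frame at speed 1.0.
    pub fn ticks_to_run(&mut self, nominal: f64) -> u64 {
        let n = if self.is_paused() {
            std::mem::take(&mut self.pending_steps)
        } else {
            self.carry += nominal * self.speed;
            let whole = self.carry.floor();
            self.carry -= whole;
            whole as u64
        };
        self.tick += n;
        n
    }
}
```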
Scenario setup is server-side only under server/crates/sim/src/game/setup/dev_scenarios.rs; do
not expose arbitrary spawning or map editing through client commands. Scenario artifact recording
under target/scenario-artifacts/ is not currently implemented.
The Tank Trap pathing matrix scenarios are harnesses for owner-aware pathing, infantry pass-through, explicit infantry attacks, and attack-move acquisition filtering. Enemy Tank Traps are breachable for vehicle path planning only; physical movement and standability still treat live Tank Trap footprints and closed one-tile gaps as vehicle-body blockers until combat removes enough traps.
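The split between planning and physical movement can be sketched as two separate predicates. TrapTile, Mover, plannable, and vehicle_can_stand are hypothetical; ownership is reduced to a single owner id (team and ally handling is not modeled), and closed one-tile gaps are left out.

```rust
/// Hypothetical mover categories relevant to Tank Traps.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mover {
    Vehicle,
    Infantry,
}

/// Hypothetical tile state: whether a live Tank Trap occupies it, and its owner.
#[derive(Clone, Copy, Debug)]
pub struct TrapTile {
    pub has_live_trap: bool,
    pub trap_owner: u8,
}

/// Path planning: may the planner expand through this tile?
/// Infantry passes through traps; vehicles reroute around their own traps
/// but may plan through enemy traps, which are treated as breachable.
pub fn plannable(tile: TrapTile, mover: Mover, me: u8) -> bool {
    if !tile.has_live_trap {
        return true;
    }
    match mover {
        Mover::Infantry => true,
        Mover::Vehicle => tile.trap_owner != me,
    }
}

/// Physical standability: can a vehicle body occupy this tile right now?
/// Any live trap blocks, whoever owns it, until combat removes it.
pub fn vehicle_can_stand(tile: TrapTile) -> bool {
    !tile.has_live_trap
}
```

Keeping the two checks separate is what lets a vehicle plan a route through an enemy trap line yet still stop at it until the traps are destroyed.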
11. Package-aware test selection policy
The authoritative full gate is the PR ./tests/run-all.sh check from the Main test gate workflow. Local runs should
usually be narrower and selected by the changed files or contracts. Use
node tests/select-suites.mjs --from=<base-ref> or pass changed paths directly to see the expected
suites.
- rts-contract or rts-protocol: run Rust contract/protocol tests, compact snapshot tests, JS protocol mirror/decode tests, and Node integration when a top-level message or compact shape changed.
- rts-rules: run rules tests plus sim tests that consume stats/formulas. If visible balance values changed, run client config/protocol mirror checks and include factual player-facing patch notes.
- Faction guardrails: run node scripts/check-faction-assumptions.mjs for faction docs, lifecycle policy, lobby admission, protocol/config vocabulary, or checker changes. Run node scripts/check-faction-catalog-parity.mjs when faction catalog facts, the Rust catalog dump, or client mirrors can change, including server/crates/rules/src/faction.rs, server/crates/rules/src/bin/dump-faction-catalog.rs, client/src/config.js, client/src/lobby_view.js, protocol/config mirror files, or the catalog parity checker itself. Docs-only faction policy edits should select these guardrails without requiring live-server suites.
- rts-sim: run sim package tests, deterministic replay coverage, and live-server integration for changed behavior that crosses the room/network boundary.
- SVG legacy unit renderer oracle: run node tests/legacy_unit_visual_oracle.mjs when legacy unit rendering behavior or tests/fixtures/svg/legacy-unit-oracle.baseline.json changes. The oracle uses a deterministic Node fixture, semantic measurements, and bounded pixel-diff thresholds across current unit kinds and representative animation states.
- Team-aware authored start assignment is covered by cargo nextest run map for deterministic FFA compatibility, current authored map proximity, 1v2/1v3 team layouts, synthetic larger layouts, start payload team ids, and replay reconstruction. Run node tests/team_integration.mjs for the live lobby/start contract. tests/team_integration.mjs is the canonical live multi-client team suite. It requires a running server and covers default singleton FFA, solo sandbox starts, scripted 1v2/1v3/2v2 setup, host-only/invalid team mutation rejection, shared team snapshot vision, allied command-authority no-ops, allied attack rejection, and team victory/game-over semantics. tests/run-all.sh --no-rust includes this suite in the live Node API pass, so a final local gate already exercises it.
- rts-ai: run AI package tests and node tests/ai_integration.mjs. Run RTS_FULL_AI_TESTS=1 cargo nextest run --config-file .config/nextest.toml --manifest-path server/Cargo.toml --profile default or tests/run-all.sh --full-ai when strategy profiles, profile-backed self-play, replay determinism, or long-match balance behavior changed. Default AI package coverage includes team-safety assertions for teamId observation, visible-ally exclusion from visible_enemies, allied-start exclusion from public enemy base / expansion safety, live alive-player target filtering, and real-AI self-play remaining per-player rather than shared-team controlled.
- rts-server: run server/lobby tests, Node live-server integration/regression suites, and client smoke when connection, snapshot delivery, room lifecycle, or served client behavior changes.
- client/: run JS protocol/client contract checks, minimap/input contracts where relevant, and client smoke. Include Node integration when protocol decode or network behavior changed.
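A toy version of the path-to-suite mapping, covering only a few of the rules above, shows the shape of a changed-path classifier, including the docs-only faction case that selects guardrails without live-server suites. select_suites and the suite labels are hypothetical, and this is not the logic of tests/select-suites.mjs.

```rust
use std::collections::BTreeSet;

/// Client files named above as faction-catalog mirrors.
const FACTION_CLIENT_MIRRORS: &[&str] = &["client/src/config.js", "client/src/lobby_view.js"];

/// Hypothetical mapping from changed paths to suite labels (illustrative subset).
pub fn select_suites(changed: &[&str]) -> BTreeSet<&'static str> {
    let mut suites = BTreeSet::new();
    for &path in changed {
        if path.starts_with("docs/") {
            // Docs-only faction policy edits select guardrails, not live-server suites.
            if path.contains("faction") {
                suites.insert("faction-guardrails");
            }
        } else if path.starts_with("server/crates/rules/") {
            suites.insert("rts-rules");
            if path.contains("faction") {
                suites.insert("faction-guardrails");
            }
        } else if path.starts_with("server/crates/sim/") {
            suites.insert("rts-sim");
        } else if path.starts_with("client/") {
            suites.insert("client");
            if FACTION_CLIENT_MIRRORS.contains(&path) {
                suites.insert("faction-guardrails");
            }
        }
    }
    suites
}
```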
scripts/check-crate-boundaries.mjs is part of the gate and fails on forbidden Cargo package
edges or server-only imports in lower crates. The sim architecture ratchet is also part of the gate:
cargo run --manifest-path server/Cargo.toml -p rts-archcheck -- check-sim-architecture fails when
rts-sim::game grows new service edges, broad mutable APIs, direct state writes/usages, public API
surface, or file-size budget over the committed baseline. Prefer reducing coupling first. If the
growth is intentional, update server/crates/archcheck/baselines/sim-architecture.json with:
cargo run --manifest-path server/Cargo.toml -p rts-archcheck -- check-sim-architecture --bless --reason "short reason"
Avoid broad allowlist additions unless the same change or a tracked follow-up explains the cleanup
path. tests/select-suites.mjs --verify keeps the changed file mapping itself covered by small
examples. CI comments document any intentionally skipped suite; that skip becomes invalid when the
changed-file mapping selects the skipped behavior.
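The ratchet idea, failing on growth past a committed baseline while allowing shrinkage, fits in a few lines. This is a sketch: ratchet_violations and the metric names are hypothetical, and it does not reproduce rts-archcheck's baseline format or its --bless flow.

```rust
use std::collections::BTreeMap;

/// Hypothetical ratchet: compare current metrics with a committed baseline.
/// Growth past the baseline, or a metric missing from it, is a violation;
/// shrinking is always allowed.
pub fn ratchet_violations(
    baseline: &BTreeMap<&str, u64>,
    current: &BTreeMap<&str, u64>,
) -> Vec<String> {
    let mut out = Vec::new();
    for (name, &now) in current {
        match baseline.get(*name) {
            Some(&allowed) if now <= allowed => {}
            Some(&allowed) => out.push(format!("{name}: {now} exceeds baseline {allowed}")),
            None => out.push(format!("{name}: {now} (not in baseline)")),
        }
    }
    out
}
```

In this sketch only growth or a brand-new metric needs attention; the choice is then between reducing coupling and an explicit baseline update with a reason.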
12. PR CI contract
The canonical required PR check context is ./tests/run-all.sh in the Main test gate workflow.
It is an aggregate check over split coverage jobs for server binary build, Rust/architecture, live
Node, and browser/tri-state coverage on pull requests targeting main and on pushes to main.
The split jobs run tests/run-all.sh sub-modes under CI so the required aggregate gate preserves
client smoke plus tri-state browser coverage without serializing every suite in one runner. Local
tests/run-all.sh runs keep client smoke in the default browser gate but skip the latency-sensitive
tri-state browser scenarios unless --with-tri-state-browser or RTS_RUN_TRI_STATE_BROWSER=1 is
set.
Changed-file detection classifies PRs and main pushes as docs_only, client_only, or full
from the PR base/head range or the push before/after range. docs_only keeps the same check
contexts green but exits before expensive suites. client_only is limited to conservative
client/ paths and skips Rust format, nextest, lint, and Rust architecture work while still
building the server and running live Node plus browser coverage. Contract-adjacent client paths
such as client/src/config.js, client/src/protocol.js, client/src/net.js,
client/src/lobby_view.js, and generated sim-WASM assets fall back to full. Branch protection
should require this single aggregate full-gate check unless a plan phase explicitly changes the
contract.
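The three-way classification can be sketched as a pure function over changed paths. ChangeClass and classify are hypothetical. Which paths count as docs (docs/, plans/, *.md), treating *.wasm files as the generated sim-WASM assets, and falling back to full for an empty change set are all assumptions; the four contract-adjacent client files come from the list above.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum ChangeClass {
    DocsOnly,
    ClientOnly,
    Full,
}

/// Client paths that must fall back to the full gate (listed above).
const CONTRACT_ADJACENT: &[&str] = &[
    "client/src/config.js",
    "client/src/protocol.js",
    "client/src/net.js",
    "client/src/lobby_view.js",
];

// Assumption: what counts as a docs path.
fn is_doc(path: &str) -> bool {
    path.starts_with("docs/") || path.starts_with("plans/") || path.ends_with(".md")
}

// Assumption: generated sim-WASM assets are recognizable by extension.
fn is_generated_wasm(path: &str) -> bool {
    path.ends_with(".wasm")
}

fn is_conservative_client(path: &str) -> bool {
    path.starts_with("client/") && !CONTRACT_ADJACENT.contains(&path) && !is_generated_wasm(path)
}

/// Hypothetical classifier for a PR base/head or push before/after range.
pub fn classify(changed: &[&str]) -> ChangeClass {
    if changed.is_empty() {
        return ChangeClass::Full; // conservative default (assumption)
    }
    if changed.iter().all(|p| is_doc(p)) {
        ChangeClass::DocsOnly
    } else if changed.iter().all(|p| is_doc(p) || is_conservative_client(p)) {
        ChangeClass::ClientOnly
    } else {
        ChangeClass::Full
    }
}
```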
node scripts/check-docs-health.mjs runs in the early changed-files CI lane before expensive split
jobs. It validates docs/doc-map.json, enforces the 5 KiB docs/context/*.md capsule cap, and
checks local Markdown links in docs/ and plans/.
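The capsule cap is a plain size comparison: 5 KiB is 5 * 1024 = 5,120 bytes. A sketch of that check over (path, size) pairs, assuming the cap is inclusive (exactly 5,120 bytes passes) and that only direct children of docs/context/ count as capsules. oversized_capsules and is_capsule are hypothetical names, and link checking is not shown.

```rust
/// 5 KiB = 5 * 1024 bytes.
pub const CAPSULE_CAP_BYTES: u64 = 5 * 1024;

/// Assumption: only direct children of docs/context/ ending in .md are capsules.
fn is_capsule(path: &str) -> bool {
    path.strip_prefix("docs/context/")
        .map_or(false, |rest| !rest.contains('/') && rest.ends_with(".md"))
}

/// Hypothetical check over (path, size in bytes) pairs; returns capsules over the cap.
/// Assumption: a file of exactly CAPSULE_CAP_BYTES passes.
pub fn oversized_capsules<'a>(files: &[(&'a str, u64)]) -> Vec<&'a str> {
    files
        .iter()
        .filter(|&&(path, size)| is_capsule(path) && size > CAPSULE_CAP_BYTES)
        .map(|&(path, _)| path)
        .collect()
}
```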
The PR ownership workflow validates owned agent PR metadata for zvorygin/* branches with
scripts/check-pr-ownership.sh.
The old standalone Rust and Integration workflows are retired. Their package, architecture,
live Node, and browser coverage is owned by the split Main test gate jobs under the required
aggregate ./tests/run-all.sh check, so separate auxiliary workflows would duplicate coverage and
consume extra runner capacity without increasing merge safety.
GitHub Actions uses standard ubuntu-latest runners for this contract. Public-repository standard
runners are acceptable for the current cost posture, while larger paid runner classes are out of
scope. The gate remains portable through tests/run-all.sh so it can run locally or on another
runner if the hosting or billing posture changes.
PR workflows use concurrency groups scoped by workflow plus PR number, with cancellation enabled
only for pull request events. A newer push to the same PR branch may cancel superseded runs, while
pushes to main and unrelated branches keep independent results.
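A sketch of the grouping rule, assuming a PR group key of workflow plus PR number and, for every other event, a key that includes the commit SHA. Concurrency and concurrency_for are hypothetical, and the real workflows express this in YAML. The per-commit key matters because GitHub Actions keeps at most one pending run per concurrency group, so a shared group for main pushes could drop a queued run even without cancellation enabled.

```rust
/// Hypothetical concurrency settings for one workflow run.
#[derive(Debug, PartialEq)]
pub struct Concurrency {
    pub group: String,
    pub cancel_in_progress: bool,
}

/// PR runs share a group per workflow + PR number and may cancel superseded runs;
/// other events get a per-commit group (assumption) so their results stay independent.
pub fn concurrency_for(workflow: &str, event: &str, pr_number: Option<u64>, sha: &str) -> Concurrency {
    match (event, pr_number) {
        ("pull_request", Some(n)) => Concurrency {
            group: format!("{workflow}-pr-{n}"),
            cancel_in_progress: true,
        },
        _ => Concurrency {
            group: format!("{workflow}-{event}-{sha}"),
            cancel_in_progress: false,
        },
    }
}
```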
Beta deployment is downstream of the full gate but must only deploy tested main push commits. The
deploy workflow checks that the completed Main test gate run came from a push event on main
before checking out and deploying the tested head SHA.
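The deploy guard reduces to a predicate over the completed run. The field names mirror GitHub's workflow_run event payload (event, head_branch, conclusion, head_sha), but CompletedRun and deployable_sha are hypothetical, and requiring a success conclusion is an assumption drawn from "tested".

```rust
/// Hypothetical subset of a completed workflow run; field names follow
/// GitHub's workflow_run payload, but this struct is illustrative.
pub struct CompletedRun<'a> {
    pub workflow_name: &'a str,
    pub event: &'a str,
    pub head_branch: &'a str,
    pub conclusion: &'a str,
    pub head_sha: &'a str,
}

/// Return the exact SHA to deploy, or None if this run must not trigger a deploy.
pub fn deployable_sha<'a>(run: &CompletedRun<'a>) -> Option<&'a str> {
    let tested_main_push = run.workflow_name == "Main test gate"
        && run.event == "push"
        && run.head_branch == "main"
        && run.conclusion == "success"; // assumption: only passing runs deploy
    tested_main_push.then_some(run.head_sha)
}
```

Returning the run's own head SHA, rather than whatever main points at now, is what keeps the deployed commit identical to the tested one.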
13. Documentation drift sweeper
scripts/docdrift-sweep.mjs --dry-run is the deterministic operator surface for reviewing commits
between docs/docdrift-checkpoint.txt or --base and --head. It reads commit metadata,
changed paths, compact diff stats, docs touched, and docs/doc-map.json trace-map candidates, but
does not edit docs, create PRs, or advance the checkpoint. Merge commits, empty commits, and
docs-only churn are skipped before classifier prompts are built.
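The pre-classifier skip step can be sketched as a function over commit metadata. CommitInfo, Skip, and skip_reason are hypothetical, and treating "docs-only churn" as every changed path lying under docs/ is an assumption.

```rust
/// Hypothetical subset of the commit metadata the dry run reads.
pub struct CommitInfo<'a> {
    pub sha: &'a str,
    pub parent_count: usize,
    pub changed_paths: Vec<&'a str>,
}

#[derive(Debug, PartialEq)]
pub enum Skip {
    Merge,
    Empty,
    DocsOnly,
}

/// Decide whether a commit is dropped before any classifier prompt is built.
pub fn skip_reason(c: &CommitInfo<'_>) -> Option<Skip> {
    if c.parent_count > 1 {
        Some(Skip::Merge)
    } else if c.changed_paths.is_empty() {
        Some(Skip::Empty)
    } else if c.changed_paths.iter().all(|p| p.starts_with("docs/")) {
        // Assumption: "docs-only churn" means every changed path is under docs/.
        Some(Skip::DocsOnly)
    } else {
        None
    }
}
```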
scripts/docdrift-sweep.mjs --classify adds the cheap Codex CLI classifier. Live classifier runs
must use Codex CLI authentication through the local codex exec path; they must not use the
OpenAI Agents SDK, direct API clients, API-key environment variables, or API-billed fallback
routes. Fixture runs use --no-codex --fixture <name> and are the required focused verification
path before any live Codex smoke. Classifier decisions are cached under the ignored
.docdrift/classifier-cache/ runtime directory by prompt version and commit SHA, and reports can be
written with --out-dir. Live Codex calls run read-only with the approval policy forced to "never" via a
Codex config override, emit per-commit progress on stderr, and record token usage when the Codex
JSON event stream includes it.
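Keying the cache by prompt version and commit SHA means a prompt change invalidates old decisions automatically, while a rerun on the same prompt reuses them. A sketch, assuming a one-file-per-SHA layout under .docdrift/classifier-cache/<prompt-version>/ and an in-memory map standing in for that directory. classifier_cache_path, ClassifierCache, and get_or_classify are hypothetical, and the closure stands in for a live codex exec call.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical on-disk location for one cached decision (layout assumed):
/// .docdrift/classifier-cache/<prompt_version>/<sha>.json
pub fn classifier_cache_path(repo_root: &str, prompt_version: &str, sha: &str) -> PathBuf {
    PathBuf::from(repo_root)
        .join(".docdrift")
        .join("classifier-cache")
        .join(prompt_version)
        .join(format!("{sha}.json"))
}

/// In-memory stand-in for the cache directory, keyed by (prompt version, commit SHA).
#[derive(Default)]
pub struct ClassifierCache {
    entries: HashMap<(String, String), String>,
    /// Number of times the stand-in live classifier actually ran.
    pub live_calls: u32,
}

impl ClassifierCache {
    /// Return the cached decision, or run `classify` once and remember its result.
    pub fn get_or_classify<F>(&mut self, prompt_version: &str, sha: &str, classify: F) -> String
    where
        F: FnOnce() -> String,
    {
        let key = (prompt_version.to_string(), sha.to_string());
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.live_calls += 1;
        let decision = classify();
        self.entries.insert(key, decision.clone());
        decision
    }
}
```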
scripts/docdrift-sweep.mjs --generate-docs reruns or reuses the classifier records, selects only
update_docs decisions, loads targeted authoritative design-doc sections, and asks Codex CLI for
exact minimal find/replace doc patches. The generator prefers classifier-selected design docs; docs
touched in the commit and broad trace-map design docs are fallbacks, not an automatic union. It
builds and applies doc-patch prompts sequentially so later update_docs decisions see docs already
changed by earlier decisions in the same sweep; if the supplied sections already cover the behavior,
the generator should return an empty patch set instead of restating it. The script applies generated
patches to the working tree and writes docdrift-generate.{md,json} with --out-dir; operators
inspect the resulting docs diff before any PR lifecycle step. Fixture runs use the same
--no-codex --fixture <name> path and must remain idempotent. If a retry sees that a cached patch’s
replacement text is already present, it reports the patch as already applied without spending
another Codex generation call.
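The retry rule can be sketched as an exact find/replace that checks for the replacement text first. apply_patch and PatchResult are hypothetical; rejecting an empty or ambiguous (multi-match) find string is an assumption about what "exact" means, not a documented behavior of docdrift-sweep.mjs.

```rust
/// Hypothetical outcome of applying one exact find/replace doc patch.
#[derive(Debug, PartialEq)]
pub enum PatchResult {
    Applied,
    AlreadyApplied,
    FindNotFound,
    Ambiguous,
}

/// Apply `find` -> `replace` once. If the replacement text is already present,
/// report AlreadyApplied without touching the doc, so a retry is idempotent
/// and needs no new generation call.
pub fn apply_patch(doc: &mut String, find: &str, replace: &str) -> PatchResult {
    if !replace.is_empty() && doc.contains(replace) {
        return PatchResult::AlreadyApplied;
    }
    if find.is_empty() {
        return PatchResult::FindNotFound;
    }
    match doc.matches(find).count() {
        0 => PatchResult::FindNotFound,
        1 => {
            *doc = doc.replacen(find, replace, 1);
            PatchResult::Applied
        }
        _ => PatchResult::Ambiguous,
    }
}
```

Applying patches one at a time to the same in-memory text is also what lets a later patch see an earlier one's result, matching the sequential behavior described above.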
scripts/docdrift-sweep.mjs --full is the PR-first operator lifecycle. It fetches origin/main,
uses the local checkpoint from .docdrift/checkpoint.txt when present, falls back to the committed
seed in docs/docdrift-checkpoint.txt, creates or reuses .docdrift/worktrees/docdrift-sweep on
zvorygin/docdrift-sweep, runs classification plus doc generation there, commits any docs changes,
pushes the sweep branch, opens or updates the owned PR through scripts/agent-pr.sh, and waits with
scripts/wait-pr.sh. The checkpoint advances atomically only after a no-PR range is fully processed
or after wait-pr.sh confirms the sweep PR head is reachable from origin/main; failed checks,
closed PRs, stale branches, dirty sweep worktrees, and Codex failures leave the checkpoint unchanged.
Full sweeps write ignored local reports under .docdrift/runs/<run-id>/, including
docdrift-full.{md,json} and any classify/generate reports. Use
scripts/docdrift-daily.sh as the launchd-friendly daily 8 p.m. command; pass normal
docdrift-sweep.mjs options after it, for example --dry-run for a lifecycle preview or
--run-id <id> for predictable report paths. The wrapper only runs the command; it does not install
or require a launchd job for other developers.
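The checkpoint rule has two parts: deciding whether a sweep may advance, and writing the new value atomically. A sketch, assuming write-to-temp-then-rename as the atomic write (atomic on POSIX filesystems when both files share a directory). SweepOutcome, may_advance, and write_checkpoint are hypothetical names.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Hypothetical summary of how a sweep range ended.
pub enum SweepOutcome {
    /// The range was fully processed and produced no docs PR.
    NoPrNeeded,
    /// A sweep PR was opened; wait-pr.sh reported whether its head reached origin/main.
    PrWaited { head_reachable_from_main: bool },
    /// Failed checks, closed PR, stale branch, dirty worktree, or Codex error.
    Failed(String),
}

/// Only a fully processed no-PR range or a landed sweep PR may advance the checkpoint.
pub fn may_advance(outcome: &SweepOutcome) -> bool {
    match outcome {
        SweepOutcome::NoPrNeeded => true,
        SweepOutcome::PrWaited { head_reachable_from_main } => *head_reachable_from_main,
        SweepOutcome::Failed(_) => false,
    }
}

/// Write the checkpoint via a temp file plus rename, so readers never see a partial value.
pub fn write_checkpoint(path: &Path, sha: &str) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut f = fs::File::create(&tmp)?;
        writeln!(f, "{sha}")?;
        f.sync_all()?;
    }
    fs::rename(&tmp, path)
}
```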