Architected and shipped the full-stack core (Python · FastAPI · SQLAlchemy 2.0 ·
NetworkX; React 18 · TypeScript · React Flow — 100+ REST endpoints over a
39-table versioned schema) modeling goals as typed directed multigraphs with
live completion-state tracking and versioned snapshots.
Built the agent harness governing all mutations: a proposal-first review protocol
(draft → diff preview → human commit/reject), a simulation sandbox for candidate
plans, and a full event log capturing every action for auditability and
downstream training data.
Built adaptive re-planning services that re-derive the critical path as reality
changes (constraint-solver scheduling with Monte-Carlo feasibility checks); now
building the conversational planner–executor agent: multi-step tool-call loops
over CLI-wrapped system capabilities, bounded retries, and human-in-the-loop
batch review.
Deployed a self-hosted 6-GPU inference node (vLLM serving GLM / MiniMax-class
open models) for latency-critical agent steps: tuned tensor-parallel vs
multi-replica layout and continuous batching, and exploited prefix caching over
shared system-prompt and graph-state context across multi-turn sessions.
Built a dynamic model-routing layer trading off latency, cost, and capacity:
routine agent steps (tool-argument formatting, summarization) run on the local
node, complex planning steps escalate to frontier APIs, with load-aware spillover
and per-step cost telemetry.
Engineered stateful agent-session serving: async long-running loops with
streaming (WebSocket/SSE), checkpointed and resumable runs, idempotent tool
mutations, per-user concurrency caps, and per-step tracing (model, tokens,
latency) for production debuggability.
Migrated the platform to multi-tenant Postgres (39-table schema + full versioning
system) with per-tenant isolation, graph-hydration caching keyed on
(user, graph-version), and monthly event-log archival designed around
training-data extraction.
Moved solver workloads off the request path onto a job queue with per-tenant
quotas and incremental local-repair scheduling, keeping API latency flat under
concurrent re-planning load.
Built the training-data flywheel: trajectory and preference records
(proposal → accept / reject → actual outcome) extracted from the event log into
parquet datasets, powering evaluation suites and a bandit layer for per-user
proposal-style personalization.