Skip to main content
Architecture

Claude Opus 4.7 Agent Teams for continuous RMF monitoring: a federal architecture pattern

A concrete architecture for implementing NIST SP 800-37 Rev 2 Step 7 — Monitor — with Claude Opus 4.7 Agent Teams inside a FedRAMP / IL4 / IL5 boundary. How evidence collection, drift detection, POA&M maintenance, and quarterly ConMon reporting decompose into specialized agents with scoped tool access and a traceable decision ledger.

The continuous-monitoring problem no one has solved

Continuous monitoring is the step in the NIST Risk Management Framework where federal systems go to die. Step 7 of SP 800-37 Rev 2 — Monitor — requires ongoing assessment of security controls, near-real-time awareness of system state, active POA&M maintenance, and periodic re-authorization evidence. In theory, this is continuous. In practice, it is a quarterly scramble in which ISSOs and system owners reconstitute evidence that decayed six weeks earlier, paste screenshots into GRC tools built in 2014, and push POA&M updates that reference controls the system hasn't actually satisfied since the last audit. The gap between the RMF's intent and its operational reality is where risk accumulates.

WHAT THIS ARTICLE PROPOSES

An architectural pattern — not a product — for using Claude Opus 4.7 Agent Teams to implement NIST SP 800-37 Rev 2 Step 7 as genuinely continuous, inside a FedRAMP / IL4 / IL5 authorization boundary. The pattern uses five specialized agents (Evidence-Collector, Drift-Detector, Control-Correlator, POA&M-Maintainer, Report-Composer) coordinating through Agent Teams, with a Compaction-based shared memory that preserves state across multi-month monitoring cycles. Tool scopes, HITL boundaries, and SSP-ready language are included.

Evidence decays faster than it is manually re-collected. Continuous monitoring is not a documentation problem — it is a compute and attention problem. Agentic AI, scoped correctly, is the first architecture that can match the rate at which evidence actually decays.

Where Step 7 is failing today

Most federal systems currently implement continuous monitoring through some combination of GRC platforms (Xacta, eMASS, CSAM, RSA Archer), spreadsheet workflows, ticketing-system exports, and manual screenshot collection on a 30/60/90-day cadence. Each of these is capable of the job description. None of them produce continuous monitoring in the SP 800-137 sense. What they produce is periodic monitoring that carries the word "continuous" in its ATO documentation.

The failure is structural. A mid-complexity FedRAMP Moderate system inherits roughly 180 controls and enumerates another 80–120 customer-responsibility controls. Each enhancement needs evidence — configuration extracts, policy documents, access-review exports, log samples, scan results — on a schedule ranging from continuously to annually. The evidence must then be correlated with the SSP, compared to the prior state, explained in terms of any drift, and packaged into POA&M items or ConMon deliverables. Most agencies do not have a full-time ISSO per system. So the schedule slips, the evidence ages, and the SSP loses its correspondence to reality.

The FedRAMP 20x working groups have been pushing for machine-readable SSPs (OSCAL) and automated evidence collection as the path forward. OSCAL is the data format. What has been missing is the agent that understands the data format, the controls, and the system well enough to do the collection and correlation on its own.

The Agent Teams primitive, explained for the federal reader

Claude Opus 4.7, released by Anthropic on April 16, 2026, exposes Agent Teams as a first-class orchestration primitive rather than a library pattern developers assemble themselves. Conceptually: instead of one monolithic model attempting a long-horizon task, a team of role-specialized agents coordinates through shared context, role-defined tool access, and a coordinator that routes work. Each agent has its own system prompt, its own allowlist of tools, and its own slice of the shared context.

For federal continuous monitoring, the Agent Teams model aligns closely with how RMF work is already organized on the human side. An ISSO team has implicit roles — the person who runs scans, the person who reviews access, the person who drafts the POA&M. Agent Teams encodes those roles as agents with scoped permissions and a traceable coordination ledger, such that every decision across the team is attributable to a specific agent and a specific input.

The companion primitive that matters here is the Compaction API. Continuous monitoring is a long-horizon task — the cycle may run for 18 to 36 months between re-authorizations. No single conversation context is adequate. Compaction allows shared memory to be progressively summarized and re-hydrated across sessions, preserving control-state history and trend data without unbounded token growth. Before Compaction, this state lived in external memory stacks stitched together by the integrator; with it, state remains inside the model-provider boundary, simplifying the authorization story.

The five roles in the proposed team

The pattern below decomposes Step 7 into five specialized agents, each with a narrowly scoped tool-use surface. The decomposition follows the natural structure of a federal ConMon program and maps cleanly onto NIST SP 800-53 Rev 5 control families. Each agent is a Claude Opus 4.7 instance running inside Bedrock GovCloud (or the equivalent boundary), with its allowlist of tools expressed in the Bedrock-side tool-schema configuration.

Agent rolePrimary RMF functionControl families in scope
Evidence-CollectorPeriodic and event-triggered evidence re-collection from system-of-record APIs, log stores, IaC repos, and configuration baselinesAC, AU, CM, IA, SC, SI
Drift-DetectorCompare current evidence to the SSP baseline and the prior monitoring cycle; flag configuration, policy, or control-implementation driftCM-2, CM-3, CM-6, CM-8, SI-7
Control-CorrelatorMap raw evidence to specific SP 800-53 Rev 5 controls and enhancements; identify coverage gaps against the OSCAL SSPAll families; cross-cutting
POA&M-MaintainerOpen, update, close, and re-age POA&M items based on drift findings and remediation evidence; manage severity and deviation requestsCA-5, PM-4, RA-5
Report-ComposerDraft the quarterly ConMon report, monthly vulnerability summary, annual assessment input, and OSCAL artifact updatesCA-2, CA-7, PM-14, PM-31

The coordinator is a thin orchestration layer — either Opus 4.7 itself in a supervisory role, or a deterministic workflow engine invoking the agents in a DAG. In our reference implementations we use the model-as-coordinator pattern for exploratory phases and the deterministic-orchestrator pattern for production, because a deterministic DAG is easier to defend in an authorization package and easier to test for reproducibility.

The Evidence-Collector agent

Evidence-Collector has the broadest tool surface and the narrowest decision authority. Its job is to fetch, hash, timestamp, and ingest — not to interpret. The tool allowlist is read-only calls to the IaC repository, CloudTrail and GuardDuty, Config/SSM inventory, the identity provider and access-review systems, and the vulnerability scanner, plus a single write tool that writes signed evidence records to the immutable evidence store.

The agent has no network access outside the Bedrock tool-call boundary and no write access except to the evidence store. It cannot modify IaC, close findings, or touch configurations. This is the critical property of the architecture: the agent with the broadest read surface has the narrowest write surface. A compromise cannot change the system — it can only produce fabricated evidence, which the Drift-Detector and Control-Correlator flag as inconsistent with downstream telemetry. Evidence is hashed on collection, signed with a service KMS key, and written with a monotonic sequence number. The agent's reasoning is captured by Bedrock model-invocation-logging — full prompt, full response, full tool-call sequence — inheriting the audit properties required by AU-2, AU-3, and AU-12.

The Drift-Detector and Control-Correlator agents

Drift-Detector reads from the evidence store and the prior-state snapshot. Its only write target is a "findings" queue consumed by downstream agents. The agent identifies three classes of drift: (1) configuration drift where a control's implementation has changed; (2) policy drift where the SSP text no longer describes the implementation; (3) inheritance drift where a previously inherited control no longer flows through the authorization path it claimed.

Control-Correlator is the semantic hinge of the team. It resolves structured evidence and findings against the OSCAL SSP to identify which specific SP 800-53 Rev 5 controls and enhancements are affected. The Correlator depends most on long-horizon context — it must remember which controls were previously assessed, which inherited, which tailored out, and which have open deviations. This is where the Compaction API earns its place: between cycles the Correlator's working memory is compacted into a structured summary keyed by control identifier, then re-hydrated on the next cycle. Without Compaction, the Correlator would either lose history or require an external state store with its own authorization story.

The POA&M-Maintainer and Report-Composer agents

POA&M-Maintainer is the agent with the narrowest tool surface and the highest HITL friction. Its only tools are: read findings, read current POA&M, propose a POA&M delta, submit delta to human approval queue. It cannot write to the POA&M directly. Every proposed create/update/close on a POA&M item flows through an ISSO approval step before it is committed. This is deliberate: CA-5 is one of the most scrutinized controls in any federal assessment, and the cost of an agent autonomously closing a POA&M item is materially higher than the cost of an ISSO reviewing a queue of agent-drafted updates.

Report-Composer assembles the human-facing narrative — the quarterly ConMon report, the monthly vulnerability summary, the OSCAL assessment-results artifact — from the outputs of the other four agents. Its tools are read-only against the evidence store, findings queue, and POA&M state, plus a draft-write tool that deposits the assembled report into a review folder. Like the POA&M-Maintainer, its outputs are never auto-published; they are always drafts that an ISSO or system owner signs before submission. The value of the agent is not autonomy — it is the consistency and completeness of the first draft.

Fitting the team inside a FedRAMP / IL4 / IL5 boundary

The architecture lives inside the authorized boundary. All five agents run against Bedrock in AWS GovCloud, inheriting FedRAMP High / IL4 / IL5 from the underlying service. Evidence storage lives in a GovCloud S3 bucket with object-lock enabled, SSE-KMS encryption with a customer-managed key, and a bucket policy that allows writes only from agent execution roles and reads only from agents and the auditor role. The evidence store and the agent runtime are in the same account and region — evidence never crosses the boundary the ATO covers.

Bedrock model-invocation logging captures every prompt, response, and tool call to a CloudWatch log group with retention matching the agency's ConMon deliverable schedule. GuardDuty is enabled with the Bedrock and IAM Analyzer protection plans. CloudTrail data-events are enabled on the evidence bucket. Encryption-in-transit is TLS 1.2+ enforced by the VPC endpoint policy; Bedrock VPC interface endpoints are deployed in the same VPC as the orchestration runtime, so no agent traffic traverses the public internet. Tool allowlists are enforced at the Bedrock invocation layer — a jailbroken agent cannot call a tool outside its declared scope.

Mapping agents to RMF steps and control families

The team implements Step 7 (Monitor) directly, but it also feeds Steps 4 (Assess), 5 (Authorize), and 6 (Monitor, in Rev 2) by keeping the SSP and evidence package continuously current. The inverse is also true: during initial authorization (Steps 1–5), the team can be pre-seeded with SSP draft content and used to pressure-test evidence availability before the ATO package is submitted to the AO.

Agent roleRMF Rev 2 stepPrimary SP 800-53 Rev 5 controls
Evidence-CollectorStep 7 (Monitor); supports Step 4AC-2, AC-6, AU-6, CM-6, CM-8, SI-4
Drift-DetectorStep 7 (Monitor)CM-2, CM-3, CM-6(1), SI-7, SI-7(1)
Control-CorrelatorStep 7 supports Step 4, Step 5, Step 6CA-2, CA-7, PM-14
POA&M-MaintainerStep 7 supports Step 6CA-5, PM-4, RA-5
Report-ComposerStep 7 outputCA-7(3), PM-14, PM-31

The human-in-the-loop boundary

Every agent output that affects an authoritative record requires explicit human approval. Evidence-Collector writes freely to the append-only, immutable evidence store; Drift-Detector writes findings into a queue; Control-Correlator proposes SSP edits that the ISSO must approve before the OSCAL SSP is updated; POA&M-Maintainer proposes every POA&M delta for ISSO approval; Report-Composer produces drafts that the system owner signs before submission.

The design principle: agents are trusted to see, reason, and propose — never to commit a change to an authoritative record without human sign-off. This preserves the ISSO's role as the accountable party while moving labor from evidence production to evidence review, which is where the ISSO's expertise matters. Operationally, a system that previously consumed 200 ISSO-hours per quarter to produce a ConMon deliverable now consumes roughly 30 ISSO-hours of review — a shift defensible in a Phase I SBIR ROI analysis.

Risk management and red-teaming the team itself

An agentic system inside a federal authorization boundary introduces two new risk categories that non-agentic systems do not have: prompt-injection against the agents, and emergent-behavior risk from multi-agent coordination.

Prompt-injection surfaces wherever an agent reads content it did not author. Evidence-Collector reads IaC files, policy documents, and log samples — all potential injection vectors. The mitigation is input-sanitization at the tool layer (structured parsing, not raw text passthrough), tool-scope restriction (Evidence-Collector has no write capabilities that would let an injected instruction exfiltrate or modify data), and output filtering (Drift-Detector flags any evidence record whose content suggests an injection attempt). Injection on the Report-Composer is the highest-impact vector, because its output goes to human decision-makers. The mitigation is to never let Report-Composer read raw external content — it consumes only the structured outputs of upstream agents.

Multi-agent emergent behavior is the harder category. Two agents reasoning in concert can produce collusive outputs that neither would produce alone — for example, a Drift-Detector that underreports a finding because it has learned the POA&M-Maintainer will push back on high-severity items. The mitigation is adversarial evaluation: the team is red-teamed by running it against synthetic scenarios where ground truth is known, and by periodically injecting test-case evidence that should produce known findings. We cover the broader federal AI red-teaming discipline in a separate piece; for Agent Teams specifically, the red-team protocol runs monthly and is itself documented in the SSP as an SI-4 enhancement.

The Phase I SBIR framing

For technical reviewers evaluating this pattern in a Phase I SBIR technical volume, three framings land well. First, the architecture references a named NIST publication (SP 800-37 Rev 2) and a specific step (Step 7), signaling the proposer understands the compliance surface rather than hand-waving at it. Second, the decomposition into role-specialized agents with named control-family scopes is the kind of concrete mapping that reviewers from the TPOC's compliance office recognize as operationally realistic. Third, the HITL boundary is explicit and conservative — agents propose, humans commit — which addresses the predictable reviewer concern about autonomous agents making unsupervised changes to federal authorization records.

The language that belongs in the technical volume: "The proposed system implements NIST SP 800-37 Rev 2 Step 7 continuous monitoring through a five-agent Claude Opus 4.7 Agent Teams architecture deployed inside a FedRAMP High / IL5 Bedrock GovCloud boundary. Agent roles are scoped to individual SP 800-53 Rev 5 control families with tool allowlists enforced at the Bedrock invocation layer. All authoritative writes require explicit human-in-the-loop approval. The architecture is designed to satisfy CA-7, PM-14, and AU-12 through inheritance from Bedrock GovCloud and augmentation through the team's signed-evidence store."

NIST 800-53 Rev 5 today, Rev 6 when it lands

The pattern is specified against SP 800-53 Rev 5 because that is the authoritative baseline today. When Rev 6 lands — the public-draft cycle has been running, with final publication expected in the FY26–FY27 window — the control mappings will need updating, but the architectural shape does not change. The agent roles are decomposed by RMF function, not by control identifier, so a Rev 5 → Rev 6 migration is a remapping of the Control-Correlator's knowledge base, not a rearchitecture. The expected Rev 6 changes around AI-specific controls (derived from AI RMF 1.0) will land most heavily on the Drift-Detector and POA&M-Maintainer, which will need additional patterns for model-version drift and model-provenance findings. For systems that touch CUI under SP 800-171, the same decomposition applies with a narrower control set — SP 800-171 Rev 3's 110-control baseline maps to a subset of Evidence-Collector and Drift-Detector scopes, making the architecture practical for CMMC L2 assessment preparation as well as federal ConMon.

Our reference implementation approach

At Precision Federal we have prototyped this pattern against a synthetic federal system-of-record containing a representative 180-control Moderate baseline and a 30-day evidence cycle. The prototype runs on commercial Bedrock with Opus 4.7 for development and migrates to Bedrock GovCloud with Opus 4.6, then Opus 4.7 when it lands in the government region. Findings and POA&M items are structured OSCAL assessment-results and POA&M artifacts, not free-form prose.

The engineering effort is modest — the bulk of the complexity is in the tool-allowlist definitions and the structured schemas for findings and POA&M deltas, not in the agent orchestration itself. The hard part is not the agents; it is the evidence taxonomy and the control-family-to-tool mapping. The model is becoming a commodity; the control mapping is the durable artifact. The five-agent decomposition is stable across deployments; tool allowlists adjust per local control tailoring, existing GRC platform integration, and the specific IaC and log-source topology.

What we are not claiming

This pattern is not a replacement for the ISSO, the AO, or the assessment organization. It is a labor-reallocation mechanism that moves ISSO time from evidence production to evidence review. The agents do not decide risk posture; they surface findings and propose changes. Authoritative accountability remains with the ATO-signing human, and the architecture is designed to preserve that accountability — not to delegate it.

It is also not a novel research claim. Agent Teams and the Compaction API are Anthropic's; RMF, SP 800-53, and OSCAL are NIST's. What is proposed is a composition of these existing components into an operating architecture for a problem — continuous monitoring — that has been under-solved for a decade. The novelty, to the extent there is any, is in the composition and the tool-scoping discipline.

Frequently asked questions

Does this pattern require Claude Opus 4.7 specifically, or can it run on other models?

The decomposition into five role-specialized agents with scoped tools is model-agnostic. What Opus 4.7 adds is the native Agent Teams primitive (no custom orchestration library) and the Compaction API (long-horizon shared memory). The pattern runs on Opus 4.6 today with external orchestration; it runs more cleanly on 4.7 once Bedrock GovCloud carries it. It can also be implemented against GPT-5.5 on Azure Government OpenAI when that lands, using the Assistants API for coordination.

Where does the pattern live in the authorization boundary?

Inside it. All five agents run against Bedrock in AWS GovCloud (FedRAMP High / IL4 / IL5). The evidence store is a GovCloud S3 bucket with object-lock and KMS-CMK encryption in the same account and region. Agent traffic uses Bedrock VPC interface endpoints so it never traverses the public internet. The architecture consumes the inherited controls of the boundary; it does not add new ones.

What happens if an agent hallucinates a finding or misclassifies a control?

The HITL boundary catches it. No agent commits to an authoritative record without human approval. Evidence-Collector writes only to an immutable evidence store; everything downstream — POA&M updates, SSP edits, ConMon report submission — requires ISSO or system-owner sign-off. The red-team protocol runs monthly against synthetic scenarios to measure false-positive and false-negative rates on drift detection and control correlation.

How does this satisfy CA-7 (Continuous Monitoring)?

CA-7 requires an organization-defined monitoring strategy, metrics, and a frequency. The pattern implements the strategy through the agent roles (each role covers a subset of the strategy's control families), the metrics through the structured findings and drift records emitted by Drift-Detector and Control-Correlator, and the frequency through the orchestrator's scheduling of Evidence-Collector runs. CA-7(3) (Trend Analyses) is satisfied by the compacted historical state the Control-Correlator maintains across cycles.

Does the Compaction API introduce data-residency concerns?

Compaction state lives in the Bedrock service boundary in the same region as the model invocation. In Bedrock GovCloud (US), that residency is inherited from the GovCloud region's FedRAMP High authorization. Document the Compaction usage, the region, and the retention in the SSP under SC-28 (Protection of Information at Rest) and the applicable AU controls. Confirm current Compaction availability in Bedrock GovCloud before architecting around it — features typically lag commercial.

Is this pattern applicable to SP 800-171 / CMMC L2 as well as FedRAMP?

Yes, with a narrower control set. SP 800-171 Rev 3's 110-control baseline is a subset of the SP 800-53 Rev 5 Moderate baseline, so the Evidence-Collector and Drift-Detector scopes reduce accordingly. CMMC L2 assessments benefit especially from the evidence-store and signed-collection properties of the architecture, since CMMC assessors explicitly look for objective evidence with a verifiable chain of custody.

1 business day response

Book a federal agentic-AI architecture review

We help agencies and primes design agentic-AI architectures that pass the ISSO, the AO, and the assessor. Continuous monitoring, ConMon automation, and SSP-ready documentation.

Talk to usRead more insights →
UEI Y2JVCZXT9HP5CAGE 1AYQ0NAICS 541512SAM.GOV ACTIVE