A new policy baseline, not a new compliance regime
On March 20, 2026, the White House issued the National AI Policy Framework, consolidating direction to federal agencies on how AI systems are to be acquired, deployed, monitored, and retired. For SBIR offerors, the framework does not create a new compliance regime from scratch — it pulls together existing obligations under NIST AI RMF 1.0, NIST SP 800-53 Rev 5, NIST SP 800-37 RMF, and OMB M-24-10, and raises the bar on how those obligations are documented at the proposal stage. The practical effect: Phase I technical volumes, Phase II continuous-monitoring plans, and Phase III transition narratives all need to address a specific list of safety, provenance, and governance artifacts that were previously optional. This article walks through what changed, what stayed the same, and what to add to a proposal being written today.
Safety-testing evidence package, model-provenance attestation, red-team plan with threat-model scope, continuous-monitoring plan mapped to NIST SP 800-37 Step 6, agency use-case inventory alignment statement, civil-rights / disparate-impact assessment where applicable, and a supply-chain attestation covering training-data jurisdiction, model weights, and inference dependencies. Each item maps to a specific clause or control family — do not treat them as narrative bullets.
The framework's pillars, in the order they bind an offeror
The framework is organized around six operative pillars. The ordering below reflects which pillars most directly change how an SBIR proposal is written, not the document's internal taxonomy.
1. Safety testing before deployment
The framework treats pre-deployment safety testing as a required input to any authorization decision for an AI-enabled federal system. For SBIR offerors, this means a Phase I technical volume that proposes to deliver an AI capability needs to describe what safety testing will look like before any government user touches the output. Vague language (“we will conduct comprehensive testing”) is no longer sufficient; reviewers will look for a specific methodology, a named threat model, and a deliverable artifact (test plan, test results, remediation log). The methodology should trace to NIST AI RMF 1.0 functions (Govern, Map, Measure, Manage) and the test results should be produced in a form that fits into the eventual system security plan.
2. Model provenance and supply-chain attestation
The framework directs agencies to document the provenance of models deployed in federal environments — training-data jurisdiction, model-weight origin, inference-dependency supply chain, and any fine-tuning datasets. For offerors using commercial foundation models (Claude, GPT, Llama, Gemini, Mistral), this reinforces FY26 NDAA §1532 provenance requirements and extends them to the full inference stack. For offerors proposing custom models, it raises the documentation burden on training-data sourcing. At proposal time, an offeror should be able to state: which foundation model (if any) is in the loop, which vendor, what the vendor's contractual data-handling commitments are, and how the stack will be attested in the SSP.
3. Governance and agency inventories
Under OMB M-24-10, agencies already maintain AI use-case inventories. The March 2026 framework strengthens that requirement and pushes agencies to align new AI acquisitions to specific inventoried use cases with defined risk tiers. SBIR offerors should expect TPOCs and contracting officers to ask which inventoried use case the proposed capability corresponds to (if any) and what the risk-tier classification would be. Phase II proposals in particular should describe how the capability will be registered in the agency's inventory at transition, what the risk tier is, and what governance review cadence applies.
4. Red-teaming
The framework treats red-teaming as a standard control expectation, not an optional enhancement. For offerors, the effect is that a Phase II proposal is expected to include a red-team plan — scope, adversarial-prompt categories (prompt injection, jailbreaks, data exfiltration, model-extraction attempts), cadence, and remediation protocol. This is not the same as penetration testing of the underlying infrastructure; it is adversarial testing of the AI-specific behavior. We cover the mechanics of federal AI red-teaming separately in AI red-teaming for federal systems.
5. Continuous monitoring
AI systems drift. Inputs change, fine-tuned models degrade, upstream providers update weights, and behavior shifts over time. The framework treats continuous monitoring as the operational counterpart to pre-deployment safety testing, and maps it squarely onto NIST SP 800-37 RMF Step 6. Phase II proposals are expected to describe a continuous-monitoring plan with defined metrics, cadence, triggering thresholds for re-authorization, and an incident-response protocol for behavior regressions.
6. Civil-rights and disparate-impact considerations
For AI systems that touch decisions affecting individuals — benefits adjudication, hiring, credit, healthcare access, law enforcement workflows — the framework incorporates civil-rights review and disparate-impact assessment as part of the governance cycle. Not every SBIR topic is affected, but those that are need to address the assessment methodology in the proposal rather than treating it as a post-deployment afterthought.
How the framework ties to existing controls
None of the pillars stand alone. Each has a predecessor in existing guidance, and the framework's principal effect is to draw those predecessors into a single operating picture. Knowing the crosswalk matters for offerors because it is the crosswalk — not the framework's own language — that will be cited in the SSP, the ATO package, and the continuous-monitoring reports.
| Framework pillar | Underlying control or guidance | What an SBIR offeror cites |
|---|---|---|
| Safety testing | NIST AI RMF 1.0 (Measure, Manage functions); NIST SP 800-53 Rev 5 CA-2, CA-8, SI-7 | Test-plan artifact mapped to RMF functions; results feed SSP §3 and ATO package |
| Model provenance | NIST SP 800-53 Rev 5 SA-8, SR family (supply chain); FY26 NDAA §1532; OMB M-24-10 §5 | Provenance attestation as SSP appendix; vendor contractual terms referenced in cost proposal |
| Governance / inventory | OMB M-24-10; agency-specific AI governance directives | Use-case registration plan in Phase II transition narrative |
| Red-teaming | NIST AI RMF 1.0 (Measure 2.7); NIST SP 800-53 Rev 5 CA-8(2), RA-10 | Red-team plan as Phase II deliverable; findings mapped to RMF Measure function |
| Continuous monitoring | NIST SP 800-37 Rev 2 RMF Step 6; NIST SP 800-137 ConMon | ConMon plan as Phase II artifact; triggers for re-authorization documented |
| Civil-rights review | Executive branch civil-rights guidance; agency general counsel review | Disparate-impact methodology in technical volume where applicable |
| Acquisition alignment | FAR Part 39 (IT acquisition); agency AI acquisition supplements | Referenced in cost proposal and Phase III transition plan |
What Phase I technical volumes now need to address
Phase I proposals have always emphasized feasibility over compliance mechanics. The framework does not change that posture, but it does raise the floor. A Phase I technical volume that is silent on safety testing, provenance, or red-teaming now reads as incomplete regardless of how strong the underlying feasibility argument is. The following are the specific additions we now recommend for any Phase I proposal being drafted in the post-framework window.
- A one-paragraph safety-testing methodology. Name the NIST AI RMF functions in scope, describe the threat model (prompt injection, hallucination, data leakage, unsafe tool use as applicable), and specify the artifact produced (test plan, test log, remediation register).
- A provenance statement. Identify the foundation model if any, the vendor, the authorization posture (FedRAMP, DoD IL), and the contractual data-handling terms. If using open-weight models, name the weights, the hosting environment, and the provenance of any fine-tuning data.
- A red-team concept paragraph. Phase I does not require a full red-team execution, but it should describe how red-teaming will be approached in Phase II — scope, adversarial categories, cadence.
- An inventory-alignment note. Indicate how the capability maps to the agency's AI use-case inventory under OMB M-24-10, or flag that the mapping will be completed in Phase II.
- A continuous-monitoring teaser. One paragraph outlining the metrics and cadence the Phase II plan will elaborate on.
- A civil-rights / disparate-impact note where applicable. Only relevant for topics touching individual decisions; if irrelevant, state so explicitly rather than omit.
These additions fit inside the existing Phase I technical-volume page budget without requiring wholesale restructuring. They add roughly two to three pages of content when handled concisely. Omitting them does not disqualify a proposal, but it does reduce the evaluation score on governance and risk-management axes that several agencies now weight explicitly.
What Phase II proposals need to add on continuous monitoring
Phase II is where the framework's operational implications land hardest. A Phase II proposal now needs to describe a continuous-monitoring plan with enough specificity that an ISSO and an Authorizing Official can evaluate it without follow-up questions. At a minimum that plan covers:
- Metric definitions. Which behavioral metrics are monitored (accuracy drift, hallucination rate, refusal-rate deltas, adversarial-input detection rate, latency-distribution shift) and how each is measured.
- Cadence. How often each metric is sampled and reported. Daily, weekly, monthly, event-triggered — named.
- Thresholds. The numeric or categorical conditions that trigger an alert, a re-test, or a re-authorization cycle.
- Governance routing. Who receives the alerts, what the escalation path is, and how findings are documented back into the SSP.
- Re-authorization triggers. The conditions under which a significant change (per NIST SP 800-37 RMF Step 6) forces an update to the ATO package — model-version upgrades, training-data refreshes, scope expansions.
- Tooling. What logs, dashboards, and alerting infrastructure implement the plan, and which systems host them.
This level of specificity is already common in mature federal AI contracts. What the framework changes is that it is now expected in the Phase II proposal itself, not deferred to a post-award SSP exercise. Offerors who treat ConMon as boilerplate will lose points to offerors who treat it as a deliverable artifact.
FedRAMP authorization inheritance and the framework
FedRAMP remains the authorization backbone for cloud-delivered AI services. The framework does not replace FedRAMP — it runs on top of it. For offerors using FedRAMP-authorized AI services (Amazon Bedrock in GovCloud, Azure OpenAI in Azure Government, Vertex AI at applicable impact levels), the practical effect is that a significant portion of the safety-testing, monitoring, and provenance obligations can be partially inherited through the cloud provider's authorization package. Offerors still own the application-layer implementation (prompt handling, tool-scope restriction, output filtering), but the underlying service's safety posture is established through the authorization.
The crosswalk to document in the SSP is: which controls are inherited from the cloud provider, which are shared responsibility, and which are fully owned by the offeror. Shared-responsibility matrices for Bedrock GovCloud and Azure Government OpenAI are published by the respective vendors; use them verbatim rather than paraphrasing. A useful default: the model itself is vendor-responsibility, the orchestration and prompts are offeror-responsibility, the logging is shared.
The "safety testing evidence package" — what reviewers expect
The framework uses the phrase "safety testing" across several pillars. In practice, what reviewers evaluate is a coherent evidence package built from a specific set of artifacts. The package need not be complete at Phase I — it is built up through Phase II and finalized at Phase III transition — but the proposal should describe what the package will contain. We recommend the following structure:
- Threat model. A written description of the adversarial surface. For a federal AI capability this typically covers prompt injection, jailbreaks, data-exfiltration via model output, unauthorized tool-use, sensitive-data disclosure, and hallucination-driven decision errors. Map each to the NIST AI RMF Measure function.
- Test plan. A protocol describing the adversarial inputs, test harness, pass/fail criteria, and remediation workflow.
- Test results. Logs, pass/fail rates per category, failure exemplars, and remediation actions taken.
- Red-team findings and remediations. For Phase II and beyond, independent red-team exercises with findings and remediation status.
- Continuous-monitoring dashboard. Live operational evidence that safety metrics are being tracked in production.
- Attestation. A signed statement from the principal investigator or responsible officer that the above components exist, are current, and reflect the deployed system.
At proposal time, the offeror is committing to produce this package, not to show it in finished form. The specificity of the commitment is what distinguishes a competitive proposal from a generic one.
Agency variance — DoD vs. civilian implementation pace
Federal agencies will not implement the framework at the same pace. Based on the pattern of prior NIST and OMB guidance adoption, we expect the following variance through 2026 and into 2027.
DoD components — working under the established NIST SP 800-53 / RMF baseline and with years of AI risk-management muscle in programs like JAIC-legacy and CDAO efforts — will operationalize the framework's expectations inside existing ATO cycles relatively quickly. Expect SBIR topic descriptions from Air Force, Army, Navy, SOCOM, and SDA to start citing framework-aligned language in SITIS responses and topic Q&A by late 2026. The framework does not override DoDI 5000.82 or the CJCSI AI governance expectations — it layers on top.
Civilian agencies vary more. Agencies with mature chief AI officer functions (VA, HHS, Treasury, GSA) will move quickly. Agencies with smaller AI footprints will lag, and in practice their SBIR topics may not cite the framework explicitly for another cycle. Offerors writing across both DoD and civilian topics should default to the higher bar (DoD-aligned framework compliance) in their boilerplate, since the incremental writing cost is trivial and the optionality is worth it.
The IC operates on separate authorization tracks (ICD 503, ICD 705) but has historically harmonized to NIST controls for unclassified-adjacent work. Framework alignment will flow through the ODNI-coordinated pathway; offerors in that space should track their sponsor's AI governance directive rather than the White House framework directly.
Practical Phase I checklist — 10 additions post-framework
The following checklist is what we now use internally when drafting any Phase I SBIR technical volume for an AI-touching topic. Every item adds measured content; collectively they total roughly two to four pages inside the standard page budget and raise the evaluation signal on governance and risk-management dimensions.
| # | Addition | Where it lives in the volume |
|---|---|---|
| 1 | Safety-testing methodology paragraph. Named threat model, NIST AI RMF mapping, artifact produced. | Technical approach section |
| 2 | Model-provenance statement. Foundation model, vendor, authorization posture, data-handling terms. | Approach / technical architecture |
| 3 | Red-team concept paragraph. Phase II scope, adversarial categories, cadence. | Phase II transition |
| 4 | Agency inventory alignment. Mapping to OMB M-24-10 use-case inventory (or commitment to complete in Phase II). | Phase III / transition narrative |
| 5 | Continuous-monitoring teaser. Metrics, cadence, triggers — concise. | Technical approach or Phase II transition |
| 6 | Civil-rights / disparate-impact note. Only if topic touches individual decisions; state scope explicitly. | Technical approach |
| 7 | FedRAMP inheritance statement. Which controls are inherited, shared, owned. | Technical architecture |
| 8 | Supply-chain attestation commitment. Training-data jurisdiction, model-weight origin, inference dependencies. | Risk / compliance section |
| 9 | Re-authorization trigger definition. What constitutes a significant change under NIST SP 800-37 RMF Step 6. | Risk / compliance section |
| 10 | PI attestation language. Principal investigator sign-off on the safety-testing evidence package commitment. | Cover / signature block |
Red-teaming, specifically
Because red-teaming is the pillar most likely to be underestimated by SBIR offerors who come from a pure research or ML-engineering background, it warrants its own note. Federal AI red-teaming is not web-application penetration testing. The target is the model's behavior — the prompts, the tool calls, the outputs — not the network perimeter or the authentication layer. Standard categories include: prompt-injection via untrusted data sources, jailbreaking refusal behavior, extracting sensitive information from training or fine-tuning data, causing the model to call tools out of scope, triggering hallucinations on policy-sensitive topics, and probing for bias-driven disparate outputs.
A Phase II red-team plan typically specifies the test harness, the categories of adversarial inputs, the cadence, the severity rubric, and the remediation protocol. Some agencies accept internal red-teaming by the offeror; others require third-party red-teaming or agency-led exercises. Budget accordingly in the Phase II cost volume — treat red-teaming as a line item, not as overhead. Our detailed treatment of how this operates in a federal context lives at AI red-teaming for federal systems.
Phase III transition — governance handoff
Phase III transition narratives often read as commercialization plans divorced from the governance posture of the delivered capability. The framework makes that separation harder to defend. A Phase III narrative now benefits from addressing: how the safety-testing evidence package transfers to the transition customer, how continuous-monitoring operational responsibility shifts, how re-authorization triggers are documented for the receiving agency, and how use-case inventory entries move from Phase II registration to steady-state governance. Treating the governance handoff as part of the commercialization story — not as compliance cleanup — is one of the clearest differentiators in evaluation.
What has not changed
- FedRAMP remains the cloud-authorization backbone. The framework layers on top; it does not replace.
- NIST AI RMF 1.0 and NIST SP 800-53 Rev 5 remain the underlying control catalogs. The framework cites them; offerors cite them.
- Phase I page budgets and evaluation criteria are unchanged. The additions fit inside the existing volume structure.
- SBIR Phase I dollar ceilings are unchanged. The documentation expansion does not come with additional funding.
- FAR Part 39 IT acquisition rules apply as before. The framework does not amend the FAR directly; agencies translate its direction into acquisition supplements over time.
Frequently asked questions
No. The framework is executive-branch policy direction. Statutory obligations flow from the underlying instruments — NIST AI RMF 1.0 (referenced), NIST SP 800-53 Rev 5 (authority), OMB M-24-10 (implementation), FY26 NDAA §1532 (provenance). The framework's practical effect is to compress the documentation timeline and raise proposal-stage expectations.
No. At Phase I you commit to producing the package. The proposal should name the artifacts, the methodology, and the NIST AI RMF functions in scope. The package itself is built through Phase I execution and matured in Phase II.
FedRAMP remains the cloud-authorization pathway. The framework pushes offerors to document which safety, monitoring, and provenance controls are inherited from the FedRAMP-authorized service (e.g., Bedrock in GovCloud, Azure OpenAI in Azure Government), which are shared responsibility, and which are owned by the offeror at the application layer.
Agency-dependent. Some agencies accept offeror-internal red-teaming; others require third-party or agency-led exercises. Confirm during SITIS Q&A or TPOC engagement. Budget as a Phase II cost line item, not as overhead.
Yes, in proportion to the AI role. If the capability's correctness materially depends on AI inference or decisioning, the framework's expectations apply. If AI is incidental (e.g., autocomplete in a UI), a brief treatment is usually sufficient.
Reference it thematically with the underlying authoritative document — NIST AI RMF 1.0, NIST SP 800-53 Rev 5, OMB M-24-10 — as the binding citation. Do not paraphrase framework language as if it were regulatory text. The evaluator will look for the underlying control citation, not framework prose.
