
FedRAMP LLM deployment in 2026: a practical map

Which LLM services are actually authorized, what you inherit from the platform, what you still have to build, and the mistakes that fail an ATO. Written for federal architects and ISSOs, not vendors.

FedRAMP in one paragraph

FedRAMP (Federal Risk and Authorization Management Program) is the government-wide program that standardizes how cloud services are assessed and authorized for federal use. A service carries one of three baseline authorizations: Low (for non-sensitive, publicly releasable data), Moderate (for most CUI — the baseline where most federal work lives), and High (for data where loss would have severe or catastrophic impact). DoD adds its own Impact Levels on top — IL2, IL4, IL5, IL6 — which overlap with but are not identical to FedRAMP. An agency that wants to use a service issues an Authority to Operate (ATO); the ATO inherits the FedRAMP authorization and adds system-specific controls on top.

For LLMs, the question is never "is AI allowed." It is "is this specific service, in this specific region, at this specific impact level, authorized for my data, and what controls do I still own."

FedRAMP authorizes the service. The agency authorizes the system. Your job is to know the boundary between the two and document everything on your side of it.

Current state: authorized LLM services

The landscape in April 2026 has three useful clusters: managed hyperscaler LLM services, the open-weight hosting path, and the air-gapped path. Each has a different compliance posture.

Managed hyperscaler LLM services

| Service | Authorization | Notes |
| --- | --- | --- |
| Azure OpenAI Service (Azure Government) | FedRAMP High, DoD IL4, DoD IL5 | Runs in Azure Government regions. Model catalog lags commercial Azure OpenAI — check availability of specific GPT-4.x and reasoning-model SKUs before architecting around them. Customer data is not used for training under the Azure Government service terms. |
| Amazon Bedrock (AWS GovCloud US-West/US-East) | FedRAMP High, DoD IL4, DoD IL5 | Anthropic Claude model family, Amazon Titan, and select Meta and Mistral models are exposed via Bedrock in GovCloud. Specific model version availability trails commercial regions by weeks to months. Customer prompts are not used for vendor training under the GovCloud Bedrock terms. |
| Google Vertex AI (Assured Workloads) | FedRAMP High, DoD IL4, DoD IL5 | Gemini model family available through Vertex AI under Assured Workloads for US Federal. Impact-level coverage depends on region and specific model endpoint. Confirm with your Google federal team before committing. |

Three usable paths. The practical choice is usually driven less by model quality and more by which cloud your agency has already authorized, where your data already lives, and which contract vehicle you have budget under.

Open-weight models on federal-authorized infrastructure

For teams that need full control over the model (fine-tuning on CUI, offline inference, avoiding any vendor model telemetry), the pattern is: run an open-weight model on FedRAMP-authorized infrastructure.

  • Llama 3.x family on SageMaker HyperPod in AWS GovCloud. HyperPod gives you managed training and inference clusters at FedRAMP High / IL5. You still own every model-layer control.
  • Mistral and Mixtral on equivalent Azure or GCP compute. Same pattern: the platform gives you infrastructure authorization; the model is your responsibility.
  • DeepSeek and other non-US-origin open weights. Possible technically but requires careful supply-chain review. Some agencies prohibit non-allied-origin models by policy even when the math is fine.

Open-weight on authorized infrastructure is the right pattern when you need to fine-tune on CUI, when you need deterministic inference pinning, or when your agency's data governance team will not sign off on sending prompts to a managed model endpoint.

Air-gapped and classified deployments

For classified work (IL6 / Secret, or higher), the story is different. You are on isolated networks — C2S (Commercial Cloud Services) AWS Secret Region, Azure Government Secret, Google for Government Secret — and the model choice is constrained to what has been deployed and authorized inside that enclave. Open-weight models run on air-gapped GPU infrastructure are the common pattern. Bring your own model and expect a slower, more deliberate authorization cycle.

Control inheritance: what you get, what you still own

The single biggest mistake teams make when scoping federal LLM work is overestimating what they inherit. A FedRAMP High service gives you a huge boost — hundreds of controls are handled at the platform level. But LLMs add an entire model layer that FedRAMP was not written for, and that layer is almost entirely your responsibility.

What you inherit from the platform

  • Physical data center controls (PE-family)
  • Network isolation, encryption in transit and at rest at the infrastructure level (SC-family)
  • Identity and access management primitives (AC-family) — though you still configure roles
  • Continuous monitoring and patch management of the underlying platform (SI, CM families)
  • Incident response on the platform side (IR)

What you still own

  • Input validation and prompt filtering (SI-10)
  • Output validation and data loss prevention (SI-15, SC-8)
  • Role-based access to specific models, endpoints, and tool integrations (AC-3, AC-6)
  • Full audit logging of prompts, responses, and tool invocations (AU-2, AU-3, AU-12)
  • Retrieval-augmented generation (RAG) data classification and source-level access control
  • Fine-tuning data handling (if applicable) — same impact level as the source data
  • Model version pinning and change management (CM-3)
  • Red-team testing and adversarial evaluation (CA-8, RA-5 extended for AI)

In a typical federal LLM system, the model-layer controls represent a substantial share of the engineering work. Teams that budget the project as "just call the API" routinely discover this in the middle of the ATO package.

The boundary question

Draw the authorization boundary diagram on day one.

Every federal system has an authorization boundary. For LLM systems, the boundary almost always encloses the model endpoint, your orchestration layer, your RAG store, your logging, and your UI. The FedRAMP service sits outside the boundary as an external system you interconnect with. Drawing that diagram first saves weeks of SSP rework later.

Prompt injection as a federal compliance issue

Prompt injection is often discussed as a security topic. In federal deployments it is also a compliance topic, because its outcomes map directly to NIST 800-53 controls.

  • An injected prompt that causes the model to emit CUI it should not have emitted is a data spill. That is AC-3 (access enforcement) and SC-4 (information in shared resources).
  • An injected prompt that causes the model to call an unauthorized tool is an unauthorized execution. That is AC-3 again, plus SI-10 (input validation).
  • An injected prompt that extracts system-prompt contents or earlier-session data is an information disclosure. AU-12 (audit content) and SC-8 (transmission confidentiality) are implicated.

An ISSO will ask, specifically, how you detect and respond to prompt injection. Be ready with a documented answer: what you filter on the way in (prompt classification, known-injection patterns, policy rules), what you filter on the way out (PII detection, classification markers, refusal enforcement), and how a detected injection turns into an audit event.
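One way to make that documented answer concrete is a filter pair whose detections become audit events rather than silent drops. A minimal Python sketch; the regex patterns and function names are illustrative stand-ins for a maintained classifier, not a production filter:

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only; a real deployment pairs a maintained
# injection classifier with static rules, not regex alone.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

@dataclass
class ScreenResult:
    allowed: bool
    audit_events: list = field(default_factory=list)

def screen_input(prompt: str) -> ScreenResult:
    """Inbound filter (SI-10): a match blocks the request and is
    recorded as an audit event, never silently dropped."""
    result = ScreenResult(allowed=True)
    for pat in INJECTION_PATTERNS:
        if pat.search(prompt):
            result.allowed = False
            result.audit_events.append(("injection_detected", pat.pattern))
    return result

def screen_output(response: str):
    """Outbound filter (SI-15): mask PII and report what was masked."""
    events = [("ssn_masked", m) for m in SSN_PATTERN.findall(response)]
    return SSN_PATTERN.sub("[REDACTED-SSN]", response), events
```

The important property is the return shape: every detection produces an event the audit pipeline can consume, which is exactly what the ISSO will ask to see.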

Audit logging for LLM systems

FedRAMP's audit control family (AU) predates LLMs. The interpretation in 2026 for LLM systems is settling on this minimum:

  • User identity. Who made the request.
  • Timestamp. Wall-clock, synchronized.
  • Model and version. Specific endpoint and version string.
  • Full prompt. Including system prompt, user prompt, and any context injected by your orchestration layer.
  • Full response. The complete model output.
  • Tool calls. Every tool invocation, with inputs and outputs.
  • RAG sources. Every document or passage retrieved and included in context, with the source ID.
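The seven fields above fit naturally into one structured record per request. A sketch assuming a JSON-lines log sink; the field names are illustrative, not a standardized schema:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class LLMAuditRecord:
    # The seven minimum fields from the list above.
    user_id: str
    timestamp: str        # ISO-8601, synchronized wall clock
    model_version: str    # exact endpoint and version string
    full_prompt: str      # system + user + injected context
    full_response: str
    tool_calls: list      # each entry: (tool, inputs, outputs)
    rag_sources: list     # source ID of every retrieved passage

def audit_line(user_id, model_version, prompt, response,
               tool_calls=(), rag_sources=()) -> str:
    """Serialize one request as a single JSON line for the log sink."""
    rec = LLMAuditRecord(
        user_id=user_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
        model_version=model_version,
        full_prompt=prompt,
        full_response=response,
        tool_calls=list(tool_calls),
        rag_sources=list(rag_sources),
    )
    return json.dumps(asdict(rec))
```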

Logs must be tamper-evident and retained per the system's SSP (typically one year online, longer offline depending on agency policy). If your prompts or responses contain CUI or PII, the log store itself must be handled at the same impact level as the source data — and encrypted at rest.
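Tamper evidence does not require special hardware. A simple hash chain, where each entry's digest covers its predecessor's, makes any after-the-fact edit detectable. A sketch:

```python
import hashlib
import json

GENESIS = "0" * 64  # chain anchor for the first entry

def append_tamper_evident(log: list, entry: dict) -> None:
    """Append an entry whose digest covers the previous entry's hash,
    so modifying any earlier entry breaks every later link."""
    prev = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(entry, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    log.append({"entry": entry, "prev": prev, "hash": digest})

def verify_chain(log: list) -> bool:
    """Recompute every digest; any edit or reorder returns False."""
    prev = GENESIS
    for item in log:
        payload = json.dumps(item["entry"], sort_keys=True)
        if item["prev"] != prev:
            return False
        if hashlib.sha256((prev + payload).encode()).hexdigest() != item["hash"]:
            return False
        prev = item["hash"]
    return True
```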

If you cannot tell me which user asked which question and got which answer from which model version, your system will not pass an ATO. Period.

Common mistakes that fail or slow an ATO

1. Using commercial endpoints for federal data

The most common and most serious mistake: engineers prototype against OpenAI's commercial API with real CUI (even "just for testing"), then assume it is trivial to swap to Azure Government later. It is not trivial, and the commercial-endpoint prototyping may itself be a data spill. If you are building for federal, stand up your development environment in the target federal cloud region from day one. Synthetic data only until then.
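A cheap guard is a startup check that fails closed when a client is pointed anywhere outside the authorized regions. A sketch; the hostname checks are illustrative and should be replaced with the exact endpoints pinned in your configuration:

```python
from urllib.parse import urlparse

def assert_government_endpoint(url: str) -> None:
    """Refuse to run against a commercial endpoint. The suffix and
    substring checks here are illustrative; pin the exact authorized
    hostnames for your region and impact level in real config."""
    host = urlparse(url).hostname or ""
    is_gov = host.endswith(".azure.us") or ".us-gov-" in host
    if not is_gov:
        raise ValueError(f"non-authorized endpoint: {host!r}")
```

Run it once at client construction; it catches the "just for testing" mistake before any data leaves the boundary.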

2. Unencrypted prompt logs

Logging the full prompt is required. Logging it in plain text in a default-configured log aggregator is a finding. Make sure your prompt and response logs are encrypted at rest with keys managed inside your authorization boundary (or in a KMS that is itself at the right impact level).
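The usual shape is envelope encryption: each log object gets its own data key, and only the wrapped (KEK-encrypted) form of that key is stored alongside it. The sketch below illustrates only the wrapping structure, using a toy HMAC-based keystream. In practice the wrap and unwrap calls are your platform KMS (for example, AWS KMS `GenerateDataKey`), never hand-rolled crypto:

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy CTR-style keystream from HMAC-SHA256; illustration only,
    # NOT a substitute for a vetted KMS or AEAD cipher.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]

def wrap_data_key(kek: bytes, data_key: bytes) -> bytes:
    """Envelope pattern: only the wrapped data key is stored; the KEK
    never leaves the KMS boundary in a real deployment."""
    nonce = os.urandom(16)
    ks = _keystream(kek, nonce, len(data_key))
    return nonce + bytes(a ^ b for a, b in zip(data_key, ks))

def unwrap_data_key(kek: bytes, wrapped: bytes) -> bytes:
    nonce, ct = wrapped[:16], wrapped[16:]
    ks = _keystream(kek, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))
```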

3. Missing PII and CUI masking in the retrieval layer

Teams will harden the LLM endpoint and then pipe a RAG retriever into it that pulls from a document store with no classification filtering. The model dutifully serves whatever the retriever returns. RAG sources need to respect the user's clearance and the request's classification. Retrieval-layer access control is a hard control, not a UI hint.
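Enforcing this as a hard control means the classification and ACL check lives inside the retriever, before anything reaches the model context. A minimal sketch; the clearance lattice and `Document` shape are illustrative:

```python
from dataclasses import dataclass

# Illustrative clearance ordering; real systems use the agency's
# marking scheme, not a three-level toy lattice.
CLEARANCE_ORDER = {"PUBLIC": 0, "CUI": 1, "SECRET": 2}

@dataclass
class Document:
    doc_id: str
    classification: str
    acl: set  # user IDs (or groups) allowed to read this document

def retrieve(candidates, user_id, user_clearance):
    """Hard filter at the retrieval layer: a document reaches the
    model context only if the requesting user could read it directly."""
    level = CLEARANCE_ORDER[user_clearance]
    return [
        d for d in candidates
        if CLEARANCE_ORDER[d.classification] <= level and user_id in d.acl
    ]
```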

4. Assuming parity between commercial and Government endpoints

"We use GPT-4.x commercially, so we can use it at IL5." Not necessarily. Specific model versions, context windows, and features roll out to FedRAMP-authorized endpoints on their own timeline. Confirm the exact version available in the exact region at the exact impact level before architecting around it.

5. Skipping model evaluation and red-team documentation

An ATO package for an LLM system needs evidence that the team has tested the model against adversarial inputs relevant to the use case, has documented residual risks, and has a mitigation plan. "We use GPT-4" is not evidence. A short evaluation report with specific adversarial categories (injection, jailbreak, PII leakage, hallucination on in-domain queries) is.

6. No model change-management process

Vendor-managed models update. Version strings change. Behavior shifts. If your ATO is written against "the current Claude model" with no pinning, you have a moving authorization target. Pin to specific model versions, treat upgrades as formal change-management events, and re-test against your evaluation suite before rolling forward.
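Pinning can be enforced in code as well as in the SSP: fail closed whenever the endpoint reports a version other than the pinned one. A sketch against a stubbed client; the `invoke()` shape and version strings are assumptions, not a real SDK:

```python
PINNED_MODEL = "example-model-2026-01-15"  # hypothetical version string

class StubClient:
    """Stand-in for a real SDK client; the invoke() shape is assumed."""
    def __init__(self, served_version):
        self.served_version = served_version

    def invoke(self, model, prompt):
        return {"model_version": self.served_version,
                "output": f"echo: {prompt}"}

def call_pinned(client, prompt):
    """Fail closed if the endpoint serves a version other than the one
    the ATO and evaluation suite were written against (CM-3)."""
    resp = client.invoke(model=PINNED_MODEL, prompt=prompt)
    if resp["model_version"] != PINNED_MODEL:
        raise RuntimeError(f"unpinned model served: {resp['model_version']}")
    return resp["output"]
```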

Practical architecture pattern

The pattern that clears ATO with the least friction in 2026 looks like this:

  • One authorized cloud region (Azure Government, AWS GovCloud, or Google Assured Workloads) holding everything inside your authorization boundary.
  • One model endpoint (Azure OpenAI, Bedrock, or Vertex) as an external system interconnection. Version-pinned.
  • An orchestration service inside the boundary that handles prompt construction, input filtering, output filtering, tool calls, and logging.
  • A classification-aware RAG store with per-document access controls tied to the requesting user's identity.
  • A central audit log capturing the seven fields above, encrypted at rest, retained per SSP.
  • A lightweight evaluation harness run on every model-version change, with results attached to the change ticket.

None of this is exotic. It is disciplined engineering applied to a layer (model behavior) that federal controls were not originally written for.
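Tied together, the orchestration service is a thin pipeline in which each control is a separately testable step. A sketch; the callable interface is illustrative, not a standard:

```python
def handle_request(user, prompt, *, screen, retrieve, invoke, mask, audit):
    """One request through the authorization boundary. Each control is
    injected as a callable so it can be tested and evidenced on its own."""
    if not screen(prompt):
        # Blocked inputs still produce an audit event (AU-2).
        audit(user=user, prompt=prompt, response=None,
              sources=[], event="input_blocked")
        return "Request blocked by policy."
    sources = retrieve(user, prompt)          # classification-aware RAG
    context = "\n".join(s["text"] for s in sources)
    raw = invoke(prompt=prompt, context=context)  # version-pinned endpoint
    safe = mask(raw)                          # output filtering
    audit(user=user, prompt=prompt, response=safe,
          sources=[s["id"] for s in sources], event="ok")
    return safe
```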

Where this is going

NIST 800-53 revisions and FedRAMP's own AI-specific guidance are still catching up. Expect three things to harden over the next 12 to 18 months: explicit AI-specific control overlays, standardized prompt-and-response logging schemas, and agency-level policy on approved model versions. The teams that are logging and evaluating today will have an easy transition. The teams that are not will have to retrofit under pressure.

Bottom line

FedRAMP-authorized LLM services exist and are usable in 2026. The authorization covers the service. It does not cover the prompt, the RAG retriever, the tool registry, the logging, or the evaluation discipline around the model. Those are yours. Draw the boundary, inherit what you can, document what you own, and test before you ship. That is the whole job.

Frequently asked questions

Is Azure OpenAI FedRAMP authorized?

Yes. Azure OpenAI Service is available in Azure Government at FedRAMP High and DoD IL4/IL5. Specific model availability and rollout timing differ from commercial Azure OpenAI — confirm the exact model and region before architecting around it.

Can I use Claude for federal workloads?

Yes, through Amazon Bedrock in AWS GovCloud regions under FedRAMP High / IL4 / IL5 authorization. Specific Claude model versions in GovCloud typically lag commercial regions by weeks to months.

Can I use open-weight models like Llama on federal infrastructure?

Yes. Host Llama 3.x or Mistral on SageMaker HyperPod (or equivalent Azure/GCP compute) inside a FedRAMP-authorized region. You inherit infrastructure controls and own every model-layer control — input filtering, output filtering, logging, fine-tuning data handling.

Is prompt injection a compliance issue?

In federal deployments, yes. Prompt injection outcomes map directly to NIST 800-53 controls: data spill (AC-3, SC-4), unauthorized tool execution (AC-3, SI-10), information disclosure (AU-12, SC-8). Your ATO package needs a documented detection and response story.

What do I have to log for federal LLM deployments?

At minimum: user identity, timestamp, model and version, full prompt, full response, tool calls, and RAG retrieval sources. Logs must be tamper-evident, retained per SSP, and encrypted at rest at the same impact level as the source data.

Can I prototype against a commercial LLM endpoint with real federal data?

No. Prototype against the same federal cloud region you plan to authorize in. Use synthetic data only until your development environment is inside the target boundary. Commercial-endpoint prototyping with real CUI is a data spill.


Architecting a federal LLM system?

We build agentic and RAG systems that stay inside a FedRAMP authorization boundary. If you are drafting an SSP or staring at an ATO package, start here.
