Multi-Agency Federal Cloud Migration

Legacy federal workloads meet cloud-native infrastructure. Assessment, infrastructure-as-code, containerization, and zero-downtime cutover, with an operational handoff that sticks.

Multi-Agency

Legacy stacks across several federal environments brought onto modern cloud.

Zero-Downtime

Cutover patterns designed so mission users never saw a maintenance window.

100% IaC

Every resource represented in Terraform. No snowflakes, no manual console edits.

Operational

Full handoff with runbooks, dashboards, and on-call rotations for the agency team.

Context: Legacy Federal Workloads

Most federal applications started life as something other than "a cloud-native app." They ran on on-prem virtual machines, or on a hosting provider's managed servers, or on a piece of infrastructure that was once modern and is now very much not. Over years and administrations, they accumulated: custom shell scripts in forgotten directories, service accounts whose owners left the agency, firewall rules whose purpose no one remembers, and configuration files edited by hand across half a dozen environments.

That is what we inherit on day one of a federal cloud migration. The first job is not to move anything. The first job is to understand what is actually there.

Ground rules. Federal cloud migrations operate inside approved environments (FedRAMP-authorized cloud services, agency-specific landing zones, DoD impact levels where applicable). Everything in this case study assumes that constraint.

Assessment Phase: 6R Analysis

We use the industry-standard 6R framework to classify every workload. It is not a buzzword; it is the fastest way to align engineers, program managers, and budget people on what is actually happening to each application.

The Six Rs

Rehost, Replatform, Repurchase, Refactor, Retire, and Retain.

How We Decide

Every workload gets a one-page assessment: current state, technical debt, data sensitivity, compliance constraints, operational criticality, forecast cost in each of the 6R paths, and a recommended disposition. Program stakeholders see all six options, with honest pros and cons, and sign off on the disposition. That one-pager becomes the source of truth for the migration plan, budget, and timeline.
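As a rough illustration, the one-pager can be modeled as a small record with a completeness check. This is a minimal sketch, not our actual template: the class and field names (WorkloadAssessment, forecast_cost, signed_off_by) are hypothetical, and the disposition names follow the standard 6R labels.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disposition(Enum):
    """The six dispositions of the standard 6R framework."""
    REHOST = "rehost"
    REPLATFORM = "replatform"
    REPURCHASE = "repurchase"
    REFACTOR = "refactor"
    RETIRE = "retire"
    RETAIN = "retain"


@dataclass
class WorkloadAssessment:
    """Hypothetical shape of a one-page workload assessment."""
    name: str
    current_state: str
    technical_debt: str
    data_sensitivity: str
    compliance_constraints: list[str]
    operational_criticality: str
    forecast_cost: dict[Disposition, float]  # one forecast per 6R path
    recommended: Disposition
    rationale: str
    signed_off_by: Optional[str] = None

    def validate(self) -> list[str]:
        """Return the reasons this one-pager is not ready to drive the plan."""
        problems = []
        missing = [d.value for d in Disposition if d not in self.forecast_cost]
        if missing:
            problems.append("no cost forecast for: " + ", ".join(missing))
        if not self.rationale.strip():
            problems.append("recommended disposition has no rationale")
        if self.signed_off_by is None:
            problems.append("no stakeholder sign-off")
        return problems
```

An assessment that returns an empty list from validate() has a forecast for all six paths, a stated rationale, and a named sign-off, which is what it takes for the one-pager to serve as the source of truth.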

A federal cloud migration that cannot tell you, per workload, which R was chosen and why, is a migration waiting to surprise its program office.

Infrastructure-as-Code with Terraform

Every resource is represented in Terraform. Every. Single. One.

This is a discipline, not a preference. Federal environments must be reproducible. When a security reviewer asks "how is this VPC configured," the answer is a file, not a screenshot. When an operator needs to rebuild an environment, the answer is a pipeline, not a tribal-knowledge memo.

Module Structure

We organize Terraform into layered modules:

Environments (dev, test, staging, prod) are the same code with different variable values. A pull request that changes foundation modules is reviewed at a different threshold than one that changes an application module, and the CI/CD pipeline enforces that separation.

State and Secrets

Terraform state lives in encrypted, access-controlled storage with state locking. Secrets do not live in state. They live in an approved secrets manager and are referenced by ID, not by value. Any change in which review or automated scanning surfaces a literal secret fails the build.
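The "literal secret fails the build" rule can be approximated with a pattern scan over Terraform source. The sketch below is illustrative only: the two patterns and the function name find_literal_secrets are assumptions, and a real pipeline would typically rely on a dedicated secret scanner with a much broader rule set.

```python
import re

# Illustrative patterns only; a production scanner needs far more coverage.
SECRET_PATTERNS = [
    # AWS access key IDs have a well-known shape: "AKIA" plus 16 characters.
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # An attribute whose name ends in password/secret/token, assigned a quoted
    # literal. Interpolations ("${...}") and names such as "secret_id" are
    # deliberately not matched, since those are references, not values.
    re.compile(r'\b\w*(?:password|secret|token)\s*=\s*"[^"$]+"', re.IGNORECASE),
]


def find_literal_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that look like hard-coded secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            findings.append((lineno, line.strip()))
    return findings


def build_should_fail(text: str) -> bool:
    """A CI step would exit non-zero when this returns True."""
    return bool(find_literal_secrets(text))
```

A line such as password = "hunter2" is flagged, while db_password_secret_id = "prod/app/db" (a hypothetical reference by ID) passes, matching the rule that secrets are referenced by ID rather than by value.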

Policy-as-Code

Above Terraform, we run policy-as-code (OPA, Sentinel, or equivalent). Rules like "no public S3 buckets," "all EBS volumes encrypted," and "no 0.0.0.0/0 ingress on production security groups" are enforced in the pipeline, not in quarterly audits. A change that violates an encoded rule is rejected before it reaches production.
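To make the idea concrete, here is a Python stand-in for two of those rules, evaluated against the JSON form of a Terraform plan (the output of terraform show -json, which lists planned changes under resource_changes). It is a sketch, not the OPA or Sentinel policies themselves: the function name check_plan is hypothetical, and for brevity the ingress rule is applied to every security group rather than only to production ones.

```python
def check_plan(plan: dict) -> list[str]:
    """Return policy violations found in a Terraform plan (JSON form)."""
    violations = []
    for rc in plan.get("resource_changes", []):
        # "after" is null for resources being destroyed; treat it as empty.
        after = (rc.get("change") or {}).get("after") or {}
        rtype, addr = rc.get("type"), rc.get("address")

        # Rule: all EBS volumes encrypted.
        if rtype == "aws_ebs_volume" and not after.get("encrypted"):
            violations.append(f"{addr}: EBS volume is not encrypted")

        # Rule: no 0.0.0.0/0 ingress on security groups.
        if rtype == "aws_security_group":
            for rule in after.get("ingress") or []:
                if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                    violations.append(f"{addr}: ingress open to 0.0.0.0/0")
    return violations
```

In a pipeline, the plan JSON would be loaded with json.load and the job would fail whenever check_plan returns a non-empty list, so the violation is caught at plan time rather than after apply.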

Containerization Strategy

Docker

Every application workload is packaged as an OCI container. Base images are scanned, signed, and pulled from an approved registry. Every image has an SBOM (software bill of materials) generated at build time and attached to the image.

Kubernetes

Kubernetes is the orchestration substrate. Clusters are hardened against CIS benchmarks, run on a minimal node OS, and are upgraded on a defined cadence. Workload isolation is enforced with namespaces, network policies, and pod security standards. Admission controllers reject workloads that violate policy before they land.

Service Mesh and Observability

A service mesh provides mTLS between services, consistent traffic management, and first-class observability. Every request carries a trace ID from the edge through every downstream call. Dashboards show request rate, error rate, latency distribution, and saturation for every service. Incidents become traceable within minutes, not hours.

Zero-Downtime Migration Approach

Federal users do not tolerate Saturday-night maintenance windows for mission-critical applications. The migration approach is designed so that, for most workloads, users never see a cutover at all.

Pattern: Shadow Production

For applications where correctness is paramount, we run the new cloud deployment in shadow mode: it receives a mirror of production traffic, produces responses, and those responses are compared against the legacy system's responses offline. Discrepancies are investigated and resolved before cutover. The cutover itself becomes a routing change, not a functional change.
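The offline comparison step can be sketched as a field-by-field diff of paired responses. The Response type, the compare function, and the set of ignored fields below are all hypothetical; which fields are legitimately volatile (timestamps, request IDs) has to be decided per application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    status: int
    body: dict


# Hypothetical fields expected to differ between systems on every request.
IGNORED_FIELDS = {"request_id", "timestamp"}


def normalize(body: dict) -> dict:
    """Drop volatile fields so only meaningful differences remain."""
    return {k: v for k, v in body.items() if k not in IGNORED_FIELDS}


def compare(legacy: Response, cloud: Response) -> list[str]:
    """Return human-readable discrepancies between paired responses."""
    diffs = []
    if legacy.status != cloud.status:
        diffs.append(f"status: legacy={legacy.status} cloud={cloud.status}")
    a, b = normalize(legacy.body), normalize(cloud.body)
    for key in sorted(a.keys() | b.keys()):
        left, right = a.get(key, "<missing>"), b.get(key, "<missing>")
        if left != right:
            diffs.append(f"body[{key!r}]: legacy={left!r} cloud={right!r}")
    return diffs
```

An empty result for every mirrored request over a representative period is the evidence that cutover is a routing change only; any non-empty result is a discrepancy to investigate before cutover.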

Pattern: Dual Write

For data-heavy workloads, we dual-write to legacy and cloud data stores for a controlled period. Reads are progressively migrated, with fallback to legacy if the cloud path fails. Once cloud is verified as the source of truth, the legacy store is frozen, archived, and eventually decommissioned.
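A minimal sketch of the dual-write wrapper, assuming both stores expose a dict-like interface. The class name DualWriteStore and its counters are hypothetical, and a real implementation would also need reconciliation for failed cloud writes and ordering guarantees that this sketch does not address.

```python
class DualWriteStore:
    """Writes to both stores; reads from cloud with fallback to legacy."""

    def __init__(self, legacy, cloud, read_from_cloud=False):
        self.legacy = legacy            # source of truth during migration
        self.cloud = cloud              # store being verified
        self.read_from_cloud = read_from_cloud
        self.write_failures = []        # (key, error) pairs to reconcile later
        self.fallbacks = 0              # reads that fell back to legacy

    def put(self, key, value):
        # Legacy is written first; if this raises, nothing has diverged.
        self.legacy[key] = value
        try:
            self.cloud[key] = value
        except Exception as exc:
            # A cloud failure must not break the user-facing write.
            self.write_failures.append((key, repr(exc)))

    def get(self, key):
        if self.read_from_cloud:
            try:
                return self.cloud[key]
            except Exception:
                # Includes a missing key: fall back rather than fail the read.
                self.fallbacks += 1
        return self.legacy[key]
```

Flipping read_from_cloud per route or per tenant is one way to migrate reads progressively; a fallbacks count that stays at zero and an empty write_failures list are the signals that the cloud store is ready to become the source of truth.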

Pattern: Canary

For applications where small differences are tolerable, we route a small percentage of traffic to the cloud deployment and monitor it. We then progressively increase the percentage, with automated rollback if error rate, latency, or saturation crosses a threshold.
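The control loop behind a canary can be sketched in a few lines. Everything specific here is an assumption for illustration: the step percentages, the threshold values, and the observe and set_weight callables, which stand in for whatever monitoring and traffic-routing APIs the environment provides. A real rollout would also hold each step for a soak period before reading metrics.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Metrics:
    error_rate: float       # fraction of failed requests
    p99_latency_ms: float
    saturation: float       # fraction of capacity in use


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits; real values come from the service's baseline.
    max_error_rate: float = 0.01
    max_p99_latency_ms: float = 500.0
    max_saturation: float = 0.8


def healthy(m: Metrics, t: Thresholds) -> bool:
    return (m.error_rate <= t.max_error_rate
            and m.p99_latency_ms <= t.max_p99_latency_ms
            and m.saturation <= t.max_saturation)


def run_canary(observe: Callable[[int], Metrics],
               set_weight: Callable[[int], None],
               thresholds: Thresholds = Thresholds(),
               steps: Sequence[int] = (1, 5, 25, 50, 100)) -> bool:
    """Shift traffic step by step; roll back to 0% on the first bad reading."""
    for pct in steps:
        set_weight(pct)
        if not healthy(observe(pct), thresholds):
            set_weight(0)   # automated rollback: all traffic back to legacy
            return False
    return True
```

The function returns True only if every step stayed within thresholds, so the caller can treat a False result as a completed rollback that needs investigation before the next attempt.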

Pattern: Big-Red-Button

Every cutover, regardless of pattern, has a documented rollback procedure that can be executed in minutes. That procedure is tested before the cutover, not theorized.

Compliance and Security Baked In

Federal migrations live or die by the security posture at go-live. Our pattern:

Post-Migration Operational Handoff

A migration is not done when the cutover completes. It is done when the agency team can operate the new environment without us. The handoff includes:

Lessons Learned

1. Assessment is 40% of the value

A rigorous 6R assessment prevents the expensive mistake of refactoring something that should have been retired.

2. IaC or it did not happen

If you cannot reproduce an environment from code, you do not have an environment; you have a pet. Federal systems cannot be pets.

3. Policy-as-code catches what reviews miss

Human reviewers miss misconfigurations. Policy-as-code applies every encoded rule to every change, every time.

4. Zero-downtime is a design property

You cannot bolt zero-downtime on at the end. It has to be baked into the migration pattern from assessment.

5. Data is harder than compute

Moving stateless services is a Tuesday. Moving databases with strict consistency requirements is a month. Plan accordingly.

6. Build the operational handoff from day one

Runbooks written after cutover are runbooks that miss the hard cases. Write them as you encounter the hard cases.

7. Decommission deliberately

Legacy systems left running after migration are security liabilities. Set a decommission date, stick to it, and verify the shutdown.

8. Cost awareness is a cultural change

Cloud makes it easy to provision and easy to forget. Budgets, tags, and cost dashboards are not optional; they are part of the platform.

FAQ

Which clouds do you work in?
Any FedRAMP-authorized or agency-approved environment. Typical targets include AWS GovCloud, Azure Government, and Google for Government, plus agency-specific clouds where applicable.
Do you always use Kubernetes?
No. Kubernetes is a good fit for many workloads but not all. Some workloads are better served by managed services or serverless. The 6R assessment drives the choice.
What about the FedRAMP boundary?
We work inside the boundary the agency or program is operating under. Boundary decisions are program decisions; we support them with clean architecture and evidence.
Can you support DoD Impact Levels?
Yes. Patterns are broadly similar; specific controls, network isolation, and approved services differ by impact level.
How do you price migration engagements?
Structured around the assessment output: scope is defined workload by workload, with milestones tied to cutover and operational handoff. We do not sign up to migrate a portfolio we have not first assessed.
What if an agency already has a landing zone?
Better. We prefer to inherit an existing landing zone and focus on application migration, rather than rebuilding foundational infrastructure from scratch.

Planning a federal cloud migration?

Start with an honest assessment. We have done this before, and we know which R is the right R.

Email Bo Peng →