Multi-Agency
Legacy stacks across several federal environments brought onto modern cloud.
Zero-Downtime
Cutover patterns designed so mission users never saw a maintenance window.
100% IaC
Every resource represented in Terraform. No snowflakes, no manual console edits.
Operational
Full handoff with runbooks, dashboards, and on-call rotations for the agency team.
Context: Legacy Federal Workloads
Most federal applications started life as something other than "a cloud-native app." They ran on on-prem virtual machines, or on a hosting provider's managed servers, or on a piece of infrastructure that was once modern and is now very much not. Over years and administrations, they accumulated: custom shell scripts in forgotten directories, service accounts whose owners left the agency, firewall rules whose purpose no one remembers, and configuration files edited by hand across half a dozen environments.
That is what we inherit on day one of a federal cloud migration. The first job is not to move anything. The first job is to understand what is actually there.
Assessment Phase: 6R Analysis
We use the industry-standard 6R framework to classify every workload. It is not a buzzword; it is the fastest way to align engineers, program managers, and budget people on what is actually happening to each application.
The Six Rs
- Retire. The workload is unused or redundant. The cheapest migration is the one you do not do.
- Retain. The workload stays where it is, for policy, cost, or timing reasons. Revisit later.
- Rehost ("lift and shift"). Move the workload to cloud with minimal changes. Fast, lower risk, limited upside.
- Replatform ("lift and reshape"). Move and make targeted changes (managed database, managed queue, autoscaling) without rewriting the app.
- Repurchase. Replace the workload with a commercial SaaS or government off-the-shelf (GOTS) alternative.
- Refactor (or re-architect). Rewrite significant parts of the application to take advantage of cloud-native services. Highest upside, highest risk.
How We Decide
Every workload gets a one-page assessment: current state, technical debt, data sensitivity, compliance constraints, operational criticality, forecast cost in each of the 6R paths, and a recommended disposition. Program stakeholders see all six options, with honest pros and cons, and sign off on the disposition. That one-pager becomes the source of truth for the migration plan, budget, and timeline.
A federal cloud migration that cannot tell you, per workload, which R was chosen and why, is a migration waiting to surprise its program office.
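The one-page assessment can be kept as structured data so the per-workload disposition is queryable. The following is a minimal sketch, not our actual tooling: the class name, field names, and the `gaps` check are hypothetical, and the sensitivity and criticality labels are placeholders for whatever scheme the agency uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Disposition(Enum):
    RETIRE = "retire"
    RETAIN = "retain"
    REHOST = "rehost"
    REPLATFORM = "replatform"
    REPURCHASE = "repurchase"
    REFACTOR = "refactor"


@dataclass
class WorkloadAssessment:
    """Hypothetical record for one workload's one-page assessment."""
    name: str
    data_sensitivity: str                     # placeholder label, e.g. "moderate"
    operational_criticality: str              # placeholder label, e.g. "high"
    forecast_cost: Dict[Disposition, float]   # forecast cost for each 6R path
    recommended: Disposition
    rationale: str
    signed_off_by: Optional[str] = None

    def gaps(self) -> List[str]:
        """List what is still missing before this record can drive the plan."""
        problems = []
        missing = [d.value for d in Disposition if d not in self.forecast_cost]
        if missing:
            problems.append("no cost forecast for: " + ", ".join(missing))
        if not self.rationale.strip():
            problems.append("no rationale for the recommended disposition")
        if self.signed_off_by is None:
            problems.append("no stakeholder sign-off")
        return problems
```

A record with an empty `gaps()` list answers the question above directly: which R was chosen, why, and who agreed to it.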
Infrastructure-as-Code with Terraform
Every resource is represented in Terraform. Every. Single. One.
This is a discipline, not a preference. Federal environments must be reproducible. When a security reviewer asks "how is this VPC configured," the answer is a file, not a screenshot. When an operator needs to rebuild an environment, the answer is a pipeline, not a tribal-knowledge memo.
Module Structure
We organize Terraform into layered modules:
- Foundation. Accounts, networking, identity, logging, baseline security controls. Changes rarely and carefully.
- Platform. Shared services used by many applications: container orchestration, observability, secrets management, artifact registries.
- Application. Per-workload resources, defined by the application team, consuming the platform modules.
Environments (dev, test, staging, prod) are the same code with different variable values. A pull request that changes foundation modules is reviewed at a different threshold than one that changes an application module, and the CI/CD pipeline enforces that separation.
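One way a pipeline can enforce layer-specific review thresholds is to derive the required approval count from the paths a pull request touches. This is a sketch under stated assumptions: the `modules/<layer>/...` repository layout, the approval counts, and the idea that foundation is the strictest layer are all illustrative, not a description of any particular agency's rules.

```python
from pathlib import PurePosixPath

# Hypothetical approval counts per layer; real thresholds are a program decision.
REQUIRED_APPROVALS = {"foundation": 3, "platform": 2, "application": 1}
DEFAULT_APPROVALS = 1  # assumed minimum for files outside modules/


def required_approvals(changed_files):
    """Return the strictest approval count triggered by the changed paths.

    Assumes a hypothetical repository layout of modules/<layer>/...
    """
    strictest = max(REQUIRED_APPROVALS.values())
    needed = DEFAULT_APPROVALS
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if len(parts) >= 2 and parts[0] == "modules":
            # An unrecognized layer gets the strictest threshold, not a free pass.
            needed = max(needed, REQUIRED_APPROVALS.get(parts[1], strictest))
    return needed
```

A pull request that touches both an application module and a foundation module is held to the foundation threshold.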
State and Secrets
Terraform state lives in encrypted, access-controlled storage with state locking. Secrets do not live in state. They live in an approved secrets manager and are referenced by ID, not by value. Any change in which review or automated scanning surfaces a literal secret fails the build.
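A build-time check for literal secrets can be sketched in a few lines. The patterns below are illustrative assumptions, not a complete rule set: one matches the well-known `AKIA` prefix of AWS access key IDs, the other is a rough heuristic for quoted literals assigned to secret-like names. A production pipeline would more likely rely on a dedicated scanner.

```python
import re

# Illustrative patterns only; a real scanner carries a much larger rule set.
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    # Quoted literal assigned to a secret-like name. Interpolations such as
    # "${var.x}" are excluded by rejecting "$" inside the quotes.
    re.compile(r'(?i)(password|secret|token)\s*=\s*"[^"$]{8,}"'),
]


def find_literal_secrets(text):
    """Return (line_number, stripped_line) pairs that look like hard-coded secrets."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

A reference by ID, such as `secret_id = "prod/db/password"`, passes this check, while `password = "hunter2hunter2"` does not. The pipeline step would fail the build whenever the returned list is non-empty.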
Policy-as-Code
On top of Terraform, we run policy-as-code (OPA, Sentinel, or equivalent). Rules like "no public S3 buckets," "all EBS volumes encrypted," "no 0.0.0.0/0 ingress on production security groups" are enforced in the pipeline, not in quarterly audits. A misconfiguration that violates a codified rule is blocked before it reaches production.
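To show what such a pipeline rule looks like, here is a sketch that evaluates two of the rules against a parsed Terraform plan (the JSON produced by `terraform show -json`). It is written in Python for illustration only; in practice the rules would be expressed in OPA, Sentinel, or an equivalent engine. The function name is hypothetical, and the sketch assumes it is run against the production plan, since it does not distinguish environments.

```python
def plan_violations(plan):
    """Check a parsed `terraform show -json` plan against two example rules."""
    found = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after")
        if after is None:
            continue  # resource is being destroyed; nothing to evaluate
        if rc.get("type") == "aws_ebs_volume" and not after.get("encrypted"):
            # A value unknown at plan time is absent from "after" and is
            # treated as a violation, so the check fails closed.
            found.append(f"{rc['address']}: EBS volume is not encrypted")
        if rc.get("type") == "aws_security_group":
            for rule in after.get("ingress") or []:
                if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                    found.append(f"{rc['address']}: ingress open to 0.0.0.0/0")
    return found
```

The pipeline step fails when the returned list is non-empty, which is what keeps the check out of the quarterly audit and in the merge path.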
Containerization Strategy
Docker
Every application workload is packaged as an OCI container. Base images are scanned, signed, and pulled from an approved registry. Every image has an SBOM (software bill of materials) generated at build time and attached to the image.
Kubernetes
Kubernetes is the orchestration substrate. Clusters are hardened against CIS benchmarks, run on a minimal node OS, and are upgraded on a defined cadence. Workload isolation is enforced with namespaces, network policies, and pod security standards. Admission controllers reject workloads that violate policy before they land.
Service Mesh and Observability
A service mesh provides mTLS between services, consistent traffic management, and first-class observability. Every request carries a trace ID from the edge through every downstream call. Dashboards show request rate, error rate, latency distribution, and saturation for every service. Incidents become traceable within minutes, not hours.
Zero-Downtime Migration Approach
Federal users do not tolerate Saturday-night maintenance windows for mission-critical applications. The migration approach is designed so that, for most workloads, users never see a cutover at all.
Pattern: Shadow Production
For applications where correctness is paramount, we run the new cloud deployment in shadow mode: it receives a mirror of production traffic, produces responses, and those responses are compared against the legacy system's responses offline. Discrepancies are investigated and resolved before cutover. The cutover itself becomes a routing change, not a functional change.
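The offline comparison step can be sketched as a field-level diff between the legacy response and the shadow response for the same mirrored request. This is a minimal illustration, assuming responses have already been parsed into flat dictionaries; the ignored field names (`timestamp`, `request_id`) are hypothetical examples of values that legitimately differ between systems.

```python
MISSING = "<missing>"  # marker for a field present on only one side


def compare_responses(legacy, shadow, ignore=("timestamp", "request_id")):
    """Return {field: (legacy_value, shadow_value)} for every mismatch."""
    diffs = {}
    for key in set(legacy) | set(shadow):
        if key in ignore:
            continue  # fields expected to differ between the two systems
        legacy_value = legacy.get(key, MISSING)
        shadow_value = shadow.get(key, MISSING)
        if legacy_value != shadow_value:
            diffs[key] = (legacy_value, shadow_value)
    return diffs
```

An empty result for every mirrored request over the comparison window is the evidence that cutover is only a routing change.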
Pattern: Dual Write
For data-heavy workloads, we dual-write to legacy and cloud data stores for a controlled period. Reads are progressively migrated, with fallback to legacy if the cloud path fails. Once cloud is verified as the source of truth, the legacy store is frozen, archived, and eventually decommissioned.
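The dual-write pattern can be sketched as a thin wrapper over the two stores. This is an illustration under assumptions, not a production design: the class and attribute names are hypothetical, both stores are assumed to behave like dictionaries, and legacy is treated as the source of truth for the whole dual-write period. A real implementation also needs reconciliation for the failed writes it records.

```python
class DualWriteStore:
    """Hypothetical wrapper: write to both stores, read with fallback to legacy."""

    def __init__(self, legacy, cloud, read_from_cloud=False):
        self.legacy = legacy
        self.cloud = cloud
        self.read_from_cloud = read_from_cloud  # flipped as reads are migrated
        self.failed_cloud_writes = []           # keys needing later repair

    def put(self, key, value):
        # Legacy is still the source of truth, so its write must succeed.
        self.legacy[key] = value
        try:
            self.cloud[key] = value
        except Exception:
            # Record the miss instead of failing the user's request.
            self.failed_cloud_writes.append(key)

    def get(self, key):
        if self.read_from_cloud:
            try:
                return self.cloud[key]
            except Exception:
                pass  # cloud path failed or key not there; fall back to legacy
        return self.legacy[key]
```

Migrating reads progressively amounts to enabling `read_from_cloud` for a growing share of callers while watching `failed_cloud_writes` stay empty.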
Pattern: Canary
For applications where small differences are tolerable, we route a small percentage of traffic to the cloud deployment and monitor it. We then increase the percentage progressively, with automated rollback if error rate, latency, or saturation crosses a threshold.
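The promote-or-rollback decision can be sketched as a single function evaluated at each step. The traffic steps, metric names, and threshold values below are hypothetical; real values depend on the workload's service-level objectives.

```python
# Hypothetical traffic steps (percent) and rollback thresholds.
STEPS = [1, 5, 25, 50, 100]
THRESHOLDS = {"error_rate": 0.01, "p99_latency_ms": 500.0, "cpu_saturation": 0.80}


def next_canary_weight(current, metrics):
    """Return the next traffic percentage for the cloud path, or 0 to roll back."""
    for name, limit in THRESHOLDS.items():
        # A missing metric is treated as a breach: no data, no promotion.
        if metrics.get(name, float("inf")) > limit:
            return 0
    for step in STEPS:
        if step > current:
            return step
    return current  # already at full traffic
```

Returning 0 sends all traffic back to the legacy deployment, which makes the rollback the same kind of routing change as the rollout.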
Pattern: Big-Red-Button
Every cutover, regardless of pattern, has a documented rollback procedure that can be executed in minutes. That procedure is tested before the cutover, not theorized.
Compliance and Security Baked In
Federal migrations live or die by the security posture at go-live. Our pattern:
- Inherit controls from the landing zone. Work inside the agency's approved landing zone so common controls are inherited rather than re-implemented.
- Map to NIST SP 800-53. Every infrastructure component has a mapping to the applicable control family and evidence of implementation.
- Continuous monitoring. Configuration drift detection, vulnerability scanning, patch cadence, and log aggregation are all wired in before first production traffic.
- POA&M discipline. Residual risks are tracked with owners and due dates. Nothing is left implicit.
Post-Migration Operational Handoff
A migration is not done when the cutover completes. It is done when the agency team can operate the new environment without us. The handoff includes:
- Runbooks. Every known alert has a runbook. Every runbook has been executed at least once in a tabletop exercise.
- On-call rotations. Defined, with escalation paths and response-time targets.
- Dashboards. Curated views for operators, for program managers, and for leadership. Different audiences, different views.
- Training. Hands-on training sessions recorded and indexed. New team members can ramp without us.
- Warranty period. A defined window where we remain on call while the agency team takes primary ownership.
Lessons Learned
1. Assessment is 40% of the value
A rigorous 6R assessment prevents the expensive mistake of refactoring something that should have been retired.
2. IaC or it did not happen
If you cannot reproduce an environment from code, you do not have an environment; you have a pet. Federal systems cannot be pets.
3. Policy-as-code catches what reviews miss
Humans miss misconfigurations. Policy-as-code does not get tired or skip a check: every rule that has been written down is enforced on every change.
4. Zero-downtime is a design property
You cannot bolt zero-downtime on at the end. It has to be baked into the migration pattern from assessment.
5. Data is harder than compute
Moving stateless services is a Tuesday. Moving databases with strict consistency requirements is a month. Plan accordingly.
6. Build the operational handoff from day one
Runbooks written after cutover are runbooks that miss the hard cases. Write them as you encounter the hard cases.
7. Decommission deliberately
Legacy systems left running after migration are security liabilities. Set a decommission date, stick to it, and verify the shutdown.
8. Cost awareness is a cultural change
Cloud makes it easy to provision and easy to forget. Budgets, tags, and cost dashboards are not optional; they are part of the platform.