Microservices for federal mission systems.

Service mesh, event-driven architecture, CQRS and Saga patterns, and domain-driven design for federal mission systems with 5-year and 10-year time horizons.


Microservices in the federal context

Federal mission systems do not look like consumer SaaS. They have multi-decade lifespans, multiple authorizing officials, more compliance burden than business logic by line count, integration with systems older than the engineers who maintain them, and operational realities that include disconnected enclaves, classified networks, and procurement cycles measured in fiscal years. Microservices in this environment are a viable architecture, but only when the trade-offs are understood. Done thoughtfully they enable agency teams to ship independently and to compose long-lived assets. Done as cargo-cult they multiply complexity and produce nothing the agency couldn't have built as a clean monolith.

Service mesh: Istio / Linkerd
API gateway: Auth + routing
FedRAMP: Container-native ATO

[Figure: Microservices reference architecture. Diagram labels, in order: Edge / Client; authenticated request; identity + audit; MICROSERVICES; policy + guardrails; core engine; Agency system; system of record; SIEM / audit sink.]

Precision Federal designs microservice architectures for federal agencies with explicit attention to whether microservices are actually the right answer for the system, and when they are, with the operational, observability, and authorization patterns that make them sustainable.

FEDERAL MICROSERVICES DELIVERY

Domain decomposition and API contracts: Wk 1-3
Service skeleton and CI pipeline: Wk 3-6
Auth and service mesh (Istio/Envoy): Wk 5-8
Observability (tracing, metrics, logs): Wk 6-9
Kubernetes deployment and health gates: Wk 8-12
ATO boundary documentation: Wk 11-13

When microservices are the right answer

Three conditions need to hold. The system has multiple delivery teams that need to ship independently. The system has parts with meaningfully different scaling, data, or compliance requirements (e.g., a public lookup endpoint and a sensitive PII processor inside the same logical product). The system will live long enough (typically 5+ years) for the operational overhead to pay back. If two or more of these conditions do not hold, a modular monolith is the better choice. We have moved several federal teams from over-decomposed microservice landscapes back to modular monoliths because the operational tax exceeded the team coordination benefit.

Bounded contexts via domain-driven design

The most expensive microservice mistake is drawing boundaries at the wrong place. We use domain-driven design to identify bounded contexts in the agency's actual work:

Event storming: with subject matter experts to map domain events end to end.
Linguistic boundary detection: when a word means different things in different parts of the process, that's a context boundary. "Claim" in VA adjudication is a different aggregate than "claim" in appeals.
Subdomain classification: core (the agency's mission), supporting (necessary but undifferentiated), generic (commodity). Investment scales with classification.
Context map: explicit documentation of how contexts integrate (shared kernel, customer/supplier, anti-corruption layer, published language).

The output is a set of services where each service maps to a bounded context, owns its data, and exposes a published language to its consumers. This is not negotiable — services without bounded context discipline become distributed monoliths within 18 months.
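As an illustration of the anti-corruption layer named in the context map, here is a minimal Python sketch of translating "claim" across the adjudication and appeals boundary. The class names, fields, and status values (AdjudicationClaim, AppealSubject, "OPEN", "DECIDED") are hypothetical and chosen only for the example; they do not describe any real VA data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AdjudicationClaim:
    """'Claim' as a hypothetical adjudication context models it."""
    claim_id: str
    veteran_id: str
    filed_on: date
    status: str  # assumed values: "OPEN" or "DECIDED"


@dataclass(frozen=True)
class AppealSubject:
    """What a hypothetical appeals context needs to know about a claim."""
    source_claim_id: str
    appellant_id: str
    decision_final: bool


def to_appeal_subject(claim: AdjudicationClaim) -> AppealSubject:
    """Anti-corruption layer: translate at the boundary so neither model leaks."""
    if claim.status not in {"OPEN", "DECIDED"}:
        raise ValueError(f"unknown adjudication status: {claim.status!r}")
    return AppealSubject(
        source_claim_id=claim.claim_id,
        appellant_id=claim.veteran_id,
        decision_final=(claim.status == "DECIDED"),
    )
```

The appeals service depends only on AppealSubject, so a change to the adjudication model is absorbed in one translation function rather than spreading across the consumer.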

Service mesh: Istio versus Linkerd

The service mesh is where federal compliance lives in microservice architectures. We deploy one of two meshes, depending on context:

Istio — the federal default. Ships in DoD Platform One Big Bang. Reference mesh for Red Hat OpenShift Service Mesh. Capabilities: mTLS by default between every service, traffic policy (canary, mirroring, fault injection), authorization policy (deny-by-default, JWT-aware), telemetry (every request emits metrics, traces, and access logs), multi-cluster federation. Operational cost: significant. Pays back at 30+ services.
Linkerd — CNCF graduated, lighter. mTLS by default, simpler operational model, smaller resource footprint per workload. Lacks some of Istio's policy features. We use Linkerd at the edge, in resource-constrained environments, and where the team prefers simplicity over feature surface.

Both meshes deploy on hardened federal Kubernetes. Both integrate with OpenTelemetry for federation into the agency's observability stack.

Event-driven patterns

Synchronous request/response is the wrong default for federal mission systems with strict availability requirements. Event-driven architectures provide loose coupling, replayability, and an audit trail that maps cleanly to federal recordkeeping requirements. Brokers we deploy:

Apache Kafka: the federal default for high-throughput event backbones. Confluent Platform is FedRAMP authorized. Self-hosted Kafka on Kubernetes via Strimzi works well in disconnected environments.
AWS EventBridge / SNS / SQS: managed services in GovCloud, FedRAMP High. Tight IAM integration.
NATS JetStream: light, fast, deploys cleanly inside Kubernetes. Good for internal control planes.
RabbitMQ: mature, common in legacy federal environments.
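The replayability and audit-trail properties mentioned above come from treating the event stream as an append-only, ordered log. The sketch below is an in-memory stand-in written for illustration only; EventLog and its methods are hypothetical names, not the API of Kafka or any other broker listed here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    offset: int
    topic: str
    payload: dict


class EventLog:
    """In-memory stand-in for a broker topic: append-only, ordered, replayable."""

    def __init__(self) -> None:
        self._events = []

    def append(self, topic: str, payload: dict) -> Event:
        # Offsets are assigned in arrival order and never reused.
        event = Event(offset=len(self._events), topic=topic, payload=dict(payload))
        self._events.append(event)
        return event

    def replay(self, from_offset: int, handler: Callable[[Event], None]) -> int:
        """Deliver every event at or after from_offset; return the next offset to read."""
        for event in self._events[from_offset:]:
            handler(event)
        return len(self._events)


log = EventLog()
log.append("claims", {"type": "ClaimFiled", "claim_id": "C-1"})
log.append("claims", {"type": "ClaimDecided", "claim_id": "C-1"})

seen = []
next_offset = log.replay(0, seen.append)
```

Because consumers track their own offset, a new read model or an auditor can rebuild state by replaying from offset 0 without touching the producers.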

CQRS and Saga for cross-service consistency

Distributed transactions across services are a recipe for failure modes that authorize poorly and operate worse. We use two patterns:

CQRS (Command Query Responsibility Segregation) — write models and read models are separate. Commands mutate the write model and emit events. Read models consume events and project into query-optimized stores. Eventual consistency, with documented staleness windows that the user-facing surface accommodates.
Saga pattern — long-running cross-service transactions decomposed into a sequence of local transactions, each with a compensating action if a downstream step fails. Two flavors: orchestration (a saga coordinator drives the flow, easier to audit, our default for federal) and choreography (services react to each other's events, more decoupled, harder to debug).
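The CQRS split described in the first item can be sketched in a few lines of Python. This is a simplified in-process illustration, assuming hypothetical names (ClaimWriteModel, StatusCountProjection) and an in-memory outbox; in a real deployment the events would travel through a broker, which is where the staleness window comes from.

```python
from collections import defaultdict


class ClaimWriteModel:
    """Command side: validates commands, mutates state, emits events."""

    def __init__(self) -> None:
        self._status = {}   # claim_id -> status
        self.outbox = []    # events awaiting publication

    def file_claim(self, claim_id: str) -> None:
        if claim_id in self._status:
            raise ValueError(f"claim {claim_id} already filed")
        self._status[claim_id] = "OPEN"
        self.outbox.append({"type": "ClaimFiled", "claim_id": claim_id})

    def decide_claim(self, claim_id: str) -> None:
        if self._status.get(claim_id) != "OPEN":
            raise ValueError(f"claim {claim_id} is not open")
        self._status[claim_id] = "DECIDED"
        self.outbox.append({"type": "ClaimDecided", "claim_id": claim_id})


class StatusCountProjection:
    """Query side: a read model built only from events, never from the write store."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)

    def apply(self, event: dict) -> None:
        if event["type"] == "ClaimFiled":
            self.counts["OPEN"] += 1
        elif event["type"] == "ClaimDecided":
            self.counts["OPEN"] -= 1
            self.counts["DECIDED"] += 1
```

Until the projection has applied the latest events, queries see slightly old counts; that gap is the staleness window the user-facing surface has to accommodate.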

For orchestration sagas we typically use Temporal, AWS Step Functions, or a hand-rolled state machine emitting events the SIEM can consume. The audit trail produced by an orchestration saga is one of its strongest federal selling points: every step, every compensating action, every retry is explicit and queryable.
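To make the orchestration flavor concrete, here is a minimal hand-rolled sketch in Python. It is not how Temporal or Step Functions work internally; Step, run_saga, and the audit tuple format are hypothetical names invented for the example, and the sketch assumes compensating actions themselves do not fail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]       # the local transaction
    compensate: Callable[[], None]   # undoes the local transaction


def run_saga(steps: list, audit: list) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            audit.append(("failed", step.name, str(exc)))
            for done in reversed(completed):
                done.compensate()
                audit.append(("compensated", done.name, ""))
            return False
        completed.append(step)
        audit.append(("completed", step.name, ""))
    return True
```

Every outcome lands in the audit list as an explicit record, which is the property that makes orchestration straightforward to forward to a SIEM. A production coordinator would also persist saga state and retry compensations.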

Observability built into the architecture

Microservices without strong observability are inoperable. We instrument every service from day one with the OpenTelemetry SDK in its native language, emitting traces, metrics, and structured logs. The service mesh adds another layer of telemetry (every cross-service hop). See federal observability for the platform side. Each service publishes:

RED metrics: Rate, Errors, Duration per endpoint.
Business metrics: tied to mission outcomes (claims processed per minute, alerts triaged, etc.).
Distributed traces: with W3C Trace Context propagation across mesh and message broker boundaries.
Structured logs: in JSON with correlation IDs.

Federal authorization for microservice systems

The microservice architecture, paradoxically, can simplify authorization when done with the right structure. Each service is treated as an independently authorizable component. Inheritance from the platform satisfies the bulk of NIST 800-53 controls. The service-level package documents only what the service adds: its data classification, its trust boundary, its inputs and outputs, its specific application controls. When the platform substrate itself is pre-authorized (via platform engineering), new services join the boundary in days rather than months. See ATO acceleration for the broader pattern.

Polyglot persistence with care

Each microservice owns its data. That principle invites polyglot persistence — Postgres for one service, MongoDB for another, Elasticsearch for search, Redis for caching, S3 for blobs. Polyglot persistence is appropriate when each choice serves a real need. It becomes a federal liability when each new datastore requires its own backup, encryption, monitoring, and authorization story. We constrain the federal palette to a small approved set per agency (typically Postgres + Redis + S3 + one search engine), with deviation requiring justification.

Migration patterns for legacy federal systems

Most federal microservice work is decomposing an existing system, not greenfield. We use the strangler fig pattern: route traffic incrementally from the legacy monolith to new microservices behind an API gateway, expand the new services until the legacy is empty, then decommission. Concrete steps:

Place an API gateway in front of the existing system without changing behavior.
Identify the first bounded context to extract (typically the most-changing, lowest-risk surface).
Build the new service against the same external contract.
Use traffic mirroring to validate correctness against the legacy.
Cut over traffic with feature flags and canary patterns.
Repeat until the legacy is empty.
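The mirroring and canary steps above can be sketched as two small Python functions. This is an illustrative sketch under stated assumptions: route and mirror_and_compare are hypothetical names, the handlers are plain callables standing in for the legacy system and the new service, and in a real deployment the gateway or mesh would do this work.

```python
import hashlib


def route(request_key: str, canary_percent: int) -> str:
    """Deterministically send a fixed share of keys to the new service."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket 0..99
    return "new" if bucket < canary_percent else "legacy"


def mirror_and_compare(request, legacy_handler, new_handler, mismatches: list):
    """Serve from legacy; shadow-call the new service and record differences."""
    legacy_response = legacy_handler(request)
    try:
        new_response = new_handler(request)
        if new_response != legacy_response:
            mismatches.append((request, legacy_response, new_response))
    except Exception as exc:
        # A failure in the new service must never affect the caller.
        mismatches.append((request, legacy_response, repr(exc)))
    return legacy_response
```

Hashing on a stable key (a user or case identifier, for example) keeps each caller on the same side of the cutover as the percentage rises. Mirroring as written is only safe for requests without side effects; writes need a dry-run mode or an isolated datastore on the new side.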

We have run this pattern on systems ranging from 1980s mainframe-era applications to early-2010s Java monoliths.

Who we build microservice systems for

VA — Lighthouse program services, claims processing systems.
DoD — Platform One-aligned mission systems on IL4/IL5.
HHS — CMS modernization, FHIR services for health data exchange.
DHS — component-level mission systems with multi-tenant requirements.
Treasury — high-throughput financial event processing systems.

Decompose your monolith without losing the mission.

Bounded-context microservices for long-lived federal systems.


UEI Y2JVCZXT9HP5 · CAGE 1AYQ0 · NAICS 541512 · SAM.GOV ACTIVE