
Kubernetes in the IC tier.

April 9, 2026 · 15 min read · OpenShift, RKE2, GPU operators, cross-domain patterns — what actually runs in SIPR, JWICS, and beyond.

The IC tier, briefly

"IC tier" is shorthand for the classified and special-access environments the intelligence community and parts of DoD operate in: SIPR (Secret), JWICS (TS/SCI), the AWS Secret Region / C2S, the Azure Secret / Top Secret offerings, on-prem accredited facilities, and program-specific enclaves. Kubernetes in this tier is not exotic anymore — it is the default way new services are shipped — but it is operated under constraints that significantly shape the architecture.

IC KUBERNETES IS DIFFERENT

Kubernetes in the IC tier (IL5/IL6) requires STIG-hardened base images, runtime security enforcement, and network policy limiting pod-to-pod traffic to explicit allowlists. Standard Kubernetes tutorials produce non-compliant configurations.
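
The allowlist posture starts with a default-deny NetworkPolicy, then explicit allows per flow. A minimal sketch — the namespace, labels, and port are illustrative:

```yaml
# Default-deny: no pod in the namespace may send or receive traffic
# until an explicit allow policy matches it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: mission-app   # illustrative namespace
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# Explicit allow: frontend pods may reach API pods on 8443, nothing else.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: mission-app
spec:
  podSelector:
    matchLabels:
      app: api             # illustrative label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8443
```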

This post describes the container platform we design and build inside those environments.

The main distributions, honestly

Red Hat OpenShift Container Platform (OCP)

The default across the DoD and IC enterprise. Strong integration (auth, registry, monitoring, SDN, operator framework), STIG-supported, FIPS-140 mode, Red Hat support available to cleared programs. OKD is the upstream community version; in practice programs deploy OCP because support and STIGs matter.

Footprint: significant. The control plane is a collection of operators managing the platform services. Powerful; opinionated; adds surface area.

RKE2 (Rancher Kubernetes Engine 2)

SUSE Rancher's government-focused distribution. FIPS-140 certified, minimal footprint, hardened by default (CIS Kubernetes Benchmark preset), no Docker dependency. Growing presence in DoD edge and IC programs because of small attack surface.

Footprint: small. You bring your own registry, auth, monitoring, ingress. More assembly; more control.

Amazon EKS on GovCloud (and Secret Region)

Managed control plane. FedRAMP High in GovCloud, SECRET in AWS Secret Region. Strong fit when the workload is AWS-first and sits in those tiers.

Azure AKS on Azure Government

Equivalent managed option on Azure Government. Azure Top Secret and Secret offerings extend into higher tiers.

K3s, Kairos, edge options

For tactical-edge, shipboard, and deployed kits. K3s is SUSE's lightweight distro. Kairos is an immutable-OS K8s pattern gaining traction in tactical deployments. Both fit small-footprint deployments with intermittent connectivity.

Kubernetes Distribution — IC Tier Suitability (FIPS, support, footprint)

  • Red Hat OpenShift (OCP) — STIG-backed, DoD default: 95%
  • Amazon EKS on GovCloud / Secret Region: 82%
  • Azure AKS on Azure Government / Top Secret: 80%
  • RKE2 (Rancher) — FIPS-140, minimal footprint: 78%
  • K3s / Kairos — edge deployments only: 55%

OCP dominates enterprise IC programs. RKE2 and edge distros are growing at the classified tactical edge. Managed cloud (EKS/AKS) wins on ease of operations within authorized cloud boundaries.

GPU scheduling

Every federal AI K8s cluster has GPUs. The NVIDIA GPU Operator is the managed path:

  • Installs and upgrades NVIDIA drivers via a DaemonSet (no per-node manual installs).
  • Configures container runtime for GPU passthrough (CRI-O / containerd).
  • Deploys the K8s device plugin for GPU resource scheduling.
  • Runs DCGM and DCGM-exporter for GPU telemetry to Prometheus.
  • Manages MIG profile configuration for partitioned GPUs (split an H100 into smaller logical GPUs for many small workloads).

Key operational patterns:

  • Node labels and taints separate GPU nodes from non-GPU nodes. Pods request nvidia.com/gpu: N or MIG profiles explicitly.
  • GPU scheduling uses binpacking to maximize utilization; for serving workloads, anti-affinity ensures redundancy.
  • Time-slicing is an alternative to MIG when you have mixed-duty workloads and no strict isolation requirement. For mission systems we prefer MIG where available.
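
The label-taint-request pattern above looks like this in a pod spec. The taint key and MIG profile name are illustrative; actual MIG resource names depend on the GPU Operator's MIG strategy and the GPU model:

```yaml
# Pod pinned to GPU nodes via nodeSelector + toleration, requesting one
# MIG slice. Swap the limit for nvidia.com/gpu: 1 to take a full GPU.
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"   # label applied by GPU feature discovery
  tolerations:
    - key: nvidia.com/gpu            # illustrative taint key on GPU nodes
      operator: Exists
      effect: NoSchedule
  containers:
    - name: serve
      image: registry.local/mission/inference:1.0   # illustrative image
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1  # one MIG slice (mixed MIG strategy)
```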

Air-gap install pattern

Every component must be in-cluster or in a local registry. The install bundle is significant:

  • Distribution installer (OpenShift release bundle or RKE2 tarball).
  • All container images referenced by cluster operators and default workloads, mirrored to a local registry (Harbor or Quay).
  • Helm charts and operator bundles, mirrored.
  • OS images for node provisioning (RHCOS for OpenShift, SLES or Ubuntu Pro for RKE2).
  • Node drivers (NVIDIA CUDA, network drivers, storage drivers).
  • CLI binaries (kubectl, helm, oc).
  • Signed manifest and install runbook.
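
The signed manifest is what lets the high side confirm nothing changed in transit. Signing itself depends on the program's PKI, but the checksum portion can be sketched as two small shell functions (directory layout is illustrative):

```shell
# mkmanifest: hash every artifact in a bundle directory into a manifest.
# verifymanifest: re-check the manifest on the high side before import;
# any mismatch returns nonzero and should abort the install.
mkmanifest() {
  ( cd "$1" && find . -type f ! -name MANIFEST.sha256 -print0 \
      | xargs -0 sha256sum > MANIFEST.sha256 )
}

verifymanifest() {
  ( cd "$1" && sha256sum -c --quiet MANIFEST.sha256 )
}
```

The manifest file then gets detached-signed (Cosign, GPG, or program PKI) and travels with the bundle through the cross-domain pipeline.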

Networking

CNI choice

OpenShift ships OVN-Kubernetes. RKE2 defaults to Canal (Calico + Flannel) or Cilium. Cilium is increasingly the choice for high-throughput or eBPF-observability workloads.

Ingress

OpenShift has Routes (HAProxy-based) and Ingress. RKE2 brings ingress-nginx. Production federal clusters usually front ingress with an approved WAF and mTLS to the service mesh.

Service mesh

Istio or OpenShift Service Mesh (built on Istio). mTLS everywhere is the pattern. Linkerd is lighter if the feature set fits.
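
In Istio, mTLS-everywhere is a single mesh-wide resource; a sketch assuming istio-system is the mesh root namespace:

```yaml
# Mesh-wide strict mTLS: any plaintext pod-to-pod traffic is rejected.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace makes this mesh-wide
spec:
  mtls:
    mode: STRICT
```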

Cluster-to-cluster

Cross-enclave communication via approved gateways or CDS, not via cluster federation primitives.

Storage

  • Block storage for stateful workloads via CSI drivers matching the hosting infrastructure (VMware CSI, EBS CSI on AWS, Ceph via Rook for on-prem HCI).
  • Object storage via S3-compatible backends (MinIO on-prem, AWS S3 in GovCloud).
  • Persistent volume encryption at rest via the storage provider's mechanism. Keys come from an approved KMS or HSM.
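
For the GovCloud case, encryption at rest with an approved key reduces to StorageClass parameters on the EBS CSI driver. The KMS key ARN is a placeholder for the program's approved CMK:

```yaml
# Encrypted-at-rest block storage on AWS GovCloud via the EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  kmsKeyId: arn:aws-us-gov:kms:us-gov-west-1:111122223333:key/EXAMPLE  # placeholder CMK
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer   # provision in the pod's AZ
```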

Security posture

  • Pod Security Standards. Baseline or Restricted (we default to Restricted in production). No privileged containers except where exceptions are documented and approved.
  • Image signing and admission. Cosign or Notation for signing; Kyverno or Gatekeeper to enforce signed-image-only policy at admission.
  • CIS Kubernetes Benchmark. RKE2 ships with CIS preset; OpenShift has STIG docs for the platform. Run kube-bench periodically.
  • NetworkPolicies. Default-deny, then explicit allow. Calico or Cilium network policies for finer grain.
  • Secrets. External Secrets Operator pulling from Vault or a KMS-backed secrets service. Etcd-encrypted secrets at minimum; external secrets for anything sensitive.
  • Audit. Kubernetes audit log at RequestResponse level for privileged operations, forwarded to the SIEM.
  • Runtime security. Falco or Tetragon for syscall-level monitoring; rules tuned to the workload profile.

Small attack surface is a feature. Every operator you add is another thing to STIG, patch, and defend. RKE2 plus the specific operators you need often beats a full OpenShift install on defensibility, especially at the edge.
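
The signed-image-only admission policy can be sketched as a Kyverno ClusterPolicy; the registry pattern and public key are placeholders:

```yaml
# Reject any pod whose image from the local registry lacks a valid
# Cosign signature against the program's signing key.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.local/*"        # placeholder registry pattern
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      (program signing key goes here)
                      -----END PUBLIC KEY-----
```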

Monitoring and logging

Metrics

Prometheus (or Thanos for multi-cluster). Alertmanager. Grafana for dashboards. In OpenShift this comes integrated; in RKE2 you install it.

Logs

Fluent Bit or Vector → Loki or Elasticsearch (self-hosted in-boundary) → Kibana or Grafana. Retention aligned with audit requirements.

Traces

OpenTelemetry collector → Tempo or Jaeger.

Audit pipeline

K8s audit → SIEM (Splunk Enterprise, Elastic Security, or approved SIEM).
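
The RequestResponse-for-privileged-operations posture maps to an apiserver audit policy along these lines; the resource list is illustrative and should track what the program treats as privileged:

```yaml
# Full request/response bodies for sensitive writes; metadata-only for
# everything else to keep audit volume manageable.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""                          # core API group
        resources: ["secrets", "serviceaccounts"]
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterrolebindings", "rolebindings"]
  - level: Metadata
    omitStages: ["RequestReceived"]        # skip the pre-processing stage
```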

CI/CD into classified clusters

The pipeline sits inside the boundary. Common patterns:

  • GitLab Enterprise on-prem with runners on STIG-hardened RHEL/SLES.
  • OpenShift Pipelines (Tekton) with runners inside the cluster.
  • Argo CD for GitOps-style deployments; Git as the source of truth for desired state.
  • Image build via Buildah or Kaniko (no Docker daemon); push to Harbor or Quay.
  • Image scan via Trivy or Clair; promotion gated on scan results.
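
The Argo CD piece of the pipeline is declarative: one Application per deployable, with Git as the desired state. Repo URL, path, and namespace below are placeholders:

```yaml
# GitOps deployment: Argo CD keeps the cluster synced to the in-boundary
# Git repo and reverts out-of-band drift.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mission-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.local/platform/mission-app.git  # placeholder
    targetRevision: main
    path: deploy/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: mission-app
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual changes on the cluster
```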

What goes wrong

Outbound egress left open

A cluster that can reach the internet in a classified environment is a finding. Block by default at the network layer.

Base image drift

Node images updated out-of-band from the cluster upgrade process; STIG compliance drifts.

GPU driver version mismatch

Driver upgraded on some nodes and not others; workloads fail non-deterministically.

Unsigned images promoted to production

Admission policy not enforcing signatures, or a bypass was left active.

Control plane on the same network segment as workloads

Lateral movement risk; separate.

Etcd backup on unencrypted storage

Etcd contains secrets; backups must be encrypted and access-controlled.

Upgrade skipped because it looked hard

Security advisories accumulate. Every quarter, upgrade.

Where this fits in our practice

We build and operate Kubernetes platforms in every federal tier from GovCloud up. See our on-prem LLM deployment for what runs on top and our GPU capacity planning for the compute that drives the sizing.

FAQ

What Kubernetes distributions are realistic in the IC tier?
Red Hat OpenShift Container Platform is the most common. RKE2 (Rancher Kubernetes Engine 2) has grown significantly in DoD and IC footprints because of its small attack surface and FIPS-140 support. Amazon EKS on GovCloud reaches into SECRET via AWS Secret Region. Vanilla upstream Kubernetes is rare in classified production; operational cost is too high.
Why does the GPU operator matter for federal Kubernetes?
The NVIDIA GPU operator installs drivers, container runtime integration, device plugin, DCGM monitoring, and MIG partitioning as a managed lifecycle. Without it you are installing drivers by hand per node and the STIG posture drifts. Essential for any GPU-serving K8s cluster in federal.
How do you patch Kubernetes in an air-gapped environment?
Pull the new distribution (OpenShift release image, RKE2 binary), sign it, push through the cross-domain pipeline, import into the on-side registry, and upgrade through the distribution-native upgrade path. Upgrade cadence is typically quarterly, not monthly, to match the approvals calendar.
What about cross-domain solutions in Kubernetes environments?
Kubernetes typically lives inside a single security domain. Cross-domain needs are handled at the data layer (guards, CDS hardware/software) not inside the cluster. Applications that need cross-domain flow interact with CDS through approved endpoints; the cluster itself does not span domains.
What is the attack surface difference between OpenShift and RKE2?
OpenShift ships more features and more services (OperatorHub, built-in registry, built-in auth, integrated monitoring, OpenShift SDN). RKE2 ships less. OpenShift trades surface area for integration; RKE2 trades integration for a smaller footprint and FIPS-140 certification. Both can be hardened to federal bars; the effort is different.
Can you run classified workloads on commercial EKS?
Not directly. EKS in GovCloud supports FedRAMP High. AWS Secret Region supports SECRET workloads. JWICS and TS/SCI workloads run in other environments (C2S for the IC, programs with purpose-built infrastructure). Cluster choice follows the authorization.


Standing up Kubernetes in a classified environment?

We build container platforms on OpenShift and RKE2 for SIPR, JWICS, and cross-domain federal workloads, with GPU scheduling and air-gap install pipelines.