The IC tier, briefly
"IC tier" is shorthand for the classified and special-access environments the intelligence community and parts of DoD operate in: SIPR (Secret), JWICS (TS/SCI), the AWS Secret Region / C2S, the Azure Secret / Top Secret offerings, on-prem accredited facilities, and program-specific enclaves. Kubernetes in this tier is not exotic anymore — it is the default way new services are shipped — but it is operated under constraints that significantly shape the architecture.
Kubernetes in the IC tier (IL5/IL6) requires STIG-hardened base images, runtime security enforcement, and network policy limiting pod-to-pod traffic to explicit allowlists. Standard Kubernetes tutorials produce non-compliant configurations.
This post describes the container platform we design and build inside those environments.
The main distributions, honestly

Red Hat OpenShift Container Platform (OCP)
The default across the DoD and IC enterprise. Strong integration (auth, registry, monitoring, SDN, operator framework), STIG-supported, FIPS-140 mode, Red Hat support available to cleared programs. OKD is the upstream community version; in practice programs deploy OCP because support and STIGs matter.
Footprint: significant. The control plane is a collection of operators managing a larger collection of services. Powerful; opinionated; adds surface area.
RKE2 (Rancher Kubernetes Engine 2)
SUSE Rancher's government-focused distribution. FIPS-140 certified, minimal footprint, hardened by default (CIS Kubernetes Benchmark preset), no Docker dependency. Growing presence in DoD edge and IC programs because of small attack surface.
Footprint: small. You bring your own registry, auth, monitoring, ingress. More assembly; more control.
Amazon EKS on GovCloud (and Secret Region)
Managed control plane. FedRAMP High in GovCloud, SECRET in AWS Secret Region. Strong fit when the workload is AWS-first and sits in those tiers.
Azure AKS on Azure Government
Equivalent managed option on Azure Government. Azure Top Secret and Secret offerings extend into higher tiers.
K3s, Kairos, edge options
For tactical-edge, shipboard, and deployed kits. K3s is SUSE's lightweight distro. Kairos is an immutable-OS K8s pattern that is gaining traction in tactical programs. Both fit small-footprint deployments with intermittent connectivity.
Distribution suitability in the IC tier (FIPS, support, footprint)
OCP dominates enterprise IC programs. RKE2 and edge distros are growing at the classified tactical edge. Managed cloud (EKS/AKS) wins on ease of operations within authorized cloud boundaries.
GPU scheduling
Nearly every federal AI K8s cluster includes GPUs. The NVIDIA GPU Operator is the managed path:
- Installs and upgrades NVIDIA drivers via a DaemonSet (no per-node manual installs).
- Configures container runtime for GPU passthrough (CRI-O / containerd).
- Deploys the K8s device plugin for GPU resource scheduling.
- Runs DCGM and DCGM-exporter for GPU telemetry to Prometheus.
- Manages MIG profile configuration for partitioned GPUs (split an H100 into smaller logical GPUs for many small workloads).
Key operational patterns:
- Node labels and taints separate GPU nodes from non-GPU nodes. Pods request nvidia.com/gpu: N or MIG profiles explicitly.
- GPU scheduling uses bin-packing to maximize utilization; for serving workloads, anti-affinity ensures redundancy.
- Time-slicing is an alternative to MIG when you have mixed-duty workloads and no strict isolation requirement. For mission systems we prefer MIG where available.
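As a sketch, a GPU workload pinned to tainted GPU nodes might look like the following (the namespace and image are illustrative; the `nvidia.com/gpu.present` label is applied by GPU Feature Discovery when the GPU Operator is installed):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"   # set by GPU Feature Discovery
  tolerations:
    - key: nvidia.com/gpu            # matches the taint on GPU nodes
      operator: Exists
      effect: NoSchedule
  containers:
    - name: worker
      image: registry.local/ml/inference:1.0   # illustrative in-boundary image
      resources:
        limits:
          nvidia.com/gpu: 1
          # With MIG enabled, request a profile instead, e.g.:
          # nvidia.com/mig-1g.10gb: 1
```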
Air-gap install pattern
Every component must be in-cluster or in a local registry. The install bundle is significant:
- Distribution installer (OpenShift release bundle or RKE2 tarball).
- All container images referenced by cluster operators and default workloads, mirrored to a local registry (Harbor or Quay).
- Helm charts and operator bundles, mirrored.
- OS images for node provisioning (RHCOS for OpenShift, SLES or Ubuntu Pro for RKE2).
- Node drivers (NVIDIA CUDA, network drivers, storage drivers).
- CLI binaries (kubectl, helm, oc).
- Signed manifest and install runbook.
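Mirroring is typically scripted with skopeo or the distribution's own tooling. A hedged sketch, with registry hostname and image references as placeholders:

```sh
# Mirror one image into the in-boundary registry. Run on a connected
# staging host; then transfer the registry storage (or a tar bundle)
# across the boundary through the approved transfer process.
skopeo copy --all \
  docker://quay.io/example/operator:v1.2.3 \
  docker://registry.local:5000/example/operator:v1.2.3

# OpenShift release payloads have dedicated tooling: oc adm release mirror,
# or oc-mirror driven by an ImageSetConfiguration.
```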
Networking
CNI choice
OpenShift ships OVN-Kubernetes. RKE2 defaults to Canal (Calico + Flannel), with Cilium and Calico as supported options. Cilium is increasingly the choice for high-throughput or eBPF-observability workloads.
Ingress
OpenShift has Routes (HAProxy-based) and Ingress. RKE2 brings ingress-nginx. Production federal clusters usually front ingress with an approved WAF and mTLS to the service mesh.
Service mesh
Istio or OpenShift Service Mesh (built on Istio). mTLS everywhere is the pattern. Linkerd is lighter if the feature set fits.
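With Istio, mesh-wide strict mTLS reduces to a single resource (a sketch; this assumes the standard istio-system root namespace):

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # mesh-wide when placed in the root namespace
spec:
  mtls:
    mode: STRICT             # reject plaintext pod-to-pod traffic
```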
Cluster-to-cluster
Cross-enclave communication via approved gateways or CDS, not via cluster federation primitives.
Storage
- Block storage for stateful workloads via CSI drivers matching the hosting infrastructure (VMware CSI, EBS CSI on AWS, Ceph via Rook for on-prem HCI).
- Object storage via S3-compatible backends (MinIO on-prem, AWS S3 in GovCloud).
- Persistent volume encryption at rest via the storage provider's mechanism. Keys come from an approved KMS or HSM.
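On AWS, encryption at rest with a customer-managed key can be pushed into the StorageClass. A sketch for the EBS CSI driver; the KMS key ARN is a placeholder:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  # Placeholder ARN; point at the approved KMS key
  kmsKeyId: arn:aws-us-gov:kms:us-gov-west-1:111122223333:key/EXAMPLE
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```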
Security posture
- Pod Security Standards. Baseline or Restricted (we default to Restricted in production). No privileged containers except where exceptions are documented and approved.
- Image signing and admission. Cosign or Notation for signing; Kyverno or Gatekeeper to enforce signed-image-only policy at admission.
- CIS Kubernetes Benchmark. RKE2 ships with a CIS preset; OpenShift has STIG docs for the platform. Run kube-bench periodically.
- NetworkPolicies. Default-deny, then explicit allow. Calico or Cilium network policies for finer grain.
- Secrets. External Secrets Operator pulling from Vault or a KMS-backed secrets service. Encrypt Secrets in etcd at minimum; use external secrets for anything sensitive.
- Audit. Kubernetes audit log at RequestResponse level for privileged operations, forwarded to the SIEM.
- Runtime security. Falco or Tetragon for syscall-level monitoring; rules tuned to the workload profile.
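The first two bullets reduce to a few manifests: namespace-level Pod Security labels plus a default-deny NetworkPolicy (the namespace name is illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mission-app
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: mission-app
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
  # No ingress/egress rules: all traffic denied until explicitly allowed
```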
Monitoring and logging
Metrics
Prometheus (or Thanos for multi-cluster). Alertmanager. Grafana for dashboards. In OpenShift this comes integrated; in RKE2 you install it.
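With the Prometheus Operator (used by the OpenShift monitoring stack and the common kube-prometheus-stack chart alike), scraping an application is a ServiceMonitor; names here are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mission-app
  namespace: mission-app
spec:
  selector:
    matchLabels:
      app: mission-app     # matches the Service exposing metrics
  endpoints:
    - port: metrics        # named port on the Service
      interval: 30s
```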
Logs
Fluent Bit or Vector → Loki or Elasticsearch (self-hosted in-boundary) → Kibana or Grafana. Retention aligned with audit requirements.
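The Fluent Bit to Loki leg might look like this in classic config syntax (host and labels are illustrative):

```ini
[OUTPUT]
    name    loki
    match   kube.*
    host    loki.logging.svc.cluster.local   # in-boundary Loki endpoint
    port    3100
    labels  job=fluent-bit, cluster=prod-east
    tls     on
```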
Traces
OpenTelemetry collector → Tempo or Jaeger.
Audit pipeline
K8s audit → SIEM (Splunk Enterprise, Elastic Security, or another approved SIEM).
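A minimal audit policy matching the posture above, as a sketch; note that Secrets are deliberately kept at Metadata level so secret values never land in the audit log:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Never log request/response bodies for Secrets (would leak values)
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # Full bodies for privileged, mutating RBAC operations
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
  # Everything else at metadata level
  - level: Metadata
```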
CI/CD into classified clusters
The pipeline sits inside the boundary. Common patterns:
- GitLab Enterprise on-prem with runners on STIG-hardened RHEL/SLES.
- OpenShift Pipelines (Tekton) with runners inside the cluster.
- Argo CD for GitOps-style deployments; Git as the source of truth for desired state.
- Image build via Buildah or Kaniko (no Docker daemon); push to Harbor or Quay.
- Image scan via Trivy or Clair; promotion gated on scan results.
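The Argo CD pattern in practice: one Application per deployable, with Git as the desired state. A sketch; the repo URL and paths are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mission-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.local/platform/mission-app.git  # in-boundary Git
    targetRevision: main
    path: deploy/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: mission-app
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert out-of-band changes to match Git
```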
What goes wrong
Outbound egress left open
A cluster that can reach the internet in a classified environment is a finding. Block by default at the network layer.
Base image drift
Node images updated out-of-band from the cluster upgrade process; STIG compliance drifts.
GPU driver version mismatch
Driver upgraded on some nodes and not others; workloads fail non-deterministically.
Unsigned images promoted to production
Admission policy not enforcing signatures, or a bypass was left active.
Control plane on the same network segment as workloads
Lateral movement risk; separate.
Etcd backup on unencrypted storage
Etcd contains secrets; backups must be encrypted and access-controlled.
Upgrade skipped because it looked hard
Security advisories accumulate. Every quarter, upgrade.
Where this fits in our practice
We build and operate Kubernetes platforms in every federal tier from GovCloud up. See our on-prem LLM deployment for what runs on top and our GPU capacity planning for the compute that drives the sizing.