The shape of federal geospatial ML in 2026
Federal geospatial ML has matured enormously over the past three years. Foundation models for satellite imagery are available and strong. Segment Anything and its descendants have made interactive segmentation practical. Time-series analysis at continental scale is achievable on reasonable budgets. What has not changed: federal geospatial data comes with classification, licensing, and provenance obligations that shape the pipeline differently than a commercial equivalent.
Geospatial ML for federal imagery must handle security markings on the image itself. Classification-aware pipelines must prevent downgraded imagery from leaking to unauthorized models — an architectural requirement, not a post-processing step.
This post describes the stack we build for NGA, USGS, NASA, USDA, NOAA, and adjacent geospatial programs.
[Chart: model performance by task, baseline vs. fine-tuned on federal imagery]
Data sources and their constraints

| Source | Modality | Access | Typical use |
|---|---|---|---|
| Sentinel-2 (ESA) | MSI, 10-60m | Free, open | Land cover, agriculture, change detection |
| Landsat (USGS) | MSI, 30m | Free, open | Long time series, land management |
| Sentinel-1 (ESA) | SAR, C-band | Free, open | All-weather monitoring, flood, deformation |
| NAIP (USDA) | Aerial RGB/NIR, 0.6-1m | Open | Ag, infrastructure, US-only |
| Planet / Maxar (commercial) | Sub-meter MSI | Licensed via federal contracts | High-cadence monitoring |
| NGA GEOINT | Various | Classified | Mission analysis, IC work |
| EMIT / PRISMA / EnMAP | HSI | Open (mission-specific) | Material identification, methane, minerals |
| ICESat-2 / GEDI | LiDAR altimetry | Open | Canopy height, ice, biomass |
Preprocessing that matters
Geospatial ML is 70% preprocessing. Skipping it produces models that are confidently wrong.
- Atmospheric correction. Sentinel-2 L1C → L2A (Sen2Cor). Landsat Collection-2 Level-2. Without it, models learn atmospheric noise instead of surface signal.
- Cloud and shadow masking. Fmask, s2cloudless, or ML-based models (CloudSEN12). Apply before any temporal aggregation.
- SAR calibration. Sigma-nought or gamma-nought, multi-look speckle filtering (Lee, refined Lee), terrain correction (SRTM DEM), logarithmic scaling.
- Coregistration. Sub-pixel alignment across time or across sensors. Failure to coregister ruins change detection.
- Chipping. Extract tiles at a consistent size (224x224, 512x512) with sufficient overlap for seamless inference. Preserve geographic metadata.
- Normalization. Per-band statistics from the training set. Consistent at training and inference.
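The chipping and normalization steps above can be sketched in a few lines. This is a minimal numpy sketch, not production code; `chip` and `normalize` are hypothetical helper names, and real pipelines also carry the geotransform so chips can be mosaicked back in place.

```python
import numpy as np

def chip(image: np.ndarray, size: int = 224, overlap: int = 32):
    """Split a (bands, H, W) array into overlapping square tiles.

    Returns (row, col, tile) triples so geographic offsets can be
    recovered when mosaicking predictions back together.
    """
    _, h, w = image.shape
    assert h >= size and w >= size, "pad scenes smaller than the chip size"
    stride = size - overlap
    tiles = []
    for r in range(0, max(h - overlap, 1), stride):
        for c in range(0, max(w - overlap, 1), stride):
            r0, c0 = min(r, h - size), min(c, w - size)  # clamp at scene edges
            tiles.append((r0, c0, image[:, r0:r0 + size, c0:c0 + size]))
    return tiles

def normalize(tile: np.ndarray, mean: np.ndarray, std: np.ndarray):
    """Per-band normalization using statistics computed on the training set,
    applied identically at training and inference time."""
    return (tile - mean[:, None, None]) / std[:, None, None]
```

The overlap exists so that edge artifacts from each tile's prediction can be discarded before mosaicking.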
Foundation models for geospatial in 2026
Prithvi (NASA / IBM)
ViT-based, pretrained on Harmonized Landsat Sentinel-2, open weights. Strong baseline for segmentation and classification tasks after fine-tuning.
Clay
Multi-modal, multi-sensor foundation model, open-source. Handles Sentinel-1 + Sentinel-2 + NAIP jointly.
SatMAE / SatMAE++
Masked autoencoder pretraining for satellite; good transfer learning.
DINOv2 satellite variants
Self-supervised vision models fine-tuned on satellite imagery.
SAM 2 and geospatial SAM variants (SAMGeo, SatSAM, GeoSAM)
Interactive and automatic segmentation; works on orbital imagery with fine-tuning.
Spectral foundation models (HySpecNet, SpectralGPT)
For HSI; earlier-stage but real.
Task patterns
Land cover and land use classification
Fine-tune a foundation backbone on the agency's labeled polygons. Tooling: torchgeo or a custom PyTorch Lightning training loop. For broad-area production, export the fine-tuned model as a TorchScript or ONNX artifact and serve via Triton or vLLM-for-vision on GPU nodes.
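The fine-tuning pattern, stripped to its essentials, looks like the sketch below. `SegHead` is a hypothetical name, and the stand-in `backbone` is just any module producing feature maps; in practice it would be Prithvi or Clay loaded from its own repository, with the head trained on chips rasterized from the agency's labeled polygons.

```python
import torch
import torch.nn as nn

class SegHead(nn.Module):
    """Lightweight segmentation head on a frozen foundation backbone.

    `backbone` stands in for a pretrained encoder mapping
    (B, C, H, W) -> (B, F, H/4, W/4) feature maps.
    """
    def __init__(self, backbone: nn.Module, feat_dim: int, n_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze pretrained weights
            p.requires_grad = False
        self.head = nn.Sequential(
            nn.Conv2d(feat_dim, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, n_classes, 1),
        )

    def forward(self, x):
        logits = self.head(self.backbone(x))
        # upsample back to input resolution for per-pixel labels
        return nn.functional.interpolate(logits, size=x.shape[-2:], mode="bilinear")

# one training step against labels rasterized to a per-pixel mask
backbone = nn.Conv2d(6, 64, 4, stride=4)        # stand-in for a real encoder
model = SegHead(backbone, feat_dim=64, n_classes=5)
opt = torch.optim.AdamW(model.head.parameters(), lr=1e-4)
x = torch.randn(2, 6, 224, 224)                 # 6-band chips
y = torch.randint(0, 5, (2, 224, 224))          # rasterized label mask
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

Freezing the backbone and training only the head is the cheap starting point; unfreezing the last blocks usually buys a few points of IoU at several times the compute.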
Object detection (vehicles, aircraft, structures)
YOLO variants (YOLOv10, YOLOv11) remain competitive. The DETR family (RT-DETR) is gaining ground. For tiny objects in high-resolution imagery, detection-transformer architectures with high-resolution backbones beat YOLO, at higher compute cost.
Segmentation (buildings, roads, agriculture)
U-Net, UPerNet, or SegFormer heads on a Prithvi / Clay backbone. For interactive work, SAM 2 with click or box prompts. For automatic full-scene products, run semantic segmentation across the whole stack.
Change detection
Bi-temporal siamese networks (ChangeFormer, BIT) for two-date comparison. Time-series transformers (Presto, TempCNN, Prithvi-TS) for dense stacks. Key pattern: train on pairs/stacks from the same geography; generalization across very different geographies is weaker than teams expect.
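Before training any of the learned models above, a crude statistical baseline is worth running, because it surfaces coregistration and normalization bugs immediately. This is a hypothetical sketch (`change_magnitude` and `change_mask` are not from any library): per-band standardization, Euclidean change magnitude, robust thresholding.

```python
import numpy as np

def change_magnitude(t0: np.ndarray, t1: np.ndarray) -> np.ndarray:
    """Per-pixel change magnitude between two coregistered (bands, H, W)
    dates: Euclidean distance in band space after per-band standardization."""
    def standardize(x):
        mu = x.mean(axis=(1, 2), keepdims=True)
        sd = x.std(axis=(1, 2), keepdims=True) + 1e-8
        return (x - mu) / sd
    d = standardize(t1) - standardize(t0)
    return np.sqrt((d ** 2).sum(axis=0))

def change_mask(t0, t1, z: float = 3.0) -> np.ndarray:
    """Flag pixels more than `z` robust standard deviations above the
    median change magnitude (MAD-based, so the threshold tolerates a
    small fraction of genuinely changed pixels)."""
    mag = change_magnitude(t0, t1)
    med = np.median(mag)
    mad = np.median(np.abs(mag - med)) + 1e-8
    return mag > med + z * 1.4826 * mad
```

If this baseline fires everywhere, fix coregistration before blaming the model.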
Material identification (HSI)
Spectral angle mapping as a baseline; deep methods (1D CNN, 3D CNN over spatial-spectral cubes) for harder tasks. Strong use cases: agriculture, mineral exploration, methane detection (EMIT).
SAR-specific
Despeckling (SAR2SAR, self-supervised), ship and vehicle detection on amplitude, deformation monitoring via InSAR for infrastructure and subsidence. Processing-heavy; GPU helpful for large-area InSAR.
Serving at scale
A model that works on 100 chips must work on 100 million. Patterns:
Dask or Spark for tile-level orchestration
Launch tile inference jobs in parallel; gather and mosaic results.
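The orchestration pattern is simple regardless of framework: one task per tile, gather, mosaic. A stdlib sketch using `ThreadPoolExecutor` as a stand-in for the Dask or Spark scheduler; `run_tiles` and `infer` are hypothetical names, and `infer` wraps whatever call reaches the model server.

```python
from concurrent.futures import ThreadPoolExecutor

def run_tiles(tile_ids, infer, max_workers=8):
    """Fan out per-tile inference and gather results keyed by tile id.
    In production the same shape runs on a Dask or Spark cluster with
    one task per tile; here the stdlib pool shows the pattern."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(zip(tile_ids, pool.map(infer, tile_ids)))
    return results  # tile_id -> prediction, ready to mosaic
```

The important property is that tiles are independent: a failed tile retries alone, and adding workers scales throughput linearly until the GPU serving tier saturates.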
GPU nodes for inference
H100 or L40S for throughput; A10G for cost-sensitive workloads. Triton Inference Server for multi-model GPU serving.
Cloud-native raster formats
COG (Cloud-Optimized GeoTIFF), Zarr, or STAC + COG for output products. Partial reads matter for downstream consumers.
STAC catalog
Register outputs with spatial, temporal, and asset metadata; makes downstream access feasible.
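What a registered output must carry is easiest to see as the record itself. In practice we build these with pystac; the plain-dict sketch below (`stac_item` is a hypothetical helper) shows the required STAC Item fields plus the lineage property downstream consumers filter on.

```python
import json

def stac_item(item_id, bbox, dt_iso, cog_href, pipeline_version):
    """Minimal STAC Item for a derived raster product."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": bbox,
        "geometry": {"type": "Polygon", "coordinates": [[
            [west, south], [east, south], [east, north],
            [west, north], [west, south]]]},
        "properties": {
            "datetime": dt_iso,
            # processing lineage, per the STAC processing extension
            "processing:software": {"pipeline": pipeline_version},
        },
        "assets": {"classification": {
            "href": cog_href,
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data"],
        }},
        "links": [],
    }
```

Because the record is plain JSON, it serializes directly into whatever STAC API or static catalog the program runs.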
Scene-level caching
Preprocessed chips and intermediate products cached; cheaper to reload than recompute.
Evaluation: what actually measures success
Per-class IoU and F1 with confidence intervals
Not just the mean; rare classes always matter.
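The computation is mechanical but worth pinning down. A sketch (`per_class_iou` is our name, not a library function) using a pixel-level bootstrap for the intervals; note that pixel resampling understates spatial autocorrelation, so tile-level resampling is the stricter choice in production.

```python
import numpy as np

def per_class_iou(y_true, y_pred, n_classes, n_boot=200, seed=0):
    """Per-class IoU with 95% bootstrap percentile intervals.
    Returns {class: (iou, lo, hi)}; absent classes come back as nan."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true).ravel(), np.asarray(y_pred).ravel()

    def iou(t, p):
        out = []
        for c in range(n_classes):
            inter = ((t == c) & (p == c)).sum()
            union = ((t == c) | (p == c)).sum()
            out.append(inter / union if union else np.nan)
        return np.array(out)

    point = iou(y_true, y_pred)
    boots = np.empty((n_boot, n_classes))
    for b in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample pixels
        boots[b] = iou(y_true[idx], y_pred[idx])
    lo, hi = np.nanpercentile(boots, [2.5, 97.5], axis=0)
    return {c: (point[c], lo[c], hi[c]) for c in range(n_classes)}
```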
Stratified evaluation
Performance by biome, season, resolution, sensor. Single-number metrics hide failure modes.
Spatial holdout
Train and test on geographically separated areas; otherwise generalization is overestimated.
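A blocked split is the standard way to enforce this. A sketch under our own naming (`spatial_block_split` is hypothetical): samples are grouped into degree-sized cells and whole cells are assigned to train or test, so no test pixel sits next to a training pixel.

```python
import numpy as np

def spatial_block_split(lon, lat, block_deg=1.0, test_frac=0.2, seed=0):
    """Assign samples to train/test by geographic block, not at random.
    All samples in the same `block_deg`-degree cell share a fate, so
    test areas are spatially disjoint from training areas."""
    lon, lat = np.asarray(lon), np.asarray(lat)
    blocks = np.stack([np.floor(lon / block_deg),
                       np.floor(lat / block_deg)], axis=1)
    uniq, inv = np.unique(blocks, axis=0, return_inverse=True)
    rng = np.random.default_rng(seed)
    test_blocks = rng.choice(len(uniq), max(1, int(len(uniq) * test_frac)),
                             replace=False)
    is_test = np.isin(inv, test_blocks)
    return ~is_test, is_test  # boolean train / test masks
```

Block size is a judgment call: blocks should be larger than the spatial autocorrelation length of the target, or the leakage just moves to the block boundary.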
Temporal holdout
For change detection, test on date pairs not seen during training.
Operational metrics
False-alarm rate at the tile level matters for downstream analysts; the model is useless if it floods them.
Classification, provenance, and chain of custody
- Source imagery's classification propagates to every derived product.
- Aggregation can uplift classification. Review per program.
- STAC metadata includes data lineage, model version, and processing chain.
- Outputs are signed (hash + pipeline version + model checkpoint).
- Access control at the STAC and object-storage layers respects the classification.
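The hash component of output signing can be sketched in a few lines; `sign_product` is a hypothetical helper, and a real deployment would wrap this digest in an actual cryptographic signature (e.g. via a KMS key) rather than stopping at the hash.

```python
import hashlib

def sign_product(raster_bytes: bytes, model_checkpoint: str,
                 pipeline_version: str) -> dict:
    """Content-address a derived product: hash the raster bytes together
    with the exact pipeline and model identifiers. Stored alongside the
    STAC record so a consumer can verify what produced what."""
    h = hashlib.sha256()
    h.update(raster_bytes)
    h.update(pipeline_version.encode())
    h.update(model_checkpoint.encode())
    return {
        "sha256": h.hexdigest(),
        "pipeline_version": pipeline_version,
        "model_checkpoint": model_checkpoint,
    }
```

Because the model checkpoint is folded into the digest, re-running an identical pipeline with a different model produces a detectably different product.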
Where this fits in our practice
We build geospatial ML pipelines end to end — ingestion, preprocessing, training, inference at scale, and STAC-based delivery. See our GPU capacity planning for sizing and our MLOps on GovCloud for the surrounding platform.