
Geospatial ML for federal imagery.

April 11, 2026 · 15 min read · SAR, MSI, HSI, segment-anything for geo, change detection, and real NGA/USGS data patterns in 2026.

The shape of federal geospatial ML in 2026

Federal geospatial ML has matured enormously over the past three years. Foundation models for satellite imagery are available and strong. Segment Anything and its descendants have made interactive segmentation practical. Time-series analysis at continental scale is achievable on reasonable budgets. What has not changed: federal geospatial data comes with classification, licensing, and provenance obligations that shape the pipeline differently than a commercial equivalent.

CLASSIFICATION LABELS LIVE ON THE IMAGE

Geospatial ML for federal imagery must handle security markings on the image itself. Classification-aware pipelines must prevent downgraded imagery from leaking to unauthorized models — an architectural requirement, not a post-processing step.

This post is the stack we build for NGA, USGS, NASA, USDA, NOAA, and adjacent geospatial programs.

[Chart: model performance by task, baseline vs fine-tuned on federal imagery]

Data sources and their constraints

| Source | Modality | Access | Typical use |
| --- | --- | --- | --- |
| Sentinel-2 (ESA) | MSI, 10-60 m | Free, open | Land cover, agriculture, change detection |
| Landsat (USGS) | MSI, 30 m | Free, open | Long time series, land management |
| Sentinel-1 (ESA) | SAR, C-band | Free, open | All-weather monitoring, flood, deformation |
| NAIP (USDA) | Aerial RGB/NIR, 0.6-1 m | Open | Agriculture, infrastructure, US-only |
| Planet / Maxar (commercial) | Sub-meter MSI | Licensed via federal contracts | High-cadence monitoring |
| NGA GEOINT | Various | Classified | Mission analysis, IC work |
| EMIT / PRISMA / EnMAP | HSI | Open (mission-specific) | Material identification, methane, minerals |
| ICESat-2 / GEDI | LiDAR altimetry | Open | Canopy height, ice, biomass |

Preprocessing that matters

Geospatial ML is 70% preprocessing. Skipping it produces confidently wrong models.

  • Atmospheric correction. Sentinel-2 L1C → L2A (Sen2Cor). Landsat Collection-2 Level-2. Without it, models learn atmospheric noise instead of surface signal.
  • Cloud and shadow masking. Fmask, s2cloudless, or ML-based models (CloudSEN12). Apply before any temporal aggregation.
  • SAR calibration. Sigma-nought or gamma-nought, multi-look speckle filtering (Lee, refined Lee), terrain correction (SRTM DEM), logarithmic scaling.
  • Coregistration. Sub-pixel alignment across time or across sensors. Failure to coregister ruins change detection.
  • Chipping. Extract tiles at a consistent size (224x224, 512x512) with sufficient overlap for seamless inference. Preserve geographic metadata.
  • Normalization. Per-band statistics from the training set. Consistent at training and inference.
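The chipping and normalization steps above can be sketched in a few lines. This is a minimal illustration (function names and the 512/64 tile geometry are our choices, not a library API): slice a scene into overlapping tiles while keeping row/column offsets so results can be mosaicked back, and compute per-band statistics from the training chips only.

```python
import numpy as np

def chip_scene(scene, tile=512, overlap=64):
    """Slice an (H, W, C) array into overlapping tiles.

    Returns (row, col, tile_array) triples so each chip can be mapped
    back to its position in the scene when mosaicking inference output.
    """
    h, w, _ = scene.shape
    step = tile - overlap
    chips = []
    for r in range(0, max(h - tile, 0) + 1, step):
        for c in range(0, max(w - tile, 0) + 1, step):
            chips.append((r, c, scene[r:r + tile, c:c + tile, :]))
    return chips

def band_stats(chips):
    """Per-band mean/std over the training chips -- reuse the same
    statistics at inference time, never recompute per scene."""
    stacked = np.concatenate([t.reshape(-1, t.shape[-1]) for _, _, t in chips])
    return stacked.mean(axis=0), stacked.std(axis=0)
```

In production the (row, col) offsets would be carried alongside the geotransform (e.g. rasterio windows) so every chip stays georeferenced.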

Foundation models for geospatial in 2026

Prithvi (NASA / IBM)

ViT-based, pretrained on Harmonized Landsat Sentinel-2, open weights. Strong baseline for segmentation and classification tasks after fine-tuning.

Clay

Multi-modal, multi-sensor foundation model, open-source. Handles Sentinel-1 + Sentinel-2 + NAIP jointly.

SatMAE / SatMAE++

Masked autoencoder pretraining for satellite; good transfer learning.

DINOv2 satellite variants

Self-supervised vision models fine-tuned on satellite imagery.

SAM 2 and geospatial SAM variants (SAMGeo, SatSAM, GeoSAM)

Interactive and automatic segmentation; works on orbital imagery with fine-tuning.

Spectral foundation models (HySpecNet, SpectralGPT)

For HSI; earlier-stage but real.

Task patterns

Land cover and land use classification

Fine-tune a foundation backbone on the agency's labeled polygons. Tooling: torchgeo or a custom PyTorch Lightning training loop. For broad-area production, export the fine-tuned model as a TorchScript or ONNX artifact and serve via Triton or vLLM-for-vision on GPU nodes.

Object detection (vehicles, aircraft, structures)

YOLO variants (YOLOv10, YOLOv11) remain competitive. DETR-family models (RT-DETR) are gaining ground. For tiny objects in high-resolution imagery, detection-transformer architectures with high-resolution backbones beat YOLO, at higher compute cost.

Segmentation (buildings, roads, agriculture)

U-Net, UPerNet, or SegFormer heads on a Prithvi / Clay backbone. For interactive work, SAM 2 with click or box prompts; for automatic full-scene products, run semantic segmentation across the whole stack.

Change detection

Bi-temporal siamese networks (ChangeFormer, BIT) for two-date comparison. Time-series transformers (Presto, TempCNN, Prithvi-TS) for dense stacks. Key pattern: train on pairs/stacks from the same geography; generalization across very different geographies is weaker than teams expect.
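Before reaching for a siamese transformer, it is worth having a dumb bi-temporal baseline to sanity-check the learned detector against. A minimal sketch, assuming coregistered and normalized two-date stacks (the `k` threshold and MAD-based statistics are our choices, not a published method):

```python
import numpy as np

def change_mask(t1, t2, k=3.0):
    """Bi-temporal change baseline on coregistered, normalized stacks.

    Per-pixel spectral distance between the two dates, thresholded at
    median + k * MAD so the cutoff adapts to scene statistics instead
    of a hand-tuned constant.
    """
    d = np.linalg.norm(t2.astype(np.float64) - t1.astype(np.float64), axis=-1)
    med = np.median(d)
    mad = np.median(np.abs(d - med)) + 1e-9   # robust spread estimate
    return d > med + k * mad
```

If ChangeFormer or BIT cannot beat this on the target geography, the problem is usually in coregistration or normalization, not the model.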

Material identification (HSI)

Spectral angle mapping as a baseline; deep methods (1D CNN, 3D CNN over spatial-spectral cubes) for harder tasks. Strong use cases: agriculture, mineral exploration, methane detection (EMIT).
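The spectral angle mapping baseline mentioned above is short enough to show in full. A minimal sketch (function name ours): compute the angle between each pixel spectrum and a reference material spectrum; small angles mean a close match, and the metric is insensitive to illumination scaling.

```python
import numpy as np

def spectral_angle(cube, reference):
    """Spectral angle (radians) between each pixel spectrum and a
    reference material spectrum.

    cube: (H, W, B) hyperspectral cube; reference: (B,) spectrum.
    """
    flat = cube.reshape(-1, cube.shape[-1]).astype(np.float64)
    dot = flat @ reference
    norms = np.linalg.norm(flat, axis=1) * np.linalg.norm(reference)
    cos = np.clip(dot / (norms + 1e-12), -1.0, 1.0)  # guard rounding
    return np.arccos(cos).reshape(cube.shape[:2])
```

Thresholding the angle map gives a material mask; deep spatial-spectral models earn their keep only where this baseline fails.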

SAR-specific

Despeckling (SAR2SAR, self-supervised), ship and vehicle detection on amplitude, deformation monitoring via InSAR for infrastructure and subsidence. Processing-heavy; GPU helpful for large-area InSAR.
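For context on what the classical despeckling step is doing, here is a simplified Lee filter in plain NumPy (the global noise-variance estimate and the exact gain form are simplifications of the original filter, labeled as such in the comments):

```python
import numpy as np

def lee_filter(img, size=7, noise_var=None):
    """Simplified Lee speckle filter on a SAR amplitude/intensity image.

    Local mean and variance come from a box window; the gain shrinks
    each pixel toward the local mean where the window is homogeneous
    (variance dominated by speckle) and preserves structure elsewhere.
    """
    pad = size // 2
    p = np.pad(img.astype(np.float64), pad, mode="reflect")
    # Box statistics via sliding windows -- avoids a scipy dependency.
    win = np.lib.stride_tricks.sliding_window_view(p, (size, size))
    mean = win.mean(axis=(-2, -1))
    var = win.var(axis=(-2, -1))
    if noise_var is None:
        noise_var = np.median(var)        # crude global speckle estimate
    gain = var / (var + noise_var + 1e-12)
    return mean + gain * (img - mean)
```

Self-supervised despecklers (SAR2SAR) replace this window statistic with a learned prior, but the shrink-toward-local-mean intuition is the same.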

A fine-tuned Prithvi beats a custom-trained ResNet50 on almost any federal geospatial task in 2026. Start with the foundation model.

Serving at scale

A model that works on 100 chips must work on 100 million. Patterns:

Dask or Spark for tile-level orchestration

Launch tile inference jobs in parallel; gather and mosaic results.
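The scatter/gather shape of tile orchestration can be sketched with stdlib concurrency; Dask's `delayed`/`map_blocks` (or Spark tasks) give the same pattern at cluster scale. Everything here is illustrative: `infer_tile` is a placeholder for the real model call, not a real API.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def infer_tile(args):
    """Stand-in for per-tile model inference; returns (row, col, mask)."""
    (r, c), tile = args
    return r, c, (tile > tile.mean()).astype(np.uint8)  # placeholder "model"

def mosaic(scene, tile=256):
    """Scatter tiles to workers, gather results, stitch the mosaic.

    Dask or Spark replace the executor when tiles outgrow one machine;
    the (row, col) bookkeeping is identical either way.
    """
    h, w = scene.shape
    jobs = [((r, c), scene[r:r + tile, c:c + tile])
            for r in range(0, h, tile) for c in range(0, w, tile)]
    out = np.zeros_like(scene, dtype=np.uint8)
    with ThreadPoolExecutor() as pool:
        for r, c, m in pool.map(infer_tile, jobs):
            out[r:r + m.shape[0], c:c + m.shape[1]] = m
    return out
```
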

GPU nodes for inference

H100 or L40S for throughput; A10G for cost-sensitive workloads. Triton Inference Server for multi-model GPU serving.

Cloud-native raster formats

COG (Cloud-Optimized GeoTIFF), Zarr, or STAC + COG for output products. Partial reads matter for downstream consumers.

STAC catalog

Register outputs with spatial, temporal, and asset metadata; makes downstream access feasible.
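A STAC Item is ultimately just JSON, which keeps the registration step inspectable. A minimal sketch built by hand (pystac is the usual tool; the `processing:lineage` property and the asset key name here are our choices for carrying provenance, so treat the exact fields as an assumption to check against the program's profile):

```python
import json

def stac_item(item_id, bbox, dt, cog_href, lineage):
    """Minimal STAC Item as a plain dict.

    `lineage` carries the model version and processing chain so
    downstream consumers can audit how the product was derived.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {"type": "Polygon", "coordinates": [[
            [west, south], [east, south], [east, north],
            [west, north], [west, south]]]},
        "bbox": list(bbox),
        "properties": {"datetime": dt, "processing:lineage": lineage},
        "links": [],
        "assets": {"classification": {
            "href": cog_href,
            "type": "image/tiff; application=geotiff; "
                    "profile=cloud-optimized"}},
    }
```
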

Scene-level caching

Preprocessed chips and intermediate products cached; cheaper to reload than recompute.
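The caching pattern is simple but the cache key matters: it must include the pipeline version, or a pipeline change silently serves stale chips. A minimal disk-cache sketch with stdlib tools (function names ours; production systems would use object storage and a real serialization format rather than pickle):

```python
import hashlib
import pickle
from pathlib import Path

def cached(cache_dir, scene_id, pipeline_version, compute):
    """Disk cache for preprocessed products, keyed on scene id plus
    pipeline version so a pipeline change invalidates old artifacts."""
    key = hashlib.sha256(
        f"{scene_id}:{pipeline_version}".encode()).hexdigest()
    path = Path(cache_dir) / f"{key}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())   # cache hit: reload
    result = compute()                           # cache miss: recompute
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(result))
    return result
```
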

Evaluation: what actually measures success

Per-class IoU and F1 with confidence intervals

Not just mean; rare classes always matter.
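Computing the per-class numbers is trivial; the discipline is in reporting them. A minimal sketch (function name ours; confidence intervals would come from bootstrapping over tiles, omitted here for brevity):

```python
import numpy as np

def per_class_iou(pred, truth, n_classes):
    """Per-class IoU from flattened label maps.

    Report every class, not just the mean -- a rare class at 0.1 IoU
    disappears inside a healthy-looking mIoU.
    """
    ious = []
    for c in range(n_classes):
        p, t = pred == c, truth == c
        union = np.logical_or(p, t).sum()
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union if union else np.nan)  # nan: class absent
    return np.array(ious)
```
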

Stratified evaluation

Performance by biome, season, resolution, sensor. Single-number metrics hide failure modes.

Spatial holdout

Train and test on geographically separated areas; otherwise generalization is overestimated.
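The spatial-holdout idea can be sketched as block-based splitting: assign every sample to a grid block, then put whole blocks on one side of the split, so spatially autocorrelated neighbors never straddle train and test. A minimal version (function name, block size, and test fraction are illustrative choices):

```python
import numpy as np

def spatial_block_split(xs, ys, block=10_000.0, test_frac=0.25, seed=0):
    """Train/test split by spatial block, not by random sample.

    xs, ys are sample coordinates in a projected CRS (meters). All
    samples in a block land on the same side of the split.
    """
    blocks = np.stack([np.floor(xs / block), np.floor(ys / block)], axis=1)
    keys = [tuple(b) for b in blocks]
    uniq = sorted(set(keys))
    rng = np.random.default_rng(seed)
    n_test = max(1, int(len(uniq) * test_frac))
    test_blocks = set(map(tuple, rng.permutation(uniq)[:n_test]))
    test_mask = np.array([k in test_blocks for k in keys])
    return ~test_mask, test_mask   # train_mask, test_mask
```
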

Temporal holdout

For change detection, test on date pairs not seen during training.

Operational metrics

False-alarm rate at the tile level matters for downstream analysts; the model is useless if it floods them.

Classification, provenance, and chain of custody

  • Source imagery's classification propagates to every derived product.
  • Aggregation can uplift classification. Review per program.
  • STAC metadata includes data lineage, model version, and processing chain.
  • Outputs are signed (hash + pipeline version + model checkpoint).
  • Access control at the STAC and object-storage layers respects the classification.
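The signing step above can be sketched as a sidecar record binding a product to its pipeline and model versions. This is a content digest only, assuming names we chose for illustration; a real deployment would wrap the record in an actual cryptographic signature (GPG, Sigstore, or the program's PKI), not a bare hash.

```python
import hashlib
import json

def sign_product(artifact_bytes, pipeline_version, model_checkpoint):
    """Sidecar provenance record for a derived geospatial product.

    The content hash pins the artifact; pipeline and checkpoint ids
    pin how it was produced; the record hash covers the whole sidecar.
    """
    record = {
        "sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        "pipeline_version": pipeline_version,
        "model_checkpoint": model_checkpoint,
    }
    record["record_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return record
```
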

Where this fits in our practice

We build geospatial ML pipelines end to end — ingestion, preprocessing, training, inference at scale, and STAC-based delivery. See our GPU capacity planning for sizing and our MLOps on GovCloud for the surrounding platform.

FAQ

What is the difference between SAR, MSI, and HSI and why does it matter?
SAR (synthetic aperture radar) is active; it works day, night, and through clouds. MSI (multispectral, e.g. Sentinel-2, Landsat) captures a small number of broad bands from the visible through the shortwave infrared. HSI (hyperspectral) captures hundreds of narrow bands, supporting material identification. Each needs different preprocessing, different models, and different analytical frames.
Does Segment Anything (SAM) work for satellite imagery?
Not out of the box for orbital imagery — SAM was trained on natural photos and struggles with top-down, varying scales, and domain shift. SAM 2 and geospatial variants (GeoSAM, SAMGeo, SatSAM) have improved this significantly. For production work, fine-tuning on domain data is still usually required.
What foundation models exist for satellite imagery?
Prithvi (NASA/IBM), Clay, SatMAE, DINOv2-satellite variants. All are open-weight or permissive. Fine-tuning one of these on the target task beats training from scratch by a wide margin on most federal geospatial tasks.
How do you do change detection reliably?
Two paradigms: bi-temporal direct comparison (siamese networks, ChangeFormer) and time-series anomaly detection (LSTM or transformer over a dense stack of observations). Choice depends on cadence and use case. Time-series is more robust to noise and seasonality; bi-temporal is faster to deploy.
How do you handle classification markings on geospatial products?
Every derived product inherits the classification of its inputs plus any aggregation-induced uplift. Commercial Sentinel-2 pipelines are unclassified; GEOINT derived from NRO assets is not. Pipelines must track data lineage through every step and propagate markings to outputs. Chip-level tagging is standard.
Where does GPU capacity matter most on geospatial work?
Training foundation-scale backbones needs 8-64 H100s for weeks. Fine-tuning for a target task fits on 1-4 H100s. Inference on very large scenes is GPU-bound when doing dense pixel-wise inference over thousands of square kilometers; this is where most production budget goes.

Building geospatial ML on federal imagery?

We build SAR, MSI, and HSI pipelines, change-detection systems, and segmentation models on NGA and USGS data — with classification and provenance intact.