What we build
Computer vision is where federal AI gets operational. It tracks vehicles in border imagery, reads handwritten fields off a DD-214, segments lung nodules in VA chest CTs, counts storage tanks in overhead imagery, and flags tampered documents in fraud investigations. The common thread: an image goes in, a structured decision comes out, and a human somewhere makes a better choice because of it.
- Object detection & tracking — YOLO11, DINO-DETR, Co-DETR for closed-set detection; Grounding-DINO and OWLv2 for open-vocabulary queries.
- Segmentation — SAM2 for interactive and promptable workflows, Mask2Former for panoptic, nnU-Net for medical volumes.
- Classification & retrieval — ViT, SwinV2, ConvNeXt-V2, EVA-02 backbones. CLIP-style image embeddings for content-based retrieval.
- OCR & document understanding — PaddleOCR, TrOCR, Donut, Nougat, LayoutLMv3 for forms, tables, handwriting, and scientific content.
- Satellite & aerial imagery — multispectral ingest, tiling, change detection, object counting, land-use segmentation.
- Video analytics — activity recognition, object tracking (ByteTrack, BoT-SORT), anomaly detection in surveillance feeds.
- Medical imaging — radiology (X-ray, CT, MRI), pathology WSI, dermatology with FDA SaMD awareness.
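The CLIP-style retrieval mentioned above reduces to nearest-neighbor search over normalized embeddings. A minimal sketch, assuming the embeddings have already been produced by a shared image encoder (the 4-dimensional random vectors here are toy stand-ins for real embeddings):

```python
import numpy as np

def top_k_matches(query_emb, gallery_embs, k=3):
    """Rank gallery images by cosine similarity to a query embedding.

    Assumes all embeddings come from the same CLIP-style encoder; after
    L2-normalization, a dot product equals cosine similarity.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity per gallery item
    order = np.argsort(-sims)[:k]    # highest similarity first
    return order, sims[order]

# Toy gallery of made-up embeddings; the query is a near-duplicate of item 42.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 4))
query = gallery[42] + 0.01 * rng.normal(size=4)
idx, scores = top_k_matches(query, gallery, k=3)
```

The same pattern scales to millions of images by swapping the brute-force dot product for an approximate nearest-neighbor index.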
Satellite and drone imagery
Geospatial computer vision is a distinct discipline. The images are enormous (10,000 × 10,000 pixels and larger), often multispectral or multimodal (RGB plus NIR bands, sometimes SAR), and the labels are almost always sparse. Our geospatial pipeline starts with tiling and overlap management, handles projection and georeferencing, applies atmospheric corrections where needed, and trains detection and segmentation models that respect the scale.
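The tiling-and-overlap step can be sketched as a window generator over the scene; the tile and overlap sizes below are illustrative defaults, not production settings:

```python
def tile_grid(width, height, tile=1024, overlap=128):
    """Yield (x, y, w, h) tile windows covering a large scene.

    Overlap keeps objects that straddle a tile boundary fully visible in
    at least one tile; edge tiles are clamped to the image extent.
    """
    stride = tile - overlap
    xs = range(0, max(width - overlap, 1), stride)
    ys = range(0, max(height - overlap, 1), stride)
    for y in ys:
        for x in xs:
            w = min(tile, width - x)
            h = min(tile, height - y)
            yield (x, y, w, h)

tiles = list(tile_grid(10000, 10000))  # 12 x 12 = 144 overlapping tiles
```

Detections from overlapping tiles are then mapped back to scene coordinates and deduplicated (e.g., with NMS) before counting.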
For object counting tasks — storage tanks, vehicles, structures, aircraft on a tarmac — we combine density-map estimation with detection-based counting and reconcile the two. For change detection, we use bi-temporal and multi-temporal architectures (BIT, ChangeFormer) that compare aligned tiles over time.
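One way to reconcile the two counting estimates is to compare the density-map integral against the detection count and flag tiles where they disagree; the tolerance and averaging rule below are illustrative, not a fixed methodology:

```python
import numpy as np

def reconcile_counts(density_map, detections, rel_tol=0.15):
    """Cross-check a density-map count against a detection-based count.

    density_map: per-pixel object density; its integral is the count.
    detections: list of (box, score) pairs from a detector.
    Returns the averaged count plus a flag when the two estimators
    disagree by more than rel_tol -- flagged tiles go to human review.
    """
    dm_count = float(np.sum(density_map))
    det_count = float(len(detections))
    denom = max(dm_count, det_count, 1.0)
    disagree = abs(dm_count - det_count) / denom > rel_tol
    return (dm_count + det_count) / 2.0, disagree

# Density map integrates to 50; the detector found 48 objects.
density = np.full((10, 10), 0.5)
detections = [((0, 0, 1, 1), 0.9)] * 48
count, flagged = reconcile_counts(density, detections)
```

Dense scenes (parking lots, tank farms) tend to favor the density estimate; sparse scenes favor detection, which is why keeping both and flagging disagreement is useful.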
OCR for federal forms
Federal forms are the densest, most consistent OCR target in the world. SF-50, SF-86, SF-1449, DD-214, W-2, 1040, VA claims forms, DHS applications — every one has a known layout and a known field schema. Generic OCR wastes that information. We combine layout-aware models (LayoutLMv3, Donut) with form-specific field extractors and validation rules. Handwritten entries, stamps, redactions, and varying scan quality get handled explicitly rather than left to chance.
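Field-level validation against a known schema can be as simple as per-field rules plus semantic checks. A sketch, where the field names and patterns are hypothetical illustrations, not an actual form specification:

```python
import re
from datetime import datetime

# Illustrative rules for a known form schema; these field names and
# patterns are hypothetical, not a real DD-214 field list.
FIELD_RULES = {
    "ssn":             re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "zip_code":        re.compile(r"^\d{5}(-\d{4})?$"),
    "separation_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

def validate_fields(extracted):
    """Return field names whose OCR output violates its rule.

    A failed rule doesn't mean the OCR is wrong -- it means the value
    can't be accepted automatically and should be routed to review.
    """
    failures = []
    for name, pattern in FIELD_RULES.items():
        value = extracted.get(name, "")
        if not pattern.match(value):
            failures.append(name)
        elif name == "separation_date":
            try:  # pattern match is necessary but not sufficient
                datetime.strptime(value, "%Y-%m-%d")
            except ValueError:
                failures.append(name)  # e.g. Feb 30 passes the regex
    return failures
```

Cross-field rules (a separation date after an entry date, totals that sum) layer on the same way.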
For legacy records scanning — NARA digitization efforts, VA claims backlogs, FBI file conversion — we build end-to-end pipelines: deskew, denoise, binarize, segment, OCR, extract, validate, and route for human review when confidence drops below threshold.
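The route-for-review step at the end of that pipeline can be sketched as a confidence gate over ordered stages; the stage names and threshold value are placeholders for illustration:

```python
def route_page(page, stages, review_threshold=0.85):
    """Run a page through ordered pipeline stages; if any stage's
    confidence drops below threshold, stop and route to human review.

    stages: list of (name, fn) where fn(page) -> (page, confidence).
    Returns the page, a routing decision, and the stage that tripped it.
    """
    for name, fn in stages:
        page, conf = fn(page)
        if conf < review_threshold:
            return page, "human_review", name
    return page, "auto_accept", None

# Toy pipeline: deskew and extraction are confident, OCR is not.
stages = [
    ("deskew",  lambda p: (p, 0.99)),
    ("ocr",     lambda p: (p, 0.80)),
    ("extract", lambda p: (p, 0.95)),
]
result, route, tripped_stage = route_page("scan-001", stages)
```

Recording which stage tripped the gate matters operationally: a backlog of "ocr" failures suggests a scan-quality problem, not an extraction bug.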
Medical imaging
VA, DHA, and NIH medical imaging work requires a different posture. Models that inform clinical decisions fall under FDA's Software as a Medical Device (SaMD) framework. Even when a deliverable is explicitly decision-support rather than a diagnostic device, we design with the SaMD trajectory in mind: reproducible training, documented validation cohorts, demographic performance reporting, and traceable model versions.
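Demographic performance reporting starts with per-group metrics. A minimal sketch for sensitivity, assuming binary labels and whatever grouping key the validation cohort documents (age band, sex, site, scanner):

```python
from collections import defaultdict

def per_group_sensitivity(records):
    """Report sensitivity (true-positive rate) per demographic group.

    records: iterable of (group, y_true, y_pred) with binary labels.
    Groups with no positive cases are omitted rather than reported
    as zero, since their sensitivity is undefined.
    """
    tp = defaultdict(int)
    pos = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            pos[group] += 1
            if y_pred == 1:
                tp[group] += 1
    return {g: tp[g] / pos[g] for g in pos}

# Toy cohort: group A's positives are all caught, group B's are not.
records = [("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("B", 1, 1), ("B", 1, 0)]
report = per_group_sensitivity(records)
```

The same loop extends to specificity, PPV, and calibration per group, which is the shape of evidence an SaMD submission expects.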
Technical stack: nnU-Net for segmentation, MedSAM for interactive prompting, TotalSegmentator for whole-body anatomical labeling, RadImageNet or foundation models (RAD-DINO, BiomedCLIP) for pretrained backbones. For pathology WSI, we handle slide-level tiling, stain normalization (Macenko, Vahadane), and multiple-instance learning (CLAM, TransMIL) for whole-slide classification.
Small-data strategy
Federal CV problems rarely come with a million labels; a few hundred to ten thousand is typical. Modern techniques make that workable:
- Foundation model transfer — freeze a DINOv2, SAM2, or EVA-02 backbone and train a small head.
- Synthetic data — diffusion-generated augmentation, 3D-rendered scenarios for edge cases, domain randomization for sim-to-real.
- Active learning — label the examples the model is least sure about, not random samples.
- Self-supervised pretraining — masked image modeling on unlabeled in-domain imagery before supervised fine-tuning.
- Semi-supervised — FixMatch, pseudo-labeling, and consistency training when unlabeled data is abundant.
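The active-learning selection rule above (label what the model is least sure about) can be sketched with predictive entropy as the uncertainty score; margin and least-max-probability sampling are common alternatives:

```python
import numpy as np

def least_confident(probs, budget):
    """Pick the `budget` unlabeled samples the model is least sure about.

    probs: (n_samples, n_classes) softmax outputs on the unlabeled pool.
    Higher predictive entropy = more uncertain = labeled first.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:budget]  # most uncertain first

# Three unlabeled samples: confident, near-uniform, mildly uncertain.
pool = np.array([
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.70, 0.20, 0.10],
])
picks = least_confident(pool, budget=2)
```

In practice this loop runs in rounds: train, score the pool, send the top of the batch to annotators, retrain.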
Deployment and performance
A model on a laptop is not a system. We ship CV systems with:
- TensorRT or ONNX Runtime export
- INT8 quantization where accuracy allows
- batching for throughput
- GPU inference on AWS GovCloud G5/G6 instances or on-prem A100/H100 clusters
- edge deployment on NVIDIA Jetson for disconnected environments
- continuous drift monitoring against labeled holdout sets
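For drift monitoring, one common statistic is the population stability index (PSI) between the score distribution on a reference holdout and live scores; the decile binning and the 0.2 alert threshold below are conventional rules of thumb, not a standard:

```python
import numpy as np

def population_stability_index(reference, live, bins=10):
    """PSI between a reference score distribution and live scores.

    Bin edges come from quantiles of the reference (holdout) scores;
    out-of-range live scores are clamped into the edge bins. A common
    rule of thumb treats PSI > 0.2 as significant drift.
    """
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    idx_ref = np.clip(np.searchsorted(edges, reference, side="right") - 1, 0, bins - 1)
    idx_live = np.clip(np.searchsorted(edges, live, side="right") - 1, 0, bins - 1)
    ref_frac = np.clip(np.bincount(idx_ref, minlength=bins) / len(reference), 1e-6, None)
    live_frac = np.clip(np.bincount(idx_live, minlength=bins) / len(live), 1e-6, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

# Synthetic scores: a stable stream and one whose mean has drifted.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 20000)
stable = rng.normal(0.0, 1.0, 20000)
shifted = rng.normal(1.0, 1.0, 20000)
```

The same check runs on input statistics (brightness, resolution, band histograms) as well as model scores, which catches sensor and pipeline changes before accuracy visibly degrades.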
Federal agencies and programs we target
- DoD — overhead imagery, autonomous platforms, sensor fusion, maintenance inspection
- DHS — border imagery, document authentication, cargo scanning
- VA and HHS — medical imaging, claims forms, clinical documentation
- NASA — earth observation, planetary science imagery, spacecraft inspection
- USDA — crop monitoring, disease detection, land use
- FBI and law enforcement — evidence analysis, document forensics, face and object search