Forecasting for federal operations.

Demand forecasting, predictive maintenance, readiness prediction, and budget planning. Foundation models, probabilistic forecasts, honest accuracy reporting. Built to improve decisions, not impress a demo audience.


What we build

Time series is the unglamorous heart of federal operations. Fuel demand at a base, medication dispensing at a VA hospital, claims inflow at SSA, flight hours on an airframe, patient volume at a DHA clinic, equipment failure on a naval platform, benefit applications at HUD, traffic on a DHS checkpoint, and thousands of other series govern staffing, inventory, contracting, and budgeting decisions. A 5% improvement in forecast accuracy on a major federal series can free hundreds of millions in working capital and readiness.

Prophet / LSTM: classical + deep learning
Real-time: streaming forecast
AI RMF: forecast governance

[Figure: time series delivery pipeline. Stages: Discover (scope + data), Build (prototype + eval), Operate. Other labels: ATO artifacts, GovCloud / IL5, monitor.]

We build production forecasting systems across the stack: data ingestion, feature engineering, model training, hierarchical reconciliation, probabilistic outputs, backtesting, monitoring, and decision-layer integration. Every model is benchmarked against naive baselines. Every forecast comes with calibrated uncertainty. Every deployment is instrumented for drift.

Foundation models

Chronos-Bolt, TimesFM-2.0, Moirai-MoE, Lag-Llama for zero-shot and fine-tuned forecasting.

Deep learning

N-BEATS, N-HiTS, Temporal Fusion Transformer (TFT), PatchTST, iTransformer, DLinear, DeepAR.

Classical methods

ETS, ARIMA, SARIMAX, Theta, Prophet, TBATS for series where they still win.

Gradient boosting

LightGBM, XGBoost, CatBoost with engineered lag and calendar features.

Hierarchical reconciliation

MinT (trace minimization), bottom-up, and top-down via hts, scikit-hts, Nixtla HierarchicalForecast.

Probabilistic forecasting

Conformal prediction, quantile regression, DeepAR, GluonTS, NeuralForecast.

Survival & RUL models

Cox PH, DeepSurv, Weibull AFT for predictive maintenance.

Intervention and causal analysis

Synthetic control, difference-in-differences, CausalImpact for policy effect estimation.

FEDERAL FORECASTING USE CASE COVERAGE

Logistics and supply demand: 88%
Budget and expenditure forecasting: 82%
Workforce and capacity planning: 78%
Infrastructure load prediction: 85%
Environmental and climate data: 72%

Demand forecasting

Federal demand forecasting is a hierarchical, intermittent, multi-horizon problem with strong calendar effects. Fuel, parts, food, medication, benefits, services. The data comes in messy: mid-series distribution shifts from policy changes, fiscal-year-end spikes, transfers between facilities that look like demand changes but are not, and long periods of zero demand on specialty items. Our demand forecasting pipelines start with data quality work (outlier detection, transfer-adjustment, holiday alignment) before any modeling, because a clean series forecast with a simple model beats a noisy series forecast with a fancy model nearly every time.
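As one illustration of the outlier-detection step, here is a minimal sketch of a Hampel-style filter: a rolling median with a median-absolute-deviation (MAD) scale. The function name, window size, and threshold are assumptions chosen for the example, not the settings of any particular pipeline. Flagged points are candidates for review (a transfer, a data-entry error, a real spike), not automatic deletions.

```python
from statistics import median

def hampel_outliers(series, half_window=3, n_sigmas=3.0):
    """Return indices whose value sits far from the local median.

    half_window and n_sigmas are illustrative defaults, not tuned values.
    """
    flagged = []
    for i, value in enumerate(series):
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = series[lo:hi]
        center = median(window)
        mad = median(abs(x - center) for x in window)
        # 1.4826 rescales the MAD to a standard deviation under normality.
        scale = 1.4826 * mad
        if scale == 0:
            # Flat or mostly-zero window: no usable scale, so do not flag.
            continue
        if abs(value - center) > n_sigmas * scale:
            flagged.append(i)
    return flagged

# Hypothetical weekly issue quantities with one suspicious spike.
weekly_issues = [10, 11, 10, 12, 100, 11, 10, 12, 11]
print(hampel_outliers(weekly_issues))
```

The zero-scale guard matters for intermittent items, where most windows are all zeros and a MAD-based rule has nothing to work with.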

For high-volume smooth series, foundation models (Chronos-Bolt, TimesFM-2.0) give strong zero-shot performance and a solid baseline to beat. For series with strong seasonality and external drivers, TFT and SARIMAX with exogenous regressors are the workhorses. For intermittent demand, Croston, SBA, and TSB methods, combined with negative binomial or Tweedie regression, give calibrated forecasts that raw deep learning models struggle to match. For items with zero or near-zero volume, we move to hierarchical pooling and cross-series learning rather than per-item models.
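To make the intermittent-demand methods concrete, the sketch below implements Croston's method and the Syntetos-Boylan approximation (SBA) in plain Python. Croston smooths the nonzero demand sizes and the intervals between them separately and forecasts their ratio. SBA multiplies that ratio by (1 - alpha/2) to reduce Croston's upward bias. The smoothing constant and the initialization (first observed size and interval) are assumptions made for the example. A production system would more likely use a maintained library implementation.

```python
def croston(demand, alpha=0.1, sba=False):
    """Per-period demand rate forecast for an intermittent series.

    alpha and the initialization are illustrative choices.
    """
    size = None      # smoothed nonzero demand size
    interval = None  # smoothed number of periods between demands
    periods_since = 1
    for d in demand:
        if d > 0:
            if size is None:
                # Initialize from the first observed demand.
                size, interval = float(d), float(periods_since)
            else:
                # Update only in periods where demand occurs.
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None:
        return 0.0  # no demand ever observed
    rate = size / interval
    return rate * (1 - alpha / 2) if sba else rate

# Hypothetical specialty item: 4 units every third period.
history = [0, 0, 4, 0, 0, 4]
print(croston(history, alpha=0.5))
print(croston(history, alpha=0.5, sba=True))
```

Both variants return a flat per-period rate, not a distribution. Calibrated quantiles still need a count model such as the negative binomial or Tweedie regression mentioned above.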

Predictive maintenance

Predictive maintenance is not forecasting. It is survival analysis plus anomaly detection plus root-cause inference. For federal platforms (aircraft, ships, ground vehicles, medical equipment, IT infrastructure), the goal is to estimate remaining useful life from sensor streams, maintenance history, usage profile, and environmental conditions, then decide whether to fly, repair, replace, or inspect.

Our predictive maintenance stack: (1) Cox proportional hazards and DeepSurv for time-to-failure modeling with covariates; (2) LSTM and transformer-based RUL regression for dense sensor streams; (3) anomaly detection (isolation forest, LSTM autoencoder, variational methods) on vibration, temperature, current, and acoustic signals; (4) maintenance-record text extraction to align corrective actions with sensor trajectories; (5) physics-informed models where first-principles equations (thermodynamics, material fatigue) are available. We deliver decision thresholds calibrated to operational cost: the cost of a missed failure vs the cost of premature replacement is agency-specific and we tune to that curve explicitly.
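The cost-calibrated threshold idea can be sketched as follows: given predicted failure probabilities on a validation set and the observed outcomes, pick the action threshold that minimizes total cost under agency-supplied costs. The function name, the cost figures, and the data are hypothetical. Real costs and a much larger validation set would come from the program office.

```python
def best_threshold(probs, failed, cost_miss, cost_early):
    """Return (threshold, total_cost) minimizing cost on validation data.

    We act (repair or replace) when the predicted probability >= threshold.
    cost_miss:  cost of not acting on a unit that then fails
    cost_early: cost of acting on a unit that would not have failed
    """
    # Candidate thresholds: every observed score, plus "never act".
    candidates = sorted(set(probs)) + [float("inf")]
    best = None
    for t in candidates:
        cost = 0.0
        for p, y in zip(probs, failed):
            act = p >= t
            if y and not act:
                cost += cost_miss
            elif act and not y:
                cost += cost_early
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

# Hypothetical validation scores and outcomes.
probs = [0.1, 0.4, 0.6, 0.9]
failed = [False, True, False, True]
print(best_threshold(probs, failed, cost_miss=10, cost_early=1))
print(best_threshold(probs, failed, cost_miss=1, cost_early=10))
```

The same model yields different operating points under the two cost settings, which is why the threshold is tuned per agency and not fixed at 0.5.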

Readiness and capacity prediction

Readiness prediction crosses forecasting and classification. Will this aircraft be mission-capable next month? Will this base have enough medical staff for expected demand? Will the IT fleet sustain operations through a surge? We build agent-based and statistical models that combine equipment forecasts, personnel availability, and demand projections to estimate readiness windows with uncertainty. For DoD specifically, mission-capable rate forecasting drives sustainment contracts worth billions, and the current forecasting methods across the services leave a lot on the table.

Probabilistic forecasting, honestly

A point forecast is a lie with convenient framing. The real answer to "how much fuel will we need next month" is a distribution, and the cost of being wrong high is not symmetric with the cost of being wrong low. Every forecast we deliver includes quantile outputs (typically 5%, 10%, 25%, 50%, 75%, 90%, 95%), calibrated to historical holdout performance.
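One way to see why the quantiles matter: when a unit of shortfall costs c_under and a unit of excess costs c_over, the quantity that minimizes expected cost is the forecast quantile at level c_under / (c_under + c_over), the classic newsvendor critical ratio. The sketch below picks the nearest delivered quantile. The cost figures and the forecast values are hypothetical.

```python
def target_quantile(cost_under, cost_over):
    """Critical ratio: the quantile level that minimizes expected cost."""
    return cost_under / (cost_under + cost_over)

def pick_quantity(quantile_forecast, cost_under, cost_over):
    """Return (level, value) of the delivered quantile nearest the target."""
    tau = target_quantile(cost_under, cost_over)
    level = min(quantile_forecast, key=lambda q: abs(q - tau))
    return level, quantile_forecast[level]

# Hypothetical monthly fuel forecast, in thousands of gallons.
fuel_forecast = {
    0.05: 410, 0.10: 430, 0.25: 465, 0.50: 500,
    0.75: 540, 0.90: 585, 0.95: 615,
}

# Running short is assumed to be 9x as costly as holding excess.
print(pick_quantity(fuel_forecast, cost_under=9, cost_over=1))
# Symmetric costs fall back to the median.
print(pick_quantity(fuel_forecast, cost_under=1, cost_over=1))
```

Snapping to the nearest delivered level is a simplification. Interpolating between adjacent quantiles is a reasonable refinement when the target falls between them.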

Calibration is a first-class metric. A 90% prediction interval that only covers 70% of actuals is not a 90% interval, it is a lie. We use conformal prediction (Vovk, Gammerman, and Shafer) for distribution-free coverage guarantees, combined with quantile regression for deep models. Backtesting is walk-forward, not random, and we report both accuracy (pinball loss, CRPS, MASE) and calibration (coverage, reliability diagrams). Agency reviewers get the data to see whether uncertainty claims are trustworthy.
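A minimal sketch of split conformal prediction and an empirical coverage check, in plain Python. It takes absolute residuals on a held-out calibration set, uses the ceil((n + 1)(1 - alpha))-th smallest residual as the interval half-width, and then measures how often intervals contain the actuals. Function names and data are hypothetical. Note that the formal coverage guarantee assumes exchangeable data, which time series generally violate, so the empirical coverage check on walk-forward holdouts is what carries the weight in practice.

```python
import math

def conformal_halfwidth(cal_actuals, cal_preds, alpha=0.1):
    """Half-width of a (1 - alpha) split-conformal interval."""
    scores = sorted(abs(y - f) for y, f in zip(cal_actuals, cal_preds))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to support this coverage level.
        return math.inf
    return scores[k - 1]

def coverage(actuals, lowers, uppers):
    """Fraction of actuals that fall inside their intervals."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)

# Hypothetical calibration set: point forecasts of 100 with mixed errors.
cal_preds = [100] * 10
cal_actuals = [101, 98, 103, 96, 105, 94, 107, 92, 109, 90]
width = conformal_halfwidth(cal_actuals, cal_preds, alpha=0.2)

# Apply the half-width to new point forecasts and check coverage.
new_preds = [100, 110, 120]
new_actuals = [104, 125, 118]
lowers = [f - width for f in new_preds]
uppers = [f + width for f in new_preds]
print(width, coverage(new_actuals, lowers, uppers))
```

A reliability diagram repeats the coverage calculation at several nominal levels and plots nominal against observed.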

Hierarchical reconciliation

Federal forecasting is almost always hierarchical. Fuel forecast at the SKU-base-region-service level. Benefits forecast at the program-state-region level. Medication forecast at the NDC-pharmacy-facility-VISN level. A forecast that sums 1,000 item forecasts rarely matches the independently made aggregate forecast, and both cannot be right. Reconciliation methods (MinT, OLS, bottom-up, top-down, middle-out) make all levels coherent while preserving the information in each level. We use Nixtla's HierarchicalForecast and hts for reconciliation and report whether reconciliation improved accuracy at each level, because it sometimes hurts lower levels to help upper levels.
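For intuition, here is a sketch of reconciliation on the simplest possible hierarchy: one total and n leaves. Bottom-up discards the independent total forecast. OLS reconciliation projects all base forecasts onto the set of coherent forecasts, which for this two-level case has a closed form: each leaf absorbs an equal share, gap / (n + 1), of the gap between the total forecast and the sum of the leaf forecasts. The function names and numbers are hypothetical. Deeper hierarchies and MinT need the general matrix form that the libraries above provide.

```python
def bottom_up(leaf_forecasts):
    """Coherent forecasts that keep the leaves and ignore the total."""
    return sum(leaf_forecasts), list(leaf_forecasts)

def ols_two_level(total_forecast, leaf_forecasts):
    """OLS reconciliation for a hierarchy with one total and n leaves.

    Minimizes the squared distance between base and reconciled forecasts
    subject to total == sum(leaves).
    """
    n = len(leaf_forecasts)
    gap = total_forecast - sum(leaf_forecasts)
    adjust = gap / (n + 1)
    leaves = [f + adjust for f in leaf_forecasts]
    return sum(leaves), leaves

# Hypothetical base forecasts: region total vs. three bases.
total_hat = 100.0
leaves_hat = [30.0, 40.0, 20.0]  # sums to 90, so the levels disagree by 10

print(bottom_up(leaves_hat))
print(ols_two_level(total_hat, leaves_hat))
```

The OLS result moves every level a little. Whether that helps or hurts each level is an empirical question answered on holdout data, which is why accuracy is reported per level after reconciliation.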

Foundation models for time series

2024 was the year time-series foundation models became real. Chronos (Amazon, T5-based), TimesFM (Google), Moirai (Salesforce), Lag-Llama, and Tiny Time Mixers (IBM) ship pretrained on millions of series and produce zero-shot forecasts that beat classical baselines on many datasets. For federal use, the value is speed: an agency with thousands of new series (a new program, a new facility, a new platform) can get reasonable forecasts on day one without training per-series models. We fine-tune foundation models on agency history when data permits and use them as strong zero-shot baselines otherwise.

That said, foundation models are not magic. They struggle with heavily intermittent data, sharp regime changes, and series with strong exogenous drivers. We always benchmark against classical and deep learning alternatives and choose based on measured performance, not hype.

Backtesting and evaluation

The single biggest source of federal forecasting failures is bad evaluation. Random cross-validation on time series leaks future information into the training data. Single-point backtests hide variance. Aggregate MAPE hides segment-level disasters. We use walk-forward backtesting with expanding or rolling windows and report MASE (scale-free, and measured relative to a naive forecast), pinball loss for quantile accuracy, CRPS for full distributional accuracy, and per-segment breakdowns (by item class, by region, by demand volume tier) so agencies see where the model works and where it does not. Hidden failure modes in federal forecasting almost always live in specific segments (rare items, new regions, holiday weeks) that aggregate metrics hide.
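A minimal sketch of expanding-window walk-forward backtesting scored with MASE, in plain Python. At each cutoff the forecaster sees only data before the cutoff, and the error is scaled by the in-sample error of a naive forecast on that same training window. The two forecasters and the series are toy examples chosen so the scores are easy to check by hand. A real backtest would also break scores out by segment.

```python
def naive_mae(train, season=1):
    """In-sample mean absolute error of the (seasonal) naive forecast."""
    diffs = [abs(train[t] - train[t - season]) for t in range(season, len(train))]
    return sum(diffs) / len(diffs)

def mase(train, actuals, forecasts, season=1):
    """Mean absolute scaled error; below 1 beats naive on average."""
    scale = naive_mae(train, season)
    if scale == 0:
        raise ValueError("naive in-sample error is zero; MASE is undefined")
    mae = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)
    return mae / scale

def walk_forward(series, forecaster, min_train, horizon=1, season=1):
    """Expanding-window backtest: one MASE score per cutoff."""
    scores = []
    for cutoff in range(min_train, len(series) - horizon + 1):
        train = series[:cutoff]                    # strictly past data
        actuals = series[cutoff:cutoff + horizon]  # held-out future
        forecasts = forecaster(train, horizon)
        scores.append(mase(train, actuals, forecasts, season))
    return scores

def naive(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon

def drift(train, horizon):
    """Extend the straight line from the first to the last observation."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(horizon)]

series = [10, 12, 14, 16, 18, 20]  # toy linear trend
print(walk_forward(series, naive, min_train=3))
print(walk_forward(series, drift, min_train=3))
```

Reporting the list of per-cutoff scores, and not only their mean, is what exposes the variance that a single-point backtest hides.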

Federal agencies and programs

DoD — fuel, parts, spares, munitions demand; aircraft and ship RUL; mission-capable rate forecasting
VA — pharmacy demand, appointment volume, claims inflow, facility capacity
SSA and HHS — benefits applications, payment volume, program utilization
DHS — border crossing volume, checkpoint throughput, FEMA supply needs
USDA — crop yield, food program demand, market price indicators
GSA and Treasury — facility energy use, revenue forecasting, procurement demand

Turn history into better decisions.

Production forecasting for federal agencies. Ready to deliver.


UEI Y2JVCZXT9HP5 | CAGE 1AYQ0 | NAICS 541512 | SAM.GOV ACTIVE
