What we build
Time series is the unglamorous heart of federal operations. Fuel demand at a base, medication dispensing at a VA hospital, claims inflow at SSA, flight hours on an airframe, patient volume at a DHA clinic, equipment failures on a naval platform, benefit applications at HUD, traffic at a DHS checkpoint, and thousands of other series govern staffing, inventory, contracting, and budgeting decisions. A 5% improvement in forecast accuracy on a major federal series can free hundreds of millions of dollars in working capital and improve readiness.
We build production forecasting systems end-to-end: data ingestion, feature engineering, model training, hierarchical reconciliation, probabilistic outputs, backtesting, monitoring, and decision-layer integration. Every model is benchmarked against naive baselines. Every forecast comes with calibrated uncertainty. Every deployment is instrumented for drift.
- Foundation models — Chronos-Bolt, TimesFM-2.0, Moirai-MoE, Lag-Llama for zero-shot and fine-tuned forecasting.
- Deep learning — N-BEATS, N-HiTS, Temporal Fusion Transformer (TFT), PatchTST, iTransformer, DLinear, DeepAR.
- Classical methods — ETS, ARIMA, SARIMAX, Theta, Prophet, TBATS for series where they still win.
- Gradient boosting — LightGBM, XGBoost, CatBoost with engineered lag and calendar features.
- Hierarchical reconciliation — MinT (minimum trace), OLS, bottom-up, top-down, middle-out via hts, scikit-hts, Nixtla HierarchicalForecast.
- Probabilistic forecasting — conformal prediction, quantile regression, DeepAR, GluonTS, NeuralForecast.
- Survival & RUL models — Cox PH, DeepSurv, Weibull AFT for predictive maintenance.
- Intervention and causal analysis — synthetic control, difference-in-differences, CausalImpact for policy effect estimation.
Demand forecasting
Federal demand forecasting is a hierarchical, intermittent, multi-horizon problem with strong calendar effects. Fuel, parts, food, medication, benefits, services. The data comes in messy: mid-series distribution shifts from policy changes, fiscal-year-end spikes, transfers between facilities that look like demand changes but are not, and long periods of zero demand on specialty items. Our demand forecasting pipelines start with data quality work (outlier detection, transfer-adjustment, holiday alignment) before any modeling, because a clean series forecast with a simple model beats a noisy series forecast with a fancy model nearly every time.
For high-volume smooth series, foundation models (Chronos-Bolt, TimesFM-2.0) give strong zero-shot performance and a solid baseline to beat. For series with strong seasonality and external drivers, TFT and SARIMAX with exogenous regressors are the workhorses. For intermittent demand, Croston, SBA, and TSB methods, combined with negative binomial or Tweedie regression, give calibrated forecasts that raw deep learning models struggle to match. For items with zero or near-zero volume, we move to hierarchical pooling and cross-series learning rather than per-item models.
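To make the intermittent-demand case concrete, here is a minimal sketch of Croston's method and its SBA (Syntetos-Boylan Approximation) variant in plain Python. The function name `croston`, its parameters, and the initialization rule (seed both smoothers from the first nonzero demand) are illustrative assumptions, not a description of any production pipeline; TSB and the count-regression models are not shown.

```python
def croston(demand, alpha=0.1, variant="croston"):
    """One-step-ahead demand-rate forecast for an intermittent series.

    demand  : sequence of non-negative demand values, one per period
    alpha   : smoothing parameter shared by size and interval (an assumption;
              some implementations use separate parameters)
    variant : "croston" (classic) or "sba" (bias-corrected)
    Returns None when the series has no nonzero demand at all.
    """
    if variant not in ("croston", "sba"):
        raise ValueError("variant must be 'croston' or 'sba'")

    size = None        # smoothed size of nonzero demands
    interval = None    # smoothed number of periods between nonzero demands
    periods_since = 1  # periods elapsed since the last nonzero demand

    for y in demand:
        if y > 0:
            if size is None:
                # Seed both smoothers from the first observed demand.
                size, interval = float(y), float(periods_since)
            else:
                size += alpha * (y - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1

    if size is None:
        return None

    rate = size / interval
    if variant == "sba":
        # SBA multiplies by (1 - alpha / 2) to reduce Croston's upward bias.
        rate *= 1 - alpha / 2
    return rate


# Hypothetical specialty item: demand of 4 units every third period.
history = [0, 0, 4, 0, 0, 4]
print(croston(history), croston(history, variant="sba"))
```

The result is a per-period demand rate rather than a prediction of when the next order arrives, which is why these methods are usually paired with a count distribution when quantiles are needed.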
Predictive maintenance
Predictive maintenance is not forecasting. It is survival analysis plus anomaly detection plus root-cause inference. For federal platforms (aircraft, ships, ground vehicles, medical equipment, IT infrastructure), the goal is to estimate remaining useful life from sensor streams, maintenance history, usage profile, and environmental conditions, then decide whether to fly, repair, replace, or inspect.
Our predictive maintenance stack: (1) Cox proportional hazards and DeepSurv for time-to-failure modeling with covariates; (2) LSTM and transformer-based RUL regression for dense sensor streams; (3) anomaly detection (isolation forest, LSTM autoencoder, variational methods) on vibration, temperature, current, and acoustic signals; (4) maintenance-record text extraction to align corrective actions with sensor trajectories; (5) physics-informed models where first-principles equations (thermodynamics, material fatigue) are available. We deliver decision thresholds calibrated to operational cost: the cost of a missed failure vs the cost of premature replacement is agency-specific and we tune to that curve explicitly.
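As an illustration of tuning a decision threshold to operational cost, the sketch below assumes a model that outputs a failure probability per asset and a simple two-cost setup: a missed failure costs `cost_missed_failure`, an unnecessary intervention costs `cost_premature_action`, and correct decisions cost nothing. All names and numbers are hypothetical, and real cost structures are usually richer than this.

```python
def breakeven_threshold(cost_missed_failure, cost_premature_action):
    """Probability above which acting has lower expected cost than waiting.

    Acting costs (1 - p) * cost_premature_action in expectation; waiting costs
    p * cost_missed_failure. Setting the two equal gives the threshold.
    """
    return cost_premature_action / (cost_premature_action + cost_missed_failure)


def total_cost(probs, failed, threshold, cost_missed_failure, cost_premature_action):
    """Realized cost on labeled holdout data for a given threshold."""
    cost = 0.0
    for p, did_fail in zip(probs, failed):
        act = p >= threshold
        if act and not did_fail:
            cost += cost_premature_action   # intervened on a healthy asset
        elif not act and did_fail:
            cost += cost_missed_failure     # failure was not prevented
    return cost


def best_empirical_threshold(probs, failed, cost_missed_failure, cost_premature_action):
    """Sweep candidate thresholds and keep the cheapest one on the holdout."""
    candidates = sorted(set(probs)) + [1.01]  # 1.01 means "never act"
    return min(
        candidates,
        key=lambda t: total_cost(probs, failed, t, cost_missed_failure, cost_premature_action),
    )


# Hypothetical holdout: predicted failure probabilities and observed outcomes.
probs = [0.05, 0.2, 0.6, 0.9]
failed = [False, False, True, True]
print(breakeven_threshold(9.0, 1.0))
print(best_empirical_threshold(probs, failed, 9.0, 1.0))
```

The analytic break-even is only valid when the predicted probabilities are well calibrated; the empirical sweep is a check on that assumption, and a large gap between the two is itself a signal worth investigating.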
Readiness and capacity prediction
Readiness prediction crosses forecasting and classification. Will this aircraft be mission-capable next month? Will this base have enough medical staff for expected demand? Will the IT fleet sustain operations through a surge? We build agent-based and statistical models that combine equipment forecasts, personnel availability, and demand projections to estimate readiness windows with uncertainty. For DoD specifically, mission-capable rate forecasting drives sustainment contracts worth billions, and the current forecasting methods across the services leave a lot on the table.
Probabilistic forecasting, honestly
A point forecast is a lie with convenient framing. The real answer to "how much fuel will we need next month" is a distribution, and the cost of being wrong high is not symmetric with the cost of being wrong low. Every forecast we deliver includes quantile outputs (typically 5%, 10%, 25%, 50%, 75%, 90%, 95%), calibrated to historical holdout performance.
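The asymmetry argument has a standard quantitative form, the newsvendor critical ratio: if each unit of shortfall costs `cost_under` and each unit of excess costs `cost_over`, expected cost is minimized by planning to the quantile `cost_under / (cost_under + cost_over)`. The sketch below applies that to a delivered set of quantiles; the function names, the linear interpolation between quantile levels, and the fuel numbers are illustrative assumptions.

```python
def critical_quantile(cost_under, cost_over):
    """Quantile level that minimizes expected cost under linear, asymmetric costs."""
    return cost_under / (cost_under + cost_over)


def value_at_quantile(quantile_forecast, level):
    """Read a value off a quantile forecast given as {level: value}.

    Interpolates linearly between delivered levels and clamps to the outermost
    quantiles rather than extrapolating into the tails.
    """
    levels = sorted(quantile_forecast)
    if level <= levels[0]:
        return quantile_forecast[levels[0]]
    if level >= levels[-1]:
        return quantile_forecast[levels[-1]]
    for lo, hi in zip(levels, levels[1:]):
        if lo <= level <= hi:
            weight = (level - lo) / (hi - lo)
            return quantile_forecast[lo] + weight * (
                quantile_forecast[hi] - quantile_forecast[lo]
            )


# Hypothetical monthly fuel forecast (thousand gallons) at five quantile levels.
fuel = {0.05: 80.0, 0.25: 92.0, 0.50: 100.0, 0.75: 110.0, 0.95: 130.0}

# Running short is assumed three times as costly as holding excess.
level = critical_quantile(cost_under=3.0, cost_over=1.0)
print(level, value_at_quantile(fuel, level))
```

With symmetric costs the same calculation returns the median, which is the only case where planning to a single central forecast is cost-optimal.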
Calibration is a first-class metric. A 90% prediction interval that covers only 70% of actuals is not a 90% interval; it is a lie. We use conformal prediction (Vovk, Gammerman, and Shafer) for distribution-free coverage guarantees, combined with quantile regression for deep models. Those guarantees assume exchangeable data, which time series satisfy only approximately, so we verify coverage empirically on time-ordered holdouts. Backtesting is walk-forward, not random, and we report both accuracy (pinball loss, CRPS, MASE) and calibration (coverage, reliability diagrams). Agency reviewers get the data they need to judge whether uncertainty claims are trustworthy.
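A minimal sketch of split conformal prediction with an empirical coverage check follows, using NumPy. It assumes a point forecaster already exists and that a time-ordered calibration window of its past forecasts and actuals is available; the function names are illustrative, and the symmetric absolute-residual score is only one of several possible choices.

```python
import math

import numpy as np


def conformal_halfwidth(cal_actuals, cal_preds, alpha=0.1):
    """Half-width q such that [pred - q, pred + q] targets 1 - alpha coverage.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute calibration
    residual, the usual finite-sample correction in split conformal prediction.
    """
    scores = np.sort(
        np.abs(np.asarray(cal_actuals, float) - np.asarray(cal_preds, float))
    )
    n = len(scores)
    # Small tolerance guards against floating-point noise pushing ceil up by one.
    rank = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return float(scores[rank - 1])


def empirical_coverage(actuals, lower, upper):
    """Fraction of actuals that fall inside their prediction intervals."""
    actuals = np.asarray(actuals, float)
    return float(np.mean((actuals >= lower) & (actuals <= upper)))


# Hypothetical calibration window: 19 past forecasts with residuals 1..19.
cal_preds = np.zeros(19)
cal_actuals = np.arange(1, 20, dtype=float)
q = conformal_halfwidth(cal_actuals, cal_preds, alpha=0.1)

# Apply the half-width to new forecasts and check coverage on later actuals.
test_preds = np.zeros(3)
test_actuals = np.array([0.0, 5.0, 20.0])
print(q, empirical_coverage(test_actuals, test_preds - q, test_preds + q))
```

Reporting the coverage number next to the nominal level, per segment and per horizon, is what turns the interval from a claim into something a reviewer can audit.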
Hierarchical reconciliation
Federal forecasting is almost always hierarchical. Fuel is forecast at the SKU-base-region-service level, benefits at the program-state-region level, and medication at the NDC-pharmacy-facility-VISN level. The sum of 1,000 item forecasts rarely matches the independently made aggregate forecast, and both cannot be right. Reconciliation methods (MinT, OLS, bottom-up, top-down, middle-out) make all levels coherent while preserving the information in each level. We use Nixtla's HierarchicalForecast and hts for reconciliation and report whether reconciliation improved accuracy at each level, because it sometimes hurts lower levels to help upper levels.
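The mechanics of reconciliation can be sketched on a toy two-level hierarchy with NumPy. The example below shows bottom-up and OLS reconciliation only; OLS is the special case of MinT with an identity error covariance, and full MinT additionally requires an estimate of the base-forecast error covariance, which is not shown. The hierarchy, the numbers, and the function names are hypothetical.

```python
import numpy as np

# Summing matrix S maps bottom-level series to every level of the hierarchy.
# Rows: [total, base_A, base_B]; columns: [base_A, base_B].
S = np.array([
    [1.0, 1.0],   # total = base_A + base_B
    [1.0, 0.0],   # base_A
    [0.0, 1.0],   # base_B
])


def bottom_up(S, bottom_forecasts):
    """Coherent forecasts built by summing the bottom-level forecasts."""
    return S @ np.asarray(bottom_forecasts, float)


def ols_reconcile(S, base_forecasts):
    """Project incoherent base forecasts onto the coherent subspace.

    Solves the least-squares problem min_b ||y_hat - S b||^2 and returns S b.
    """
    y_hat = np.asarray(base_forecasts, float)
    b = np.linalg.solve(S.T @ S, S.T @ y_hat)
    return S @ b


# Independently produced forecasts: the total (100) disagrees with 40 + 50.
y_hat = np.array([100.0, 40.0, 50.0])

print(bottom_up(S, y_hat[1:]))   # ignores the total forecast entirely
print(ols_reconcile(S, y_hat))   # spreads the 10-unit gap across all levels
```

Bottom-up discards the information in the aggregate forecast, while OLS adjusts every level a little; comparing both against held-out actuals at each level is how to tell whether reconciliation helped or hurt.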
Foundation models for time series
2024 was the year time-series foundation models became real. Chronos (Amazon, T5-based), TimesFM (Google), Moirai (Salesforce), Lag-Llama, and Tiny Time Mixers (IBM) ship pretrained on millions of series and produce zero-shot forecasts that beat classical baselines on many datasets. For federal use, the value is speed: an agency with thousands of new series (a new program, a new facility, a new platform) can get reasonable forecasts on day one without training per-series models. We fine-tune foundation models on agency history when data permits and use them as strong zero-shot baselines otherwise.
That said, foundation models are not magic. They struggle with heavily intermittent data, sharp regime changes, and series with strong exogenous drivers. We always benchmark against classical and deep learning alternatives and choose based on measured performance, not hype.
Backtesting and evaluation
The single biggest source of federal forecasting failures is bad evaluation. Random cross-validation on time series leaks future information into the training set. Single-point backtests hide variance. Aggregate MAPE hides segment-level disasters. We use walk-forward backtesting with expanding or rolling windows and report MASE (which is scale-free and measures error relative to a naive forecast), pinball loss for quantile accuracy, CRPS for full distributional accuracy, and per-segment breakdowns (by item class, by region, by demand volume tier) so agencies see where the model works and where it does not. Hidden failure modes in federal forecasting almost always live in specific segments (rare items, new regions, holiday weeks) that aggregate metrics hide.
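A minimal sketch of walk-forward backtesting with an expanding window and MASE, using NumPy, is shown below. The function names and the `forecast_fn(train, horizon)` interface are assumptions for illustration; any model can be plugged in as long as it sees only data before the cutoff.

```python
import numpy as np


def seasonal_naive(history, horizon, season=1):
    """Repeat the last observed season; season=1 is the plain naive forecast."""
    last = np.asarray(history, float)[-season:]
    return np.array([last[h % season] for h in range(horizon)])


def mase(actuals, preds, history, season=1):
    """Mean absolute error scaled by the in-sample seasonal-naive error."""
    history = np.asarray(history, float)
    scale = np.mean(np.abs(history[season:] - history[:-season]))
    if scale == 0:
        raise ValueError("training window is constant; MASE is undefined")
    errors = np.abs(np.asarray(actuals, float) - np.asarray(preds, float))
    return float(np.mean(errors) / scale)


def walk_forward(series, forecast_fn, horizon, min_train, step=1, season=1):
    """Expanding-window backtest: one MASE score per forecast origin."""
    series = np.asarray(series, float)
    scores = []
    for cutoff in range(min_train, len(series) - horizon + 1, step):
        train = series[:cutoff]                   # only data before the cutoff
        test = series[cutoff:cutoff + horizon]    # the next `horizon` actuals
        preds = forecast_fn(train, horizon)
        scores.append(mase(test, preds, train, season))
    return scores


# Hypothetical trending series, scored with the naive forecast as the model.
series = np.arange(20, dtype=float)
scores = walk_forward(series, lambda tr, h: seasonal_naive(tr, h), horizon=2, min_train=10)
print(len(scores), np.mean(scores), np.std(scores))
```

Keeping the per-origin scores instead of a single average exposes the variance that a single-point backtest hides, and the same loop can be run per segment to produce the breakdowns described above.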
Federal agencies and programs
- DoD — fuel, parts, spares, munitions demand; aircraft and ship RUL; mission-capable rate forecasting
- VA — pharmacy demand, appointment volume, claims inflow, facility capacity
- SSA and HHS — benefits applications, payment volume, program utilization
- DHS — border crossing volume, checkpoint throughput, FEMA supply needs
- USDA — crop yield, food program demand, market price indicators
- GSA and Treasury — facility energy use, revenue forecasting, procurement demand