What we build
- Data lakehouses — Apache Iceberg, Delta Lake, or Hudi on S3 (GovCloud) / ADLS Gen2, with Spark or Trino compute. ACID transactions, time travel, and schema evolution.
- ELT pipelines — Airflow, Dagster, or Prefect orchestration; dbt for SQL-based transformation with lineage and testing.
- Real-time streaming — Kafka / MSK, Kinesis, Azure Event Hubs; Flink or Spark Structured Streaming for processing.
- Data governance — column-level lineage, PII/PHI classification, access control via Apache Ranger, Unity Catalog, or equivalents, and audit logging.
- Analytics layers — warehouse modeling (Kimball, Data Vault), semantic layers, BI integration (Power BI, Tableau, Superset).
- Feature engineering — feature pipelines for ML at federal scale, because a model is only as good as its features.
Past performance
Enterprise Data Platform
Built a data ingestion and analytics platform that processes millions of health records, with real-time dashboards, automated reporting, and a HIPAA-compliant architecture. Full past performance →
Stack
- Storage: S3 (GovCloud), ADLS Gen2, Iceberg, Delta Lake, Hudi, Parquet, ORC.
- Compute: Spark, Trino, DuckDB, Flink, Ray.
- Warehouse: Postgres, Snowflake Government, Redshift, Synapse.
- Orchestration: Airflow, Dagster, Prefect.
- Transformation: dbt, SQLMesh.
- Streaming: Kafka, Kinesis, Event Hubs, Pulsar, NATS.
- Quality & lineage: Great Expectations, Soda, OpenLineage, DataHub.