Federal graph analytics, entities to networks.

Neo4j, Amazon Neptune, entity resolution, link analysis, and graph neural networks for intelligence, law enforcement, benefits integrity, and mission-critical network analytics.

Overview — why federal is a graph problem

Most federal data is, at its core, a graph. Citizens connect to programs. Programs connect to funds. Funds flow to contractors. Contractors employ people. People travel between addresses. Vehicles register to addresses. Addresses sit in jurisdictions. Jurisdictions receive grants. The moment an analyst asks a question that traverses more than two of these relationships (for example, "which contractors working on a DoD program have overlapping ownership with contractors excluded from HHS procurement?"), a join-heavy relational database starts buckling under its own query plan. A graph database can answer in milliseconds what an analyst would otherwise wait minutes for, if the relational query finishes at all.
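The contractor question above is a short multi-hop traversal. A minimal Python sketch of it follows, assuming a toy in-memory edge list; every name and data value here is hypothetical, and a production system would more likely express the same pattern as a Cypher or Gremlin query.

```python
from collections import defaultdict

# Hypothetical toy data: (owner, contractor) ownership edges.
OWNS = [
    ("owner_a", "acme"),
    ("owner_a", "bolt"),
    ("owner_b", "cobalt"),
    ("owner_c", "bolt"),
    ("owner_c", "delta"),
]
DOD_PROGRAM_CONTRACTORS = {"acme", "cobalt", "delta"}  # active on the DoD program
HHS_EXCLUDED = {"bolt"}                                # excluded from HHS procurement


def contractors_sharing_owner_with_excluded(owns, active, excluded):
    """Return {active contractor: shared owners} via contractor -> owner -> contractor."""
    owners_of = defaultdict(set)
    for owner, contractor in owns:
        owners_of[contractor].add(owner)

    # Hop 1: collect every owner of an excluded contractor.
    excluded_owners = set()
    for contractor in excluded:
        excluded_owners |= owners_of[contractor]

    # Hop 2: keep active contractors that share at least one of those owners.
    hits = {}
    for contractor in active:
        shared = owners_of[contractor] & excluded_owners
        if shared:
            hits[contractor] = shared
    return hits
```

With the toy data, `acme` and `delta` are returned because each shares an owner with the excluded contractor `bolt`, while `cobalt` is not.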

Precision Delivery Federal LLC builds federal graph systems end to end: ingestion, entity resolution, storage, query, ML, and analyst-facing UI. We are a SAM.gov-registered small business (UEI Y2JVCZXT9HP5, CAGE 1AYQ0, NAICS 541512). Our graph engineering ranges from analyst investigation tools to graph ML on billions of edges.

Our technical stack

Layer | Primary | Alternates | When we use it
Graph DB (property) | Neo4j 5.x Enterprise | Amazon Neptune, TigerGraph, Memgraph | Default for interactive analyst workloads.
Graph DB (RDF) | Amazon Neptune (RDF) | Stardog, GraphDB | When SPARQL / OWL reasoning is in scope.
Distributed graph | JanusGraph on Cassandra | GraphFrames on Spark | Multi-billion-edge analytic scale.
Entity resolution | Zingg + Splink | Senzing, custom PyTorch | Probabilistic matching with graph-based collective resolution.
Graph ML | PyTorch Geometric | DGL, GraphSAGE, Node2Vec | GNNs for classification, link prediction, embeddings.
Query | Cypher | Gremlin, SPARQL, GQL | Cypher default. Gremlin for Neptune property workloads.
Streaming ingest | Kafka + Debezium CDC | Kinesis, Flink | CDC from systems of record.
Visualization | Neo4j Bloom, Cytoscape.js | Linkurious, D3 force-directed | Analyst investigation UIs.
Graph data science | Neo4j GDS library | NetworkX, igraph, cuGraph | PageRank, community, centrality, path algorithms.
APIs | GraphQL + gRPC | REST (read-through), Cypher HTTP | Downstream application integration.

Federal use cases

  • FBI investigative link analysis: entity, relationship, and timeline graphs across case files, electronic records, and public-records augmentation.
  • DHS illicit-network discovery: trafficking, smuggling, and fraud-network surfacing from administrative records.
  • HHS benefits integrity: Medicaid / Medicare fraud detection through provider-beneficiary-claim graphs.
  • SAMHSA program linkage: cross-state substance use services network analysis, drawing on the SAMHSA production ML patterns we have already shipped.
  • Treasury / FinCEN financial-network analysis: suspicious activity reporting, beneficial ownership traversal.
  • DoD supply-chain graph: tier-N supplier visibility, single-source-of-failure discovery.
  • VA provider-veteran-claim graph: community-care network analysis and fraud surfacing.
  • USDA farm-program graph: payment overlap and related-party detection across programs.
  • NIH grants graph: researcher-institution-project networks for program evaluation.
  • GSA procurement graph: vendor, past-performance, award-protest, and exclusion networks.

Reference architectures

1. Entity resolution + graph warehouse in GovCloud

Source systems stream CDC events via Debezium into Kafka (MSK in GovCloud). A stream-processing job resolves entities with Splink (probabilistic matching) and applies deterministic overrides. Resolved entities land in an Iceberg-backed graph staging layer; a nightly load builds the canonical Neo4j graph. Analyst queries hit a Neo4j Enterprise cluster; ML queries hit an export in Neptune Analytics. Audit logs capture every query and every entity-resolution decision for reviewability.
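One common way to turn pairwise match scores into resolved entities is to cluster accepted pairs with union-find and let deterministic overrides force or block specific links. The sketch below assumes that approach; the function name, the 0.9 threshold, and the override parameters are hypothetical and are not Splink's API.

```python
def resolve_entities(record_ids, scored_pairs, threshold=0.9,
                     force_match=(), force_non_match=()):
    """Cluster records into entities from pairwise match probabilities.

    scored_pairs: iterable of (id_a, id_b, probability).
    force_match / force_non_match: deterministic override pairs.
    Note: a blocked pair is only blocked as a direct link; two records can
    still end up together through a chain of other accepted pairs.
    """
    parent = {r: r for r in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    blocked = {frozenset(pair) for pair in force_non_match}
    for a, b in force_match:
        union(a, b)
    for a, b, probability in scored_pairs:
        if probability >= threshold and frozenset((a, b)) not in blocked:
            union(a, b)

    clusters = {}
    for r in record_ids:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(members) for members in clusters.values())
```

Keeping the overrides as explicit inputs means each merge can be traced to either a score above the threshold or a named rule, which supports the reviewability requirement described above.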

2. Intelligence investigation portal in Azure Government IL5

A custom investigation UI fronts a Neo4j cluster on AKS IL5. Analysts see a Cytoscape-based graph canvas with lasso selection, ego-network expansion, and timeline reconstruction. Every graph mutation is logged for attribution. Sensitive attributes are masked unless the analyst's clearance and need-to-know permit access. Graphs export to i2 Analyst's Notebook for downstream briefing products.
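The masking rule can be sketched as a small policy check applied to node properties before they reach the canvas. This is an illustrative sketch only: the clearance ordering, the policy shape, and the `mask_node` helper are all assumptions, not the portal's actual implementation.

```python
# Hypothetical ordering for the sketch; real marking schemes are richer.
CLEARANCE_ORDER = ["UNCLASSIFIED", "CUI", "SECRET"]
MASK = "***"


def mask_node(properties, policy, clearance, need_to_know):
    """Return a copy of `properties` with restricted values replaced by MASK.

    policy: {attribute: (minimum clearance, required compartment or None)}.
    need_to_know: set of compartments the analyst is read into.
    Attributes with no policy entry are treated as non-sensitive here;
    a production system might prefer default-deny.
    """
    rank = CLEARANCE_ORDER.index(clearance)
    visible = {}
    for key, value in properties.items():
        rule = policy.get(key)
        if rule is None:
            visible[key] = value
            continue
        minimum_level, compartment = rule
        cleared = rank >= CLEARANCE_ORDER.index(minimum_level)
        has_need = compartment is None or compartment in need_to_know
        visible[key] = value if (cleared and has_need) else MASK
    return visible
```

Both conditions must hold before a value is shown: sufficient clearance alone does not unmask an attribute tied to a compartment the analyst is not read into.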

3. Graph ML pipeline for fraud scoring

A heterogeneous R-GCN runs over a healthcare claims graph and is trained in PyTorch Geometric on a SageMaker GPU instance. The model retrains nightly on the last 18 months of claims. Inference produces a fraud-risk embedding for every provider; downstream rules convert embeddings to review priorities. MLflow captures every training run, every data slice, and every evaluation metric.

Delivery methodology

  1. Discovery — graph hypothesis: what question, what entities, what edges, what sources. Agree on success metrics (queries/sec, entity-resolution F1, analyst time-to-insight).
  2. Design — schema, identifier strategy, ingest plan, security zones, query patterns, UI workflows.
  3. Build — increments: raw ingest → ER → canonical graph → first analyst workflow → ML overlays.
  4. Validate — ER quality review with subject matter experts; false-positive audit; pen test; ATO support.
  5. Operate — observability dashboards, backup/restore drills, graph rebalance runbooks, ongoing ER tuning.
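Entity-resolution F1, one of the success metrics named in the Discovery step, is often computed over record pairs. A minimal sketch, assuming predicted and true matches are supplied as unordered pairs (the `pairwise_f1` helper is hypothetical):

```python
def pairwise_f1(predicted_pairs, true_pairs):
    """Return (precision, recall, f1) over unordered record pairs."""
    predicted = {frozenset(pair) for pair in predicted_pairs}
    truth = {frozenset(pair) for pair in true_pairs}
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Using `frozenset` makes `("a", "b")` and `("b", "a")` count as the same pair, so the metric does not depend on the order in which a matcher emits records.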

Engagement models

  • SBIR Phase I / II fixed-price — ER prototypes, graph ML pilots.
  • Fixed-price pilot — scoped single-mission investigation portal.
  • T&M modernization — ongoing graph platform work.
  • OTA — DIU, NSIN, NavalX, AFWERX, Tradewinds.
  • Sub to prime — graph specialist inside a larger investigation or benefits integrity program.

Maturity model

  • Level 1 — Directory graph: single-source node-edge extraction.
  • Level 2 — Integrated graph: multiple sources unified with rule-based matching.
  • Level 3 — Resolved graph: probabilistic ER with confidence and provenance on every identity.
  • Level 4 — Analytical graph: ML overlays (embeddings, anomaly scores) consumed by analysts and downstream systems.
  • Level 5 — Operational graph: closed-loop with systems of record; graph insights drive real casework and measurable outcomes.

Deliverables catalog

  • Entity and relationship schema (Arrows.app diagram + PlantUML export).
  • Entity resolution pipeline (Zingg/Splink job + override rules).
  • Graph load jobs (Cypher/openCypher LOAD or bulk importer).
  • Graph database deployment (Helm charts, Terraform modules).
  • Analyst UI (React + Cytoscape.js).
  • GraphQL + gRPC APIs with OpenAPI / proto docs.
  • Graph ML training and inference code (PyTorch Geometric).
  • Observability dashboards.
  • SSP appendix + control inheritance.
  • ER quality audit workbook.

Technology comparison

Platform | Strengths | Weaknesses | Federal fit
Neo4j Enterprise | Cypher, GDS library, Bloom, bolt protocol. | Licensing cost at scale, clustering operational overhead. | High: analyst-heavy workloads.
Amazon Neptune | GovCloud-native, managed, IAM-integrated, Gremlin + SPARQL. | Less analyst ecosystem, limited GDS-equivalent. | High: AWS-native programs.
TigerGraph | Distributed, fast multi-hop, GSQL. | Smaller federal footprint, GSQL learning curve. | Medium: large-scale analytics.
JanusGraph | Open source, distributed. | Operational complexity, slower innovation. | Medium.
Memgraph | In-memory, Cypher-compatible, streaming-first. | Smaller ecosystem. | Medium: streaming-heavy workloads.
Senzing | Turnkey entity resolution, strong accuracy out of the box. | Proprietary, limited tuning for novel domains. | Medium: quick ER wins.

Federal compliance mapping

  • AC-3, AC-6, AC-16 — attribute-based access control enforced at the graph API layer; sensitive edges filtered based on clearance / need-to-know.
  • AU-2, AU-3, AU-12 — every query and every graph mutation logged with analyst identity, timestamp, and result size.
  • SI-4 — graph-anomaly detection doubles as user-behavior-analytics for insider threat.
  • MP-4 — entity-resolution decisions carry full provenance so analyst products can be audited and reviewed.
  • PT-4, PT-5 — privacy impact assessment considerations documented for any graph touching PII.
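The audit fields listed for AU-2, AU-3, and AU-12 (analyst identity, timestamp, result size) can be captured by wrapping the query call. The sketch below assumes a hypothetical `audited` helper, a caller-supplied `run_query` callable, and a plain list as the log sink; a real deployment would write to a protected audit store.

```python
import json
from datetime import datetime, timezone


def audited(run_query, analyst_id, query_text, sink):
    """Run a query and append one JSON audit record to `sink`, even on failure."""
    record = {
        "analyst": analyst_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query_text,
        "result_size": None,
        "outcome": "error",
    }
    try:
        rows = list(run_query(query_text))
        record["result_size"] = len(rows)
        record["outcome"] = "ok"
        return rows
    finally:
        # Runs on both success and exception, so failed queries are logged too.
        sink.append(json.dumps(record, sort_keys=True))
```

Writing the record in a `finally` block means a query that raises still leaves an audit entry with `outcome` set to `error`.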

Sample technical approach — benefits integrity graph

An HHS program office wants to surface potential over-billing networks in a claims dataset. Current state: SQL warehouse, 18-month analyst backlog, ad-hoc rule-based flagging.

Discovery: we define the graph — providers, beneficiaries, claims, addresses, phone numbers, bank accounts, NPI numbers. Sources: claims warehouse, NPPES, state licensure feeds, exclusion lists. Success metric: F1 on a held-out audit set and analyst time-to-first-lead.

Design: probabilistic matching on providers (Splink), deterministic matching on NPI. R-GCN for embedding; community detection via Louvain; pattern-based rules (e.g., suspiciously tight provider-address rings) via GDS.
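As one illustration of a pattern-based rule, the sketch below flags provider pairs that share several addresses. It is a plain-Python stand-in rather than Louvain or a GDS procedure, and both the `min_shared` threshold and this reading of a "tight ring" are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def shared_address_pairs(provider_addresses, min_shared=2):
    """Return {(provider_a, provider_b): shared addresses} for tightly linked pairs.

    provider_addresses: iterable of (provider_id, address_id) edges.
    """
    providers_at = defaultdict(set)
    for provider, address in provider_addresses:
        providers_at[address].add(provider)

    shared = defaultdict(set)
    for address, providers in providers_at.items():
        # Every pair of providers at the same address shares that address.
        for a, b in combinations(sorted(providers), 2):
            shared[(a, b)].add(address)

    return {pair: addresses for pair, addresses in shared.items()
            if len(addresses) >= min_shared}
```

Pairs surfaced this way are leads for review, not findings; large legitimate group practices can also share multiple addresses.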

Build: 12 weeks. Ingest → ER → graph → investigation UI → ML overlays.

Validation: a 30-day shadow pilot with 4 auditors, with graph-generated leads measured against their traditional queue.

Federal graph analytics, answered.

Neo4j or Amazon Neptune?

Both. Neo4j for richer Cypher ecosystem and GDS; Neptune for GovCloud-native serverless. We pick per program.

What is entity resolution?

Entity resolution is deciding whether two records refer to the same entity. Approaches include probabilistic matching (Fellegi-Sunter / Splink), deterministic rules, and graph-based collective resolution. Tools: Zingg, Senzing, Splink, custom PyTorch.
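The Fellegi-Sunter idea can be shown in a few lines: each compared field contributes a log-likelihood-ratio weight, and the weights add to the prior log-odds of a match. This is a sketch under simplifying assumptions (binary agree/disagree comparisons, conditionally independent fields); the m and u values and the prior are made up for illustration, and this is not Splink's API.

```python
import math


def match_probability(agreements, m, u, prior):
    """Combine per-field evidence into a match probability.

    agreements: {field: True if the two records agree on that field}.
    m[field]: P(fields agree | records are a true match).
    u[field]: P(fields agree | records are not a match).
    prior: P(match) for a randomly chosen record pair.
    """
    log_odds = math.log2(prior / (1 - prior))
    for field, agrees in agreements.items():
        if agrees:
            log_odds += math.log2(m[field] / u[field])
        else:
            log_odds += math.log2((1 - m[field]) / (1 - u[field]))
    odds = 2 ** log_odds
    return odds / (1 + odds)


# Illustrative parameters only.
M = {"name": 0.9, "dob": 0.95}
U = {"name": 0.01, "dob": 0.05}
both_agree = match_probability({"name": True, "dob": True}, M, U, prior=0.001)
```

Agreement on a field that rarely agrees by chance (low u) adds a large positive weight, which is why a rare surname match counts for more than a shared state.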

Do you support intelligence and law enforcement use cases?

Yes. Link analysis, network-of-interest discovery, timeline reconstruction. Bo Peng is sponsorable for clearance; unclassified CUI work deliverable today.

Can you do graph ML?

Yes. PyTorch Geometric and DGL for GCN, GAT, GraphSAGE, R-GCN. Node2Vec embeddings for classical scoring.

Can the graph live in GovCloud / IL5?

Yes. Neptune natively in GovCloud; Neo4j Enterprise on EKS in GovCloud / Azure Gov. IL5 via Azure Gov IL5.

How do you scale past a billion edges?

Partitioned graph stores, pre-computed aggregations, sampled subgraphs for ML. GraphFrames on Spark or TigerGraph for the largest scales.
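Sampled subgraphs for ML usually mean capping how many neighbors are followed at each hop, so a training batch stays bounded even around high-degree nodes. A minimal sketch of that fanout-limited sampling, assuming an adjacency dict of lists (the `sample_subgraph` helper and its parameters are hypothetical):

```python
import random


def sample_subgraph(adjacency, seeds, fanouts, rng=None):
    """Sample a k-hop neighborhood with at most fanouts[i] neighbors per node at hop i.

    adjacency: {node: list of neighbor nodes}.
    Returns (nodes, edges) where edges are (source, neighbor) tuples.
    """
    rng = rng or random.Random(0)  # fixed seed by default for repeatable sketches
    nodes = set(seeds)
    edges = set()
    frontier = list(seeds)
    for fanout in fanouts:
        next_frontier = []
        for node in frontier:
            neighbors = adjacency.get(node, [])
            if len(neighbors) > fanout:
                picked = rng.sample(neighbors, fanout)
            else:
                picked = neighbors
            for neighbor in picked:
                edges.add((node, neighbor))
                if neighbor not in nodes:
                    nodes.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return nodes, edges
```

With fanouts of `[2, 1]` and a single seed, the sample contains at most five nodes regardless of how many neighbors the seed has.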

What about visualization?

Neo4j Bloom, Cytoscape.js, Linkurious, or D3. 508-compliant data-table alternates always provided.

Do you support streaming graph updates?

Yes. CDC → Kafka → ER service → idempotent upserts. Seconds for high-signal events, minutes for bulk.
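Idempotent upserts matter because CDC streams can redeliver or reorder events. One way to get that property is to key each write on the entity ID and a monotonically increasing version, as in this sketch; the event shape and the `upsert` helper are assumptions, and a dict stands in for the graph store.

```python
def upsert(graph, event):
    """Apply a CDC-style event; replaying or reordering events is harmless.

    event: {"entity_id": ..., "version": int, "properties": {...}}.
    Returns True if the graph changed, False for duplicate or stale events.
    """
    key = event["entity_id"]
    current = graph.get(key)
    if current is not None and current["version"] >= event["version"]:
        return False  # already applied, or superseded by a newer version
    graph[key] = {
        "version": event["version"],
        "properties": dict(event["properties"]),
    }
    return True
```

In a graph database the same effect is typically achieved with a merge-style write guarded by a version comparison, so the consumer can safely retry after a failure.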

Can you integrate with analyst toolchains?

Yes. i2 ANB exports, Palantir interop, ArcGIS geolinks, REST / gRPC downstream.

What is your pricing model?

Fixed-price pilots, T&M modernization, OTA and SBIR where applicable, sub to prime for large programs.

1 business day response

Relationships, finally queryable.

Federal graph analytics, entity resolution, and graph ML — ready to deliver.

[email protected]
UEI Y2JVCZXT9HP5 | CAGE 1AYQ0 | NAICS 541512 | SAM.GOV ACTIVE