Data Source Strategy
Semiconductor Supply Chain Ontology — The Data Landscape
Ross & Perry · Stanford GSB · April 2026
A priority-ranked map of 47 sources to seed the digital twin — what’s free, what’s gated, what to hit first.
47 Total Sources | 28 Free/GSB Access | 15 P1 Must-Haves | 0 P1 Sources Behind Paywall
How to Read This Deck
Access Classification
-
FREE — Publicly accessible. No credential.
-
GSB — Available via Stanford GSB library (Bloomberg, Capital IQ, Panjiva, FactSet, etc.).
-
PAID — Commercial license required. Not GSB-accessible.
-
VARIES — Tiered: some data free, full datasets require membership.
Priority Tiers
-
P1 MUST HAVE — Core to ontology population. Day-1 ingestion targets.
-
P2 HIGH VALUE — Enriches the graph: product-to-facility linkages, ownership trees, trade flows. Phase 2.
-
P3 ENRICHMENT — Dynamic updates, facility verification, event monitoring. After core graph established.
Thesis: The free + GSB tier is sufficient to seed the ontology skeleton. Paid sources buy velocity via pre-resolved relationships; defer them until a design partner is on board or the prototype is complete.
The Matrix: Priority × Access
P1 · MUST HAVE
-
FREE (11 sources): EDGAR, TWSE, KRX, Euronext, BIS, OFAC, UFLPA, USITC, Census, Comtrade, CHIPS.gov
-
GSB (2 sources): Bloomberg SPLC, Capital IQ (Corp Trees)
-
PAID (0 sources): None
Every P1 source is free or GSB-accessible, so P1 carries zero dollar cost.
P2 · HIGH VALUE
-
FREE (13 sources): Company IR, ESG, Form SD, DOD 1260H, ITC 337, METI, EU Dual-Use, Taiwan MOEA, Korea MOTIE, Eurostat, Taiwan Customs, SIA, Patents
-
GSB (6 sources): Panjiva, Refinitiv, FactSet, Gartner, IDC, PatSnap
-
PAID (6 sources): Omdia, Yole, TechInsights, IC Insights, Orbis, ImportGenius
P3 · ENRICHMENT
-
FREE (5 sources): WTO TPR, OSM/Google Maps, EPA, GDELT, SemiAnalysis
-
GSB (1 source): Factiva
-
PAID (2 sources): Planet Labs/Maxar, DigiTimes
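A minimal sketch of the matrix as a lookup table, so claims such as "no P1 source sits behind a paywall" can be checked mechanically. The source labels are copied from the matrix as written; the dict layout and the helper names `sources` and `zero_spend` are illustrative assumptions, not part of the deck. A label such as "OSM/Google Maps" may bundle more than one source, so label counts need not equal the headline totals.

```python
# Labels are copied from the Priority x Access matrix.
# The dict layout and helper names are illustrative assumptions.
MATRIX = {
    ("P1", "FREE"): ["EDGAR", "TWSE", "KRX", "Euronext", "BIS", "OFAC", "UFLPA",
                     "USITC", "Census", "Comtrade", "CHIPS.gov"],
    ("P1", "GSB"): ["Bloomberg SPLC", "Capital IQ (Corp Trees)"],
    ("P1", "PAID"): [],
    ("P2", "FREE"): ["Company IR", "ESG", "Form SD", "DOD 1260H", "ITC 337", "METI",
                     "EU Dual-Use", "Taiwan MOEA", "Korea MOTIE", "Eurostat",
                     "Taiwan Customs", "SIA", "Patents"],
    ("P2", "GSB"): ["Panjiva", "Refinitiv", "FactSet", "Gartner", "IDC", "PatSnap"],
    ("P2", "PAID"): ["Omdia", "Yole", "TechInsights", "IC Insights", "Orbis",
                     "ImportGenius"],
    ("P3", "FREE"): ["WTO TPR", "OSM/Google Maps", "EPA", "GDELT", "SemiAnalysis"],
    ("P3", "GSB"): ["Factiva"],
    ("P3", "PAID"): ["Planet Labs/Maxar", "DigiTimes"],
}


def sources(priority=None, access=None):
    """Return matrix labels matching the given priority and/or access class."""
    return [
        name
        for (p, a), names in MATRIX.items()
        if (priority is None or p == priority) and (access is None or a == access)
        for name in names
    ]


def zero_spend(priority):
    """True when no source at this priority tier requires a commercial license."""
    return not sources(priority=priority, access="PAID")


p1_is_free = zero_spend("P1")
p2_paid = sources(priority="P2", access="PAID")
```

Keeping the matrix in one structure means a reclassified source (for example, a GSB subscription that turns out not to allow bulk export) is a one-line change that every downstream check picks up.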
The 10 Source Categories
-
Corporate Disclosures — Supplier dependency, geographic risk, subsidiary trees. EDGAR is the richest free source. → Company, Facility nodes
-
Regulatory & Gov Lists — Compliance layer. BIS, OFAC, UFLPA — free, machine-readable, updated daily. → SubjectToRegulation edges
-
Trade & Customs — HS 8541/8542/8486 bilateral flows. Comtrade free; Panjiva adds shipper-consignee pairs. → ShipsProduct edges
-
Market Intelligence — Pre-resolved relationships, market share, capacity. Gartner/IDC via GSB; Omdia paid. → Ownership, market share weights
-
Industry Bodies — SEMI fab census, SIA factbook, JEDEC standards. → Product taxonomy
-
Geospatial / Facility — OSM + CHIPS.gov free; Planet Labs for construction tracking (defer). → Facility verification
-
Patents & Technology — Assignee mapping reveals R&D footprint and IP dependency (ARM, RISC-V). → TechnologyDependsOn edges
-
Financial Infra — Bloomberg SPLC, Capital IQ — pre-resolved corporate hierarchies. → Entity resolution backbone
-
Academic Research — RAND, CSIS, CNAS on geopolitical risk; NBER for empirical supply chain models. → Risk overlays, validation
-
News & Events — GDELT structured events; DigiTimes for Taiwan signal; SemiAnalysis for deep technical. → Dynamic disruption signals
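The categories above each feed a node or edge type. A minimal sketch of how those types might be represented follows. `Company`, `Facility`, `SubjectToRegulation`, `ShipsProduct`, and `TechnologyDependsOn` come from the list; the `Regulation` and `Product` node types, the `provenance` field, the `Graph` class, and all identifiers in the usage lines are assumptions added so the edges have something to point at.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class NodeType(Enum):
    COMPANY = "Company"
    FACILITY = "Facility"
    REGULATION = "Regulation"  # assumption: target of SubjectToRegulation
    PRODUCT = "Product"        # assumption: anchors the product taxonomy


class EdgeType(Enum):
    SUBJECT_TO_REGULATION = "SubjectToRegulation"
    SHIPS_PRODUCT = "ShipsProduct"
    TECHNOLOGY_DEPENDS_ON = "TechnologyDependsOn"


@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: NodeType
    name: str


@dataclass(frozen=True)
class Edge:
    edge_type: EdgeType
    source_id: str
    target_id: str
    provenance: str  # which data source asserted this edge


@dataclass
class Graph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Reject edges whose endpoints have not been loaded yet.
        for endpoint in (edge.source_id, edge.target_id):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.edges.append(edge)

    def neighbors(self, node_id: str, edge_type: EdgeType) -> List[str]:
        """Targets reachable from node_id over edges of the given type."""
        return [e.target_id for e in self.edges
                if e.source_id == node_id and e.edge_type == edge_type]


# Hypothetical identifiers, for illustration only.
graph = Graph()
graph.add_node(Node("co:example-fab", NodeType.COMPANY, "Example Fab Co"))
graph.add_node(Node("reg:bis-entity-list", NodeType.REGULATION, "BIS Entity List"))
graph.add_edge(Edge(EdgeType.SUBJECT_TO_REGULATION, "co:example-fab",
                    "reg:bis-entity-list", provenance="BIS"))
```

Carrying `provenance` on every edge is one way to keep track of which source asserted a relationship, which matters once free and GSB sources disagree.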
P1 — Must Have (Week 1 Targets)
Every P1 node is reachable with zero commercial spend. Entity resolution (EDGAR ↔ Bloomberg ↔ Cap IQ) is the bridging skill.
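A minimal sketch of the bridging step, assuming a master alias table keyed by a stable entity identifier. It uses the standard library's `difflib` as a stand-in for heavier tooling; the `AliasTable` class, the suffix list, the 0.85 threshold, and the placeholder key are all assumptions chosen for illustration, and the alias strings are examples rather than an inventory of what each source actually emits.

```python
import re
from difflib import SequenceMatcher

# Assumption: a small set of legal-form tokens to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "company", "ltd", "limited", "plc"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


class AliasTable:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.canonical = {}  # entity key -> canonical name
        self.aliases = {}    # normalized alias -> entity key

    def register(self, key: str, canonical_name: str, aliases=()):
        self.canonical[key] = canonical_name
        for alias in (canonical_name, *aliases):
            self.aliases[normalize(alias)] = key

    def resolve(self, raw_name: str):
        """Return the entity key for raw_name, or None if nothing is close enough."""
        norm = normalize(raw_name)
        if norm in self.aliases:  # exact match after normalization
            return self.aliases[norm]
        best_key, best_score = None, 0.0
        for alias, key in self.aliases.items():  # fuzzy fallback
            score = SequenceMatcher(None, norm, alias).ratio()
            if score > best_score:
                best_key, best_score = key, score
        return best_key if best_score >= self.threshold else None


table = AliasTable()
# "LEI:PLACEHOLDER-TSMC" is a made-up key, not a real LEI.
table.register(
    "LEI:PLACEHOLDER-TSMC",
    "Taiwan Semiconductor Manufacturing Company Limited",
    aliases=("TSMC", "TSM"),
)
```

For example, `table.resolve("Taiwan Semiconductor Manufacturing Co., Ltd.")` matches exactly after normalization, while a misspelled variant falls through to the fuzzy comparison. Returning `None` below the threshold keeps uncertain matches out of the graph until someone reviews them.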
P2 & P3 — Enrichment Layers
P2 · HIGH VALUE
-
Panjiva (GSB) — Bill of lading: shipper/consignee pairs EDGAR doesn’t disclose
-
Gartner + IDC (GSB) — Market share by segment (fabless/foundry/OSAT/IDM)
-
Omdia / Wood Mac (Paid) — Fab-level capacity, capex tracking, node roadmaps
-
Yole Group (Paid) — Advanced packaging supply chains — buy 2-3 targeted reports
-
TechInsights (Paid) — Die-level supplier ID via teardowns — expensive, target key chips
-
USPTO / EPO Patents — IP dependency mapping (ARM licensees, RISC-V adopters)
-
Refinitiv · FactSet (GSB) — Geographic revenue, supply chain module
-
DOD 1260H · ITC 337 — Chinese military company designations; litigation-surfaced relationships
P3 · ENRICHMENT
-
GDELT (Free) — Structured event extraction from global news; bulk access via BigQuery
-
SemiAnalysis (Free) — Deep technical supply chain analysis; subscribe to the free tier
-
DigiTimes ($3K/yr) — Highest signal-to-noise for TSMC/Taiwan supply chain
-
Factiva (GSB) — Historical archive — use for event back-fill
-
OSM + Google Earth — Fab/OSAT location verification — Overpass API for bulk
-
EPA Permits (Free) — U.S. facility confirmation + capacity correlation
-
Planet Labs / Maxar — Satellite construction tracking — defer to post-Series A
-
RAND / CSIS / CNAS — Geopolitical risk overlays — free high-quality reports
Paid-spend posture: Buy 2-3 targeted Yole reports + DigiTimes sub now (~$15K). Defer Omdia, TechInsights, Planet Labs until a design partner is on board or the prototype is complete.
Week 1 Acquisition Sequence
-
Day 1 — Regulatory Lists: Download BIS, OFAC, UFLPA. Merge into unified restricted-party table. Compliance layer live.
-
Day 2 — EDGAR Bulk: Run sec-edgar-downloader on top 20 semi tickers. LLM-extract supplier mentions from 10-K/20-F.
-
Day 3 — Trade Flows: UN Comtrade + USITC DataWeb pulls for HS 8541/8542/8486, 2019-2023. Seed ShipsProduct edges.
-
Day 4 — GSB Stack: Bloomberg SPLC for top 20 tickers. Capital IQ corporate tree exports. Resolve entity aliases.
-
Day 5 — Entity Resolution: Build master alias table. LEI as primary key via OpenCorporates. OpenRefine / recordlinkage for fuzzy matching.
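The Day 1 merge can be sketched as follows. This is a sketch under stated assumptions: the per-list column names in `FIELD_MAP` are hypothetical (the real BIS, OFAC, and UFLPA downloads each use their own schema), the sample rows are invented placeholders rather than real listed parties, and the name matching is exact after whitespace and case normalization only.

```python
# Hypothetical column names; map each list's real schema here once it is known.
FIELD_MAP = {
    "BIS": {"name": "entity_name", "country": "country"},
    "OFAC": {"name": "sdn_name", "country": "program_country"},
    "UFLPA": {"name": "company", "country": "location"},
}


def normalize_name(name: str) -> str:
    """Uppercase and collapse runs of whitespace."""
    return " ".join(name.upper().split())


def merge_lists(raw: dict) -> dict:
    """Fold per-list rows into one table keyed by normalized party name."""
    merged = {}
    for list_name, rows in raw.items():
        fields = FIELD_MAP[list_name]
        for row in rows:
            key = normalize_name(row[fields["name"]])
            record = merged.setdefault(
                key, {"name": key, "countries": set(), "lists": set()}
            )
            record["lists"].add(list_name)
            country = row.get(fields["country"])
            if country:
                record["countries"].add(country)
    return merged


def screen(merged: dict, name: str) -> list:
    """Return the lists a party appears on, or an empty list if none."""
    record = merged.get(normalize_name(name))
    return sorted(record["lists"]) if record else []


# Invented placeholder rows, not real restricted parties.
RAW = {
    "BIS": [{"entity_name": "Example Microelectronics Co", "country": "XX"}],
    "OFAC": [{"sdn_name": "Sample Trading LLC", "program_country": "YY"}],
    "UFLPA": [{"company": "example  microelectronics co", "location": "XX"}],
}
restricted = merge_lists(RAW)
```

Each row of the merged table maps naturally onto one or more `SubjectToRegulation` edges, one per list the party appears on.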
Outcome: ~200 company nodes · 500+ relationship edges · full compliance layer operational · entity resolution backbone in place
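The Day 3 step, turning trade-flow pulls into `ShipsProduct` edges, might look like the sketch below. The HS headings and the 2019-2023 window come from the sequence above; the row field names (`year`, `exporter`, `importer`, `hs_code`, `value_usd`), the edge dict layout, and the sample rows are assumptions, and the values are placeholders rather than real trade data.

```python
from collections import defaultdict

HS_HEADINGS = ("8541", "8542", "8486")  # headings named in the Week 1 sequence
YEARS = range(2019, 2024)               # 2019 through 2023 inclusive


def seed_ships_product_edges(rows):
    """Aggregate bilateral flows by (exporter, importer, HS heading)."""
    totals = defaultdict(float)
    for row in rows:
        heading = str(row["hs_code"])[:4]  # 6-digit subheading -> 4-digit heading
        if heading in HS_HEADINGS and int(row["year"]) in YEARS:
            key = (row["exporter"], row["importer"], heading)
            totals[key] += float(row["value_usd"])
    return [
        {"edge_type": "ShipsProduct", "source": exporter, "target": importer,
         "hs_heading": heading, "value_usd": value}
        for (exporter, importer, heading), value in sorted(totals.items())
    ]


# Placeholder rows with made-up country codes and values.
SAMPLE_ROWS = [
    {"year": 2021, "exporter": "AAA", "importer": "BBB", "hs_code": "854231", "value_usd": 100.0},
    {"year": 2022, "exporter": "AAA", "importer": "BBB", "hs_code": "854239", "value_usd": 50.0},
    {"year": 2020, "exporter": "CCC", "importer": "AAA", "hs_code": "848620", "value_usd": 75.0},
    {"year": 2018, "exporter": "AAA", "importer": "BBB", "hs_code": "854231", "value_usd": 999.0},  # outside window
    {"year": 2021, "exporter": "AAA", "importer": "BBB", "hs_code": "850440", "value_usd": 999.0},  # other heading
]
edges = seed_ships_product_edges(SAMPLE_ROWS)
```

These edges are country-to-country. Company-level shipper and consignee pairs would come from a bill-of-lading source such as Panjiva in Phase 2.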
Recommended Next Steps
-
Audit GSB access now — Email the library: confirm Bloomberg SPLC, Capital IQ bulk export, Panjiva, Refinitiv, FactSet, Gartner/IDC. Bulk-export permissions vary.
-
Run Week 1 sequence — Regulatory lists → EDGAR → Comtrade → GSB stack → entity resolution. Five days to a live ontology skeleton.
-
Entity resolution is the unlock — TSMC appears as 5+ aliases across sources. Stand up LEI-keyed master alias table early.
-
Defer paid spend, buy targeted — Skip Omdia/TechInsights/Planet Labs for the prototype. Pull the trigger on 2-3 Yole reports + a DigiTimes sub when Phase 2 kicks off.
-
Convert table to Cowork tracker — Use the prompt in the source doc to generate a live research tracker spreadsheet.
Source: ProjectTBD_DataSource_Strategy.pptx.pdf — also stored in Obsidian vault at Project-TBD/Research/ProjectTBD_DataSource_Strategy.pdf