Project TBD Q2 Product Roadmap

Context

We’re building a proprietary digital twin of the semiconductor supply chain. Our current focus is prototyping on public data to validate our thesis and create a concrete artifact for design-partnership conversations. We will pursue this effort full-time over the summer after receiving a Botha-Chan grant. Before then, we’re using an independent study with Jonathan Berk and the RDI program as our vehicle for dedicated work. We want your advice on our technical approach and the specific product development phases outlined below.

Goal

Build a working v0 prototype of the digital twin on public data by the end of the quarter: a knowledge graph of semiconductor firms, products, supply relationships, and regulatory status that an LLM can query conversationally. The prototype is our vehicle for research. By the end of summer, we want a demo that can answer a concrete scenario end-to-end (e.g., “Show me TSMC’s tier-1 supplier network and flag any entities on the BIS Entity List”), and we want to use that demo to open design-partnership conversations with real semiconductor and compliance buyers.

What We’re Building

• Digital twin v0: a property graph (knowledge graph) of firms, products, facilities, financials, logistics, and regulatory status, seeded entirely from public data
• Agentic simulation demo: a natural-language interface to explore simulations and stress-test the twin’s assumptions (e.g., “What if TSMC Fab 18 goes offline?”)
• Compliance screening MVP: supplier onboarding checks against entity lists, sanctions, and UFLPA (the wedge that feeds the twin)

Prototype Roadmap

Step 1: Public and Academic Data Integration

• Public Data: Extract supplier relationships from SEC 10-K/10-Q filings (Item 1, Item 1A, and Item 7), then use LLMs to resolve the extracted entity names against government lists such as the BIS Entity List, the OFAC SDN List, and the UFLPA Entity List.
• Legal Entity Identifiers: Query the OpenCorporates API and SEC CIK records for canonical company identifiers, and link them to LEIs (Legal Entity Identifiers) where available.
• Academic Databases: Use S&P Capital IQ, Refinitiv, and Bloomberg to cross-reference public filings and enrich them with ownership trees, M&A history, and credit data.
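One way to narrow the text sent to an LLM in the public-data step is a cheap keyword pre-filter over each filing section. The sketch below is an assumption about how that could look, not a settled design: the cue list, the `CandidateSentence` record, and the `candidate_sentences` helper are hypothetical names, and the sample text is invented.

```python
import re
from dataclasses import dataclass

# Hypothetical cue list (an assumption); tune it against real filings.
SUPPLIER_CUES = (
    "supplier", "sole source", "single source",
    "foundry", "rely on", "relies on", "purchase from",
)


@dataclass(frozen=True)
class CandidateSentence:
    filing_id: str  # placeholder identifier for the source filing
    item: str       # filing section, e.g. "Item 1A"
    text: str       # the sentence itself, kept for citation


def candidate_sentences(filing_id, item, section_text):
    """Return the sentences that contain at least one supplier cue."""
    # Naive split: sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", section_text.strip())
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        if any(cue in lowered for cue in SUPPLIER_CUES):
            hits.append(CandidateSentence(filing_id, item, sentence))
    return hits


# Invented sample text, not taken from any real filing.
sample = (
    "We design our chips in-house. "
    "We rely on a single foundry for wafer fabrication. "
    "Our headquarters are in California."
)
hits = candidate_sentences("example-filing", "Item 1A", sample)
for hit in hits:
    print(hit.item, "|", hit.text)
```

Keeping the filing identifier and section on every candidate means any relationship the LLM later extracts can be cited back to its source sentence.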

Step 2: Assemble Structured Supply Chain Ontology

• Core entity types: Company, Facility, Product, Regulation, Jurisdiction
• Core relationship types: SUPPLIES_TO (Company → Company), MANUFACTURES_AT (Company → Facility), MANUFACTURED_AT (Product → Facility), LOCATED_IN (Company → Jurisdiction), SUBJECT_TO_REGULATION (Jurisdiction → Regulation), SHIPS_PRODUCT (Company → Product).
• Resolve name variants (e.g., “TSMC” and “Taiwan Semiconductor Manufacturing Company”) to a single entity using LEI codes and LLM-assisted fuzzy matching.
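The ontology above can be sketched as a small schema table plus an alias lookup. This is a minimal illustration under stated assumptions: the `SCHEMA` entries mirror the relationship types listed above, while `ALIASES`, the `company:tsmc` identifier, `normalize`, `resolve`, and `validate_edge` are hypothetical names. A real pipeline would key aliases to an LEI or another canonical ID and fall back to fuzzy or LLM matching for unseen names.

```python
import re

# Allowed (source type, target type) for each relationship type listed above.
SCHEMA = {
    "SUPPLIES_TO": ("Company", "Company"),
    "MANUFACTURES_AT": ("Company", "Facility"),
    "MANUFACTURED_AT": ("Product", "Facility"),
    "LOCATED_IN": ("Company", "Jurisdiction"),
    "SUBJECT_TO_REGULATION": ("Jurisdiction", "Regulation"),
    "SHIPS_PRODUCT": ("Company", "Product"),
}

# Hypothetical alias table; the canonical ID format is an assumption.
ALIASES = {
    "tsmc": "company:tsmc",
    "taiwan semiconductor manufacturing company": "company:tsmc",
}

# Common legal suffixes to drop before lookup (an illustrative, partial list).
LEGAL_SUFFIXES = re.compile(r"\b(inc|corp|corporation|co|ltd|limited|llc)\b\.?")


def normalize(name):
    """Lowercase, drop legal suffixes and punctuation, collapse whitespace."""
    name = LEGAL_SUFFIXES.sub(" ", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def resolve(name):
    """Return the canonical ID for a known alias, or None if unknown."""
    return ALIASES.get(normalize(name))


def validate_edge(relationship, source_type, target_type):
    """Check that an edge connects the entity types the schema allows."""
    return SCHEMA.get(relationship) == (source_type, target_type)


print(resolve("TSMC"))
print(resolve("Taiwan Semiconductor Manufacturing Company, Ltd."))
print(validate_edge("SUPPLIES_TO", "Company", "Facility"))
```

Validating edges at load time keeps extraction errors (for example, a facility recorded as a supplier) out of the graph before any query runs against it.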

Step 3: Agentic Query Infrastructure

• Architecture: The user asks a natural-language question → an LLM translates it into a structured query → the query executes against the ontology graph → the LLM formats the result as a narrative answer. The answer cites source data and explains relevant calculations step by step, so the user can follow up on specific assumptions.
• Scenario modeling: For “what if” queries, temporarily remove or degrade a node in the graph (e.g., TSMC Fab 18) and propagate the impact through the remaining supply edges. Return the affected downstream firms and their estimated revenue exposure.
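The scenario-modeling step can be sketched as a breadth-first propagation over supply edges. This is one possible model, not a settled design. It assumes each edge carries a `share`, the fraction of the downstream firm's revenue that depends on that upstream node, and it keeps the largest impaired fraction when several paths reach the same firm. The function name `propagate_outage`, the firm names, the shares, and the revenue figures are all invented for illustration.

```python
from collections import defaultdict, deque


def propagate_outage(edges, removed, revenue):
    """Estimate downstream exposure when `removed` goes offline.

    edges:   iterable of (upstream, downstream, share) tuples
    revenue: dict mapping node name to annual revenue
    Returns {node: (impaired_fraction, revenue_at_risk)} for affected nodes.
    """
    downstream = defaultdict(list)
    for up, down, share in edges:
        downstream[up].append((down, share))

    impaired = {removed: 1.0}  # the removed node is fully offline
    queue = deque([removed])
    while queue:
        node = queue.popleft()
        for nxt, share in downstream[node]:
            fraction = impaired[node] * share
            # Revisit a node only if this path implies a larger impact,
            # which also guarantees termination on cyclic graphs.
            if fraction > impaired.get(nxt, 0.0):
                impaired[nxt] = fraction
                queue.append(nxt)

    return {
        node: (fraction, fraction * revenue.get(node, 0.0))
        for node, fraction in impaired.items()
        if node != removed
    }


# Toy graph: every share and revenue figure here is invented.
edges = [
    ("Fab 18", "Foundry Co", 0.5),
    ("Foundry Co", "Designer Co", 0.5),
    ("Other Fab", "Designer Co", 0.25),
]
revenue = {"Foundry Co": 100.0, "Designer Co": 40.0}

result = propagate_outage(edges, "Fab 18", revenue)
for firm, (fraction, at_risk) in result.items():
    print(firm, fraction, at_risk)
```

Multiplying shares along a path is a deliberate simplification. It ignores inventory buffers, substitution, and lead times, which is the kind of assumption the narrative answer should surface so the user can challenge it.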

Step 4: MVP UI for Compliance Screening

• Input: Company name or shipment manifest. Process: Fuzzy-match against a unified restricted-party database (Entity List + SDN + UFLPA). Output: a risk score, the supplier’s matches against the restricted-party database, and the associated liability.
• Package as a lightweight web app: a search bar for compliance checks, a chat interface for agentic queries and simulation, and a graph visualization for exploring the ontology.
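The screening flow above (input name, fuzzy match, scored output) can be sketched with Python's standard-library `difflib`. This is a minimal illustration, not a production matcher: the `RESTRICTED` entries are placeholders rather than real list entries, the 0.85 threshold is an arbitrary assumption, and `normalize` and `screen` are hypothetical helper names.

```python
from difflib import SequenceMatcher

# Placeholder entries standing in for the unified restricted-party database.
RESTRICTED = [
    {"name": "Example Microelectronics Co", "list": "Entity List"},
    {"name": "Sample Trading LLC", "list": "SDN"},
]


def normalize(name):
    """Lowercase, replace commas and periods with spaces, collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def screen(query, restricted, threshold=0.85):
    """Return restricted-party entries whose similarity meets the threshold."""
    normalized_query = normalize(query)
    matches = []
    for entry in restricted:
        score = SequenceMatcher(
            None, normalized_query, normalize(entry["name"])
        ).ratio()
        if score >= threshold:
            matches.append({**entry, "score": round(score, 3)})
    # Strongest matches first.
    return sorted(matches, key=lambda match: match["score"], reverse=True)


hits = screen("Example Microelectronics Co., Ltd", RESTRICTED)
for hit in hits:
    print(hit["list"], hit["name"], hit["score"])
```

Character-level similarity is only a starting point. Transliterated names, abbreviations, and shipment-manifest fields would likely need token-based matching and address comparison on top of it.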