Work · Mar 2026 – Present · Case study
Data Warehouse
Period: Mar 2026 – Present
Stack: Apache Airflow · AWS Lambda · Python · .NET
A data warehouse for rebar delivery orders (DOs). It pulls delivery documents from steel suppliers and scanned / WhatsApp uploads, uses AI to read each PDF into structured data, deduplicates and reconciles deliveries against purchase orders, and serves everything through a web app — orchestrated with Apache Airflow over AWS Lambda.
Overview
Construction sites receive rebar against delivery orders from many suppliers, almost always as PDFs. This warehouse centralises every DO — from supplier portals, scanned packages and a WhatsApp intake flow — into one queryable source of truth, with AI extraction replacing manual data entry.
Role
Full-stack + data engineering — .NET back end, React front end, and the Airflow pipelines (scanned-DO and per-supplier sync) wired to the AWS Lambda functions.
Highlights
- Per-supplier connectors — each vendor has its own fetch → download → split flow (with EventBridge relays and fan-out PDF downloaders), triggered on a schedule or on demand.
- AI document extraction — Lambdas split package PDFs into per-DO pages and read them into structured rebar data with vision LLMs, using company-aware prompts and automatic DO-type classification.
- Dedup & reconciliation — new deliveries are deduped against the warehouse, then DO line items are matched back to purchase-order items (including vision-based matching) and vendor names normalised.
- Airflow orchestration — TaskFlow DAGs with dynamic fan-out, bounded concurrency to protect database connections, retries and WhatsApp failure alerts.
- Multi-source ingest — one pipeline serves supplier APIs, scanned uploads and a WhatsApp intake flow.
- Web app — React/TypeScript + AG Grid front end on a clean-architecture .NET / PostgreSQL back end for browsing and managing delivery orders.
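The fan-out with bounded concurrency and retries described in the highlights can be illustrated with a minimal sketch. It is not the project's Airflow code: it uses Python's standard-library thread pool as a stand-in so it runs anywhere, and `process_do`, `fan_out`, `with_retries` and the cap of 4 are hypothetical names and values chosen for illustration. In Airflow the same idea would more likely be expressed with dynamic task mapping plus a pool or a per-task concurrency limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

MAX_PARALLEL = 4  # assumed cap, e.g. sized below the database connection limit


def with_retries(fn: Callable, attempts: int = 3) -> Callable:
    """Wrap fn so a failing call is retried, up to `attempts` tries in total."""
    def wrapped(item):
        last_error = None
        for _ in range(attempts):
            try:
                return fn(item)
            except Exception as exc:  # broad catch is for the sketch only
                last_error = exc
        raise last_error
    return wrapped


def process_do(pdf_key: str) -> dict:
    """Hypothetical per-DO step; a real one might invoke an extraction Lambda."""
    return {"pdf_key": pdf_key, "status": "extracted"}


def fan_out(pdf_keys: Iterable[str], worker: Callable = process_do,
            max_parallel: int = MAX_PARALLEL, attempts: int = 3) -> list:
    """Run worker over every key with at most max_parallel calls in flight."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map keeps results in the same order as the input keys
        return list(pool.map(with_retries(worker, attempts), pdf_keys))
```

Capping the worker count is what keeps a large package of split PDFs from opening more simultaneous database connections than the back end can serve; the retry wrapper only covers transient failures and re-raises the last error once the attempts are used up.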
Outcome
Replaced manual rebar-DO entry with an automated, multi-source pipeline — supplier syncs and scanned / WhatsApp uploads land as clean, reconciled records in a single warehouse.
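To make "clean, reconciled records" more concrete, here is a rough sketch of the dedup and reconciliation steps. Everything in it is an assumption for illustration: the field names, the suffix list used to normalise vendor names, the dedup key (normalised vendor plus DO number) and the exact-attribute matching rule. The real pipeline also uses vision-based matching, which this sketch does not attempt.

```python
from dataclasses import dataclass

# Assumed company suffixes to ignore when comparing vendor names
SUFFIXES = {"ltd", "limited", "co", "company", "inc"}


@dataclass(frozen=True)
class LineItem:
    diameter_mm: int   # hypothetical fields for a rebar line item
    length_m: float
    quantity: int


def normalise_vendor(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop company suffixes."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def dedup_key(vendor: str, do_number: str) -> tuple:
    """Assumed identity of a delivery: normalised vendor plus DO number."""
    return (normalise_vendor(vendor), do_number.strip().upper())


def new_deliveries(incoming: list, existing_keys: set) -> list:
    """Keep only deliveries whose key is not in the warehouse or this batch."""
    seen = set(existing_keys)
    fresh = []
    for do in incoming:
        key = dedup_key(do["vendor"], do["do_number"])
        if key not in seen:
            seen.add(key)
            fresh.append(do)
    return fresh


def match_to_po(do_items: list, po_items: list) -> list:
    """Pair each DO item with the first unused PO item of the same
    diameter and length; unmatched DO items are paired with None."""
    unused = list(po_items)
    matches = []
    for item in do_items:
        hit = next((po for po in unused
                    if (po.diameter_mm, po.length_m)
                    == (item.diameter_mm, item.length_m)), None)
        if hit is not None:
            unused.remove(hit)
        matches.append((item, hit))
    return matches
```

Items left paired with `None` are the ones a fuzzier step, such as the vision-based matching mentioned in the highlights, would need to resolve.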
Technologies
- Data & orchestration: Apache Airflow (Python), AWS Lambda, EventBridge, SQS / SNS
- AI: PDF → structured extraction with vision LLMs
- Backend: .NET, EF Core, PostgreSQL
- Frontend: React, TypeScript, AG Grid