
A distributed geospatial data platform built for a Fortune 15 energy company, enabling market researchers to identify renewable natural gas (RNG) station expansion opportunities across 41,000+ ZIP codes in seconds.
During Spring 2025, our CodeLab UC Davis team partnered with a Fortune 15 energy company to build a distributed geospatial data platform from scratch. The client's market research team needed a way to identify strategic locations for expanding its RNG station network, a process that previously required hours of manual research across fragmented tools and costly SaaS subscriptions.
Analyzing 41,000+ ZIP codes across the continental U.S. for business fleet concentrations was prohibitively slow by hand. Existing SaaS tools were expensive and returned incomplete or inaccurate data, leaving the research team with an unreliable, time-consuming workflow.
A three-tier distributed system: a Next.js frontend for intuitive job management, a FastAPI and PostgreSQL backend, and a RabbitMQ-powered fleet of Puppeteer scrapers with anti-bot evasion. Built cloud-ready for Azure Kubernetes Service to scale with demand.
Research turnaround dropped from hours to seconds. Clean, analysis-ready data is delivered directly to Excel on download, with real-time job status monitoring and no ongoing SaaS costs.

The main application interface, featuring fine-grained geographic targeting controls and real-time job status monitoring.

Data preview with Excel-style formatting and customizable export pipeline.

System architecture: distributed scraper fleet coordinated via RabbitMQ job queues.
Designed a three-tier distributed architecture that balances scraping throughput against compatibility with the client's existing Azure infrastructure. Chose RabbitMQ for job queuing so the scraper fleet can scale horizontally on AKS without complex state management, and FastAPI for a lightweight, async-first backend.
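To make the queueing pattern concrete, below is a minimal sketch of the job-publishing side, assuming the amqplib client; the queue name and payload fields are illustrative stand-ins, not the project's actual schema.

```typescript
// Sketch of a backend publisher pushing a scrape job onto RabbitMQ.
// Assumes the amqplib client; "scrape_jobs" and the payload shape are hypothetical.
import amqplib from "amqplib";

interface ScrapeJob {
  jobId: string;
  zipCodes: string[]; // ZIP codes this job should cover
  category?: string;  // optional category-wide query
}

async function publishJob(job: ScrapeJob): Promise<void> {
  const conn = await amqplib.connect("amqp://localhost");
  const channel = await conn.createChannel();

  // Durable queue + persistent messages so queued jobs survive a broker restart.
  await channel.assertQueue("scrape_jobs", { durable: true });
  channel.sendToQueue("scrape_jobs", Buffer.from(JSON.stringify(job)), {
    persistent: true,
  });

  await channel.close();
  await conn.close();
}
```

Each scraper consumes from this queue with per-message acknowledgements, so a job held by a crashed worker is redelivered to another instance, which is what lets the fleet scale horizontally without shared state.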
Built the Next.js frontend with TanStack Table for structured, sortable data display and animated download progress indicators. Implemented the Puppeteer scraper fleet with anti-bot evasion techniques to reliably collect data across diverse web sources, with intelligent retry and error handling baked into the job pipeline.
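The scraping loop itself might follow the shape below; the stealth plugin, target URL, and retry parameters are illustrative stand-ins for the project's actual evasion and error-handling logic.

```typescript
// Illustrative scraper worker. puppeteer-extra's stealth plugin masks common
// headless-browser fingerprints; the URL and backoff values are hypothetical.
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin());

async function scrapeZip(zip: string, maxRetries = 3): Promise<string> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const browser = await puppeteer.launch({ headless: true });
    try {
      const page = await browser.newPage();
      await page.goto(`https://example.com/search?zip=${zip}`, {
        waitUntil: "networkidle2",
        timeout: 30_000,
      });
      return await page.content(); // raw HTML handed to the parsing stage
    } catch (err) {
      if (attempt === maxRetries) throw err;
      // Exponential backoff between attempts reduces bot-like request bursts.
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1_000));
    } finally {
      await browser.close();
    }
  }
  throw new Error("unreachable");
}
```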
Built an automated data validation and cleaning pipeline to ensure output accuracy before export. Containerized all services with Docker and prepared the full deployment configuration for Azure Kubernetes Service, enabling the client to scale the scraper fleet on demand.
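The validation step amounts to schema-checking every scraped row before it can reach an export. A minimal sketch, assuming the zod library, with field names invented for illustration:

```typescript
// Minimal validation/cleaning pass using zod; the row fields are illustrative.
import { z } from "zod";

const RowSchema = z.object({
  companyName: z.string().trim().min(1),
  zip: z.string().regex(/^\d{5}$/, "expected a 5-digit ZIP code"),
  fleetSize: z.coerce.number().int().nonnegative(),
});

type Row = z.infer<typeof RowSchema>;

// Keep only rows that pass validation; log the rest instead of exporting them.
function cleanRows(raw: unknown[]): Row[] {
  const clean: Row[] = [];
  for (const record of raw) {
    const parsed = RowSchema.safeParse(record);
    if (parsed.success) clean.push(parsed.data);
    else console.warn("dropping invalid row:", parsed.error.issues);
  }
  return clean;
}
```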
RabbitMQ-coordinated Puppeteer scrapers with anti-bot evasion, capable of covering 41,000+ ZIP codes across the continental U.S. and scaling horizontally on Azure Kubernetes Service.
Researchers can scope searches from country level down to individual city, with company-specific or category-wide queries — giving precise control over what data is collected and where.
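Concretely, a job request under this model might carry a scope payload like the following; the shape is hypothetical, not the platform's published schema.

```typescript
// Hypothetical job-request shape showing country-to-city scoping and
// company-specific vs. category-wide targeting.
interface JobRequest {
  scope: {
    country: "US";
    state?: string; // e.g. "CA" narrows the search to one state
    city?: string;  // e.g. "Sacramento" narrows it to one city
  };
  target:
    | { kind: "company"; name: string }       // one named company
    | { kind: "category"; category: string }; // or a whole business category
}

const example: JobRequest = {
  scope: { country: "US", state: "CA", city: "Sacramento" },
  target: { kind: "category", category: "logistics fleets" },
};
```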
Live job status updates, animated download progress indicators, and a complete history of past jobs — giving researchers full visibility into every scraping task from submission to export.
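One simple way to surface live status in the frontend is to poll a status endpoint until the job resolves; the endpoint path and status values below are assumptions for illustration.

```typescript
// Illustrative status polling; "/api/jobs/:id" and the status strings are
// hypothetical, not the platform's actual API.
type JobStatus = "queued" | "running" | "done" | "failed";

async function watchJob(
  jobId: string,
  onUpdate: (status: JobStatus) => void,
): Promise<JobStatus> {
  while (true) {
    const res = await fetch(`/api/jobs/${jobId}`);
    const { status } = (await res.json()) as { status: JobStatus };
    onUpdate(status); // drives the live indicator in the UI
    if (status === "done" || status === "failed") return status;
    await new Promise((r) => setTimeout(r, 2_000)); // poll every 2 seconds
  }
}
```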
Validated, clean data delivered as analysis-ready Excel files on download, with customizable filenames and optional email notifications — eliminating manual cleanup entirely.
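The export step itself is straightforward once rows are validated. A sketch assuming the exceljs library, with an invented column layout:

```typescript
// Sketch of the Excel export step, assuming exceljs; columns are illustrative.
import ExcelJS from "exceljs";

interface Row {
  companyName: string;
  zip: string;
  fleetSize: number;
}

async function exportRows(rows: Row[], filename: string): Promise<void> {
  const workbook = new ExcelJS.Workbook();
  const sheet = workbook.addWorksheet("Results");

  sheet.columns = [
    { header: "Company", key: "companyName", width: 30 },
    { header: "ZIP", key: "zip", width: 10 },
    { header: "Fleet Size", key: "fleetSize", width: 12 },
  ];
  rows.forEach((row) => sheet.addRow(row));

  // Bold the header row so the file opens analysis-ready.
  sheet.getRow(1).font = { bold: true };
  await workbook.xlsx.writeFile(filename);
}
```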
We extend our sincere thanks to the Chevron team for their collaboration and support throughout this project.