Chevron Geospatial Scraper

A distributed geospatial data platform built for a Fortune 15 energy company, enabling market researchers to identify RNG station expansion opportunities across 41,000+ ZIP codes in seconds.

Tech Stack

Next.js
TanStack Table
Tailwind CSS
FastAPI
PostgreSQL
RabbitMQ
Puppeteer
Docker
Azure Kubernetes (AKS)
Figma

Project Overview

During Spring 2025, our CodeLab UC Davis team partnered with a Fortune 15 renewable energy company to build a distributed geospatial data platform from scratch. The client's market research team needed a way to identify strategic locations for expanding their renewable natural gas (RNG) station network — a process that previously required hours of manual research across fragmented tools and costly SaaS subscriptions.

The Challenge

Analyzing 41,000+ ZIP codes across the continental U.S. for business fleet concentrations was prohibitively slow by hand. Existing SaaS tools were expensive and returned incomplete or inaccurate data, leaving the research team with an unreliable, time-consuming workflow.

The Solution

A three-tier distributed system: a Next.js frontend for intuitive job management, a FastAPI and PostgreSQL backend, and a RabbitMQ-powered fleet of Puppeteer scrapers with anti-bot evasion. Built cloud-ready for Azure Kubernetes Service to scale with demand.

The Outcome

Research turnaround dropped from hours to seconds. Clean, analysis-ready data is delivered directly to Excel on download, with real-time job status monitoring and no ongoing SaaS costs.

Project Visuals

The main application interface, featuring fine-grained geographic targeting controls and real-time job status monitoring.

Data preview with Excel-style formatting and a customizable export pipeline.

System architecture: the distributed scraper fleet coordinated via RabbitMQ job queues.

Development Process

Architecture Design

Designed a three-tier distributed architecture balancing scraping throughput with the client's existing Azure infrastructure. Chose RabbitMQ for job queuing to allow the scraper fleet to scale horizontally on AKS without introducing complex state management, and FastAPI for a lightweight, async-first backend.
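The queue-centric design can be illustrated with a stateless worker loop. In this sketch Python's stdlib `queue` stands in for RabbitMQ, and the handler and result shape are assumptions; with RabbitMQ this loop would be a pika `basic_consume` callback that acks only after a job succeeds:

```python
import json
import queue

def handle_job(body: bytes) -> dict:
    """Process one scrape job. Workers keep no state between jobs,
    so any number of replicas can consume from the same queue."""
    job = json.loads(body)
    # ... run the scrape for job["zip"] here ...
    return {"zip": job["zip"], "status": "done"}

def worker_loop(q: queue.Queue, results: list) -> None:
    """Drain the queue, one job at a time."""
    while True:
        try:
            body = q.get_nowait()
        except queue.Empty:
            break  # with RabbitMQ the consumer would block instead
        results.append(handle_job(body))
        q.task_done()

q: queue.Queue = queue.Queue()
for z in ["95616", "94103"]:
    q.put(json.dumps({"zip": z}).encode("utf-8"))
results: list = []
worker_loop(q, results)
```

Because all coordination lives in the broker, scaling out on AKS is just running more copies of this loop.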

Frontend & Scraper Development

Built the Next.js frontend with TanStack Table for structured, sortable data display and animated download progress indicators. Implemented the Puppeteer scraper fleet with anti-bot evasion techniques to reliably collect data across diverse web sources, with intelligent retry and error handling baked into the job pipeline.
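The retry behavior baked into the job pipeline might look like the following sketch. The backoff parameters, the broad `except`, and the `flaky` scraper stand-in are all illustrative assumptions (`base_delay` is zero here only so the example runs instantly):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.0):
    """Call fn, retrying on failure with exponential backoff."""
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # in practice, catch scraper-specific errors
            last_exc = exc
            time.sleep(base_delay * (2 ** attempt))
    raise last_exc

calls = {"n": 0}
def flaky():
    """Stand-in for a scrape that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("blocked by anti-bot check")
    return "page data"

result = with_retries(flaky)
```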

Data Validation & Cloud Preparation

Built an automated data validation and cleaning pipeline to ensure output accuracy before export. Containerized all services with Docker and prepared the full deployment configuration for Azure Kubernetes Service, enabling the client to scale the scraper fleet on demand.
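A validation pass of this kind could be sketched as follows. The specific rules (a valid five-digit ZIP and a non-empty name) and the record fields are illustrative assumptions about what the pipeline checks:

```python
import re

ZIP_RE = re.compile(r"^\d{5}$")

def clean_records(rows: list[dict]) -> list[dict]:
    """Normalize scraped rows and drop any that fail basic checks,
    so only analysis-ready records reach the export."""
    cleaned = []
    for row in rows:
        zip_code = str(row.get("zip", "")).strip()
        name = str(row.get("name", "")).strip()
        if not ZIP_RE.match(zip_code) or not name:
            continue  # reject rows that would pollute the export
        cleaned.append({"name": name, "zip": zip_code})
    return cleaned

rows = [
    {"name": " Acme Trucking ", "zip": "95616"},
    {"name": "", "zip": "95616"},            # missing name -> dropped
    {"name": "Bad Zip Co", "zip": "9561"},   # malformed ZIP -> dropped
]
out = clean_records(rows)
```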

Key Features

Distributed Scraper Fleet

RabbitMQ-coordinated Puppeteer scrapers with anti-bot evasion, capable of covering 41,000+ ZIP codes across the continental U.S. and scaling horizontally on Azure Kubernetes Service.

Fine-Grained Geographic Targeting

Researchers can scope searches from country level down to individual city, with company-specific or category-wide queries — giving precise control over what data is collected and where.

Real-Time Job Monitoring

Live job status updates, animated download progress indicators, and a full history of past jobs — giving researchers full visibility into every scraping task from submission to export.

Automated Export Pipeline

Validated, clean data delivered as analysis-ready Excel files on download, with customizable filenames and optional email notifications — eliminating manual cleanup entirely.


We extend our sincere thanks to the Chevron team for their collaboration and support throughout this project.