Analytics platform for finding a place to live in Australia.
Turning messy property data into answers to questions like:
- Should I buy or rent?
- Which property best fits my constraints?
- How do options compare under different buying or renting strategies?

Features:
- Fast interactive dashboards for exploring property data.
- Cross-filter and highlight listings across multiple property attributes.
- Compare buy and rent options in one place.
- Save and share dashboard configurations with a link.
- Customize dashboards and SQL for deeper analysis.
Note: running in production since October 2025.
This is the core architecture of the app:

```
Data sources
AusPost / Domain / ACARA
        │
        ▼
Scrape pipeline
preprocess → batch workers → postprocess
        │
        ├─ retries / validation / quality filters
        ├─ logs / metrics / traces
        ▼
Primary storage
Supabase Postgres + PostGIS
        │
        ▼
API delivery
Hono + ORPC + OpenAPI
        │
        ├─ CloudFront caching
        ▼
Frontend
React + TanStack + DuckDB WASM + Arrow
        │
        ▼
Interactive dashboards
filters / cross-highlighting / shareable URL state
```
Important characteristics:
- Data quality:
  - Quality data leads to quality insights.
  - Reduces data poisoning for downstream services.
- Observability:
  - Ambiguous data and errors are recorded.
  - CPU and RAM usage are tracked to improve cost efficiency.
- Regional resilience - Multi-AZ, 99.9% SLA:
  - Scrape timing is flexible, so a lower end-to-end SLA can be tolerated.
  - Recovery is handled with resilient pipelines using Step Functions and AWS Batch on Fargate, both rated at a 99.99% SLA.
  - Small blast radius through separate workers: a preprocessor, a 15-minute batch scraper, and a postprocessor.
  - The weakest link is database insertion, which has a lower SLA - Supabase at 99.9%.
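As a rough illustration of how scrape failures are retried before a worker gives up, here is a minimal exponential-backoff wrapper. The function name, attempt count, and delays are illustrative assumptions, not the actual implementation:

```typescript
// Minimal retry helper with exponential backoff, as used conceptually by the
// scrape workers. Names and limits here are illustrative, not the real code.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Exponential backoff: 500 ms, 1000 ms, 2000 ms, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

A failed page fetch would then be wrapped as something like `withRetry(() => fetchPage(url))`, leaving validation and quality filtering to the later pipeline stages.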
These data sources are filtered for quality control:
- AusPost - for locality data:
  - Some localities don't exist and need to be detected in production.
- Domain - for sale, rent, and locality data:
  - Missing data and inconsistent formats for prices.
  - Data integrity issues such as duplicate addresses, autogenerated prices, and changed addresses.
  - Junk data cleanup, such as car parks and garages listed for sale or rent.
  - Pagination and termination handling.
  - Retries on scrape failures.
- ACARA - for school data:
  - Mostly clean data, with filtering applied for relevance.
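To give a feel for the price cleanup, here is a hypothetical normaliser for inconsistent listing price strings. The formats handled are illustrative; the real cleanup rules are more involved:

```typescript
// Hypothetical normaliser for inconsistent price strings from listings.
// Returns a dollar amount, or null when the listing carries no price signal.
function parsePrice(raw: string): number | null {
  const text = raw.toLowerCase().trim();
  // Non-numeric listings ("Contact agent", "Auction") have no usable price.
  if (!/\d/.test(text)) return null;
  // "$1.2m" style shorthand.
  const millions = text.match(/\$?\s*(\d+(?:\.\d+)?)\s*m\b/);
  if (millions) return Math.round(parseFloat(millions[1]) * 1_000_000);
  // "$450 per week" or "$1,250,000": strip everything but digits and dots.
  const plain = text.replace(/[^0-9.]/g, "");
  const value = parseFloat(plain);
  return Number.isNaN(value) ? null : Math.round(value);
}
```

A function like this would run in the postprocess stage, with unparseable values logged rather than silently dropped, feeding the observability goal above.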
Limitations:
- No anomaly detection yet.
- Some valid data is still missed.
Important characteristics:
- Accessibility:
  - Query with SQL.
- Prefer OLAP performance over OLTP, as complex analytical queries are required:
  - Column-oriented databases reduce IOPS for OLAP queries.
  - Chosen as the best performer in a local-first database benchmark.
- Supports spatial queries.
- Zero-copy data, so data does not have to move.

Uses Arrow IPC to insert data into DuckDB with zero-copy transfer where possible, which improves ingestion speed. The main downside is the initial WASM download size, which increases CDN cost and startup latency, so precaching is used to speed up subsequent starts.
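To make the columnar point concrete, here is a toy TypeScript comparison of row-oriented versus column-oriented storage. An aggregate over a column touches one contiguous typed array instead of every field of every row object; the data and shapes are purely illustrative:

```typescript
// Row-oriented: each aggregate must walk every heap-allocated row object,
// pulling in fields the query never uses.
interface ListingRow {
  suburb: string;
  price: number;
  bedrooms: number;
}
const rows: ListingRow[] = [
  { suburb: "Carlton", price: 650_000, bedrooms: 2 },
  { suburb: "Fitzroy", price: 910_000, bedrooms: 3 },
  { suburb: "Brunswick", price: 780_000, bedrooms: 2 },
];

// Column-oriented: the price column is one contiguous buffer, which is what
// lets an engine like DuckDB scan only the bytes a query actually needs.
const priceColumn = Float64Array.from(rows, (r) => r.price);

function meanPrice(prices: Float64Array): number {
  let sum = 0;
  for (const p of prices) sum += p;
  return sum / prices.length;
}
```

The same idea underlies Arrow's columnar buffers, which is why Arrow IPC data can be handed to DuckDB without a row-by-row copy.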
Important characteristics:
- Cacheability:
  - Reduce backend load as much as possible for cost and latency reasons.
  - Improves availability and scalability.
  - Uses CloudFront caching at a 99.9% SLA.
- Type safety - generated OpenAPI:
  - Hono and ORPC generate the OpenAPI spec.
  - openapi-ts provides client type safety.
- Accessibility:
  - Colorblind friendly.
- Interactivity:
  - Filtering and cross-highlighting UX.
  - Prefer interactions like hover and drag over clicks and forms.
- Shareability:
  - Dashboards are configured as code through URL search params.
  - SQL queries are executed against the local database and cached.
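Dashboards-as-URL boils down to round-tripping a config object through search params, so a shared link reproduces the exact view. The config fields below are made up for illustration; the real app's schema will differ:

```typescript
// Illustrative dashboard config; the real app's fields will differ.
interface DashboardConfig {
  suburb: string;
  maxPrice: number;
  mode: "buy" | "rent";
}

// Serialise the config into the URL's search string.
function toSearchParams(config: DashboardConfig): string {
  return new URLSearchParams({
    suburb: config.suburb,
    maxPrice: String(config.maxPrice),
    mode: config.mode,
  }).toString();
}

// Rebuild the config from a shared link, with safe fallbacks for
// missing or invalid params.
function fromSearchParams(search: string): DashboardConfig {
  const params = new URLSearchParams(search);
  return {
    suburb: params.get("suburb") ?? "",
    maxPrice: Number(params.get("maxPrice") ?? 0),
    mode: params.get("mode") === "rent" ? "rent" : "buy",
  };
}
```

In the actual app, TanStack Router's typed search params would play this role, validating the decoded state at the route boundary.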
- Monorepo:
  - Nx - to manage local development with caching.
  - pnpm - to manage monorepo scripts.
- Observability:
  - Grafana LGTM - for flexibility and the ability to test locally.
- Infra:
  - AWS + Supabase Postgres.
  - IaC managed by SST (built on top of Pulumi) - chosen for fast serverless iteration.
- API:
  - Hono + ORPC for OpenAPI generation.
- Data:
  - Postgres + PostGIS - for spatial data and local testing.
  - DuckDB WASM + Apache Arrow - for the local-first database.
- Frontend:
  - React - for ecosystem support.
  - TanStack Query - for cache control.
  - TanStack Router - for search param and link type safety.
This monorepo is arranged in the following format:

```
.github        - CI/CD and repo management.
  actions      - Composite actions to be used in workflows.
  workflows    - Define jobs for CI/CD.
apps           - Architectural quantums.
  observability  - Grafana LGTM dashboard management and OpenTelemetry library.
  service-scrape - House data webscrape and access.
  service-auth   - Authentication lambda authoriser.
  service-user   - User management.
  web            - Web app.
infra          - Manage infrastructure on AWS using SST.
```
CI checks ensure the infrastructure is deployable and the code meets standards. Preview branches are used to review changes live in public.
Install using pnpm only - git hooks should auto-configure:

```
pnpm i
```

Watch all builds and tests as you develop:

```
pnpm watch
```

Spin up docker for development and generation scripts:

```
pnpm docker:up
```

Or spin down docker:

```
pnpm docker:down
```

Detect stale code:

```
pnpm knip
```

Upgrade all dependencies:

```
pnpm bump
```

Visualise the local package dependency graph:

```
pnpm graph
```

If you have experience with:
- data pipelines
- observability
- analytical frontends
I’d love to hear your thoughts or suggestions.