New-Homie/new-homie

Analytics platform for finding a place to live in Australia.

Turning messy property data into answers to questions like:

  • Should I buy or rent?
  • Which property best fits my constraints?
  • How do options compare under different buying or renting strategies?

Features

  • Fast interactive dashboards for exploring property data.
  • Cross-filter and highlight listings across multiple property attributes.
  • Compare buy and rent options in one place.
  • Save and share dashboard configurations with a link.
  • Customize dashboards and SQL for deeper analysis.

Note: running in production since October 2025.

Demo video: 1775803312038943.mp4

Design

Architecture

This is the core architecture of the app:

Data sources
  AusPost / Domain / ACARA
          │
          ▼
Scrape pipeline
  preprocess → batch workers → postprocess
          │
          ├─ retries / validation / quality filters
          ├─ logs / metrics / traces
          ▼
Primary storage
  Supabase Postgres + PostGIS
          │
          ▼
API delivery
  Hono + ORPC + OpenAPI
          │
          ├─ CloudFront caching
          ▼
Frontend
  React + TanStack + DuckDB WASM + Arrow
          │
          ▼
Interactive dashboards
  filters / cross-highlighting / shareable URL state

1. Data pipeline

Important characteristics:

  1. Data quality:
    • Quality data leads to quality insights.
    • Reduces data poisoning for downstream services.
  2. Observability:
    • Ambiguous data and errors are recorded.
    • CPU and RAM usage are tracked to improve cost efficiency.
  3. Regional resilience - Multi-AZ, 99.9% SLA:
    • Scrape timing is flexible, so a lower end-to-end SLA can be tolerated.
    • Recovery is handled by resilient pipelines built on Step Functions and AWS Batch on Fargate, rated at a 99.99% SLA.
    • The blast radius is kept small through separate workers: a preprocessor, a 15-minute batch scraper, and a postprocessor.
    • The weakest link is database insertion, since Supabase carries a lower 99.9% SLA.
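
The recovery behaviour above can be sketched as a small retry helper with exponential backoff. This is an illustrative sketch only: in production the retries live in Step Functions, and the function name, attempt count, and delays here are assumptions.

```typescript
// Illustrative retry helper with exponential backoff. The real pipeline
// configures retries in Step Functions; this only mirrors the idea.
async function withRetry<T>(
  task: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Back off exponentially between attempts: base, 2x base, 4x base, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```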

These data sources are filtered for quality control:

  • AusPost - for locality data:
    • Some localities don't exist and need to be detected in production.
  • Domain - for sale, rent and locality data:
    • Missing data and inconsistent formats for price.
    • Data integrity issues such as duplicate addresses, autogenerated prices, and changed addresses.
    • Junk data cleanup, such as car parks and garages listed for sale or rent.
    • Pagination and termination handling.
    • Retries on scrape failures.
  • ACARA - for school data:
    • Mostly clean data, with filtering applied for relevance.
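
The price cleanup described above can be illustrated with a small normaliser. Everything here - the patterns, the junk threshold, the function name - is an assumption for illustration; the production rules are more involved.

```typescript
// Hypothetical cleaner for Domain-style price strings. Returns a dollar
// figure, or null when the listing has no usable price.
function normalizePrice(raw: string): number | null {
  const text = raw.trim().toLowerCase();
  // Listings without a concrete figure ("Contact agent", "Auction") carry no price.
  if (!/\d/.test(text)) return null;
  // Strip commas and surrounding words: "$1,200,000", "$450 per week".
  const match = text.replace(/,/g, "").match(/\$?\s*(\d+(?:\.\d+)?)/);
  if (!match) return null;
  const value = Number(match[1]);
  // Quality filter: an implausibly small figure is usually autogenerated
  // junk. The threshold here is illustrative only.
  if (value < 100) return null;
  return value;
}
```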

Limitations:

  • No anomaly detection yet.
  • Some valid data still missed.

2. Data format and local-first database

Important characteristics:

  1. Accessibility:
    • Query with SQL.
  2. OLAP performance preferred over OLTP, as complex analytical queries are required.
  3. Spatial query support.
  4. Zero-copy data, so data does not have to move.

Uses Arrow IPC to insert data into DuckDB with zero-copy transfer where possible. This improves ingestion speed. The main downside is the initial WASM size, which increases CDN cost and startup latency, so precaching is used to improve subsequent starts.
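
Zero-copy here means the Arrow buffers received over IPC are interpreted in place rather than duplicated. The snippet below is a toy illustration of that idea using plain typed arrays - not the actual duckdb-wasm or Arrow API.

```typescript
// Toy illustration of zero-copy reads: a typed-array view interprets bytes
// in place, while a copy allocates a second buffer. Arrow IPC ingestion
// into DuckDB relies on the first pattern where possible.
const buffer = new ArrayBuffer(4 * Float64Array.BYTES_PER_ELEMENT);
const writer = new Float64Array(buffer);
writer.set([1.5, 2.5, 3.5, 4.5]);

// Zero-copy: a new view over the SAME memory; no bytes are moved.
const view = new Float64Array(buffer);

// Copying: slice() allocates fresh memory and duplicates the bytes.
const copy = writer.slice();

// A mutation through one view is visible through the other (shared memory),
// but not through the copy, which owns its own buffer.
view[0] = 9.0;
```

The same distinction is why Arrow ingestion is fast: the engine reads the columnar buffers where they already sit instead of materialising a second copy.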

3. Data delivery

Important characteristics:

  1. Cacheability:
    • Reduce backend load as much as possible for cost and latency reasons.
    • Improves availability and scalability.
    • Use CloudFront caching at 99.9% SLA.
  2. Type safety: an OpenAPI schema is generated from the API definitions.
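
The cacheability goal can be made concrete with a conditional-request sketch. The helper below is hypothetical - not the app's actual Hono handlers - and only shows the ETag / Cache-Control mechanics that let CloudFront and browsers answer repeat requests without touching the backend.

```typescript
// Hypothetical conditional-request check: the mechanism shared caches like
// CloudFront use to serve responses without hitting the origin.
function cacheDecision(
  requestEtag: string | undefined,
  currentEtag: string,
  maxAgeSeconds: number,
): { status: number; headers: Record<string, string> } {
  const headers = {
    // Let shared caches keep the response for maxAgeSeconds.
    "Cache-Control": `public, max-age=${maxAgeSeconds}`,
    ETag: currentEtag,
  };
  // If the client's cached copy is still current, skip the body entirely.
  if (requestEtag === currentEtag) return { status: 304, headers };
  return { status: 200, headers };
}
```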

4. Local-first dashboard as code

  1. Accessible:
    • Colorblind friendly.
  2. Interactive:
    • Filtering and cross-highlighting UX.
    • Prefer interactions like hover and drag to clicks and forms.
  3. Shareable:
    • Dashboards are configured as code through URL search params.

SQL queries are executed against the local database and cached.
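
The dashboard-as-code idea can be sketched as a round trip through URL search params. The config shape and field names below are illustrative assumptions, not the app's real schema (which is typed end to end by TanStack Router).

```typescript
// Hypothetical dashboard state round-tripped through URL search params:
// the whole dashboard configuration lives in the link itself.
interface DashboardConfig {
  suburb: string;
  maxPrice: number;
  mode: "buy" | "rent";
}

function toSearchParams(config: DashboardConfig): string {
  const params = new URLSearchParams({
    suburb: config.suburb,
    maxPrice: String(config.maxPrice),
    mode: config.mode,
  });
  return params.toString();
}

function fromSearchParams(search: string): DashboardConfig {
  const params = new URLSearchParams(search);
  return {
    suburb: params.get("suburb") ?? "",
    maxPrice: Number(params.get("maxPrice") ?? 0),
    mode: (params.get("mode") as DashboardConfig["mode"]) ?? "buy",
  };
}
```

Because the entire configuration lives in the URL, saving or sharing a dashboard is just saving or sharing a link.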

Developer notes

Tech stack - overview

  • Monorepo:
    • Nx - to manage local development with caching.
    • Pnpm - to manage monorepo scripts.
  • Observability:
    • Grafana LGTM - for flexibility and ability to test locally.
  • Infra:
    • AWS + Supabase Postgres.
    • IaC managed by SST (built on top of Pulumi) - chosen for fast serverless iteration.
  • API:
    • Hono, ORPC for OpenAPI generation.
  • Data:
    • Postgres + PostGIS - for spatial data and local testing.
    • DuckDB WASM + Apache Arrow for local-first database.
  • Frontend:
    • React - for ecosystem support.
    • TanStack Query - for cache control.
    • TanStack Router - for search param and link type safety.

Monorepo structure

This monorepo is organised as follows:

  • .github - CI/CD and repo management.
    • actions - Composite actions to be used in workflows.
    • workflows - Define jobs for CI/CD.
  • apps - Architectural quanta.
    • observability - Grafana LGTM dashboard management and OpenTelemetry library.
    • service-scrape - House data webscrape and access.
    • service-auth - Authentication lambda authoriser.
    • service-user - User management.
    • web - Web app.
  • infra - Manage infrastructure on AWS using CDK.

Deployment

CI checks ensure the infrastructure is deployable and the code meets standards. Preview branches are used to review changes live in public.

Scripts

Install dependencies using pnpm only - git hooks should auto-configure:

pnpm i

Watch all builds and tests as you develop:

pnpm watch

Spin up Docker for development and generation scripts:

pnpm docker:up

Or spin down Docker:

pnpm docker:down

Detect stale code:

pnpm knip

Upgrade all dependencies:

pnpm bump

Visualise local package dependencies:

pnpm graph

Feedback

If you have experience with:

  • data pipelines
  • observability
  • analytical frontends

I’d love to hear your thoughts or suggestions.
