Merged
22 commits
4711717
Add AGENTS.md with engine test and DB commands
lewispb May 13, 2026
fbd3f94
Scaffold public status pages behind a feature flag
lewispb May 13, 2026
dcdabef
Group probes into user-facing services
lewispb May 13, 2026
96cd87f
Roll daily uptime from Prometheus into ProbeRollup and ServiceRollup
lewispb May 13, 2026
31af4b0
Tighten Status concern style
lewispb May 13, 2026
a7e78e2
Drop ServiceRollup; compute service uptime on the fly
lewispb May 13, 2026
e7989cf
Seed probe_service-tagged uptime and roll it up
lewispb May 13, 2026
cf1177c
Render a public status page in the dummy app
lewispb May 13, 2026
ed1c0f3
Style the public status page
lewispb May 13, 2026
3e74869
Widen seed uptime distributions to exercise every status tier
lewispb May 13, 2026
4010e7c
Don't paint no-data bars as operational
lewispb May 13, 2026
096f997
Align seed-prometheus and recording rules with the 90-day window
lewispb May 15, 2026
da828e2
Stop rolling up today; query Prometheus in UTC
lewispb May 15, 2026
c750507
Drive the public status page from a unified daily collection
lewispb May 15, 2026
7f37f51
Wrap conditionals instead of early-returning
lewispb May 15, 2026
383c150
Rename Status#show to Services#index
lewispb May 15, 2026
7fcadec
Lift Status out of the Rollups namespace
lewispb May 15, 2026
f806987
Simplify aggregate_day with find_or_create_by and a callback
lewispb May 15, 2026
d64f6ff
Rename ProbeRollup.aggregate_day to rollup_day
lewispb May 15, 2026
e35d1ed
Cache the status page and serve an RSS feed of degraded services
lewispb May 15, 2026
2fa954b
Address review: fix live-status rule, RSS feed link and guid
lewispb Jun 2, 2026
b5e7c3b
Centralize the Prometheus client and URL on Upright
lewispb Jun 2, 2026
39 changes: 39 additions & 0 deletions AGENTS.md
@@ -0,0 +1,39 @@
# Upright engine

This is the `upright` Rails engine. All development, testing, and database
management run from this directory (the engine root). Never `cd` into
`test/dummy/`.

## Running tests

```bash
bin/rails test # full suite
bin/rails test test/models/upright/service_test.rb
bin/rails test test/models/upright/service_test.rb:10
```

## Database management

The test/dev database lives in `test/dummy/storage/`. The schema is
checked in at `test/dummy/db/schema.rb`. Always operate via the engine
root — don't prefix with `app:`.

```bash
RAILS_ENV=test bin/rails db:drop db:create db:schema:load # reset test DB
RAILS_ENV=test bin/rails db:schema:load # apply current schema
```

### Adding a new engine migration

Engine migrations live in `db/migrate/`. The `upright.migrations`
initializer deliberately skips the dummy app, so `bin/rails db:migrate`
from the engine root won't apply engine migrations to the test DB. To
apply a new migration and refresh `schema.rb`:

```bash
RAILS_ENV=test bin/rails app:db:migrate # one-time, after adding a migration
```

After this, `schema.rb` is updated and subsequent runs of
`db:schema:load` will pick up the new tables. Commit `schema.rb`
alongside the migration file.
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -0,0 +1 @@
@AGENTS.md
166 changes: 166 additions & 0 deletions app/assets/stylesheets/upright/public_status.css
@@ -0,0 +1,166 @@
@layer components {
  :root {
    --status-operational: oklch(70% 0.18 145);
    --status-degraded: oklch(78% 0.18 95);
    --status-partial: oklch(70% 0.18 50);
    --status-major: oklch(60% 0.21 28);
    --status-none: oklch(var(--lch-ink-lightest));
  }

  .public {
    font-family: var(--font-sans);
  }

  .status-page {
    margin: 0 auto;
    max-width: 56rem;
    padding: calc(var(--block-space) * 3) var(--main-padding);
  }

  .status-page__title {
    font-size: var(--text-x-large);
    font-weight: 600;
    letter-spacing: -0.02em;
    margin: 0 0 calc(var(--block-space) * 1.5);
  }

  .status-banner {
    align-items: center;
    border-radius: 0.5rem;
    color: var(--color-white);
    display: flex;
    font-size: var(--text-large);
    font-weight: 600;
    gap: 0.75rem;
    padding: 1rem 1.25rem;
    margin-bottom: calc(var(--block-space) * 3);
  }

  .status-banner--operational { background: var(--status-operational); }
  .status-banner--degraded { background: var(--status-degraded); color: oklch(20% 0 0); }
  .status-banner--partial_outage { background: var(--status-partial); }
  .status-banner--major_outage { background: var(--status-major); }

  .status-banner__dot {
    background: currentColor;
    border-radius: 50%;
    height: 0.6rem;
    opacity: 0.8;
    width: 0.6rem;
  }

  .status-degraded {
    border: var(--border);
    border-radius: 0.5rem;
    list-style: none;
    margin: calc(var(--block-space) * -2) 0 calc(var(--block-space) * 3);
    padding: 0.25rem 0;
  }

  .status-degraded__item {
    align-items: center;
    display: flex;
    gap: 0.75rem;
    padding: 0.5rem 1rem;
  }

  .status-degraded__item + .status-degraded__item {
    border-top: var(--border);
  }

  .status-degraded__dot {
    border-radius: 50%;
    flex-shrink: 0;
    height: 0.6rem;
    width: 0.6rem;
  }

  .status-degraded__dot--degraded { background: var(--status-degraded); }
  .status-degraded__dot--partial_outage { background: var(--status-partial); }
  .status-degraded__dot--major_outage { background: var(--status-major); }

  .status-degraded__name { font-weight: 600; }

  .status-degraded__detail {
    color: var(--color-ink-medium);
    margin-left: auto;
  }

  .service {
    border-top: var(--border);
    padding: calc(var(--block-space) * 1.25) 0;
  }

  .service__row {
    align-items: center;
    display: flex;
    gap: 1rem;
    justify-content: space-between;
    margin-bottom: 0.75rem;
  }

  .service__name {
    font-size: var(--text-large);
    font-weight: 500;
  }

  .service__status {
    color: var(--color-ink-medium);
    font-size: var(--text-small);
    font-weight: 500;
    text-transform: uppercase;
    letter-spacing: 0.05em;
  }

  .service__status--operational { color: var(--status-operational); }
  .service__status--degraded { color: var(--status-degraded); }
  .service__status--partial_outage { color: var(--status-partial); }
  .service__status--major_outage { color: var(--status-major); }

  .service__description {
    color: var(--color-ink-medium);
    font-size: var(--text-small);
    margin: 0 0 0.75rem;
  }

  .uptime {
    display: flex;
    flex-direction: column;
    gap: 0.4rem;
  }

  .uptime__bars {
    display: flex;
    gap: 2px;
    height: 2rem;
  }

  .uptime__bar {
    background: var(--status-none);
    border-radius: 1px;
    flex: 1 1 0;
    min-width: 2px;
    transition: filter 100ms ease-out;
  }

  .uptime__bar:hover {
    filter: brightness(0.9);
  }

  .uptime__bar--operational { background: var(--status-operational); }
  .uptime__bar--degraded { background: var(--status-degraded); }
  .uptime__bar--partial_outage { background: var(--status-partial); }
  .uptime__bar--major_outage { background: var(--status-major); }

  .uptime__meta {
    color: var(--color-ink-medium);
    display: flex;
    font-size: var(--text-x-small);
    justify-content: space-between;
  }

  .uptime__meta-percent {
    color: var(--color-ink-dark);
    font-weight: 500;
  }
}
6 changes: 1 addition & 5 deletions app/controllers/upright/prometheus_proxy_controller.rb
@@ -45,15 +45,11 @@ def proxy_to_prometheus(path, method: request.method, body: nil)
   end

   def prometheus_connection
-    @prometheus_connection ||= Faraday.new(url: prometheus_url) do |f|
+    @prometheus_connection ||= Faraday.new(url: Upright.configuration.prometheus_url) do |f|
       f.options.timeout = 30
     end
   end

-  def prometheus_url
-    ENV.fetch("PROMETHEUS_URL", "http://localhost:9090")
-  end
-
   def authenticate_otlp_token
     authenticate_or_request_with_http_token do |token|
       ActiveSupport::SecurityUtils.secure_compare(token, ENV.fetch("PROMETHEUS_OTLP_TOKEN", ""))
11 changes: 11 additions & 0 deletions app/controllers/upright/public/base_controller.rb
@@ -0,0 +1,11 @@
class Upright::Public::BaseController < ActionController::Base
  layout "upright/public"

  helper :all
  protect_from_forgery with: :exception

  private
    def default_url_options
      Rails.application.routes.default_url_options
    end
end
6 changes: 6 additions & 0 deletions app/controllers/upright/public/services_controller.rb
@@ -0,0 +1,6 @@
class Upright::Public::ServicesController < Upright::Public::BaseController
  def index
    @services = Upright::Service.public_facing
    expires_in 15.seconds, public: true
  end
end
37 changes: 37 additions & 0 deletions app/helpers/upright/public/services_helper.rb
@@ -0,0 +1,37 @@
module Upright::Public::ServicesHelper
  OVERALL_STATUS_LABELS = {
    operational: "All Systems Operational",
    degraded: "Some Systems Degraded",
    partial_outage: "Partial Outage",
    major_outage: "Major Outage"
  }

  def overall_status_label(status)
    OVERALL_STATUS_LABELS.fetch(status)
  end

  def status_label(status)
    status.to_s.humanize
  end

  def outage_duration_phrase(started_at:)
    if started_at
      "for #{distance_of_time_in_words(started_at, Time.current)}"
    else
      "for 24 hours+"
    end
  end

  # Stable per-outage id so feed readers treat one ongoing outage as a single
  # item. Falls back to the service code alone when the outage predates the live
  # lookback window and has no known start time.
  def feed_item_guid(issue)
    [ issue[:service].code, issue[:started_at]&.to_i ].compact.join("-")
  end

  def average_uptime_percentage(fractions)
    if fractions.present?
      (fractions.sum.to_f / fractions.size) * 100
    end
  end
end
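
To make the guid fallback and the averaging concrete, here is a minimal plain-Ruby sketch of `feed_item_guid` and `average_uptime_percentage` that runs outside Rails. The `Service` struct and the sample issue hashes are hypothetical stand-ins for `Upright::Service` and the helper's real inputs, and `fractions.present?` is approximated with a nil/empty check because ActiveSupport is not loaded.

```ruby
# Hypothetical stand-in for Upright::Service; only #code is needed here.
Service = Struct.new(:code)

# Same expression as the helper: service code plus the outage start as epoch
# seconds, or the code alone when the start time is unknown.
def feed_item_guid(issue)
  [ issue[:service].code, issue[:started_at]&.to_i ].compact.join("-")
end

# Plain-Ruby approximation of the helper (no ActiveSupport `present?`).
def average_uptime_percentage(fractions)
  if fractions && !fractions.empty?
    (fractions.sum.to_f / fractions.size) * 100
  end
end

api = Service.new("api")

puts feed_item_guid({ service: api, started_at: Time.at(1_700_000_000) }) # api-1700000000
puts feed_item_guid({ service: api, started_at: nil })                    # api
p average_uptime_percentage([ 1.0, 0.5 ])                                 # 75.0
p average_uptime_percentage([])                                           # nil
```

Because the guid depends only on the service code and the outage start, it stays the same for as long as one outage lasts, which is what lets a feed reader collapse repeated polls into one item.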
12 changes: 12 additions & 0 deletions app/jobs/upright/rollups/daily_aggregation_job.rb
@@ -0,0 +1,12 @@
class Upright::Rollups::DailyAggregationJob < Upright::ApplicationJob
  queue_as :default

  # Aggregates daily rollups for completed days only — today is still in progress
  # and is represented live by Service#live_status, so persisting a half-day
  # rollup would just produce a stale value the rest of the day.
  def perform(past: 1.day)
    (past.ago.to_date..Date.yesterday).each do |day|
      Upright::Rollups::ProbeRollup.rollup_day(day)
    end
  end
end
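
The date range in `perform` can be sketched in plain Ruby to show which days a run touches. This is an approximation under stated assumptions: it works in whole days and ignores time zones and ActiveSupport durations, and `days_to_roll_up` and `past_days` are hypothetical names that do not appear in the PR.

```ruby
require "date"

# Days a run on `today` would roll up when looking back `past_days` days.
# Mirrors (past.ago.to_date..Date.yesterday): the range ends at yesterday,
# so today is never included.
def days_to_roll_up(today:, past_days: 1)
  ((today - past_days)..(today - 1)).to_a
end

today = Date.new(2026, 5, 15)

p days_to_roll_up(today: today).map(&:to_s)
# ["2026-05-14"]
p days_to_roll_up(today: today, past_days: 3).map(&:to_s)
# ["2026-05-12", "2026-05-13", "2026-05-14"]
p days_to_roll_up(today: today, past_days: 0)
# []
```

With the default lookback the job covers exactly yesterday; a larger `past` acts as a backfill over completed days only.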
6 changes: 5 additions & 1 deletion app/models/concerns/upright/probeable.rb
@@ -4,12 +4,16 @@ module Upright::Probeable

   ALERT_SEVERITIES = %i[ medium high critical ]

+  mattr_accessor :probe_classes, default: []
+
   included do
     attr_writer :logger

     def logger
       @logger || Rails.logger
     end
+
+    Upright::Probeable.probe_classes |= [ self ]
   end

   class_methods do
@@ -61,7 +65,7 @@ def on_check_recorded(probe_result)
   end

   def probe_service
-    nil
+    try(:service)
   end

   def probe_alert_severity
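
The `probe_classes |= [ self ]` registration in the first hunk can be illustrated without Rails. The sketch below is an assumption-laden stand-in, not the engine's code: it uses `Module#included` in place of ActiveSupport::Concern's `included do` block and a plain accessor in place of `mattr_accessor`, and `HttpProbe` and `DnsProbe` are hypothetical class names.

```ruby
# Plain-Ruby stand-in for the concern: Module#included plays the role of
# ActiveSupport::Concern's `included do` block.
module Probeable
  @probe_classes = []

  class << self
    attr_accessor :probe_classes

    def included(base)
      # Array union: appends base only if it is not already listed.
      self.probe_classes |= [ base ]
    end
  end
end

class HttpProbe
  include Probeable
end

class DnsProbe
  include Probeable
end

p Probeable.probe_classes # [HttpProbe, DnsProbe]

# Registering a class again leaves the list unchanged.
Probeable.probe_classes |= [ HttpProbe ]
p Probeable.probe_classes.size # 2
```

Using `|=` rather than `<<` keeps the registry free of duplicate entries for the same class object.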
43 changes: 43 additions & 0 deletions app/models/concerns/upright/services/live_status.rb
@@ -0,0 +1,43 @@
module Upright::Services::LiveStatus
  extend ActiveSupport::Concern

  OUTAGE_LOOKBACK = 24.hours

  def live_status
    Upright::Status.for(live_up_fraction)
  end

  # Earliest moment of the current outage, or nil if the service is currently
  # clear OR the outage predates OUTAGE_LOOKBACK. Callers should treat nil on a
  # non-operational service as "longer than the live window."
  def current_outage_started_at(now: Time.current)
    history = live_down_history(now: now)
    last_clear = history.rindex { |_ts, value| value.to_f == 0 }

    if last_clear && last_clear < history.length - 1
      Time.zone.at(history[last_clear + 1].first.to_f)
    end
  end

  private
    def live_up_fraction
      1 - live_down_fraction
    end

    def live_down_fraction
      response = Upright.prometheus_client.query(
        query: "max(upright:probe_down_fraction{probe_service=\"#{code}\"}) or vector(0)"
      ).deep_symbolize_keys
      response.dig(:result, 0, :value, 1).to_f
    end

    def live_down_history(now:)
      response = Upright.prometheus_client.query_range(
        query: "max(upright:probe_down_fraction{probe_service=\"#{code}\"}) or vector(0)",
        start: (now - OUTAGE_LOOKBACK).iso8601,
        end: now.iso8601,
        step: "300s"
      ).deep_symbolize_keys
      response.dig(:result, 0, :values) || []
    end
end
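
The three cases described in the comment on `current_outage_started_at` can be checked against hand-written samples. This is a minimal sketch assuming the range query yields `[unix_timestamp, "value"]` pairs at a 300-second step; `outage_start` is a hypothetical free function, it uses `Time.at(...).utc` instead of `Time.zone.at`, and the sample histories are invented for illustration.

```ruby
# Same scan as the concern: find the last sample with a down fraction of 0,
# then report the timestamp of the sample right after it.
def outage_start(history)
  last_clear = history.rindex { |_ts, value| value.to_f == 0 }

  if last_clear && last_clear < history.length - 1
    Time.at(history[last_clear + 1].first.to_f).utc
  end
end

# Clear for two samples, then down: the outage starts at the third sample.
ongoing = [
  [ 1_700_000_000, "0" ],
  [ 1_700_000_300, "0" ],
  [ 1_700_000_600, "0.5" ],
  [ 1_700_000_900, "1" ]
]

# Latest sample is clear: no current outage.
recovered = [ [ 1_700_000_000, "1" ], [ 1_700_000_300, "0" ] ]

# No clear sample in the window: the outage predates the lookback.
whole_window = [ [ 1_700_000_000, "1" ], [ 1_700_000_300, "1" ] ]

p outage_start(ongoing).to_i   # 1700000600
p outage_start(recovered)      # nil
p outage_start(whole_window)   # nil
p outage_start([])             # nil
```

The last two `nil` results are indistinguishable from the method's return value alone, which is why the comment asks callers to combine `nil` with a non-operational `live_status` to mean "longer than the live window."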
9 changes: 1 addition & 8 deletions app/models/upright/probes/status.rb
@@ -1,7 +1,7 @@
 class Upright::Probes::Status
   class << self
     def for_type(probe_type)
-      results = prometheus_client.query_range(
+      results = Upright.prometheus_client.query_range(
         query: query(probe_type),
         start: 30.minutes.ago.iso8601,
         end: Time.current.iso8601,
@@ -22,13 +22,6 @@ def label_selector(probe_type)
       "{#{matchers.join(",")}}"
     end

-    def prometheus_client
-      Prometheus::ApiClient.client(
-        url: ENV.fetch("PROMETHEUS_URL", "http://localhost:9090"),
-        options: { timeout: 30.seconds }
-      )
-    end
-
     def build_probes(results)
       # Group results by probe identity (name + type + probe_target)
       grouped = results.group_by { |r| [ r[:metric][:name], r[:metric][:type], r[:metric][:probe_target] ] }