v1.27.3-rc1

Pre-release

@LexLuthr LexLuthr released this 19 Feb 12:13
· 107 commits to main since this release
a3473a4

Curio v1.27.3-rc1

✨ Overview

Curio v1.27.3-rc1 is a substantial update focused on performance, stability, and operational polish for our Curio Storage Providers. This release brings major improvements to the HarmonyTask scheduler - including batch task acceptance, smarter GPU distribution, and critical resource accounting fixes - making large-scale sealing operations significantly more efficient and reliable.

Operators benefit from a heavily upgraded WebUI with a complete sector page overhaul, new storage path management, and enhanced chain connectivity monitoring. The PDP subsystem gains a new sync task for on-chain state tracking, and Fast Snap encoding is now mainlined for faster SnapDeal processing. Across the board, this release hardens error handling, plugs goroutine and resource leaks, and delivers dozens of targeted bug fixes discovered through improved testing and real-world operator feedback.

⚠️ Build Dependency Changes (Linux)

Curio now compiles supraseal (for SnapDeals fast TreeR and batch sealing) by default on Linux.
This adds new build requirements:

Component      Requirement
CUDA Toolkit   12.x or later (nvcc must be in PATH)
GCC            12 or 13, must match your CUDA version
Python         venv tooling + build tools (autoconf, automake, libtool, nasm, xxd)

Quick Dependency Install (Ubuntu/Debian)

sudo apt install -y mesa-opencl-icd ocl-icd-opencl-dev git jq pkg-config curl clang build-essential hwloc libhwloc-dev wget python3 python3-dev python3-pip python3-venv autoconf automake libtool libgmp-dev libconfig++-dev nasm xxd && sudo apt upgrade -y

Installing GCC 12 and 13 (Ubuntu/Debian)

Install the GCC version that matches your CUDA release:

For CUDA 12.5 and below:

sudo apt install -y gcc-12 g++-12

or

For CUDA 12.6 and above (including CUDA 13):

sudo apt install -y gcc-13 g++-13

Optional: Configure update-alternatives:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 120
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 130

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 120
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-13 130

Select version:

sudo update-alternatives --config gcc
sudo update-alternatives --config g++

CUDA / GCC Compatibility

  • CUDA 12.0–12.5 → use gcc-12 / g++-12 only
  • CUDA 12.6 and later 12.x releases → either GCC 12 or 13 works
  • CUDA 13+ → use gcc-13 / g++-13
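
The compatibility rules above can be sketched as a small shell check. This is only an illustration, not part of the Curio build: the function name recommend_gcc is hypothetical, and the script assumes that nvcc --version prints a line containing "release X.Y", which is how the CUDA version is extracted here.

```shell
# Hypothetical helper (not part of Curio): map a CUDA version string such as
# "12.4" to the GCC package(s) listed in the compatibility notes.
recommend_gcc() {
  ver="$1"
  major="${ver%%.*}"
  rest="${ver#*.}"
  minor="${rest%%.*}"
  # A version without a dot (e.g. "13") has no minor part; treat it as 0.
  if [ "$ver" = "$major" ]; then minor=0; fi
  # Reject empty or non-numeric input instead of guessing.
  case "$major" in ''|*[!0-9]*) echo "unknown"; return 1 ;; esac
  case "$minor" in ''|*[!0-9]*) echo "unknown"; return 1 ;; esac

  if [ "$major" -ge 13 ]; then
    echo "gcc-13"
  elif [ "$major" -eq 12 ] && [ "$minor" -ge 6 ]; then
    echo "gcc-12 gcc-13"
  elif [ "$major" -eq 12 ]; then
    echo "gcc-12"
  else
    # Below CUDA 12 is outside the stated build requirement.
    echo "unsupported"
  fi
}

# Detect the installed CUDA version from nvcc, if it is in PATH.
if command -v nvcc >/dev/null 2>&1; then
  cuda_ver="$(nvcc --version | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p')"
  echo "CUDA ${cuda_ver}: compatible compiler(s): $(recommend_gcc "${cuda_ver}")"
else
  echo "nvcc not found in PATH"
fi
```

Running the script on a build host prints the detected CUDA version and the matching compiler package names, which can then be installed and selected with the apt and update-alternatives commands shown earlier.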

Non-GPU / Non-CUDA Servers

If you don't have CUDA or want to skip supraseal:

# Option 1: Skip supraseal entirely
make build DISABLE_SUPRASEAL=1

# Option 2: Use OpenCL instead of CUDA (AMD GPUs or no NVIDIA GPU)
make build FFI_USE_OPENCL=1

The build will fail if nvcc is not found and no override flag is set.
This prevents accidental builds without proper GPU support.
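
For provisioning scripts, the choice between these build invocations can be sketched as follows. This is an illustrative wrapper only: the function build_command and the USE_OPENCL environment variable are hypothetical names, and it assumes that a plain make build is the default supraseal-enabled build on hosts where nvcc is available.

```shell
# Hypothetical helper (not part of Curio): print the make invocation that
# matches the options described above.
#   $1 = "yes" if nvcc is available in PATH
#   $2 = "yes" to request the OpenCL build instead of CUDA
build_command() {
  have_nvcc="$1"
  want_opencl="$2"
  if [ "$want_opencl" = "yes" ]; then
    echo "make build FFI_USE_OPENCL=1"
  elif [ "$have_nvcc" = "yes" ]; then
    # Assumption: the default build compiles supraseal when nvcc is present.
    echo "make build"
  else
    echo "make build DISABLE_SUPRASEAL=1"
  fi
}

# Probe the host for nvcc and print (not run) the suggested command.
have_nvcc=no
if command -v nvcc >/dev/null 2>&1; then
  have_nvcc=yes
fi
echo "Suggested build: $(build_command "$have_nvcc" "${USE_OPENCL:-no}")"
```

The script only prints a suggestion, so the operator still confirms the choice before building. Setting USE_OPENCL=yes in the environment selects the OpenCL variant regardless of whether nvcc is installed.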

⭐ Highlights

🚀 Batch Task Acceptance & Scheduler Improvements

HarmonyTask now accepts multiple tasks at once, dramatically improving throughput on machines with available capacity. Combined with reduced storage scheduling backoff times, smarter GPU round-robin distribution, and fixes to the shouldCommit flag, the scheduler is now faster, more responsive, and better at utilizing available hardware.

🖥️ WebUI Sector Page Overhaul & Storage Paths

The Sectors interface has been completely redesigned with richer deadline tooltips (including timing and PoSt submission warnings), proper Storage Path pages for managing and inspecting storage, and new RPC endpoints for storage path management. Chain Connectivity now includes Lotus network summary signals for better at-a-glance cluster health.

⚡ Fast Snap Encoding (Mainlined)

Fast Snap logic has been mainlined into Curio, enabling faster SnapDeal processing with optimized encoding paths. This is a significant performance win for operators doing snap upgrades at scale.

🛡️ Critical Resource Accounting Fixes

Several high-impact bugs were fixed in the task engine: a releaseStorage() function that never actually released anything (causing fake resource exhaustion), a goroutine leak in CommP calculation, a panic in the GPU device provisioner under overload, and corrected PoRep RAM allocation (50GB → 96GB) to prevent OOM crashes.

⚙️ Configuration & Scheduling

  • Batch task acceptance in HarmonyTask — machines now claim as many tasks as they have capacity for in a single call (#854)
  • Use correct shouldCommit value so idle machines pick up work immediately instead of waiting 3 seconds (#886)
  • Reduced storage scheduler backoff from 1 hour to a more reasonable interval, improving throughput on storage-bound nodes (#966)
  • Better GPU overprovisioning distribution — work is assigned to the least-loaded GPU instead of filling cards sequentially (#936)
  • Fixed dynamic config reads from unreliable database connections (#894)
  • Fixed precommit batch timeout due to inconsistent timezone handling in SQL (#941)
  • Stop requiring maxFee in wallets for precommit batching (#941)

🧱 Sealing, Proofing & Pipeline

  • Mainlined Fast Snap encoding for faster SnapDeal processing (#804)
  • Corrected PoRep RAM requirement from 50GB to 96GB to prevent OOM under real workloads (#932)
  • Fixed goroutine leak in CommP calculation that could accumulate over time (#906)
  • Fixed panic in go-fil-commp-hashhash (#928)
  • GPU device provisioner now waits instead of panicking under rare over-request scenarios (#972)
  • FFISelect panic fix when round-robin runs out of slots (#962)
  • Removed optimal (no FVM) FFI build due to GPU detection issues (#876)
  • Fixed Linux build breakage from Fast Snap PR (#882)
  • Service file now sets LD_LIBRARY_PATH correctly, and a missing nvcc now raises an error when GPU proving is needed (#935)
  • WindowPoSt exits cleanly for empty deadlines instead of erroring (#982)
  • Fixed sector extension manager checks: faulted sectors cannot be extended, and max lifetime is distinct from max extension (#969)

🗂️ WebUI Enhancements

  • Complete sector page overhaul with richer deadline tooltips, PoSt submission warnings, and new Storage Path pages (#893)
  • Added storage path RPC endpoints: StoragePathList() and StoragePathDetail() (#646)
  • Extended Chain Connectivity panel with Lotus network summary signals (#977)
  • Fixed /info endpoint to return actual build version instead of hardcoded "Curio/0.0.0" (#916)
  • Fixed Susbystems typo in auto-generated web config layer that silently broke WebGui enablement (#991)

🔗 PDP & Market

  • New PDP sync task for on-chain state tracking and automatic cleanup (#660)
  • Fixed PDP pool handling — items correctly returned to the pool on error paths (#827)
  • Boost migration fix utility for raw size data (#832, #856)
  • Fixed Market miner reading bug introduced by dynamic config (#861)

🗃️ Database & Core Systems

  • HarmonyDB is now an official CurioStorage package (#853)
  • HarmonyTaskList fixes including critical releaseStorage() bug that never released resources (#891)
  • Fixed HarmonyDB error return in backoff retry logic (#890)
  • Fixed BTFP (transaction blocker) bug (#913)
  • Fixed process_piece_deal duplicate SQL functions causing indexing errors (#925)
  • Error sanitization — internal details no longer leaked to clients, passwords removed from connection strings (#919)
  • Improved canAccept() performance (#855)
  • Shutdown poller design improvements and task name cleanup (#963)
  • Storage redeclare no longer fetches full sector list for every path — critical fix for large clusters with millions of files (#896)
  • Durability fixes: data race in OnChange notifier, nil-error panic, rollback error handling (#917)

🔧 Alerting Fixes

  • Fixed alert manager iterating wrong map for machine failure alerts (#988)
  • Fixed inverted storage space condition in permanentStorageCheck — sectors were being misreported as placed/unplaced (#989)
  • Wallet balance alert now uses the configured MinimumWalletBalance instead of hardcoded "below 5 FIL" (#990)

🔬 Testing & CI

  • Added comprehensive CI test coverage with all test suites checked (#960)
  • Basic code coverage tool integrated into CI — visible in PR checks (#862)
  • Added a previously missing piecereader test to cover existing logic (#859)
  • Fixed flaky TestWaitList goroutine scheduling race (#999)
  • CI updated to Ubuntu 24.04 with improved dependency management (#839, #836)
  • Faster make gen — from ~5 minutes to under 30 seconds (#938)

📚 Documentation

  • Troubleshooting & operator playbooks based on real-world issues (#959)
  • Refreshed Curio GUI screenshots (#958)
  • YugabyteDB backup/restore documentation and other fixes (#957)
  • Added missing images to docs (#838)
  • Prometheus monitoring documentation improvements (#926)
  • Auto-labeling workflow for PDPv0 issues and PRs (#950)

🐛 Bug Fixes

🧱 Sealing & Resource Management

  • Fixed releaseStorage() never releasing storage, causing fake resource exhaustion (#891)
  • Fixed goroutine leak in CommP calculation (#906)
  • Fixed panic in go-fil-commp-hashhash (#928)
  • Fixed GPU device provisioner panic under over-request (#972)
  • Fixed FFISelect panic when round-robin exhausts slots (#962)
  • Corrected PoRep RAM allocation: 50GB → 96GB (#932)
  • Fixed WindowPoSt error on empty deadlines (#982)
  • Fixed sector extension checks for faults and max lifetime (#969)

🖥️ WebUI & API

  • Fixed /info endpoint returning hardcoded version "Curio/0.0.0" (#916)
  • Fixed Susbystems typo silently breaking WebGui config (#991)
  • Removed superfluous WriteHeader warning in prometheus SD handler (#993)

⚙️ Scheduler & Config

  • Fixed shouldCommit flag causing 3-second task pickup delays (#886)
  • Fixed dynamic config DB read errors (#894)
  • Fixed precommit batch timezone inconsistency (#941)

🗃️ Database & Internals

  • Fixed HarmonyDB backoff error propagation (#890)
  • Fixed BTFP transaction blocker bug (#913)
  • Fixed duplicate process_piece_deal SQL functions (#925)
  • Fixed storage redeclare full-scan performance issue at scale (#896)
  • Error sanitization — stopped leaking internal details to clients (#919)
  • Fixed Spark delete-peer type error (#817)

🔧 Alerting

  • Fixed wrong map iteration in machine failure alerts (#988)
  • Fixed inverted storage check condition (#989)
  • Fixed hardcoded balance alert message (#990)

What's Changed

  • Put back into the pool by @snadrus in #827
  • Add storage path RPC endpoints by @strahe in #646
  • boost migration fix utility by @LexLuthr in #832
  • add missing images to docs by @LexLuthr in #838
  • update CI deps by @LexLuthr in #839
  • CI-fvm & sql reentrant by @snadrus in #836
  • feat: wdpost: Add messages to message_waits by @magik6k in #850
  • Fix Market Miner by @snadrus in #861
  • fix raw size by @LexLuthr in #856
  • Mainlining of Fast Snap logic by @snadrus in #804
  • Add PDP sync task by @LexLuthr in #660
  • Remove optimal (no FVM) FFI build by @snadrus in #876
  • FastSnap broke Linux builds by @snadrus in #882
  • Accept Task List by @snadrus in #854
  • harmonydb: fix err return in backoff by @magik6k in #890
  • Use correct shouldCommit value to considerWork immediately when bored by @ZenGround0 in #886
  • fix/dynamic-db-err by @snadrus in #894
  • HarmonyTaskList fixes by @snadrus in #891
  • Basic coverage tool by @snadrus in #862
  • fix: resolve goroutine leak in CommP calculation(main) by @beck-8 in #906
  • chore: make function comment match function name by @tinyfoolish in #800
  • chore: fix spark delete-peer type error by @beck-8 in #817
  • chore(deps): bump github.com/ethereum/go-ethereum from 1.15.0 to 1.16.8 by @dependabot[bot] in #878
  • fix: use build version in /info endpoint by @parkan in #916
  • fix: fix panic in go-fil-commp-hashhash (main) by @beck-8 in #928
  • service needs library path & nvcc missing should err when needed by @snadrus in #935
  • Better overprovisioning distribution by @snadrus in #936
  • Porep undersized by @snadrus in #932
  • Prometheus stuff by @snadrus in #926
  • Precommit batch delays, Stop demanding maxFee in wallets by @snadrus in #941
  • harmonydb is now an official CurioStorage package by @snadrus in #853
  • fix process_piece_deal multiples by @snadrus in #925
  • Error sanitize by @snadrus in #919
  • Curio documentation fixes by @Reiers in #957
  • Add auto-labeling workflow for PDPv0 issues and PRs by @Copilot in #950
  • check all tests in CI by @LexLuthr in #960
  • FFISelect Panic by @snadrus in #962
  • harmonytask: reduce the waay too high storage sched backoff by @magik6k in #966
  • ark-fixes by @snadrus in #917
  • improve canAccept() performance by @LexLuthr in #855
  • Btfp bug fix by @snadrus in #913
  • always fast make-gen by @snadrus in #938
  • Shutdown poller design improvements by @snadrus in #963
  • chore(deps): bump github.com/pion/dtls/v3 from 3.0.7 to 3.1.0 by @dependabot[bot] in #968
  • chore(deps): bump github.com/consensys/gnark-crypto from 0.19.0 to 0.19.1 by @dependabot[bot] in #766
  • Docs – Troubleshooting & Operator Playbooks by @Reiers in #959
  • docs(curio-gui): refresh UI screenshots by @Reiers in #958
  • storage: Don't get full sector list in redeclare for every path by @magik6k in #896
  • feat: webui: Sector page overhaul, SP day-2 ops goodies by @magik6k in #893
  • tests: missed piecereader test by @magik6k in #859
  • fix: expmgr: Fix some checks by @magik6k in #969
  • wdpost exit without error for empty deadline by @LexLuthr in #982
  • webui: extend Chain Connectivity with Lotus net summary by @Reiers in #977
  • Unerring device provisioner by @snadrus in #972
  • chore(deps): bump github.com/pion/dtls/v3 from 3.1.0 to 3.1.1 by @dependabot[bot] in #984
  • fix(alertmanager): iterate mmap instead of tmap for machine failure alerts by @Reiers in #988
  • fix(alertmanager): use configured MinimumWalletBalance in alert message by @Reiers in #990
  • fix(web): correct Susbystems typo in auto-generated config layer by @Reiers in #991
  • fix(rpc): remove superfluous WriteHeader in prometheus service discovery handler by @Reiers in #993
  • fix(alertmanager): correct inverted condition in permanentStorageCheck by @Reiers in #989
  • fix: flaky TestWaitList by @LexLuthr in #999
  • release v1.27.3-rc1 by @LexLuthr in #1000
  • feat(webui): config layer change history with diff view by @Reiers in #994
  • fix(harmonytask): add RAM zero-guard and prevent uint64 underflow in resource accounting by @Reiers in #992
  • FFI no -1 for GPU by @snadrus in #1001
  • no more optimal (no fvm) build by @snadrus in #1004
  • fix(ffi): downgrade reflink fallback log from ERROR to WARN by @Reiers in #1005
  • chore(deps): update IPLD & IPFS dependencies by @rvagg in #1008
  • fix(build): use prebuilt FFI on macOS, remove forced clean on Linux by @Reiers in #1012
  • docs: update snark market build instructions and batch sealing deal guidance by @Reiers in #1011
  • feat(src): copy lotus' fiximports parallelism by @rvagg in #1010
  • refactor Makefile by @LexLuthr in #1014
  • fix(build): accept GCC 12 for CUDA 12 compatibility by @Reiers in #1019
  • docs(supraseal): fix typos in batch sealing page by @Reiers in #1021
  • fix: small correctness and quality improvements by @Reiers in #1020
  • speed up DB itests by @LexLuthr in #1022

Full Changelog: v1.27.2...v1.27.3-rc1