This repository is the deliverable described in `specification.pdf`: an EasyStudy-based study framework extended with an SAE Steering plugin, the surrounding admin/participant UI, and a reproducible audit/export pipeline suitable for research studies.
- What this repository adds to upstream EasyStudy: core deliverables and major additions.
- Evaluation alignment with `specification.pdf`: how the contribution aligns with the proposal's evaluation section.
- Risks from `specification.pdf`: how the contribution addresses the proposal's risks.
- A new study plugin: `server/plugins/steering/` implements the SAE Steering study flow as a first-class EasyStudy plugin (create → initialize → join → dispose → results).
- A typed audit pipeline: steering studies write domain-specific typed tables (`Sae*`) plus a thin `SaeSteeringEvent` envelope for timeline ordering. This enables stable analytics and exports without JSON parsing at read time.
- Multiple steering modalities (FR-06): sliders, toggles, example-based steering, text steering (with explicit composition modes), and a dedicated reset endpoint.
- Iteration updates are end-to-end (FR-10): steering actions propagate through the iteration loop and recompute recommendations. Three reranking strategies (`feature-conditioned`, `latent-perturbation`, `constrained-subset`) are implemented, selectable in the create UI, snapshotted per run, and covered by regression tests.
- Researcher dashboard + exports (FR-16/FR-17): a per-approach results dashboard plus a ZIP CSV export (one file per typed table) designed for downstream analysis.
- Operational features beyond the proposal: per-participant journey view, attention-check evaluation declared in questionnaire HTML and persisted at submit time, and robust deployment bootstrapping of runtime assets from GitHub Releases.
- Upstream parity preserved: upstream EasyStudy plugins (`fastcompare`, `layoutshuffling`, `vae`, `empty_template`) remain loadable via the canonical contract, so researchers see the same plugin matrix as upstream plus the steering plugin.
- Modular layout: platform and steering plugin are split for easier testing, deployment, and extension while keeping EasyStudy's plugin contract.
- Evaluation artefacts: a five-question Likert instrument and a filled-in supplementary in-house run with five colleagues across all three steering modalities (toggle, slider, text) on default settings are committed in `evaluation/`. The primary 200-participant Prolific study (paper in review) is hosted in the private OfflineEasyStudy repository and shared on request.
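
The typed-table-plus-envelope layout and the one-CSV-per-table export listed above can be illustrated with a minimal sketch. It is not the repository's implementation: the `SliderChange` and `SteeringEvent` classes, their fields, the table names, and `export_zip` are hypothetical stand-ins for the `Sae*` tables, the `SaeSteeringEvent` envelope, and the real export code.

```python
import csv
import io
import zipfile
from dataclasses import asdict, dataclass, fields


# Hypothetical typed row; the real Sae* tables and their columns may differ.
@dataclass
class SliderChange:
    event_id: int
    feature_id: int
    value: float


# Hypothetical thin envelope: ordering metadata only, the payload lives
# in the typed table and is joined on event_id.
@dataclass
class SteeringEvent:
    event_id: int
    participant_id: int
    seq: int
    kind: str


def export_zip(tables: dict[str, list]) -> bytes:
    """Write one CSV per table into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, rows in tables.items():
            if not rows:
                continue  # this sketch simply skips empty tables
            out = io.StringIO()
            columns = [f.name for f in fields(rows[0])]
            writer = csv.DictWriter(out, fieldnames=columns)
            writer.writeheader()
            writer.writerows(asdict(row) for row in rows)
            zf.writestr(f"{name}.csv", out.getvalue())
    return buf.getvalue()


events = [SteeringEvent(1, 7, 0, "slider"), SteeringEvent(2, 7, 1, "slider")]
sliders = [SliderChange(1, 42, 0.5), SliderChange(2, 42, 0.8)]
archive = export_zip({"steering_event": events, "slider_change": sliders})
```

Because every column is typed and named up front, a downstream script can load each CSV directly and join on the envelope's identifier, with no JSON decoding step.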
The proposal emphasises case-based validation and limited usability validation (5–10 users) rather than a full comparative study. This repository supports that approach:
- Case-based validation: the steering loop, reranking strategies and audit/export contracts are covered by regression tests in `tests/`.
- Limited usability validation: a short five-question Likert instrument is provided as `evaluation/data/questionnaire.csv` so the team can collect quick feedback from a small group of non-expert users. A filled-in supplementary run with five colleagues across all three steering modalities is committed alongside it in `evaluation/`. The primary evaluation is a separate ≈200-participant study (no-steering vs slider-steering) for a submitted paper; raw data and analysis live in the private offline repository referenced in `tech-docs.md`, Section 3.4.
- Questionnaire infrastructure: the system supports per-approach and final questionnaires as drop-in HTML files under `server/static/questionnairs/`, with attention-check specs declared inline and evaluated at submit time.
- Documentation for reproducibility: the audit schema, algorithms and deployment steps are documented in `docs/` (tech-docs, equations, formative recipes) so future researchers can reproduce runs and extend the framework.
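
As a rough illustration of attention checks that are declared inline in questionnaire HTML and evaluated at submit time, the sketch below reads the expected answer from a form field's attribute and compares it with the submitted value. The `data-attention-expected` attribute, the field names, and `evaluate_attention_checks` are assumptions made for this example; the plugin's actual spec format may differ.

```python
from html.parser import HTMLParser


class AttentionSpecParser(HTMLParser):
    """Collect {field name: expected answer} from annotated form fields."""

    def __init__(self):
        super().__init__()
        self.expected = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # `data-attention-expected` is a hypothetical attribute name.
        if "name" in attributes and "data-attention-expected" in attributes:
            self.expected[attributes["name"]] = attributes["data-attention-expected"]


def evaluate_attention_checks(html: str, answers: dict[str, str]) -> dict[str, bool]:
    """Return pass/fail per attention-check field; a missing answer fails."""
    parser = AttentionSpecParser()
    parser.feed(html)
    return {
        name: answers.get(name) == want
        for name, want in parser.expected.items()
    }


# Hypothetical questionnaire fragment: q1 is a normal item, ac1 is a check.
QUESTIONNAIRE = """
<form>
  <select name="q1"><option>1</option><option>5</option></select>
  <select name="ac1" data-attention-expected="4"><option>4</option></select>
</form>
"""

passed = evaluate_attention_checks(QUESTIONNAIRE, {"q1": "5", "ac1": "4"})
failed = evaluate_attention_checks(QUESTIONNAIRE, {"q1": "5", "ac1": "2"})
```

Keeping the spec in the HTML file means a new questionnaire can be dropped in without touching server code; the result of the comparison is what would be persisted alongside the submission.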
- SAE interpretability: the submitted paper’s results indicate that the interpretability problem is manageable in practice for the slider-steering setup (participants reported a mostly positive experience). The system also supports fallback interaction patterns (example-based steering, feature search).
- Natural-language mapping: treated as an active research question; we continue to refine semantics and composition modes for text steering in follow-on work.
- Steering coherence: conflicts were not observed as a practical issue in the submitted paper; the UI and audit pipeline make it possible to detect and analyse incoherent patterns post-hoc if they occur.
- Performance: the system is operated in controlled participant batches (≈5–50) to manage budget and data quality; this load does not require Redis-backed sessions. The architecture remains swappable if future deployments need it.
- Integration complexity: keeping upstream EasyStudy parity (canonical plugin registry + smoke tests for upstream plugins) reduces integration risk; merging into the original EasyStudy is being discussed with its author.