Contribution Summary

This repository is the deliverable described in specification.pdf: an EasyStudy-based study framework extended with an SAE Steering plugin, the surrounding admin/participant UI, and a reproducible audit/export pipeline suitable for research studies.

Contents

  1. What this repository adds to upstream EasyStudy
    Core deliverables and major additions.
  2. Evaluation alignment with specification.pdf
    How the contribution aligns with the proposal's evaluation section.
  3. Risks from specification.pdf
    How the contribution addresses the proposal's risks.

What this repository adds to upstream EasyStudy

  • A new study plugin: server/plugins/steering/ implements the SAE Steering study flow as a first-class EasyStudy plugin (create → initialize → join → dispose → results).
  • A typed audit pipeline: steering studies write domain-specific typed tables (Sae*) plus a thin SaeSteeringEvent envelope for timeline ordering. This enables stable analytics and exports without JSON parsing at read time.
  • Multiple steering modalities (FR-06): sliders, toggles, example-based steering, text steering (with explicit composition modes), and a dedicated reset endpoint.
  • Iteration updates are end-to-end (FR-10): steering actions propagate through the iteration loop and recompute recommendations. Three reranking strategies (feature-conditioned, latent-perturbation, constrained-subset) are implemented, selectable in the create UI, snapshotted per run, and covered by regression tests.
  • Researcher dashboard + exports (FR-16/FR-17): a per-approach results dashboard plus a ZIP CSV export (one file per typed table) designed for downstream analysis.
  • Operational features beyond the proposal: a per-participant journey view, attention-check evaluation declared in questionnaire HTML and persisted at submit time, and deployment bootstrapping that fetches runtime assets from GitHub Releases.
  • Upstream parity preserved: upstream EasyStudy plugins (fastcompare, layoutshuffling, vae, empty_template) remain loadable via the canonical contract so researchers see the same plugin matrix as upstream plus the steering plugin.
  • Modular layout: platform and steering plugin are split for easier testing, deployment, and extension while keeping EasyStudy's plugin contract.
  • Evaluation artefacts: evaluation/ contains a five-question Likert instrument and the completed responses from a supplementary in-house run, in which five colleagues used the toggle, slider, and text steering modalities on default settings. The primary ≈200-participant Prolific study (paper in review) is hosted in the private OfflineEasyStudy repository and shared on request.
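
To illustrate the typed-table-plus-envelope idea and the one-CSV-per-table ZIP export described in the list above, here is a minimal sketch using only the Python standard library. The table names (sae_slider_change, sae_steering_event), their columns, and both helper functions are hypothetical and chosen for the example; the repository's actual Sae* tables and export code may be structured differently.

```python
import csv
import io
import sqlite3
import zipfile

# Hypothetical schema: one typed table per action kind, plus a thin envelope
# that only records timeline order and points at the typed row.
SCHEMA = """
CREATE TABLE sae_slider_change (
    id INTEGER PRIMARY KEY,
    participation_id INTEGER NOT NULL,
    feature_id INTEGER NOT NULL,
    value REAL NOT NULL
);
CREATE TABLE sae_steering_event (
    id INTEGER PRIMARY KEY,
    participation_id INTEGER NOT NULL,
    seq INTEGER NOT NULL,
    kind TEXT NOT NULL,
    ref_id INTEGER NOT NULL
);
"""
EXPORT_TABLES = ("sae_slider_change", "sae_steering_event")


def record_slider_change(conn, participation_id, feature_id, value):
    """Insert the typed row, then an envelope row that fixes its timeline position."""
    cur = conn.execute(
        "INSERT INTO sae_slider_change (participation_id, feature_id, value) "
        "VALUES (?, ?, ?)",
        (participation_id, feature_id, value),
    )
    ref_id = cur.lastrowid
    (seq,) = conn.execute(
        "SELECT COALESCE(MAX(seq), 0) + 1 FROM sae_steering_event "
        "WHERE participation_id = ?",
        (participation_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO sae_steering_event (participation_id, seq, kind, ref_id) "
        "VALUES (?, ?, ?, ?)",
        (participation_id, seq, "slider_change", ref_id),
    )
    return ref_id


def export_zip(conn):
    """Return ZIP bytes containing one CSV file per table in EXPORT_TABLES."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for table in EXPORT_TABLES:
            cur = conn.execute(f"SELECT * FROM {table} ORDER BY id")
            out = io.StringIO()
            writer = csv.writer(out)
            writer.writerow([col[0] for col in cur.description])  # header row
            writer.writerows(cur)
            zf.writestr(f"{table}.csv", out.getvalue())
    return buf.getvalue()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_slider_change(conn, participation_id=1, feature_id=42, value=0.8)
record_slider_change(conn, participation_id=1, feature_id=7, value=-0.3)
archive = export_zip(conn)
```

Because every column is typed at write time, the export is a plain SELECT per table and downstream analysis never has to parse JSON payloads; the envelope's seq column is enough to reconstruct a participant's timeline across tables.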

Evaluation alignment with specification.pdf (Section “Evaluation”)

The proposal emphasises case-based validation and limited usability validation (5–10 users) rather than a full comparative study. This repository supports that approach:

  • Case-based validation: the steering loop, reranking strategies and audit/export contracts are covered by regression tests in tests/.
  • Limited usability validation: a short five-question Likert instrument is provided as evaluation/data/questionnaire.csv so the team can collect quick feedback from a small group of non-expert users. The completed responses from a supplementary run with five colleagues, covering the toggle, slider, and text steering modalities, are committed alongside it in evaluation/. The primary evaluation is a separate ≈200-participant study (no-steering vs slider-steering) for a submitted paper; its raw data and analysis live in the private offline repository referenced in tech-docs.md Section 3.4.
  • Questionnaire infrastructure: the system supports per-approach and final questionnaires as drop-in HTML files under server/static/questionnairs/, with attention-check specs declared inline and evaluated at submit time.
  • Documentation for reproducibility: the audit schema, algorithms and deployment steps are documented in docs/ (tech-docs, equations, formative recipes) so future researchers can reproduce runs and extend the framework.
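
As a rough illustration of attention checks that are declared inline in questionnaire HTML and evaluated at submit time, the sketch below reads the expected answer from a data attribute and compares it with the submitted form values. The attribute name data-attention-expected, the field names, and both functions are assumptions made for this example; the declaration format actually used by the questionnaire files is not shown in this summary and may differ.

```python
from html.parser import HTMLParser


class AttentionSpecParser(HTMLParser):
    """Collect {field name: expected answer} from fields carrying the
    hypothetical data-attention-expected attribute."""

    def __init__(self):
        super().__init__()
        self.expected = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if (
            tag in ("input", "select")
            and "name" in attr
            and "data-attention-expected" in attr
        ):
            self.expected[attr["name"]] = attr["data-attention-expected"]


def parse_attention_spec(html):
    """Return the attention-check spec declared in a questionnaire's HTML."""
    parser = AttentionSpecParser()
    parser.feed(html)
    parser.close()
    return parser.expected


def evaluate_attention_checks(html, answers):
    """Return {field name: passed} for each declared check.

    A missing answer counts as a failed check.
    """
    spec = parse_attention_spec(html)
    return {name: answers.get(name) == expected for name, expected in spec.items()}


QUESTIONNAIRE = """
<form>
  <select name="q3" data-attention-expected="4">
    <option value="1">1</option>
    <option value="4">4</option>
  </select>
  <input type="text" name="comment">
</form>
"""

result = evaluate_attention_checks(QUESTIONNAIRE, {"q3": "4", "comment": "ok"})
```

Keeping the expected answer next to the question means a new questionnaire file needs no server-side change, and the per-check result can be stored with the submission rather than recomputed during analysis.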

Risks from specification.pdf (Section “Risks”)

  • SAE interpretability: the submitted paper’s results indicate that the interpretability problem is manageable in practice for the slider-steering setup (participants reported a mostly positive experience). The system also supports fallback interaction patterns (example-based steering, feature search).
  • Natural-language mapping: treated as an active research question; we continue to refine semantics and composition modes for text steering in follow-on work.
  • Steering coherence: conflicts were not observed as a practical issue in the submitted paper; the UI and audit pipeline make it possible to detect and analyse incoherent patterns post-hoc if they occur.
  • Performance: the system is operated in controlled participant batches (≈5–50) to manage budget and data quality; this load does not require Redis-backed sessions. The architecture remains swappable if future deployments need it.
  • Integration complexity: keeping parity with upstream EasyStudy (a canonical plugin registry plus smoke tests for the upstream plugins) reduces integration risk; merging this work into upstream EasyStudy is being discussed with its author.