rail-berkeley · rojas70 · Apr 4, 2025 · Apr 10, 2025 · Apr 10, 2025 · Apr 10, 2025
diff --git a/.gitignore b/.gitignore
@@ -172,3 +172,11 @@ MUJOCO_LOG.TXT
 _METADATA
 checkpoint
 wandb/
+
+# VS Code settings
+*.code-workspace
+
+# checkpoints
+**/checkpoints/
+**/classifier_checkpoints/
+**/classifier_demos/
diff --git a/README.md b/README.md
@@ -1,124 +1,70 @@
-# SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning
+# FractalSERL: Fractal Symmetries for Sample-Efficient Robotic Learning
 
-![](https://github.com/rail-berkeley/serl/workflows/pre-commit/badge.svg)
+[![Discord](https://img.shields.io/discord/1302866684612444190?label=Join%20Us%20on%20Discord&logo=discord&color=7289da)](https://discord.com/invite/bAxjvvJzNM)
+[![Notion](https://img.shields.io/badge/Notion-Workspace-000000?logo=notion&logoColor=white)](https://lipscomb-robotics.notion.site/?source=copy_link)
+[![Paper](https://img.shields.io/badge/Paper-Frontiers-blue?logo=zenodo&logoColor=white)](https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2026.1791812/abstract)
+[![Instagram](https://img.shields.io/badge/Instagram-Follow-E4405F?logo=instagram&logoColor=white)](https://www.instagram.com/lippyrobotics/)
+[![YouTube](https://img.shields.io/badge/YouTube-Channel-FF0000?logo=youtube&logoColor=white)](https://www.youtube.com/@lippyrobotics)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![Static Badge](https://img.shields.io/badge/Project-Page-a)](https://serl-robot.github.io/)
-[![Discord](https://img.shields.io/discord/1302866684612444190?label=Join%20Us%20on%20Discord&logo=discord&color=7289da)](https://discord.gg/G4xPJEhwuC)
 
 
-![](./docs/images/tasks-banner.gif)
+Short description
+-----------------
 
-**Webpage: [https://serl-robot.github.io/](https://serl-robot.github.io/)**
+FractalSERL implements Branched Euclidean Group Fractal Symmetries, a trajectory-level augmentation framework that accelerates policy learning by iteratively applying affine and Euclidean-group transformations to episodic trajectories. By treating an episodic MDP as a tree of state–action pairs, the framework uses self-similar branching to produce fractal symmetry expansions that populate replay buffers with diverse, consistent experiences. We demonstrate improvements on simulated and real Franka manipulation tasks, achieving robust policies in as little as 14 minutes (avg. ~20 minutes) of wall-clock training.
 
+Contributions in this repo include:
+- **SymmGrid Framework**: A preliminary research implementation of fractal symmetry for deep reinforcement learning (DRL), demonstrating how branched symmetries accelerate DRL policy learning on physical robots.
+- **Data Augmentation via Super-Scaling**: Efficient robot data generation through trajectory-level augmentation that significantly speeds up policy learning while improving performance and consistency on physical hardware.
+- **Fractal Symmetry Replay Buffer**: An optimized datastore and replay-buffer implementation that supports parallelized computation and image handling without excessive memory overhead, enabling faster training iterations.
+- **nAUC Performance Metric**: Normalized area under the curve (nAUC), used as a trajectory-wide performance metric that captures the combined contributions of sample efficiency and policy performance throughout training.
 
-**Also check out our new project HIL-SERL: [https://hil-serl.github.io/](https://hil-serl.github.io/)**
 
+<figure>
+      <img src="docs/images/cable-success-rate-with-robot.png" width="100%">
+</figure>
 
-SERL provides a set of libraries, env wrappers, and examples to train RL policies for robotic manipulation tasks. The following sections describe how to use SERL. We will illustrate the usage with examples.
+<!-- <figure>
+      <img src="docs/images/fetch_rewards_function_grid_size.png" alt="SymmGrid returns as a function of grid size" width="100%">
+</figure>
 
-🎬: [SERL video](https://www.youtube.com/watch?v=Um4CjBmHdcw), [additional video](https://www.youtube.com/watch?v=17NrtKHdPDw) on sample efficient RL.
+<figure>
+      <img src="docs/images/fractal_grid.png" alt="Fractal grid enclosed by the blue blobs" width="100%">
+</figure> -->
 
-**Table of Contents**
-- [SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning](#serl-a-software-suite-for-sample-efficient-robotic-reinforcement-learning)
-  - [Installation](#installation)
-  - [Overview and Code Structure](#overview-and-code-structure)
-  - [Quick Start with SERL in Sim](#quick-start-with-serl-in-sim)
-  - [Run with Franka Arm on Real Robot](#run-with-franka-arm-on-real-robot)
-  - [Contribution](#contribution)
-  - [Citation](#citation)
+Navigation
+----------
 
-## Major updates
-#### June 24, 2024
-For people who use SERL for tasks involving controlling the gripper (e.g.,pick up objects), we strong recommend adding a small penalty to the gripper action change, as it will greatly improves the training speed.
-For detail, please refer to: [PR #65](https://github.com/rail-berkeley/serl/pull/65).
+The `docs/` folder contains additional Markdown files with step-by-step guides. Quick links are provided below:
 
+- [Overview of code structure](docs/overview.md)
+- [Installation guide](docs/installation.md)
+- [Run in simulation](docs/run_sim.md)
+- [Run on the real robot](docs/run_realrobot.md)
 
-Further, we also recommend  providing interventions online during training in addition to loading the offline demos. If you have a Franka robot and SpaceMouse, this can be as easy as just touching the SpaceMouse during training.
 
-#### April 25, 2024
-We fixed a major issue in the intervention action frame. See release [v0.1.1](https://github.com/rail-berkeley/serl/releases/tag/v0.1.1) Please update your code with the main branch.
+Quick start (very short)
+------------------------
 
-## Installation
-1. **Setup Conda Environment:**
-    create an environment with
-    ```bash
-    conda create -n serl python=3.10
-    ```
+1. Install dependencies: see `docs/installation.md`.
+2. Run a demo in simulation: see `docs/run_sim.md` for instructions on launching `franka_sim`.
+3. For real hardware, follow the instructions in `docs/run_realrobot.md` and configure the relevant files under `serl_robot_infra/`.
 
-2. **Install Jax as follows:**
-    - For CPU (not recommended):
-        ```bash
-        pip install --upgrade "jax[cpu]"
-        ```
+Citation
+--------
 
-    - For GPU:
-        ```bash
-        pip install --upgrade "jax[cuda12_pip]==0.4.35" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
-        ```
-
-    - For TPU
-        ```bash
-        pip install --upgrade "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
-        ```
-    - See the [Jax Github page](https://github.com/google/jax) for more details on installing Jax.
-
-3. **Install the serl_launcher**
-    ```bash
-    cd serl_launcher
-    pip install -e .
-    pip install -r requirements.txt
-    ```
-
-## Overview and Code Structure
-
-SERL provides a set of common libraries for users to train RL policies for robotic manipulation tasks. The main structure of running the RL experiments involves having an actor node and a learner node, both of which interact with the robot gym environment. Both nodes run asynchronously, with data being sent from the actor to the learner node via the network using [agentlace](https://github.com/youliangtan/agentlace). The learner will periodically synchronize the policy with the actor. This design provides flexibility for parallel training and inference.
-
-<p align="center">
-  <img src="./docs/images/software_design.png" width="80%"/>
-</p>
-
-**Table for code structure**
-
-| Code Directory | Description |
-| --- | --- |
-| [serl_launcher](https://github.com/rail-berkeley/serl/blob/main/serl_launcher) | Main code for SERL |
-| [serl_launcher.agents](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/agents/) | Agent Policies (e.g. DRQ, SAC, BC) |
-| [serl_launcher.wrappers](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/wrappers) | Gym env wrappers |
-| [serl_launcher.data](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/data) | Replay buffer and data store |
-| [serl_launcher.vision](https://github.com/rail-berkeley/serl/blob/main/serl_launcher/serl_launcher/vision) | Vision related models and utils |
-| [franka_sim](./franka_sim) | Franka mujoco simulation gym environment |
-| [serl_robot_infra](./serl_robot_infra/) | Robot infra for running with real robots |
-| [serl_robot_infra.robot_servers](https://github.com/rail-berkeley/serl/blob/main/serl_robot_infra/robot_servers/) | Flask server for sending commands to robot via ROS |
-| [serl_robot_infra.franka_env](https://github.com/rail-berkeley/serl/blob/main/serl_robot_infra/franka_env/) | Gym env for real franka robot |
-
-## Quick Start with SERL in Sim
-
-We provide a simulated environment for trying out SERL with a franka robot.
-
-Check out the [Quick Start with SERL in Sim](/docs/sim_quick_start.md)
- - [Training from state observation example](/docs/sim_quick_start.md#1-training-from-state-observation-example)
- - [Training from image observation example](/docs/sim_quick_start.md#2-training-from-image-observation-example)
- - [Training from image observation with 20 demo trajectories example](/docs/sim_quick_start.md#3-training-from-image-observation-with-20-demo-trajectories-example)
-
-## Run with Franka Arm on Real Robot
-
-We provide a step-by-step guide to run RL policies with SERL on the real Franka robot.
-
-Check out the [Run with Franka Arm on Real Robot](/docs/real_franka.md)
- - [Peg Insertion 📍](/docs/real_franka.md#1-peg-insertion-📍)
- - [PCB Component Insertion 🖥️](/docs/real_franka.md#2-pcb-component-insertion-🖥️)
- - [Cable Routing 🔌](/docs/real_franka.md#3-cable-routing-🔌)
- - [Object Relocation 🗑️](/docs/real_franka.md#4-object-relocation-🗑️)
-
-## Contribution
-
-We welcome contributions to this repository! Fork and submit a PR if you have any improvements to the codebase. Before submitting a PR, please run `pre-commit run --all-files` to ensure that the codebase is formatted correctly.
-
-## Citation
-
-If you use this code for your research, please cite our paper:
+If you use FractalSERL in your research, please cite our paper:
 
 ```bibtex
+@misc{vanderstelt2026SymmGrid,
+      title={Towards Accelerating Deep Reinforcement Learning via Branched Symmetries},
+      author={Ryan Vanderstelt and Cleiver Ruiz Martinez and Caeden Rosen and Blake Hull and Juan Rojas},
+      year={2026},
+      eprint={____},
+      archivePrefix={arXiv},
+      primaryClass={cs.RO}
+}
+
 @misc{luo2024serl,
       title={SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning},
       author={Jianlan Luo and Zheyuan Hu and Charles Xu and You Liang Tan and Jacob Berg and Archit Sharma and Stefan Schaal and Chelsea Finn and Abhishek Gupta and Sergey Levine},
@@ -127,4 +73,4 @@ If you use this code for your research, please cite our paper:
       archivePrefix={arXiv},
       primaryClass={cs.RO}
 }
-```
+```
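The README describes branched Euclidean-group augmentation only at a high level. The following is a minimal sketch of the general idea, not the FractalSERL implementation. It assumes observations whose first three entries are an end-effector xyz position and actions whose first three entries are an xyz displacement, and it restricts the group to planar translations plus rotations about the vertical axis. Rewards would be reused unchanged, which only holds if the task is invariant under the same transform (for example, if goal positions are transformed too). The names `rot_z`, `transform_trajectory`, and `branch_trajectory` are hypothetical.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """3x3 rotation matrix about the vertical (z) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def transform_trajectory(obs, acts, R, t):
    """Apply the rigid transform x -> R @ x + t to one trajectory.

    Assumption: obs[:, :3] is an xyz position (rotated and translated) and
    acts[:, :3] is an xyz displacement (rotated only, because a translation
    does not change the difference between two positions).
    """
    obs = np.array(obs, dtype=np.float64, copy=True)
    acts = np.array(acts, dtype=np.float64, copy=True)
    obs[:, :3] = obs[:, :3] @ R.T + t   # row-vector form of R @ x + t
    acts[:, :3] = acts[:, :3] @ R.T
    return obs, acts


def branch_trajectory(obs, acts, offsets, angles, depth=1):
    """Hypothetical branched expansion: at each level, every (angle, offset)
    transform is applied to every trajectory produced so far, so the count
    grows as (1 + len(offsets) * len(angles)) ** depth."""
    trajs = [(np.asarray(obs, dtype=np.float64), np.asarray(acts, dtype=np.float64))]
    for _ in range(depth):
        new = []
        for o, a in trajs:
            for theta in angles:
                R = rot_z(theta)
                for dx, dy in offsets:
                    new.append(transform_trajectory(o, a, R, np.array([dx, dy, 0.0])))
        trajs = trajs + new
    return trajs
```

Rotating the displacement actions along with the positions keeps each augmented transition self-consistent: `obs[t+1] - obs[t]` still equals the transformed action, which is the property a replay buffer needs for the copies to look like plausible experience.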
diff --git a/demos/demos/__init__.py b/demos/demos/__init__.py
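The README names nAUC as a metric but does not define the normalization. One plausible reading, assumed here rather than taken from the paper, is the trapezoidal area under a learning curve (for example, success rate versus environment steps) divided by the area of the rectangle spanned by the evaluated step range and the maximum attainable score. The function name `nauc` and its `max_score` parameter are hypothetical.

```python
import numpy as np


def nauc(steps, scores, max_score=1.0):
    """Normalized area under a learning curve (hypothetical definition).

    steps:  strictly increasing evaluation points (e.g. env steps or minutes)
    scores: metric at each point (e.g. success rate in [0, max_score])
    Returns trapezoidal area / ((steps[-1] - steps[0]) * max_score).
    """
    steps = np.asarray(steps, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    if steps.ndim != 1 or steps.shape != scores.shape or steps.size < 2:
        raise ValueError("steps and scores must be 1-D arrays of equal length >= 2")
    if np.any(np.diff(steps) <= 0):
        raise ValueError("steps must be strictly increasing")
    # Trapezoidal rule written out explicitly; np.trapz is deprecated in
    # NumPy 2.0 in favor of np.trapezoid.
    area = np.sum(0.5 * (scores[1:] + scores[:-1]) * np.diff(steps))
    return float(area / ((steps[-1] - steps[0]) * max_score))
```

Because the whole curve is integrated, two runs that reach the same final success rate score differently if one gets there earlier; for example, `nauc([0, 10, 20], [0.0, 1.0, 1.0])` gives 0.75, while a run that is already at 1.0 from the first evaluation gives 1.0.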
diff --git a/demos/demos/demoHandling.py b/demos/demos/demoHandling.py
@@ -0,0 +1,157 @@
+import os
+from pathlib import Path
+import numpy as np
+from agentlace.data.data_store import QueuedDataStore
+
+class DemoHandling:
+    """
+    Loads an .npz file containing demonstration data into a data object.
+    This class is designed to work with Gymnasium-style demonstration data
+    and is intended to be used with a QueuedDataStore or similar data store.
+
+    The .npz file should contain the following arrays:
+      - 'obs'         : shape (N, T+1, *obs_shape*), list of observations
+      - 'acs'         : shape (N, T, *act_shape*),   list of actions
+      - 'rewards'     : shape (N, T),                list of rewards
+      - 'terminateds' : shape (N, T),                list of terminated flags
+      - 'truncateds'  : shape (N, T),                list of truncated flags
+      - 'info'        : shape (N, T),                list of info dicts
+      - 'dones'       : shape (N, T),                list of done flags (if available)
+
+    Parameters
+    ----------
+    demo_dir : str
+        Directory where demo .npz files live by default.
+    file_name : str
+        Name of the demo file to load. If not provided, a default will be used.
+    """
+    def __init__(
+        self,
+        demo_dir: str = '/data/data/serl/demos',
+        file_name: str = 'data_franka_reach_random_20.npz'
+    ):
+
+        self.debug = False  # Set to True for debugging purposes
+        self.demo_dir = demo_dir
+        self.transition_ctr = 0  # Global counter for transitions across all episodes
+
+        # Load the demo data from the .npz file 
+
+        # Check if the demo directory exists
+        if not os.path.exists(self.demo_dir):
+            raise FileNotFoundError(f"Demo directory '{self.demo_dir}' does not exist.")
+
+        # Construct the full path to the demo file
+        self.demo_npz_path = os.path.join(self.demo_dir, file_name)
+        if not os.path.isfile(self.demo_npz_path):
+            raise FileNotFoundError(f"Demo file '{self.demo_npz_path}' does not exist.")
+
+        # Load the .npz file
+        self.data = np.load(self.demo_npz_path, allow_pickle=True)
+
+    def get_num_transitions(self):
+        """
+        Returns the total number of transitions counted in the demo data.
+        """
+        return int(self.data["transition_ctr"]) if "transition_ctr" in self.data else 0
+
+    def get_num_demos(self):
+        """
+        Returns the total number of demonstrations in the demo data.
+        """
+        return int(self.data["num_demos"]) if "num_demos" in self.data else 0
+
+    def insert_data_to_buffer(self, data_store: QueuedDataStore):
+        """
+        Load a raw Gymnasium-style .npz of expert episodes into data_store.
+        The .npz file must contain 'obs', 'acs', 'rewards', 'terminateds',
+        'truncateds', 'info', 'num_demos', 'transition_ctr', and optionally 'dones'.
+        Each episode is processed, and its transitions are inserted into data_store.
+        Inserted transitions in data store will remain in the data_store as pointers.
+
+        ***Note***
+        Need to insert obs and acs in the same way as async_sac_state via jax
+
+        Parameters
+        ----------
+        data_store : QueuedDataStore    
+
+        Returns
+        -------
+        None
+        """
+
+        obs_buffer   = self.data['obs']         # shape (N, T+1, ...)
+        act_buffer   = self.data['acs']         # shape (N, T,   ...)
+        rew_buffer   = self.data['rewards']     # shape (N, T)
+        term_buffer  = self.data['terminateds'] # shape (N, T)
+        trunc_buffer = self.data['truncateds']  # shape (N, T)
+        info_buffer  = self.data['info']        # shape (N, T)
+        done_buffer  = self.data['dones'] if 'dones' in self.data.files else term_buffer  # 'dones' is optional; terminated|truncated is OR-ed in below
+
+        num_demos = self.get_num_demos()
+        if num_demos == 0:
+            raise ValueError("No demonstrations found in the provided .npz file.")
+
+        num_transitions = self.get_num_transitions()
+        if num_transitions == 0:
+            raise ValueError("No transitions found in the provided .npz file.")
+
+
+        # Iterate over episodes and insert each transition
+        for ep in range(num_demos):
+            ep_obs   = obs_buffer[ep]
+            ep_acts  = act_buffer[ep]
+            ep_rews  = rew_buffer[ep]
+            ep_terms = term_buffer[ep]
+            ep_trunc = trunc_buffer[ep]
+            ep_done  = done_buffer[ep]
+            ep_info  = info_buffer[ep]
+
+            T = len(ep_acts)
+            for t in range(T):
+                obs_t       = np.asarray(ep_obs[t], dtype=np.float32)
+                next_obs_t  = np.asarray(ep_obs[t+1], dtype=np.float32)
+                a_t         = np.asarray(ep_acts[t], dtype=np.float32)
+                r_t         = float(ep_rews[t])
+                done_t      = bool(ep_done[t] or ep_terms[t] or ep_trunc[t])
+                #info_t     = ep_info[t]
+                # masks will be created right before insert below
+
+                if self.debug:
+                    np.set_printoptions(precision=3, suppress=True)
+
+                    print(f"Demo {ep:2}, Step {t:3} \n "
+                        f"Obs: [{obs_t[0]:.2f} {obs_t[1]:.2f} {obs_t[2]:.2f}] \n "
+                        f"Action: [{a_t[0]:.2f} {a_t[1]:.2f} {a_t[2]:.2f}] \n "
+                        f"Reward: {r_t:.2f} \n "
+                        f"Done: {done_t}")
+
+                # Insert using SERL's data_store/ReplayBuffer insert mechanism directly.
+                data_store.insert(
+                    dict(
+                        observations     =obs_t,
+                        actions          =a_t,
+                        next_observations=next_obs_t,
+                        rewards          =r_t,
+                        masks            =1.0 - done_t,
+                        dones            =done_t
+                    )
+                )
+
+        print(f"Loaded {num_transitions} transitions from {num_demos} episodes in '{self.demo_npz_path}'.")
+
+
+# if __name__ == "__main__":
+#     # Instantiate a DemoHandling object
+#     handler = DemoHandling(demo_dir='/data/data/serl/demos',
+#                            file_name='data_franka_reach_random_20.npz')
+
+#     # Identify the total number of transitions in the datastore
+#     print(f'We have {handler.data["transition_ctr"]} transitions in the datastore.')
+
+#     # Simulate SERL's datastore creation w/ capacity 2000
+#     ds = QueuedDataStore(2000)
+
+#     # Insert the demo data into the datastore
+#     handler.insert_data_to_buffer(ds)
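`DemoHandling` expects a specific .npz layout, including the scalar entries `num_demos` and `transition_ctr`, but the diff does not show how such a file is produced. The following is a minimal sketch, assuming episodes may differ in length, so per-episode arrays are stored as NumPy object arrays (consistent with the loader's `allow_pickle=True`). The helpers `save_demos` and `iter_transitions` are hypothetical; `iter_transitions` mirrors the `(obs[t], obs[t+1])` pairing and dict keys used by `insert_data_to_buffer` without requiring agentlace.

```python
import numpy as np


def _object_array(items):
    """Pack per-episode arrays into a 1-D object array (handles ragged lengths)."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item
    return out


def save_demos(path, episodes):
    """Write episodes in the layout DemoHandling reads.

    Each episode is a dict with 'obs' (T+1, ...), 'acs' (T, ...), 'rewards' (T,),
    'terminateds' (T,), 'truncateds' (T,) and 'info' (list of T dicts).
    """
    keys = ["obs", "acs", "rewards", "terminateds", "truncateds", "info"]
    arrays = {k: _object_array([ep[k] for ep in episodes]) for k in keys}
    # 'dones' is optional for the loader; here it is terminated OR truncated.
    arrays["dones"] = _object_array(
        [np.logical_or(ep["terminateds"], ep["truncateds"]) for ep in episodes]
    )
    arrays["num_demos"] = np.array(len(episodes))
    arrays["transition_ctr"] = np.array(sum(len(ep["acs"]) for ep in episodes))
    np.savez(path, **arrays)


def iter_transitions(path):
    """Yield transition dicts with the same pairing as insert_data_to_buffer."""
    with np.load(path, allow_pickle=True) as npz:
        data = {k: npz[k] for k in npz.files}  # read each array from the archive once
    dones = data.get("dones", data["terminateds"])
    for ep in range(int(data["num_demos"])):
        obs, acts = data["obs"][ep], data["acs"][ep]
        for t in range(len(acts)):
            done = bool(dones[ep][t] or data["terminateds"][ep][t] or data["truncateds"][ep][t])
            yield dict(
                observations=np.asarray(obs[t], dtype=np.float32),
                actions=np.asarray(acts[t], dtype=np.float32),
                next_observations=np.asarray(obs[t + 1], dtype=np.float32),
                rewards=float(data["rewards"][ep][t]),
                masks=1.0 - done,
                dones=done,
            )
```

Writing `num_demos` and `transition_ctr` explicitly matters because `get_num_demos()` and `get_num_transitions()` return 0 when those keys are missing, which makes `insert_data_to_buffer` raise a `ValueError` even if the episode arrays are present.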