Actuated Pendulum with Propeller

A robot designed for reinforcement learning and control experiments with real hardware.

Repository Map

train.py: training script for the robot.
hardware/: 3D-printable models for the structure.
firmware/: Arduino code for the ESP32 microcontroller.
test_robot.py: testing script for the robot.
env.py: Gymnasium environment definition.
wrappers.py: environment wrappers.

Getting Started

Hardware

Bill of Materials

| Item | Description |
| --- | --- |
| ESP32 (Dev Kit C) | Microcontroller. Several board sizes are available on the market. We use the 25.70 × 53.40 mm version, but you may need to adapt the design for your board. |
| L9110H + propeller kit | Motor, propeller, and controller. |
| AS5600 | Rotary encoder. |
| 3D-printed structure | Printed support structure. |
| 2 × M3 locking nuts | Fasteners. |
| 2 × M3×25 screws | Fasteners. |
| 623z | Bearing. |
| Flexible 4-wire cable | Electrical connection. |
| Wooden or cardboard base | Base, approximately 120 × 100 mm. |
| Counterweight | Helps the actuator lift the pendulum, but must be light enough that the pendulum still falls when released. We used an M6 bolt, washer, and nut. |

Assembly

Connect the VCC pins of the L9110H and AS5600 to the 3.3 V pin on the ESP32.
Connect all GND pins together, and w
Connect the signal pins to the appropriate GPIOs on the ESP32 (as specified in firmware.ino).
Glue the encoder magnet to the end of the screw that acts as the shaft.
Some boards may require a small amount of glue to remain securely in place.

Firmware

Upload the firmware to the ESP32 using the Arduino IDE.

Usage

Install the required Python packages:
```
pip install -r requirements.txt
```
Verify that everything is working by running:
```
python test_robot.py
```
The robot should move in a somewhat random manner.
Start training with:
```
python train.py
```

Reinforcement Learning

History Wrapper

A history wrapper keeps the most recent observations and actions taken by the agent and exposes them together as the observation. This gives the agent short-term memory and context, which approximately restores the Markov property of the environment. It is necessary because certain state variables (such as $\dot{\theta}$ or the propeller speed) cannot be observed from a single timestep.
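The idea can be sketched without any RL library. The `HistoryBuffer` class below is a hypothetical illustration, not the code in wrappers.py: it assumes flat list observations and actions, and pads the history with zeros at the start of an episode.

```python
from collections import deque


class HistoryBuffer:
    """Hypothetical helper: keeps the last `length` (observation, action) frames."""

    def __init__(self, length, obs_size, act_size):
        self.obs_size = obs_size
        self.act_size = act_size
        # Pre-fill with zeros so the stacked vector has a fixed size from step 0.
        zero_frame = [0.0] * (obs_size + act_size)
        self.frames = deque([list(zero_frame) for _ in range(length)], maxlen=length)

    def reset(self, first_obs):
        """Clear the history, then store the first observation with a zero action."""
        zero_frame = [0.0] * (self.obs_size + self.act_size)
        for _ in range(self.frames.maxlen):
            self.frames.append(list(zero_frame))
        return self.push(first_obs, [0.0] * self.act_size)

    def push(self, obs, action):
        """Append the newest frame (dropping the oldest) and return the stacked vector."""
        self.frames.append(list(obs) + list(action))
        return [value for frame in self.frames for value in frame]


# Usage: with angle-only observations, consecutive frames let the agent
# estimate the angular velocity by finite differences.
buffer = HistoryBuffer(length=3, obs_size=1, act_size=1)
stacked = buffer.reset([0.10])
stacked = buffer.push([0.15], [0.5])
```

Each frame pairs an observation with the action that preceded it, so the stacked vector always has `length * (obs_size + act_size)` entries regardless of how many steps have elapsed.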

State-Space Representation

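If we assume, as a first approximation, that the propeller force $F(u)$ is directly controllable through the input $u$ (not entirely realistic, since the propeller speed has its own dynamics), the system can be represented by the state vector: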

$$ \mathbf{x} = \begin{bmatrix}\theta \\ \dot\theta\end{bmatrix} $$

Dynamics:

$$ \dot{\mathbf{x}} = \begin{bmatrix} \dot\theta \\ \displaystyle \frac{1}{J}\Big( l F(u) - m g l \sin\theta - b\dot\theta \Big) \end{bmatrix} $$
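A minimal simulation sketch of these dynamics follows, reading the propeller torque as $l F(u)$. All numeric parameter values are hypothetical placeholders chosen for illustration, not measurements of this robot, and the explicit Euler integrator is just the simplest choice.

```python
import math

# Hypothetical parameter values, for illustration only (not measured).
J = 2.0e-4     # moment of inertia about the pivot [kg m^2]
L_ARM = 0.10   # lever arm l [m]
M = 0.02       # mass m [kg]
G = 9.81       # gravitational acceleration g [m/s^2]
B = 1.0e-4     # viscous friction coefficient b [N m s/rad]


def dynamics(theta, theta_dot, force):
    """Return (d theta/dt, d theta_dot/dt) for a given propeller force [N]."""
    torque = L_ARM * force - M * G * L_ARM * math.sin(theta) - B * theta_dot
    return theta_dot, torque / J


def euler_step(theta, theta_dot, force, dt=0.001):
    """Advance the state by one explicit Euler step of length dt [s]."""
    d_theta, d_theta_dot = dynamics(theta, theta_dot, force)
    return theta + dt * d_theta, theta_dot + dt * d_theta_dot


# Usage: release the pendulum from 0.5 rad with the propeller off.
theta, theta_dot = 0.5, 0.0
for _ in range(100):
    theta, theta_dot = euler_step(theta, theta_dot, force=0.0)
```

At rest, the force that holds the pendulum at an angle $\theta$ is $F = m g \sin\theta$, which makes the second row of the dynamics vanish.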

Empirical measurements of F(u)

| u [V] | F [N] | I [A] |
| --- | --- | --- |
| 8.4 | 0.111 | 0.340 |
| 7.3 | 0.087 | - |
| 6.0 | 0.066 | 0.222 |
| 5.0 | 0.046 | 0.170 |
| 4.0 | 0.031 | 0.120 |
| 2.8 | 0.016 | - |
| 0.0 | 0.000 | 0.000 |
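One way to use these measurements in a simulation is to interpolate between them. The sketch below is an assumption about how the table might be used, not code from the repository: the `thrust` function is hypothetical, uses piecewise-linear interpolation, and clamps outside the measured range.

```python
from bisect import bisect_right

# Measured (u [V], F [N]) pairs from the table above, sorted by voltage.
MEASURED = [
    (0.0, 0.000),
    (2.8, 0.016),
    (4.0, 0.031),
    (5.0, 0.046),
    (6.0, 0.066),
    (7.3, 0.087),
    (8.4, 0.111),
]


def thrust(u):
    """Linearly interpolate F(u) between measured points; clamp outside the range."""
    voltages = [v for v, _ in MEASURED]
    if u <= voltages[0]:
        return MEASURED[0][1]
    if u >= voltages[-1]:
        return MEASURED[-1][1]
    # Index of the first measured voltage strictly greater than u.
    i = bisect_right(voltages, u)
    (u0, f0), (u1, f1) = MEASURED[i - 1], MEASURED[i]
    return f0 + (f1 - f0) * (u - u0) / (u1 - u0)


# Usage: estimate the force halfway between the 5.0 V and 6.0 V measurements.
force_at_5_5 = thrust(5.5)
```

A smooth fit (for example a low-order polynomial) would be an alternative if a differentiable model is needed; linear interpolation simply avoids assuming a functional form.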
