Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion

Chi Zhang, Mingrui Li, Wenzhe Tong, and Xiaonan Huang

University of Michigan
IEEE Robotics and Automation Letters


A morphology-aware reinforcement learning framework that combines graph neural networks with Soft Actor-Critic for robust 3-bar tensegrity robot locomotion.

Abstract

Tensegrity robots combine rigid rods and elastic cables, offering high resilience and deployability while posing major challenges for locomotion control due to their underactuated and highly coupled dynamics. This paper introduces a morphology-aware reinforcement learning framework that integrates a graph neural network (GNN) into the Soft Actor-Critic (SAC) algorithm.

By representing the robot's physical topology as a graph, the proposed GNN-based policy captures coupling among components, enabling faster and more stable learning than conventional multilayer perceptron (MLP) policies. The method is validated on a physical 3-bar tensegrity robot across three locomotion primitives, including straight-line tracking and bidirectional turning.

The learned policies show improved sample efficiency, robustness to noise and stiffness variations, and better trajectory accuracy. They also transfer directly from simulation to hardware without fine-tuning, achieving stable real-world locomotion.

Keywords: Tensegrity Robots, Graph Neural Networks, Soft Actor-Critic, Morphology-aware RL, Sim-to-Real Transfer

Contribution 1

Morphology-aware GNN Actor

We design a GNN-based actor that encodes the physical topology of tensegrity robots and captures the intrinsic coupling between rods, tendons, and structural components.

Contribution 2

Efficient and High-Performance GNN-SAC

We integrate the proposed GNN actor into the Soft Actor-Critic framework and demonstrate substantial gains in both training efficiency and final task performance compared with conventional MLP baselines.

Contribution 3

Hardware Sim-to-Real Transfer

We validate the learned locomotion primitives on hardware, including straight-line tracking and clockwise/counterclockwise turning, with direct transfer from simulation to the physical robot without fine-tuning.

Motivation

Tensegrity robots are not independent-joint systems. Their locomotion emerges from global tension, structural coupling, and morphology.

1

Global tensional coupling

Every rod and cable contributes to the overall equilibrium. A local actuation can influence the entire structure, making the dynamics strongly interdependent.

2

Natural graph structure

The network of tensile and compressive elements naturally forms a graph, making tensegrity systems suitable for graph-based learning architectures.

3

Morphology-aware control

A GNN policy mirrors the physical topology of the robot, allowing the controller to learn coordinated behaviors that respect structural coupling.

From tensegrity robot to graph-based learning: the physical morphology is converted into a graph representation in which rods and tendons define the message-passing structure used by the policy.

Methodology

The framework converts robot morphology into a graph, learns structural interactions through message passing, and trains the resulting policy with SAC.

Step 1

Graph Construction

The physical topology of the 3-bar tensegrity robot is modeled as a directed graph G = (V, E). Nodes represent rod end-caps, while edges represent rigid rods, passive tendons, and active tendons.
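As a rough illustration of this step, the sketch below builds a typed, directed edge list for a 3-bar prism with six end-cap nodes. The node numbering, the exact wiring, and the choice of which tendons are active or passive are assumptions made for the example; the page does not specify them.

```python
import numpy as np

# Hypothetical indexing: nodes 0-2 are the end-caps on one triangular face,
# nodes 3-5 the end-caps on the opposite face. The wiring is illustrative only.
RODS = [(0, 4), (1, 5), (2, 3)]
FACE_TENDONS = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
SIDE_TENDONS = [(0, 3), (1, 4), (2, 5)]

EDGE_TYPES = {"rod": 0, "passive_tendon": 1, "active_tendon": 2}

def build_graph(rods, passive, active):
    """Return (edge_index, edge_type) for a directed graph.

    Each physical member is stored as two directed edges so that
    messages can flow both ways along it.
    """
    src, dst, etype = [], [], []
    groups = (("rod", rods), ("passive_tendon", passive), ("active_tendon", active))
    for name, members in groups:
        for i, j in members:
            src += [i, j]
            dst += [j, i]
            etype += [EDGE_TYPES[name]] * 2
    return np.array([src, dst]), np.array(etype)

# Assumption: side tendons are passive and face tendons are active.
edge_index, edge_type = build_graph(RODS, passive=SIDE_TENDONS, active=FACE_TENDONS)
num_nodes = int(edge_index.max()) + 1
```

Storing each member in both directions is one common way to realize a directed graph over an undirected physical structure; other encodings are possible.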

Step 2

GNN-based SAC

Node and edge features are processed through message passing. The GNN actor extracts a morphology-aware representation and outputs tendon length commands for actuation.
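The following is a minimal NumPy sketch of message passing followed by a per-tendon readout. It is not the authors' architecture: the layer form, feature sizes, sum aggregation, random weights, and toy wiring are all assumptions, and it outputs only a deterministic command, whereas a SAC actor typically outputs the mean and log standard deviation of a squashed Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def message_passing_layer(h, edge_index, e, W_msg, W_upd):
    """One round of message passing with sum aggregation.

    h: (N, d) node features, edge_index: (2, E), e: (E, d_e) edge features.
    """
    src, dst = edge_index
    msg_in = np.concatenate([h[src], h[dst], e], axis=1)    # (E, 2d + d_e)
    messages = relu(msg_in @ W_msg)                         # (E, d)
    agg = np.zeros_like(h)
    np.add.at(agg, dst, messages)                           # sum messages arriving at each node
    return relu(np.concatenate([h, agg], axis=1) @ W_upd)   # (N, d)

def actor_readout(h, active_edges, w_out):
    """Map the endpoint embeddings of each active tendon to a command in [-1, 1]."""
    i, j = active_edges
    # Summing the endpoints makes the command independent of edge direction.
    return np.tanh((h[i] + h[j]) @ w_out)

# Toy setup (hypothetical wiring): 6 end-cap nodes, 12 members stored in both directions.
members = [(0, 4), (1, 5), (2, 3),                           # rods
           (0, 3), (1, 4), (2, 5),                           # tendons assumed passive
           (0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]   # tendons assumed active
types = [0] * 3 + [1] * 3 + [2] * 6
src = [i for i, j in members] + [j for i, j in members]
dst = [j for i, j in members] + [i for i, j in members]
edge_index = np.array([src, dst])
e = np.eye(3)[types + types]             # one-hot edge type, shape (24, 3)

d = 8                                    # hypothetical embedding size
h = rng.normal(size=(6, d))              # stand-in for per-node observations
W_msg = rng.normal(size=(2 * d + 3, d)) * 0.1
W_upd = rng.normal(size=(2 * d, d)) * 0.1
w_out = rng.normal(size=d) * 0.1

for _ in range(2):                       # two rounds; weights are reused for brevity
    h = message_passing_layer(h, edge_index, e, W_msg, W_upd)

active = np.array(members[6:]).T         # (2, 6): one direction per active tendon
commands = actor_readout(h, active, w_out)
```

Because the same weights are applied to every node and edge, the number of parameters does not depend on how many rods or tendons the robot has.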

Step 3

Training and Evaluation

Policies are trained on straight-line tracking and bidirectional turning, then compared against MLP-based SAC and other reinforcement learning baselines.
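For reference, the standard maximum-entropy objective that SAC optimizes is shown below. The reward terms used for each locomotion primitive are not given on this page, so r is left generic; the temperature α weights the entropy bonus.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```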

Overview of the proposed morphology-aware GNN-SAC framework for tensegrity robot locomotion.

Results

We evaluate the proposed GNN-SAC policy across learning efficiency, motion primitive quality, robustness, cross-morphology generalization, and zero-shot sim-to-real transfer on a physical 3-bar tensegrity robot.

Learning

Efficient Learning and Better Motion Performance

The proposed GNN-SAC policy improves both training efficiency and final task performance, achieving more accurate tracking and more stable turning than MLP-based baselines.

Robustness

Robustness and Generalization

The learned policy remains effective under stiffness variation, observation noise, sloped terrain, and actuation errors, and further generalizes across robot morphologies.

Composition

Composable Motion Primitives

Straight-line tracking and bidirectional turning primitives can be sequenced by a high-level planner to follow complex waypoint trajectories such as infinity, spiral, and flower paths.

Hardware

Zero-shot Sim-to-Real Transfer

Policies trained only in simulation transfer directly to the physical 3-bar tensegrity robot without fine-tuning, demonstrating stable real-world locomotion.

Result 1 / Training

Learning Performance

Key Findings

G-SAC (the GNN-based SAC policy) achieves higher reward per training step and per unit of wall-clock time across straight-line tracking, counterclockwise turning, and clockwise turning.
The improvement over M-SAC (the MLP-based SAC baseline) indicates that morphology-aware graph encoding accelerates policy learning.
Three-layer GNN encoders slightly outperform two-layer encoders, suggesting the benefit of multi-hop structural message passing.
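One way to see why depth matters is that each message-passing layer lets information travel one more hop. The sketch below computes which nodes can influence which after a given number of layers. It uses a toy 4-node chain, not the robot's graph, and the function name is hypothetical.

```python
import numpy as np

def receptive_field(adj, num_layers):
    """Return a boolean matrix R where R[i, j] is True if node j can
    influence node i after num_layers rounds of message passing."""
    n = adj.shape[0]
    # One round lets each node see itself and its direct neighbors.
    step = (adj.astype(bool) | np.eye(n, dtype=bool)).astype(int)
    reach = np.eye(n, dtype=int)
    for _ in range(num_layers):
        reach = ((reach @ step) > 0).astype(int)
    return reach.astype(bool)

# Toy chain 0 - 1 - 2 - 3 (illustrative only).
adj = np.zeros((4, 4), dtype=int)
for i in range(3):
    adj[i, i + 1] = adj[i + 1, i] = 1

two_layers = receptive_field(adj, 2)
three_layers = receptive_field(adj, 3)
```

In this chain, node 3 influences node 0 only once a third layer is added. How many hops the robot's own graph needs depends on its wiring.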

Takeaway: graph structure improves both sample efficiency and final policy quality.

Result 2 / Locomotion

Motion Primitive Evaluation

Simulation evaluation of learned motion primitives. G-SAC achieves lower tracking error and faster, more stable bidirectional turning than M-SAC.

Takeaway: G-SAC achieves more accurate tracking and more stable turning than M-SAC.

Primitive Quality

We evaluate straight-line tracking and bidirectional turning under randomized initial poses.

For straight-line tracking, the robot reaches target waypoints at different heading offsets, and performance is measured by the final-position distribution around the target.
For turning, policies are evaluated by average yaw rate during counterclockwise and clockwise rotations. G-SAC produces faster and more stable rotations compared with M-SAC.
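The two metrics described above could be computed roughly as follows. This is a sketch under assumptions: yaw is logged in radians at a fixed time step with counterclockwise taken as positive, and the function names are hypothetical rather than taken from the released code.

```python
import numpy as np

def average_yaw_rate(yaw, dt):
    """Mean yaw rate in rad/s from yaw samples (radians) taken every dt seconds."""
    yaw = np.unwrap(np.asarray(yaw, dtype=float))   # remove 2*pi wrap-around jumps
    return (yaw[-1] - yaw[0]) / (dt * (len(yaw) - 1))

def final_position_stats(final_xy, target_xy):
    """Mean error vector, mean distance to target, and 2x2 covariance of final positions."""
    err = np.asarray(final_xy, dtype=float) - np.asarray(target_xy, dtype=float)
    return err.mean(axis=0), np.linalg.norm(err, axis=1).mean(), np.cov(err.T)

# Synthetic check: a constant 0.5 rad/s counterclockwise turn, logged with wrapping.
t = np.arange(0.0, 20.0, 0.1)
wrapped_yaw = (0.5 * t + np.pi) % (2 * np.pi) - np.pi
rate = average_yaw_rate(wrapped_yaw, dt=0.1)

# Synthetic final positions scattered around a target at the origin.
mean_err, mean_dist, cov = final_position_stats(
    [[1, 0], [-1, 0], [0, 1], [0, -1]], target_xy=[0, 0]
)
```

Unwrapping the yaw before differencing avoids spurious jumps when the heading crosses ±π during a long turn.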

Straight-line Tracking CW Turning CCW Turning

Result 3 / Robustness

Robustness

The policy remains effective under stiffness variation, observation noise, inclined terrain, dynamic stiffness variation, and actuation error. Across perturbations, G-SAC maintains smoother and more stable behaviors than M-SAC.

The same policy trained on the 3-bar robot can also be deployed to 4-bar and 5-bar tensegrity configurations without retraining, demonstrating zero-shot generalization across morphologies.

Takeaway: Embedding morphological structure in the policy network enhances robustness to uncertainties.

Result 4 / Generalization

Cross-Morphology Generalization

A straight-line tracking policy trained on the 3-bar prism tensegrity robot is directly deployed to 4-bar and 5-bar configurations without retraining. Each subplot aggregates 100 trials.

While accuracy degrades as the structural difference from the training morphology increases, the locomotion remains functional.

Conventional MLP-based policies cannot be applied directly in this setting because their input and output dimensions are fixed by the specific robot configuration.
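The dimension argument can be illustrated with a deliberately simplified example. Below, a per-node function with shared weights runs on 3-, 4-, and 5-bar inputs, while a flat layer sized for the 3-bar robot fails on the larger ones. The feature size and weights are hypothetical, and message passing is omitted to isolate the shape issue.

```python
import numpy as np

rng = np.random.default_rng(0)
NODE_DIM = 4                                   # hypothetical per-node feature size
W_shared = rng.normal(size=(NODE_DIM, 1))      # one weight set reused by every node
W_mlp = rng.normal(size=(6 * NODE_DIM, 6))     # flat layer sized for 3 bars (6 end-caps)

def shared_node_policy(node_feats):
    """Apply the same weights to each node; works for any number of nodes."""
    return np.tanh(node_feats @ W_shared).ravel()

def flat_mlp_policy(node_feats):
    """Flatten all node features into one vector; input size is fixed by W_mlp."""
    return np.tanh(node_feats.reshape(-1) @ W_mlp)

results = {}
for bars in (3, 4, 5):
    feats = rng.normal(size=(2 * bars, NODE_DIM))   # two end-caps per bar
    shared_out = shared_node_policy(feats)
    try:
        flat_mlp_policy(feats)
        mlp_ok = True
    except ValueError:                              # shape mismatch in the matrix product
        mlp_ok = False
    results[bars] = (shared_out.shape[0], mlp_ok)
```

The shared-weight function produces one output per node for every morphology, whereas the flat layer only accepts the 24-dimensional input it was built for.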

Takeaway: The GNN-based policy generalizes zero-shot across tensegrity morphologies, preserving functional locomotion with gradual performance degradation.

Result 5 / Composition

Composable Motion Primitives

From Motion Primitives to Trajectories

Straight-line tracking and bidirectional turning can be sequenced by a high-level planner to follow complex waypoint trajectories. The robot follows infinity-shaped, spiral, and six-petal flower paths by composing learned primitives.
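A high-level planner of this kind could be as simple as a heading-error rule that picks which primitive to run toward the next waypoint. The sketch below is an assumption, not the planner used in the paper: the 15° tolerance, the function name, and the convention that positive yaw is counterclockwise are all hypothetical.

```python
import math

def select_primitive(pose, waypoint, heading_tol=math.radians(15)):
    """Choose a primitive given pose = (x, y, yaw) and waypoint = (x, y)."""
    x, y, yaw = pose
    bearing = math.atan2(waypoint[1] - y, waypoint[0] - x)
    # Wrap the heading error to [-pi, pi] so the shorter turn direction is chosen.
    err = math.atan2(math.sin(bearing - yaw), math.cos(bearing - yaw))
    if abs(err) <= heading_tol:
        return "straight"
    return "turn_ccw" if err > 0 else "turn_cw"

ahead = select_primitive((0.0, 0.0, 0.0), (1.0, 0.0))
left = select_primitive((0.0, 0.0, 0.0), (0.0, 1.0))
right = select_primitive((0.0, 0.0, 0.0), (0.0, -1.0))
```

Calling this rule at each decision step and switching to the next waypoint once the current one is reached would yield a sequence of primitives along a path.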

Primitive Composition Waypoint Tracking Trajectory Following

Straight-line Tracking

Counterclockwise Turning

Clockwise Turning

Result 6 / Sim-to-Real

Zero-shot Sim-to-Real Transfer

Policies trained only in simulation are deployed on the physical 3-bar tensegrity robot without fine-tuning. The robot successfully executes straight-line tracking, clockwise turning, and counterclockwise turning.

In hardware experiments, the robot achieves stable coordinated rolling, with mean forward tracking speed of 0.256 m/s, counterclockwise turning rate of 2.76°/s, and clockwise turning rate of 1.72°/s.
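To put the turning rates in perspective, and assuming each rate were held constant, a full 360° rotation would take roughly:

```latex
T_{\mathrm{CCW}} = \frac{360^\circ}{2.76^\circ/\mathrm{s}} \approx 130\ \mathrm{s},
\qquad
T_{\mathrm{CW}} = \frac{360^\circ}{1.72^\circ/\mathrm{s}} \approx 209\ \mathrm{s}
```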

Zero-shot Transfer Physical Robot GNN-SAC

Overall: the results show that encoding tensegrity morphology with a graph neural network improves learning efficiency, final performance, and robustness, and enables composable motion primitives and zero-shot hardware transfer.
