Commit 8e24c27

chore: rai-perception - update docs to use pip install and add more tests (#719)
1 parent a0addd0 commit 8e24c27

16 files changed: +1835 -160 lines changed

docs/extensions/perception.md

Lines changed: 2 additions & 26 deletions
@@ -1,29 +1,5 @@
 --8<-- "src/rai_extensions/rai_perception/README.md:sec1"
-Agents create two ROS 2 Nodes: `grounding_dino` and `grounded_sam` using [ROS2Connector](../API_documentation/connectors/ROS_2_Connectors.md).
-These agents can be triggered by ROS2 services:
-
-- `grounding_dino_classify`: `rai_interfaces/srv/RAIGroundingDino`
-- `grounded_sam_segment`: `rai_interfaces/srv/RAIGroundedSam`
-
-> [!TIP]
->
-> If you wish to integrate open-set detection into your ros2 launch file, a premade launch
-> file can be found in `rai/src/rai_bringup/launch/openset.launch.py`
-
-> [!NOTE]
-> The weights will be downloaded to `~/.cache/rai` directory.
-
-## RAI Tools
-
-`rai_perception` package contains tools that can be used by [RAI LLM agents](../tutorials/walkthrough.md)
-enhance their perception capabilities. For more information on RAI Tools see
-[Tool use and development](../tutorials/tools.md) tutorial.
-
---8<-- "src/rai_extensions/rai_perception/README.md:sec3"
-
-> [!TIP]
->
-> you can try example below with [rosbotxl demo](../demos/rosbot_xl.md) binary.
-> The binary exposes `/camera/camera/color/image_raw` and `/camera/camera/depth/image_raw` topics.
 
 --8<-- "src/rai_extensions/rai_perception/README.md:sec4"
+
+--8<-- "src/rai_extensions/rai_perception/README.md:sec5"

src/rai_extensions/rai_perception/README.md

Lines changed: 89 additions & 99 deletions
@@ -2,125 +2,139 @@
 
 # RAI Perception
 
-This package provides ROS2 integration with [Idea-Research GroundingDINO Model](https://github.com/IDEA-Research/GroundingDINO) and [Grounded-SAM-2, RobotecAI fork](https://github.com/RobotecAI/Grounded-SAM-2) for object detection, segmentation, and gripping point calculation. The `GroundedSamAgent` and `GroundingDinoAgent` are ROS2 service nodes that can be readily added to ROS2 applications. It also provides tools that can be used with [RAI LLM agents](../tutorials/walkthrough.md) to construct conversational scenarios.
+RAI Perception brings powerful computer vision capabilities to your ROS2 applications. It integrates [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) and [Grounded-SAM-2](https://github.com/RobotecAI/Grounded-SAM-2) to detect objects, create segmentation masks, and calculate gripping points.
 
-In addition to these building blocks, this package includes utilities to facilitate development, such as a ROS2 client that demonstrates interactions with agent nodes.
+The package includes two ready-to-use ROS2 service nodes (`GroundedSamAgent` and `GroundingDinoAgent`) that you can easily add to your applications. It also provides tools that work seamlessly with [RAI LLM agents](../tutorials/walkthrough.md) to build conversational robot scenarios.
 
-## Installation
+## Prerequisites
+
+Before installing `rai-perception`, ensure you have:
 
-While installing `rai_perception` via Pip is being actively worked on, to incorporate it into your application, you will need to set up a ROS2 workspace.
+1. **ROS2 installed** (Jazzy recommended, or Humble). If you don't have ROS2 yet, follow the official ROS2 installation guide for [Jazzy](https://docs.ros.org/en/jazzy/Installation.html) or [Humble](https://docs.ros.org/en/humble/Installation.html).
+2. **Python 3.8+** and `pip` installed (usually pre-installed on Ubuntu).
+3. **NVIDIA GPU** with CUDA support (required for optimal performance).
+4. **wget** installed (required for downloading model weights):
+
+```bash
+sudo apt install wget
+```
 
-### ROS2 Workspace Setup
+## Installation
 
-Create a ROS2 workspace and copy this package:
+**Step 1:** Source ROS2 in your terminal:
 
 ```bash
-mkdir -p ~/rai_perception_ws/src
-cd ~/rai_perception_ws/src
-
-# only checkout rai_perception package
-git clone --depth 1 --branch main https://github.com/RobotecAI/rai.git temp
-cd temp
-git archive --format=tar --prefix=rai_perception/ HEAD:src/rai_extensions/rai_perception | tar -xf -
-mv rai_perception ../rai_perception
-cd ..
-rm -rf temp
+# For ROS2 Jazzy (recommended)
+source /opt/ros/jazzy/setup.bash
+
+# For ROS2 Humble
+source /opt/ros/humble/setup.bash
 ```
 
-### ROS2 Dependencies
+**Step 2:** Install ROS2 dependencies. `rai-perception` requires ROS2 packages that need to be installed separately:
+
+```bash
+# Update package lists first
+sudo apt update
+
+# Install rai_interfaces as a debian package
+sudo apt install ros-jazzy-rai-interfaces # or ros-humble-rai-interfaces for Humble
+```
 
-Add required ROS dependencies. From the workspace root, run
+**Step 3:** Install `rai-perception` via pip:
 
 ```bash
-rosdep install --from-paths src --ignore-src -r
+pip install rai-perception
 ```
 
-### Build and Run
+> [!TIP]
+> It's recommended to install `rai-perception` in a virtual environment to avoid conflicts with other Python packages.
 
-Source ROS2 and build:
+> [!TIP]
+> To avoid sourcing ROS2 in every new terminal, add the source command to your `~/.bashrc` file:
+>
+> ```bash
+> echo "source /opt/ros/jazzy/setup.bash" >> ~/.bashrc # or humble
+> ```
 
-```bash
-# Source ROS2 (humble or jazzy)
-source /opt/ros/${ROS_DISTRO}/setup.bash
+<!--- --8<-- [end:sec1] -->
 
-# Build workspace
-cd ~/rai_perception_ws
-colcon build --symlink-install
+<!--- --8<-- [start:sec4] -->
 
-# Source ROS2 packages
-source install/setup.bash
-```
+## Getting Started
 
-### Python Dependencies
+This section provides a step-by-step guide to get you up and running with RAI Perception.
 
-`rai_perception` depends on `rai-core` and `sam2`. There are many ways to set up a virtual environment and install these dependencies. Below, we provide an example using Poetry.
+### Quick Start
 
-**Step 1:** Copy the following template to `pyproject.toml` in your workspace root, updating it according to your directory setup:
+After installing `rai-perception`, launch the perception agents:
 
-```toml
-# rai_perception_project pyproject template
-[tool.poetry]
-name = "rai_perception_ws"
-version = "0.1.0"
-description = "ROS2 workspace for RAI perception"
-package-mode = false
+**Step 1:** Open a terminal and source ROS2:
 
-[tool.poetry.dependencies]
-python = "^3.10, <3.13"
-rai-core = ">=2.5.4"
-rai-perception = {path = "src/rai_perception", develop = true}
+```bash
+source /opt/ros/jazzy/setup.bash # or humble
+```
 
-[build-system]
-requires = ["poetry-core>=1.0.0"]
-build-backend = "poetry.core.masonry.api"
+**Step 2:** Launch the perception agents:
+
+```bash
+python -m rai_perception.scripts.run_perception_agents
 ```
 
-**Step 2:** Install dependencies:
+> [!NOTE]
+> The weights will be downloaded to the `~/.cache/rai` directory on first use.
+
+The agents create two ROS 2 nodes: `grounding_dino` and `grounded_sam` using [ROS2Connector](../API_documentation/connectors/ROS_2_Connectors.md).
 
-First, we create Virtual Environment with Poetry:
+### Testing with Example Client
+
+The `rai_perception/talker.py` example demonstrates how to use the perception services for object detection and segmentation. It shows the complete pipeline: GroundingDINO for object detection followed by GroundedSAM for instance segmentation, with visualization output.
+
+**Step 1:** Open a terminal and source ROS2:
 
 ```bash
-cd ~/rai_perception_ws
-poetry lock
-poetry install
+source /opt/ros/jazzy/setup.bash # or humble
 ```
 
-Now, we are ready to launch perception agents:
+**Step 2:** Launch the perception agents:
 
 ```bash
-# Activate virtual environment
-source "$(poetry env info --path)"/bin/activate
-export PYTHONPATH
-PYTHONPATH="$(dirname "$(dirname "$(poetry run which python)")")/lib/python$(poetry run python --version | awk '{print $2}' | cut -d. -f1,2)/site-packages:$PYTHONPATH"
+python -m rai_perception.scripts.run_perception_agents
+```
 
-# run agents
-python src/rai_perception/scripts/run_perception_agents.py
+**Step 3:** In a different terminal (remember to source ROS2 first), run the example client:
+
+```bash
+source /opt/ros/jazzy/setup.bash # or humble
+python -m rai_perception.examples.talker --ros-args -p image_path:="<path-to-image>"
 ```
 
+You can use any image containing objects like dragons, lizards, or dinosaurs. For example, use the `sample.jpg` from the package's `images` folder. The client will detect these objects and save a visualization with bounding boxes and masks to `masks.png` in the current directory.
+
 > [!TIP]
-> To manage ROS 2 + Poetry environment with less friction: Keep build tools (colcon) at system level, use Poetry only for runtime dependencies of your packages.
+>
+> If you wish to integrate open-set vision into your ros2 launch file, a premade launch
+> file can be found in `rai/src/rai_bringup/launch/openset.launch.py`
 
-<!--- --8<-- [end:sec1] -->
+### ROS2 Service Interface
 
-`rai-perception` agents create two ROS 2 nodes: `grounding_dino` and `grounded_sam` using [ROS2Connector](../../../docs/API_documentation/connectors/ROS_2_Connectors.md).
-These agents can be triggered by ROS2 services:
+The agents can be triggered by ROS2 services:
 
 - `grounding_dino_classify`: `rai_interfaces/srv/RAIGroundingDino`
 - `grounded_sam_segment`: `rai_interfaces/srv/RAIGroundedSam`
 
-> [!TIP]
->
-> If you wish to integrate open-set vision into your ros2 launch file, a premade launch
-> file can be found in `rai/src/rai_bringup/launch/openset.launch.py`
+<!--- --8<-- [end:sec4] -->
 
-> [!NOTE]
-> The weights will be downloaded to `~/.cache/rai` directory.
+<!--- --8<-- [start:sec5] -->
+
+## Dive Deeper: Tools and Integration
 
-## RAI Tools
+This section provides information for developers looking to integrate RAI Perception tools into their applications.
 
-`rai_perception` package contains tools that can be used by [RAI LLM agents](../../../docs/tutorials/walkthrough.md)
+### RAI Tools
+
+`rai_perception` package contains tools that can be used by [RAI LLM agents](../tutorials/walkthrough.md)
 to enhance their perception capabilities. For more information on RAI Tools see
-[Tool use and development](../../../docs/tutorials/tools.md) tutorial.
+[Tool use and development](../tutorials/tools.md) tutorial.
 
 <!--- --8<-- [start:sec2] -->
 

@@ -132,7 +146,7 @@ This tool calls the GroundingDINO service to detect objects from a comma-separated
 
 > [!TIP]
 >
-> you can try example below with [rosbotxl demo](../../../docs/demos/rosbot_xl.md) binary.
+> You can try the example below with the [rosbotxl demo](../demos/rosbot_xl.md) binary.
 > The binary exposes `/camera/camera/color/image_raw` and `/camera/camera/depth/image_rect_raw` topics.
 
 <!--- --8<-- [start:sec3] -->

@@ -198,30 +212,6 @@ with ROS2Context():
 I have detected the following items in the picture desk: 2.43m away
 ```
 
-## Simple ROS2 Client Node Example
-
-The `rai_perception/talker.py` example demonstrates how to use the perception services for object detection and segmentation. It shows the complete pipeline: GroundingDINO for object detection followed by GroundedSAM for instance segmentation, with visualization output.
-
-This example is useful for:
-
-- Testing perception services integration
-- Understanding the ROS2 service call patterns
-- Seeing detection and segmentation results with bounding boxes and masks
-
-Run the example:
-
-```bash
-cd ~/rai_perception_ws
-python src/rai_perception/scripts/run_perception_agents.py
-```
-
-In a different window, run
-
-```bash
-cd ~/rai_perception_ws
-ros2 run rai_perception talker --ros-args -p image_path:=src/rai_perception/images/sample.jpg
-```
-
-The example will detect objects (dragon, lizard, dinosaur) and save a visualization with bounding boxes and masks to `masks.png`.
-
 <!--- --8<-- [end:sec3] -->
+
+<!--- --8<-- [end:sec5] -->
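The installation steps above pick the apt package by ROS distro (`ros-jazzy-rai-interfaces` or `ros-humble-rai-interfaces`). That naming convention can be sketched as a tiny helper; the function name is hypothetical and not part of `rai_perception`:

```python
def rai_interfaces_apt_package(ros_distro: str) -> str:
    # Hypothetical helper: the docs install ros-<distro>-rai-interfaces via apt
    supported = {"humble", "jazzy"}
    if ros_distro not in supported:
        raise ValueError(f"unsupported ROS distro: {ros_distro}")
    return f"ros-{ros_distro}-rai-interfaces"


print(rai_interfaces_apt_package("jazzy"))  # ros-jazzy-rai-interfaces
```

The same pattern extends to any other ROS package name following the `ros-<distro>-<package>` debian naming scheme.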

src/rai_extensions/rai_perception/pyproject.toml

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,7 @@
 [tool.poetry]
-name = "rai_perception"
-version = "0.1.2"
+name = "rai-perception"
+# TODO, update the version once it is published to PyPi
+version = "0.1.5"
 description = "Package for object detection, segmentation and gripping point detection."
 authors = ["Kajetan Rachwał <[email protected]>"]
 readme = "README.md"
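The rename from `rai_perception` to `rai-perception` matters for PyPI publishing: pip normalizes project names, so the underscore and hyphen forms resolve to the same project. A sketch of the PEP 503 normalization rule:

```python
import re


def normalize_name(name: str) -> str:
    # PEP 503: runs of -, _ and . collapse to a single hyphen; result is lowercased
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_name("rai_perception"))  # rai-perception
print(normalize_name("rai-perception"))  # rai-perception
```

Because both spellings normalize identically, `pip install rai_perception` and `pip install rai-perception` target the same package; the hyphenated form is simply the canonical one.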

src/rai_extensions/rai_perception/rai_perception/agents/base_vision_agent.py

Lines changed: 32 additions & 5 deletions
@@ -67,19 +67,46 @@ def _load_model_with_error_handling(self, model_class):
             raise e
 
     def _download_weights(self):
+        self.logger.info(
+            f"Downloading weights from {self.WEIGHTS_URL} to {self.weights_path}"
+        )
         try:
             subprocess.run(
                 [
                     "wget",
                     self.WEIGHTS_URL,
                     "-O",
-                    self.weights_path,
+                    str(self.weights_path),
                     "--progress=dot:giga",
-                ]
+                ],
+                check=True,
+                capture_output=True,
+                text=True,
+            )
+            # Verify file exists and has reasonable size (> 1MB)
+            if not os.path.exists(self.weights_path):
+                raise Exception(f"Downloaded file not found at {self.weights_path}")
+            file_size = os.path.getsize(self.weights_path)
+            if file_size < 1024 * 1024:
+                raise Exception(
+                    f"Downloaded file is too small ({file_size} bytes), expected > 1MB"
+                )
+            self.logger.info(
+                f"Successfully downloaded weights ({file_size / (1024 * 1024):.2f} MB)"
             )
-        except Exception:
-            self.logger.error("Could not download weights")
-            raise Exception("Could not download weights")
+        except subprocess.CalledProcessError as e:
+            error_msg = e.stderr if e.stderr else e.stdout if e.stdout else str(e)
+            self.logger.error(f"wget failed: {error_msg}")
+            # Clean up partial download
+            if os.path.exists(self.weights_path):
+                os.remove(self.weights_path)
+            raise Exception(f"Could not download weights: {error_msg}")
+        except Exception as e:
+            self.logger.error(f"Could not download weights: {e}")
+            # Clean up partial download
+            if os.path.exists(self.weights_path):
+                os.remove(self.weights_path)
+            raise
 
     def _remove_weights(self):
         os.remove(self.weights_path)
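The post-download verification added to `_download_weights` (file must exist and exceed roughly 1 MB) can be exercised in isolation. A standalone sketch — the function name is ours for illustration; the agent performs these checks inline:

```python
import os


def verify_weights_file(path: str, min_bytes: int = 1024 * 1024) -> int:
    """Check that a downloaded weights file exists and exceeds min_bytes; return its size."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Downloaded file not found at {path}")
    size = os.path.getsize(path)
    if size < min_bytes:
        raise ValueError(f"Downloaded file is too small ({size} bytes), expected > 1MB")
    return size
```

The 1 MB floor is a cheap sanity check against truncated downloads or an HTML error page saved by `wget`; a real model checkpoint is orders of magnitude larger.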
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Copyright (C) 2025 Robotec.AI
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

src/rai_extensions/rai_perception/scripts/run_perception_agents.py renamed to src/rai_extensions/rai_perception/rai_perception/scripts/run_perception_agents.py

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@
 
 import rclpy
 from rai.agents import wait_for_shutdown
+
 from rai_perception.agents import GroundedSamAgent, GroundingDinoAgent
 

src/rai_extensions/rai_perception/rai_perception/tools/gdino_tools.py

Lines changed: 8 additions & 2 deletions
@@ -17,7 +17,7 @@
 import numpy as np
 import sensor_msgs.msg
 from langchain_core.tools import BaseTool
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, ConfigDict, Field
 from rai.communication.ros2 import ROS2Connector
 from rai.communication.ros2.api import convert_ros_img_to_ndarray
 from rai.communication.ros2.ros_async import get_future_result

@@ -84,12 +84,18 @@ class GroundingDinoBaseTool(BaseTool):
     box_threshold: float = Field(default=0.35, description="Box threshold for GDINO")
     text_threshold: float = Field(default=0.45, description="Text threshold for GDINO")
 
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+
+    def _run(self, *args, **kwargs):
+        """Abstract method - must be implemented by subclasses."""
+        raise NotImplementedError("Subclasses must implement _run method")
+
     def _call_gdino_node(
         self, camera_img_message: sensor_msgs.msg.Image, object_names: list[str]
     ) -> Future:
         cli = self.connector.node.create_client(RAIGroundingDino, GDINO_SERVICE_NAME)
         while not cli.wait_for_service(timeout_sec=1.0):
-            self.node.get_logger().info(
+            self.connector.node.get_logger().info(
                 f"service {GDINO_SERVICE_NAME} not available, waiting again..."
             )
         req = RAIGroundingDino.Request()
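The `_run` stub added to `GroundingDinoBaseTool` follows a common base-class pattern: define the method so the class is structurally complete, but raise `NotImplementedError` to force subclasses to override it. A stdlib-only sketch with hypothetical class names (not the real LangChain `BaseTool` hierarchy):

```python
class BaseDetectionTool:
    """Sketch of the base-class pattern in the diff above (hypothetical names)."""

    def _run(self, *args, **kwargs):
        # Subclasses must override; the base implementation only signals misuse
        raise NotImplementedError("Subclasses must implement _run method")


class DistanceTool(BaseDetectionTool):
    def _run(self, distance_m: float) -> str:
        return f"object detected {distance_m:.2f}m away"


print(DistanceTool()._run(2.43))  # object detected 2.43m away
```

In the actual diff this also pairs with `model_config = ConfigDict(arbitrary_types_allowed=True)`, which lets the pydantic-based tool hold non-pydantic field types such as `ROS2Connector`.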
