RAPID: A Reproducible Multi-Agent Pipeline for Interpretable Disaster Damage Assessment from Satellite and Street-View Imagery
With the increasing frequency and intensity of extreme climate events, there is a growing demand for intelligent, scalable, and autonomous approaches to disaster damage assessment. Existing methods, largely based on supervised learning and task-specific fine-tuning, struggle to generalize under domain shifts, long-tailed data distributions, and heterogeneous geospatial data sources. This paper introduces RAPID, an autonomous multi-agent pipeline for interpretable disaster damage assessment — encompassing the assessment of damage levels, interpretation of damage types and degrees, and generation of actionable suggestions for response, remediation, and recovery.
Unlike conventional approaches that rely on single-task supervised models, RAPID coordinates multiple specialized agents to perform cross-view understanding, image restoration, structured damage recognition, and geographical reasoning across heterogeneous data modalities. Without task-specific fine-tuning, RAPID supports zero-shot damage assessment by jointly leveraging complementary information from remote sensing and ground-level perspectives. Experiments show that RAPID achieves an overall accuracy of 0.92 on multi-disaster type classification and up to 0.627 on cross-view damage severity prediction.
Keywords: Disaster Assessment · Vision-Language Models · Cross-View Imagery · Zero-Shot Learning · Multi-Agent Pipeline
Figure 1. RAPID: An Autonomous Multi-Agent Framework for Disaster Damage Intelligence
RAPID addresses three core research questions:
- RQ1 — How can an autonomous multi-agent pipeline achieve multimodal disaster understanding across different geospatial data sources without task-specific fine-tuning, while maintaining robustness under domain shifts?
- RQ2 — How can multiple agents efficiently coordinate perception, data restoration, damage recognition, and reasoning to generate structured, interpretable assessment results?
- RQ3 — To what extent can such a system automatically produce location-specific and decision-relevant disaster intelligence, and what are its strengths and limitations in supporting real-world disaster assessment?
The framework comprises four collaborative specialized agents coordinated through task decomposition:
| Agent | Abbreviation | Core Role | Key Output |
|---|---|---|---|
| Disaster Perception Agent | DPA | Zero-shot identification of disaster type, image modality, and structural context; plans the downstream workflow | Disaster label + confidence score + task plan |
| Image Restoration Agent | IRA | Diagnoses image quality issues (blur, haze, low-light) in SVI/RSI; applies constrained enhancement strategies to preserve disaster-relevant visual evidence | Restored imagery + quality score (Q) |
| Damage Recognition Agent | DRA | Structured damage diagnosis across cross-view and bi-temporal settings; severity classification without task-specific fine-tuning | Severity label · object-level indicators · confidence scores |
| Disaster Reasoning Agent | DReA | High-level cognitive synthesis; causal interpretation, recovery recommendations, and structured disaster report generation | Decision-relevant disaster report |
Disaster Perception Agent (DPA) comprises three modules: ModePerceiver (zero-shot image mode + disaster type recognition), DisasterReasoner (natural-language scene explanation with visual evidence), and TaskPlanner (downstream agent orchestration).
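As a rough illustration of the DPA output described in the agent table (disaster label, confidence score, task plan), the sketch below shows one way a planner could route downstream agents. The names `PerceptionResult` and `plan_tasks`, the field names, the mode strings, and the routing rules are all assumptions made for this example and are not taken from `DisasterPerceptionAgent.py`; the rules only mirror the agent assignments listed in the dataset table.

```python
from dataclasses import dataclass, field
from typing import List

# Agent abbreviations follow the agent table; using them as plain strings is an assumption.
IRA, DRA, DREA = "IRA", "DRA", "DReA"


@dataclass
class PerceptionResult:
    """Illustrative container for DPA output (all field names are hypothetical)."""
    disaster_type: str      # e.g. "hurricane", "wildfire"
    image_mode: str         # hypothetical mode strings: "svi+rsi", "bitemporal_svi", "svi"
    confidence: float       # confidence score in [0, 1]
    needs_severity: bool    # whether a severity assessment is requested
    task_plan: List[str] = field(default_factory=list)


def plan_tasks(result: PerceptionResult) -> PerceptionResult:
    """Fill in a downstream task plan using simple, assumed routing rules."""
    if not result.needs_severity:
        # Type classification only, as for Dataset C1 in the dataset table.
        plan = []
    elif result.image_mode in {"svi+rsi", "bitemporal_svi"}:
        # Paired or bi-temporal inputs run the full chain, as for Datasets A and B.
        plan = [IRA, DRA, DREA]
    else:
        # Single-view severity grading, as for Dataset C2.
        plan = [DRA]
    result.task_plan = plan
    return result


if __name__ == "__main__":
    demo = plan_tasks(PerceptionResult("hurricane", "svi+rsi", 0.90, True))
    print(demo.task_plan)
```

In the actual pipeline the plan is presumably produced by the language model itself; a rule table like this is only a convenient way to see which agents run for which kind of input.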
Image Restoration Agent (IRA) evaluates three restoration branches — heuristic baseline, Gemini-guided planner, and image-only Gemini enhancement — and accepts a branch output only when the composite quality score Q (combining contrast, sharpness, and NIQE-proxy) improves beyond a preset margin over the original.
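The acceptance rule above can be sketched as follows. This is a minimal example under stated assumptions: the equal weights in `composite_quality`, the `margin` value of 0.02, the choice to pick the highest-scoring branch, and the function names are all hypothetical, and each quality term is assumed to be normalized to [0, 1] with higher meaning better (including the NIQE-proxy). The repository's `test.py` may combine the terms differently.

```python
def composite_quality(contrast, sharpness, niqe_proxy, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mean of three normalized quality terms (weights are assumed)."""
    w_c, w_s, w_n = weights
    return w_c * contrast + w_s * sharpness + w_n * niqe_proxy


def select_branch(q_original, branch_scores, margin=0.02):
    """Pick the best-scoring branch, but only if it beats the original by more than `margin`.

    branch_scores maps a branch name to its composite score Q.
    Falls back to ("original", q_original) when no branch clears the margin.
    """
    best_name, best_q = max(branch_scores.items(), key=lambda item: item[1])
    if best_q - q_original > margin:
        return best_name, best_q
    return "original", q_original


if __name__ == "__main__":
    # Q values taken from the Dataset A satellite row of the restoration results table.
    scores = {"baseline": 0.73, "planner": 0.71, "gemini": 0.69}
    print(select_branch(0.62, scores))
```

With the Dataset A satellite scores this selects the heuristic baseline, and with the Dataset A SVI scores (0.75 original; 0.78, 0.76, 0.79 for the branches) it selects the image-only Gemini branch. A branch that improves Q by less than the margin is rejected and the original image is kept.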
Damage Recognition Agent (DRA) operates through four complementary tasks: (1) cross-view hurricane damage prediction from paired RSI+SVI, (2) bi-temporal visual change analysis from pre/post SVI, (3) wildfire-specific five-level classification, and (4) object-level detection + instance segmentation for spatial damage characterization. The agent is evaluated with both standard metrics and the severity-aware Normalized Cross-Severity Error (NCSE).
Disaster Reasoning Agent (DReA) ingests structured JSON outputs from DRA and applies templated chain-of-thought reasoning to generate causal explanations, secondary risk assessments, and FEMA-guideline-aligned recovery recommendations — evaluated by both LLM-based and human expert scoring across factual consistency, causal plausibility, information completeness, and actionability.
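To make the hand-off from DRA to DReA concrete, the sketch below parses a structured JSON record and fills a step-by-step prompt template. Everything here is an assumption for illustration: the JSON keys, the severity value `"moderate"`, the template wording, and the function name `build_reasoning_prompt` are hypothetical and do not reproduce the prompts shipped in `Prompt--Disaster Reasoning Agent`.

```python
import json

# Hypothetical DRA output; keys and values are illustrative, not the repository's schema.
DRA_OUTPUT_JSON = """
{
  "disaster_type": "hurricane",
  "severity": "moderate",
  "confidence": 0.81,
  "indicators": ["roof damage", "fallen trees"]
}
"""

# Assumed template: one reasoning step per output the DReA is described as producing.
REPORT_TEMPLATE = (
    "Disaster type: {disaster_type}\n"
    "Predicted severity: {severity} (confidence {confidence:.2f})\n"
    "Observed indicators: {indicators}\n"
    "Step 1: Explain the likely causes of the observed damage.\n"
    "Step 2: Assess secondary risks.\n"
    "Step 3: Recommend recovery actions consistent with FEMA guidelines.\n"
)

REQUIRED_KEYS = ("disaster_type", "severity", "confidence", "indicators")


def build_reasoning_prompt(raw_json: str) -> str:
    """Validate a DRA record and render it into the templated reasoning prompt."""
    record = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"DRA output is missing keys: {missing}")
    return REPORT_TEMPLATE.format(
        disaster_type=record["disaster_type"],
        severity=record["severity"],
        confidence=float(record["confidence"]),
        indicators=", ".join(record["indicators"]),
    )


if __name__ == "__main__":
    print(build_reasoning_prompt(DRA_OUTPUT_JSON))
```

Validating the record before rendering keeps a malformed DRA output from silently producing an incomplete report prompt.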
RAPID is evaluated across three complementary multimodal disaster dataset categories, covering cross-view, bi-temporal, and multi-hazard scenarios:
| Dataset | Data Type | Images | Disaster | Severity | Source | Agents |
|---|---|---|---|---|---|---|
| A | SVI + RSI pairs | 300 | Hurricane | 3 levels | CVDisaster (Li et al., 2025) | DPA · IRA · DRA · DReA |
| B | Bi-temporal SVI | 300 | Hurricane | 3 levels | BiTemporal (Yang et al., 2025) | DPA · IRA · DRA · DReA |
| C1 | Post-disaster SVI | 188 | Drought / Earthquake / Flood / Ice Storm / Wildfire | N/A | Incidents Dataset (Weber et al., 2020) | DPA |
| C2 | Post-disaster SVI | 295 | Wildfire | 5 levels | LA DINS (2025) | DPA · DRA |
Geographic coverage spans California and Florida, USA. Datasets A and B each contain 150 image pairs (300 images); Dataset C totals 483 images across diverse hazard types (188 in C1 and 295 in C2).
[Figures: Geolocation Distribution; Dataset Statistics]

| Model | Overall Accuracy (multi-disaster type classification) |
|---|---|
| GPT-5-mini | 0.92 |
| GPT-5.1 | 0.88 |
| Gemini-2.5-flash | 0.86 |
Restoration consistently improves quality scores across all disaster types and image modalities. Representative results:
| Category | Image Type | Q_original | Q_baseline | Q_planner | Q_gemini |
|---|---|---|---|---|---|
| Dataset A — Hurricane | Satellite | 0.62 | 0.73 | 0.71 | 0.69 |
| Dataset A — Hurricane | SVI | 0.75 | 0.78 | 0.76 | 0.79 |
| Dataset B — Hurricane | SVI | 0.76 | 0.78 | 0.78 | 0.79 |
| Model | Dataset A Accuracy | Dataset B Accuracy | Dataset C2 Accuracy |
|---|---|---|---|
| Gemini-3-Pro | 0.627 | 0.493 | 0.442 |
| GPT-5.1 | 0.573 | 0.591 | 0.570 |
| GPT-5-mini | 0.387 | 0.503 | 0.573 |
Errors concentrate at adjacent severity levels, which supports the use of NCSE as a more severity-sensitive evaluation metric than plain accuracy.
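One simple way to check how strongly errors cluster at adjacent levels is to measure the share of misclassifications that are off by exactly one level. The sketch below does that for ordinal labels encoded as integers. It is not the NCSE definition, which is not reproduced here; the function name `error_breakdown` and the toy labels are made up for illustration.

```python
def error_breakdown(y_true, y_pred):
    """Return (accuracy, share of errors that are off by exactly one severity level).

    Labels are ordinal severity levels encoded as integers (e.g. 0, 1, 2).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    gaps = [abs(t - p) for t, p in zip(y_true, y_pred)]
    errors = [g for g in gaps if g > 0]
    accuracy = 1 - len(errors) / len(gaps)
    # With no errors at all, the adjacent share is reported as 0.0 by convention.
    adjacent_share = sum(1 for g in errors if g == 1) / len(errors) if errors else 0.0
    return accuracy, adjacent_share


if __name__ == "__main__":
    # Toy three-level example (not real RAPID predictions).
    truth = [0, 1, 2, 1, 0, 2]
    preds = [0, 2, 2, 0, 0, 0]
    print(error_breakdown(truth, preds))
```

In the toy example half of the predictions are correct and two of the three errors are adjacent-level confusions. Plain accuracy treats all three errors the same, which is the gap a severity-aware metric is meant to close.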
LLM-based and human evaluation of multimodal disaster reasoning across Gemini-3-Pro, Gemini-2.5-Pro, and GPT-5.1
[Figures: LLM-Based Object Detection; Final Structured Report]

RAPID/
├── Disaster Perception Agent/
│ ├── DisasterPerceptionAgent.py # ModePerceiver + DisasterReasoner + TaskPlanner
│ └── Prompt--Disaster Perception Agent
├── Image Restoration Agent/
│ ├── test.py # IQA diagnosis + 3-branch restoration
│ └── Prompt--Image Restoration Agent
├── Damage Recognition Agent/
│ ├── SVI&RSI.py # Cross-view hurricane (Dataset A)
│ ├── SVI-pre&post.py # Bi-temporal street-view (Dataset B)
│ ├── SVI-wildfire.py # Wildfire severity (Dataset C2)
│ ├── zero_shot_object_detection_Agent3.ipynb
│ └── Prompt--Damage Recognition Agent
├── Disaster Reasoning Agent/
│ ├── Large Language Model-based evaluation.py
│ ├── test.py # Report generation + evaluation
│ └── Prompt--Disaster Reasoning Agent
└── figure/
Yifan Yang — Department of Geography, Texas A&M University
yyang295@tamu.edu · rayford295.github.io
Lei Zou (Corresponding) — Department of Geography, Texas A&M University
lzou@tamu.edu





