MMQA has been accepted by TGRS 2026. It is a Mars mineral question answering system that combines the Martian Mineral Knowledge Graph, multi-source geological datasets, a Mars mineral text corpus, and large language models for interpretable geological reasoning.
MMQA is built around two complementary data resources:
- M200: a curated bibliography of 200 mineral-formation-related papers. The source list is provided in geodata/MM200.csv.
- M2000: a larger Mars mineral text corpus containing 2,000+ papers and reports collected for retrieval, evidence grounding, and corpus-scale knowledge extraction. The source list is provided in geodata/MM2000.csv.
From these resources, we construct the Martian Mineral Knowledge Graph (MMKG). The graph stores mineral entities, geological environments, formation processes, relations, descriptions, and provenance evidence. The extraction and fusion workflow is summarized below.
The ontology design is provided in FIG/Fig2_01.png, and the knowledge extraction prompt template is provided in kg_extract_prompts.txt.
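The sketch below illustrates the kind of record the MMKG holds. It models the graph as (head, relation, tail) triples, each carrying the provenance evidence that supports it. One simple way to "fuse" repeated extractions is to merge identical triples from different papers into a single edge that keeps all of their evidence. The class names `Provenance` and `MMKG`, the in-memory dict layout, and the example entities are hypothetical. The actual schema is defined by the ontology in FIG/Fig2_01.png and the repository's own graph files.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    """Hypothetical evidence record: source document ID and supporting passage."""
    source_id: str
    evidence: str


Triple = tuple[str, str, str]  # (head entity, relation, tail entity)


@dataclass
class MMKG:
    """Hypothetical in-memory graph: each triple maps to all evidence supporting it."""
    edges: dict[Triple, list[Provenance]] = field(default_factory=dict)

    def add(self, head: str, relation: str, tail: str, prov: Provenance) -> None:
        # Simple fusion rule (an assumption): identical triples extracted from
        # different papers share one edge that accumulates their provenance.
        evidence = self.edges.setdefault((head, relation, tail), [])
        if prov not in evidence:
            evidence.append(prov)

    def neighbors(self, head: str) -> list[tuple[str, str]]:
        """Return the (relation, tail) pairs leaving `head`."""
        return [(r, t) for (h, r, t) in self.edges if h == head]

    def provenance(self, head: str, relation: str, tail: str) -> list[Provenance]:
        return self.edges.get((head, relation, tail), [])


# Illustrative usage with placeholder source IDs and passages.
kg = MMKG()
kg.add("jarosite", "forms_in", "acidic aqueous environment", Provenance("paper_A", "(passage)"))
kg.add("jarosite", "forms_in", "acidic aqueous environment", Provenance("paper_B", "(passage)"))
kg.add("jarosite", "is_a", "sulfate", Provenance("paper_A", "(passage)"))
```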
The MMQA corpus also integrates multi-source geological data, including raster maps, vector maps, tabular geomorphological records, and text evidence. These data provide the local geological context used during formation analysis.
| Data type | Data | Spatial resolution / content | Format |
|---|---|---|---|
| Physical property | OMEGA NIR Albedo | 14400 x 7200 pixels (1.48 km/px) | Raster map |
| Physical property | TES Thermal Inertia | 7200 x 3600 pixels (3 km/px) | Raster map |
| Physical property | MOLA Terrain Elevation | 200 m/px | Raster map |
| Chemical property | TES Mineral Maps | 1440 x 720 pixels (16 km/px) | Raster map |
| Chemical property | Elemental Abundance | 72 x 36 pixels (300 km/px) | Raster map |
| Geological age | Geologic Map | Global distribution of geological eras | Vector map |
| Geomorphological feature | Paleolake Basins | Distribution of 425 paleolake basins | Tabular |
| Geomorphological feature | Fluvial Systems | Distribution of 3,772 valley systems | Vector map |
| Geomorphological feature | Craters > 1 km | Distribution of 384,343 craters | Tabular |
| Geomorphological feature | HiRISE Topography | 96,365 coordinate-topography pairs | Tabular |
| Text corpus | Multi-source texts | 214 research articles and 15 NASA reports | Text |
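Most of the raster products above are global grids whose pixel counts span 360 x 180 degrees. For example, 14400 x 7200 pixels corresponds to 0.025 degrees per pixel, or roughly 1.48 km at the equator. The minimal sketch below looks up the pixel for a query coordinate. It assumes a simple-cylindrical (equirectangular) layout with column 0 at 0 degrees E and row 0 at 90 degrees N. The function names and the Mars mean-radius constant are our own, and geo_context_loader.py may handle projections, longitude conventions, and the vector/tabular layers differently.

```python
import math

MARS_MEAN_RADIUS_KM = 3389.5  # approximate mean radius of Mars


def lonlat_to_pixel(lon_e: float, lat_n: float, width: int, height: int) -> tuple[int, int]:
    """Map east longitude / north latitude to (row, col) in a global equirectangular grid.

    Assumes column 0 starts at 0 deg E and row 0 at 90 deg N (check each product's label).
    """
    lon = lon_e % 360.0                          # normalize, e.g. -70 E -> 290 E
    col = int(lon / 360.0 * width)
    row = int((90.0 - lat_n) / 180.0 * height)
    # Clamp so that lat = -90 (or float round-off near 360 E) stays inside the grid.
    return min(row, height - 1), min(col, width - 1)


def equatorial_km_per_px(width: int) -> float:
    """Ground distance covered by one pixel along the equator for a 360-degree-wide grid."""
    return 2 * math.pi * MARS_MEAN_RADIUS_KM / width


# Pixel of the example query location (109.9 E, 25.1 N) in a 1440 x 720 TES mineral map.
tes_row, tes_col = lonlat_to_pixel(109.9, 25.1, width=1440, height=720)
```

The MOLA-HRSC DEM and the vector or tabular layers are not covered here. Their grid dimensions or geometries would have to come from the data files themselves.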
Main public sources include:
- ASU Mars Data Portal for mineral abundance and thermal inertia.
- ESA Planetary Science Archive for surface albedo.
- USGS SIM 3292 Mars Global Geologic GIS Database for global geologic units.
- USGS MOLA-HRSC Blended DEM for terrain elevation.
- University of Arizona HiRISE Archive for HiRISE imagery and topographic context.
- Robbins Crater Database for crater records.
- UT Austin Goudge Lab Shared Data for paleolake basins.
- Global Valley Network Database for valley systems.
The code is organized around a compact reasoning pipeline:
- MMAgentV2.py: main MMQA pipeline for intent recognition, geological context retrieval, graph/text retrieval, and answer generation.
- MMQAsimple.py: lightweight formation-analysis demo using only geological context, without MMKG or text-corpus retrieval.
- graph_query.py: knowledge graph path retrieval and provenance handling.
- retrieval_with_context_v2.py, text_retrival.py: text and context retrieval.
- path_selector.py, link_scorer.py, embedding_utils.py: embedding-based path scoring and representation utilities.
- geo_context_loader.py, geo_context_summary.py: loading and summarizing multi-source geological data.
- intent_classifier.py, answer_generator.py, prompt.py: intent detection, prompt templates, and final response generation.
- kg_extract_prompts.txt: prompt template used for Martian mineral knowledge graph extraction.
- proxy_config.py: API key and OpenAI-compatible endpoint configuration.
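path_selector.py and link_scorer.py score candidate graph paths with embeddings. One common way to do this is sketched below: embed the question and each path, then rank paths by cosine similarity. Averaging a path's element vectors, the toy 2-D embeddings, and the function names are illustrative choices, not the repository's actual scoring function. In practice the vectors would come from an embedding model, for example through embedding_utils.py.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def path_embedding(path: Sequence[str], embed: dict[str, list[float]]) -> list[float]:
    """Represent a path [entity, relation, entity, ...] by the mean of its element vectors."""
    vectors = [embed[item] for item in path]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rank_paths(query_vec, paths, embed, top_k=3):
    """Return the top_k (score, path) pairs, highest cosine similarity first."""
    scored = [(cosine(query_vec, path_embedding(p, embed)), p) for p in paths]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy 2-D vectors standing in for real embedding-model output.
toy_embed = {
    "sulfate": [0.9, 0.1], "forms_in": [0.7, 0.3], "evaporitic setting": [1.0, 0.0],
    "olivine": [0.1, 0.9], "altered_to": [0.3, 0.7], "serpentine": [0.0, 1.0],
}
candidate_paths = [
    ["olivine", "altered_to", "serpentine"],
    ["sulfate", "forms_in", "evaporitic setting"],
]
best = rank_paths([1.0, 0.0], candidate_paths, toy_embed, top_k=1)
```

Mean pooling keeps the sketch short, but it ignores the order of hops. A learned link scorer or a per-hop scoring scheme would be alternatives.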
To get started with MMQA, prepare the local project folder, configure the OpenAI-compatible API endpoint, and place the required geological data, MMKG files, text corpus, and embedding indexes in the expected local paths.
```bash
cd MMQA
```

Configure API access in proxy_config.py:

```python
API_KEY = "your_api_key"
BASE_URL = "your_base_url"
```

Run the full MMQA pipeline with graph-path reasoning and text retrieval:

```bash
python MMAgentV2.py
```

For a minimal demonstration without the knowledge graph and text corpus, run the geological-context-only version:

```bash
python MMQAsimple.py
```

Example query:
At 109.9 degrees E, 25.1 degrees N on Mars, sulfate was detected. What could be the formation mechanism?
The full system returns an answer grounded in geological context, retrieved text evidence, and MMKG reasoning paths. The simplified version is useful for testing coordinate-based geological reasoning when the complete MMKG and corpus resources are not available.

