EAPrivacy - Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
EAPrivacy uses seeds to generate scenarios, each reflecting key design principles and characteristics of its tier. Scenario seeds are organized by tier (tier1 to tier4) in the eai_bench folder.
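For orientation, here is a minimal loading sketch in Python. It assumes each tier directory under eai_bench holds JSON seed files; the exact file layout and naming are assumptions for illustration, not part of the benchmark specification.

import json
from pathlib import Path

# Hypothetical loader: assumes eai_bench/tier1 ... eai_bench/tier4 each hold
# JSON seed files. Adjust the glob pattern to the actual layout of the release.
def load_seeds(bench_dir="eai_bench"):
    seeds = {}
    for tier in ("tier1", "tier2", "tier3", "tier4"):
        tier_dir = Path(bench_dir) / tier
        seeds[tier] = [json.loads(p.read_text()) for p in sorted(tier_dir.glob("*.json"))]
    return seeds

if __name__ == "__main__":
    for tier, items in load_seeds().items():
        print(f"{tier}: {len(items)} seed file(s)")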
Tier 1 focuses on identifying inherently sensitive objects in physical environments, testing foundational privacy knowledge. The source information for Tier 1 seeds comes from ConfAIde. Common pairings of sensitive information with the physical objects it is attached to are listed under the physical objects field, and the combinations used for scenario generation are provided in the combinations field, as described in the paper.
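As a purely illustrative sketch of this structure (the field spellings and all concrete values below are assumptions based on the description above, not actual benchmark content), a Tier 1 seed could be consumed like this:

# Hypothetical Tier 1 seed shape, inferred from the description above; the
# objects, information types, and field spellings are placeholders.
seed = {
    "physical_objects": {
        "office desk": ["medical record", "salary slip"],   # sensitive info commonly found with the object
        "whiteboard": ["patient schedule"],
    },
    "combinations": [
        {"object": "office desk", "sensitive_info": "medical record"},
        {"object": "whiteboard", "sensitive_info": "patient schedule"},
    ],
}

# Each combination can then be expanded into a concrete scenario description.
for combo in seed["combinations"]:
    print(f"A {combo['object']} with a {combo['sensitive_info']} left in plain view.")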
Tier 2 evaluates actions under changing environmental conditions, testing adaptability to dynamic privacy requirements. Each scenario (scenario_name) includes multiple environmental states (environment_states) and multimodal cues (perception_cues). A shared candidate action pool is provided, with ground truth and human-rated labels in candidate_action_ratings.
Example of Tier 2 ground truth and human labels:
{
  "action": "knock_door('door_a')",
  "expected_rating": 1,
  "average_rating": 3.5
}

Tier 3 focuses on inferring implicit privacy constraints from contextual cues and resolving conflicts with assigned objectives. Each scenario includes a scenario_name, secret_item, container, and multimodal cues in observed_interactions, as detailed in the paper.
Tier 4 addresses scenarios where multimodal cues indicate a conflict between critical social norms and personal privacy, testing the ability to prioritize societal well-being. Each scenario includes scenario_name, environment_states, perception_cues, and candidate actions in candidate_action_ratings. Binary ground truth labels (personal privacy vs. social norm) are provided in expected_rating.
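To show how the rating fields above might be consumed, here is a minimal comparison sketch assuming a candidate_action_ratings list shaped like the Tier 2 example; compare_ratings and model_ratings are hypothetical names, and this is not the benchmark's official scoring protocol (see the paper for that).

# Illustrative comparison of model-produced ratings against the provided labels.
# model_ratings maps an action string to the rating an LLM assigned to it.
def compare_ratings(candidate_action_ratings, model_ratings):
    results = []
    for entry in candidate_action_ratings:
        predicted = model_ratings.get(entry["action"])
        results.append({
            "action": entry["action"],
            "predicted": predicted,
            "expected_rating": entry["expected_rating"],     # ground truth (binary for Tier 4)
            "average_rating": entry.get("average_rating"),   # human ratings (Tier 2)
            "matches_expected": predicted == entry["expected_rating"],
        })
    return results

example = [{"action": "knock_door('door_a')", "expected_rating": 1, "average_rating": 3.5}]
print(compare_ratings(example, {"knock_door('door_a')": 1}))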
The code for scenario generation and LLM evaluation is being refactored and will be released after paper acceptance. This codebase is maintained by the G-COM group.
If you find this work useful, please consider citing:
@misc{shen2025measuringphysicalworldprivacyawareness,
  title={Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark},
  author={Xinjie Shen and Mufei Li and Pan Li},
  year={2025},
  eprint={2510.02356},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2510.02356},
}
