# CephXattrSetLatency

## Meaning

This alert indicates that the Ceph Metadata Server (MDS) is experiencing high
latency when setting extended attributes (xattr) on files. The alert fires when
the average setxattr operation latency exceeds 30 milliseconds over a 5-minute
period.

**What are extended attributes (xattrs)?**

Extended attributes are named key/value metadata entries stored alongside
filesystem objects (inodes, directories, symlinks) in CephFS. They follow POSIX
conventions with namespace prefixes:

- `user.*`: Application-specific metadata
- `security.*`: SELinux labels and security contexts
- `system.*`: POSIX ACLs and system metadata
- `trusted.*`: Administrative attributes (requires CAP_SYS_ADMIN)

**What does setxattr do?**

The MDS performs setxattr operations on behalf of clients to write or update
extended attributes. This involves:

- Permission and capability checks
- Namespace validation
- In-memory metadata update
- Journal entry creation and durable commit
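For reference, a client triggers a setxattr request with standard Linux tools.
A minimal sketch, assuming a CephFS volume mounted at the hypothetical path
`/mnt/cephfs` (the file path is a placeholder):

```bash
# Writing a user-namespace xattr sends a setxattr request to the MDS
setfattr -n user.backup-id -v "weekly-2024" /mnt/cephfs/data/report.csv

# Reading it back; the value may be served from the client's cached
# capabilities rather than a fresh MDS round trip
getfattr -n user.backup-id /mnt/cephfs/data/report.csv
```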

## Impact

**Severity:** Warning

High xattr set latency can cause:

- Slow file system operations, particularly for applications that rely heavily
on extended attributes (e.g., SELinux, NFSv4 ACLs, backup tools)
- Degraded overall CephFS performance
- Application timeouts or failures when interacting with the file system
- Increased latency for file creation and modification operations
- Stalled workflows for operations like save, chmod, backup/restore

**Typical latency ranges:**

| Condition | Expected Latency |
| --------- | ---------------- |
| Lightly loaded LAN with SSDs | 1–10 ms |
| Busy clusters or HDD-backed metadata | 10–100+ ms |
| Under contention, failover, or WAN | 100 ms to seconds |

## Diagnosis

### Step 1: Check MDS Status and Health

Access the Ceph tools pod and check the overall cluster and MDS health:

```bash
oc rsh -n openshift-storage $(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
```

Run the following commands:

```bash
ceph status
ceph mds stat
ceph fs status
```

Look for any warnings related to slow metadata IOs or MDS health issues.
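To narrow the output, the health detail can be filtered for MDS and
slow-operation warnings:

```bash
# Show only MDS- and slow-IO-related health messages
ceph health detail | grep -iE 'mds|slow'
```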

### Step 2: Check MDS Performance Metrics

Examine the MDS operations in flight:

```bash
ceph daemon mds.<mds-name> dump_ops_in_flight
```

To find the active MDS name:

```bash
ceph fs status -f json-pretty | jq -r '.mdsmap[] | select(.state=="active") | .name'
```
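The setxattr latency that drives this alert can also be read directly from the
MDS performance counters. A hedged sketch, assuming the
`mds_server.req_setxattr_latency` counter name used by recent Ceph releases
(counter names can vary between versions):

```bash
# Dump the MDS server counters and extract the setxattr latency entry
ceph daemon mds.<mds-name> perf dump | jq '.mds_server.req_setxattr_latency'
# The result contains avgcount, sum, and avgtime; avgtime is the average
# latency in seconds, so multiply by 1000 to compare against the 30 ms threshold
```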

### Step 3: Check MDS CPU and Memory Usage

Using the OpenShift console, go to Workloads -> Pods and select the MDS pod
(e.g., `rook-ceph-mds-ocs-storagecluster-cephfilesystem-*`). Click on the
Metrics tab to review CPU and memory usage.

Alternatively, check MDS resource usage:

```bash
oc adm top pod -n openshift-storage -l app=rook-ceph-mds
```

### Step 4: Check for Network Issues

Network latency between MDS and OSDs can cause slow metadata operations.
The client-to-MDS RPC round-trip typically adds 0.5–5 ms on LAN, but can be
significantly higher on loaded or WAN links.

Follow the steps in the
[Check Ceph Network Connectivity SOP](helpers/networkConnectivity.md)
to verify network health.
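In addition to the SOP, a quick spot check of node-to-node round-trip time can
be run from a debug pod (the node name and target IP below are placeholders):

```bash
# Measure RTT between two storage nodes from a node debug pod
oc debug node/<node-name> -- chroot /host ping -c 5 <target-node-ip>
```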

### Step 5: Check OSD Performance

Slow OSD operations can cascade into MDS latency issues. The metadata
write-to-durable-store step depends on underlying storage performance:

- NVMe: ~0.1–1 ms
- SSD: ~1–5 ms
- HDD: significantly higher

Check for slow OSD operations:

```bash
ceph health detail
ceph osd perf
```
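To surface the slowest OSDs first, the plain-text output of `ceph osd perf`
can be sorted by commit latency (the column layout may differ slightly between
releases):

```bash
# Skip the header line and sort OSDs by commit latency, highest first
ceph osd perf | tail -n +2 | sort -k2 -nr | head -n 10
```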

### Step 6: Check for Lock Contention

If an inode is locked or requires cross-MDS coordination (cap flushing,
referrals, recovery), latency can increase by tens to hundreds of milliseconds
in pathological cases.

Check for blocked operations:

```bash
ceph daemon mds.<mds-name> dump_blocked_ops
```

## Mitigation

### Recommended Actions

1. **Increase MDS CPU Resources:**

If the MDS CPU usage is consistently high, increase the allocated CPU.
MDS is largely single-threaded, so higher clock speed CPUs are more
effective than additional cores:

```bash
oc patch -n openshift-storage storagecluster ocs-storagecluster \
--type merge \
--patch '{"spec": {"resources": {"mds": {"limits": {"cpu": "8"}, "requests": {"cpu": "8"}}}}}'
```
**Note:** If CPU usage remains high after this change, increase the
allocation to the next power of two (16 CPUs) and repeat as needed
(32, 64, and so on).

2. **Increase MDS Cache Memory:**

If the MDS cache is under pressure, increase the memory allocation:

```bash
oc patch -n openshift-storage storagecluster ocs-storagecluster \
--type merge \
--patch '{"spec": {"resources": {"mds": {"limits": {"memory": "8Gi"}, "requests": {"memory": "8Gi"}}}}}'
```

**Note:** ODF sets `mds_cache_memory_limit` to half of the MDS pod memory
request/limit. Setting the memory to 8Gi results in a 4Gi cache limit.

3. **Scale Out with Multiple Active MDS:**

For high metadata workloads, consider running multiple active MDS instances
to reduce lock contention and distribute metadata operations:

```bash
oc patch -n openshift-storage storagecluster ocs-storagecluster \
--type merge \
--patch '{"spec": {"managedResources": {"cephFilesystems": {"activeMetadataServers": 2}}}}'
```

Increase `activeMetadataServers` one step at a time. Scaling out is most
effective when the metadata load is spread across multiple directories or
PVs. You can confirm the change with the verification commands after this list.

4. **Use Faster Metadata Storage:**

If the metadata pool is backed by HDDs, consider migrating to SSD or NVMe
storage for improved journal/WAL commit latency.

5. **Address Network Issues:**

If network connectivity issues are identified, escalate to the network
or infrastructure team. Optimize the network by lowering round-trip time
(RTT) and ensuring adequate bandwidth. See
[Check Ceph Network Connectivity SOP](helpers/networkConnectivity.md).

6. **Address Underlying OSD Issues:**

If OSDs are slow, investigate and resolve OSD performance problems first.
Slow OSDs directly impact MDS performance. Refer to
[CephOSDSlowOps runbook](CephOSDSlowOps.md) for guidance.

7. **Restart MDS (if stuck operations detected):**

If operations appear stuck due to internal issues, restarting the MDS may
help:

```bash
oc delete pod -n openshift-storage -l app=rook-ceph-mds
```

The pods are recreated automatically by the operator.
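To confirm that the resource and scaling changes above took effect, the
following commands can be used (the pod label, container index, and
filesystem name are the ODF defaults and may differ in your cluster):

```bash
# Check the CPU/memory requests and limits currently set on the MDS pods
oc get pods -n openshift-storage -l app=rook-ceph-mds \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'

# From the Ceph tools pod, check the configured and active MDS count
ceph fs get ocs-storagecluster-cephfilesystem | grep max_mds
ceph fs status
```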

If the issue persists after taking the above actions, please contact Red Hat
Support for further assistance.

## Additional Resources

- [Ceph MDS Troubleshooting](https://docs.ceph.com/en/latest/cephfs/troubleshooting/)

- [Red Hat Ceph Storage Troubleshooting Guide](https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/troubleshooting_guide/index)