@thanhtung0201 thanhtung0201 commented May 17, 2025

Add system health MCP Server

Description

A server monitoring system built on the Model Context Protocol (MCP), designed for integration with Claude and other AI assistants.

Motivation and Context

MCP System Health Monitoring provides real-time health and performance metrics for remote Linux servers. It establishes SSH connections to collect system metrics including CPU usage, memory utilization, disk space, network statistics, security metrics, and more.

  • Comprehensive Metrics Collection: CPU, memory, disk, network, security metrics, and more
  • Real-time Monitoring: Live system status checks and performance insights
  • Multi-Server Support: Monitor multiple servers from a single MCP instance
  • Threshold-based Alerts: Automatic detection of critical system conditions
  • SSH Connection Management: Efficient connection pooling and reuse
  • Security-focused: Monitor for failed login attempts, suspicious processes, and security updates
  • MCP Integration: Ready for AI assistant interaction via the MCP protocol

How Has This Been Tested?

Tested directly with Claude Desktop

Breaking Changes

This is a new feature, so it introduces no breaking changes.

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read the MCP Protocol Documentation
  • My changes follow MCP security best practices
  • I have updated the server's README accordingly
  • I have tested this with an LLM client
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have documented all environment variables and configuration options

Additional context

The MCP server exposes the following tools:

  • system_status: General system status information
  • cpu_metrics: Detailed CPU metrics
  • memory_metrics: Memory usage and swap statistics
  • disk_metrics: Disk usage for all or specific mount points
  • network_metrics: Network interface statistics
  • security_metrics: Security-related metrics
  • process_list: List of top CPU-consuming processes
  • system_alerts: Current alerts based on threshold violations
  • health_summary: Comprehensive health summary

The system provides automatic alerts based on these default thresholds:

CPU:

  • Critical: Usage ≥ 90%
  • Warning: Usage ≥ 80%
  • Warning: Load average > 1.5 × core count
  • Warning: I/O wait > 20%

Memory:

  • Critical: Usage ≥ 95%
  • Warning: Usage ≥ 85%
  • Warning: Swap usage ≥ 80%
  • Warning: Free memory < 1GB (on systems with ≥ 2GB)

Disk:

  • Critical: Usage ≥ 95%
  • Warning: Usage ≥ 85%
  • Warning: Free space < 1GB (on disks ≥ 10GB)
  • Warning: Inode usage ≥ 90%
  • Warning: Disk I/O utilization > 80%

Security:

  • Warning: Failed logins > 10
  • Critical: Security updates > 5
  • Warning: Security updates ≥ 1
  • Warning: System not updated for ≥ 30 days
  • Critical: Suspicious processes detected
  • Warning: Unusual ports open
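
To make the default thresholds concrete, here is a minimal sketch of how the CPU and memory rules listed above could be evaluated. It is not taken from the server's code: the `Alert` class, the function names, and the parameter names are hypothetical, and it assumes that a critical usage alert replaces the corresponding warning rather than being reported alongside it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    """Hypothetical alert record; not the server's actual type."""
    level: str      # "critical" or "warning"
    category: str   # "cpu" or "memory"
    message: str


def cpu_alerts(usage_pct: float, load_avg: float,
               core_count: int, iowait_pct: float) -> List[Alert]:
    """Apply the default CPU thresholds from the PR description."""
    alerts: List[Alert] = []
    # Assumption: critical supersedes warning for the same metric.
    if usage_pct >= 90:
        alerts.append(Alert("critical", "cpu", f"CPU usage {usage_pct:.0f}%"))
    elif usage_pct >= 80:
        alerts.append(Alert("warning", "cpu", f"CPU usage {usage_pct:.0f}%"))
    if load_avg > 1.5 * core_count:
        alerts.append(Alert("warning", "cpu", f"Load average {load_avg:.2f}"))
    if iowait_pct > 20:
        alerts.append(Alert("warning", "cpu", f"I/O wait {iowait_pct:.0f}%"))
    return alerts


def memory_alerts(usage_pct: float, swap_pct: float,
                  free_gb: float, total_gb: float) -> List[Alert]:
    """Apply the default memory thresholds from the PR description."""
    alerts: List[Alert] = []
    if usage_pct >= 95:
        alerts.append(Alert("critical", "memory", f"Memory usage {usage_pct:.0f}%"))
    elif usage_pct >= 85:
        alerts.append(Alert("warning", "memory", f"Memory usage {usage_pct:.0f}%"))
    if swap_pct >= 80:
        alerts.append(Alert("warning", "memory", f"Swap usage {swap_pct:.0f}%"))
    # The free-memory rule only applies to systems with at least 2GB of RAM.
    if total_gb >= 2 and free_gb < 1:
        alerts.append(Alert("warning", "memory", f"Free memory {free_gb:.1f}GB"))
    return alerts


if __name__ == "__main__":
    # Example: a 4-core host at 85% CPU with a load average of 7.0.
    for alert in cpu_alerts(85.0, 7.0, 4, 5.0):
        print(alert.level, alert.category, alert.message)
```

The disk and security thresholds follow the same pattern: one comparison per rule, with the size precondition (disks ≥ 10GB) checked before the free-space rule.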

@thanhtung0201 thanhtung0201 changed the title Update README.md Update README.md with description May 17, 2025
@olaservo olaservo (Member) commented Jun 9, 2025

Thanks for your contribution to the servers list. This has been merged in this combined PR: #2007

This is a new process we're trying out, so if you see any issues feel free to re-open the PR and tag me.

@olaservo olaservo closed this Jun 9, 2025