Skip to content

Conversation

helmut-hoffer-von-ankershoffen
Copy link
Contributor

Add CLI command aignostics system online to check connectivity with configurable caching.

Changes

  • Add is_online() method to system service with TTL-based caching
  • Add online_cache_ttl setting (default 60s) to system settings
  • Add CLI command that exits with 0 (green) or 1 (red)
  • Add comprehensive tests for caching and CLI behavior

Resolves #150

Generated with Claude Code

- Add is_online() method to system service that checks connectivity
  using api.ipify.org and caches results for configurable TTL
- Add online_cache_ttl setting (default 60 seconds) to system settings
- Add 'aignostics system online' CLI command that exits with code 0
  if online (green) or 1 if offline (red)
- Add comprehensive tests for caching behavior, TTL expiration,
  and CLI command with both online and offline scenarios

Resolves #150

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Helmut Hoffer von Ankershoffen né Oertel <[email protected]>
Copy link

claude bot commented Oct 6, 2025

PR Review: Add online detection CLI command with caching

Summary

This PR adds a useful aignostics system online command with intelligent caching. The implementation is well-structured with comprehensive test coverage. Below is my detailed review.

✅ Strengths

  1. Excellent test coverage - 138 lines of new tests with comprehensive scenarios including caching, TTL expiration, and edge cases
  2. Good separation of concerns - Service logic separated from CLI presentation layer
  3. Configurable caching - TTL-based caching with sensible 60s default and environment variable override
  4. Clear documentation - Docstrings explain exit codes and behavior
  5. Follows module conventions - Consistent with existing system module patterns

🔍 Code Quality Issues

1. Thread Safety Concern - Instance Variable Cache (Critical)

Location: src/aignostics/system/_service.py:76

_online_cache: tuple[bool, float] | None = None  # (is_online, timestamp)

Issue: The cache is stored as a class-level instance variable, which means:

  • Multiple Service() instances will have independent caches
  • Each call creates a new service instance in the CLI: _service = Service() (line 73 of _service.py shows instantiation pattern)
  • The cache will never be reused between CLI invocations

Impact: The caching mechanism won't work as intended in the CLI - every command invocation creates a new Service instance with _online_cache = None.

Recommendation: Use one of these approaches:

  • Option A: Class-level variable with @classmethod
  • Option B: Module-level cache variable
  • Option C: File-based cache with timestamp (like token caching in platform module)

Example fix (Option A):

class Service(BaseService):
    """System service."""
    
    _online_cache: ClassVar[tuple[bool, float] | None] = None  # Class-level
    
    @classmethod
    def _get_cached_status(cls) -> tuple[bool, float] | None:
        return cls._online_cache
    
    @classmethod
    def _set_cached_status(cls, status: bool, timestamp: float) -> None:
        cls._online_cache = (status, timestamp)

2. Missing Import for sys.exit()

Location: src/aignostics/system/_cli.py:77-80

sys.exit(0)  # sys not imported at top

Issue: The code uses sys.exit() but I don't see sys imported in the visible diff context. Need to verify import exists.

Verification needed: Check if import sys is present at the top of _cli.py.

3. Logger Not Imported in Tests

Location: Tests use mock.patch.object on _determine_network_health without verifying logger is available

Recommendation: Ensure silent_logging fixture is used where debug logs are generated, or mock the logger.

🎯 Design Considerations

4. Dependency on External Service

The online check depends on api.ipify.org being available. Consider:

  • What if ipify is down but the internet is accessible?
  • Should we have a fallback endpoint?

Recommendation: Document this dependency clearly or consider checking multiple endpoints with fallback logic.

5. Cache TTL = 0 Edge Case

Location: src/aignostics/system/_service.py:170

if cache_age < self._settings.online_cache_ttl:

Observation: When TTL=0, the condition cache_age < 0 will never be true (cache_age ≥ 0), so cache is effectively disabled. This is good, but worth documenting explicitly.

Test coverage: ✅ Already covered in test_is_online_zero_ttl_disables_cache

🧪 Test Quality

Excellent Coverage:

  • ✅ Cache hit within TTL
  • ✅ Cache expiration after TTL
  • ✅ Different status caching
  • ✅ Custom TTL values
  • ✅ Zero TTL (cache disabled)
  • ✅ CLI exit codes
  • ✅ Mock-based testing avoiding network calls

Minor Suggestion:

Consider adding a test for concurrent access to verify thread safety once the cache sharing issue is fixed.

🔒 Security

No security concerns identified. The feature:

  • ✅ Only performs outbound HTTP GET requests
  • ✅ No user input processing
  • ✅ No secrets or sensitive data handling

⚡ Performance

  • ✅ Caching prevents excessive network calls
  • ✅ 60s default TTL is reasonable for online status checks
  • ✅ 5s timeout on network requests (from NETWORK_TIMEOUT constant)

📋 Recommendations Summary

Must Fix (Blocking):

  1. Fix cache persistence - Current implementation won't cache across CLI invocations

Should Fix:

  1. Verify sys import exists in _cli.py
  2. Consider fallback endpoints for reliability

Nice to Have:

  1. Document dependency on api.ipify.org
  2. Add thread safety test once cache is shared

📊 Overall Assessment

Code Quality: ⭐⭐⭐⭐☆ (4/5)
Test Coverage: ⭐⭐⭐⭐⭐ (5/5)
Documentation: ⭐⭐⭐⭐☆ (4/5)

The PR is well-implemented with excellent tests, but has one critical issue with cache persistence that should be addressed before merging. Once the cache sharing is fixed, this will be a solid addition to the SDK.


🤖 Generated by Claude Code

Copy link

codecov bot commented Oct 6, 2025

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
296 1 295 12
View the top 1 failed test(s) by shortest run time
tests.aignostics.system.service_test::test_is_online_cache_status_change
Stack Traces | 0.005s run time
@pytest.mark.timeout(15)
    def test_is_online_cache_status_change() -> None:
        """Test that is_online correctly caches different statuses."""
        with mock.patch.dict(os.environ, {"AIGNOSTICS_SYSTEM_ONLINE_CACHE_TTL": "60"}):
            service = Service()
    
            # First check: network is UP
            with mock.patch.object(
                service, "_determine_network_health", return_value=Health(status=Health.Code.UP)
            ):
                assert service.is_online() is True
    
            # Second check within TTL: should return cached True (even though we're mocking DOWN)
            with mock.patch.object(
>               service, "_determine_network_health", return_value=Health(status=Health.Code.DOWN)
                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            ):
E           pydantic_core._pydantic_core.ValidationError: 1 validation error for Health
E             Value error, Health DOWN must have a reason [type=value_error, input_value={'status': <_HealthStatus.DOWN: 'DOWN'>}, input_type=dict]
E               For further information visit https://errors.pydantic.dev/2.11/v/value_error

.../aignostics/system/service_test.py:432: ValidationError

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Copy link

sonarqubecloud bot commented Oct 6, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PR with online detection

1 participant