feat: implement core bot infrastructure and monitoring system #957

Draft
wants to merge 33 commits into base: main

33 commits
40a6d03
feat: implement core bot infrastructure and monitoring system
kzndotsh Jul 20, 2025
155bb58
refactor: modernize error handling and sentry integration
kzndotsh Jul 20, 2025
12c9d7d
fix: add missing get_prefix function to resolve type checking errors
kzndotsh Jul 20, 2025
9850ff4
fix: suppress type checker warnings for discord.py tasks.Loop.coro at…
kzndotsh Jul 20, 2025
cd34a6a
fix: simplify operation handling in BaseController
kzndotsh Jul 20, 2025
138719d
refactor: enhance SentryManager with operation mapping and span filte…
kzndotsh Jul 20, 2025
2828d41
refactor: streamline tracing logic in enhanced_span function
kzndotsh Jul 20, 2025
f8106d5
chore(deps): update aiohttp to version 3.12.14 and xattr to version 1…
kzndotsh Jul 20, 2025
d2bfbb0
refactor(bot.py): remove command_prefix argument from super().__init_…
kzndotsh Jul 20, 2025
ceef3d8
refactor(cog_loader.py): load cogs sequentially within priority group…
kzndotsh Jul 20, 2025
60ca11d
fix(cog_loader.py): improve error handling for missing cog paths
kzndotsh Jul 20, 2025
e2b998e
refactor(bot.py): enhance command transaction handling in Sentry inte…
kzndotsh Jul 20, 2025
3772808
fix(influxdblogger.py): improve logging and task management for Influ…
kzndotsh Jul 20, 2025
3817168
refactor(task_manager.py): enhance task management with cog unloading…
kzndotsh Jul 20, 2025
f13c30b
refactor(bot.py): add critical task registration and cog unloading cl…
kzndotsh Jul 20, 2025
51ba4d4
feat(utils): consolidate Sentry SDK usage behind SentryManager abstra…
kzndotsh Jul 20, 2025
5efa11d
refactor(utils): redesign task manager for dynamic cog-driven task re…
kzndotsh Jul 20, 2025
25365f5
feat(utils): enhance BotProtocol with runtime checking and add_cog me…
kzndotsh Jul 20, 2025
a55c50c
refactor(utils): refactor hot reload to use SentryManager and improve…
kzndotsh Jul 20, 2025
270b4e1
refactor(bot): update bot to use dynamic task discovery and remove ha…
kzndotsh Jul 20, 2025
93b9d45
refactor(handlers): update sentry handler to use SentryManager abstra…
kzndotsh Jul 20, 2025
7d833e6
feat(cogs): add get_critical_tasks method to InfluxLogger for dynamic…
kzndotsh Jul 20, 2025
b4d7d90
feat(cogs): add get_critical_tasks method to GifLimiter for dynamic t…
kzndotsh Jul 20, 2025
11c5d90
feat(cogs): add get_critical_tasks method to TempBan for dynamic task…
kzndotsh Jul 20, 2025
4394b40
feat(cogs): add get_critical_tasks method to Afk for dynamic task reg…
kzndotsh Jul 20, 2025
b22618d
fix(utils): resolve type errors in task manager with proper Protocol …
kzndotsh Jul 20, 2025
b216cee
feat(cli): add check-all command for comprehensive development valida…
kzndotsh Jul 20, 2025
e46b4a3
feat(cli): add check-all command for comprehensive development valida…
kzndotsh Jul 20, 2025
3440841
refactor(handlers): streamline command-specific tag setting in Sentry…
kzndotsh Jul 21, 2025
cf6a7bb
feat(utils): validate configuration in CogWatcher for improved reliab…
kzndotsh Jul 21, 2025
fc16eab
feat(utils): add asynchronous flush method to SentryManager
kzndotsh Jul 21, 2025
252a56b
feat(utils): add @instrumented_task decorator for task instrumentation
kzndotsh Jul 21, 2025
7b06cc9
feat(app): enhance signal handling for graceful shutdown with event l…
kzndotsh Jul 21, 2025
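
Commit fc16eab adds an asynchronous flush method to SentryManager, which the `tux/app.py` diff below calls as `SentryManager.flush_async()` during shutdown. Its implementation is not shown on this page. As a minimal sketch of one way such a method could work, assuming the blocking flush is simply moved to a worker thread: `blocking_flush` is a hypothetical stand-in for a blocking call such as `sentry_sdk.flush()`, and the standalone `flush_async` function here is illustrative, not the PR's code.

```python
import asyncio
import time


def blocking_flush() -> None:
    """Hypothetical stand-in for a blocking call such as sentry_sdk.flush()."""
    time.sleep(0.05)  # pretend to drain a queue of pending events


async def flush_async(timeout: float = 2.0) -> bool:
    """Run the blocking flush in a worker thread so the event loop stays responsive.

    Returns True if the flush finished within `timeout` seconds, False otherwise.
    """
    try:
        # asyncio.to_thread hands the call to the default thread pool executor.
        await asyncio.wait_for(asyncio.to_thread(blocking_flush), timeout=timeout)
    except asyncio.TimeoutError:
        return False
    return True
```

Awaiting `flush_async()` from a shutdown coroutine keeps other pending tasks running while events are sent, and the boolean result lets the caller log a warning if the flush did not complete in time.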
837 changes: 419 additions & 418 deletions poetry.lock

Large diffs are not rendered by default.

189 changes: 108 additions & 81 deletions tux/app.py
@@ -1,17 +1,28 @@
"""TuxApp: Orchestration and lifecycle management for the Tux Discord bot."""
"""
TuxApp: Main application entrypoint and lifecycle orchestrator.

This module contains the `TuxApp` class, which serves as the primary entrypoint
for the Tux Discord bot. It is responsible for:

- **Environment Setup**: Validating configuration, initializing Sentry, and setting
up OS-level signal handlers for graceful shutdown.
- **Bot Instantiation**: Creating the instance of the `Tux` bot class with the
appropriate intents, command prefix logic, and owner IDs.
- **Lifecycle Management**: Starting the asyncio event loop and managing the
bot's main `start` and `shutdown` sequence, including handling `KeyboardInterrupt`.
"""

import asyncio
import signal
from types import FrameType
import sys

import discord
import sentry_sdk
from loguru import logger

from tux.bot import Tux
from tux.help import TuxHelp
from tux.utils.config import CONFIG
from tux.utils.env import get_current_env
from tux.utils.sentry_manager import SentryManager


async def get_prefix(bot: Tux, message: discord.Message) -> list[str]:
@@ -28,129 +39,145 @@


class TuxApp:
"""Orchestrates the startup, shutdown, and environment for the Tux bot."""
"""
Orchestrates the startup, shutdown, and environment for the Tux bot.

def __init__(self):
"""Initialize the TuxApp with no bot instance yet."""
self.bot = None
This class is not a `discord.py` cog, but rather a top-level application
runner that manages the bot's entire lifecycle from an OS perspective.
"""

def run(self) -> None:
"""Run the Tux bot application (entrypoint for CLI)."""
asyncio.run(self.start())
# --- Initialization ---

def setup_sentry(self) -> None:
"""Initialize Sentry for error monitoring and tracing."""
if not CONFIG.SENTRY_DSN:
logger.warning("No Sentry DSN configured, skipping Sentry setup")
return

logger.info("Setting up Sentry...")

try:
sentry_sdk.init(
dsn=CONFIG.SENTRY_DSN,
release=CONFIG.BOT_VERSION,
environment=get_current_env(),
enable_tracing=True,
attach_stacktrace=True,
send_default_pii=False,
traces_sample_rate=1.0,
profiles_sample_rate=1.0,
_experiments={
"enable_logs": True, # https://docs.sentry.io/platforms/python/logs/
},
)
def __init__(self):
"""Initializes the TuxApp, setting the bot instance to None initially."""
self.bot: Tux | None = None

# Add additional global tags
sentry_sdk.set_tag("discord_library_version", discord.__version__)
# --- Application Lifecycle ---

logger.info(f"Sentry initialized: {sentry_sdk.is_initialized()}")
def run(self) -> None:
"""
The main synchronous entrypoint for the application.

except Exception as e:
logger.error(f"Failed to initialize Sentry: {e}")

def setup_signals(self) -> None:
"""Set up signal handlers for graceful shutdown."""
signal.signal(signal.SIGTERM, self.handle_sigterm)
signal.signal(signal.SIGINT, self.handle_sigterm)

def handle_sigterm(self, signum: int, frame: FrameType | None) -> None:
"""Handle SIGTERM/SIGINT by raising KeyboardInterrupt for graceful shutdown."""
logger.info(f"Received signal {signum}")

if sentry_sdk.is_initialized():
with sentry_sdk.push_scope() as scope:
scope.set_tag("signal.number", signum)
scope.set_tag("lifecycle.event", "termination_signal")

sentry_sdk.add_breadcrumb(
category="lifecycle",
message=f"Received termination signal {signum}",
level="info",
)
This method starts the asyncio event loop and runs the primary `start`
coroutine, effectively launching the bot.
"""
asyncio.run(self.start())

raise KeyboardInterrupt
async def start(self) -> None:
"""
The main asynchronous entrypoint for the application.

def validate_config(self) -> bool:
"""Validate that all required configuration is present."""
if not CONFIG.BOT_TOKEN:
logger.critical("No bot token provided. Set DEV_BOT_TOKEN or PROD_BOT_TOKEN in your .env file.")
return False
This method orchestrates the entire bot startup sequence: setting up
Sentry and signal handlers, validating config, creating the `Tux`
instance, and connecting to Discord. It includes a robust
try/except/finally block to ensure graceful shutdown.
"""

return True
# Initialize Sentry
SentryManager.setup()

async def start(self) -> None:
"""Start the Tux bot, handling setup, errors, and shutdown."""
self.setup_sentry()

self.setup_signals()
# Set up signal handlers using the event loop for cross-platform compatibility
loop = asyncio.get_event_loop()
self.setup_signals(loop)

# Validate config
if not self.validate_config():
return

# Configure owner IDs, dynamically adding sysadmins if configured.
# This allows specified users to have access to sensitive commands like `eval`.
owner_ids = {CONFIG.BOT_OWNER_ID}

if CONFIG.ALLOW_SYSADMINS_EVAL:
logger.warning(
"⚠️ Eval is enabled for sysadmins, this is potentially dangerous; see settings.yml.example for more info.",
"⚠️ Eval is enabled for sysadmins, this is potentially dangerous; "
"see settings.yml.example for more info.",
)
owner_ids.update(CONFIG.SYSADMIN_IDS)

else:
logger.warning("🔒️ Eval is disabled for sysadmins; see settings.yml.example for more info.")

# Instantiate the main bot class with all necessary parameters.
self.bot = Tux(
command_prefix=get_prefix,
strip_after_prefix=True,
case_insensitive=True,
intents=discord.Intents.all(),
# owner_ids={CONFIG.BOT_OWNER_ID, *CONFIG.SYSADMIN_IDS},
owner_ids=owner_ids,
allowed_mentions=discord.AllowedMentions(everyone=False),
help_command=TuxHelp(),
activity=None,
status=discord.Status.online,
)

# Start the bot
try:
# This is the main blocking call that connects to Discord and runs the bot.
await self.bot.start(CONFIG.BOT_TOKEN, reconnect=True)

except KeyboardInterrupt:
# This is caught when the user presses Ctrl+C.
logger.info("Shutdown requested (KeyboardInterrupt)")
except Exception as e:
logger.critical(f"Bot failed to start: {e}")
await self.shutdown()

# Catch any other unexpected exception during bot runtime.
logger.critical(f"Bot failed to start or run: {e}")

finally:
# Ensure that shutdown is always called to clean up resources.
await self.shutdown()

async def shutdown(self) -> None:
"""Gracefully shut down the bot and flush Sentry."""
"""
Gracefully shuts down the bot and its resources.

This involves calling the bot's internal shutdown sequence and then
flushing any remaining Sentry events to ensure all data is sent.
"""
if self.bot and not self.bot.is_closed():
await self.bot.shutdown()

if sentry_sdk.is_initialized():
sentry_sdk.flush()
await asyncio.sleep(0.1)
await SentryManager.flush_async()
await asyncio.sleep(0.1) # Brief pause to allow buffers to flush

logger.info("Shutdown complete")

# --- Environment Setup ---

def setup_signals(self, loop: asyncio.AbstractEventLoop) -> None:
"""
Sets up OS-level signal handlers for graceful shutdown using the event loop for better cross-platform compatibility.

Note: loop.add_signal_handler may not be available on all platforms (e.g., Windows for some signals).
"""

def handle_sigterm() -> None:
SentryManager.report_signal(signal.SIGTERM, None)

def handle_sigint() -> None:
SentryManager.report_signal(signal.SIGINT, None)

try:
loop.add_signal_handler(signal.SIGTERM, handle_sigterm)
loop.add_signal_handler(signal.SIGINT, handle_sigint)
except NotImplementedError:

# Fallback for platforms that do not support add_signal_handler (e.g., Windows)
signal.signal(signal.SIGINT, SentryManager.report_signal)
signal.signal(signal.SIGTERM, SentryManager.report_signal)

if sys.platform.startswith("win"):
# Document limitation
logger.warning(

"Warning: Signal handling is limited on Windows. Some signals may not be handled as expected.",
)

def validate_config(self) -> bool:
"""
Performs a pre-flight check for essential configuration.

Returns
-------
bool
True if the configuration is valid, False otherwise.
"""
if not CONFIG.BOT_TOKEN:
logger.critical("No bot token provided. Set DEV_BOT_TOKEN or PROD_BOT_TOKEN in your .env file.")
return False

return True
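
The `setup_signals` change above registers handlers through `loop.add_signal_handler` and falls back to `signal.signal` when the loop raises `NotImplementedError`. The following is a rough standalone sketch of that pattern, not the code in this PR. `install_shutdown_handlers` and the `stop` event are hypothetical names, and setting an `asyncio.Event` stands in for whatever `SentryManager.report_signal` does, which is not shown on this page. The sketch obtains the loop with `asyncio.get_running_loop()`, which the asyncio documentation prefers over `get_event_loop()` inside coroutines.

```python
import asyncio
import signal
from types import FrameType


def install_shutdown_handlers(loop: asyncio.AbstractEventLoop, stop: asyncio.Event) -> str:
    """Arrange for `stop` to be set on SIGINT/SIGTERM; return the mechanism used."""
    signals = (signal.SIGINT, signal.SIGTERM)
    try:
        for sig in signals:
            # The callback runs inside the event loop, so it may touch asyncio objects.
            loop.add_signal_handler(sig, stop.set)
    except NotImplementedError:
        # Some event loops (e.g. on Windows) do not implement add_signal_handler.
        # A signal.signal handler runs outside the loop's normal callback flow,
        # so hand the work back to the loop in a thread-safe way.
        def fallback(signum: int, frame: FrameType | None) -> None:
            loop.call_soon_threadsafe(stop.set)

        for sig in signals:
            signal.signal(sig, fallback)
        return "signal.signal"
    return "add_signal_handler"


async def main() -> bool:
    stop = asyncio.Event()
    install_shutdown_handlers(asyncio.get_running_loop(), stop)

    # Simulate a termination request so the sketch can be run as-is.
    signal.raise_signal(signal.SIGTERM)

    # A real application would run its main task here and clean up once `stop` is set.
    await asyncio.wait_for(stop.wait(), timeout=2)
    return stop.is_set()
```

Running `asyncio.run(main())` from the main thread exercises the handler path. Waiting on an event rather than raising inside the handler keeps shutdown logic in ordinary coroutine code, where a `finally` block can release resources.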
