Title: feat: implement dead-letter queue and configurable retry policy for webhook delivery
Labels: enhancement reliability feature
Complexity: medium
Branch: feat/webhook-dead-letter-queue
Problem Context
The current webhook dispatch task retries on failure, but permanently failed deliveries are silently dropped. Operators have no visibility into which webhooks failed, why they failed, or how to replay them. For any subscriber relying on webhooks for critical data pipelines, silent drops are unacceptable.
Scope
Included:
- WebhookDelivery model: records every delivery attempt (status, response code, response body, duration)
- Dead-letter queue: after N failed attempts, move the delivery to a dead_letter state
- Manual replay: admin action to re-enqueue a dead-lettered delivery
- GET /api/webhooks/{id}/deliveries/ endpoint returning delivery history
- Slack/email alert when a subscription enters dead-letter state (configurable, optional)
- Delivery retention policy: purge delivery records older than 30 days via a periodic Celery task
Not included:
- Signed webhook payloads (separate security issue)
- Per-subscriber retry configuration UI (Phase 2)
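As a rough illustration of the manual replay item above, the reset step might look like the sketch below. `DeliveryRecord` and `replay` are hypothetical stand-ins written in plain Python so the logic can be read without Django. Resetting `attempt_number` to 1 on replay is an assumption, not something the issue specifies.

```python
from dataclasses import dataclass


@dataclass
class DeliveryRecord:
    """Plain stand-in for a WebhookDelivery row (hypothetical, for illustration)."""
    status: str
    attempt_number: int = 1
    error_message: str = ""


def replay(delivery: DeliveryRecord) -> bool:
    """Reset a dead-lettered delivery so it can be enqueued again.

    Returns True if the record was reset, False if it was not dead-lettered.
    """
    if delivery.status != "dead_letter":
        return False
    delivery.status = "pending"
    delivery.attempt_number = 1  # assumption: a replay starts a fresh retry cycle
    delivery.error_message = ""
    return True
```

In the real admin action, a successful reset would presumably be followed by saving the row and re-enqueueing it, for example with `dispatch_webhook.delay(delivery.pk)`.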
Implementation Guidelines
Files to update:
- django-backend/soroscan/ingest/models.py: add WebhookDelivery model
- django-backend/soroscan/ingest/tasks.py: update dispatch_webhook to record delivery; add purge_old_webhook_deliveries periodic task
- django-backend/soroscan/ingest/views.py: add WebhookDeliveryViewSet
- django-backend/soroscan/ingest/admin.py: register WebhookDelivery with replay action
- django-backend/soroscan/ingest/serializers.py: add WebhookDeliverySerializer
- django-backend/soroscan/urls.py: add delivery history route
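The periodic purge task also needs a schedule entry. A minimal sketch of a daily Celery beat entry in Django settings follows. It assumes Celery is loaded with `namespace="CELERY"`, and the dotted task path is only inferred from the file layout above; the entry name is arbitrary.

```python
from datetime import timedelta

# Default retention window; the issue names this setting explicitly.
WEBHOOK_DELIVERY_RETENTION_DAYS = 30

# Assumes Celery reads Django settings with namespace="CELERY", so this
# setting maps to Celery's `beat_schedule`. The task path is inferred from
# the file layout and may differ in the real project.
CELERY_BEAT_SCHEDULE = {
    "purge-old-webhook-deliveries": {
        "task": "soroscan.ingest.tasks.purge_old_webhook_deliveries",
        "schedule": timedelta(days=1),  # run once a day
    },
}
```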
Model sketch:
from django.db import models


class WebhookDelivery(models.Model):
    """Delivery record for a webhook dispatch (status, response, timing)."""

    class Status(models.TextChoices):
        PENDING = 'pending'
        SUCCESS = 'success'
        FAILED = 'failed'
        DEAD_LETTER = 'dead_letter'

    # WebhookSubscription and ContractEvent are assumed to be defined earlier in models.py.
    subscription = models.ForeignKey(
        WebhookSubscription, on_delete=models.CASCADE, related_name='deliveries'
    )
    # SET_NULL keeps the delivery history if the originating event is deleted.
    event = models.ForeignKey(ContractEvent, on_delete=models.SET_NULL, null=True)
    status = models.CharField(max_length=16, choices=Status.choices, default=Status.PENDING)
    attempt_number = models.PositiveIntegerField(default=1)
    response_status_code = models.IntegerField(null=True, blank=True)
    response_body = models.TextField(blank=True)
    duration_ms = models.IntegerField(null=True, blank=True)
    error_message = models.TextField(blank=True)
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        ordering = ['-created_at']
        indexes = [
            # Serves the per-subscription delivery history endpoint.
            models.Index(fields=['subscription', '-created_at']),
            models.Index(fields=['status']),
        ]
Retry policy:
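from celery import shared_task

from .models import WebhookDelivery


# shared_task is used so the sketch does not depend on where the Celery app
# instance lives; @app.task works the same way if the project exposes one.
@shared_task(bind=True, max_retries=5, default_retry_delay=60)
def dispatch_webhook(self, delivery_id: int):
    delivery = WebhookDelivery.objects.get(pk=delivery_id)
    try:
        # ... HTTP dispatch ...
        delivery.status = WebhookDelivery.Status.SUCCESS
        delivery.save()  # persist the success; the original sketch never saved it
    except Exception as exc:
        delivery.error_message = str(exc)
        if self.request.retries >= self.max_retries:
            # Retries exhausted: park the delivery in the dead-letter state.
            delivery.status = WebhookDelivery.Status.DEAD_LETTER
            delivery.save()
            return
        delivery.status = WebhookDelivery.Status.FAILED
        delivery.attempt_number += 1
        delivery.save()
        # Exponential backoff: 60 s, 120 s, 240 s, ...
        raise self.retry(exc=exc, countdown=60 * (2 ** self.request.retries))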
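To make the backoff concrete, the `countdown` expression above can be evaluated for each retry. The helper `backoff_schedule` is hypothetical. It only mirrors the `60 * (2 ** retries)` formula with `max_retries=5` and is not part of the task itself.

```python
def backoff_schedule(max_retries: int = 5, base_delay: int = 60) -> list[int]:
    """Return the countdown (in seconds) applied before each retry.

    Mirrors `countdown=base_delay * (2 ** self.request.retries)`, where
    `self.request.retries` runs from 0 to max_retries - 1.
    """
    return [base_delay * (2 ** retries) for retries in range(max_retries)]


schedule = backoff_schedule()
print(schedule)       # [60, 120, 240, 480, 960]
print(sum(schedule))  # 1860
```

With these defaults, a delivery that keeps failing waits 1860 seconds in total (31 minutes, not counting request time) before it is dead-lettered.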
Constraints:
- Delivery records must be created before the HTTP request is sent, and their status updated after it completes
- purge_old_webhook_deliveries must run daily and delete records older than WEBHOOK_DELIVERY_RETENTION_DAYS (default: 30)
- Response body truncated to 4 KB to prevent large payloads bloating the DB
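The truncation and retention constraints can be sketched as two small pure-Python helpers. The names `truncate_response_body` and `retention_cutoff` are hypothetical. Reading "4 KB" as 4096 bytes of UTF-8 is an assumption; the issue does not say whether the limit is in bytes or characters.

```python
from datetime import datetime, timedelta, timezone

MAX_RESPONSE_BODY_BYTES = 4 * 1024    # assumption: 4 KB means 4096 UTF-8 bytes
WEBHOOK_DELIVERY_RETENTION_DAYS = 30  # default named in the constraints


def truncate_response_body(body: str, limit: int = MAX_RESPONSE_BODY_BYTES) -> str:
    """Cut `body` so that its UTF-8 encoding is at most `limit` bytes."""
    encoded = body.encode("utf-8")
    if len(encoded) <= limit:
        return body
    # errors="ignore" drops a multi-byte character split by the cut.
    return encoded[:limit].decode("utf-8", errors="ignore")


def retention_cutoff(now: datetime, days: int = WEBHOOK_DELIVERY_RETENTION_DAYS) -> datetime:
    """Records created before this timestamp are eligible for purging."""
    return now - timedelta(days=days)


print(retention_cutoff(datetime(2024, 3, 31, tzinfo=timezone.utc)))
# 2024-03-01 00:00:00+00:00
```

Inside the purge task, the cutoff would presumably feed a queryset filter such as `WebhookDelivery.objects.filter(created_at__lt=cutoff).delete()`.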
Acceptance Criteria
- WebhookDelivery model with migration added
- Every delivery attempt creates a WebhookDelivery record with status, response code, and duration
- Deliveries that exhaust their retries are moved to dead_letter status
- GET /api/webhooks/{id}/deliveries/ returns paginated delivery history
- purge_old_webhook_deliveries task deletes records older than 30 days