A Python module for parsing HTTP/1.x traffic from pcap files using tshark.

Features:
- Parse HTTP/1.x requests and responses from pcap files
- TLS decryption support via keylog files
- Automatic decompression (gzip, brotli, deflate)
- JSON parsing utilities
- Stream filtering and searching
- Transaction matching (request-response pairs)

Requirements:
- Python 3.8+
- tshark (Wireshark command-line tool)
- PyYAML
Optional:
- brotli (for Brotli decompression)
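
tshark is an external program and brotli is optional, so a missing dependency only shows up at run time. A quick pre-flight check can be sketched with the standard library alone (`shutil.which`, `importlib.util.find_spec`). `check_dependencies` is a hypothetical helper for illustration; it is not part of http_parser.

```python
import importlib.util
import shutil
from typing import Dict


def check_dependencies(tshark_path: str = "tshark") -> Dict[str, bool]:
    """Report whether each external or optional dependency is available.

    Hypothetical helper for illustration; it is not part of http_parser.
    """
    return {
        # shutil.which returns None when the executable cannot be found
        "tshark": shutil.which(tshark_path) is not None,
        # PyYAML is imported under the module name "yaml"
        "pyyaml": importlib.util.find_spec("yaml") is not None,
        # brotli is only needed for bodies sent with Content-Encoding: br
        "brotli": importlib.util.find_spec("brotli") is not None,
    }


if __name__ == "__main__":
    for name, available in check_dependencies().items():
        print(f"{name}: {'ok' if available else 'missing'}")
```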

Installation:

```bash
pip install pyyaml
# Optional: pip install brotli
```

Basic usage:

```python
from http_parser import HTTPCapture

# Create a capture (with optional TLS decryption)
capture = HTTPCapture(
    '/path/to/capture.pcap',
    keylog_file='/path/to/keylog.txt'
)

# Iterate over all transactions
for tx in capture:
    print(f"{tx.method} {tx.url} -> {tx.status}")
```

Working with requests:

```python
for tx in capture:
    if tx.request:
        req = tx.request

        # Access request properties
        print(f"Method: {req.method}")
        print(f"URL: {req.url}")
        print(f"Path: {req.path}")
        print(f"Host: {req.host}")
        print(f"Query params: {req.query_params}")

        # Access headers
        print(f"Content-Type: {req.content_type}")
        auth = req.get_header('authorization')

        # Access body
        if req.is_json:
            data = req.json()
        else:
            text = req.body_text
```

Working with responses:

```python
for tx in capture:
    if tx.response:
        resp = tx.response

        # Access response properties
        print(f"Status: {resp.status} {resp.status_text}")
        print(f"Content-Type: {resp.content_type}")

        # Body is automatically decompressed
        if resp.is_json:
            data = resp.json()
            print(data)
        elif resp.is_text:
            print(resp.body_text)
```

Filtering transactions:

```python
# Filter by method
for tx in capture.filter(method='POST'):
    print(tx.url)

# Filter by status
for tx in capture.filter(status=200):
    print(tx.url)

# Filter by status range
for tx in capture.filter(status_range=(400, 499)):
    print(f"{tx.status} {tx.url}")

# Filter by host
for tx in capture.filter(host='api.example.com'):
    print(tx.url)

# Filter by path
for tx in capture.filter(path_contains='/api/'):
    print(tx.url)

# Combined filters
for tx in capture.filter(method='POST', status=201):
    print(tx.url)
```

Searching:

```python
# Find by URL pattern (regex)
transactions = capture.get_by_url(r'/api/v\d+/users')
for tx in transactions:
    print(tx.url)

# Get a specific stream
tx = capture.get_transaction(tcp_stream=13)
print(f"{tx.method} {tx.url}")
```

Summaries and debugging:

```python
# Print summary of all transactions
print(capture.summary())

# Debug raw packets for a stream
print(capture.dump_stream(tcp_stream=13))
```

Low-level parsing with HTTPStreamParser:

```python
from http_parser import HTTPStreamParser

# Parse from tshark YAML output
parser = HTTPStreamParser()

# From file
tx = parser.parse_yaml_file('/path/to/stream.yaml')

# From string
with open('/path/to/stream.yaml') as f:
    yaml_content = f.read()
tx = parser.parse_yaml(yaml_content)

# Debug raw packets
print(parser.dump_packets(yaml_content))
```

API reference:

HTTPCapture: main interface for working with pcap files.

Constructor arguments:
- pcap_file: Path to the pcap file
- keylog_file: Optional path to a TLS keylog file
- tshark_path: Path to the tshark executable (default: "tshark")

Methods:
- discover_streams(): Find all TCP streams with HTTP traffic
- get_transaction(tcp_stream): Get the transaction for a specific stream
- filter(...): Filter transactions by criteria such as method, status, status_range, host, path_contains and content_type; criteria can be combined
- get_by_url(pattern): Find transactions whose URL matches a regex
- summary(): Get a text summary of all transactions
- dump_stream(tcp_stream): Get a raw packet dump for debugging
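
HTTPCapture drives tshark internally, but the exact command line is not documented here. The sketch below assumes a plausible equivalent for stream discovery built from standard tshark options: `-r` reads the pcap, `-o tls.keylog_file:<path>` enables TLS decryption (Wireshark 3.0 and later; older releases used `ssl.keylog_file`), `-Y http` is a display filter, and `-T fields -e tcp.stream` prints one stream number per matching packet. The function names are hypothetical and are not the module's API.

```python
import subprocess
from typing import List, Optional


def build_discovery_command(pcap_file: str,
                            keylog_file: Optional[str] = None,
                            tshark_path: str = "tshark") -> List[str]:
    """Build a tshark command that prints the tcp.stream number of every HTTP packet."""
    cmd = [tshark_path, "-r", pcap_file]
    if keylog_file:
        # Let tshark decrypt TLS so HTTP carried inside HTTPS is dissected too
        cmd += ["-o", f"tls.keylog_file:{keylog_file}"]
    # Keep only HTTP packets and print one tcp.stream value per line
    cmd += ["-Y", "http", "-T", "fields", "-e", "tcp.stream"]
    return cmd


def parse_stream_ids(output: str) -> List[int]:
    """Turn tshark's one-value-per-line output into sorted, unique stream numbers."""
    return sorted({int(tok) for tok in output.split() if tok.isdigit()})


def discover_http_streams(pcap_file: str,
                          keylog_file: Optional[str] = None,
                          tshark_path: str = "tshark") -> List[int]:
    """Hypothetical stand-in for HTTPCapture.discover_streams(); requires tshark."""
    cmd = build_discovery_command(pcap_file, keylog_file, tshark_path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_stream_ids(result.stdout)
```

Each HTTP packet in a stream prints the same number, which is why the parser deduplicates with a set before sorting.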

HTTPRequest: represents an HTTP/1.x request.

Attributes and properties:
- method: HTTP method (GET, POST, etc.)
- path: Request path including query string
- version: HTTP version (e.g., "HTTP/1.1")
- headers: Dict of headers (lowercase keys)
- body: Raw body bytes
- url: Full URL (scheme://host:port/path)
- host: Host from the Host header
- port: Port number
- scheme: "http" or "https"
- path_only: Path without query string
- query_string: Query string portion
- query_params: Parsed query parameters
- content_type: Content-Type header
- is_json: True if JSON content type
- body_text: Body as string
- get_header(name): Look up a single header value
- json(): Parse body as JSON
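
Several of these properties are derived from the request target and connection details. The sketch below shows one way to compute them with the standard library; it is an assumption about the behavior, not the module's code. In particular, representing `query_params` as a list of values per key (what `urllib.parse.parse_qs` returns) and always including the port in `url` are guesses based on the layout documented above.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs


def split_target(path: str) -> Tuple[str, str]:
    """Split a request target such as '/search?q=x' into (path_only, query_string)."""
    path_only, _, query_string = path.partition("?")
    return path_only, query_string


def parse_query(query_string: str) -> Dict[str, List[str]]:
    """Parse a query string; repeated keys collect every value in a list."""
    # keep_blank_values=True keeps flags such as '?debug=' instead of dropping them
    return parse_qs(query_string, keep_blank_values=True)


def build_url(scheme: str, host: str, port: int, path: str) -> str:
    """Assemble scheme://host:port/path, the layout documented for HTTPRequest.url."""
    return f"{scheme}://{host}:{port}{path}"
```

For example, `build_url('https', 'api.example.com', 443, '/v1/users?id=7')` yields `https://api.example.com:443/v1/users?id=7`. An implementation could instead drop the default ports 80 and 443 for shorter URLs.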

HTTPResponse: represents an HTTP/1.x response.

Attributes and properties:
- status: Status code (e.g., 200)
- status_text: Status text (e.g., "OK")
- version: HTTP version
- headers: Dict of headers (lowercase keys)
- body: Raw body bytes
- ok: True if status is 2xx
- content_type: Content-Type header
- content_encoding: Content-Encoding header
- is_json: True if JSON content type
- is_html: True if HTML content type
- is_text: True if text-based content
- decompressed_body: Body after decompression
- body_text: Body as string (decompressed)
- json(): Parse body as JSON
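
`decompressed_body` undoes the Content-Encoding. The following is a hedged approximation of how that could work, not the module's implementation: it uses the standard `zlib` module for gzip and deflate and the optional `brotli` package for `br`. The function names, the raw-deflate fallback, and returning the raw bytes on failure are all assumptions. The JSON check illustrates one reasonable reading of "JSON content type".

```python
import zlib
from typing import Optional

try:
    import brotli  # optional dependency, only needed for "br"
except ImportError:
    brotli = None


def decompress_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo a single Content-Encoding; return the input unchanged if that is not possible."""
    encoding = (content_encoding or "").strip().lower()
    try:
        if encoding in ("gzip", "x-gzip"):
            # 16 + MAX_WBITS makes zlib expect a gzip header and trailer
            return zlib.decompress(body, 16 + zlib.MAX_WBITS)
        if encoding == "deflate":
            try:
                # HTTP "deflate" is defined as the zlib format (RFC 1950)...
                return zlib.decompress(body)
            except zlib.error:
                # ...but some servers send raw deflate data without the zlib wrapper
                return zlib.decompress(body, -zlib.MAX_WBITS)
        if encoding == "br" and brotli is not None:
            return brotli.decompress(body)
    except Exception:
        # Truncated or corrupt capture data (or a brotli error): keep the raw bytes
        return body
    # Identity, unknown encodings, or "br" without the brotli package installed
    return body


def is_json_content_type(content_type: Optional[str]) -> bool:
    """True for application/json and structured-syntax types such as application/problem+json."""
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    return mime == "application/json" or mime.endswith("+json")
```

Stacked encodings such as `Content-Encoding: gzip, br` are not handled here; they would need each coding undone in reverse order.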

Transactions (the objects yielded when iterating over an HTTPCapture) represent a request-response pair.

Attributes and properties:
- tcp_stream: TCP stream number
- request: HTTPRequest object (or None)
- response: HTTPResponse object (or None)
- url: Shortcut to request.url
- method: Shortcut to request.method
- status: Shortcut to response.status
- duration_ms: Time from request to response, in milliseconds
- complete: True if both request and response are present
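
On an HTTP/1.x connection, responses come back in the same order as their requests (RFC 9112 requires this even when requests are pipelined). That makes request-response matching within one TCP stream a first-in, first-out queue. The sketch below illustrates that idea along with `duration_ms` and `complete`. The `Message` and `Pair` types and their fields are hypothetical stand-ins, and the real module may pair messages differently.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Message:
    kind: str                     # "request" or "response" (hypothetical field)
    timestamp: float              # capture time in seconds
    status: Optional[int] = None  # only set for responses


@dataclass
class Pair:
    request: Optional[Message]
    response: Optional[Message]

    @property
    def complete(self) -> bool:
        return self.request is not None and self.response is not None

    @property
    def duration_ms(self) -> Optional[float]:
        if not self.complete:
            return None
        return (self.response.timestamp - self.request.timestamp) * 1000.0


def pair_messages(messages: List[Message]) -> List[Pair]:
    """Match requests to responses in arrival order within a single TCP stream."""
    pending: Deque[Message] = deque()
    pairs: List[Pair] = []
    for msg in sorted(messages, key=lambda m: m.timestamp):
        if msg.kind == "request":
            pending.append(msg)
        elif msg.status is not None and 100 <= msg.status < 200 and msg.status != 101:
            # Interim 1xx responses (e.g. 100 Continue) precede the final response;
            # 101 Switching Protocols is kept because it ends the HTTP/1.x exchange
            continue
        elif pending:
            pairs.append(Pair(pending.popleft(), msg))
        else:
            # Response whose request is missing from the capture
            pairs.append(Pair(None, msg))
    # Requests that never received a response in the capture
    pairs.extend(Pair(req, None) for req in pending)
    return pairs
```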
HTTP/1.x differs from HTTP/2 in several ways:
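- No stream multiplexing: an HTTP/1.x connection carries one request/response exchange at a time; with keep-alive, further exchanges follow sequentially on the same TCP connection instead of being interleaved
- No pseudo-headers: the standard `Host` header is used instead of `:authority`, and the method, path and status appear in the request and status lines rather than in `:method`, `:path` and `:status`
- Text-based protocol: request lines, status lines and headers are human-readable
- No header compression: headers are sent as plain text (HTTP/2 compresses them with HPACK)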
This module handles these differences transparently, providing a similar API to the http2_parser module.
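
Because HTTP/1.x is text-based, the request line and headers can be read directly from reassembled stream bytes. The sketch below parses a raw request into method, target, version, a lowercase header dict and the remaining bytes. It is a simplified illustration, not the module's parser: it assumes CRLF line endings, ignores obsolete line folding, and treats everything after the blank line as the body without honoring Content-Length or chunked transfer coding.

```python
from typing import Dict, Tuple


def parse_request(raw: bytes) -> Tuple[str, str, str, Dict[str, str], bytes]:
    """Split a raw HTTP/1.x request into (method, target, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    # Field bytes are decoded as ISO-8859-1, which maps every byte and never fails
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: METHOD SP request-target SP HTTP-version
    method, target, version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        key, value = name.strip().lower(), value.strip()
        # Repeated fields are joined into one comma-separated value (RFC 9110, section 5.3)
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return method, target, version, headers, body
```

Joining repeated fields with ", " is correct for most headers, but Set-Cookie is the usual exception and would need to be kept as separate values.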

Extracting JSON from successful POST responses:

```python
from http_parser import HTTPCapture

capture = HTTPCapture('traffic.pcap', keylog_file='keylog.txt')

# Find POST transactions with JSON content and print successful responses
for tx in capture.filter(content_type='json', method='POST'):
    if tx.response and tx.response.ok:
        print(f"\n{tx.url}")
        print(tx.response.json())
```

Printing all request headers:

```python
from http_parser import HTTPCapture

capture = HTTPCapture('traffic.pcap')

for tx in capture:
    if tx.request:
        print(f"\n{tx.method} {tx.url}")
        for name, value in tx.request.headers.items():
            print(f"  {name}: {value}")
```
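
The same iteration pattern can feed quick aggregate reports. `status_classes` below is a hypothetical helper, not part of http_parser. It relies only on the documented `tx.status` shortcut, which is assumed here to be None when a transaction has no response. Because of that, it can be tried with stand-in objects and then called as `status_classes(capture)`.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable, Optional


def status_classes(transactions: Iterable) -> Counter:
    """Count transactions per status class ("2xx", "4xx", ...) plus those without a response."""
    counts: Counter = Counter()
    for tx in transactions:
        status: Optional[int] = tx.status
        counts["no response" if status is None else f"{status // 100}xx"] += 1
    return counts


if __name__ == "__main__":
    # Stand-in objects exposing only .status, in place of real transactions
    sample = [SimpleNamespace(status=s) for s in (200, 201, 404, None)]
    print(status_classes(sample))
```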