- Title: Simple Central Logger
- Date: 2017-11-30
- Subject: Independent logging REST service
- Author: Anil (Neil) Gulati
Simple Python (3.6.4) central logging server to collect logs of errors and events from multiple locations, provide short term storage of logs and respond on a REST API to queries about stored logs.
- A single API accepts log submissions and responds to requests to provide log information.
- Two services running on the logging host are used to receive and store logger submissions quickly.
- A client module is provided for making remote log submissions.
- The service manages data clean up according to an automated schedule.
The services are light and simple. The client module is barely a wrapper around existing standard Python logging.
Consists of these files:
README.md
: This file.
logger_remote.py
: Client module providing call to log from remote clients.
logger_httpd.py
: Direct recipient of log submissions on server.
logger_collector.py
: Secondary processing of submitted logs to organise for querying.
logger_resource.py
: Responds on REST API to provide query service.
test_resource.py
: Tests for logger_resource.py.
logger_remote.py
Logging to the remote server is supported using the standard Python library logging.handlers.HTTPHandler.
A wrapper is provided to simplify usage.
The client module POSTs url-encoded messages to the server.
Date / time is automatically recorded.
POSTs can provide arbitrary name value pairs to be included in the log record.
POSTs must provide:
- name: facility name identifying the logging source (required).
- msg: description of the error or event (required). It is recommended that this message text be fixed, not containing variable parameters; variable values can be provided through other name/value pairs.
- levelno: standard log level numbers are recommended: 20, 25, 30, 40, 50, 60, 70 (required).
- created: UTC timestamp e.g. 1514634365.9128094
import logger_remote # import logger_remote.py
# This usage will use the current module name as the facility name in the log.
logger = logger_remote.get_logger(__name__)
# ... do stuff ...
# Provide a mapping of additional arbitrary name value pairs to be logged.
record = { 'key': 'value' }
# Supply the log level number and error message text.
logger.log(40, 'error message', extra=record)  # e.g. 40 = error
# ... do more stuff ...
# Call shutdown after use of the logger has finished.
logger_remote.shutdown()
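The wrapper itself is not reproduced in this README, but a minimal sketch of how get_logger and shutdown might wrap logging.handlers.HTTPHandler is shown below; the host, port and internal details are assumptions, not the actual logger_remote.py implementation.

```python
# Hypothetical sketch only: the real logger_remote.py may differ.
import logging
import logging.handlers

LOGGER_HOST = 'hostname:8080'       # assumed address of the logging server
LOGGER_PATH = '/api/v1/messages'    # POST route described below

def get_logger(name):
    """Return a logger whose records are POSTed to the central logging server."""
    logger = logging.getLogger(name)            # facility name becomes record.name
    if not logger.handlers:                     # avoid adding duplicate handlers
        logger.setLevel(logging.DEBUG)
        handler = logging.handlers.HTTPHandler(LOGGER_HOST, LOGGER_PATH, method='POST')
        logger.addHandler(handler)
    return logger

def shutdown():
    """Flush and close all logging handlers."""
    logging.shutdown()
```

HTTPHandler url-encodes the log record's attributes when it POSTs, which is how the extra name/value pairs reach the server alongside name, msg, levelno and created.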
Authentication is not yet implemented. The plan is to authenticate using basic auth over SSL or a shared secret token.
- Run HTTPS server with basic auth.
- Logging information may include user data, which is another reason to use HTTPS.
- Client IP address authentication could potentially also be applied.
Run python logger_remote.py to generate a stream of random test messages to test the logging server.
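The test stream might look something like this sketch, assuming it sits in logger_remote.py's own __main__ block so that get_logger and shutdown are in scope (the facility name, levels and rate are illustrative):

```python
# Hypothetical test driver inside logger_remote.py; the actual code may differ.
import random
import time

if __name__ == '__main__':
    logger = get_logger('test_facility')
    for i in range(100):
        levelno = random.choice([10, 20, 25, 30, 40, 50, 60, 70])
        logger.log(levelno, 'Random test message.', extra={'sequence': str(i)})
        time.sleep(0.1)    # roughly ten messages per second
    shutdown()
```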
logger_httpd.py
logger_collector.py
The logging service is managed by two separate services running on the server as two separate processes, potentially with additional copies if required. These services accept requests to submit (and query) the logs and manage storage of the logs.
All POSTs are log submissions. (All GETs are log information requests).
POST API consists of one route / resource with all parameters url-encoded.
Messages must POST to /api/v1/messages with these parameters.
Parameters align to those generated and used by Python standard library logging module.
created
: datetime timestamp generated by the logger e.g. 1512386686.0873692.
name
: Facility name (given to logger on creation), using usual identifier syntax: alphanumeric, underscore, no spaces.
levelno
: The syslog logging level numeric equivalent: emerg=70 alert=60 crit=50 error=40 warn=30 notice=25 info=20 debug=10. Two decimal digits required (10 - 99).
msg
: Static string description of error. Do not use embedded formatting or variables, in order to support counting of similar errors and other analysis.
name/value
: Additional arbitrary name/values are allowed and will be logged (clients need to co-ordinate the same names to support analysis). This is the best way to include variable parameters.
E.g. curl -i -d 'name=facility_name' -d 'levelno=40' -d 'msg=This is the error message.' -d 'created=1512386686.123456' -d 'additional_key=additional_value' http://hostname:8080/api/v1/messages
- The logger_remote.py client module POSTs messages to the URL where the logger_httpd.py server is listening over HTTPS.
- Messages are cached by logger_httpd.py at one file per message using a filename of the form: YYYYMMDD-HHMMSS.uuuuuu-levelno-facility.
- logger_httpd.py immediately returns a confirmation response to the client.
- logger_collector.py, also running on the server, repeatedly moves blocks of cached messages into a per-process, per-day temporary directory to isolate them. Directories are per process to eliminate the need for locking. Directories are per day to partition the message stream into discrete days, which simplifies processing and clean up.
- logger_collector.py then adds the isolated individual messages to the appropriate log files, one line per message. Log files are named YYYYMMDD-levelno-facility to partition by day, level and facility and reduce workload when querying.
- The individual message files are then destroyed.
- logger_collector.py performs appropriate clean up at intervals, such as expiring and deleting logs.
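The cache write performed by logger_httpd.py is not reproduced in this README; a minimal sketch of that step is given below, assuming the filename scheme above and the primary cache path described in the layout that follows (the function name and on-disk format are illustrative):

```python
# Hypothetical sketch of the primary-cache write in logger_httpd.py; the real module may differ.
import os
from datetime import datetime, timezone
from urllib.parse import urlencode

CACHE_DIR = '/srv/logger/cache'  # primary cache directory (see layout below)

def cache_message(fields):
    """Write one submitted message to its own file in the primary cache.

    fields is the dict of url-decoded POST parameters: name, levelno, msg, created, extras.
    """
    created = float(fields['created'])
    stamp = datetime.fromtimestamp(created, tz=timezone.utc).strftime('%Y%m%d-%H%M%S.%f')
    filename = '{}-{}-{}'.format(stamp, fields['levelno'], fields['name'])
    with open(os.path.join(CACHE_DIR, filename), 'w') as cache_file:
        cache_file.write(urlencode(fields) + '\n')
```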
/srv/logger/cache/
: Primary cache contains individual message files of the form YYYYMMDD-HHMMSS.uuuuuu-LL-facility_name (u for microseconds, L for log level). All log submissions arrive here first.
/srv/logger/pids/YYYYMMDD/pid_number/
: Secondary cache contains the same files from the primary cache but in batch lots for processing. All log submissions are moved into one of these sub-directories.
/srv/logger/logs/
: Contains all log files. Log files are named YYYYMMDD-LL-facility_name.
The primary cache is a single catch-all directory for receiving all events for logging as one file per event. This directory never holds files for long, as they are usually moved immediately for further processing. In the event of a partial failure, though, the event information is retained and processing can continue seamlessly after a delay.
The secondary cache provides isolation of the message files for adding to the log files in batches.
Each secondary processor (logger_collector.py) isolates files using an atomic move to a directory named by its process number, to prevent race conditions or conflicts.
Again, any temporary or partial failures are handled seamlessly. An interruption in processing can be resumed at any time.
Separating the directories by date also supports easy clean up, and ensures the eventual re-use of process IDs is not an issue.
The log file naming scheme effectively provides simple indexing of the logs ready for querying.
Message information is appended to the appropriate log file by logger_collector.py processes.
This requires negotiation of access to the files when there are multiple processes doing this work (not yet implemented), but processing in batches improves efficiency and reduces wait times.
In any case, because the actual HTTP server has already responded to the client, delays here will not affect network logging response times for clients.
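As an illustration only, one isolate-and-append pass of the collector might look like the sketch below, following the directory layout above; the function name, batching and error handling are assumptions rather than the actual logger_collector.py code.

```python
# Hypothetical sketch of a single logger_collector.py pass; the real module may differ.
import glob
import os

CACHE_DIR = '/srv/logger/cache'
PIDS_DIR = '/srv/logger/pids'
LOGS_DIR = '/srv/logger/logs'

def collect_once():
    """Isolate cached messages into this process's directories, then append them to log files."""
    pid = str(os.getpid())

    # Isolate: atomic move of each cached message into a per-day, per-process directory.
    for cached in glob.glob(os.path.join(CACHE_DIR, '*')):
        name = os.path.basename(cached)          # YYYYMMDD-HHMMSS.uuuuuu-LL-facility
        day = name.split('-', 1)[0]
        batch_dir = os.path.join(PIDS_DIR, day, pid)
        os.makedirs(batch_dir, exist_ok=True)
        try:
            os.rename(cached, os.path.join(batch_dir, name))
        except FileNotFoundError:
            pass                                 # another collector already took this file

    # Append: one line per message into the per-day, per-level, per-facility log file.
    for isolated in glob.glob(os.path.join(PIDS_DIR, '*', pid, '*')):
        name = os.path.basename(isolated)
        day, _time, level, facility = name.split('-', 3)
        log_path = os.path.join(LOGS_DIR, '{}-{}-{}'.format(day, level, facility))
        with open(isolated) as message, open(log_path, 'a') as log:
            log.write(message.read().rstrip('\n').replace('\n', ' ') + '\n')
        os.remove(isolated)
```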
The maximum number of log files in the logging directory will be:
no. of days history x no. of log levels x no. of distinct facilities
E.g. 14 x 8 x 50 = 5,600
logger_httpd.py
logger_resource.py
GET requests to the log submission server are used to retrieve log information. The following resource types are available:
- counts: Retrieve a count of how many recorded events match the criteria supplied with the request.
- ranges: TBA.
- messages: Retrieve full information for one or more recorded event or error messages matching criteria supplied.
The GET API consists of different routes / resources.
Messages must GET from /api/v1/counts, /api/v1/ranges or /api/v1/messages.
Only counts are implemented so far.
All GET responses are returned in JSON.
Request parameters are supplied in order in the resource path. Query string parameters are not used.
URLs are of this form:
/api/v1/<resource>/<since>/<until>/<levels>/<facility_name>/<facility_name>/...
Where:
- <resource> is one of counts, ranges, messages.
- <since> and <until> are date/times of the form "YYYYMMDD-HHMMSS". Microsecond resolution is not supported.
- <levels> are log levels, always expressed as double digit numbers, either an individual level "LL" or a range "LL-MM".
- <facility_name> is an individual facility name to be included. Multiple facility names can be requested.
If any of these value types are omitted the meaning is taken as "including all".
So for example, if no facility name was provided, all facilities would be intended.
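E.g. curl -i http://hostname:8080/api/v1/counts/20171201-000000/20171215-000000/40/facility_name would request counts of level 40 (error) events from facility_name over the first two weeks of December (the dates and facility here are purely illustrative).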
Refer to logger_resource.py.
Pagination is managed directly through the date/time ranges supplied.
The server may potentially truncate responses if too much data is requested.
Clients are expected to request data in chunks by managing the since and until parameters.
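For instance, a client could walk a long period one day at a time; a minimal sketch is shown below (the hostname, default parameters and helper name are illustrative, and the response body is treated as opaque text):

```python
# Hypothetical client-side chunking of a counts query, one day per request.
import urllib.request
from datetime import timedelta

BASE = 'http://hostname:8080/api/v1/counts'   # illustrative server address

def counts_by_day(start, end, levels='40', facility='facility_name'):
    """Yield one raw counts response per day between start and end (datetimes)."""
    day = start
    while day < end:
        until = min(day + timedelta(days=1), end)
        url = '{}/{}/{}/{}/{}'.format(
            BASE,
            day.strftime('%Y%m%d-%H%M%S'),
            until.strftime('%Y%m%d-%H%M%S'),
            levels,
            facility,
        )
        with urllib.request.urlopen(url) as response:
            yield response.read().decode('utf-8')
        day = until
```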
- Counts: provides total counts over the period requested, by facility, level and message.
- When requested for durations greater than an hour, counts are returned per hour.
- When requested for durations under an hour, counts are returned per minute.
- Messages: for multiple messages, provides the facility, level, date/time and message string for all messages logged during the duration.
- OR, for multiple messages, provides all recorded data for all messages during the duration.
- For a single message, provides all recorded data for the individual message.
- Individual messages are uniquely identified by their individual attributes such as date/time, facility and log level. No message ID is used.
- Ranges: provides an exhaustive list of all key values occurring in messages within the period requested.
- This is therefore a list of all levels, all facilities and all error messages, as well as composited key values from the additional records.
- There may be an issue with clock skew when trusting timestamps from clients. Alternatively, generate a server timestamp when logs are received.
- Initial tests indicate logging should scale to 500 requests per second although errors within the Python logging module need to be checked.
- Support an API to allow individual users to define tags as a collection of search parameters.
- Then support providing those tags to the GET API.
- Filtering on extra parameters or message content can be provided.
- Full text search should not be complicated.
- Regular expression searches should also be possible.
- Add an automated step in the collector to remove unwanted data at a pre-determined age.
- A DELETE API could be used to support early flushing on demand.
- GET Messages.
- GET Ranges.
- Convert text/plain responses to JSON.
- Write example js web app to present stats.
- Add SSL and basic auth. Read userid/password from a file or the environment.
- More tests.
- Replace newlines with spaces in content strings. Newlines would be damaging if included, so disallow them by converting to spaces.
- Catch remote logging server down exception.
- Expiry of finished log files and removal from the server at automated intervals.
- Further commenting and description in README.md and doc strings.
- Manage dedicated processes per log file to make use of more cores and achieve other efficiencies if throughput needs to be increased.
- Add protection from failure to open log file errors.
- Forking to handle more requests (if required).
- Matching name/value pairs in GET requests e.g. userid=xyz.
- Default to UTC now() if created timestamp is missing.
- Catch exceptions and report sensibly.
- Inspect internal operation of logging.handlers.HTTPHandler in case of client side errors that need to be caught.
- Consider reporting server responses in general in case of error.
- Strip superfluous empty strings in facilities list generated from trailing slash in URL.
- Using a local cache on the client, with a separate process to send messages to the server would also improve reliability. This would also cover the client for when the logging server goes down.
- Additional exception detection in the collector to ensure reliable operation.