
Feature: NDJSON Support for Hawk Log Output #267

Open
jonnybottles opened this issue Feb 21, 2025 · 7 comments
Assignees
Labels
priority/high For important tasks status/in-progress Being worked type/feature New feature or request

Comments

@jonnybottles
Collaborator

jonnybottles commented Feb 21, 2025

What problem would this feature solve?

Currently, Hawk outputs JSON logs in standard JSON format, which is human-readable but not optimized for SIEM ingestion. NDJSON (Newline Delimited JSON) offers performance and efficiency benefits, including:

  • Faster write speeds due to streaming each log entry as a separate JSON object.
  • Smaller file sizes due to reduced whitespace and formatting overhead.
  • Optimized ingestion for SIEMs like Splunk, ElasticSearch, Microsoft Sentinel, and others that prefer NDJSON for bulk ingestion.

The primary decision point is whether NDJSON formatting should be handled within Hawk or delegated to HawkEye, which is responsible for SIEM ingestion. This ticket tracks the discussion and potential implementation within Hawk.
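To make the difference concrete, here is a minimal PowerShell sketch contrasting the two formats. The records and field names are illustrative only, not Hawk's actual output schema:

```powershell
# Hypothetical records; field names are illustrative, not Hawk's real schema.
$records = @(
    [pscustomobject]@{ Operation = 'UserLoggedIn';      User = 'alice@contoso.com' }
    [pscustomobject]@{ Operation = 'MailItemsAccessed'; User = 'bob@contoso.com' }
)

# Standard JSON: a single pretty-printed array (human-readable, multi-line).
$records | ConvertTo-Json

# NDJSON: one compact JSON object per line (stream-friendly for SIEM bulk ingestion).
$records | ForEach-Object { $_ | ConvertTo-Json -Compress }
```

Because each NDJSON line is an independent JSON document, a SIEM can ingest the file as a stream without parsing the whole array first.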


Proposed Solution

Introduce NDJSON support in Hawk's logging mechanism as an optional feature. This allows users to choose between traditional JSON and NDJSON without forcing a format change.

Options for Implementation:

  1. Hawk generates NDJSON by default

    • All JSON logs are formatted as NDJSON.
    • Users must convert back to standard JSON if needed.
    • Simplifies ingestion for SIEMs but may require updates for users parsing logs manually.
  2. Hawk provides an output type option, supporting one or more formats (JSON, NDJSON, CSV)

    • Introduce a command-line switch (-OutputType) for Start-HawkUserInvestigation & Start-HawkTenantInvestigation and all public Tenant / User functions.
    • Users can choose output type on a per-investigation / per-function basis.
    • Backward-compatible and allows gradual adoption.
  3. Hawk produces both JSON and NDJSON

    • Hawk will output both JSON and CSV as usual, and also add NDJSON output.
  4. Keep NDJSON conversion in HawkEye

    • Hawk continues outputting standard JSON, and HawkEye transforms it into NDJSON for ingestion.
    • Reduces complexity in Hawk but offloads work to HawkEye.

The team should discuss which approach aligns best with Hawk’s long-term vision.
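If option 2 is adopted, usage might look like the following. Note that the -OutputType parameter does not exist yet; its name and accepted values are a sketch of the proposal, not the final design:

```powershell
# Hypothetical invocation under option 2: the -OutputType parameter and its
# values are illustrative only and may change during implementation.
Start-HawkUserInvestigation -UserPrincipalName alice@contoso.com -OutputType JSON,NDJSON
```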


⚙️ Developer Section (For Hawk Team Members Only)

Technical Requirements

  • Modify Out-MultipleFileType.ps1 to support NDJSON output.
  • Ensure UTF-8 encoding is maintained.
  • Decide on appropriate file extension for NDJSON (.ndjson or .jsonl).
  • Maintain backward compatibility with existing scripts that process JSON.

Implementation Approach

  • Introduce a new parameter (-ndjson) to toggle NDJSON output.
  • Use ConvertTo-Json -Compress and write each object as a separate line.
  • Validate that Splunk, ELK, and other SIEMs ingest the NDJSON format correctly.
  • Update documentation to reflect changes.
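A minimal sketch of what the NDJSON write path could look like, assuming the approach above (per-object `ConvertTo-Json -Compress`, BOM-less UTF-8). The function name and parameters are hypothetical; the real change would live inside Out-MultipleFileType.ps1:

```powershell
# Hypothetical helper illustrating the proposed approach; not Hawk's actual code.
function Out-NDJsonFile {
    param(
        [Parameter(ValueFromPipeline)] $InputObject,
        [Parameter(Mandatory)] [string] $Path
    )
    begin   { $lines = [System.Collections.Generic.List[string]]::new() }
    # One compact JSON document per input object = one NDJSON line.
    process { $lines.Add(($InputObject | ConvertTo-Json -Compress -Depth 100)) }
    end     {
        # [System.IO.File]::WriteAllLines writes UTF-8 without a BOM,
        # which is what most SIEM bulk-ingest endpoints expect.
        [System.IO.File]::WriteAllLines($Path, $lines)
    }
}
```

Example usage: `$results | Out-NDJsonFile -Path .\investigation.ndjson`.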

Acceptance Criteria

  • Hawk successfully writes logs in NDJSON format when enabled.
  • NDJSON files are smaller and more efficiently ingested into SIEMs.
  • Users can still output traditional JSON if needed.
  • Performance benchmarks show improved write speeds and reduced memory overhead.
  • No breaking changes to existing functionality.

This ticket will remain open for discussion until the team reaches consensus on whether NDJSON should be implemented in Hawk or left to HawkEye.

@jonnybottles jonnybottles added status/backlog In backlog / validated type/feature New feature or request labels Feb 21, 2025
@Guzzy711

Guzzy711 commented Mar 6, 2025

It's a super useful feature. We were planning to use Hawk to ingest into our ELK stack, but we have to build some tooling to convert to NDJSON first. It would be cool to have it natively.

@jonnybottles
Collaborator Author

@Guzzy711 thank you for your feedback! In terms of implementation options, would you prefer option 1, 2, or 3? Based upon your feedback and a discussion with some of the Hawk contributors, we will look to implement this in our next minor release.

@jonnybottles
Collaborator Author

@Guzzy711, we would also be interested in hearing any pain points, suggestions, and any feedback in general as you begin ingesting the Hawk data into ELK. Thanks again for your feedback on this ticket!

@Guzzy711

Guzzy711 commented Mar 6, 2025

I think option 2 would be preferable for the wider community; however, it probably also requires a bit more work. :-)

@Guzzy711

Guzzy711 commented Mar 6, 2025

> @Guzzy711, we would also be interested in hearing any pain points, suggestions, and any feedback in general as you begin ingesting the Hawk data into ELK. Thanks again for your feedback on this ticket!

For sure! Will definitely let you know. 👍🏽

@Guzzy711

Guzzy711 commented Mar 6, 2025

Maybe you can get inspired by the following to do the conversion: https://www.blackhillsinfosec.com/wrangling-the-m365-ual-part-3-of-3/

@jonnybottles jonnybottles self-assigned this Mar 7, 2025
@jonnybottles
Collaborator Author

@Guzzy711 , we are rolling with option 2. Starting some work on it this weekend!

@jonnybottles jonnybottles added priority/high For important tasks status/in-progress Being worked and removed status/backlog In backlog / validated labels Mar 9, 2025