Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(decoder): Add support for Structured CLP IR streams but without log-level filtering. #85

Merged
merged 10 commits into from
Nov 15, 2024
10 changes: 5 additions & 5 deletions src/services/LogFileManager/index.ts
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/* eslint max-lines: ["error", 400] */
import {
Decoder,
DecoderOptionsType,
DecoderOptions,
} from "../../typings/decoders";
import {MAX_V8_STRING_LENGTH} from "../../typings/js";
import {LogLevelFilter} from "../../typings/logs";
Expand Down Expand Up @@ -110,7 +110,7 @@ class LogFileManager {
static async create (
fileSrc: FileSrcType,
pageSize: number,
decoderOptions: DecoderOptionsType,
decoderOptions: DecoderOptions,
onQueryResults: (queryResults: QueryResults) => void,
): Promise<LogFileManager> {
const {fileName, fileData} = await loadFile(fileSrc);
Expand Down Expand Up @@ -138,13 +138,13 @@ class LogFileManager {
static async #initDecoder (
fileName: string,
fileData: Uint8Array,
decoderOptions: DecoderOptionsType
decoderOptions: DecoderOptions
): Promise<Decoder> {
let decoder: Decoder;
if (fileName.endsWith(".jsonl")) {
decoder = new JsonlDecoder(fileData, decoderOptions);
} else if (fileName.endsWith(".clp.zst")) {
decoder = await ClpIrDecoder.create(fileData);
decoder = await ClpIrDecoder.create(fileData, decoderOptions);
} else {
throw new Error(`No decoder supports ${fileName}`);
}
Expand All @@ -161,7 +161,7 @@ class LogFileManager {
/* Sets any formatter options that exist in the decoder's options.
* @param options
*/
setFormatterOptions (options: DecoderOptionsType) {
setFormatterOptions (options: DecoderOptions) {
this.#decoder.setFormatterOptions(options);
}

Expand Down
2 changes: 2 additions & 0 deletions src/services/MainWorker.ts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear to me why we need this bigint support in this PR? It seems to work with IRv1 stream reader?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons:

  1. We enabled link flag -sWASM_BIGINT when compiling clp-ffi-js: https://github.com/y-scope/clp-ffi-js/blob/450e8f84f17ec233765f1e79e60984cd8085076e/CMakeLists.txt#L72
  2. The timestamps returned by reader.decodeRange() are indeed BigInt. See TS type of timestamp in decodeRange returned results should be BigInt instead of Number. clp-ffi-js#33 . Also:
    1. In IRv1, timestamps are of type int64_t which maps to BigInt by Emscripten thanks to the above flag; however, we previously have never processed the returned timestamps with Dayjs / at all so Dayjs's inability to deal with BigInt timestamps without extensions was not discovered.
    2. In IRv2, integers are of type int64_t. At the moment, we read all timestamp node values as integers and return directly without any processing.

Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import dayjs from "dayjs";
import dayjsBigIntSupport from "dayjs/plugin/bigIntSupport";
import dayjsTimezone from "dayjs/plugin/timezone";
import dayjsUtc from "dayjs/plugin/utc";

Expand All @@ -17,6 +18,7 @@ import LogFileManager from "./LogFileManager";
/* eslint-disable import/no-named-as-default-member */
dayjs.extend(dayjsUtc);
dayjs.extend(dayjsTimezone);
dayjs.extend(dayjsBigIntSupport);
/* eslint-enable import/no-named-as-default-member */

/**
Expand Down
76 changes: 67 additions & 9 deletions src/services/decoders/ClpIrDecoder.ts
Original file line number Diff line number Diff line change
@@ -1,32 +1,61 @@
import clpFfiJsModuleInit, {ClpIrStreamReader} from "clp-ffi-js";

import clpFfiJsModuleInit, {ClpStreamReader} from "../../../deps/ClpFfiJs-worker";

Check failure on line 1 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

Unable to resolve path to module '../../../deps/ClpFfiJs-worker'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need day js for other suggestion

Suggested change
import clpFfiJsModuleInit, {ClpStreamReader} from "../../../deps/ClpFfiJs-worker";
import {Dayjs} from "dayjs";
import clpFfiJsModuleInit, {ClpStreamReader} from "../../../deps/ClpFfiJs-worker";

import {Nullable} from "../../typings/common";
import {
ClpIrDecoderOptions,
Decoder,
DecodeResultType,
FilteredLogEventMap,
LogEventCount,
} from "../../typings/decoders";
import {LogLevelFilter} from "../../typings/logs";
import {JsonObject} from "../../typings/js";
import {
LOG_LEVEL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need log event for another suggestion.

Suggested change
LOG_LEVEL,
LOG_LEVEL,
LogEvent,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a way to avoid the usage of ts-expect-error:

            // `this.#formatter` has been null-checked at the method entry.
            // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
            result[0] = this.#formatter!.formatLogEvent({
                fields: fields,
                level: level,
                timestamp: dayJsTimestamp,
            });

so we can still inline the argument object creation with the type checks in effect, which looks more concise in my opinion. Let me know if you prefer to have variable definition separated.

LogLevelFilter,
} from "../../typings/logs";
import LogbackFormatter from "../formatters/LogbackFormatter";
import {
convertToDayjsTimestamp,
isJsonObject,
} from "./JsonlDecoder/utils";


class ClpIrDecoder implements Decoder {
#streamReader: ClpIrStreamReader;
#streamReader: ClpStreamReader;

#formatter: Nullable<LogbackFormatter>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a formatter interface might as well use it

Suggested change
#formatter: Nullable<LogbackFormatter>;
#formatter: Nullable<Formatter>;


#logLevelKey: Nullable<string>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#logLevelKey: Nullable<string>;


constructor (streamReader: ClpIrStreamReader) {
constructor (streamReader: ClpStreamReader, formatter: Nullable<LogbackFormatter>, logLevelKey: Nullable<string>) {

Check warning on line 29 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

This line has a length of 119. Maximum allowed is 100
Copy link
Contributor

@davemarco davemarco Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
constructor (streamReader: ClpStreamReader, formatter: Nullable<LogbackFormatter>, logLevelKey: Nullable<string>) {
constructor (streamReader: ClpStreamReader, formatter: Nullable<Formatter>) {

this.#streamReader = streamReader;

Check failure on line 30 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

Unsafe assignment of an error typed value
this.#formatter = formatter;
this.#logLevelKey = logLevelKey;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.#logLevelKey = logLevelKey;

}

/**
* Creates a new ClpIrDecoder instance.
*
* @param dataArray The input data array to be passed to the decoder.
* @param decoderOptions
* @return The created ClpIrDecoder instance.
*/
static async create (dataArray: Uint8Array): Promise<ClpIrDecoder> {
static async create (
dataArray: Uint8Array,
decoderOptions: ClpIrDecoderOptions
): Promise<ClpIrDecoder> {
const module = await clpFfiJsModuleInit();

Check failure on line 46 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

Unsafe assignment of an error typed value

Check failure on line 46 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

Unsafe call of an `error` type typed value
const streamReader = new module.ClpIrStreamReader(dataArray);
return new ClpIrDecoder(streamReader);
const streamReader = new module.ClpStreamReader(dataArray, decoderOptions);

Check failure on line 47 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

Unsafe assignment of an error typed value

Check failure on line 47 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

Unsafe construction of an any type value

Check failure on line 47 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

Unsafe member access .ClpStreamReader on an `error` typed value

let formatter: Nullable<LogbackFormatter> = null;
let logLevelKey: Nullable<string> = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let logLevelKey: Nullable<string> = null;

if (
module.IRProtocolErrorCode.SUPPORTED === streamReader.getIrProtocolErrorCode()

Check failure on line 52 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

Unsafe member access .IRProtocolErrorCode on an `error` typed value

Check failure on line 52 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

Unsafe call of an `error` type typed value

Check failure on line 52 in src/services/decoders/ClpIrDecoder.ts

View workflow job for this annotation

GitHub Actions / lint-check

Unsafe member access .getIrProtocolErrorCode on an `error` typed value
) {
formatter = new LogbackFormatter({formatString: decoderOptions.formatString});
({logLevelKey} = decoderOptions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
({logLevelKey} = decoderOptions);

}

return new ClpIrDecoder(streamReader, formatter, logLevelKey);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return new ClpIrDecoder(streamReader, formatter, logLevelKey);
return new ClpIrDecoder(streamReader, formatter);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one streamType argument to make the handling more obvious.

}

getEstimatedNumEvents (): number {
Expand Down Expand Up @@ -60,7 +89,36 @@
endIdx: number,
useFilter: boolean
): Nullable<DecodeResultType[]> {
return this.#streamReader.decodeRange(beginIdx, endIdx, useFilter);
const results = this.#streamReader.decodeRange(beginIdx, endIdx, useFilter);
if (null === this.#formatter) {
return results;
}
results.forEach((result) => {
const [message, timestamp] = result;
let fields: JsonObject = {};
try {
fields = JSON.parse(message) as JsonObject;
if (false === isJsonObject(fields)) {
throw new Error("Unexpected non-object.");
} else if (null !== this.#logLevelKey) {
const logLevel = fields[this.#logLevelKey];
fields[this.#logLevelKey] = LOG_LEVEL[logLevel as LOG_LEVEL];
}
} catch (e) {
if (0 === message.length) {
return;
}
console.error(e, message);
}

// @ts-expect-error `this.#formatter` is certainly non-null
result[0] = this.#formatter.formatLogEvent({
fields: fields,
timestamp: convertToDayjsTimestamp(timestamp),
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a simpler version. I feel like we dont need to get the log level since it should already be in the result. I also dont think we need to check if the length is 0, since i dont think clp-ffi-js would return length 0. I think worst case it will return {} if the log event is empty.

Suggested change
results.forEach((result) => {
const [message, timestamp] = result;
let fields: JsonObject = {};
try {
fields = JSON.parse(message) as JsonObject;
if (false === isJsonObject(fields)) {
throw new Error("Unexpected non-object.");
} else if (null !== this.#logLevelKey) {
const logLevel = fields[this.#logLevelKey];
fields[this.#logLevelKey] = LOG_LEVEL[logLevel as LOG_LEVEL];
}
} catch (e) {
if (0 === message.length) {
return;
}
console.error(e, message);
}
// @ts-expect-error `this.#formatter` is certainly non-null
result[0] = this.#formatter.formatLogEvent({
fields: fields,
timestamp: convertToDayjsTimestamp(timestamp),
});
});
for (const result of results) {
const [message, timestamp, level] = result;
let fields: JsonObject = {};
try {
fields = JSON.parse(message) as JsonObject;
if (false === isJsonObject(fields)) {
throw new Error("Unexpected non-object.");
}
} catch (e) {
console.error(e, message);
}
const dayJsTimestamp: Dayjs = convertToDayjsTimestamp(timestamp);
const logEvent: LogEvent = {
timestamp: dayJsTimestamp,
level: level,
fields: fields
}
result[0] = this.#formatter.formatLogEvent(logEvent);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we dont need to get the log level since it should already be in the result.

Right. For now we shall store log level strings as authoritative log levels in the logging library.

I also dont think we need to check if the length is 0, since i dont think clp-ffi-js would return length 0. I think worst case it will return {} if the log event is empty.

Good catch


return results;
}
}

Expand Down
8 changes: 4 additions & 4 deletions src/services/decoders/JsonlDecoder/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ import {
Decoder,
DecodeResultType,
FilteredLogEventMap,
JsonlDecoderOptionsType,
JsonlDecoderOptions,
LogEventCount,
} from "../../../typings/decoders";
import {Formatter} from "../../../typings/formatters";
Expand All @@ -25,7 +25,7 @@ import {


/**
* A decoder for JSONL (JSON lines) files that contain log events. See `JsonlDecoderOptionsType` for
* A decoder for JSONL (JSON lines) files that contain log events. See `JsonlDecoderOptions` for
* properties that are specific to log events (compared to generic JSON records).
*/
class JsonlDecoder implements Decoder {
Expand All @@ -49,7 +49,7 @@ class JsonlDecoder implements Decoder {
* @param dataArray
* @param decoderOptions
*/
constructor (dataArray: Uint8Array, decoderOptions: JsonlDecoderOptionsType) {
constructor (dataArray: Uint8Array, decoderOptions: JsonlDecoderOptions) {
this.#dataArray = dataArray;
this.#logLevelKey = decoderOptions.logLevelKey;
this.#timestampKey = decoderOptions.timestampKey;
Expand Down Expand Up @@ -81,7 +81,7 @@ class JsonlDecoder implements Decoder {
};
}

setFormatterOptions (options: JsonlDecoderOptionsType): boolean {
setFormatterOptions (options: JsonlDecoderOptions): boolean {
this.#formatter = new LogbackFormatter({formatString: options.formatString});

return true;
Expand Down
5 changes: 3 additions & 2 deletions src/services/decoders/JsonlDecoder/utils.ts
Original file line number Diff line number Diff line change
Expand Up @@ -53,12 +53,13 @@ const convertToLogLevelValue = (field: JsonValue | undefined): LOG_LEVEL => {
* - the timestamp's value is an unsupported type.
* - the timestamp's value is not a valid dayjs timestamp.
*/
const convertToDayjsTimestamp = (field: JsonValue | undefined): dayjs.Dayjs => {
const convertToDayjsTimestamp = (field: JsonValue | bigint | undefined): dayjs.Dayjs => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi i just noticed i dont think json.parse will work with big int? Like it will throw exception since it is not a json value? How do you feel about using this? There is also this hack i saw on github. I feel like in practice we won't see these weird timestamps, but wtv u think is best.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also how do u feel about this file to utils? Since it is shared by both IR and Json now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right that's an issue... Just to understand why we need the mitigation, where are we using json.parse with BigInts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially also in json decoder if logs have an bigint?

Copy link
Member Author

@junhaoliao junhaoliao Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omg...

  1. nlohmann/json acknowledges there are implementation whose number types can store up to 2^53-1 : https://json.nlohmann.me/features/types/number_handling/#number-interoperability
  2. yet it does serialize numbers out of that range, imo for good reasons though: https://github.com/nlohmann/json/blob/4a602df34e71afca2242b34c743d4cc2486cd071/tests/src/unit-serialization.cpp#L168-L183

funny finding: https://www.github.com/nlohmann/json/issues/1708 -> https://www.github.com/nlohmann/json/pull/1722

Let's discuss the issue with the clp core people before we make changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, calling JSON.parse() with any clp-ffi-js-returned JSON string containing number in +/- 2^53-1 ~ 2^64-1 would encounter loss of precision, but luckily nothing throws.
Let's address the loss of precision issue in another PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be tracked at #116

// If the field is an invalid type, then set the timestamp to `INVALID_TIMESTAMP_VALUE`.
// NOTE: dayjs surprisingly thinks `undefined` is a valid date. See
// https://day.js.org/docs/en/parse/now#docsNav
if (("string" !== typeof field &&
"number" !== typeof field) ||
"number" !== typeof field &&
"bigint" !== typeof field) ||
"undefined" === typeof field
) {
// `INVALID_TIMESTAMP_VALUE` is a valid dayjs date. Another potential option is
Expand Down
4 changes: 2 additions & 2 deletions src/typings/config.ts
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
import {JsonlDecoderOptionsType} from "./decoders";
import {JsonlDecoderOptions} from "./decoders";
import {TAB_NAME} from "./tab";


Expand Down Expand Up @@ -27,7 +27,7 @@ enum LOCAL_STORAGE_KEY {
/* eslint-enable @typescript-eslint/prefer-literal-enum-member */

type ConfigMap = {
[CONFIG_KEY.DECODER_OPTIONS]: JsonlDecoderOptionsType,
[CONFIG_KEY.DECODER_OPTIONS]: JsonlDecoderOptions,
[CONFIG_KEY.INITIAL_TAB_NAME]: TAB_NAME,
[CONFIG_KEY.THEME]: THEME_NAME,
[CONFIG_KEY.PAGE_SIZE]: number,
Expand Down
23 changes: 18 additions & 5 deletions src/typings/decoders.ts
Original file line number Diff line number Diff line change
Expand Up @@ -14,13 +14,25 @@ interface LogEventCount {
* @property logLevelKey The key of the kv-pair that contains the log level in every record.
* @property timestampKey The key of the kv-pair that contains the timestamp in every record.
*/
interface JsonlDecoderOptionsType {
interface JsonlDecoderOptions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about just naming this DecoderOptions and get rid of JsonlDecoderOptions, StructuredClpIrDecoderOptions, ClpIrDecoderOptions. I feel like we have all these extra types that serve no purpose. Maybe if we actually need different types in the future can add back?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

formatString: string,
logLevelKey: string,
timestampKey: string,
}

type DecoderOptionsType = JsonlDecoderOptionsType;
/**
* Options for the Structured CLP IR decoder.
*/
interface StructuredClpIrDecoderOptions extends JsonlDecoderOptions {
}

/**
* Options for the CLP IR decoder. Currently, those options are only effective if the stream is
* Structured IR.
*/
type ClpIrDecoderOptions = StructuredClpIrDecoderOptions;

type DecoderOptions = JsonlDecoderOptions | ClpIrDecoderOptions;

/**
* Type of the decoded log event. We use an array rather than object so that it's easier to return
Expand Down Expand Up @@ -85,7 +97,7 @@ interface Decoder {
* @param options
* @return Whether the options were successfully set.
*/
setFormatterOptions(options: DecoderOptionsType): boolean;
setFormatterOptions(options: DecoderOptions): boolean;

/**
* Decodes log events in the range `[beginIdx, endIdx)` of the filtered or unfiltered
Expand All @@ -106,10 +118,11 @@ interface Decoder {

export type {
ActiveLogCollectionEventIdx,
ClpIrDecoderOptions,
Decoder,
DecodeResultType,
DecoderOptionsType,
DecoderOptions,
FilteredLogEventMap,
JsonlDecoderOptionsType,
JsonlDecoderOptions,
LogEventCount,
};
4 changes: 2 additions & 2 deletions src/typings/worker.ts
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
import {Nullable} from "./common";
import {
ActiveLogCollectionEventIdx,
DecoderOptionsType,
DecoderOptions,
} from "./decoders";
import {
LOG_LEVEL,
Expand Down Expand Up @@ -89,7 +89,7 @@ type WorkerReqMap = {
fileSrc: FileSrcType,
pageSize: number,
cursor: CursorType,
decoderOptions: DecoderOptionsType
decoderOptions: DecoderOptions
},
[WORKER_REQ_CODE.LOAD_PAGE]: {
cursor: CursorType,
Expand Down
4 changes: 2 additions & 2 deletions src/utils/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ import {
LOCAL_STORAGE_KEY,
THEME_NAME,
} from "../typings/config";
import {DecoderOptionsType} from "../typings/decoders";
import {DecoderOptions} from "../typings/decoders";
import {TAB_NAME} from "../typings/tab";


Expand Down Expand Up @@ -131,7 +131,7 @@ const getConfig = <T extends CONFIG_KEY>(key: T): ConfigMap[T] => {
timestampKey: window.localStorage.getItem(
LOCAL_STORAGE_KEY.DECODER_OPTIONS_TIMESTAMP_KEY
),
} as DecoderOptionsType;
} as DecoderOptions;
break;
case CONFIG_KEY.INITIAL_TAB_NAME:
value = window.localStorage.getItem(LOCAL_STORAGE_KEY.INITIAL_TAB_NAME);
Expand Down
Loading