Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use error messages from jcc err.log in experiments #230

Draft
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

cjx10
Copy link
Contributor

@cjx10 cjx10 commented Apr 26, 2024

Before merging:

  • Remove custom jcc

This pr adds support for extracting error messages from jcc's err.log

@cjx10 cjx10 requested a review from DonggeLiu April 26, 2024 03:32
experiment/evaluator.py Outdated Show resolved Hide resolved
@cjx10
Copy link
Contributor Author

cjx10 commented May 3, 2024

Special cases when get_jcc_errstr() cannot find error message but it should:

proftpd: cp tests/fuzzing/json_fuzzer.c /src/fuzzer.c

In all other projects, get_jcc_errstr() can return desired error messages, especially linker errors, or there's build issue from the project source. (In other words, fuzz target is never touched by jcc and error messages from source are not returned mistakenly)

@DonggeLiu
Copy link
Collaborator

proftpd: cp tests/fuzzing/json_fuzzer.c /src/fuzzer.c

In all other projects, get_jcc_errstr() can return desired error messages, especially linker errors, or there's build issue from the project source. (In other words, fuzz target is never touched by jcc and error messages from source are not returned mistakenly)

Thanks for documenting this.
I reckon if this is the only case, then it's probably not too important.

However, could you please try building the original fuzz target with JCC and see if that fails?
E.g., build its image in OSS-Fuzz, set JCC in the docker container, and compile to build the binaries.
This will confirm if the problem is from JCC, given proftpd can build without it:
https://oss-fuzz-build-logs.storage.googleapis.com/index.html#proftpd

@cjx10
Copy link
Contributor Author

cjx10 commented May 6, 2024

try building the original fuzz target with JCC

To clarify, proftpd built in oss-fuzz-gen, but we lost track of the fuzz target in logs because the file name changed.

For testing, in oss-fuzz the following 2 lines were added to projects/{project_name}/Dockerfile:

ENV CC=/usr/local/bin/clang-jcc
ENV CXX=/usr/local/bin/clang++-jcc

Then python3 infra/helper.py shell {project_name}, manually run compile in docker shell.

Below is a list of projects found not building in oss-fuzz-gen, and whether they build with jcc (7c67544) in oss-fuzz:

  • firestore: build
  • libidn2: fail
  • libpsl: fail
  • libtasn1: fail
  • mdbtools: fail
  • mosh: fail
  • myanmar-tools: fail
  • oatpp: build
  • openvswitch: build
  • piex: build
  • proj4: fail
  • protobuf-c: fail
  • qpid-proton: build
  • tarantool: fail

@cjx10 cjx10 marked this pull request as ready for review May 7, 2024 22:34
@cjx10 cjx10 changed the title Check errlog in experiments Use error messages from jcc err.log in experiments May 8, 2024
Base automatically changed from err_parsing to main May 10, 2024 04:10
@cjx10 cjx10 force-pushed the get_err_from_errlog branch 3 times, most recently from 105bac4 to e9d5283 Compare May 13, 2024 04:27
experiment/builder_runner.py Outdated Show resolved Hide resolved
experiment/builder_runner.py Outdated Show resolved Hide resolved
experiment/builder_runner.py Outdated Show resolved Hide resolved
experiment/builder_runner.py Outdated Show resolved Hide resolved
experiment/builder_runner.py Outdated Show resolved Hide resolved
llm_toolkit/code_fixer.py Outdated Show resolved Hide resolved
llm_toolkit/code_fixer.py Outdated Show resolved Hide resolved
# Assume the default output name.
return 'a.out'
return os.path.basename(output_name)
return ''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Exclude the simple cases first to avoid nested conditions, e.g.,
if not target_found:
  return ''
  1. Log a warning when the target is not found.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please simplify this function, e.g.,

    for i, arg in enumerate(compile_args):
        if arg in ['-o', '--output'] and i < len(compile_args) - 1:
            output_name = compile_args[i + 1]
        elif arg.startswith('--output='):
            output_name = arg.removeprefix('--output=')
        elif not arg.startswith('-') and os.path.basename(arg) in target_names:
            target_found = os.path.basename(arg)

    if not target_found:
        return ''
    if output_name:
      return os.path.basename(output_name)

    if '-c' in compile_args:
        return f'{os.path.splitext(target_found)[0]}.o'
    logging.warning(
            'Output file not specified in [%s], but fuzz target found',
            ' '.join(compile_args))
    return 'a.out'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion.
Re 2.
Probably better to not log a warning when the target is not found. It's expected to have many such cases where the command is not compiling the fuzz target, indicating the log lines followed are not the target error lines we want.
Rename variable instead: target_found -> fuzz_target_found

llm_toolkit/code_fixer.py Show resolved Hide resolved
llm_toolkit/code_fixer.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@cjx10 cjx10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the detailed review Dongge :)

llm_toolkit/code_fixer.py Outdated Show resolved Hide resolved
llm_toolkit/code_fixer.py Outdated Show resolved Hide resolved
llm_toolkit/code_fixer.py Outdated Show resolved Hide resolved
# Assume the default output name.
return 'a.out'
return os.path.basename(output_name)
return ''
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion.
Re 2.
Probably better to not log a warning when the target is not found. It's expected to have many such cases where the command is not compiling the fuzz target, indicating the log lines followed are not the target error lines we want.
Rename variable instead: target_found -> fuzz_target_found

except FileNotFoundError as e:
logging.error('Cannot get err.log for %s: %s', generated_project, e)
# Touch err.log in results folder to avoid FileNotFoundError when
# extracting errors.
open(jcc_errlog_path, 'x')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better solution than this?
I guess this intends to make parsing easier, but it may confuse us to think JCC created an empty file. We will have to search in gcloud logs to distinguish these two cases.

Could we check os.path.isfile() at the beginning extract_error_message() instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a magic string in err.log when err.log does not exist?

This solution is for local experiment, creating an empty file is currently consistent with the behaviour of getting the build log and run log. Might help prevent increasing the complexity of build_and_run() workflow?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a magic string in err.log when err.log does not exist?

This can work but not preferred.
Intuitively, if err.log is not generated, then it should not exist.

creating an empty file is currently consistent with the behaviour of getting the build log and run log

Where do we create empty build log and run logs when they do not exist?

Might help prevent increasing the complexity of build_and_run() workflow?

Not sure if this relates, but I suppose this will only add two lines in code_fixer?

Could we check os.path.isfile() at the beginning extract_error_message() instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we create empty build log and run logs?

It's in cloud builder, we open the local file object before checking if the file blob on cloud exists
https://github.com/google/oss-fuzz-gen/blob/main/experiment/builder_runner.py#L645

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see; let's keep it, then.
Thanks!

elif arg.startswith('-o'):
output_name = arg.removeprefix('-o')
elif (not arg.startswith('-') and not arg == output_name and
os.path.basename(arg) in target_names):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A silly question:
Why do we need not arg == output_name?

Also, would it be a good idea to log a warning if arg in ['-o', '--output'] but i + 1 >= len(compile_args)?
This is unlikely to happen now but may occur later when we automate the build script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need not arg == output_name?

Since we assigned output_name = compile_args[i + 1] in the previous iteration, we want to skip it now.
I'm thinking of the situation when we have clang src.c -o src.o, and we added src.o to target_names for search.
Then it gets compiled again clang++ src.cpp -o src.o. We dont want to assign src.otofuzz_target_found`.
Although this should be ok because we will not use it later on.

log a warning if i + 1 >= len(compile_args)

Sure good point

llm_toolkit/code_fixer.py Show resolved Hide resolved
llm_toolkit/code_fixer.py Show resolved Hide resolved
llm_toolkit/code_fixer.py Outdated Show resolved Hide resolved
llm_toolkit/code_fixer.py Outdated Show resolved Hide resolved
llm_toolkit/code_fixer.py Outdated Show resolved Hide resolved
@DonggeLiu
Copy link
Collaborator

Thanks for addressing the comments, I have no more suggestions now.
Could you please run a PR experiment and triple-check the report that everything is intact?
Thanks

@cjx10
Copy link
Contributor Author

cjx10 commented May 16, 2024

Could you please run a PR experiment

Many thanks Dongge :)
Will do after I double check the parsing is correct, since the code structure has changed quite a bit

@cjx10
Copy link
Contributor Author

cjx10 commented May 16, 2024

/gcbrun request_pr_exp.py -n jim -f

@DonggeLiu
Copy link
Collaborator

nit:

-n jim

nit:Could you please add the branch id to the name in the future?
This helps us identify the job/bucket_dir/report.
Thanks : )

@DonggeLiu DonggeLiu marked this pull request as draft July 11, 2024 04:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants