LLMs are reputedly smart, so we wonder how good they are at generating 'formal' specifications, or at least specifications that are executable, so that we can actually use them to check the inputs and outputs of programs under test.

This project provides a framework for assessing an LLM's ability to extract formal pre- and post-conditions from a program's description. Both commercial LLMs, such as OpenAI's GPTs, and open-source LLMs can be used.
Example prompt-1: Give a Python program `Postcondition(x,y,z)` that checks whether the sum of x and y is zero or equal to z.

Answer by Orca-mini-3B: 😅

```python
def Postcondition(x, y, z):
    if x + y == 0 or x - y == 0 or z - x == 0:
        return True
    else:
        return False
```

(The intended condition is `x + y == 0 or x + y == z`; the generated code checks different comparisons entirely.)
Example prompt-2: Give a Python program `Postcondition(joke:str)` that checks whether the joke is funny. 😉

Answer by OpenAI's GPT:

```python
# requires nltk with the vader_lexicon downloaded
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def Postcondition(joke: str) -> bool:
    # Use VADER for a basic sentiment analysis approach
    sia = SentimentIntensityAnalyzer()
    sentiment = sia.polarity_scores(joke)
    # Consider a joke "funny" if it has a positive sentiment
    if sentiment['pos'] > sentiment['neg']:
        return True
    return False

# Example usage
joke = "Why don't scientists trust atoms? Because they make up everything!"
print(Postcondition(joke))  # Output: True
```
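To judge such answers at scale, one can execute a generated `Postcondition` against a trusted reference condition on many sampled inputs and measure how often the two agree. The sketch below illustrates this idea only; the function names, the sampling scheme, and the agreement metric are ours and not llm4spi's actual API:

```python
import random

def reference(x, y, z):
    # Ground-truth condition from the program's description
    return x + y == 0 or x + y == z

def candidate(x, y, z):
    # A generated Postcondition under evaluation (here: Orca-mini-3B's answer)
    return x + y == 0 or x - y == 0 or z - x == 0

def agreement(cand, ref, trials=1000, seed=42):
    """Fraction of random inputs on which cand and ref agree."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y, z = (rng.randint(-10, 10) for _ in range(3))
        hits += cand(x, y, z) == ref(x, y, z)
    return hits / trials

print(f"agreement: {agreement(candidate, reference):.2%}")
```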
| Model | simple-specs-40 | HEx-26 |
|---|---|---|
| GPT 4o | 92.5% | |
| GPT 4 turbo | 87.5% | |
| GPT 3.5 turbo | 85% | 57% (base-test1) / 73% |
| codellama-7b-instruct.Q8_0 | 27.5% (using prg-desc) / 35% | 15% |
| codellama-13b-instruct.Q6_K | 32.5%, but slow | |
| codellama-13b-instruct.Q4_0 | 30% | |
| Meta-Llama-3-8B-Instruct.Q4_0 | 35% (using prg-desc) / 35% | |
| Llama-3-15b-Instruct-GLUED.Q6_K | 32.5%, but very slow | |
| Meta-Llama-3-8B-Instruct (Groq, possibly 16f) | 48% using prg-desc | |
| Meta-Llama-3-70B-Instruct (Groq, possibly 16f) | 65.5% using prg-desc | |
| mistral-7b-instruct-v0.2.Q8_0 | 27.5% | |
| orca-2-13b.Q4_0 | 15% | |
| wizardcoder-python-13b-v1.0.Q4_K_M | 10%, very slow | |
| gemma2-9b-it (Groq, possibly 16f) | 48% using prg-desc | |
- To use OpenAI models: `pip install openai`
- To use GPT4All: `pip install gpt4all` (see the usage sketch after this list)
- To use Hugging Face models: `pip install huggingface-hub`
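As an illustration (this is not llm4spi's own code), asking a local GPT4All model for a postcondition could look like the sketch below; the model file name is just an example, and any locally available `.gguf` model works:

```python
from gpt4all import GPT4All

# Loads (or downloads) the given model file; the name here is an example.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

prompt = ("Give a Python program Postcondition(x,y,z) that checks "
          "whether the sum of x and y is zero or equal to z.")

with model.chat_session():
    answer = model.generate(prompt, max_tokens=400)
print(answer)
```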
The problems need to be given in a JSON format with a structure compatible with llm4spi. TODO: describe the structure. For an example, see mini.json.
You can use a Docker image with GPT4All installed. The image has:
- Ubuntu 22.04
- Python 3.12
- the NVIDIA Container Toolkit
- the Vulkan SDK
- GPT4All (installed as a Python package)
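To give the container GPU access, run it with the `--gpus all` flag provided by the NVIDIA Container Toolkit, e.g. `docker run --gpus all -it <image-name>`, where `<image-name>` is whatever tag you gave your build.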