Add only option to data entries #61

Open
AriPerkkio opened this issue Dec 12, 2024 · 4 comments

Comments

@AriPerkkio

While developing test cases, I want to focus on a single input.

In my setup I have quite a large set of already verified evals. I want to add a new one and work on it in watch mode for a while. Running every single input unnecessarily can be slow and expensive.

```diff
 evalite("Something", {
   data: async () => {
     return [
       // A should produce B when ...
       { input: 'A', expected: 'B' },

       // C should produce D when ...
       { input: 'C', expected: 'D' },

+      // E should produce F when ...
+      { input: 'E', expected: 'F' },
     ];
   },
 });
```

I would like to be able to add an `only: true` flag to a data entry. This should skip all other data entries.

```diff
 evalite("Something", {
   data: async () => {
     return [
       { input: 'A', expected: 'B' },
       { input: 'C', expected: 'D' },
-      { input: 'E', expected: 'F' },
+      { input: 'E', expected: 'F', only: true },
     ];
   },
 });
```
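A minimal sketch of the filtering rule proposed above, assuming a hypothetical `DataEntry` type with an optional `only` flag and a hypothetical `selectEntries` helper (neither is part of evalite's API): if any entry is marked `only`, run just the marked entries; otherwise run everything.

```typescript
// Hypothetical shape of a data entry; `only` is the flag proposed in this issue.
interface DataEntry<TInput, TExpected> {
  input: TInput;
  expected?: TExpected;
  only?: boolean;
}

// If at least one entry is marked `only`, keep just those; otherwise keep all.
function selectEntries<TInput, TExpected>(
  entries: DataEntry<TInput, TExpected>[],
): DataEntry<TInput, TExpected>[] {
  const focused = entries.filter((entry) => entry.only === true);
  return focused.length > 0 ? focused : entries;
}

const entries: DataEntry<string, string>[] = [
  { input: "A", expected: "B" },
  { input: "C", expected: "D" },
  { input: "E", expected: "F", only: true },
];

console.log(selectEntries(entries).map((entry) => entry.input)); // [ 'E' ]
```

Allowing more than one `only` entry at a time mirrors `test.only` in runners such as Vitest, so a small group of new cases could be focused together.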
@mattpocock
Owner

@AriPerkkio

This is interesting! It's something I have been noodling on a fair bit.

One thing I wondered is whether an aggressive, opt-in cache would be useful for designing datasets.

That is, running `evalite watch --experimental-cache-results` would introduce an in-memory cache for data only.

There are obviously downsides to this: you might change a piece of non-data code, which would NOT invalidate the cache. You might accidentally leave it on, leading to confusion.
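To make the first downside concrete, here is a sketch assuming a hypothetical `runWithCache` helper and a `Map`-based cache keyed only on each entry's `input` and `expected` (the `--experimental-cache-results` flag is itself only an idea, not an existing option). Because the task function is not part of the key, editing it still returns the stale result.

```typescript
interface DataEntry {
  input: string;
  expected: string;
}

type Task = (input: string) => Promise<string>;

// Hypothetical in-memory cache: key = the data entry, value = the task output.
const resultCache = new Map<string, string>();

async function runWithCache(entry: DataEntry, task: Task): Promise<string> {
  // The key is built from data only, so editing `task` does NOT change it.
  const key = JSON.stringify([entry.input, entry.expected]);
  const cached = resultCache.get(key);
  if (cached !== undefined) return cached;
  const output = await task(entry.input);
  resultCache.set(key, output);
  return output;
}

// Changing the task between runs still returns the first, now stale, result.
async function demo(): Promise<void> {
  const entry: DataEntry = { input: "A", expected: "B" };
  const v1 = await runWithCache(entry, async (input) => `${input} via task v1`);
  const v2 = await runWithCache(entry, async (input) => `${input} via task v2`);
  console.log(v1, "|", v2); // A via task v1 | A via task v1
}

void demo();
```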

`only` is more explicit, visible to the user, and intuitive.

@mattpocock
Owner

mattpocock commented Dec 13, 2024

I have also been thinking about using a separate file for data. Either a JSON file (bad for DX) or a markdown file:

```md
# Input

My prompt here

# Expected

My expected output here

---

# Input

![](./image.png)

# Expected

A painting of a penguin in watercolours.
```

(image links would be used to pass images/files)

You would then reference this markdown/JSON file in tests:

```ts
evalite('Test', {
  data: './inputs.md',
});
```
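A minimal sketch of how such a file could be turned into data entries, assuming a hypothetical `parseDataMarkdown` helper: entries are separated by `---` lines, and each entry has a `# Input` heading followed by a `# Expected` heading. Image links are kept as raw text here; resolving them into files is left out.

```typescript
interface DataEntry {
  input: string;
  expected: string;
}

// Hypothetical parser for the proposed Markdown data format.
function parseDataMarkdown(source: string): DataEntry[] {
  // Entries are separated by a line containing only "---".
  const blocks = source.split(/^---[ \t]*$/m);
  const entries: DataEntry[] = [];
  for (const block of blocks) {
    // Text under "# Input" up to "# Expected", then everything after "# Expected".
    const match = block.match(/^# Input\s*\n([\s\S]*?)^# Expected\s*\n([\s\S]*)$/m);
    if (!match) continue; // skip blocks that do not follow the format
    const [, input = "", expected = ""] = match;
    entries.push({ input: input.trim(), expected: expected.trim() });
  }
  return entries;
}

const file = [
  "# Input", "", "My prompt here", "", "# Expected", "", "My expected output here", "",
  "---", "",
  "# Input", "", "![](./image.png)", "", "# Expected", "", "A painting of a penguin in watercolours.", "",
].join("\n");

console.log(parseDataMarkdown(file));
```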

The useful thing about a separate file is that it reduces the surface area for caching. You know that if a markdown file change causes a test re-run, only the inputs and 'expected' have changed.
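One way to exploit that reduced surface area, sketched under the assumption that the task code has not changed between runs: hash each entry's data and re-run only the entries whose hash was not seen before. The `entryKey` and `changedEntries` names are hypothetical; `createHash` is Node's built-in `node:crypto` API.

```typescript
import { createHash } from "node:crypto";

interface DataEntry {
  input: string;
  expected: string;
}

// Stable fingerprint of an entry's data (input + expected only).
function entryKey(entry: DataEntry): string {
  return createHash("sha256")
    .update(JSON.stringify([entry.input, entry.expected]))
    .digest("hex");
}

// Entries in `next` whose data did not appear in `previous` need a re-run.
function changedEntries(previous: DataEntry[], next: DataEntry[]): DataEntry[] {
  const seen = new Set(previous.map((entry) => entryKey(entry)));
  return next.filter((entry) => !seen.has(entryKey(entry)));
}

const before: DataEntry[] = [
  { input: "A", expected: "B" },
  { input: "C", expected: "D" },
];
const after: DataEntry[] = [...before, { input: "E", expected: "F" }];

console.log(changedEntries(before, after)); // [ { input: 'E', expected: 'F' } ]
```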

Might also just be a nicer authoring experience than writing JS objects.

@ShiboSoftwareDev

@mattpocock We used the TOML format to represent our data; it's neat and easy to parse.

@AriPerkkio
Author

AriPerkkio commented Dec 16, 2024

An opt-in cache also sounds good, but maybe it's a bit too magical/automatic. I like how explicit an `only` flag would be.

For cache invalidation, you could probably use Vite's module graph to see when a module's dependency tree contains changes, which is exactly how Vite's HMR and Vitest's watch mode work. In my use case it would not help, though, as I'm just calling the same function with different inputs. Changes to the function would trigger a re-run for all inputs, I guess.
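The invalidation idea can be modelled without Vite's actual API. This sketch assumes a hypothetical module graph stored as a plain `Map` from each file to the files it imports, plus a hypothetical `dependsOn` helper that walks it depth-first; the file names are made up. It also shows the limitation mentioned above: every data entry shares the eval file's dependency tree, so a change anywhere in it invalidates all inputs at once.

```typescript
// Hypothetical module graph: file -> files it imports directly.
type ModuleGraph = Map<string, string[]>;

// Depth-first walk: does `entry` (transitively) import `changed`?
function dependsOn(graph: ModuleGraph, entry: string, changed: string): boolean {
  const seen = new Set<string>();
  const stack = [entry];
  while (stack.length > 0) {
    const file = stack.pop()!;
    if (file === changed) return true;
    if (seen.has(file)) continue; // guard against import cycles
    seen.add(file);
    for (const dep of graph.get(file) ?? []) stack.push(dep);
  }
  return false;
}

const graph: ModuleGraph = new Map<string, string[]>([
  ["something.eval.ts", ["task.ts"]],
  ["task.ts", ["prompt.ts"]],
  ["prompt.ts", []],
]);

console.log(dependsOn(graph, "something.eval.ts", "prompt.ts")); // true
console.log(dependsOn(graph, "something.eval.ts", "unrelated.ts")); // false
```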

So far I've liked the format of the `data` property, i.e. defining data in plain JS objects.
