Paths() Function Support #277

osevill · 2025-03-30T00:24:17Z

I would like to see if there's any performance gain using jaq for the above task, but when I try paths(scalars), I get an "undefined filter" error.
Since the function paths() doesn't seem to be supported, is there an alternate way to get paths to scalar values in jaq?

Also, is there jaq-specific documentation that lists available operators and functions?

Thanks!

01mf02 · 2025-03-31T12:36:44Z

Hi @osevill, you should be able to test your filter with jaq using two changes:

Instead of getpath, you can use jqjq's _getpath. Just copy def _getpath($path): ...; to the beginning of your filter. I would then also put your filter into a file and import that with -f file.jq.
Instead of paths(scalars | true), you should be able to use something like paths as $p | _getpath($p) | scalars | $p. This is probably slower than jq's paths(scalars), but that should not matter much in your case, because the code does not seem to be in a hot loop.
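
To make the second suggestion concrete, here is a minimal Python sketch of what "all paths that lead to a scalar" means. It only models the idea and is not how jq or jaq implement paths; the names walk and scalar_paths are made up for this illustration. Each position is visited together with its value (the role played by paths and _getpath in the filter above), and only the paths whose value is neither an array nor an object are kept. As with paths(scalars | true), null and false count as scalars.

```python
def walk(value, prefix=()):
    """Yield (path, value) for every non-root position, parents before children."""
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        return  # scalars have no children
    for key, child in children:
        path = prefix + (key,)
        yield list(path), child
        yield from walk(child, path)


def scalar_paths(doc):
    """Keep only the paths whose value is not an array or object."""
    return [path for path, value in walk(doc) if not isinstance(value, (dict, list))]


# Hypothetical input, chosen to include a null and a nested array.
doc = {"a": 1, "b": {"c": None, "d": [True, "x"]}}
print(scalar_paths(doc))
```

The path ["b", "c"] is kept even though its value is null, which is the reason the original filter uses scalars | true rather than plain scalars as the condition.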

Let us know what your performance (and your final filter) looks like!

Regarding jaq-specific documentation: I've submitted a PR to the jq repository that adds this information to the jq manual itself, showing for example that paths(f) is not supported in jaq. However, I have not heard anything back in three months. :( The jq manual, IMO, is the logical place for such information.

01mf02 · 2025-03-31T12:39:33Z

(Small improvement suggestion: . | map(tostring) is equivalent to just map(tostring), because . | f is equivalent to f for any f.)

osevill · 2025-04-01T00:04:34Z

Thank you for the suggestions. Will implement and let you know.

01mf02 · 2025-04-09T09:39:12Z

Hi @osevill, just wanted to let you know that your use case motivated me to implement support for getpath and paths/1 (#280). This should also increase the performance of paths/0. If you want to give it a try, that jaq branch should be able to run your filter now.
(It would also be nice to have some example input that you run your filter on.)

osevill · 2025-04-10T02:13:59Z

tyvm.
Will look into it further this weekend.
I have always relied on the binary releases previously, so I would have to install the Rust compiler and follow the instructions on the main project page to compile from the path-values branch.
If I run this:

$ cargo install --locked jaq
$ cargo install --locked --git https://github.com/01mf02/jaq # latest development version

...will I get the path-values branch, or do I need to specify the branch above?

Also, what's the preferred method for providing you sample json input data?
Thx.

01mf02 · 2025-04-10T07:02:24Z

> ...will I get the path-values branch, or do I need to specify the branch above?

$ cargo install --locked --git https://github.com/01mf02/jaq --branch path-values

should do it.

> Also, what's the preferred method for providing you sample json input data?

Just posting one or two lines of JSON in this thread, for example. :)

The best thing would be to have a small jq script that produces arbitrarily large input data.
For example, to produce a large array: jq -n '[limit(1000; repeat("Hello world!"))]'
That makes it easier to benchmark things, because one can adjust the size of the input data until the script under evaluation takes a convenient amount of time to run.

osevill · 2025-04-12T16:40:08Z

Successfully compiled the path-values branch and did some testing with the filter in my original post:

jq -r '(.[] | map(paths(scalars|true)) | unique) as $cols | map(.[] as $row | ($cols | map(. as $col | $row | getpath($col)))) as $rows | ([($cols | map(. | map(tostring) | join(".")))] + $rows) |  map(@csv) | .[]'

The updated jaq seems to work fine with this filter when keys are the same between elements of the json array. When keys from element to element differ however, I get the following error:
Error: cannot use null as iterable (array or object)

Here's a sample schema typical of what I use with jq to convert to csv. There are 3 elements in the array. The first includes inner_array_1, the second includes inner_array_2, and the third element includes both. (For my use case with the above jq filter, it's appropriate to consider the nested array keys as part of the respective outer array element, i.e., no matter how many elements are in the inner arrays, there should only be 3 csv rows because the nested array elements don't define new records. Only the outer array elements define new records in this use case.) I also added an additional scalar "field6" key to the third outer array element.

{
  "outer_array": [
    {
      "record_no": 1,
      "inner_array_1": [
        {
          "IA1_field1": "IA1_value_1",
          "IA1_field2": "IA1_value_2",
          "IA1_field3": "IA1_value_3"
        },
        {
          "IA1_field1": "IA1_value_4",
          "IA1_field2": "IA1_value_5",
          "IA1_field3": "IA1_value_6"
        },
        {
          "IA1_field1": "IA1_value_7",
          "IA1_field2": "IA1_value_8",
          "IA1_field3": "IA1_value_9"
        }
      ],
      "OA_field1": {
        "name": "name1"
      },
      "OA_field2": 2,
      "OA_field3": 3,
      "OA_field4": 4,
      "OA_field5": 5
    },
    {
      "record_no": 2,
      "inner_array_2": [
        {
          "IA2_field1": "IA2_value_1",
          "IA2_field2": "IA2_value_2",
          "IA2_field3": "IA2_value_3"
        },
        {
          "IA2_field1": "IA2_value_4",
          "IA2_field2": "IA2_value_5",
          "IA2_field3": "IA2_value_6"
        },
        {
          "IA2_field1": "IA2_value_7",
          "IA2_field2": "IA2_value_8",
          "IA2_field3": "IA2_value_9"
        }
      ],
      "OA_field1": {
        "name": "name2"
      },
      "OA_field2": "b",
      "OA_field3": "c",
      "OA_field4": "d",
      "OA_field5": "e"
    },
    {
      "record_no": 3,
      "inner_array_1": [
        {
          "IA1_field1": "IA1_value_10",
          "IA1_field2": "IA1_value_11",
          "IA1_field3": "IA1_value_12"
        },
        {
          "IA1_field1": "IA1_value_13",
          "IA1_field2": "IA1_value_14",
          "IA1_field3": "IA1_value_15"
        },
        {
          "IA1_field1": "IA1_value_16",
          "IA1_field2": "IA1_value_17",
          "IA1_field3": "IA1_value_18"
        }
      ],
      "inner_array_2": [
        {
          "IA2_field1": "IA2_value_10",
          "IA2_field2": "IA2_value_11",
          "IA2_field3": "IA2_value_12"
        },
        {
          "IA2_field1": "IA2_value_13",
          "IA2_field2": "IA2_value_14",
          "IA2_field3": "IA2_value_15"
        },
        {
          "IA2_field1": "IA2_value_16",
          "IA2_field2": "IA2_value_17",
          "IA2_field3": "IA2_value_18"
        }
      ],
      "OA_field1": {
        "name": "name3"
      },
      "OA_field2": 7,
      "OA_field3": 8,
      "OA_field4": "f",
      "OA_field5": "g",
      "OA_field6": "h"
    }
  ]
}

Even when I simplify the sample json to just 2 records with 2 keys each (one scalar that matches between the records; one object that differs), I get the same "cannot use null as iterable" error.

{
  "outer_array": [
    {
      "record_no": 1,
      "OA_field1": {
        "name": "name1"
      }
    },
    {
      "record_no": 2,
      "OA_field2": {
        "name": "name2"
      }
    }
  ]
}

...but if both keys of the record are scalar and the second scalar key differs between records, the path-values jaq branch successfully converts to csv:

{
  "outer_array": [
    {
      "record_no": 1,
      "OA_field1": "name1"
    },
    {
      "record_no": 2,
      "OA_field2": "name2"
    }
  ]
}

Regards

01mf02 · 2025-04-14T11:05:27Z

The "cannot use null as iterable" error is to be expected, because of the way I implemented getpath in jaq. In a nutshell, jaq expands getpath([$a, $b, $c]) to .[$a][$b][$c], and indexing a null value yields an error in jaq, which explains what you are seeing.
Your filter works in jaq with a small change, namely replacing getpath($col) with getpath($col)? // null. That should preserve the behaviour of your filter in jq and also make it explicit what should happen when you try to access a value at a path that does not exist.

(.[] | map(paths(scalars|true)) | unique) as $cols |
map(.[] as $row | ($cols | map(. as $col | $row | getpath($col)? // null))) as $rows |
([($cols | map(. | map(tostring) | join(".")))] + $rows) |  map(@csv) | .[]
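
The difference between the two forms can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not jaq's implementation, and the helper names index, getpath_strict and getpath_or_null are hypothetical. getpath_strict applies one indexing step per path element and fails as soon as it has to index null; getpath_or_null mimics getpath($col)? // null by turning that failure into null.

```python
def index(value, key):
    """Model of a single .[key] step; indexing null (None) is an error."""
    if isinstance(value, dict) and isinstance(key, str):
        return value.get(key)  # a missing key yields None, not an error
    if isinstance(value, list) and isinstance(key, int):
        return value[key] if -len(value) <= key < len(value) else None
    raise TypeError(f"cannot index {value!r} with {key!r}")


def getpath_strict(value, path):
    """Model of getpath([a, b, c]) expanded to .[a][b][c]."""
    for key in path:
        value = index(value, key)
    return value


def getpath_or_null(value, path):
    """Model of getpath(path)? // null."""
    try:
        result = getpath_strict(value, path)
    except TypeError:
        return None  # the ? swallows the error, // null supplies the default
    # The // operator also replaces false with its right-hand side.
    return None if result is False else result


# The two simplified records posted earlier in this thread.
rec1 = {"record_no": 1, "OA_field1": {"name": "name1"}}
rec2 = {"record_no": 2, "OA_field2": {"name": "name2"}}

print(getpath_or_null(rec1, ["OA_field1", "name"]))
print(getpath_or_null(rec2, ["OA_field1", "name"]))
```

In this model, getpath_strict(rec2, ["OA_field1"]) still returns null without an error, because the lookup stops at a missing key of an object. That matches the observation that records differing only in scalar keys convert fine. One caveat of the // null form: because // treats false like null, a column whose value is false would come out as null as well.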

01mf02 · 2025-04-14T11:50:14Z

I made a small change to implement getpath more efficiently (c7fa9c0). After this, the performance of jaq is a bit better than jq's, but not by much:

$ hyperfine -M 2 -L jq jq,target/release/jaq '{jq} -f osevill.jq bla.json'
Benchmark 1: jq -f osevill.jq bla.json
  Time (mean ± σ):      1.345 s ±  0.027 s    [User: 1.288 s, System: 0.049 s]
  Range (min … max):    1.326 s …  1.365 s    2 runs
 
Benchmark 2: target/release/jaq -f osevill.jq bla.json
  Time (mean ± σ):      1.300 s ±  0.002 s    [User: 1.280 s, System: 0.013 s]
  Range (min … max):    1.298 s …  1.302 s    2 runs
 
Summary
  target/release/jaq -f osevill.jq bla.json ran
    1.03 ± 0.02 times faster than jq -f osevill.jq bla.json

I generated bla.json by repeating {"record_no": 1, ...} 10,000 times. I also added | empty at the end of your filter in order not to measure output performance.
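
For anyone who wants to reproduce a similar benchmark, here is a hedged Python sketch of generating such an input. The record below is a hypothetical stand-in, not the record used for the timings above, and the outer_array wrapper is an assumption based on the sample posted earlier in this thread. The function name make_input is likewise made up.

```python
import json


def make_input(n):
    """Return an input document whose outer array repeats one record n times."""
    # Hypothetical record; substitute a real one from your own data.
    record = {
        "record_no": 1,
        "OA_field1": {"name": "name1"},
        "OA_field2": 2,
    }
    # Repeating the same dict is fine here because it is only serialized.
    return {"outer_array": [record] * n}


# Small n for display; raise it (e.g. to 10_000) for benchmarking.
print(json.dumps(make_input(3)))
```

Writing json.dumps(make_input(10_000)) to a file gives an input whose size can be adjusted until a run takes a convenient amount of time.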

osevill · 2025-04-15T00:12:04Z

Thank you again for your time working on this request. I'll recompile the path-values branch to get the updated getpath, and try it out again. Should have some time toward the end of the week.
Regards.
