./script.sh parse-defs - ENTRY and SYSCALL_DEFINE are handled on projects other than Linux #386

Open
tleb opened this issue Feb 5, 2025 · 1 comment
Labels
indexing Related to the index content — missing definitions/references, lexer bugs, new ctags features...

tleb commented Feb 5, 2025

We have an issue with the script.sh parse-defs function for C files: parse_defs_C().

Elixir lists an identifier named 16 on Zephyr: https://elixir.bootlin.com/zephyr/v4.0.0/C/ident/16

That implies some Zephyr version has a definition for 16. We track it down below and see that the issue is caused by our post-processing, not by ctags. I am not sure whether we want to fix this in script.sh or start porting parts of script.sh, in which case this (subtle) bug would be a good starting point.

$ docker run -it --name lxr -v ./elixir-data:/srv/elixir-data --entrypoint /bin/bash elixir
root@0ce474c994b6:/usr/local/elixir# python3
>>> import berkeleydb
>>> import elixir.data
>>> 
>>> db = berkeleydb.db.DB()
>>> db.open('/srv/elixir-data/zephyr/data/definitions.db', flags=berkeleydb.db.DB_RDONLY)
>>> 
>>> [k for k in db.keys() if all(48 <= c <= 57 for c in k)]
[b'16', b'48']
>>> db.get(b'16')
b'14056f265C,10104f260C,36292f258C,44735f258C,31114f258C,47995f258C,50622f258C#C'
>>> x = elixir.data.DefList(db.get(b'16'))
>>> list(x.iter())
[(10104, 'function', 260, 'C'), (14056, 'function', 265, 'C'), (31114, 'function', 258, 'C'), (36292, 'function', 258, 'C'), (44735, 'function', 258, 'C'), (47995, 'function', 258, 'C'), (50622, 'function', 258, 'C')]
>>> 
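>>> db_hashes = berkeleydb.db.DB()
>>> db_hashes.open('/srv/elixir-data/zephyr/data/hashes.db', flags=berkeleydb.db.DB_RDONLY)
>>> # db_filenames (used below) is not defined in this paste; presumably it was opened the same way.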
>>> x = elixir.data.DefList(db.get(b'16'))
>>> for y in set(map(lambda y: y[0], x.iter())): 
...     print(y, db_hashes.get(str(y).encode()), db_filenames.get(str(y).encode()))
36292 b'316ed8e1dca4ed5c75b0b431a56ec1350bee18fd' b'xt_zephyr.S'
14056 b'69ebfd72d57b045b5a62eec1e7cd57fdbf00f658' b'xt_zephyr.S'
31114 b'228d443e1f80de388aa0356e454170dbd93f2fe7' b'xt_zephyr.S'
10104 b'be58a1ff043752ce49dea30a74390f1ebc5d0c9b' b'xt_zephyr.S'
47995 b'990e6645c0af756618328e4a84404322ec945996' b'xt_zephyr.S'
50622 b'0bd59451a5266738a875b9f0c7ce29a50fda1129' b'xt_zephyr.S'
44735 b'3e6dc212ec5333f4b1d9192e91c01ea65a96da9d' b'xt_zephyr.S'
>>> x = elixir.data.DefList(db.get(b'48'))
>>> for y in set(map(lambda y: y[0], x.iter())):
...     print(y, db_hashes.get(str(y).encode()), db_filenames.get(str(y).encode()))
62144 b'13082bb1c5496323330571aa2199eee15d6ffe7a' b'atomic.S'
67393 b'c7c3d7777af8af81ee6497076fd0bdd49cdb077a' b'atomic.S'
36292 b'316ed8e1dca4ed5c75b0b431a56ec1350bee18fd' b'xt_zephyr.S'
14056 b'69ebfd72d57b045b5a62eec1e7cd57fdbf00f658' b'xt_zephyr.S'
31114 b'228d443e1f80de388aa0356e454170dbd93f2fe7' b'xt_zephyr.S'
10091 b'20b510787831cd06b26bb369c35542da83d944bf' b'atomic.S'
56941 b'ff93314cb670ac4c21428bfd8e3c4d73dabd58ac' b'atomic.S'
10104 b'be58a1ff043752ce49dea30a74390f1ebc5d0c9b' b'xt_zephyr.S'
47995 b'990e6645c0af756618328e4a84404322ec945996' b'xt_zephyr.S'
50622 b'0bd59451a5266738a875b9f0c7ce29a50fda1129' b'xt_zephyr.S'
44735 b'3e6dc212ec5333f4b1d9192e91c01ea65a96da9d' b'xt_zephyr.S'
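The raw value returned by db.get(b'16') can be read by hand. The following is a minimal sketch, not Elixir's actual DefList code, that decodes the packed string under the assumption that each entry is `<blob id><type letter><line><family letter>` and that the part after `#` lists families. Only the `f` to `function` mapping is visible in the session above; `decode_defs` and `TYPE_NAMES` are hypothetical names.

```python
import re

# Assumption: only 'f' -> 'function' is known from the session output;
# any other type letter is passed through undecoded.
TYPE_NAMES = {"f": "function"}

# One packed entry: blob id, type letter, line number, family letter.
ENTRY_RE = re.compile(r"(\d+)([A-Za-z])(\d+)([A-Za-z])")


def decode_defs(raw: bytes):
    """Decode b'<id><type><line><family>,...#<families>' into sorted tuples."""
    entries, _, _families = raw.decode().partition("#")
    decoded = []
    for item in entries.split(","):
        m = ENTRY_RE.fullmatch(item)
        if m is None:
            raise ValueError(f"unexpected entry: {item!r}")
        blob_id, type_char, line, family = m.groups()
        decoded.append(
            (int(blob_id), TYPE_NAMES.get(type_char, type_char), int(line), family)
        )
    return sorted(decoded)


raw = b'14056f265C,10104f260C,36292f258C,44735f258C,31114f258C,47995f258C,50622f258C#C'
result = decode_defs(raw)
for entry in result:
    print(entry)
```

Each tuple says that blob id N defines the identifier as a function at the given line, which is why the bogus identifier 16 shows up once per stored version of xt_zephyr.S.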

root@0ce474c994b6:/usr/local/elixir# git -C /srv/elixir-data/zephyr/repo cat-file blob 69ebfd72d57b045b5a62eec1e7cd57fdbf00f658 > /tmp/xt_zephyr.S
root@0ce474c994b6:/usr/local/elixir# # ctags output is all fine.
root@0ce474c994b6:/usr/local/elixir# # Only giving us labels, as expected from assembly.
root@0ce474c994b6:/usr/local/elixir# ctags -x --kinds-c=+p+x --extras='-{anonymous}' /tmp/xt_zephyr.S
L_frxt_dispatch_sol label        36 /tmp/xt_zephyr.S .L_frxt_dispatch_sol:
L_frxt_dispatch_stk label        65 /tmp/xt_zephyr.S .L_frxt_dispatch_stk:
L_xt_timer_int_catchup label       266 /tmp/xt_zephyr.S .L_xt_timer_int_catchup:
Lnested          label       147 /tmp/xt_zephyr.S .Lnested:
Lnesting         label       210 /tmp/xt_zephyr.S .Lnesting:
_interrupt_stack_top define       16 /tmp/xt_zephyr.S .set _interrupt_stack_top, _interrupt_stack + CONFIG_ISR_STACK_SIZE
_zxt_dispatch    label        27 /tmp/xt_zephyr.S _zxt_dispatch:
_zxt_int_enter   label       120 /tmp/xt_zephyr.S _zxt_int_enter:
_zxt_int_exit    label       167 /tmp/xt_zephyr.S _zxt_int_exit:
_zxt_task_coproc_state label       381 /tmp/xt_zephyr.S _zxt_task_coproc_state:
_zxt_tick_timer_init label       326 /tmp/xt_zephyr.S _zxt_tick_timer_init:
_zxt_timer_int   label       243 /tmp/xt_zephyr.S _zxt_timer_int:
noReschedule     label       203 /tmp/xt_zephyr.S .noReschedule:

root@0ce474c994b6:/usr/local/elixir# # But script.sh parse-defs is removing the correct entries and appending some bad ones
root@0ce474c994b6:/usr/local/elixir# LXR_REPO_DIR=/srv/elixir-data/zephyr/repo ../elixir/script.sh parse-defs 69ebfd72d57b045b5a62eec1e7cd57fdbf00f658 xt_zephyr.S C
L_frxt_dispatch_sol label 36
L_frxt_dispatch_stk label 65
L_xt_timer_int_catchup label 266
Lnested label 147
Lnesting label 210
_zxt_dispatch label 27
_zxt_int_enter label 120
_zxt_int_exit label 167
_zxt_task_coproc_state label 381
_zxt_tick_timer_init label 326
_zxt_timer_int label 243
noReschedule label 203
16 function 265
48 function 328

root@0ce474c994b6:/usr/local/elixir# # Here is where the new entries are coming from:
root@0ce474c994b6:/usr/local/elixir# perl -ne '/^\s*ENTRY\((\w+)\)/ and print "$1 function $.\n"' /tmp/xt_zephyr.S
16 function 265
48 function 328
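To see why the pattern produces these entries, here is a small Python sketch of the same regular expression. `\w+` accepts digits, so a line starting with `ENTRY(16)` yields an identifier named 16. The sample lines are made up for illustration and are not taken from xt_zephyr.S. `STRICT_ENTRY_RE` is a hypothetical stricter variant, shown only to illustrate the difference; it is not a proposed fix for script.sh.

```python
import re

# Same pattern as the perl one-liner above.
ENTRY_RE = re.compile(r"^\s*ENTRY\((\w+)\)")

# Hypothetical stricter variant: the argument must start like a C identifier.
STRICT_ENTRY_RE = re.compile(r"^\s*ENTRY\(([A-Za-z_]\w*)\)")


def entry_defs(lines, pattern):
    """Return (ident, 'function', line_number) for each matching line."""
    found = []
    for number, line in enumerate(lines, start=1):
        m = pattern.match(line)
        if m:
            found.append((m.group(1), "function", number))
    return found


# Made-up sample input, not the real file contents.
sample = [
    "ENTRY(16)",
    "    ENTRY(48)",
    "ENTRY(memcpy)",
    "\tmov a0, a1",
]

loose = entry_defs(sample, ENTRY_RE)
strict = entry_defs(sample, STRICT_ENTRY_RE)
print(loose)
print(strict)
```

The loose pattern reports all three ENTRY lines, including the two numeric ones, while the stricter variant keeps only `memcpy`.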

Thanks for the report @Fomys!

@fstachura

These two lines add the extra entries. This part is relevant only to Linux and should be moved to a project-specific part of Elixir. If this needs a quick fix, I guess we could redefine parse_defs_C in projects/linux.sh. (script.sh uses an inheritance scheme in which functions in the script can be replaced with project-specific versions if projects/project-name.sh exists.)

Also, identifiers that consist only of digits are not filtered out by the tokenizer and are treated like "normal" identifiers. This is addressed in new-lexers.
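As an illustration of the digit-only check, here is a minimal sketch that reuses the byte-range test from the session above (`48 <= c <= 57`). The helper names are hypothetical, and this is not how new-lexers handles the problem; it only shows what filtering such keys would look like.

```python
def is_digits_only(key: bytes) -> bool:
    """True if the key is non-empty and every byte is an ASCII digit."""
    return len(key) > 0 and all(48 <= c <= 57 for c in key)


def drop_numeric_idents(keys):
    """Keep only keys that are not made up entirely of digits."""
    return [k for k in keys if not is_digits_only(k)]


# Sample keys: two from the issue, the rest for contrast.
keys = [b'16', b'48', b'_zxt_dispatch', b'Lnested', b'x16']
kept = drop_numeric_idents(keys)
print(kept)
```

Keys that merely contain digits, such as `x16`, are kept; only all-digit keys are dropped.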

@fstachura fstachura added the indexing Related to the index content — missing definitions/references, lexer bugs, new ctags features... label Feb 21, 2025
@fstachura changed the title from "Bug in the ./script.sh parse-defs" to "./script.sh parse-defs - ENTRY and SYSCALL_DEFINE are handled on projects other than Linux" on Feb 21, 2025