Some fixes and features to make it easier to access data #5
Hi! Thank you very much for submitting a PR! I like the changes you're proposing, however I can see an issue with setting the string encoding. The way it is now, whenever zenkit._native.load is called with an encoding, it will change the encoding for all strings loaded by the library. This is unexpected behaviour, I'd think.
I propose instead to add a function for setting the encoding globally and then applying it to all string encoding and decoding performed by the library (e.g. a zenkit.set_string_encoding function). This should then also replace the hardcoded "windows-1252" in all str.encode calls (you don't need to do that right now, but that'd be the plan).
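A minimal sketch of what such a global encoding setting could look like. Everything here is an assumption made for illustration: the names set_string_encoding, encode_str and decode_bytes are hypothetical and are not taken from the actual zenkit package; only the "windows-1252" default comes from the comment above.

```python
import codecs

# Hypothetical module-level state; not part of the real zenkit package.
_STRING_ENCODING = "windows-1252"  # the default mentioned in the review comment


def set_string_encoding(encoding: str) -> None:
    """Set the encoding used by every later encode/decode helper call."""
    global _STRING_ENCODING
    codecs.lookup(encoding)  # raises LookupError for unknown codec names
    _STRING_ENCODING = encoding


def encode_str(value: str) -> bytes:
    """Encode a Python string before handing it to native code."""
    return value.encode(_STRING_ENCODING)


def decode_bytes(raw: bytes) -> str:
    """Decode bytes received from native code."""
    return raw.decode(_STRING_ENCODING)
```

With one shared setting like this, loading a file never changes the encoding as a side effect; a caller working with, say, a Polish mod would call set_string_encoding("windows-1250") once, up front.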
What's the "real" functional difference between DaedalusScript and DaedalusVm? The latter is a subclass, and the Would it be possible to detect if a symbol is a function parameter inside the DLL, other than the The Same goes to other aspects, like the Lastly, it seems that DaedalusVm requires registering externals, even the defaults from Gothic. Why can't ZenKit provide them with the DLL, is there some licensing issue? |
The difference between Daedalus instances are runtime-initialized objects, meaning they have to be explicitly initialized by the engine by running some Daedalus code. That's the job of the VM. So no, Getting the parameters of a Daedalus function isn't as simple as checking for a Caching is something I haven't considered yet, because there are many cases in ZenKit where caching is the wrong answer. In those examples specifically though, I think caching would be a valid strategy, because the script itself is immutable. A generator could be great too. Externals are not registered by ZenKit, because most of them have something to do with game engine stuff, which ZenKit is not concerned with. Of course, things like |
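To illustrate the caching and generator idea mentioned above, here is a small sketch on a stand-in class. ToyScript, iter_symbols and index_by_name are hypothetical names for this example only and do not mirror the real DaedalusScript API; the point is just that a lookup table can be built once when the underlying data never changes.

```python
from functools import cached_property
from typing import Iterator


class ToyScript:
    """Stand-in for an immutable script; not the real DaedalusScript class."""

    def __init__(self, names: list[str]) -> None:
        self._names = tuple(names)  # a tuple, so the data cannot change later

    def iter_symbols(self) -> Iterator[tuple[int, str]]:
        # Generator: yields (index, name) pairs lazily instead of building a list.
        for index, name in enumerate(self._names):
            yield index, name

    @cached_property
    def index_by_name(self) -> dict[str, int]:
        # Built on first access and reused afterwards; this is only safe
        # because the script contents are immutable.
        return {name: index for index, name in self.iter_symbols()}
```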
Hello again, I wish you a belated happy new year ^^ However, there is some bug, which is possibly there in 1.3.0 or I'm missing something about memory management and DAT file access.
"$INSTANCE_HELP": {
    "crashed": "Init Failed - NPC is None"
},
"BDT_10030_ADDON_BUDDLER": {
    "crashed": "exception: access violation reading 0x000000474E49525C"
},
"BDT_10031_ADDON_WACHE": {
    "dialogues": {
        "DIA_ADDON_BDT_10031_WACHE_HI": "Alles klar?"
    },
    "id": 10031,
    "instance_name": "BDT_10031_ADDON_WACHE",
    "name": "Wache",
    "routines": [
        "START"
    ]
},
As you can see there are NPC instances which crash during initialization (OSError). I think this happens with Info instances as well, but if that happens I skip them without any logging. The worst part is that the output of my script changes every time, I mean the amount of Sometimes, but not always, the script also shows this Traceback:
$ python script.py
Exception ignored in: <function DaedalusScript.__del__ at 0x000002203C8216C0>
Traceback (most recent call last):
File "...\DecDat-Zenkitpy_project\venv\Lib\site-packages\zenkit\daedalus_script.py", line 277, in __del__
self._deleter()
File "...\DecDat-Zenkitpy_project\venv\Lib\site-packages\zenkit\daedalus_vm.py", line 203, in _deleter
DLL.ZkDaedalusVm_del(self._handle)
OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF
Sometimes, but not always, when I enable the Debug LogLevel, I get different errors;
there can be also no errors in the Terminal, and Here is the script.zip, it expects an exported Any advice how could I debug it? I was thinking of setting externals to lessen the burden of the |
Next day of investigation about the random crashes. I've added some externals and started getting this error
I don't really understand where the recursion is, but looking at the Traceback it's related to the
Exception ignored on calling ctypes callback function <function DaedalusVm._register_external.<locals>.<lambda> at 0x0000015599AF8540>:
Traceback (most recent call last):
File "...\zenkit\daedalus_vm.py", line 198, in <lambda>
fptr = DaedalusVmExternalCallback(lambda _, __: cb())
File "...\zenkit\daedalus_vm.py", line 177, in _wrapper
vals = [self.pop(arg) for arg in reversed(args)][::-1]
File "...\zenkit\daedalus_vm.py", line 109, in pop
return self.pop_instance()
File "...\zenkit\daedalus_vm.py", line 133, in pop_instance
return DaedalusInstance.from_native(handle)
File "...\zenkit\daedalus\base.py", line 60, in from_native
typ = DaedalusInstanceType(DLL.ZkDaedalusInstance_getType(handle))
RecursionError: maximum recursion depth exceeded
Traceback (most recent call last):
File "...\script.py", line 178, in <module>
main()
File "...\script.py", line 62, in main
if can_skip_symbol(vm_sym):
File "...\script.py", line 50, in can_skip_symbol
return sym.is_member or sym.type not in (DaedalusDataType.INSTANCE, DaedalusDataType.FUNCTION)
File "...\zenkit\daedalus_script.py", line 180, in type
return DaedalusDataType(DLL.ZkDaedalusSymbol_getType(self._handle))
RecursionError: maximum recursion depth exceeded
Worst part is that I still get random results, sometimes there is no |
Hm I do see a crash with G2 NotR with your first script, so there's probably a bug here. I'll hopefully have time to investigate a bit on the weekend. In the meantime, the only good way to debug this is to add debug prints in the C/C++ source, then recompile the native library, since the crash happens in C-land, inside |
Okay @kamilkrzyskow, I did find at least one issue which, if fixed, seems to resolve your problem. Before calling You could, for example, add that check to this:
if class_sym and vm_sym.is_const:
    ...
There is a bug with externals still though, which causes the program to crash when an incorrect parameter is taken. I've fixed that in GothicKit/ZenKitCAPI@2a1b554 and d9fedd0 :) |
Thanks a lot for the fix 🫶 ! Also sorry for this blunder 😞 over the many iterations of random errors I've seen some As for other observations I've had from my shenanigans:
So with this, the dialogue and routine (script above needed a small fix) extraction solution is superior to my previous DecDat -> Regex/Text extraction method, and less buggy 🥳 I was thinking about simplifying So I'm getting closer to being happy with this PR, thanks again ✌️ |
I was at first confused but I think I understand now. The way you implemented Generally, you should not use the
You can write your own logger if you like. Just use |
I like the sound of that, instantiation-aware instances 🤩 I like how you have solutions already in the making to the issues I face 😆
That's why there is the protection with
Thanks, I've written one for debugging the
class Container:
    logger_messages = dict()
def main():
    # Setup logger
    vm_sym_name = "*"
    def logger_callback(lvl: LogLevel, name: str, message: str):
        count = Container.logger_messages.get(message, 0)
        if count == 0:
            nonlocal vm_sym_name
            print(lvl.name, name, vm_sym_name, message, sep=" - ")
        Container.logger_messages[message] = count + 1
    set_logger(LogLevel.DEBUG, logger_callback)
    # ...
    for vm_sym in vm.symbols:
        vm_sym_name = vm_sym.name
        # ... |
Small progress with the externals, as I use 1 in my next extraction project. My bug came from reading the parameters from the DecDat extraction header comment.
Where it says |
I've add |
fix: remove print from instance creation
Hello again,
func void b_givetradeinv_veit(var c_npc slf) {
    if ((kapitel >= 1) && (veit_itemsgiven_chapter_1 == false)) {
        CreateInvItems(slf, itse_arrowpacket_10, 5);
        CreateInvItems(slf, itse_boltpacket_10, 5);
        // ...
        veit_itemsgiven_chapter_1 = true;
    };
    if ((sq227_veitmarket == 2) && (veit_itemsgiven_city == false)) {
        CreateInvItems(slf, itse_arrowpacket_10, 2);
        CreateInvItems(slf, itse_boltpacket_10, 2);
        // ...
So in my previous text-based and regex-based approach I scanned for such lines. However, here the This is what I came up with:
def get_variables_used_in_function(vm: DaedalusVm, func_sym: DaedalusSymbol):
    """Get all used variables in the function to later try out different permutations of values"""
    @dataclass
    class Flags:
        found_return: bool = False
        found_next_symbol: bool = False
        def __iter__(self):
            return iter(asdict(self).values())
    Flags = Flags()
    variables = set()
    current_address = func_sym.address
    while not all(Flags):
        instruction = vm.get_instruction(current_address)
        addr_symbol = vm.get_symbol_by_address(current_address)
        if instruction.op == DaedalusOpcode.RSR:
            Flags.found_return = True
        elif Flags.found_return and addr_symbol:
            Flags.found_next_symbol = True
            continue  # End of loop, no need to get size
        elif instruction.op == DaedalusOpcode.PUSHV:
            index_symbol = vm.get_symbol_by_index(instruction.symbol)
            variables.add(index_symbol.name)
        current_address += instruction.size
    return list(variables)
# ['VEIT_ITEMSGIVEN_CITY', 'KAPITEL', 'VEIT_ITEMSGIVEN_CHAPTER_1', 'FALSE', 'TRUE', 'SQ227_VEITMARKET']
Afterwards I did the usual shtick and looped over the
for vm_sym in all_symbols[1:]:
    if can_skip_symbol(vm_sym):
        continue
    if _CLASS_TYPES.get(vm.get_parent_symbol(vm_sym, find_root=True).name) != DaedalusInstanceType.INFO:
        continue
    info_instance: InfoInstance = vm.init_instance(vm_sym)
    if not info_instance.trade:
        continue
    # Initialize the NPC here so that every instance is in memory?
    trade_npc_sym = vm.get_symbol_by_index(info_instance.npc)
    npc_instance: NpcInstance = trade_npc_sym.value
    if npc_instance is None:
        npc_instance = vm.init_instance(trade_npc_sym)
# ...
for trade_info_sym in trade_info_symbols:
    info_instance: InfoInstance = trade_info_sym.value
    if info_instance is None:
        info_instance = vm.init_instance(trade_info_sym)
    trade_npc_sym = vm.get_symbol_by_index(info_instance.npc)
    npc_instance: NpcInstance = trade_npc_sym.value
    if npc_instance is None:
        npc_instance = vm.init_instance(trade_npc_sym)
    if trade_npc_sym.name not in chapter:
        npc_handle = chapter[trade_npc_sym.name] = {}
    trade_npc = npc_instance
    trade_func = vm.get_symbol_by_index(info_instance.information)
    vm.global_self = trade_npc
    vm.global_other = trade_npc
    vm.call(trade_func, trade_npc)
idk, it just feels a bit weird or "clunky" to use for a Python enjoyer like myself ✌️ Later it turned out that So I felt like another DecDat-like function is needed:
def get_trader_functions_calls(vm: DaedalusVm):
    """
    Archolos trade manager function is B_GIVETRADEINV(C_NPC slf),
    so data extraction is required to get the proper functions.
    """
    @dataclass
    class Flags:
        found_return: bool = False
        found_next_symbol: bool = False
        def __iter__(self):
            return iter(asdict(self).values())
    Flags = Flags()
    current_address = vm.get_symbol_by_name("B_GIVETRADEINV").address
    pushed_instances: list[DaedalusSymbol] = []
    pushed_ints: list[int] = []
    global_npcs: list[DaedalusSymbol] = []
    local_sym_to_global_map: dict[str, str] = {}
    func_sym_to_local_map: dict[str, str] = {}
    def _recent_local_trd() -> str:
        nonlocal pushed_instances
        for sym in reversed(pushed_instances):
            if ".TRD_" in sym.name:
                return sym.name
    while not all(Flags):
        instruction = vm.get_instruction(current_address)
        addr_symbol = vm.get_symbol_by_address(current_address)
        op = instruction.op
        if op == DaedalusOpcode.RSR:
            Flags.found_return = True
        elif Flags.found_return and addr_symbol:
            Flags.found_next_symbol = True
        elif op == DaedalusOpcode.PUSHI:
            pushed_ints.append(instruction.symbol)
        elif op == DaedalusOpcode.PUSHVI:
            pushed_instances.append(_get_by_index_or_address(vm, instruction))
            pushed_name = pushed_instances[-1].name
            if global_npcs and pushed_name not in local_sym_to_global_map:
                local_sym_to_global_map[pushed_name] = global_npcs[-1].name
        elif op == DaedalusOpcode.BL:
            maybe_sym = _get_by_index_or_address(vm, instruction)
            if maybe_sym.name.startswith("B_GIVE") and maybe_sym.name not in func_sym_to_local_map:
                func_sym_to_local_map[maybe_sym.name] = _recent_local_trd()
        elif op == DaedalusOpcode.BE:
            maybe_sym = _get_by_index_or_address(vm, instruction)
            if maybe_sym.name and maybe_sym.name == "HLP_GETNPC":
                global_npcs.append(vm.get_symbol_by_index(pushed_ints[-1]))
        current_address += instruction.size
    merged_map: dict[str, str] = {}
    for key, value in func_sym_to_local_map.items():
        npc = local_sym_to_global_map[value]
        merged_map[npc] = key
    return merged_map
# len(merged_map) == 97
# {'BAU_11041_SHEPHERD': 'B_GIVETRADEINV_SHEPHERD',
# 'BAU_2243_GUMBERT': 'B_GIVETRADEINV_GUMBERT',
# ...
So now I have both the NPCs and their functions, I don't even need to go over the Like mentioned in the past:
Would it be possible for ZenKit4Py to somehow make things easier here?
I added the |
Heyo, providing a better API for script decomp is currently not really in scope for ZenKit itself I think. Have you tried calling these functions using the VM and either registering externals for the Maybe interesting as well: https://github.com/GothicKit/mdd (see Releases as well) |
Thanks for the quick reply, and your time.
Totally understandable, since other tools are targeted for this purpose.
I did try mdd after starting this PR: I did have some issues with how it works, which could probably be resolved, but I would ultimately end up with text based processing of the
I'm not sure what you mean with "calling these functions using the VM", but I guess you mean calling I guess gathering the trade items for the functions and not the NPCs would be easier, and then doing some post-processing to replace the function names with their NPCs would maybe be even more performant than constantly resolving I mean the issue is resolved, I just mentioned my process to hint that I may be circling around the proper solution.
def createinvitems(npc: DaedalusInstance, item: int, amount: int) -> None:
    nonlocal chapter  # Handle for chapter dict, chapter[npc_sym.name]
    npc_sym: DaedalusSymbol = vm.get_symbol_by_index(npc.index)
    item_sym: DaedalusSymbol = vm.get_symbol_by_index(item)
    print(f"<local>{npc_sym}, {item_sym}, {amount}", flush=True)
    return None
vm.register_external(*_get_registration_data(createinvitems))
But after you mentioned it now, I had the 💡 OMG 🤦 realization that I have not tried to run the
nonlocal printed_trace
if not printed_trace:
    vm.print_stack_trace()
    printed_trace = True
# ERROR - DaedalusVm - DIA_GUMBERT_TRADE - ------- CALL STACK (MOST RECENT CALL FIRST) -------
# ERROR - DaedalusVm - DIA_GUMBERT_TRADE - in B_GIVETRADEINV_GUMBERT at 2fff98
# ERROR - DaedalusVm - DIA_GUMBERT_TRADE - in B_GIVETRADEINV at 312772
# ERROR - DaedalusVm - DIA_GUMBERT_TRADE - in DIA_GUMBERT_TRADE_INFO at 399dbf
Now a train of thought from the top of my head. The stack is printed to the stdout, so I would have to redirect it to some file and read it later again or redirect to some string IO object (I'm not sure if there is one in Python, but should be doable with a custom class that proxies the So maybe it would be cool to get a
If it would work in similar fashion to Union hooking, where the old function is also somehow available then it would be helpful I think.
# vm is global, or the function would be a member of the DaedalusVm object
def decorator(func_sym: DaedalusSymbol):
    def wrapper(some: int, args: str, with_: float, type_hints: DaedalusInstance) -> None:
        print("wrapper did execute")
        return vm.call(func_sym.name + "_old", some, args, with_, type_hints, rtype=":hmm: reading it from the wrapper would be best, but don't know if it's possible")
    return wrapper
vm.override_function(func_sym, decorator)
In general I find it funny that Daedalus sort of forces the usage of globals, or nonlocal accessing of variables 😆 |
Hm it's likely that I don't understand your exact use-case, but adding an API for retrieving the call stack would be possible. I will come up with something and get back to you :) |
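Until a dedicated call stack API exists, the textual trace quoted earlier in this thread could be parsed after the fact. The following is only a sketch: parse_stack_trace is a hypothetical helper, and the "in NAME at HEXADDRESS" line format is assumed purely from the four log lines pasted above, so it may not hold for other versions or log configurations.

```python
import re

# Matches the trailing "in NAME at ADDRESS" part of a frame line.
# The format is an assumption based on the log output quoted in this thread.
_FRAME_RE = re.compile(r"\bin (\S+) at ([0-9a-fA-F]+)\s*$")


def parse_stack_trace(text: str) -> list[tuple[str, int]]:
    """Return (function name, address) pairs in the order they were printed.

    The quoted header says "MOST RECENT CALL FIRST", so index 0 is expected
    to be the innermost call. Lines that do not look like frames are skipped.
    """
    frames = []
    for line in text.splitlines():
        match = _FRAME_RE.search(line)
        if match:
            frames.append((match.group(1), int(match.group(2), 16)))
    return frames
```

For the trace shown above, the first entry would be B_GIVETRADEINV_GUMBERT, which is the per-NPC function the external was called from.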
Oh don't worry, perhaps I do not understand it myself 😅 😆 So this approach works in Archolos:
def createinvitems(npc: DaedalusInstance, item: int, amount: int) -> None:
    # SG is SharedGlobals
    if SG.get_func_from_stack:
        SG.is_printing_trace = True
        with io.StringIO() as stack_trace:
            with contextlib.redirect_stdout(stack_trace):
                SG.vm.print_stack_trace()
            SG.last_npc_give_func = get_last_call_func_name(stack_trace.getvalue())
        SG.is_printing_trace = False
        # No need to refetch last_npc_give_func, so set False early
        # However, it should be also set to False after the information func call
        SG.get_func_from_stack = False
    return None
# ...
However, I've noticed there's a buggy case with this approach, as the external So if there is a function with a guard statement
So instead of [EDIT: So in other words, in OP code language, I would like to be able to hook the OP.CALL and OP.CALL_EXTERNAL to know when it executes, perhaps no need to process the stack trace, "just" expose a hooking mechanism for these actions. I understand this might not be "exposable", but worth asking 😸] I had like 3-5 rewrites of my approach for this traders items case (still not finished), and only now I see that working with ZenKit is more akin to being inside of the "daedalus maze", and I need to find all of the correct keys (variables set to certain values) and I don't know what those keys are, so I need to break the lock to see inside (OP code processing) to know what keys could fit to that lock and later after opening the door I get all of the goodies like already processed DaedalusSymbols and Instances with parsed data inside of the code to extract. And processing DecDat exported scripts is more akin to seeing the map of the maze, with all of the information about possible doors and combinations, however the map is only on paper I need to load it into the code again deserialising it again into some kind of structure. Initially, I thought it would be easier to get all of the data with ZenKit, since I'm already "inside of the game", as I didn't want to write a proper parser for the text Still, for now I'm slowly progressing with my brute force approach, where I call the same function with multiple permutations of variables multiple times😃
|
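The brute force idea described above (calling the same function under many combinations of variable values) can be sketched with itertools.product. This only generates the assignments; iter_assignments is a hypothetical helper, and how each assignment would be written into the VM before a call is left out, since that part of the API is not shown in this thread.

```python
from itertools import product
from typing import Iterator


def iter_assignments(candidates: dict[str, list[int]]) -> Iterator[dict[str, int]]:
    """Yield every combination of candidate values, one dict per combination.

    `candidates` maps a variable name (for example a name collected by
    scanning PUSHV instructions) to the values that should be tried for it.
    """
    names = list(candidates)
    for values in product(*(candidates[name] for name in names)):
        yield dict(zip(names, values))
```

The number of combinations is the product of the list lengths, so it grows quickly; restricting each variable to the few values it is actually compared against in the function keeps the search manageable.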
This really does sound like an issue where an AST would be better suited. MDD does actually generate one from the bytecode, so maybe that'd be a good place to start. You could port that implementation to Python and then you'd have a fully capable bytecode -> AST generator with which you could do exactly this. Just scanning the bytecode for specific CALLs to the external wouldn't be enough because after finding one, you'd need to backtrack to the last BRANCH instruction to figure out the branch it was in. Maybe another option would be to make MDD capable of outputting the AST as JSON or some such so it could be imported again in Python? Maybe something like this Gothic2_GOTHICDAT.json.zip? |
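To make the backtracking remark concrete, here is a toy sketch of scanning for calls to one external and looking backwards for the nearest preceding branch. ToyInstruction and find_calls_with_branch are hypothetical, the mnemonics are simplified strings (BE for an external call follows the usage earlier in this thread, BZ as the conditional branch is an assumption), and none of this is the ZenKit or MDD implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyInstruction:
    op: str            # simplified mnemonic, e.g. "PUSHV", "BZ", "BE"
    operand: str = ""  # symbol name or jump label, kept as text in this toy


def find_calls_with_branch(
    code: list[ToyInstruction],
    external: str,
    branch_ops: tuple[str, ...] = ("BZ",),
) -> list[tuple[int, Optional[int]]]:
    """Return (call position, nearest preceding branch position or None)."""
    results = []
    for position, instruction in enumerate(code):
        if instruction.op == "BE" and instruction.operand == external:
            branch = None
            # Walk backwards until a branch instruction is found.
            for back in range(position - 1, -1, -1):
                if code[back].op in branch_ops:
                    branch = back
                    break
            results.append((position, branch))
    return results
```

Note the limitation: the nearest preceding branch is not necessarily the one whose block encloses the call (that block may already have ended), which is one reason a proper AST is the more reliable tool here.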
Hmm, I never worked with AST data, but I can imagine that a JSON file generated for Archolos would be quite big 😅, so maybe another data type like YAML or somehow SQLite would be better. Here is an old output file from 2022: The solution sort of worked, as I separated the "blocks" with So yeah after a longer break between updates of Archolos my setup became too cumbersome to manage, so I'm looking for alternatives before Archolos 2.0 ✌️ While writing the message I saw your edit above, so I'll take a look at it 🕙 ... I do see value in seeing the variables inside separated into blocks 👍 Would be nice to be able to either target specific functions or only limit the output to Going back to the example before with the call stack:
The function
func void b_givetradeinv(var c_npc slf) {
    var c_npc trd_veit;
    var c_npc trd_bastian;
    var c_npc trd_keth;
    var c_npc trd_gumbert;
    // ...
    trd_gumbert = Hlp_GetNpc(bau_2243_gumbert);
    // ...
    if (Hlp_GetInstanceID(slf) == Hlp_GetInstanceID(trd_gumbert)) {
        b_clearfakeitems(slf);
        b_clearjunktradeinv(slf);
        b_givetradeinv_gumbert(slf);
    };
Therefore, I don't think that the AST is the singular best solution which would solve all of the issues, as I would have to traverse Currently in my imagination this solution seems "best" -> Let's say that I could detect the function calls that happen in ZenKit during
javaw.exe -jar mdd.jar --path-to-dat ../gothic.dat --export-ast-to-json ./ast.json --only-symbol B_GIVETRADEINV_GUMBERT
executed via subprocess, then I would load the JSON and use the data provided to fill out the blanks of ZenKit. However, this would likely load the Gothic.DAT anew each time the command is run, so it would be hellishly slow. For now I want to finish the brute force approach, to see how close I can get to a 100% solution ✌️ |
So is there anything specific you need from me right now? |
Thanks for the continued support 🫶 The So the idea from #5 (comment) seems the most useful at the moment:
stack_trace = vm.call(func_sym, return_stack_trace_dict=True)
# or
vm.call(func_sym)
vm.print_stack_of_previous_call_not_the_current_one()
# or
logger.setlevel(LogLevel.TRACE_DAEDALUS)  # outputs Called internal function: name or Called external: name
vm.call(func_sym)  # use the logger messages to know what was called inside
When you said that ZenKit shouldn't initialize instances with
Basically somehow, doesn't matter how TBH, expose information from inside the |
I fat fingered the enter key on title and it created an empty PR 🙄
Anyway, I always wanted to be able to access script data from Python, there is PyDecDat but it doesn't support Ikarus/Lego extended scripts, and iirc was rather slow.
So for now I typically used DecDat and parsed the .d file with Python to extract data.
When I saw this project I was happy that I don't have to parse the .d myself 😄
...or so I thought.
Turned out there are some kinks to be straightened out, so I will fix them along the way.
Edits for maintainers are enabled, so if you want to adjust some things, then go ahead ✌️
My goal is not to create another DecDat, or so I think for now 🤔
My goal is to easily extract data, and cross-connect it afterwards in my external scripts.
Example of previously extracted data about dialogues of an NPC together with their possible routines: