I have had some issues with the existing ObsidianReader in LlamaIndex:
It does little that is Obsidian-specific beyond loading the .md files and passing them to the MarkdownReader.
It does not just read content but also splits the text (based on headers) at the same time, which in my opinion should happen later in the pipeline, in a TextSplitter.
What I want to propose is a Reader class that loads the various data fields of an Obsidian file and puts them into a LlamaIndex Document as metadata.
I wrote a new class, borrowing some code of the langchain ObsidianLoader, and implemented the ability to also read the tags from the ob-timelines plugin (https://github.com/seanlowe/obsidian-timelines)
An Obsidian note can have multiple ob-timelines events, and since the metadata dict should be kept flat, I let the user choose the prefix for each event's data fields: either the title of the event or "event" plus a sequential number.
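To make the prefixing idea concrete, here is a minimal sketch of how a list of events could be flattened into one metadata dict. The function name `flatten_events`, the `use_title_prefix` flag, and the shape of the event dicts are assumptions for illustration and are not taken from the attached class.

```python
import re


def flatten_events(events, use_title_prefix=True):
    """Flatten a list of event dicts into a single flat metadata dict.

    Hypothetical helper: each event is assumed to be a dict of field -> value,
    optionally containing a "title" field.
    """
    metadata = {}
    for index, event in enumerate(events, start=1):
        title = event.get("title", "")
        if use_title_prefix and title:
            # Normalise the title so it is usable as a key prefix.
            prefix = re.sub(r"\W+", "_", title.strip()).strip("_").lower()
        else:
            # Fall back to "event" plus a sequential number.
            prefix = f"event{index}"
        for field, value in event.items():
            metadata[f"{prefix}_{field}"] = value
    return metadata
```

One thing to keep in mind with title prefixes: two events with the same title would overwrite each other's keys, which the sequential-number variant avoids.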
I also removed the call to MarkdownReader because of the issue I mentioned above.
So this Reader can read:
In-text tags ("#thisisatag")
ob-timelines <div> tags
Front matter tags (YAML)
Dataview fields
The new Reader adds the tags to the Document object as metadata and then removes them from the text with regex.
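As a rough sketch of that extract-then-remove step for in-text tags and Dataview fields, something like the following could work. The regular expressions and the `extract_metadata` name are my own assumptions, not the patterns used in the attached class, and only the line-level `key:: value` form of Dataview fields is handled.

```python
import re

# Assumed patterns, for illustration only.
# A tag is "#" plus tag characters, and must not be glued to preceding text.
TAG_RE = re.compile(r"(?<!\S)#([A-Za-z0-9_/-]+)")
# A Dataview field that occupies a whole line: "key:: value".
DATAVIEW_RE = re.compile(r"^[ \t]*([A-Za-z0-9_ -]+?)::[ \t]*(.*)$", re.MULTILINE)


def extract_metadata(text):
    """Return (cleaned_text, metadata) for one note body."""
    metadata = {}

    # Collect Dataview fields first, then drop those lines from the text.
    for key, value in DATAVIEW_RE.findall(text):
        metadata[key.strip()] = value.strip()
    text = DATAVIEW_RE.sub("", text)

    # Collect in-text tags, then strip them from the text.
    tags = TAG_RE.findall(text)
    if tags:
        metadata["tags"] = sorted(set(tags))
    text = TAG_RE.sub("", text)

    return text.strip(), metadata
```

Markdown headings such as "# Title" are left alone because the "#" is followed by a space. This simple tag pattern would also accept purely numeric tags, which a stricter version might want to exclude.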
The ob-timelines reading capability requires adding BeautifulSoup4 as a dependency.
The Front Matter reading capability requires adding yaml (PyYAML) as a dependency.
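For the ob-timelines elements, the idea is to collect the `data-*` attributes and the inner text of every element carrying the `ob-timelines` class. The sketch below deliberately swaps in the standard library's `html.parser` instead of BeautifulSoup4 so that it runs without extra dependencies; the attribute names in the usage example (`data-date`, `data-title`) are assumptions about what a note might contain.

```python
from html.parser import HTMLParser


class TimelineEventParser(HTMLParser):
    """Collect data-* attributes and inner text of 'ob-timelines' elements."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._current = None  # event currently being collected, if any
        self._tag = None      # tag name that opened the current event

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._current is None and "ob-timelines" in classes:
            self._tag = tag
            # Keep every data-* attribute, with the "data-" prefix removed.
            self._current = {
                name[len("data-"):]: value
                for name, value in attrs
                if name.startswith("data-")
            }
            self._current["text"] = ""

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._tag:
            self._current["text"] = self._current["text"].strip()
            self.events.append(self._current)
            self._current = None
            self._tag = None


def parse_timeline_events(markdown_text):
    """Return a list of event dicts found in the note text."""
    parser = TimelineEventParser()
    parser.feed(markdown_text)
    parser.close()
    return parser.events
```

This sketch does not handle an event element that contains a nested element with the same tag name; a real HTML library is more robust there.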
I'll attach the class. Please let me know your suggestions for improvements and any bugs I may have missed. ObsidianTLReader.py.zip
edit:
I have another idea, but I don't know if or how I could implement it, since I'm not that deep into the LlamaIndex internals. Obsidian supports linking its Markdown note files with Internal Links (https://help.obsidian.md/Linking+notes+and+files/Internal+links), so I wonder whether it would be a good idea to represent those links with the relationships dict of LlamaIndex Documents, so that each Document (from a note) has references to all the other Documents it links to.
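A first step toward that could be extracting the link targets from each note. The sketch below only builds a plain mapping from note name to linked note names; the regex and the `build_link_graph` name are assumptions, and how the result would be mapped onto the relationships dict is left open.

```python
import re

# Assumed pattern for [[Note]], [[Note|alias]] and [[Note#Heading]].
# The lookbehind skips embeds such as ![[image.png]].
WIKILINK_RE = re.compile(r"(?<!!)\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def build_link_graph(notes):
    """Map each note name to the sorted names of existing notes it links to.

    `notes` is assumed to be a dict of note name -> Markdown text.
    """
    graph = {}
    for name, text in notes.items():
        targets = {target.strip() for target in WIKILINK_RE.findall(text)}
        # Keep only links that resolve to a known note, and drop self-links.
        graph[name] = sorted(t for t in targets if t in notes and t != name)
    return graph
```

Links to notes that were not loaded are dropped here, so every entry in the mapping points at a Document that actually exists.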
Obsidian Link: https://obsidian.md/