[FEAT] Can Chonkie directly split a Markdown file by headers? #177

Open
qiankunli opened this issue Feb 17, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@qiankunli

  1. Split the Markdown file at the smallest header level detected in it.
  2. After splitting by headers, if a block is longer than a threshold, split that block further into pieces no longer than the threshold.
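The two steps above could be sketched in plain Python roughly as follows. This does not use Chonkie, and the function names (split_by_headers, split_oversized, chunk_markdown) are hypothetical. Several assumptions apply: the threshold is measured in characters rather than tokens; only ATX headers (1-6 '#' characters followed by whitespace at the start of a line) are recognized; '#' lines inside fenced code blocks are not special-cased; and "smallest level" is read as the deepest header level present, which amounts to starting a new section at every header.

```python
import re

# Assumption: only ATX headers are recognized (1-6 '#' then whitespace at line start).
HEADER_RE = re.compile(r"^#{1,6}\s", re.MULTILINE)


def split_by_headers(text: str) -> list[str]:
    """Start a new section at every header line; each header stays with its body."""
    starts = [m.start() for m in HEADER_RE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble that precedes the first header
    bounds = starts + [len(text)]
    sections = [text[a:b].strip("\n") for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s.strip()]


def split_oversized(section: str, max_chars: int) -> list[str]:
    """Greedily pack paragraphs into pieces of at most max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    pieces: list[str] = []
    current = ""
    for para in section.split("\n\n"):
        if not para.strip():
            continue  # skip empty paragraphs produced by runs of blank lines
        # A single paragraph longer than the limit is hard-cut into fixed-size slices.
        while len(para) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = para
    if current:
        pieces.append(current)
    return pieces


def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Step 1: split by headers. Step 2: re-split any section over the threshold."""
    chunks: list[str] = []
    for section in split_by_headers(text):
        if len(section) <= max_chars:
            chunks.append(section)
        else:
            chunks.extend(split_oversized(section, max_chars))
    return chunks
```

Oversized sections are packed paragraph by paragraph, and only a paragraph that is itself longer than the threshold gets cut mid-text. If the threshold is meant in tokens, len() could be swapped for a token counter.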
@qiankunli qiankunli added the enhancement New feature or request label Feb 17, 2025
@bhavnicksm
Collaborator

Hey @qiankunli! 😄

Thanks for opening an issue~

To answer your question: in short, yes! It's quite straightforward to do this with the RecursiveChunker, and we have a tutorial on it as well (see the Cookbook for one way to go about it).

Currently, v0.5.0 is available in the source (src), and it has the include_delim="next" option. This is helpful for Markdown because it includes each header in the next chunk rather than the previous one ("prev" is the default behavior). You can also test that out if you wish.
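To illustrate what the include_delim setting changes, here is a small stand-alone sketch that splits on a header delimiter and attaches each delimiter either to the previous piece or to the next one. The function split_keep_delim is hypothetical and is not Chonkie's implementation; it only mirrors the "prev" versus "next" behavior described above, and it uses "\n#" as an illustrative delimiter.

```python
def split_keep_delim(text: str, delim: str, include_delim: str = "prev") -> list[str]:
    """Split text on delim, attaching each delimiter to the previous or next piece.

    Hypothetical helper that mirrors the idea of include_delim; not Chonkie's code.
    """
    if include_delim not in ("prev", "next"):
        raise ValueError("include_delim must be 'prev' or 'next'")
    parts = text.split(delim)
    if include_delim == "prev":
        # delimiter ends the piece before it (a header's '#' dangles at the end)
        pieces = [p + delim for p in parts[:-1]] + [parts[-1]]
    else:
        # delimiter starts the piece after it (each header opens its own chunk)
        pieces = [parts[0]] + [delim + p for p in parts[1:]]
    return [p for p in pieces if p]  # drop empty pieces at the edges


doc = "intro\n# A\nalpha\n# B\nbeta"
print(split_keep_delim(doc, "\n#", "prev"))
print(split_keep_delim(doc, "\n#", "next"))
```

With "prev", the '#' of each header ends up at the tail of the preceding chunk; with "next", every chunk after the first begins with its own header, which is why "next" suits Markdown.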

Hope that helps~

Thanks 😊
