Restore the useLocalCrawling & maxDepth settings for indexed documents #5958
Conversation
Force-pushed from cd75831 to 28b2c57
😱 Found 1 issue. Time to roll up your sleeves! 😱
All tests are green 🎉 So recurseml, is that enough sleeve rolling for you? 😏 🤣

Bump: could someone do a quick review of this PR? 😇

This PR is now one month old, could anyone check and merge it? 😢
@vincentkelleher do you need the `maxDepth` param specifically? Apologies for the delays!
@RomneyDa I don't need it specifically, I was just aware of the parameters being ignored; I imagine allow/block glob patterns could be more flexible anyway. Thanks for the feedback 😊
Got it! So would it solve your issue if I merged this and then removed `maxDepth` and kept `useLocalCrawling`? (Or you could do it yourself, if you'd like to.) Yes, I think glob patterns for allow/block lists would be a good replacement.
@RomneyDa I have the feeling that […]

We're also thinking about adding a `maxPages` limit, would that cover your use case?
It's true that there are clearly two types of limits: depth limits (like `maxDepth`) and volume limits (a maximum number of pages).

Usually a max depth of 1 seems like a reasonable case, as you want everything directly linked to the subject of the page; going beyond 1 or 2 would in most cases end up indexing the whole website, IMHO. I also think that an allow or block list would be about the same, if not worse, than a max depth over 1, as you would have to know the sitemap well. Having a maximum number of pages would be a good guard-rail to avoid using too many hardware resources, that seems like a good feature 👍 I would go for `maxPages`.
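To illustrate the two kinds of limits, here is a hypothetical docs entry (note that `maxPages` is only an idea discussed in this thread, not an existing parameter, and the exact field names are assumptions):

```yaml
docs:
  - name: My Library Docs
    startUrl: https://example.com/docs/intro.html
    # Depth limit: only index pages linked directly from the start page.
    maxDepth: 1
    # Volume limit (proposed in this thread, not implemented):
    # stop after indexing this many pages, regardless of link depth.
    # maxPages: 200
```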
@vincentkelleher appreciate the feedback! Do you currently have cases for which you set `maxDepth` > 1?
@RomneyDa I don't have any in mind right now 🤔 |
Adds the maxDepth and useLocalCrawling params back in for YAML docs config
Reinstates missing YAML docs config params
After running it by the team: opened a new PR to remove `maxDepth` for YAML, and opened a ticket to add `maxPages`/allow/block lists or similar as a replacement. Will leave `useLocalCrawling`. Thanks for the contribution!
🎉 This PR is included in version 1.1.0 🎉 (semantic-release bot 📦🚀)
Description

This re-introduces the `useLocalCrawling` & `maxDepth` configuration parameters for document indexing, as they have been ignored since the JSON to YAML configuration migration.
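For context, a minimal sketch of how the restored parameters might look in a YAML config (assuming the usual `name`/`startUrl` fields of a docs entry; the exact schema may differ):

```yaml
docs:
  - name: Internal Wiki
    startUrl: https://wiki.example.com/index.html
    # Only use the local crawler, e.g. for docs that are not
    # reachable from the public internet.
    useLocalCrawling: true
    # Follow links at most one hop away from startUrl.
    maxDepth: 1
```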
Tests

`DocsService` tests are skipped and commented out at the moment 👉 https://github.com/continuedev/continue/blob/bbb81ff032608e03a2208be908c1394da228ad6a/core/indexing/docs/DocsService.skip.ts