
Conversation

vincentkelleher
Contributor

Description

This re-introduces the useLocalCrawling & maxDepth configuration parameters for document indexing, which have been ignored since the JSON-to-YAML configuration migration.
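For context, this is roughly how the two parameters can be set on a docs entry in `config.yaml` (the `name`/`startUrl` fields and values are illustrative, only `useLocalCrawling` and `maxDepth` are the parameters restored by this PR):

```yaml
docs:
  - name: Example Docs                  # illustrative entry name
    startUrl: https://docs.example.com/ # illustrative URL
    useLocalCrawling: true              # crawl with the local crawler
    maxDepth: 1                         # only follow links one hop from startUrl
```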

Checklist

  • I've read the contributing guide
  • The relevant docs, if any, have been updated or created
  • The relevant tests, if any, have been updated or created

Tests

The DocsService tests are currently skipped and commented out 👉 https://github.com/continuedev/continue/blob/bbb81ff032608e03a2208be908c1394da228ad6a/core/indexing/docs/DocsService.skip.ts

@vincentkelleher vincentkelleher requested a review from a team as a code owner June 3, 2025 09:06
@vincentkelleher vincentkelleher requested review from RomneyDa and removed request for a team June 3, 2025 09:06
Contributor

cubic-dev-ai bot commented Jun 3, 2025

Your cubic subscription is currently inactive. Please reactivate your subscription to receive AI reviews and use cubic.

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jun 3, 2025

netlify bot commented Jun 3, 2025

Deploy Preview for continuedev ready!

Name Link
🔨 Latest commit 16b80d9
🔍 Latest deploy log https://app.netlify.com/projects/continuedev/deploys/68516b828b9d6c00083ce358
😎 Deploy Preview https://deploy-preview-5958--continuedev.netlify.app


github-actions bot commented Jun 3, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@vincentkelleher
Contributor Author

I have read the CLA Document and I hereby sign the CLA


recurseml bot commented Jun 13, 2025

😱 Found 1 issue. Time to roll up your sleeves! 😱

@vincentkelleher
Contributor Author

All tests are green 🎉

So recurseml, is that enough sleeve rolling for you? 😏 🤣

@vincentkelleher
Contributor Author

Bump

Could someone do a quick review of this PR? 😇
It's very simple and highly needed by my team 😊

@vincentkelleher
Contributor Author

This PR is now one month old, is anyone able to review and merge it? 😢

@RomneyDa
Collaborator

RomneyDa commented Jul 10, 2025

@vincentkelleher do you need the maxDepth param specifically?
We want to merge this with useLocalCrawling, but maybe deprecate maxDepth in favor of an allowList/blockList pattern.

Apologies for the delays!

@vincentkelleher
Contributor Author

@RomneyDa I was just aware of maxDepth because it was there historically.

I imagine allowList/blockList would be a list of regexes?

Thanks for the feedback 😊

@RomneyDa
Collaborator

RomneyDa commented Jul 10, 2025

Got it! So would it solve your issue if I merged this and then removed maxDepth but kept useLocalCrawling?

(Or if you'd like to)

Yes, I think glob patterns for allow/block
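To make the allow/block idea concrete, here is a minimal sketch of glob-based URL filtering (purely illustrative, not Continue's actual implementation; the function name and list shapes are assumptions):

```python
from fnmatch import fnmatch

def is_crawlable(url: str, allow_list: list[str], block_list: list[str]) -> bool:
    """A URL is crawlable if it matches some allow pattern (or the allow
    list is empty) and matches no block pattern."""
    allowed = any(fnmatch(url, p) for p in allow_list) if allow_list else True
    blocked = any(fnmatch(url, p) for p in block_list)
    return allowed and not blocked

# Example: index the API reference but skip the changelog
allow = ["https://docs.example.com/api/*"]
block = ["https://docs.example.com/api/changelog*"]
print(is_crawlable("https://docs.example.com/api/intro", allow, block))         # True
print(is_crawlable("https://docs.example.com/api/changelog/v2", allow, block))  # False
```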

@vincentkelleher
Contributor Author

@RomneyDa I have the feeling that maxDepth requires less thinking and is safer, as you won't explicitly know how many pages each glob will index, don't you think?

@RomneyDa
Collaborator

We're also thinking about adding a maxPages parameter to give a more direct limit, but I think people generally want to index all docs that match a pattern (perhaps with a hard limit). maxDepth doesn't create any hard limit; it could yield tens of thousands of pages in somewhat edge-case scenarios.

@RomneyDa
Collaborator

Would maxPages and useLocalCrawling be sufficient?
The other issue with maxDepth is that it's not super clear how it works, i.e. as a dev I can't keep a 3-link-deep map of the docs pages I want in my head.

@vincentkelleher
Contributor Author

It's true that there are clearly two types of limits:

  • hard limits with maxPages
  • soft limits with maxDepth, allowList and blockList

Usually a max depth of 1 seems reasonable, as you want everything directly linked to the subject of the page; going beyond 1 or 2 would, in most cases, end up indexing the whole website IMHO. I also think an allow or block list would be about the same as, if not worse than, a max depth over 1, as you'd need to know the sitemap precisely.

Having a maximum number of pages would be a good guardrail against consuming too many hardware resources, that seems like a good feature 👍

I would go for useLocalCrawling, maxDepth and maxPages 😇
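To illustrate the difference between the two kinds of limits discussed above, here is a toy breadth-first crawl over an in-memory link graph (assumed semantics for both parameters, not Continue's crawler):

```python
from collections import deque

def crawl(start: str, links: dict[str, list[str]],
          max_depth: int, max_pages: int) -> list[str]:
    """Breadth-first crawl: max_depth is a soft limit (how far links are
    followed from the start page), max_pages is a hard cap on the total
    number of pages indexed."""
    seen = {start}
    queue = deque([(start, 0)])
    indexed = []
    while queue and len(indexed) < max_pages:
        url, depth = queue.popleft()
        indexed.append(url)
        if depth < max_depth:
            for child in links.get(url, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return indexed

# A tiny sitemap: the root links to two sections, each with subpages
site = {"/": ["/a", "/b"], "/a": ["/a/1", "/a/2"], "/b": ["/b/1"]}
print(crawl("/", site, max_depth=1, max_pages=10))  # ['/', '/a', '/b']
print(crawl("/", site, max_depth=2, max_pages=4))   # ['/', '/a', '/b', '/a/1']
```

Depth 1 naturally stops at the pages directly linked from the root, while the page cap cuts the deeper crawl off at exactly four pages regardless of depth.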

@RomneyDa
Collaborator

RomneyDa commented Jul 13, 2025

@vincentkelleher appreciate the feedback! Do you currently have cases for which you set maxDepth > 1?

@vincentkelleher
Contributor Author

@RomneyDa I don't have any in mind right now 🤔

Collaborator

@RomneyDa RomneyDa left a comment


Adds the maxDepth and useLocalCrawling params back in for YAML docs config

@github-project-automation github-project-automation bot moved this from Todo to In Progress in Issues and PRs Jul 17, 2025
Collaborator

@RomneyDa RomneyDa left a comment


Reinstates missing YAML docs config params

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jul 17, 2025
@RomneyDa RomneyDa merged commit cb17516 into continuedev:main Jul 17, 2025
35 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Issues and PRs Jul 17, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Jul 17, 2025
@RomneyDa
Collaborator

After running it by the team, I opened a new PR to remove maxDepth for YAML and a ticket to add maxPages/allowList/blockList or similar as a replacement. Will leave useLocalCrawling in. Thanks for the contribution!

@sestinj
Contributor

sestinj commented Jul 22, 2025

🎉 This PR is included in version 1.1.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
