feat(drivers): add autoindex driver #1978
base: main
Pull request overview
This PR adds a new "Autoindex" driver that can crawl and parse autoindex pages (directory listings) from web servers like Apache using XPath expressions. The driver uses the antchfx/htmlquery library to parse HTML content and extract file information.
Changes:
- Added autoindex driver with configurable XPath expressions for parsing different autoindex formats
- Implemented List and Link operations for browsing and accessing files from autoindex pages
- Added size parsing utility that handles various unit formats (K, M, G, etc.)
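A minimal sketch of the kind of size parsing described above. It assumes binary multiples (K = 1024) and treats `-` (commonly printed for directories) as "no size". The function name `parseSize` and the accepted suffixes are illustrative assumptions, not taken from `drivers/autoindex/util.go`.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// Binary multipliers keyed by the normalized unit letter.
// Assumption: listings use 1024-based units (e.g. Apache's "1.2K" style).
var multipliers = map[string]float64{
	"":  1,
	"K": 1 << 10,
	"M": 1 << 20,
	"G": 1 << 30,
	"T": 1 << 40,
	"P": 1 << 50,
}

// parseSize converts a human-readable size such as "1234", "1.5K", "3 GB"
// or "4KiB" into bytes. An empty string or "-" yields 0 with no error.
func parseSize(s string) (int64, error) {
	s = strings.TrimSpace(s)
	if s == "" || s == "-" {
		return 0, nil
	}
	// Split the numeric prefix from the unit suffix.
	i := 0
	for i < len(s) && (s[i] >= '0' && s[i] <= '9' || s[i] == '.') {
		i++
	}
	num, err := strconv.ParseFloat(s[:i], 64)
	if err != nil {
		return 0, fmt.Errorf("parse size %q: %w", s, err)
	}
	// Normalize "KiB", "KB" and "K" (any case) to "K"; a bare "B" becomes "".
	unit := strings.ToUpper(strings.TrimSpace(s[i:]))
	unit = strings.TrimSuffix(unit, "IB")
	unit = strings.TrimSuffix(unit, "B")
	mult, ok := multipliers[unit]
	if !ok {
		return 0, fmt.Errorf("parse size %q: unknown unit %q", s, unit)
	}
	return int64(math.Round(num * mult)), nil
}

func main() {
	for _, in := range []string{"1234", "1.5K", "2M", "3 GB", "4KiB", "-"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```

Returning an error for unknown units, rather than silently returning 0, lets a caller decide whether to skip the entry or show it without a size.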
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| go.mod | Added dependencies for htmlquery and xpath libraries |
| go.sum | Added checksums for new dependencies and their transitive dependency golang/groupcache |
| drivers/autoindex/util.go | Implements size parsing from human-readable strings with unit conversion |
| drivers/autoindex/meta.go | Defines driver configuration and registration with required XPath fields |
| drivers/autoindex/driver.go | Core driver implementation with List and Link methods for fetching and parsing autoindex pages |
| drivers/all.go | Registers the new autoindex driver with the driver registry |
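To browse and fetch files, the List and Link methods presumably need to turn the relative `href` values scraped from a listing into absolute URLs. Below is a minimal sketch of that step using the standard `net/url` package. It assumes that a trailing slash marks a directory. `resolveEntry` is a hypothetical helper for illustration, not the driver's code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveEntry resolves an href scraped from an autoindex page against the
// page URL and reports whether the entry looks like a directory.
// Hypothetical helper; assumes directories are linked with a trailing "/".
func resolveEntry(pageURL, href string) (string, bool, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", false, err
	}
	// Treat the page as a directory so "a.txt" resolves inside it rather
	// than replacing the last path segment.
	if !strings.HasSuffix(base.Path, "/") {
		base.Path += "/"
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", false, err
	}
	abs := base.ResolveReference(ref)
	return abs.String(), strings.HasSuffix(abs.Path, "/"), nil
}

func main() {
	for _, href := range []string{"a.txt", "sub/", "file%20name.txt"} {
		u, isDir, err := resolveEntry("http://example.com/pub", href)
		fmt.Println(u, isDir, err)
	}
}
```

Resolving with `ResolveReference` rather than concatenating strings also handles hrefs that are already absolute or percent-encoded.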
Signed-off-by: MadDogOwner <[email protected]>
xrgzs left a comment
LGTM, please add the documentation too.
Will write it when I have time.
Description
Crawls autoindex pages using antchfx/htmlquery. Because each web server renders its autoindex page differently, you need to write your own XPath expressions.
Motivation and Context
https://github.com/orgs/OpenListTeam/discussions/170#discussioncomment-14943992
How Has This Been Tested?
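Test environment 1: Apache/2.4.18, with the following XPath expressions:

- `//table/tbody/tr[position() > 2]`
- `./td[2]/a`
- `./td[3]`
- `./td[4]`

Test environment 2: nginx/1.29.0, with the following parameters:

- `//pre/a`
- `.`
- `substring(normalize-space(./following-sibling::text()[1]),1,17)`
- `substring(normalize-space(./following-sibling::text()[1]),19)`
- `02-Jan-2006 15:04`

Test environment 3 (local): Caddy/v2.10.2, with the following parameters:

- `//table/tbody/tr`
- `./td[2]/a/span`
- `./td[4]/time`
- `./td[3]/div/div[2]`
- `Up`
- `01/02/2006 03:04:05 PM -07:00`

Test environment 4 (local): SimpleHTTP/0.6 Python/3.11.5, with the following XPath expressions:

- `//ul/li`
- `./a`

Test environment 5: the response header reports only Apache, with no version, and the following parameters were used:

- `//pre/pre/a`
- `.`
- `substring(normalize-space(./following-sibling::text()[1]),1,16)`
- `substring(normalize-space(./following-sibling::text()[1]),18)`
- `Name`, `Last modified`, `Size`, `Description`

Checklist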
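- I have read the CONTRIBUTING document.
- I have formatted the submitted code with `go fmt` or prettier.
- I have added appropriate labels to this PR (if I lack permission or the required label does not exist, I have noted this in the description and a maintainer will handle it).
- I have used the "Request review" feature to request review from the relevant code authors where appropriate.
- I have updated the related repositories accordingly (if applicable).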