@KirCute KirCute commented Jan 15, 2026

Description

Crawls autoindex pages using antchfx/htmlquery. Because autoindex pages differ from one web server to another, you have to write the XPath expressions yourself.

Motivation and Context

https://github.com/orgs/OpenListTeam/discussions/170#discussioncomment-14943992

How Has This Been Tested?

Test environment 1 is Apache/2.4.18, with the following XPath expressions:

  • Entries: //table/tbody/tr[position() > 2]
  • File name: ./td[2]/a
  • Modified time: ./td[3]
  • File size: ./td[4]

Test environment 2 is nginx/1.29.0, with the following parameters:

  • Entries: //pre/a
  • File name: .
  • Modified time: substring(normalize-space(./following-sibling::text()[1]),1,17)
  • File size: substring(normalize-space(./following-sibling::text()[1]),19)
  • Modified time format: 02-Jan-2006 15:04
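To illustrate what the two substring expressions above select, here is a minimal Go sketch that mimics XPath normalize-space() and substring() with the standard library only. It does not use htmlquery and is not the driver's code; the helper names (normalizeSpace, substring, splitNginxTail) and the sample text node are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// normalizeSpace approximates XPath normalize-space(): it trims the string
// and collapses each whitespace run into a single space.
func normalizeSpace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// substring approximates XPath substring(s, start, length) with 1-based
// positions. A negative length means "to the end of the string".
func substring(s string, start, length int) string {
	r := []rune(s)
	begin := start - 1
	end := len(r)
	if length >= 0 {
		end = begin + length
	}
	if begin < 0 {
		begin = 0
	}
	if end > len(r) {
		end = len(r)
	}
	if begin >= end {
		return ""
	}
	return string(r[begin:end])
}

// splitNginxTail takes the text node that follows an <a> element in an
// nginx-style listing and returns the modified time and the raw size string.
func splitNginxTail(text string) (time.Time, string, error) {
	s := normalizeSpace(text)
	// Characters 1..17 hold the timestamp, e.g. "15-Jan-2026 16:43".
	modified, err := time.Parse("02-Jan-2006 15:04", substring(s, 1, 17))
	if err != nil {
		return time.Time{}, "", err
	}
	// Character 18 is the separating space; the size starts at 19.
	return modified, substring(s, 19, -1), nil
}

func main() {
	// Hypothetical text node, padded the way an autoindex page pads columns.
	modified, size, err := splitNginxTail("   15-Jan-2026 16:43      1234\r\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(modified, size)
}
```

The timestamp is exactly 17 characters long, which is why the size expression starts at position 19, one past the separating space.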

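Test environment 3 (local) is Caddy/v2.10.2, with the following parameters:

  • Entries: //table/tbody/tr
  • File name: ./td[2]/a/span
  • Modified time: ./td[4]/time
  • File size: ./td[3]/div/div[2]
  • Ignored file names: Up
  • Modified time format: 01/02/2006 03:04:05 PM -07:00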

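Test environment 4 (local) is SimpleHTTP/0.6 Python/3.11.5, with the following XPath expressions:

  • Entries: //ul/li
  • File name: ./a
  • Modified time: not available, so left empty
  • File size: not available, so left empty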

Test environment 5 reports only "Apache" in its response headers, with no version. Parameters used:

  • Entries: //pre/pre/a
  • File name: .
  • Modified time: substring(normalize-space(./following-sibling::text()[1]),1,16)
  • File size: substring(normalize-space(./following-sibling::text()[1]),18)
  • Ignored file names: add NameLast modifiedSizeDescription

Checklist

  • I have read the CONTRIBUTING document.
  • I have formatted my code with go fmt or prettier.
  • I have added appropriate labels to this PR (or mentioned needed labels in the description if lacking permissions).
  • I have requested review from relevant code authors using the "Request review" feature when applicable.
  • I have updated the repository accordingly (if needed).

Copilot AI left a comment

Pull request overview

This PR adds a new "Autoindex" driver that can crawl and parse autoindex pages (directory listings) from web servers like Apache using XPath expressions. The driver uses the antchfx/htmlquery library to parse HTML content and extract file information.

Changes:

  • Added autoindex driver with configurable XPath expressions for parsing different autoindex formats
  • Implemented List and Link operations for browsing and accessing files from autoindex pages
  • Added size parsing utility that handles various unit formats (K, M, G, etc.)
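As a rough illustration of the kind of size parsing the last bullet describes, here is a minimal sketch. It is not the code in drivers/autoindex/util.go; the function name parseSize, the 1024-based multipliers, the accepted unit spellings, and the treatment of "-" as zero are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unitShift maps a unit letter to its power-of-two exponent.
// Assumption: units are 1024-based; the real driver may choose otherwise.
var unitShift = map[string]uint{"": 0, "K": 10, "M": 20, "G": 30, "T": 40, "P": 50}

// parseSize converts strings such as "1234", "1.5K", or "2.0 GiB" to bytes.
// Empty strings and "-" are treated as size 0 (assumed directory marker).
func parseSize(s string) (int64, error) {
	s = strings.TrimSpace(s)
	if s == "" || s == "-" {
		return 0, nil
	}
	// Everything up to the last digit or dot is the number; the rest is the unit.
	i := strings.LastIndexAny(s, "0123456789.")
	if i < 0 {
		return 0, fmt.Errorf("no numeric part in %q", s)
	}
	num := s[:i+1]
	unit := strings.ToUpper(strings.TrimSpace(s[i+1:]))
	// Reduce "KB" and "KIB" to "K", and a bare "B" to the empty unit.
	unit = strings.TrimSuffix(unit, "B")
	unit = strings.TrimSuffix(unit, "I")

	value, err := strconv.ParseFloat(num, 64)
	if err != nil {
		return 0, fmt.Errorf("bad number in %q: %w", s, err)
	}
	shift, ok := unitShift[unit]
	if !ok {
		return 0, fmt.Errorf("unknown unit in %q", s)
	}
	return int64(value * float64(int64(1)<<shift)), nil
}

func main() {
	for _, in := range []string{"1234", "1.5K", "12M", "2.0 GiB", "-"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```

Because autoindex pages usually show rounded sizes, any value recovered this way is an approximation of the real file size.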

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.

Summary per file:

  • go.mod: Added dependencies for htmlquery and xpath libraries
  • go.sum: Added checksums for new dependencies and their transitive dependency golang/groupcache
  • drivers/autoindex/util.go: Implements size parsing from human-readable strings with unit conversion
  • drivers/autoindex/meta.go: Defines driver configuration and registration with required XPath fields
  • drivers/autoindex/driver.go: Core driver implementation with List and Link methods for fetching and parsing autoindex pages
  • drivers/all.go: Registers the new autoindex driver with the driver registry

xrgzs added 4 commits January 15, 2026 16:43
Signed-off-by: MadDogOwner <[email protected]>
Signed-off-by: MadDogOwner <[email protected]>
Signed-off-by: MadDogOwner <[email protected]>
Signed-off-by: MadDogOwner <[email protected]>
xrgzs previously approved these changes Jan 15, 2026
@xrgzs xrgzs left a comment

LGTM. Please add the documentation.

KirCute commented Jan 15, 2026

> LGTM. Please add the documentation.

I'll write it when I have time.

jyxjjj previously approved these changes Jan 16, 2026
Labels

enhancement Module: Driver Driver-Related Issue/PR
