Optimized PDF extraction function #15572

Open: wants to merge 3 commits into base: main


fengsvkn

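Summary: Improves PDF document extraction for the Dify knowledge base. Adds support for extracting image-only PDFs and for recognizing images, tables, and formulas inside PDFs.

Usage:
This PR uses the open-source OpenDataLab project PDF-Extract-Kit to improve PDF recognition in the knowledge base part of Dify.
https://github.com/opendatalab/PDF-Extract-Kit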

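Local deployment

Download the model files into the api folder:
git clone https://www.modelscope.cn/opendatalab/pdf-extract-kit-1.0.git
If you do not want to download them into the api folder, edit pdf_extractor_config.yaml accordingly.
For everything else, follow the official documentation to install.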
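
The download step above can be made repeatable with a small helper. This is only a sketch, not part of the PR: `ensure_models` and its `run` parameter are hypothetical names, and it assumes `git` is on the PATH (the ModelScope repository may also require Git LFS for the weight files). It skips the clone when the target directory already contains files.

```python
import subprocess
from pathlib import Path

# Repository URL taken from the PR description.
MODEL_REPO = "https://www.modelscope.cn/opendatalab/pdf-extract-kit-1.0.git"


def ensure_models(dest: Path, run=subprocess.run) -> bool:
    """Clone the model repo into `dest` unless it already holds files.

    Returns True if a clone was started, False if `dest` was already populated.
    `run` is injectable so the command can be inspected without network access.
    """
    if dest.is_dir() and any(dest.iterdir()):
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    run(["git", "clone", MODEL_REPO, str(dest)], check=True)
    return True
```

Calling `ensure_models(Path("api/pdf-extract-kit-1.0"))` matches the default location produced by running the clone inside the api folder. Any other destination still needs the corresponding change in pdf_extractor_config.yaml, as described above.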

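Docker deployment

Go to the docker directory and build the image:
docker compose build
Then edit docker-compose.yaml so that the api and worker services use the image you just built:
services:
  api:
    image: docker-api
  worker:
    image: docker-api
Start the stack:
docker compose up -d

Wait about 5 minutes for the services to start,
then open 127.0.0.1 in a browser to use it.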
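
Instead of waiting a fixed five minutes, the address can be polled until it responds. The following is a minimal sketch under assumptions: `wait_until_up` is a hypothetical helper, and treating any HTTP status below 500 as "up" is a guess about how the stack behaves while it starts, not something the PR specifies.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll `url` until it answers or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            # A non-5xx status means something is serving requests (assumption).
            if err.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the containers are still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_up("http://127.0.0.1")` after `docker compose up -d` returns True as soon as the address answers, and False if nothing responds within the default 300 seconds, which mirrors the five-minute wait above.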

dosubot (bot) added the labels size:XXL (This PR changes 1000+ lines, ignoring generated files) and 💪 enhancement (New feature or request) on Mar 12, 2025