Data Preparation with Docling (2025-Jan-30)

Event Details

Event information
🗓️: Thursday Jan 30, 2025
⏰: 9 am PST / 11 am CST / 12 pm EST / 5 pm GMT
Duration: 1 hour

Event recording is available

Check the Resources section for code, presentation slides, etc.


Agenda

  • Welcome
  • Quick intro about AI Alliance (3 min)
  • Presentation: Data preparation with Docling (40 mins)
  • Q&A (10 mins)
  • Wrap-up

Session: Data preparation with Docling

About

When building machine learning applications, a significant portion of your time will be dedicated to data wrangling, starting with content extraction from documents in formats such as PDF and DOCX.

Docling is an open source, versatile document processor that handles various file types.

In this session, I will introduce Docling and walk through some of its core features. I will also show a demo that attendees can run alongside.
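As a rough illustration of the extraction step described above, here is a minimal sketch that is not taken from the session materials. It assumes Docling is installed (`pip install docling`) and uses its `DocumentConverter` class to turn each document into Markdown. The helper names `find_documents`, `convert_to_markdown`, and `convert_folder`, along with the extension list, are hypothetical and chosen only for this example.

```python
from pathlib import Path

# Extensions to pick up; an illustrative choice, not a list of everything Docling supports.
EXAMPLE_EXTENSIONS = (".pdf", ".docx")


def find_documents(root, extensions=EXAMPLE_EXTENSIONS):
    """Return sorted file paths under `root` whose suffix matches, ignoring case."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


def convert_to_markdown(path):
    """Convert a single document to Markdown text using Docling."""
    # Imported lazily so find_documents() works even without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(str(path))
    return result.document.export_to_markdown()


def convert_folder(root):
    """Map each matching document under `root` to its Markdown text."""
    return {path: convert_to_markdown(path) for path in find_documents(root)}
```

Calling `convert_folder("input_docs")` on a folder of PDF and DOCX files (the folder name is a placeholder) would return one Markdown string per document, which can then be chunked or indexed in later data preparation steps.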

More about Docling

Session Type:
Presentation and Demo

Audience:
LLM app developers, data scientists, data engineers

Technical Level:
Beginner - Intermediate

Prerequisites:
None

Resources

📺 Presentation: Data Prep for LLM Applications with Docling - Part 1

💻 Code

https://github.com/sujee/data-prep-kit-examples

Docling code examples are in /docling

You can walk through the README to set up a local Docling environment.

Let's start with this notebook: docling_1_intro.ipynb. It can be run on Google Colab.

Support and Community

🙋 Ask questions, get help, and give us feedback at the Data Prep Kit discussion forum

Speaker: Sujee Maniyam

AI Engineer, Developer Advocate @ Node51 (Consulting for IBM / The AI Alliance)

Sujee Maniyam is an expert in Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He is passionate about developer education and fostering community engagement. Sujee has led numerous training sessions, hackathons, and workshops. He is also an author, open source contributor, and frequent speaker at conferences and meetups.

LinkedIn   •   💼 Portfolio