Commit 2837df1

added 2025-03-13 session
1 parent eb129e4 commit 2837df1

2 files changed

+89
-0
lines changed

assets/docling_processing.png

472 KB
@@ -0,0 +1,89 @@
1+
# Hands-on with Docling (2025 Mar 13)
2+
3+
<!-- ## 🔗 [tinyurl.com/jzbvaeak](https://tinyurl.com/jzbvaeak) -->
4+
5+
<!-- <img src="../assets/qrcode_2025-02-27__data-prep-review.png" width="400px"> -->
6+
7+
## Event Details
8+
9+
[Event sign up](https://www.meetup.com/ibm-developer-sf-bay-area-meetup/events/306535130){:target="_blank" rel="noopener"}<br>
10+
🗓️: **March 13, 2025 Thursday**<br>
11+
⏰: **9 am PST / 11 am CST / 12 pm EST / 5 pm GMT**<br>
12+
Duration: **1 hour**
13+
14+
**Event recording will be available soon**
15+
16+
**[Check resources](#resources)** - code, presentation slides, etc.
17+
18+
**[Q & A section](#q--a)**
19+
20+
---
21+
22+
23+
## Agenda
24+
25+
- Welcome, housekeeping, etc.
26+
- Quick intro about AI Alliance (3 min)
27+
- Hands-on workshop on Docling (40 mins)
28+
- Q&A (10 mins)
29+
- Wrap-up
30+
31+
## Workshop: Hands-on with Docling
32+
33+
![](../assets/docling_processing.png)
34+
35+
36+
### Overview
37+
38+
When building machine learning and data applications, a significant portion of your time goes to data wrangling, from content extraction to data cleanup. This session introduces Docling, a robust, open source tool designed to handle many document formats, including PDF, DOCX, HTML, and PPTX. Attendees will learn firsthand how to use Docling to extract and clean up data from various documents.
39+
40+
### Description
41+
42+
Docling is a versatile document processor that handles various file types, including PDF, HTML, and DOCX. It can handle complex document structures such as tables and multi-column layouts. It can even extract text from scanned documents. Docling is open source and easy to use.
43+
44+
More about Docling: https://github.com/DS4SD/docling
45+
46+
Join us for this hands-on session to explore how to use Docling for your data needs.
47+
48+
In this workshop we will do the following:
49+
50+
- Getting started with Docling
51+
- Extracting content from various documents (PDF / HTML)
52+
- Handling table and image data
53+
- Extracting content from scanned PDF documents using OCR (Optical Character Recognition)
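
The topics above can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's actual notebook code: the helper names `convert_to_markdown` and `build_ocr_converter` are hypothetical, and the Docling imports and option names follow the library's published examples, so they may differ between Docling versions. It assumes Docling is installed (`pip install docling`).

```python
def convert_to_markdown(source, converter=None):
    """Convert a document (local path or URL) to Markdown text.

    `converter` is any object with a Docling-style `convert(source)` method.
    When omitted, a default Docling DocumentConverter is created.
    (Hypothetical helper for illustration; not part of Docling itself.)
    """
    if converter is None:
        # Imported lazily so the helper can be loaded without Docling installed.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()


def build_ocr_converter():
    """Build a converter with OCR and table-structure recovery enabled for PDFs.

    Assumption: option names follow Docling's published examples and may
    change across versions; OCR may already be enabled by default.
    """
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = True              # run OCR on scanned pages
    options.do_table_structure = True  # attempt to recover table layout
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```

For a digital PDF or an HTML page, `convert_to_markdown("report.pdf")` (a placeholder file name) would return the extracted content as Markdown; for a scanned PDF, pass `converter=build_ocr_converter()` so the pages go through OCR first.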
54+
55+
**What do you need to participate in this workshop?**
56+
57+
- Comfort with the Python programming language
58+
- We will run the workshop code using Google Colab (free) - no other setup is needed!
59+
60+
**Session Type:**
61+
Hands-on workshop
62+
63+
**Audience**:
64+
LLM app developers, data scientists, data engineers
65+
66+
**Technical Level**:
67+
Intermediate
68+
69+
**Prerequisites**:
70+
None
71+
72+
**Duration**:
73+
45 mins
74+
75+
### Speaker: Sujee Maniyam
76+
77+
**AI Engineer, Developer Advocate @ Node51 (Consulting for [IBM / The AI Alliance](https://thealliance.ai/))** <br>
78+
79+
Sujee Maniyam is an expert in Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He is passionate about developer education and fostering community engagement. Sujee has led numerous training sessions, hackathons, and workshops. He is also an author, open source contributor, and frequent speaker at conferences and meetups.
80+
81+
[email protected] &nbsp;&nbsp;
82+
<img src="../assets/linkedin.svg" width="16 px"> [LinkedIn](https://www.linkedin.com/in/sujeemaniyam/){:target="_blank" rel="noopener"} &nbsp;&nbsp;
83+
[portfolio](https://sujee.dev/portfolio?utm_medium=speaker_bio&utm_source=the-ai-alliance.github.io&utm_campaign=speaking_aialliance_offie_hours){:target="_blank" rel="noopener"}
84+
85+
---
86+
87+
## Q & A
88+
89+
Please review the session recording.
