|
| 1 | +# Data Preparation with Docling (2025-Jan-30) |
| 2 | + |
| 3 | +<!-- ## 🔗 [#](#) --> |
| 4 | + |
| 5 | +<!-- <img src="../assets/qrcode_2025-02-27__data-prep-review.png" width="400px"> --> |
| 6 | + |
| 7 | +## Event Details |
| 8 | + |
| 9 | +[Event information](https://www.meetup.com/ibm-developer-sf-bay-area-meetup/events/305798918/){:target="_blank" rel="noopener"}<br> |
| 10 | +🗓️: **Thursday Jan 30, 2025** <br> |
| 11 | +⏰: **9 am PST / 11 am CST / 12 pm EST / 5 pm GMT**<br> |
| 12 | +Duration: **1 hour** |
| 13 | + |
| 14 | +**[Event recording is available](https://www.youtube.com/watch?v=RyapKHqom9Q){:target="_blank" rel="noopener"}** |
| 15 | + |
| 16 | +**[Check resources](#resources)** - code, presentation slides, etc. |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## Agenda |
| 21 | + |
| 22 | +- Welcome |
| 23 | +- Quick intro about AI Alliance (3 min) |
| 24 | +- Presentation: Data preparation with Docling (40 mins) |
| 25 | +- Q&A (10 mins) |
| 26 | +- Wrap-up |
| 27 | + |
| 28 | +## Session: Data preparation with Docling |
| 29 | + |
| 30 | +### About |
| 31 | + |
| 32 | +When building machine learning applications, a significant portion of your time goes to data wrangling, starting with content extraction from documents such as PDF and DOCX files. |
| 33 | + |
| 34 | +Docling is an open source, versatile document processor that handles various file types. |
| 35 | + |
| 36 | +In this session, I will introduce Docling and walk through some of its core features, followed by a demo that attendees can run alongside. |
| 37 | + |
| 38 | + |
| 39 | +More about [Docling](https://github.com/DS4SD/docling){:target="_blank" rel="noopener"} |
| 40 | + |
| 41 | + |
| 42 | +**Session Type:** |
| 43 | +Presentation and Demo |
| 44 | + |
| 45 | +**Audience**: |
| 46 | +LLM app developers, data scientists, data engineers |
| 47 | + |
| 48 | +**Technical Level**: |
| 49 | +Beginner - Intermediate |
| 50 | + |
| 51 | +**Prerequisites**: |
| 52 | +None |
| 53 | + |
| 54 | +### Resources |
| 55 | + |
| 56 | + |
| 57 | +📺 **Presentation**: [Data Prep for LLM Applications with Docling - Part 1](https://docs.google.com/presentation/d/1SkghvqrdTo9wIAye36jO_KNVTWbO6v5bqfVI7CWA3-g/edit?usp=sharing){:target="_blank" rel="noopener"} |
| 58 | + |
| 59 | + |
| 60 | +💻 **Code** |
| 61 | + |
| 62 | +https://github.com/sujee/data-prep-kit-examples |
| 63 | + |
| 64 | + |
| 65 | +Docling code examples are in [/docling](https://github.com/sujee/data-prep-kit-examples/tree/main/docling){:target="_blank" rel="noopener"} |
| 66 | + |
| 67 | +You can walk through the README to set up a local Docling environment. |
| 68 | + |
| 69 | +Let's start with this notebook: [docling_1_intro.ipynb](https://github.com/sujee/data-prep-kit-examples/blob/main/docling/docling_1_intro.ipynb) | [Open in Colab](https://colab.research.google.com/github/sujee/data-prep-kit-examples/blob/main/docling/docling_1_intro.ipynb){:target="_blank" rel="noopener"} |
| 70 | + |
| 71 | +#### Support and Community |
| 72 | + |
| 73 | +🙋 Ask questions, get help, give us feedback at [Data Prep Kit discussion forum](https://github.com/IBM/data-prep-kit/discussions){:target="_blank" rel="noopener"} |
| 74 | + |
| 75 | +## Speaker: Sujee Maniyam |
| 76 | + |
| 77 | +**AI Engineer, Developer Advocate @ Node51 (Consulting for [IBM / The AI Alliance](https://thealliance.ai/))** <br> |
| 78 | + |
| 79 | +Sujee Maniyam is an expert in Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He is passionate about developer education and fostering community engagement. Sujee has led numerous training sessions, hackathons, and workshops. He is also an author, open source contributor, and frequent speaker at conferences and meetups. |
| 80 | + |
| 81 | + |
| 82 | +<img src="../assets/linkedin.svg" width="16 px"> [LinkedIn](https://www.linkedin.com/in/sujeemaniyam/){:target="_blank" rel="noopener"} • |
| 83 | +💼 [portfolio](https://sujee.dev/portfolio){:target="_blank" rel="noopener"} |