RAG pipeline With Data Prep Kit + Milvus + Granite (2025-Feb-13)

Event Details

Event information
🗓️: Thursday Feb 13, 2025
⏰: 9 am PST / 11 am CST / 12 pm EST / 5 pm GMT
Duration: 1 hour 30 mins

Event recording will be available soon

See the Resources section for code, presentation slides, and more.


Agenda

  • Welcome
  • Quick intro about AI Alliance (3 min)
  • Milvus introduction (15 mins)
  • Workshop: RAG pipeline With Data Prep Kit + Milvus + Granite (60 mins)
  • Q&A (10 mins)
  • Wrap-up

1 - Milvus Introduction

by Stefan Webb @ Zilliz

Milvus is a popular, open-source vector database. In this talk Stefan will walk through some of the core features of Milvus.

presentation slides

2 - Workshop: RAG pipeline With Data Prep Kit + Milvus + Granite

by Sujee Maniyam @ Node51

In this workshop we will do the following:

  • Extract content from various documents (PDFs, DOCX, HTML) using Docling.
  • Use Data Prep Kit to streamline data preparation, including markup removal, de-duplication, removal of problematic data such as spam, chunking, and embedding creation.
  • Vector Database Integration: We will use Milvus - a popular open source vector DB, to manage and search vectorized data effectively.
  • Use an open source LLM such as Meta Llama or IBM Granite to answer questions about the documents.
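The steps above can be sketched end to end in plain Python. This is only an illustrative toy, not the workshop code: it uses the standard library alone, and every function name here (`chunk_text`, `dedupe`, `embed`, `retrieve`, `build_prompt`) is hypothetical. The bag-of-words `embed` stands in for a real embedding model, the in-memory list stands in for Milvus, and the final prompt is what would be sent to an LLM such as Granite.

```python
import hashlib
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def dedupe(chunks):
    """Drop exact duplicates (ignoring case and extra whitespace), keeping first occurrences."""
    seen, unique = set(), []
    for chunk in chunks:
        normalized = " ".join(chunk.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


def embed(text):
    """Toy stand-in for an embedding model: a sparse token-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, index, top_k=2):
    """Brute-force search over (chunk, vector) pairs; a vector DB does this at scale."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into a prompt for an LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Stand-in for text already extracted from documents.
docs = [
    "Milvus is an open source vector database.",
    "Milvus is an open source vector database.",  # duplicate on purpose
    "Granite is a family of open source LLMs from IBM.",
    "Docling extracts content from PDF, DOCX and HTML files.",
]

chunks = dedupe([c for d in docs for c in chunk_text(d)])
index = [(c, embed(c)) for c in chunks]

question = "Which vector database is open source?"
top = retrieve(question, index, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In the workshop itself, each stand-in is replaced by the real tool: Docling for extraction, Data Prep Kit for cleaning, chunking and embedding, Milvus for storage and search, and a hosted LLM for the final answer.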

Here are the components used in this RAG pipeline:

What do you need to participate in this workshop?

To get the most out of this hands-on workshop, we recommend the following:

  • A laptop with a Python development environment (setup instructions are here)
  • A Replicate account (FREE) - get one at replicate.com

Session Type:
Hands-on workshop

Audience:
LLM app developers, data scientists, data engineers

Technical Level:
Intermediate

Prerequisites:
None

Resources

📊 Presentation: RAG with Data Prep Kit + Milvus + Granite

💻 Code

https://github.com/IBM/data-prep-kit

🖥️ code

Support and Community

🙋 Ask questions, get help, and give us feedback at the Data Prep Kit discussion forum

Speakers

Stefan Webb

Developer Evangelist @ Zilliz

LinkedIn

Sujee Maniyam

AI Engineer, Developer Advocate @ Node51 (Consulting for IBM / The AI Alliance)

Sujee Maniyam is an expert in Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He is passionate about developer education and fostering community engagement. Sujee has led numerous training sessions, hackathons, and workshops. He is also an author, open source contributor, and frequent speaker at conferences and meetups.

LinkedIn   •   💼 portfolio