AutoLabeler

AutoLabeler is a project designed to automate the labeling process of datasets. This readme file will guide you through setting up the project, providing configuration details, and managing your labeled and unlabeled datasets.

1. Introduction

AutoLabeler is a tool that helps streamline the data labeling process. It takes a seed dataset (seed.csv) with example labeled data and an unlabeled dataset (test.csv) and applies a labeling algorithm based on your configuration to automatically label the test dataset.

2. Getting Started

Prerequisites

Before you start, ensure you have the following:

Python 3.x
pip
evadb

Installation

Clone this repository to your local machine:

git clone https://github.com/Preethi1609/evadb-auto-label
pip install evadb

3. Project Structure

To add a new example, follow the following directory structure:

new-example/
    |-- new_example_config.json
    |-- new_example_notebook.ipynb
    |-- data/
    |   |-- seed.csv
    |   |-- test.csv

The new_example_config.json would look something like this:

{
    "task_name": "SpamClassification",
    "task_type": "classification",
    "dataset": {
    "label_column": "class",
    "label_separator": ", ",
    "delimiter": ","
    },
    "prompt": {
    "task_guidelines": "You are an expert at xxx. Your goal is to xxx. Your job is to correctly label the provided input example into one of the following categories:\n{labels}",
    "output_guidelines": "You will return the answer as a comma separated list of labels sorted in alphabetical order. For example: \"label1, label2, label3\"",
    "labels": [
        "spam",
        "ham"
    ],
    "few_shot_examples": "spam-ham-label/data/seed.csv",
    "example_template": "Input: {example}\nOutput: {labels}\n"
    }
}

4. Dataset Management

In the data/ directory, you should place your datasets:

seed.csv: This is your seed dataset with example labeled data.
test.csv: This is the unlabeled dataset that AutoLabeler will process.

5. Usage

In the Jupyter Notebook script, you will:

Establish a connection to the "evadb".
Define a function called "AutoLabel," which is implemented in the "./functions/autolabel.py" module.
Create a table to store the test data and load this data into the "evadb".
Utilize a select query on the User-Defined Function (UDF) to execute the previously defined "AutoLabel" function and display the results.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

AutoLabeler

Table of Contents

1. Introduction

2. Getting Started

Prerequisites

Installation

3. Project Structure

4. Dataset Management

5. Usage

Files

README.md

Latest commit

History

README.md

File metadata and controls

AutoLabeler

Table of Contents

1. Introduction

2. Getting Started

Prerequisites

Installation

3. Project Structure

4. Dataset Management

5. Usage