First Up

RoshanGunarathna · RoshanGunarathna · commit 2a3b162a226a · 2024-11-04T11:03:27.000+05:30
diff --git a/README.md b/README.md
@@ -1,2 +1,85 @@
-# AI-Data-Formatter
-A Python-based AI-driven tool for automated data formatting and enrichment, adaptable for various applications including e-commerce product categorization, attribute generation, and more.
+# AI Data Formatter
+
+An AI-driven data formatting tool designed to process JSON data for various applications. It currently supports product data enrichment for e-commerce but can be extended to other use cases. Leveraging OpenAI's API, this tool assigns categories, generates attributes, and standardizes JSON output for enhanced usability.
+
+## Features
+- **Flexible Category Assignment**: Uses an extensive, configurable list to classify items.
+- **Random Attribute Generation**: Adds attributes like pricing, descriptions, and sizes for product data.
+- **Customizable Output Formatting**: Generates standardized JSON output with specified fields.
+- **Easy Integration**: Ready for adaptation to other data formatting needs in future versions.
+
+## Requirements
+- Python 3.8+
+- OpenAI API Key (in `.env` file)
+
+## Setup
+
+1. **Clone the repository**:
+    ```bash
+    git clone https://github.com/your-username/ai-data-formatter.git
+    cd ai-data-formatter
+    ```
+
+2. **Install dependencies**:
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+3. **Set up your API Key**:
+   - Create a `.env` file in the root directory.
+   - Add your OpenAI API Key:
+     ```
+     OPENAI_API_KEY=your_openai_api_key
+     ```
+
+4. **Create Input and Output Directories**:
+   - Make a directory named `input` for input JSON files:
+     ```bash
+     mkdir input
+     ```
+   - Make a directory named `output` where processed JSON files will be saved:
+     ```bash
+     mkdir output
+     ```
+
+## Usage
+
+1. **Place Input JSON Files**: Place the JSON files you want to process in the `input` directory.
+2. **Run the Script**:
+    ```bash
+    python main.py
+    ```
+3. **View Processed Output**: Processed JSON files will be saved in the `output` directory with the same file names as the input files.
+
+### Example JSON Input Format
+Input JSON files should be arrays of data maps (e.g., products), containing fields such as `id` and `image_list`.
+
+```json
+[
+  {
+    "id": "123",
+    "image_list": ["image1_url", "image2_url"]
+  }
+]
+```
+
+### Example JSON Output Format
+The output includes enriched fields such as `category`, `product_name`, `price`, and standardized attributes.
+```json
+[
+  {
+    "id": "123",
+    "image_list": ["image1_url", "image2_url"],
+    "category": "Sample Category",
+    "product_name": "Generated Name",
+    "price": 99.99,
+    "attributes": [{"name": "Size", "values": ["20", "30"]}],
+    "colors": [{"name": "blue", "hex_code": "0000FF"}]
+  }
+]
+```
+## Contributing
+This tool aims to become a common data formatter, adaptable to multiple data transformation needs. Contributions for additional features or enhancements are welcome.
+
+## License
+This project is licensed under the MIT License.
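
Review note on the README's example formats: a minimal sketch of how an enriched record could be checked against the documented shape. The field names come from the README examples; the helper names `missing_fields` and `check_output_record` are hypothetical and not part of this commit.

```python
from typing import Any, Dict, List

# Field names taken from the README examples; the helper names are hypothetical.
INPUT_FIELDS = ("id", "image_list")
OUTPUT_FIELDS = INPUT_FIELDS + ("category", "product_name", "price", "attributes", "colors")


def missing_fields(record: Dict[str, Any], required: tuple) -> List[str]:
    """Return the required field names that are absent from the record."""
    return [name for name in required if name not in record]


def check_output_record(source: Dict[str, Any], result: Dict[str, Any]) -> List[str]:
    """Collect problems with one enriched record; an empty list means none were found."""
    problems = [f"missing field: {name}" for name in missing_fields(result, OUTPUT_FIELDS)]
    # The prompt asks the model to keep "id" and "image_list" unchanged.
    for name in INPUT_FIELDS:
        if name in result and result[name] != source.get(name):
            problems.append(f"{name} was modified")
    return problems


source = {"id": "123", "image_list": ["image1_url", "image2_url"]}
result = {
    "id": "123",
    "image_list": ["image1_url", "image2_url"],
    "category": "Sample Category",
    "product_name": "Generated Name",
    "price": 99.99,
    "attributes": [{"name": "Size", "values": ["20", "30"]}],
    "colors": [{"name": "blue", "hex_code": "0000FF"}],
}
print(check_output_record(source, result))  # []
```

A check like this could run on each parsed response before it is written to the `output` directory, since the model is not guaranteed to return every field.
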
diff --git a/envTemplate.file b/envTemplate.file
@@ -0,0 +1,3 @@
+OPENAI_API_KEY = "sk-"
+
+
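
Review note on the template value: because `envTemplate.file` ships the placeholder `sk-`, a script could fail fast when the key is unset or still the placeholder. This is a sketch only; `require_api_key` is a hypothetical helper, not something in this commit, and it takes the environment as a plain mapping so it can be tried without a real key.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "sk-"  # value shipped in envTemplate.file


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI key, or raise if it is unset or still the placeholder."""
    env = os.environ if env is None else env
    # Tolerate surrounding whitespace and quotes, as in the template line.
    key = env.get("OPENAI_API_KEY", "").strip().strip('"')
    if not key or key == PLACEHOLDER:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Called right after `load_dotenv(...)`, this would surface a configuration problem before any file is processed.
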
diff --git a/main.py b/main.py
@@ -0,0 +1,118 @@
+
+from dotenv import load_dotenv, find_dotenv
+import os
+
+from langchain_core.prompts import ChatPromptTemplate
+
+from langchain_core.output_parsers import StrOutputParser
+
+from langchain_core.prompts import MessagesPlaceholder
+from langchain_core.messages import HumanMessage, SystemMessage
+
+from langchain_openai import ChatOpenAI
+
+
+import json
+
+
+
+
+# Load environment variables from .env file
+load_dotenv(find_dotenv())
+
+llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=os.getenv("OPENAI_API_KEY"))
+
+parser = StrOutputParser()
+
+
+# Define directories
+input_dir = './input'  # Directory with input JSON files
+output_dir = './output'  # Directory for saving output JSON files
+
+# Ensure the output directory exists
+os.makedirs(output_dir, exist_ok=True)
+
+prompt = ChatPromptTemplate.from_messages(
+                [
+                    SystemMessage(content="""
+                Category List = Macbook Air, Macbook Pro, Mac Mini, iMac, Mac Studio, iPad Air, iPad Pro, iPad 9th Gen, iPad 10th Gen, iPhone 11 Series, iPhone 12 Series, iPhone 13 Series, iPhone 14 Series, iPhone 15 Series, iPhone 16 Series, Refurbished, Apple watch series 8, Apple Watch Ultra, Apple Watch Series 9, Apple watch SE, Series 6, Series 10, AirPods Pro (2nd Generation), AirPods (2nd Generation), AirPods (3rd Generation), AirPods Max, Genuine, Samsung Galaxy M Series, Samsung Galaxy A Series, Samsung F series, Samsung Tabs, Samsung S Series, Z Series, Galaxy watch, Galaxy Buds, Google Pixel 6 Series, Google Pixel 7 Series, Google Pixel 8 Series, Google Pixel Fold, Google Pixel 9 Series, Redmi A Series, Redmi Note Series, C Series, Realme, Huawei Y Series, Huawei 9 Series, Oppo A Series, OnePlus 9 Series, OnePlus 10 Series, OnePlus Nord, Nokia 105, Tabs, G5 Series, T Series, A Series, 50 Series, 40 Series, Acer Laptops, ASUS Laptops, DELL Laptops, HP Laptops, Lenovo Laptops, MSI Laptops, UPS, For Home, For Work, For Gaming, Ink Tank Printer, Dot Matrix Printer, Plotter, Speaker, Microphones, Headset, Ear Buds, Handfree, CCTV Camera's, DVR Kits, Webcam and Accessories, Joyroom, Wireless Key Board & Mouse, KeyBoard & Mouse, Pen Drive, Micro SD Card, Converter & Cable, OTG PEN DRIVE - TYPE C l MICRO DUCO, Internal Hard Drive, External Hard Drive, RAM, SSD, Adapter And Cable, Cover & Cases, Power Bank, Tempered Glass, Cartridge, Toner, Ink Bottle, Laptop Adaptors, Laptop Display, Laptop Battery, Laptop Keyboards, OEM, MacBook Battery, MacBook Display, MacBook Motherboard, iPhone (Pre Owned / Used), Pre Owned DELL Laptops, Pre Owned HP Laptops, Pre Owned LENOVO Laptops, MacBook Air (Pre-owned / Used), MackBook Pro (Pre-Owned / Used), iPad Air Pre-owned (Used), Apple watch Pre-owned (Used), Pre-owned DELL Monitors, iPhone 11 Series (Daily Deals), iPhone 13 Series (Daily Deals), iPhone 14 Series (Daily Deals), Macbook.
+                
+                1. Assign a relevant category from the provided Category List to the product in the JSON map, using the product's image URL as a reference to determine its category.
+                2. Assign a random price to each product.
+                3. Generate random values for each product's product_name and description.
+                4. Keep "id" and "image_list" exactly the same as provided.
+                5. Format the attributes field like this: [{"name": "Size", "values": ["20", "30"]},{"name": "Capacity", "values": ["120GB", "320GB"]}].
+                6. Format the colors field like this: [{"name": "blue", "hex_code": "0000FF"},{"name": "red", "hex_code": "FF0000"}].
+                7. Return a raw string only; remove backslash and newline escape characters.
+                8. Return only the map; remove any other text, such as a "json" label.
+                """),
+            
+            
+                MessagesPlaceholder(variable_name="inputPrompt"),
+
+                
+                ]
+            )
+
+def chat_with_ai(human_input):
+   
+    # Define the LLM chain
+    llm_chain = prompt | llm | parser
+
+
+    # Predict the output
+    output = llm_chain.invoke({"inputPrompt": [HumanMessage(content=human_input)]})
+   
+    return output
+
+
+for filename in os.listdir(input_dir):
+    if filename.endswith('.json'):
+        input_path = os.path.join(input_dir, filename)
+        output_path = os.path.join(output_dir, filename)  # Same name for output file in output directory
+        
+        
+        with open(input_path, 'r') as file:
+            data = json.load(file)
+
+        # Initialize an empty list to store responses
+        responses = []
+
+        # Process each map in the JSON file
+        for idx, map_data in enumerate(data):
+            human_message = HumanMessage(content=json.dumps(map_data))
+
+            # Get the response for the current map
+            output = chat_with_ai(human_message.content)
+
+            # Store the raw response string
+            responses.append(output)
+            print(f"Processed map {idx+1}/{len(data)} in file {filename}")
+
+
+        # Save the responses to the output JSON file as a JSON array;
+        # each response is written verbatim, without validation
+        with open(output_path, 'w') as outfile:
+            outfile.write("[\n") 
+            for i, item in enumerate(responses): 
+                if i < len(responses) - 1: 
+                    outfile.write("%s,\n" % item) 
+                else: 
+                    outfile.write("%s\n" % item) 
+            outfile.write("]")
+        print(f"Finished processing {filename}, output saved to {output_path}")
+
+
+
+
+
+
+
+   
+
+
+
+
+
+
+
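
Review note on the output step: the loop in `main.py` writes each model response verbatim between brackets, so a single response that is wrapped in a Markdown fence or is not JSON would make the whole output file invalid. Below is a minimal sketch of one way to parse responses first, using only the standard library. The names `parse_response` and `collect_responses` are hypothetical, and the sample responses are made up for illustration.

```python
import json
from typing import Any, Dict, List, Tuple

FENCE = "`" * 3  # Markdown code-fence marker that chat models sometimes add


def parse_response(text: str) -> Dict[str, Any]:
    """Strip an optional Markdown fence and parse the remainder as a JSON object."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (it may carry a "json" tag) and the closing fence.
        first_newline = cleaned.find("\n")
        cleaned = cleaned[first_newline + 1:] if first_newline != -1 else ""
        if cleaned.rstrip().endswith(FENCE):
            cleaned = cleaned.rstrip()[: -len(FENCE)]
    parsed = json.loads(cleaned)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed


def collect_responses(responses: List[str]) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Parse every response; return the valid records and the indexes that failed."""
    records, failed = [], []
    for index, text in enumerate(responses):
        try:
            records.append(parse_response(text))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            failed.append(index)
    return records, failed


raw = [
    '{"id": "123", "price": 99.99}',
    FENCE + 'json\n{"id": "124", "price": 10}\n' + FENCE,
    "Sorry, I cannot help with that.",
]
records, failed = collect_responses(raw)
print(json.dumps(records, indent=2))
print("failed indexes:", failed)
```

With parsed records in hand, the file could be written with `json.dump(records, outfile, indent=2)` instead of the manual bracket-and-comma loop, and the failed indexes could be logged or retried.
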
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,3 @@
+langchain_core==0.3.15
+langchain_openai==0.2.5
+python-dotenv==1.0.1
