Commit 2a3b162

First Up
1 parent 1efd040 commit 2a3b162

4 files changed

+209
-2
lines changed

README.md

Lines changed: 85 additions & 2 deletions
@@ -1,2 +1,85 @@
1-
# AI-Data-Formatter
2-
A Python-based AI-driven tool for automated data formatting and enrichment, adaptable for various applications including e-commerce product categorization, attribute generation, and more.
1+
# AI Data Formatter
2+
3+
An AI-driven data formatting tool designed to process JSON data for various applications. It currently supports product data enrichment for e-commerce but can be extended to other use cases. Leveraging OpenAI's API, this tool assigns categories, generates attributes, and standardizes JSON output for enhanced usability.
4+
5+
## Features
6+
- **Flexible Category Assignment**: Uses an extensive, configurable list to classify items.
7+
- **Random Attribute Generation**: Adds attributes like pricing, descriptions, and sizes for product data.
8+
- **Customizable Output Formatting**: Generates standardized JSON output with specified fields.
9+
- **Easy Integration**: Ready for adaptation to other data formatting needs in future versions.
10+
11+
## Requirements
12+
- Python 3.9+ (the pinned `langchain_core` 0.3.x release does not support Python 3.8)
13+
- OpenAI API Key (in `.env` file)
14+
15+
## Setup
16+
17+
1. **Clone the repository**:
18+
```bash
19+
git clone https://github.com/your-username/ai-data-formatter.git
20+
cd ai-data-formatter
21+
```
22+
23+
2. **Install dependencies**:
24+
```bash
25+
pip install -r requirements.txt
26+
```
27+
28+
3. **Set up your API Key**:
29+
- Create a `.env` file in the root directory.
30+
- Add your OpenAI API Key:
31+
```
32+
OPENAI_API_KEY=your_openai_api_key
33+
```
34+
35+
4. **Create Input and Output Directories**:
36+
- Make a directory named `input` for input JSON files:
37+
```bash
38+
mkdir input
39+
```
40+
- Make a directory named `output` where processed JSON files will be saved:
41+
```bash
42+
mkdir output
43+
```
44+
45+
## Usage
46+
47+
1. **Place Input JSON Files**: Place the JSON files you want to process in the `input` directory.
48+
2. **Run the Script**:
49+
```bash
50+
python main.py
51+
```
52+
3. **View Processed Output**: Processed JSON files will be saved in the `output` directory with the same file names as the input files.
53+
54+
### Example JSON Input Format
55+
Input JSON files should be arrays of data maps (e.g., products), containing fields such as `id` and `image_list`.
56+
57+
```json
58+
[
59+
  {
60+
    "id": "123",
61+
    "image_list": ["image1_url", "image2_url"]
62+
  }
63+
]
64+
```
65+
66+
### Example JSON Output Format
67+
The output includes enriched fields, such as `category`, `product_name`, `price`, and standardized attributes.
68+
```json
69+
[
70+
  {
71+
    "id": "123",
72+
    "image_list": ["image1_url", "image2_url"],
73+
    "category": "Sample Category",
74+
    "product_name": "Generated Name",
75+
    "price": 99.99,
76+
    "attributes": [{"name": "Size", "values": ["20", "30"]}],
77+
    "colors": [{"name": "blue", "hex_code": "0000FF"}]
78+
  }
79+
]
80+
```
81+
## Contributing
82+
This tool aims to become a general-purpose data formatter, adaptable to multiple data transformation needs. Contributions for additional features or enhancements are welcome.
83+
84+
## License
85+
This project is licensed under the MIT License.
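
The input format described in the README could be checked before any API calls are made. The following is a minimal sketch, assuming each record must be a JSON object carrying an `id` and a list-valued `image_list`; the helper name `validate_records` is hypothetical and is not part of this commit.

```python
import json


def validate_records(records):
    """Return a list of problems found in the parsed input; an empty list means valid."""
    if not isinstance(records, list):
        return ["top-level JSON value must be an array"]
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"item {index}: expected an object")
            continue
        if "id" not in record:
            problems.append(f"item {index}: missing 'id'")
        if not isinstance(record.get("image_list"), list):
            problems.append(f"item {index}: 'image_list' must be an array")
    return problems


# The sample mirrors the README's example input.
sample = json.loads('[{"id": "123", "image_list": ["image1_url", "image2_url"]}]')
print(validate_records(sample))  # prints []
```

Running such a check per file before the processing loop would avoid spending API calls on records the prompt cannot handle.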

envTemplate.file

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
OPENAI_API_KEY = "sk-"
2+
3+
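
Because the template ships only the `sk-` prefix, a script that reads it unchanged would fail later with a less obvious authentication error. A minimal sketch of a fail-fast check, assuming the key is read from the process environment after the `.env` file has been loaded; the helper name `require_api_key` is hypothetical and not part of this commit.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment mapping or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # The template value "sk-" is only a placeholder prefix, not a usable key.
    if not key or key == "sk-":
        raise RuntimeError(f"{name} is not set; copy the template to .env and fill it in")
    return key


# Example with an explicit mapping instead of the real environment.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```

In `main.py` a call like this would sit after `load_dotenv(find_dotenv())` and before the `ChatOpenAI` client is constructed.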

main.py

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
1+
2+
from dotenv import load_dotenv, find_dotenv
3+
import os
4+
5+
from langchain_core.prompts import ChatPromptTemplate
6+
7+
from langchain_core.output_parsers import StrOutputParser
8+
9+
from langchain_core.prompts import MessagesPlaceholder
10+
from langchain_core.messages import HumanMessage, SystemMessage
11+
12+
from langchain_openai import ChatOpenAI
13+
14+
15+
import json
16+
17+
18+
19+
20+
# Load environment variables from .env file
21+
load_dotenv(find_dotenv())
22+
23+
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=os.getenv("OPENAI_API_KEY"))
24+
25+
parser = StrOutputParser()
26+
27+
28+
# Define directories
29+
input_dir = './input' # Directory with input JSON files
30+
output_dir = './output' # Directory for saving output JSON files
31+
32+
# Ensure the output directory exists
33+
os.makedirs(output_dir, exist_ok=True)
34+
35+
prompt = ChatPromptTemplate.from_messages(
36+
[
37+
SystemMessage(content="""
38+
Category List = Macbook Air, Macbook Pro, Mac Mini, iMac, Mac Studio, iPad Air, iPad Pro, iPad 9th Gen, iPad 10th Gen, iPhone 11 Series, iPhone 12 Series, iPhone 13 Series, iPhone 14 Series, iPhone 15 Series, iPhone 16 Series, Refurbished, Apple watch series 8, Apple Watch Ultra, Apple Watch Series 9, Apple watch SE, Series 6, Series 10, AirPods Pro (2nd Generation), AirPods (2nd Generation), AirPods (3rd Generation), AirPods Max, Genuine, Samsung Galaxy M Series, Samsung Galaxy A Series, Samsung F series, Samsung Tabs, Samsung S Series, Z Series, Galaxy watch, Galaxy Buds, Google Pixel 6 Series, Google Pixel 7 Series, Google Pixel 8 Series, Google Pixel Fold, Google Pixel 9 Series, Redmi A Series, Redmi Note Series, C Series, Realme, Huawei Y Series, Huawei 9 Series, Oppo A Series, OnePlus 9 Series, OnePlus 10 Series, OnePlus Nord, Nokia 105, Tabs, G5 Series, T Series, A Series, 50 Series, 40 Series, Acer Laptops, ASUS Laptops, DELL Laptops, HP Laptops, Lenovo Laptops, MSI Laptops, UPS, For Home, For Work, For Gaming, Ink Tank Printer, Dot Matrix Printer, Plotter, Speaker, Microphones, Headset, Ear Buds, Handfree, CCTV Camera's, DVR Kits, Webcam and Accessories, Joyroom, Wireless Key Board & Mouse, KeyBoard & Mouse, Pen Drive, Micro SD Card, Converter & Cable, OTG PEN DRIVE - TYPE C l MICRO DUCO, Internal Hard Drive, External Hard Drive, RAM, SSD, Adapter And Cable, Cover & Cases, Power Bank, Tempered Glass, Cartridge, Toner, Ink Bottle, Laptop Adaptors, Laptop Display, Laptop Battery, Laptop Keyboards, OEM, MacBook Battery, MacBook Display, MacBook Motherboard, iPhone (Pre Owned / Used), Pre Owned DELL Laptops, Pre Owned HP Laptops, Pre Owned LENOVO Laptops, MacBook Air (Pre-owned / Used), MackBook Pro (Pre-Owned / Used), iPad Air Pre-owned (Used), Apple watch Pre-owned (Used), Pre-owned DELL Monitors, iPhone 11 Series (Daily Deals), iPhone 13 Series (Daily Deals), iPhone 14 Series (Daily Deals), Macbook.
39+
40+
1. Assign a relevant category from the provided Category List to each product in the JSON map, using the product's image URL as a reference to determine its category.
41+
2. Assign a random price to each product.
42+
3. Generate random values for each product's product_name and description.
43+
4. Keep the "id" and "image_list" exactly the same as the provided ones.
44+
5. Format the attributes field like this: [{"name": "Size", "values": ["20", "30"]},{"name": "Capacity", "values": ["120GB", "320GB"]}].
45+
6. Format the colors field like this: [{"name": "blue", "hex_code": "0000FF"},{"name": "red", "hex_code": "FF0000"}].
46+
7. Return a raw string only. Remove backslashes and newline escape sequences.
47+
8. Return only the map. Remove any other text, such as "json" labels or code fences.
48+
"""),
49+
50+
51+
MessagesPlaceholder(variable_name="inputPrompt"),
52+
53+
54+
]
55+
)
56+
57+
def chat_with_ai(human_input):
58+
59+
    # Define the LLM chain
60+
    llm_chain = prompt | llm | parser
61+
62+
63+
    # Predict the output
64+
    output = llm_chain.invoke({"inputPrompt": [HumanMessage(content=human_input)]})
65+
66+
    return output
67+
68+
69+
for filename in os.listdir(input_dir):
70+
    if filename.endswith('.json'):
71+
        input_path = os.path.join(input_dir, filename)
72+
        output_path = os.path.join(output_dir, filename)  # Same name for output file in output directory
73+
74+
75+
        with open(input_path, 'r') as file:
76+
            data = json.load(file)
77+
78+
        # Initialize an empty list to store responses
79+
        responses = []
80+
81+
        # Process each map in the JSON file
82+
        for idx, map_data in enumerate(data):
83+
            human_message = HumanMessage(content=json.dumps(map_data))
84+
85+
            # Get the response for the current map
86+
            output = chat_with_ai(human_message.content)
87+
88+
            # Store the response string in the list
89+
            responses.append(output)
90+
            print(f"Processed map {idx+1}/{len(data)} in file {filename}")
91+
92+
93+
        # Save the responses to the output JSON file
94+
        # Each response is expected to be a JSON object string, so it is written verbatim as an array element
95+
        with open(output_path, 'w') as outfile:
96+
            outfile.write("[\n")
97+
            for i, item in enumerate(responses):
98+
                if i < len(responses) - 1:
99+
                    outfile.write("%s,\n" % item)
100+
                else:
101+
                    outfile.write("%s\n" % item)
102+
            outfile.write("]")
103+
        print(f"Finished processing {filename}, output saved to {output_path}")
104+
105+
106+
107+
108+
109+
110+
111+
112+
113+
114+
115+
116+
117+
118+
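
The loop above writes each model reply into the output file verbatim, so a reply wrapped in a Markdown code fence or containing malformed JSON would produce an invalid output file. One possible hardening, sketched here under the assumption that each reply should be a single JSON object, is to parse every reply and restore `id` and `image_list` from the input record. The helper name `parse_model_reply` is hypothetical and not part of this commit.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def parse_model_reply(reply, original):
    """Parse one model reply into a dict, keeping id and image_list from the input."""
    text = reply.strip()
    # Drop a surrounding Markdown code fence (with or without a language tag).
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    enriched = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(enriched, dict):
        raise ValueError("expected a JSON object in the model reply")
    # Restore the fields the prompt asks the model to leave unchanged.
    enriched["id"] = original["id"]
    enriched["image_list"] = original["image_list"]
    return enriched


original = {"id": "123", "image_list": ["image1_url"]}
reply = FENCE + 'json\n{"id": "999", "image_list": [], "category": "RAM", "price": 10.5}\n' + FENCE
records = [parse_model_reply(reply, original)]
print(json.dumps(records, indent=2))
```

With parsed dictionaries collected in a list, the output file could be written with a single `json.dump(records, outfile, indent=2)` call instead of assembling the array by hand.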

requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
langchain_core==0.3.15
2+
langchain_openai==0.2.5
3+
python-dotenv==1.0.1
