- Introduction
- Prerequisites
- File Structure
- Code Overview
- How to Run the Code
- Acknowledgements
This project consists of Python code that visualizes embeddings of Chinese surnames, fetched from the OpenAI API, using t-SNE (t-distributed Stochastic Neighbor Embedding). It is implemented as a Jupyter Notebook whose cells import the required packages, define helper functions, read and preprocess the data, run the t-SNE transformations, and finally plot the results.
- Python 3.x and Jupyter Notebook
- Pandas
- OpenAI Python library (`openai`)
- scikit-learn
- Matplotlib
- Plotly (optional, for the 3D plot)
- A text file named `baijiaxing.txt` containing Chinese surnames
- An OpenAI API key
- `readme.md`: This readme file.
- `project_notebook.ipynb`: Jupyter notebook containing the code.
Various Python packages are imported for data manipulation, plotting, and API requests.
```python
import pandas as pd
import openai
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
```

Initialize the OpenAI API with your key:

```python
openai.api_key = "YOUR_API_KEY_HERE"
```

`get_embedding()`: Calls the OpenAI API to get the embedding for a given surname.
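A minimal sketch of what `get_embedding()` might look like, assuming the `openai` package at version 1.0 or later and the `text-embedding-ada-002` model. The README does not name the model, and a notebook from 2023 may instead use the older pre-1.0 interface (`openai.Embedding.create`). Passing the client in as a parameter is a hypothetical design choice that lets the function run against a stub without network access.

```python
from typing import Any, List

# Assumption: the notebook's model is not stated in the README;
# text-embedding-ada-002 was OpenAI's general-purpose embedding model in 2023.
DEFAULT_MODEL = "text-embedding-ada-002"


def get_embedding(surname: str, client: Any, model: str = DEFAULT_MODEL) -> List[float]:
    """Return the embedding vector for a single surname.

    `client` is expected to behave like `openai.OpenAI()` (openai>=1.0),
    i.e. to expose `client.embeddings.create(model=..., input=...)`.
    """
    text = surname.strip()
    if not text:
        raise ValueError("surname must be a non-empty string")
    response = client.embeddings.create(model=model, input=text)
    # One input string was sent, so the response holds exactly one embedding.
    return list(response.data[0].embedding)
```

With openai>=1.0 this could be called as `get_embedding("王", OpenAI(api_key="YOUR_API_KEY_HERE"))`, where `OpenAI` is imported from the `openai` package.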
- Reads `baijiaxing.txt`, which contains Chinese surnames, and processes it into a pandas DataFrame.
- Performs a t-SNE transformation on the embeddings to create 2D and 3D representations.
- Utilizes Matplotlib for a 2D scatter plot.
- Utilizes Plotly for an optional 3D scatter plot.
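The loading and t-SNE steps listed above can be sketched as follows. This is a minimal sketch under several assumptions: `baijiaxing.txt` is taken to be UTF-8 text with surnames separated by whitespace or Chinese/ASCII punctuation (the real file layout is not documented), the helpers `load_surnames` and `reduce_with_tsne` are hypothetical names, and random vectors stand in for the OpenAI embeddings so the example runs offline. Because scikit-learn requires `perplexity` to be smaller than the number of samples, it is capped for small inputs.

```python
import re

import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# Assumed separators: whitespace plus common Chinese and ASCII punctuation.
_SEPARATORS = re.compile(r"[\s，、。；,;.]+")


def load_surnames(text: str) -> pd.DataFrame:
    """Split raw file contents into a one-column DataFrame of unique surnames."""
    tokens = [t for t in _SEPARATORS.split(text) if t]
    # dict.fromkeys drops duplicates while keeping first-seen order,
    # and multi-character surnames such as 司马 stay intact.
    return pd.DataFrame({"surname": list(dict.fromkeys(tokens))})


def reduce_with_tsne(embeddings: np.ndarray, n_components: int, seed: int = 42) -> np.ndarray:
    """Project high-dimensional embeddings down to 2D or 3D with t-SNE."""
    n_samples = embeddings.shape[0]
    # perplexity must be strictly below n_samples; 30 is scikit-learn's default.
    perplexity = min(30.0, max(1.0, (n_samples - 1) / 3))
    tsne = TSNE(n_components=n_components, perplexity=perplexity,
                init="random", random_state=seed)
    return tsne.fit_transform(embeddings)


if __name__ == "__main__":
    # In the notebook this text would come from
    # open("baijiaxing.txt", encoding="utf-8").read().
    df = load_surnames("赵 钱 孙 李，周 吴 郑 王。冯 陈 褚 卫，蒋 沈 韩 杨。")
    # Stand-in for calling get_embedding() per surname;
    # text-embedding-ada-002 vectors have 1536 dimensions.
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(len(df), 1536))
    coords_2d = reduce_with_tsne(vectors, n_components=2)
    coords_3d = reduce_with_tsne(vectors, n_components=3)
    df["x"], df["y"] = coords_2d[:, 0], coords_2d[:, 1]
    df["x3"], df["y3"], df["z3"] = coords_3d[:, 0], coords_3d[:, 1], coords_3d[:, 2]
    print(df.head())
```

The `x`/`y` columns are what a Matplotlib scatter plot would consume, and `x3`/`y3`/`z3` would feed the optional Plotly 3D plot. t-SNE is stochastic, so fixing `random_state` keeps the layout reproducible between runs.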
- Clone the repository.
- Install the prerequisites mentioned above.
- Replace `"YOUR_API_KEY_HERE"` with your OpenAI API key.
- Place the `baijiaxing.txt` file in the same directory as the Jupyter notebook.
- Run all the cells in the notebook to execute the code.
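As an alternative to pasting the key into the notebook, a first cell along these lines could read it from the `OPENAI_API_KEY` environment variable (the variable the `openai` package also checks by default) and fail early if `baijiaxing.txt` is missing. `check_setup` is a hypothetical helper, not part of the project.

```python
import os
from pathlib import Path


def check_setup(data_file: str = "baijiaxing.txt", env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key after verifying that the key and data file are available."""
    api_key = os.environ.get(env_var, "").strip()
    # Reject an unset variable as well as the README's placeholder value.
    if not api_key or api_key == "YOUR_API_KEY_HERE":
        raise RuntimeError(f"Set the {env_var} environment variable to your OpenAI API key.")
    # Jupyter normally runs with the notebook's directory as the working directory.
    if not Path(data_file).is_file():
        raise FileNotFoundError(f"{data_file} not found in {Path.cwd()}; place it next to the notebook.")
    return api_key
```

In the notebook, `openai.api_key = check_setup()` would then replace the hard-coded assignment.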
The code uses OpenAI’s API to fetch embeddings and scikit-learn's t-SNE algorithm for dimensionality reduction.
Last Updated: September 5, 2023