Chinese Surname Visualization

1. Introduction

This project uses Python to visualize embeddings of Chinese surnames with t-SNE (t-distributed Stochastic Neighbor Embedding). It is implemented as a Jupyter Notebook whose cells import the required packages, define helper functions, read and preprocess the data, run the t-SNE transformations, and plot the results.

2. Prerequisites

Software and Packages

Python 3.x
Pandas
OpenAI
scikit-learn
Matplotlib
Plotly (optional)

Data

A text file named baijiaxing.txt containing Chinese surnames.
An OpenAI API key.

3. File Structure

README.md: This readme file.
cn_surname_visualization.ipynb: Jupyter notebook containing the code.

4. Code Overview

Import Statements

Various Python packages are imported for data manipulation, plotting, and API requests.

import pandas as pd
import openai
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

API Initialization

Initialize the OpenAI API with your key.

openai.api_key = "YOUR_API_KEY_HERE"
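A key written directly into a notebook is easy to commit by accident. As an alternative, the following minimal sketch reads the key from an environment variable instead. The helper name load_api_key and the variable name OPENAI_API_KEY are assumptions made for illustration; the notebook itself assigns the key as a string literal.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Both the helper name and the default variable name are illustrative
    assumptions, not part of the notebook.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage with the module-level assignment shown above:
# openai.api_key = load_api_key()
```

With this approach the notebook can be shared without editing out the key first.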

Functions

get_embedding(): Calls the OpenAI API to get the embedding for a given surname.
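The notebook's own implementation is not reproduced in this readme. A minimal sketch of what such a function might look like is given below, assuming the pre-1.0 openai package (the one that supports the module-level openai.api_key assignment) and the text-embedding-ada-002 model. The model choice and the create parameter are assumptions; the latter exists only so the function can be exercised without network access.

```python
def get_embedding(surname, model="text-embedding-ada-002", create=None):
    """Return the embedding vector (a list of floats) for one surname.

    Assumptions: the pre-1.0 `openai` SDK and the model name above.
    `create` is a hypothetical hook for substituting the API call,
    for example with a stub during testing.
    """
    if create is None:
        import openai  # imported lazily so the stub path needs no SDK

        create = openai.Embedding.create
    response = create(input=[surname], model=model)
    # The response holds one entry per input string, in order.
    return response["data"][0]["embedding"]


# Usage (requires a valid API key and network access):
# vector = get_embedding("王")
```

Each call sends a single surname, so embedding the whole file means one request per surname; saving the results to disk after the first run avoids paying for the same requests again.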

Data Preprocessing

Reads a file baijiaxing.txt containing Chinese surnames and processes it into a DataFrame.
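The exact layout of baijiaxing.txt is not described here, so the following is only a sketch under a stated assumption: one surname per line, UTF-8 encoded. The function name load_surnames and the column name surname are hypothetical; if the file groups several surnames per line, the parsing step would need to change.

```python
import pandas as pd


def load_surnames(path="baijiaxing.txt"):
    """Read surnames into a one-column DataFrame.

    Assumes one surname per line in a UTF-8 file; the real file may use
    a different layout.
    """
    with open(path, encoding="utf-8") as f:
        surnames = [line.strip() for line in f if line.strip()]
    # Drop repeated surnames while keeping the original order.
    surnames = list(dict.fromkeys(surnames))
    return pd.DataFrame({"surname": surnames})


# Usage, combined with the get_embedding() function described above:
# df = load_surnames()
# df["embedding"] = df["surname"].apply(get_embedding)
```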

t-SNE Transformation

Performs a t-SNE transformation on the embeddings to create 2D and 3D representations.
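As an illustration of this step, here is a minimal sketch using scikit-learn's TSNE class. The function name reduce_embeddings, the perplexity default, and the fixed seed are assumptions, not values taken from the notebook.

```python
import numpy as np
from sklearn.manifold import TSNE


def reduce_embeddings(embeddings, n_components=2, perplexity=30.0, seed=0):
    """Project embedding vectors down to 2 or 3 dimensions with t-SNE.

    The defaults are illustrative; the notebook may use other settings.
    """
    X = np.asarray(embeddings, dtype=float)
    # scikit-learn requires perplexity to be smaller than the sample count.
    perplexity = min(perplexity, X.shape[0] - 1)
    tsne = TSNE(n_components=n_components, perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(X)


# Usage, assuming a DataFrame with an "embedding" column of float lists:
# coords_2d = reduce_embeddings(df["embedding"].tolist(), n_components=2)
# coords_3d = reduce_embeddings(df["embedding"].tolist(), n_components=3)
```

t-SNE is stochastic, so fixing random_state keeps the layout reproducible between runs. The 2D and 3D results come from separate fits and are not projections of one another.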

Visualization

Uses Matplotlib for a 2D scatter plot.
Uses Plotly for an optional 3D scatter plot.
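The 2D plot could be sketched as follows. Matplotlib's default fonts usually lack Chinese glyphs, which is presumably why font_manager is imported; the function name plot_2d and its font_path parameter (a path to any font file with CJK coverage) are assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm


def plot_2d(coords, labels, font_path=None):
    """Scatter 2D points and annotate each one with its surname.

    `font_path` is a hypothetical parameter pointing at a font file that
    contains Chinese glyphs; without it the labels may render as boxes.
    """
    text_kwargs = {}
    if font_path:
        text_kwargs["fontproperties"] = fm.FontProperties(fname=font_path)
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.scatter(xs, ys, s=10)
    for x, y, label in zip(xs, ys, labels):
        ax.annotate(label, (x, y), **text_kwargs)
    ax.set_title("t-SNE of surname embeddings")
    return fig, ax


# Usage, assuming 2D coordinates and a matching list of surnames:
# fig, ax = plot_2d(coords_2d, df["surname"], font_path="/path/to/cjk_font.ttf")
# fig.savefig("surnames_2d.png", dpi=150)
```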

5. How to Run the Code

Clone the repository.
Install the prerequisites mentioned above.
Replace "YOUR_API_KEY_HERE" with your OpenAI API key.
Place the baijiaxing.txt file in the same directory as the Jupyter notebook.
Run all the cells in the notebook to execute the code.

6. Acknowledgements

The code uses OpenAI's API to fetch embeddings and scikit-learn's t-SNE algorithm for dimensionality reduction.

Last Updated: September 5, 2023

