Correspondence Analysis Example 2025

This is a toy repo to illustrate a cheap hack: taking tables that can be read as a measure of closeness or distance between entities (such as brands) and reducing them to a two-dimensional map.

What This Demonstrates

Correspondence Analysis (CA) reveals hidden relationships in contingency tables by mapping row and column categories into a low-dimensional space that preserves chi-square distances as closely as possible. It is well suited to market research, survey analysis, and categorical data exploration.
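To make "chi-square distance" concrete, here is a minimal sketch that computes it between two row profiles of a small table. The table values and the function name chi_square_distance are made up for illustration and are not taken from this repo's code.

```python
import numpy as np

# Hypothetical 3x3 contingency table (rows: brands, columns: user segments).
N = np.array([[30, 10, 5],
              [28, 12, 6],
              [4, 9, 40]], dtype=float)

def chi_square_distance(table, i, j):
    """Chi-square distance between the profiles of rows i and j."""
    col_mass = table.sum(axis=0) / table.sum()            # average column profile
    profiles = table / table.sum(axis=1, keepdims=True)   # each row sums to 1
    diff = profiles[i] - profiles[j]
    # Differences in rare columns count for more than differences in common ones.
    return float(np.sqrt(np.sum(diff ** 2 / col_mass)))

d_similar = chi_square_distance(N, 0, 1)    # rows 0 and 1 have similar profiles
d_different = chi_square_distance(N, 0, 2)  # row 2 is concentrated in the last column
print(d_similar, d_different)
```

Rows with similar profiles end up close together on the map, and rows with very different profiles end up far apart.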

Examples

There are three examples here: two from real data and one synthetic. The code to generate the synthetic data is also in the repo, to give you a sense of the end-to-end process.

Energy Company Cross-Visitation - Website overlap patterns (42.5% + 39.5% variance)
Categorical Supplier Analysis - Business relationship mapping (44.9% + 34.1% variance)
🆕 Synthetic Airline Market Segmentation (31.19% + 8.55% = 39.74% variance)

Market Insights baked in

The simulated airline data has the following patterns baked in:

Budget vs Premium: Clear separation between Ryanair and traditional carriers
Geographic Preferences: UK business (BA/Virgin) vs European business (Lufthansa/Air France)
User Behavior: Realistic segmentation with zero crossover between budget-conscious and business travelers

Quick Start

# Clone and run
git clone https://github.com/spm1001/correspondence-example-2025.git
cd correspondence-example-2025
uv sync
uv run correspondence_demo.py  # Runs all three analyses

Individual analyses:

uv run correspondence_demo.py data.csv                           # Energy companies
uv run correspondence_demo.py co_occurrence_cat.csv             # Suppliers  
uv run correspondence_demo.py airline_usertype_contingency.csv  # Airlines (39.74% variance!)

Example output

data_correspondence_analysis.png - Energy market analysis
co_occurrence_cat_correspondence_analysis.png - B2B supplier relationships
airline_usertype_contingency_correspondence_analysis.png - Market segmentation showcase

Synthetic Data Pipeline:

uv run generate_airline_data.py          # Create 100k realistic user sessions
uv run create_proper_contingency_table.py # Transform to CA-ready format
uv run correspondence_demo.py airline_usertype_contingency.csv # Analyze
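As a rough idea of what the "transform to CA-ready format" step involves, the sketch below cross-tabulates a tiny session log into a contingency table with pandas. The column names (user_type, airline) and the rows are assumptions made for illustration; the actual schema used by generate_airline_data.py and create_proper_contingency_table.py may differ.

```python
import pandas as pd

# Hypothetical session log: one row per airline site visited in a session.
sessions = pd.DataFrame({
    "user_type": ["budget", "budget", "business", "business", "business", "budget"],
    "airline":   ["Ryanair", "Ryanair", "BA", "Lufthansa", "BA", "Ryanair"],
})

# Rows = airlines, columns = user types, cells = visit counts.
contingency = pd.crosstab(sessions["airline"], sessions["user_type"])
print(contingency)
```

A table of counts in this shape (categories on both axes, non-negative cells) is what correspondence analysis takes as input.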

Mathematical stuff:

SVD of the standardized residuals
Chi-square distance preservation
Proper eigenvalue/coordinate calculations
Equivalent to R's FactoMineR but optimized for Python
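The steps listed above can be sketched in a few lines of NumPy. This is a textbook formulation of CA under stated assumptions, not a copy of correspondence_demo.py: the function name correspondence_analysis and the example counts are hypothetical.

```python
import numpy as np

def correspondence_analysis(table):
    """Return row coordinates, column coordinates and principal inertias."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: (observed - expected) / sqrt(expected), in proportions.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sigma ** 2                 # eigenvalues (principal inertias)
    # Principal coordinates: Euclidean distances between rows match chi-square distances.
    row_coords = (U * sigma) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sigma) / np.sqrt(c)[:, None]
    return row_coords, col_coords, inertia

# Hypothetical counts: 4 airlines (rows) x 3 user types (columns).
table = np.array([[120, 15, 10],
                  [20, 90, 40],
                  [15, 60, 80],
                  [25, 70, 35]], dtype=float)

row_coords, col_coords, inertia = correspondence_analysis(table)
explained = inertia / inertia.sum()      # share of variance per dimension
print(row_coords[:, :2])                 # first two dimensions, as used for a 2D map
print(explained[:2])
```

The "variance" percentages quoted for each example correspond to this kind of per-dimension share of total inertia, and the total inertia equals the table's chi-square statistic divided by the grand total.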
