Semantic Analyzer

A Python package for analyzing semantic similarity between words and between text segments, using the pretrained Google News Word2Vec model and Sentence Transformers.

Features

Word-level semantic analysis using Google News Word2Vec model
Text-level semantic analysis using Sentence Transformers
Near-word and far-word inputs for naming terms that results should be semantically similar to or different from
JSON and table output formats
Command-line interface with intuitive options

Installation

Clone the repository:

git clone https://github.com/jedmitten/semantic_analyzer.git
cd semantic_analyzer

Install using uv:

uv pip install -e .

Download the required models:
- For word analysis: Download the Google News Word2Vec model (GoogleNews-vectors-negative300.bin) from Google's word2vec page
- For text analysis: The Sentence Transformer model will be downloaded automatically on first use
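To check that both models load outside the CLI, a minimal sketch along these lines may help. It assumes the standard gensim and sentence-transformers loading calls. The function names and the default model name "all-MiniLM-L6-v2" are assumptions for illustration, not taken from this package.

```python
from pathlib import Path


def load_word_vectors(vector_path: str):
    """Load a binary Word2Vec file such as GoogleNews-vectors-negative300.bin."""
    model_file = Path(vector_path)
    if not model_file.is_file():
        # Fail early with a clear message before the heavy gensim import.
        raise FileNotFoundError(f"Word2Vec model not found: {model_file}")
    # Imported lazily so the path check works even without gensim installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(str(model_file), binary=True)


def load_sentence_model(model_name: str = "all-MiniLM-L6-v2"):
    """Load a Sentence Transformer (model name is an assumed example).

    The weights are downloaded and cached the first time this runs.
    """
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name)
```

Loading the Google News vectors takes several gigabytes of memory, so it is worth running this check once before starting a larger analysis.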

Usage

The package provides two main commands: word and text. Each command has an analyze subcommand with similar options.

Word Analysis

# Basic usage with near words (vector path required via option or environment variable)
semantic-analyzer word analyze -n "heinous" -n "cruel" -p /path/to/your/GoogleNews-vectors-negative300.bin

# Using environment variable for vector path
export WORD_VECTOR_PATH=/path/to/your/GoogleNews-vectors-negative300.bin
semantic-analyzer word analyze -n "heinous" -n "cruel"

# Using far words
semantic-analyzer word analyze -f "back" -f "inhuman" -p /path/to/your/GoogleNews-vectors-negative300.bin

# Combining near and far words
semantic-analyzer word analyze -n "heinous" -n "cruel" -f "back" -f "inhuman" -p /path/to/your/GoogleNews-vectors-negative300.bin

# JSON output format
semantic-analyzer word analyze -n "heinous" -p /path/to/your/GoogleNews-vectors-negative300.bin -o json
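The examples above pass the vector path either with -p or through WORD_VECTOR_PATH. A lookup of that kind could be sketched as follows. The helper name is hypothetical, and the precedence (an explicit option wins over the environment variable) is an assumption rather than documented behavior.

```python
import os

ENV_VAR = "WORD_VECTOR_PATH"


def resolve_vector_path(cli_value=None, environ=None):
    """Return the model path from the CLI option, else from the environment.

    Assumption: an explicit -p/--vector-path value takes precedence.
    """
    env = os.environ if environ is None else environ
    path = cli_value or env.get(ENV_VAR)
    if not path:
        raise ValueError(
            f"No vector path given: pass -p/--vector-path or set {ENV_VAR}"
        )
    return path
```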

Options:

-n, --near-words: Words that should be semantically similar (optional)
-f, --far-words: Words that should be semantically different (optional)
-t, --top-n: Number of similar words to return (default: 5)
-o, --output-format: Output format (json or table, default: table)
-p, --vector-path: Path to the Word2Vec model file (can also be set via WORD_VECTOR_PATH env var)
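To make the near/far idea concrete, here is a toy sketch that ranks candidate words by cosine similarity to a query built by adding near-word vectors and subtracting far-word vectors. This README does not describe how the package computes its scores, so treat this as an illustration only. gensim's KeyedVectors.most_similar accepts positive and negative word lists in a comparable spirit. The two-dimensional vectors below are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_similar(vectors, near=(), far=(), top_n=5):
    """Rank words by similarity to (sum of near vectors - sum of far vectors)."""
    if not near and not far:
        raise ValueError("Provide at least one near or far word")
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in near:
        query = [q + v for q, v in zip(query, vectors[word])]
    for word in far:
        query = [q - v for q, v in zip(query, vectors[word])]
    excluded = set(near) | set(far)  # never return the input words themselves
    scored = [
        (word, cosine(query, vec))
        for word, vec in vectors.items()
        if word not in excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Invented 2-D vectors, for illustration only.
TOY_VECTORS = {
    "cruel": [1.0, 0.0],
    "brutal": [0.9, 0.1],
    "kind": [-1.0, 0.0],
    "gentle": [-0.9, 0.2],
    "table": [0.0, 1.0],
}

if __name__ == "__main__":
    for word, score in rank_similar(TOY_VECTORS, near=["cruel"], far=["kind"], top_n=2):
        print(f"{word:<20}{score:.4f}")
```

With these toy vectors, "brutal" ranks first because it points in nearly the same direction as the query, while "gentle" is pushed to the bottom.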

Text Analysis

# Basic usage with text segments to analyze
semantic-analyzer text analyze -t "The quick brown fox jumps over the lazy dog." -t "A fast brown fox leaps over a sleepy dog."

# Using input file (CSV)
semantic-analyzer text analyze -f texts.csv

# Using input file (JSON)
semantic-analyzer text analyze -f texts.json

# Using input file (TOML)
semantic-analyzer text analyze -f texts.toml

# JSON output format
semantic-analyzer text analyze -t "The quick brown fox jumps over the lazy dog." -o json

Options:

-t, --texts: Text segments to analyze (optional if --input-file is provided)
-f, --input-file: Input file (CSV, JSON, or TOML) containing texts (optional if --texts is provided)
-o, --output-format: Output format (json or table, default: table)
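This README does not say exactly what the text command reports. One plausible reading is a similarity score for each pair of text segments, which could be sketched as below. The function names and result keys are hypothetical, and the embeddings are toy values. In practice they would come from a Sentence Transformer's encode method.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairwise_similarity(texts, embeddings):
    """Score every unordered pair of texts by the cosine of their embeddings."""
    if len(texts) != len(embeddings):
        raise ValueError("Need exactly one embedding per text")
    return [
        {
            "text_a": texts[i],
            "text_b": texts[j],
            "similarity": round(cosine(embeddings[i], embeddings[j]), 4),
        }
        for i, j in combinations(range(len(texts)), 2)
    ]
```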

Input File Formats

The tool reads input from CSV, JSON, or TOML files. Each command group looks for its own key in the input file (words or texts):

CSV Format

words
"example"
"test"

JSON Format

{
    "words": ["example", "test"]
}

TOML Format

words = ["example", "test"]

# TOML also supports multiline strings in arrays:
texts = [
    """This is a long text segment
    that spans multiple lines
    but is still a single segment.""",
    "This is another text segment."
]

Note: The CSV and JSON examples above show the words key; the TOML example shows both words and texts.

Output Format

Table Format

Word Similarity Analysis
==================================================

Near words: heinous, cruel
Far words: back, inhuman

Similar Words:
------------------------------
Word                 Score     
------------------------------
malevolent          0.8234
vicious             0.7891
brutal              0.7654

JSON Format

{
  "near_words": ["heinous", "cruel"],
  "far_words": ["back", "inhuman"],
  "similar_words": [
    {"word": "malevolent", "similarity": 0.8234},
    {"word": "vicious", "similarity": 0.7891},
    {"word": "brutal", "similarity": 0.7654}
  ]
}
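The JSON output is convenient for downstream scripts. As a small sketch, the sample above can be parsed with the standard library and filtered by score. The helper name words_above and the threshold are illustrative assumptions.

```python
import json

# The sample JSON output shown above.
SAMPLE_OUTPUT = """
{
  "near_words": ["heinous", "cruel"],
  "far_words": ["back", "inhuman"],
  "similar_words": [
    {"word": "malevolent", "similarity": 0.8234},
    {"word": "vicious", "similarity": 0.7891},
    {"word": "brutal", "similarity": 0.7654}
  ]
}
"""


def words_above(raw_json, threshold):
    """Return the similar words whose score is at least `threshold`."""
    report = json.loads(raw_json)
    return [
        entry["word"]
        for entry in report["similar_words"]
        if entry["similarity"] >= threshold
    ]


if __name__ == "__main__":
    print(words_above(SAMPLE_OUTPUT, 0.78))
```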

Requirements

Python >= 3.11
gensim >= 4.3.3
numpy >= 1.24.0
pandas >= 2.0.0
sentence-transformers >= 2.2.0
click >= 8.1.8

License

MIT License
