Title Matcher - CSE 272 Final Project

A project by Anthony Liu and Alex Salman

The primary task of this program is to retrieve the names of all datasets mentioned in the given documents.

Last modified 06/08/2021

Competition Descriptions

Kaggle competition main page

Dataset download page

Program output guide and sample

Features as of 06/09/2021

Exact Match
spaCy NER
~~Fuzzy Match~~ (disabled due to slow performance)
Custom Hyperparameters
Optional Training During Each Run
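The Exact Match feature can be pictured with a minimal sketch. It assumes a list of known dataset titles is already available and counts case-insensitive, whole-word occurrences of each title in a document. `exact_match` is a hypothetical name and is not the function used in this repository.

```python
import re
from typing import Dict, List


def exact_match(text: str, titles: List[str]) -> Dict[str, int]:
    """Count case-insensitive, whole-word occurrences of each known title in text.

    Hypothetical helper for illustration; titles with zero hits are omitted.
    """
    counts: Dict[str, int] = {}
    for title in titles:
        # re.escape keeps punctuation inside a title (e.g. parentheses) literal.
        # The lookarounds stop a title from matching inside a longer word, and
        # unlike \b they also behave correctly when a title ends in punctuation.
        pattern = re.compile(r"(?<!\w)" + re.escape(title) + r"(?!\w)", re.IGNORECASE)
        hits = len(pattern.findall(text))
        if hits:
            counts[title] = hits
    return counts


text = "We used the National Education Longitudinal Study and the ADNI cohort. ADNI data were cleaned."
titles = ["National Education Longitudinal Study", "ADNI", "Census"]
print(exact_match(text, titles))
# {'National Education Longitudinal Study': 1, 'ADNI': 2}
```

Exact matching is fast but only finds titles spelled exactly as listed, which is why it is paired with spaCy NER in the feature list.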

Our Design Thoughts (Brainstorming Canvas)

Link to Google Docs

Useful Shell Commands

Installing required packages

pip3 install -r requirements.txt

Store train data at location:

dataset/train/

Store test data at location:

dataset/test/
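Once the data is in place, the documents can be read into memory. The sketch below is a minimal example that assumes each document is a JSON file holding a list of sections, each a dict with a `text` key; this file layout and the helper name `load_documents` are assumptions for illustration, not taken from this repository's code.

```python
import json
from pathlib import Path
from typing import Dict


def load_documents(folder: str) -> Dict[str, str]:
    """Map each document id (the file name without .json) to its full text.

    Assumes every file is a JSON list of section dicts with a "text" key;
    this layout is an assumption, not confirmed by the README.
    """
    docs: Dict[str, str] = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            sections = json.load(f)
        # Join all section bodies into one string per document.
        docs[path.stem] = "\n".join(section.get("text", "") for section in sections)
    return docs
```

For example, `load_documents("dataset/train/")` would return one entry per JSON file in the training folder.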

Running the program: use Jupyter Notebook to run

main.ipynb

FAQ

Q: Why is this an IR project instead of a hodgepodge of algorithms?

A: An Information Retrieval system has four components: "acquisition", "representation", "file organization", and "query". Although we work primarily on string matching, this process is essential to the "query" component, which receives queries such as "how many times was dataset XXX mentioned?". We therefore need a robust platform where documents are processed and stored efficiently, so that such queries receive accurate answers.
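To make the link between string matching and the "query" component concrete, here is a minimal sketch under stated assumptions: matches are collected once into a title-to-document index (the "file organization" step), so a query such as "how many times was dataset XXX mentioned?" becomes a lookup. The names `build_mention_index` and `mention_count` and the index layout are hypothetical and do not come from this repository.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical index layout: title -> {document id: number of mentions}.
MentionIndex = Dict[str, Dict[str, int]]


def build_mention_index(docs: Dict[str, str], titles: List[str]) -> MentionIndex:
    """Scan every document once per title and record non-zero mention counts."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for title in titles:
        # Case-insensitive, whole-word match, as in a simple exact matcher.
        pattern = re.compile(r"(?<!\w)" + re.escape(title) + r"(?!\w)", re.IGNORECASE)
        for doc_id, text in docs.items():
            hits = len(pattern.findall(text))
            if hits:
                index[title][doc_id] = hits
    return dict(index)


def mention_count(index: MentionIndex, title: str) -> int:
    """Answer 'how many times was dataset <title> mentioned?' from the index."""
    return sum(index.get(title, {}).values())


docs = {
    "d1": "ADNI was used. ADNI again.",
    "d2": "No datasets here.",
    "d3": "We rely on adni.",
}
index = build_mention_index(docs, ["ADNI", "Census"])
print(index)                          # {'ADNI': {'d1': 2, 'd3': 1}}
print(mention_count(index, "ADNI"))   # 3
print(mention_count(index, "Census")) # 0
```

Paying the matching cost once at indexing time keeps each later query cheap, which is the trade-off an IR system makes between its "file organization" and "query" components.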
