GitHub - anandroid/pythia: A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

This is an implementation for vision and language multimodal research, developed on top of Pythia.

Model Implementation

We propose a few major improvements over Pythia, which implements the LoRRA model. The dynamic answer space is expanded by adding an instance segmentation module. We have also replaced the existing OCR with a spell-correcting OCR and added the spatial features of the OCR output. Finally, we apply n-grams to the modified OCR output.
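The n-gram and spatial-feature steps above can be illustrated with a minimal sketch. It is not taken from this repository: the names `OcrToken`, `merge_boxes`, and `ocr_ngrams` are hypothetical, and it assumes that each OCR token carries a normalized bounding box and that the box of an n-gram is the union of its tokens' boxes.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed normalized to [0, 1]


class OcrToken(NamedTuple):
    """Hypothetical container for one OCR result: its text and its bounding box."""
    text: str
    box: Box


def merge_boxes(boxes: List[Box]) -> Box:
    """Return the smallest box that encloses all the given boxes."""
    x_mins, y_mins, x_maxs, y_maxs = zip(*boxes)
    return (min(x_mins), min(y_mins), max(x_maxs), max(y_maxs))


def ocr_ngrams(tokens: List[OcrToken], max_n: int = 2) -> List[OcrToken]:
    """Build all contiguous n-grams (n = 1..max_n) from OCR tokens in reading order.

    Each n-gram joins the token texts with a space and takes the union of
    their boxes as its spatial feature (an assumption made for this sketch).
    """
    candidates = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            window = tokens[start:start + n]
            text = " ".join(token.text for token in window)
            box = merge_boxes([token.box for token in window])
            candidates.append(OcrToken(text, box))
    return candidates


# Two made-up OCR tokens that sit next to each other in the image.
tokens = [
    OcrToken("coca", (0.10, 0.20, 0.30, 0.30)),
    OcrToken("cola", (0.32, 0.20, 0.50, 0.30)),
]
candidates = ocr_ngrams(tokens, max_n=2)
for candidate in candidates:
    print(candidate.text, candidate.box)
```

With n-grams in the candidate list, a multi-word answer such as "coca cola" can be selected as a single entry in the dynamic answer space instead of being limited to one OCR token.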

Test

To test, run our notebook.

Folders and files (341 commits):

.circleci
.github/ISSUE_TEMPLATE
configs
docs
pythia
tests
tools
.editorconfig
.gitignore
CODE_OF_CONDUCT.md
CONTRIBUTING.md
LICENSE
README.md
requirements.txt
setup.py
