Sonnet or Not, Bot?

This repository contains code and data for the following research study.

Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets
Melanie Walsh, Anna Preus, Maria Antoniak
EMNLP Findings 2024

Please cite this paper when using resources found in this repository.



Data

The data in this repository includes:

  • 1.4k+ public domain poems tagged by poetic form by the Poetry Foundation, the Academy of American Poets, or both — with accompanying metadata such as subject tags and author birth and death dates where available
  • retrieval metadata from Dolma, collected with the WIMBD platform, including the source domains for each detected poem
  • memorization predictions based on n-gram overlap between the original poems and poem continuations generated by GPT-4
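
As a rough illustration of the n-gram overlap idea behind the memorization predictions, here is a minimal sketch. It is not the code used in the study: the function names, whitespace tokenization, and the default `n=5` are assumptions made for this example, so see the memorization notebook for the actual procedure.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(true_text, generated_text, n=5):
    """Fraction of n-grams in the generated text that also occur in the true text.

    Assumptions for this sketch: lowercased whitespace tokens and n=5.
    Returns 0.0 when the generated text is too short to form any n-gram.
    """
    true_tokens = true_text.lower().split()
    generated_tokens = generated_text.lower().split()
    generated_ngrams = ngrams(generated_tokens, n)
    if not generated_ngrams:
        return 0.0
    true_ngrams = ngrams(true_tokens, n)
    return len(generated_ngrams & true_ngrams) / len(generated_ngrams)


# Hypothetical usage: compare a model continuation with the original poem text.
score = ngram_overlap("a b c d e f", "a b c d e x", n=5)
```

A higher score means more of the generated continuation reproduces the original poem word for word. Any cutoff for calling a poem "memorized" would be a separate choice and is not specified here.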



Code

The code in this repository includes:

  • a Python notebook demonstrating how to query Dolma using the WIMBD platform
  • a Python notebook analyzing the query data from Dolma
  • a Python notebook demonstrating the memorization experiments
  • Python scripts demonstrating how to prompt models for the poetry form classification task
  • a Python notebook demonstrating analysis of classification results
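
To give a sense of what a form classification prompt might look like, here is a small, model-agnostic sketch. It is not taken from the scripts in this repository: the label list `FORMS`, the prompt wording, and the helpers `build_prompt` and `parse_label` are all hypothetical, and no model API is called.

```python
# Hypothetical label set for illustration only; the study's form list may differ.
FORMS = ["sonnet", "haiku", "villanelle", "limerick"]


def build_prompt(poem_text, forms):
    """Build a plain-text prompt asking a model to choose one poetic form."""
    options = ", ".join(forms)
    return (
        "Read the following poem and identify its poetic form.\n"
        f"Choose exactly one of: {options}.\n\n"
        f"Poem:\n{poem_text}\n\n"
        "Form:"
    )


def parse_label(response, forms):
    """Return the form mentioned earliest in the response, or None if none appears."""
    text = response.lower()
    hits = [(text.find(form.lower()), form) for form in forms if form.lower() in text]
    return min(hits)[1] if hits else None


# Hypothetical usage: send `prompt` to a model of your choice, then parse its reply.
prompt = build_prompt("An old silent pond...", FORMS)
label = parse_label("This poem is a Haiku.", FORMS)
```

Keeping prompt construction and answer parsing separate from the model call makes it easier to reuse the same prompt across different models.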