Remecarving is a book of lost poetry composed with the Savelost algorithm: a lossy text compression scheme that some have likened to textual seam carving. Given an input text, Savelost repeatedly selects a single character to delete, trying to keep the vector embedding of the reduced text as close as possible to the embedding of the original input. As the text is compressed further and further, entire words melt away; the boundaries between words collapse; eventually nothing remains. Along the way, fragments of poetry are occasionally thrown off.
The input texts for the 829 poems in Remecarving are phrases found on Wikipedia that seem to exhort the reader to remember. In other words, each poem in this book depicts the incremental mechanical forgetting of an idea that someone expressly wanted someone else to recall, and that at least one Wikipedia editor (perhaps more; perhaps many more) considered worthwhile to preserve.
Growing up I was warned that the Internet was forever, but lately it appears to me frighteningly fragile and transient. In that light, I think of these poems as reflections of the strange situation of computationally orchestrated collective memory in the present day: of Wikipedia (and written culture more generally) as a palimpsest; of the ultra-lossy compression that takes place during the training of language models; of that which is lost around what we manage to remember, the provenance and context and contention; of the fact that we cannot help but forget everything by default, and remember only through great and ongoing effort, often hidden away in massively multilayered infrastructure, to transmit and retransmit the erstwhile exceptions to this universal rule.
Word count, per document.body.innerText.split(/\s+/).length: 277787. Here's the source code.
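For a feel of the deletion loop described above, here is a minimal sketch. It is not forget.py (the linked source code is the real thing). It assumes a greedy strategy: at each step, try every single-character deletion and keep the one whose embedding stays closest to the embedding of the original input. The names savelost_sketch, toy_embed and cosine are made up for illustration, and toy_embed is a crude character-trigram counter standing in for a real text embedding model so that the example runs with no dependencies.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding (an assumption, NOT a real model): character trigram counts."""
    padded = f"  {text.lower()}  "
    return dict(Counter(padded[i:i + 3] for i in range(len(padded) - 2)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def savelost_sketch(text: str, embed: Callable[[str], Vector] = toy_embed) -> List[str]:
    """Greedily delete one character at a time, staying close to the original embedding.

    Returns every intermediate text, starting with the input and ending
    with a single character (as the sample poems do).
    """
    target = embed(text)  # always compare against the ORIGINAL input, not the previous line
    lines = [text]
    current = text
    while len(current) > 1:
        # Every text reachable by deleting exactly one character.
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        # Keep the candidate most similar to the original; ties go to the earliest deletion.
        current = max(candidates, key=lambda c: cosine(embed(c), target))
        lines.append(current)
    return lines


if __name__ == "__main__":
    print("\n".join(savelost_sketch("remember that day vividly, vividly")))
```

With the toy embedding the output will not match the poems in this book. To get closer, one would presumably swap toy_embed and cosine for a sentence_transformers model's encode method and a cosine similarity over its dense vectors. Note that each step embeds one candidate per remaining character, so a poem costs a number of embedding calls roughly quadratic in the length of its input.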
Samples
Here are a few of the poems that stood out to me early in the curation process:
remember that day vividly, vividly
remember that day vividly,vividly
remember that day vividly,vividy
remember that day vividly,vivid
remember that day vividy,vivid
remember that day vivid,vivid
remember thatday vivid,vivid
remember thtday vivid,vivid
remember thday vivid,vivid
rememberthday vivid,vivid
remembertday vivid,vivid
rememberday vivid,vivid
rememberay vivid,vivid
remembery vivid,vivid
remember vivid,vivid
remember vivid,ivid
remember vividivid
remember vividiid
remember vividid
remember vividd
remember vivid
reember vivid
rember vivid
reber vivid
reer vivid
ree vivid
re vivid
r vivid
vivid
vivid
vivd
vid
vd
d
remember that before the passage of the river, it was composed of unbroken meanders
remember that before the passage of the river,it was composed of unbroken meanders
remember that before the passageof the river,it was composed of unbroken meanders
remember that before the passageof the river,it as composed of unbroken meanders
remember that before the passageof the river,it a composed of unbroken meanders
remember that before the passageof the river,it composed of unbroken meanders
remember that before the passageof the river,it composed of unbroken meanders
remember that before the passagef the river,it composed of unbroken meanders
remember that before the passage the river,it composed of unbroken meanders
remember that before the passagethe river,it composed of unbroken meanders
remember that before the passagehe river,it composed of unbroken meanders
remember that before the passagee river,it composed of unbroken meanders
remember that before the passage river,it composed of unbroken meanders
remember that before the passage river,it compose of unbroken meanders
remember that beforethe passage river,it compose of unbroken meanders
remember that beforehe passage river,it compose of unbroken meanders
remember that beforee passage river,it compose of unbroken meanders
remember that before passage river,it compose of unbroken meanders
remeber that before passage river,it compose of unbroken meanders
rember that before passage river,it compose of unbroken meanders
remer that before passage river,it compose of unbroken meanders
reme that before passage river,it compose of unbroken meanders
ree that before passage river,it compose of unbroken meanders
re that before passage river,it compose of unbroken meanders
rethat before passage river,it compose of unbroken meanders
rethat before passage river,itcompose of unbroken meanders
rethat before passage river,itcmpose of unbroken meanders
rethat before passage river,itcmose of unbroken meanders
rethat before passage river,itcmoe of unbroken meanders
rethat before passage river,itcme of unbroken meanders
rethat before passage river,itce of unbroken meanders
retht before passage river,itce of unbroken meanders
reth before passage river,itce of unbroken meanders
reth before passage river,ite of unbroken meanders
reth before passage river,it of unbroken meanders
reth before passage river,it of unbroken meander
reth before passage river,it of unbroken meaner
reth before passage river,it of unbroken meanr
reth before passage river,it o unbroken meanr
reth before passage river,it unbroken meanr
reth before passage river,it unbroken meanr
reh before passage river,it unbroken meanr
re before passage river,it unbroken meanr
re before passage river,i unbroken meanr
re before passage river, unbroken meanr
re before passage river,unbroken meanr
re before passage river,unbroken mean
re beforepassage river,unbroken mean
re beforeassage river,unbroken mean
re beforeasage river,unbroken mean
re beforesage river,unbroken mean
re beforesag river,unbroken mean
re beforeag river,unbroken mean
re beforeg river,unbroken mean
re before river,unbroken mean
r before river,unbroken mean
before river,unbroken mean
before river,unbroken mean
befor river,unbroken mean
beor river,unbroken mean
ber river,unbroken mean
be river,unbroken mean
e river,unbroken mean
river,unbroken mean
river,unbroken mean
river,unbroken men
river,unbrokenmen
river,unbrokenen
river,unbrokene
river,unbroken
river,unboken
river,unbken
river,unken
river,unke
river,uke
river,ue
river,e
river,
river
rive
rie
re
r
Remember that poetry is like a baby
Remember that poetry is like baby
Remember that poetry is like baby
Remember that poetry s like baby
Remember that poetrys like baby
Remember that poetry like baby
Remember that poetrylike baby
Remember tht poetrylike baby
Remember th poetrylike baby
Remember th poetryike baby
Remember th poetryie baby
Remember th poetrye baby
Remember th poetry baby
Remember t poetry baby
Remember poetry baby
Remember poetry baby
Remember poety baby
Remember poet baby
Remeber poet baby
Remebr poet baby
Remeb poet baby
Reme poet baby
Ree poet baby
Re poet baby
R poet baby
poet baby
poet baby
poe baby
pe baby
e baby
baby
baby
bab
bb
b
And here are a few evocative bits of language I encountered along the way:
remember he ever stoptalk,have him coldisheye
remember Peter Parker mental Spider
even dark candle see far
hostility Napoleon
"remember that there were fourteen people present" does indeed, on its way down, find "fourteenple"
Recipe
- Search English Wikipedia for pages that seem to exhort remembrance: "remember that", "recall that", "remember to". Refine the search to exclude repetitious exhortations arising from pages that appear in the initial search results only because of certain passages of standard Wikipedia boilerplate (e.g., "Remember that Wikipedia is not a dictionary"). Parse the titles of relevant pages from the DOM and save them to several JSON files, one file per type of exhortation. All of this may be carried out more or less manually in your web browser.
- Download the multistream XML dump of English Wikipedia and its accompanying index file. This too may be carried out manually in your web browser.
- Use the index file to locate and selectively decompress the previously identified pages of interest within the dump. Parse the wikitext of each page using mwparserfromhell and split the resulting plaintext into sentences using nltk. Use regular expressions to identify exhortations and scoop them as cleanly as possible from their surrounding context into a JSON file: seedphrases.json. All of this may be carried out by remember.py.
- Iterate over seedphrases.json and select exhortations to poemify via Savelost. Exclude dubious exhortations, e.g., the excessively short or long; those punctuated by angle brackets likely indicative of XML mishaps. For embedding comparisons, employ sentence_transformers and your favorite text embedding model. Then write each poem to a text file in the poems directory. All of this may be carried out by forget.py.
- Iterate over the text files in the poems directory and identify poems worthy of collection. Render each collected poem as HTML and swap the full HTML of all collected poems into the {{poems}} substitution point in template.html, yielding an output book file: index.html. All of this may be carried out by recollect.py.
- Manually redact and adjust to taste.
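The third step is the fiddliest, so here is a rough sketch of its two ends: pulling a single bz2 stream out of the multistream dump, and scooping exhortations out of sentences. It is not remember.py. The function names and the EXHORTATION pattern are hypothetical, the wikitext parsing and sentence splitting in between are left out, and the index is assumed to be already decompressed into lines of the form offset:page_id:title.

```python
import bz2
import re
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical pattern: keep everything from the exhorting phrase to the end of the sentence.
EXHORTATION = re.compile(r"\b(?:remember that|recall that|remember to)\b.*", re.IGNORECASE)


def find_stream_ranges(
    index_lines: Iterable[str], wanted_titles: Set[str]
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map each wanted title to the (start, end) byte range of the stream holding it.

    Many pages share one stream, so the end of a stream is taken to be the
    next distinct offset in the index; it is None for the last stream.
    """
    offsets = set()
    found = {}
    for line in index_lines:
        # Titles may themselves contain colons, so split at most twice.
        offset_str, _page_id, title = line.rstrip("\n").split(":", 2)
        offset = int(offset_str)
        offsets.add(offset)
        if title in wanted_titles:
            found[title] = offset
    ordered = sorted(offsets)
    next_offset = dict(zip(ordered, ordered[1:]))
    return {title: (start, next_offset.get(start)) for title, start in found.items()}


def read_stream(dump_path: str, start: int, end: Optional[int]) -> str:
    """Decompress the single bz2 stream beginning at byte offset `start`."""
    with open(dump_path, "rb") as dump:
        dump.seek(start)
        data = dump.read() if end is None else dump.read(end - start)
    # BZ2Decompressor stops at the end of the first stream and ignores trailing bytes.
    return bz2.BZ2Decompressor().decompress(data).decode("utf-8")


def extract_exhortations(sentences: Iterable[str]) -> List[str]:
    """Return the exhorting tail of every sentence that contains one."""
    hits = []
    for sentence in sentences:
        match = EXHORTATION.search(sentence)
        if match:
            hits.append(match.group(0).strip())
    return hits
```

Each decompressed stream is an XML fragment holding a batch of pages, so the page of interest still has to be picked out by its title before its wikitext goes to mwparserfromhell and nltk.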
Curatorial notes
This entry's coming in pretty hot: I originally had the idea for it in December 2023 (shortly after I sent in my previous NaNoGenMo entry, Whalequest), but I only rediscovered the idea accidentally while sifting backward through the entire years-long history of a particular Discord channel a few days ago. As a result, the poems included in Remecarving are curated a bit more distantly than they might be in a tighter, more focused chapbook. In the future, I may release a more polished iteration on the same concept, provided I'm able to scrounge together a bit more time and attention to spend on forgetting.
Soundtrack to composition: Twice Around the Sun.