Commit 5d2024b

Add filter doi2cite (#178)
1 parent 7545935 commit 5d2024b

11 files changed, +700 -0 lines changed

doi2cite/Makefile

+22
@@ -0,0 +1,22 @@
DIFF ?= diff --strip-trailing-cr -u

test:
	@pandoc --lua-filter=doi2cite.lua --wrap=preserve --output=output.md sample1.md
	@$(DIFF) expected1.md output.md
	@rm -f output.md

expected1.md: sample1.md doi2cite.lua
	pandoc --lua-filter=doi2cite.lua --wrap=preserve --output $@ $<

expected1.pdf: sample1.md sample1.csl doi2cite.lua
	pandoc --lua-filter=doi2cite.lua --filter=pandoc-crossref --citeproc --csl=sample1.csl --output $@ $<

expected2.md: sample2.md doi2cite.lua
	pandoc --lua-filter=doi2cite.lua --wrap=preserve --output $@ $<

clean:
	@rm -f expected1.md
	@rm -f expected2.md
	@rm -f expected1.pdf

# clean creates no file of that name, so it is declared phony as well
.PHONY: test clean

doi2cite/README.md

+74
@@ -0,0 +1,74 @@
# pandoc-doi2cite
This pandoc Lua filter lets users insert references into a document
using DOI (Digital Object Identifier) tags. With this filter, users do
not need to write a BibTeX file by hand. Instead, the filter
automatically generates a .bib file from the DOI tags and converts the
DOI tags into citation keys that `--citeproc` can resolve.

<img src="https://user-images.githubusercontent.com/30950088/121386635-209e2300-c985-11eb-8b1d-8d941e29d98d.png" width="960">

The filter works as follows:

1. Search the document for citations with DOI tags
2. Look up the corresponding BibTeX data in the `__from_DOI.bib` file
3. If not found, fetch the BibTeX data for the DOI from
   http://api.crossref.org
4. Add the reference data to the `__from_DOI.bib` file
5. Check for duplicate reference keys
6. Replace the DOI tags with the corresponding citation keys
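
Steps 2 and 5 can be sketched in plain Lua, outside pandoc. This is only an illustration: the function names `parse_cache` and `unique_key` are hypothetical, while the string patterns and the `key_doi` suffix rule follow what `doi2cite.lua` appears to do.

```lua
-- Hypothetical helpers; only the patterns mirror doi2cite.lua.

-- Step 2: read cached entries and map each DOI to its citation key.
local function parse_cache(bibtex)
  local doi_to_key, used_keys = {}, {}
  -- An entry starts at "@" and ends with a line that holds only "}".
  for entry in bibtex:gmatch('@.-\n}\n') do
    local key = entry:match('@%w+{(.-),')
    local doi = entry:match('doi%s*=%s*["{]*(.-)["}],?')
    if key and doi then
      doi_to_key[doi] = key
      used_keys[key] = true
    end
  end
  return doi_to_key, used_keys
end

-- Step 5: if the key is already taken, append "_" and the DOI.
local function unique_key(key, doi, used_keys)
  if used_keys[key] then
    key = key .. "_" .. doi
  end
  used_keys[key] = true
  return key
end

local cache = '@article{Einstein_1905,\n'
  .. '  doi = {10.1002/andp.19053220607},\n'
  .. '  year = 1905\n'
  .. '}\n'
local doi_to_key, used_keys = parse_cache(cache)
print(doi_to_key["10.1002/andp.19053220607"])
print(unique_key("Einstein_1905", "10.1002/andp.19053220806", used_keys))
```

With the single cached entry above, the second call should yield a key of the same shape as the second and third entries in the sample `__from_DOI.bib`.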

# Prerequisites
- Pandoc version 2.0 or newer
- The filter has no external dependencies
- The filter should run before `pandoc-crossref` and `--citeproc`

# DOI tags
The following DOI tags can be used:

- @https://doi.org/
- @doi.org/
- @DOI:
- @doi:

The first one (@https://doi.org/) may be the most useful because it is
identical to the URL that resolves the DOI.
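
How a tagged citation id could be reduced to a bare DOI is sketched below. The function name `doi_from_citation_id` and the `PREFIXES` table are hypothetical; the prefixes are the four listed above, and the whitespace stripping, `%2F` decoding, and lowercasing follow the `Cite` function in `doi2cite.lua`.

```lua
-- Hypothetical standalone function; it does not depend on pandoc.
local PREFIXES = { "https://doi.org/", "doi.org/", "DOI:", "doi:" }

local function doi_from_citation_id(id)
  -- Drop whitespace and decode a URL-encoded slash.
  id = id:gsub('%s+', ''):gsub('%%2F', '/')
  for _, prefix in ipairs(PREFIXES) do
    if id:sub(1, #prefix) == prefix then
      -- DOIs are case-insensitive, so normalize to lower case.
      return id:sub(#prefix + 1):lower()
    end
  end
  return nil -- an ordinary citation key; leave it untouched
end

print(doi_from_citation_id("https://doi.org/10.1002/andp.19053220607"))
print(doi_from_citation_id("DOI:10.1002/ANDP.19053221004"))
print(doi_from_citation_id("LAEMMLI_1970"))
```

An id without one of the prefixes, such as `LAEMMLI_1970`, returns `nil`, which is why ordinary citation keys pass through the filter unchanged.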

# YAML header
The file **name** of the auto-generated bibliography file **MUST** be
`__from_DOI.bib`, but its **location** can be changed (e.g.
`'./refs/__from_DOI.bib'`, or `'refs\\__from_DOI.bib'` on Windows). You
can set the file path in the document's YAML header. The YAML key is
`bibliography`, which is also used by `--citeproc`.
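
The name-versus-location rule can be illustrated with a small Lua sketch. The names `basename` and `find_generated_bib` are hypothetical, and the single pattern used here is a simplification of the filter's own `split_path`, which picks one delimiter per path.

```lua
-- Hypothetical helpers illustrating the rule: any directory, fixed file name.
local BIBNAME = "__from_DOI.bib"

-- Return the last path component, accepting "/" or "\" as a separator.
local function basename(path)
  return path:match("([^/\\]+)$")
end

-- Return the first bibliography path whose file name is BIBNAME, or nil.
local function find_generated_bib(paths)
  for _, path in ipairs(paths) do
    if basename(path) == BIBNAME then
      return path
    end
  end
  return nil
end

print(find_generated_bib({ "my_refs.bib", "./refs/__from_DOI.bib" }))
print(find_generated_bib({ "refs\\__from_DOI.bib" }))
print(find_generated_bib({ "my_refs.bib" }))
```

When no listed path ends in `__from_DOI.bib`, the filter itself falls back to `__from_DOI.bib` in the working directory and prints a warning.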

# Example
example1.md:
```{.md}
---
bibliography:
- 'my_refs.bib'
- '__from_DOI.bib'
---

# Introduction
The Laemmli system is one of the most widely used gel systems for the
separation of proteins.[@LAEMMLI_1970] By the way, Einstein is genius.
[@https://doi.org/10.1002/andp.19053220607; @doi.org/10.1002/andp.19053220806; @doi:10.1002/andp.19053221004]
```

Example command 1 (.md -\> .md)

``` {.sh}
pandoc --lua-filter=doi2cite.lua --wrap=preserve \
  -s example1.md -o expected1.md
```

Example command 2 (.md -\> .pdf with
[ACS](https://pubs.acs.org/journal/jacsat) style):

``` {.sh}
pandoc --lua-filter=doi2cite.lua --filter=pandoc-crossref --citeproc \
  --csl=sample1.csl -s example1.md -o expected1.pdf
```

Example result
![expected1](https://user-images.githubusercontent.com/30950088/119964566-4d952200-bfe4-11eb-90d9-ed2366c639e8.png)

doi2cite/__from_DOI.bib

+36
@@ -0,0 +1,36 @@
@article{Einstein_1905,
    doi = {10.1002/andp.19053220607},
    url = {https://doi.org/10.1002%2Fandp.19053220607},
    year = 1905,
    publisher = {Wiley},
    volume = {322},
    number = {6},
    pages = {132--148},
    author = {A. Einstein},
    title = {Über einen die Erzeugung und Verwandlung des Lichtes betreffenden heuristischen Gesichtspunkt},
    journal = {Annalen der Physik}
}
@article{Einstein_1905_10.1002/andp.19053220806,
    doi = {10.1002/andp.19053220806},
    url = {https://doi.org/10.1002%2Fandp.19053220806},
    year = 1905,
    publisher = {Wiley},
    volume = {322},
    number = {8},
    pages = {549--560},
    author = {A. Einstein},
    title = {Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen},
    journal = {Annalen der Physik}
}
@article{Einstein_1905_10.1002/andp.19053221004,
    doi = {10.1002/andp.19053221004},
    url = {https://doi.org/10.1002%2Fandp.19053221004},
    year = 1905,
    publisher = {Wiley},
    volume = {322},
    number = {10},
    pages = {891--921},
    author = {A. Einstein},
    title = {Zur Elektrodynamik bewegter Körper},
    journal = {Annalen der Physik}
}

doi2cite/doi2cite.lua

+252
@@ -0,0 +1,252 @@
--------------------------------------------------------------------------------
-- Copyright © 2021 Takuro Hosomi
-- This library is free software; you can redistribute it and/or modify it
-- under the terms of the MIT license. See LICENSE for details.
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
-- Global variables --
--------------------------------------------------------------------------------
base_url = "http://api.crossref.org"

bibname = "__from_DOI.bib"
key_list = {};
doi_key_map = {};
doi_entry_map = {};
error_strs = {};
error_strs["Resource not found."] = 404
error_strs["No acceptable resource available."] = 406
error_strs["<html><body><h1>503 Service Unavailable</h1>\n"
           .."No server is available to handle this request.\n"
           .."</body></html>"] = 503


--------------------------------------------------------------------------------
-- Pandoc Functions --
--------------------------------------------------------------------------------
-- Get bibliography filepath from yaml metadata
function Meta(m)
    local bib_data = m.bibliography
    local bibpaths = get_paths_from(bib_data)
    -- bibpath stays global because Cite() appends to the same file later
    bibpath = find_filepath(bibname, bibpaths)
    bibpath = verify_path(bibpath)
    local f = io.open(bibpath, "r")
    if f then
        local entries_str = f:read('*all')
        if entries_str then
            doi_entry_map = get_doi_entry_map(entries_str)
            doi_key_map = get_doi_key_map(entries_str)
            for doi,key in pairs(doi_key_map) do
                key_list[key] = true
            end
        end
        f:close()
    else
        make_new_file(bibpath)
    end
end

-- Get bibtex data of doi-based citation.id and make bibliography.
-- Then, replace "citation.id"
function Cite(c)
    for _, citation in pairs(c.citations) do
        local id = citation.id:gsub('%s+', ''):gsub('%%2F', '/')
        -- doi is local so that a value from a previous citation cannot leak
        local doi = nil
        if id:sub(1,16) == "https://doi.org/" then
            doi = id:sub(17):lower()
        elseif id:sub(1,8) == "doi.org/" then
            doi = id:sub(9):lower()
        elseif id:sub(1,4) == "DOI:" or id:sub(1,4) == "doi:" then
            doi = id:sub(5):lower()
        end
        if doi then
            if doi_key_map[doi] then
                citation.id = doi_key_map[doi]
            else
                local entry_str = get_bibentry(doi)
                if entry_str == nil or error_strs[entry_str] then
                    print("Failed to get ref from DOI: " .. doi)
                else
                    entry_str = tex2raw(entry_str)
                    local entry_key = get_entrykey(entry_str)
                    -- On a key collision, make the key unique by appending the DOI
                    if key_list[entry_key] then
                        entry_key = entry_key.."_"..doi
                        entry_str = replace_entrykey(entry_str, entry_key)
                    end
                    key_list[entry_key] = true
                    doi_key_map[doi] = entry_key
                    citation.id = entry_key
                    local f = io.open(bibpath, "a+")
                    if f then
                        f:write(entry_str .. "\n")
                        f:close()
                    else
                        error("Unable to open file: "..bibpath)
                    end
                end
            end
        end
    end
    return c
end


--------------------------------------------------------------------------------
-- Common Functions --
--------------------------------------------------------------------------------
-- Get bib of DOI from http://api.crossref.org
function get_bibentry(doi)
    local entry_str = doi_entry_map[doi]
    if entry_str == nil then
        print("Request DOI: " .. doi)
        local url = base_url.."/works/"
            ..doi.."/transform/application/x-bibtex"
        -- mailto is not defined anywhere in this file; append it only when
        -- it has been set, because concatenating nil raises an error.
        if mailto then
            url = url.."?mailto="..mailto
        end
        local mt
        mt, entry_str = pandoc.mediabag.fetch(url)
    end
    return entry_str
end

-- Extract designated filepaths from 1 or 2 dimensional metadata
function get_paths_from(metadata)
    local filepaths = {};
    if metadata then
        -- Guard metadata[1] so that an empty value does not raise an error
        if metadata[1] and metadata[1].text then
            filepaths[metadata[1].text] = true
        elseif type(metadata) == "table" then
            for _, datum in pairs(metadata) do
                if datum[1] then
                    if datum[1].text then
                        filepaths[datum[1].text] = true
                    end
                end
            end
        end
    end
    return filepaths
end

-- Extract filename and dirname from a given path
function split_path(filepath)
    local delim = nil
    local len = filepath:len()
    local reversed = filepath:reverse()
    if filepath:find("/") then
        delim = "/"
    elseif filepath:find([[\]]) then
        delim = [[\]]
    else
        return {filename = filepath, dirname = nil}
    end
    local pos = reversed:find(delim)
    local dirname = filepath:sub(1, len - pos)
    local filename = reversed:sub(1, pos - 1):reverse()
    return {filename = filename, dirname = dirname}
end

-- Find the given filename in a filepath list and return the filepath if found
function find_filepath(filename, filepaths)
    for path, _ in pairs(filepaths) do
        -- Compare against the parameter instead of shadowing it
        local name = split_path(path)["filename"]
        if name == filename then
            return path
        end
    end
    return nil
end
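
-- Make some TeX descriptions processable by citeproc
function tex2raw(string)
    local symbols = {};
    -- Backslashes are doubled: "\t" in a Lua string literal is a tab.
    -- The replacement characters appear empty in this listing; the Unicode
    -- equivalents of the TeX macros are assumed here.
    symbols["{\\textendash}"] = "–"
    symbols["{\\textemdash}"] = "—"
    symbols["{\\textquoteright}"] = "’"
    symbols["{\\textquoteleft}"] = "‘"
    for tex, raw in pairs(symbols) do
        -- Assign back to the parameter; a new local here would be discarded
        string = string:gsub(tex, raw)
    end
    return string
end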

-- get bibtex entry key from bibtex entry string
function get_entrykey(entry_string)
    local key = entry_string:match('@%w+{(.-),') or ''
    return key
end

-- get bibtex entry doi from bibtex entry string
function get_entrydoi(entry_string)
    local doi = entry_string:match('doi%s*=%s*["{]*(.-)["}],?') or ''
    return doi
end

-- Replace entry key of "entry_string" to newkey
function replace_entrykey(entry_string, newkey)
    entry_string = entry_string:gsub('(@%w+{).-(,)', '%1'..newkey..'%2')
    return entry_string
end

-- Make hashmap which key = DOI, value = bibtex entry string
function get_doi_entry_map(bibtex_string)
    local entries = {};
    for entry_str in bibtex_string:gmatch('@.-\n}\n') do
        local doi = get_entrydoi(entry_str)
        entries[doi] = entry_str
    end
    return entries
end

-- Make hashmap which key = DOI, value = bibtex key string
function get_doi_key_map(bibtex_string)
    local keys = {};
    for entry_str in bibtex_string:gmatch('@.-\n}\n') do
        local doi = get_entrydoi(entry_str)
        local key = get_entrykey(entry_str)
        keys[doi] = key
    end
    return keys
end

-- function to make directories and files
function make_new_file(filepath)
    if filepath then
        print("doi2cite: creating "..filepath)
        local dirname = split_path(filepath)["dirname"]
        if dirname then
            -- Quote the directory so that names with spaces survive the shell
            os.execute('mkdir "'..dirname..'"')
        end
        local f = io.open(filepath, "w")
        if f then
            f:close()
        else
            -- Report the path that was actually passed in
            error("Unable to make bibtex file: "..filepath..".\n"
            .."This error may come from the missing directory. \n"
            )
        end
    end
end

-- Verify that the given filepath is correct.
-- Catch common Pandoc user mistakes about Windows-formatted filepath.
function verify_path(bibpath)
    if bibpath == nil then
        print("[WARNING] doi2cite: "
            .."The given file path is incorrect or empty. "
            .."In Windows-formatted filepath, Pandoc recognizes "
            .."double backslash ("..[[\\]]..") as the delimiters."
        )
        return "__from_DOI.bib"
    else
        return bibpath
    end
end

--------------------------------------------------------------------------------
-- The main function --
--------------------------------------------------------------------------------
return {
    { Meta = Meta },
    { Cite = Cite }
}

doi2cite/expected1.md

+4
@@ -0,0 +1,4 @@
# Introduction

The Laemmli system is one of the most widely used gel systems for the separation of proteins.[@LAEMMLI_1970]
By the way, Einstein is genius.[@Einstein_1905; @Einstein_1905_10.1002/andp.19053220806; @Einstein_1905_10.1002/andp.19053221004]

doi2cite/expected1.pdf

110 KB
Binary file not shown.

doi2cite/expected2.md

+3
@@ -0,0 +1,3 @@
# Introduction

People sometimes make mistakes.[@DOI:10.1002/THIS.IS.NOT.VALID.DOI.SAMPLE]
