
Commit 228f27f

Add Elasticsearch generator
1 parent f2587af commit 228f27f

17 files changed: 1310 additions and 3 deletions

.gitignore

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+# Created by https://www.gitignore.io/api/python
+
+### Python ###
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*,cover
+.hypothesis/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# dotenv
+.env
+
+# virtualenv
+.venv
+venv/
+ENV/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+.idea
+
+# End of https://www.gitignore.io/api/python

.travis.yml

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ install:
 - "pip install ."
 - "pip3 install beautifulsoup4 Sickle validators"
 - "pip3 install requests-mock"
+- "pip3 install 'elasticsearch>=1.0.0,<2.0.0'"
+- "pip3 install urllib3-mock"
 script:
 - python setup.py test
 - python setup.py sdist
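
The added install line pins the Elasticsearch Python client to the 1.x series. As a rough sketch (not part of this commit), a script could confirm at runtime that an installed client falls inside that range. The helper name `client_in_pinned_range` is hypothetical, and the commented usage assumes the client exposes its version as a tuple in `elasticsearch.__version__`.

```python
def client_in_pinned_range(version, lower=(1, 0, 0), upper=(2, 0, 0)):
    """Return True if lower <= version < upper, mirroring '>=1.0.0,<2.0.0'."""
    # Pad short versions so that (1, 7) compares like (1, 7, 0).
    padded = tuple(version) + (0,) * (3 - len(version))
    # Compare only major, minor and patch.
    return lower <= padded[:3] < upper


# Assumed usage, with the client installed:
# import elasticsearch
# if not client_in_pinned_range(elasticsearch.__version__):
#     raise RuntimeError("unsupported elasticsearch client version")
```

Tuple comparison is used here to avoid a third-party version parser; it only handles purely numeric version components.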

README.md

Lines changed: 77 additions & 1 deletion
@@ -1,4 +1,4 @@
-# py-resourcesync ![Build Status](https://travis-ci.org/resourcesync/py-resourcesync.svg?branch=master)
+# py-resourcesync [![Build Status](https://travis-ci.org/resourcesync/py-resourcesync.svg?branch=master)](https://travis-ci.org/resourcesync/py-resourcesync)
 
 
 Core Python library for ResourceSync publishing
@@ -302,3 +302,79 @@ There are a few things to be aware of:
 4. metadata_disseminator is only required when the metadata_element field contains a record identifier that requires a base URL before it can be resolved. If metadata_element contains full URLs, leave metadata_disseminator blank.
 5. solr_query can be as simple as a wildcard or can reference a collection, specify a date range, etc. Refer to the documentation for the version of Solr you are running or use Solr admin to generate and test your query.
 6. You will need to change part of the solr_params value so that the value for the Solr sort parameter corresponds to an indexed field in your instance of Solr.
+
+## Elasticsearch generator
+
+The [Elasticsearch generator](resourcesync/generators/elastic_generator.py) allows flexible use of [Elasticsearch](https://www.elastic.co/) to keep track of the state of ResourceSync resources.
+Data regarding resources and their changes must be recorded in Elasticsearch according to the protocol defined in the [documentation](resourcesync/generators/elastic/README.md).
+The code snippets below use filesystem paths suited to institutions that serve their ResourceSync documents with `httpd`.
+
+**WARNING**: this generator has been tested on Elasticsearch v1.7.5. Later versions may require different mappings
+and queries.
+
+### Installation
+
+In addition to the setup instructions [above](#installation-from-source), install the Elasticsearch client:
+```bash
+$ pip3 install 'elasticsearch>=1.0.0,<2.0.0'
+```
+
+
+#### Testing
+
+To run the tests for this generator, also install `urllib3-mock`:
+
+```bash
+$ pip3 install urllib3-mock
+```
+
+### Usage
+
+A directory must exist at the path specified by `resource_dir`. For `httpd`:
+
+```bash
+$ mkdir /var/www/html/resourcesync/ # create a place for the ResourceSync documents
+```
+
+Then, with Python:
+
+```python
+from resourcesync.resourcesync import ResourceSync
+from resourcesync.generators.elastic_generator import ElasticGenerator
+
+httpd_document_root = '/var/www/html'
+resource_dir = 'resourcesync'
+collection_name = 'foo-set'
+resourcesync_url = 'http://your-resourcesync-server.edu'
+
+url_prefix = '{}/{}'.format(resourcesync_url, resource_dir)
+strategy = 0
+max_items_in_list = 1000
+
+params = {
+    "resource_set": "foo-set",
+    "resource_root_dir": "/resources/root/directory",
+    "elastic_host": "localhost",
+    "elastic_port": 9200,
+    "elastic_index": "resync-index",
+    "elastic_resource_doc_type": "resource",
+    "elastic_change_doc_type": "change",
+    "strategy": strategy,
+    "url_prefix": url_prefix,
+    "max_items_in_list": max_items_in_list
+}
+
+
+my_generator = ElasticGenerator(params=params)
+
+rs = ResourceSync(generator=my_generator,
+                  strategy=strategy,
+                  resource_dir='{}/{}'.format(httpd_document_root, resource_dir),
+                  metadata_dir=collection_name,
+                  description_dir=httpd_document_root,
+                  url_prefix=url_prefix,
+                  max_items_in_list=max_items_in_list,
+                  is_saving_sitemaps=True)
+rs.execute()
+```
+NOTE: further details on the generator's parameters are available in the [documentation](resourcesync/generators/elastic/README.md).
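
Two details in the usage snippet are easy to get wrong: `url_prefix` is meant to be a plain URL string, and the directory named by `resource_dir` must exist before the generator runs. The helpers below are a minimal sketch of handling both up front. `build_url_prefix` and `ensure_resource_dir` are hypothetical names introduced for illustration; they are not part of py-resourcesync.

```python
import os


def build_url_prefix(base_url, resource_dir):
    """Join a base URL and a directory name with exactly one slash."""
    # Strip stray slashes so 'http://host/' + '/dir/' does not double up.
    return '{}/{}'.format(base_url.rstrip('/'), resource_dir.strip('/'))


def ensure_resource_dir(document_root, resource_dir):
    """Create document_root/resource_dir if it is missing; return its path."""
    path = os.path.join(document_root, resource_dir)
    # exist_ok=True makes repeated runs harmless (Python 3.2+).
    os.makedirs(path, exist_ok=True)
    return path
```

With these in place, the values passed to `ElasticGenerator` and `ResourceSync` could be derived as `url_prefix = build_url_prefix(resourcesync_url, resource_dir)` and `resource_dir=ensure_resource_dir(httpd_document_root, resource_dir)`, assuming the process has write access to the document root.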
