Commit b29f69e

CU-869aa22g2 Add ElasticSearch bits from working_with_cogstack (#123)
* CU-869aa22g2: Add ES requirement * CU-869aa22g2: Add initial README * CU-869aa22g2: Add (slightly) converted credentials * CU-869aa22g2: Add cogstack module ported from WWC * CU-869aa22g2: Add search template notebook * CU-869aa22g2: Add ipython dependency (for cogstack and notebook) * CU-869aa22g2: Add tqdm dependency * CU-869aa22g2: Add pandas dependency * CU-869aa22g2: Add a few initial tests * CU-869aa22g2: Remove a bunch of extra whitespace * CU-869aa22g2: Add ruff dependecny * CU-869aa22g2: Run ruff on cogstack module * CU-869aa22g2: Move to primitives collections for type hinting * CU-869aa22g2: Some further linting changes * CU-869aa22g2: Refactor cogstack module to make a little more sense * CU-869aa22g2: Rename some methods for better descriptions * CU-869aa22g2: Rename a method name for better descriptions * CU-869aa22g2: Minor whitespace fix * CU-869aa22g2: Remove path add in notebook * CU-869aa22g2: Remove unused import from notebook * CU-869aa22g2: Import username and password from credentials in case they are neded * CU-869aa22g2: Add nbconvert dev-dependency * CU-869aa22g2: Add default indices to get fields for * CU-869aa22g2: Improve error handling (avoid hiding stack trace) * CU-869aa22g2: Add default indices in notebook example * CU-869aa22g2: Update progress bar handling during exception handling * CU-869aa22g2: Add default indices in notebook examples * CU-869aa22g2: Add data folder * CU-869aa22g2 Fix data folder in notebook * CU-869aa22g2: Add initial notebook tests * CU-869aa22g2: Simplify test slightly * CU-869aa22g2: Remove test-time debug output * CU-869aa22g2: Add assertion and removal of data file created by notebook * CU-869aa22g2: Add initial workflow * CU-869aa22g2: Fix workflow working directory * CU-869aa22g2: Add OpenSearch dependency * CU-869aa22g2: Allow OpenSearch to be used instead of ES * CU-869aa22g2: Add missing ES/OS helpers import * CU-869aa22g2: Fix typo in variable name * CU-869aa22g2: Fix test time mocking * 
CU-869aa22g2: Add minimal permissions to workflow * CU-869aa22g2: Increase flexibility of scanning * CU-869aa22g2: Fail upon too large a size when scanning * CU-869aa22g2: Increase flexibility of scrolling * CU-869aa22g2: Fail upon too large a size when scrolling * CU-869aa22g2: Increase flexibility when reading data with sorting * CU-869aa22g2: Fail upon too large a size when sorting * CU-869aa22g2: Handle index not found better * CU-869aa22g2: Improve bad request erro handling * CU-869aa22g2: Add some end to end tests * CU-869aa22g2: Remove debug file * CU-869aa22g2: Fix ES9 install for local tests * CU-869aa22g2: Remove unnecessary files * CU-869aa22g2: Make OS run on same port as ES for tests * Use OpenSearch when ES not available * CU-869aa22g2: Fix OS import * CU-869aa22g2: Add separate workflow for OS * CU-869aa22g2: Improve OS-based tests * CU-869aa22g2: Fix count for OS * CU-869aa22g2: Expand OS support * CU-869aa22g2: Fix included fields for OS * CU-869aa22g2: Update search for OS (timeout string vs number) * CU-869aa22g2: Fix some scrolling issues for OS * CU-869aa22g2: Make scroll more flexible with OS * Improve OS support when sorting * Fix count when doing scan * Fix typing when scanning * Fix some minor typing issues with progress bar * CU-869aa22g2: Remove credentials module * CU-869aa22g2: Begin moving to a optional ES/OS approach. 
Suggest what to install if nothing found at import time * CU-869aa22g2: Move to a pyproject.toml based package * CU-869aa22g2: Update Readme somewhat * CU-869aa22g2: Update python versions in CI (remove 3.9, add 3.13) * CU-869aa22g2: Update workflow to pyproject.toml based install * CU-869aa22g2: Simplify imports * CU-869aa22g2: Move to a folder structure for source * CU-869aa22g2: Expose class from package level * CU-869aa22g2: Add separate ES implementation * CU-869aa22g2: Add separate OS implementation * CU-869aa22g2: Separate OS and ES implementation and usage * CU-869aa22g2: Run mypy with OS and ES in workflow * CU-869aa22g2: Remove commented code * CU-869aa22g2: Add module import time exception if no back end available * CU-869aa22g2: Expose print_dataframe from package root * CU-869aa22g2: Update notebook examples to newer format * CU-869aa22g2: Update notebook again * CU-869aa22g2: Add search results folder * CU-869aa22g2: Update file naming in search template * CU-869aa22g2: Add module to read credentials from env values * CU-869aa22g2: Fix mocking and update paths in notebook tests * CU-869aa22g2: Update tests in line with recent changes * CU-869aa22g2: Update local tests in line with recent changes * CU-869aa22g2: Avoid specifying ports twice for OS * CU-869aa22g2: Fix small issue with search after for OS * CU-869aa22g2: Improve OS query in scan * CU-869aa22g2: Fix query kwarg in scan on OS * CU-869aa22g2: Fix source kwarg in scan on OS * CU-869aa22g2: Fix dupplicate args for search in OS * CU-869aa22g2: Fix sort on OS * CU-869aa22g2: Update workflow to check types nad lint for the correct folder * CU-869aa22g2: Whitespace change for test module * CU-869aa22g2: Update mocks to work with OS * CU-869aa22g2: Fix typo in OS class name * CU-869aa22g2: Fix OS mocking in NB tests * CU-869aa22g2: Expose read_from_env as package level method * CU-869aa22g2: Update notebook to expose username/password from env * CU-869aa22g2: Fix some tests for scan and OS * 
CU-869aa22g2: Fix setup mocks for OS * CU-869aa22g2: Add workflow to push to TestPyPI * CU-869aa22g2: Add permissions to push to TestPyPI to workflow * CU-869aa22g2: Add full on release workflow * CU-869aa22g2: Update credentials to use ID and API key and/or encoded values * CU-869aa22g2: Remove commented code * CU-869aa22g2: Add a few doc strings * CU-869aa22g2: Update readme with credentials details * CU-869aa22g2: Make version dynamic * CU-869aa22g2: Force use of setuptools_scm for dynamic versioning * CU-869aa22g2: Update pyproject.toml with versioning instructions * CU-869aa22g2: Update tests for changed environmental variable names * CU-869aa22g2: Specify version of OpenSearch for end to end tests
1 parent a83db7c commit b29f69e

19 files changed, +2668 -0 lines changed
Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+name: cogstack-es - Test
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    paths:
+      - 'cogstack-es/**'
+      - '.github/workflows/cogstack-es**'
+
+defaults:
+  run:
+    working-directory: ./cogstack-es
+
+permissions:
+  id-token: write
+
+jobs:
+  types-only-with-ES-and-OS:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    strategy:
+      matrix:
+        python-version: [ '3.10', '3.11', '3.12', '3.13' ]
+      max-parallel: 4
+
+    steps:
+      - uses: actions/checkout@v5
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install ".[dev,OS,ES9]"
+      - name: Check types
+        run: |
+          python -m mypy --follow-imports=normal src/cogstack
+
+  types-lint-tests:
+    runs-on: ubuntu-latest
+    needs: types-only-with-ES-and-OS
+    permissions:
+      contents: read
+    strategy:
+      matrix:
+        python-version: [ '3.10', '3.11', '3.12', '3.13' ]
+        install-target: [ "ES9", "ES8", "OS", ]
+      max-parallel: 4
+
+    steps:
+      - uses: actions/checkout@v5
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install ".[dev,${{ matrix.install-target }}]"
+      - name: Lint
+        run: |
+          ruff check src/cogstack
+      - name: Test
+        run: |
+          pytest tests
+
+  publish-to-test-PyPI:
+    runs-on: ubuntu-latest
+    needs: types-lint-tests
+    steps:
+      - name: Checkout main
+        uses: actions/checkout@v5
+        with:
+          fetch-depth: 0 # fetch all history
+          fetch-tags: true # fetch tags explicitly
+
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: '3.10'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install --upgrade build
+
+      - name: Set timestamp-based dev version
+        run: |
+          TS=$(date -u +"%Y%m%d%H%M%S")
+          echo "SETUPTOOLS_SCM_PRETEND_VERSION_FOR_COGSTACK_ES=0.1.1.dev${TS}" >> $GITHUB_ENV
+
+      - name: Install package in development mode
+        run: |
+          pip install -e ".[dev,ES9,OS]"
+
+      - name: Build package
+        run: |
+          python -m build
+
+      - name: Publish distribution to TestPyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          repository_url: https://test.pypi.org/legacy/
+          packages_dir: cogstack-es/dist
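
The "Set timestamp-based dev version" step in the workflow above builds a version string from a UTC timestamp and hands it to setuptools_scm through an environment variable. As a rough illustration only, the same string can be produced in Python. The helper name `dev_version` is hypothetical and is not part of this commit; the base version `0.1.1` is taken from the workflow.

```python
from datetime import datetime, timezone


def dev_version(base: str, now: datetime) -> str:
    """Return '<base>.dev<YYYYmmddHHMMSS>' using the UTC time of `now`.

    Hypothetical helper mirroring the shell step
    `TS=$(date -u +"%Y%m%d%H%M%S")` from the workflow.
    """
    timestamp = now.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{base}.dev{timestamp}"


if __name__ == "__main__":
    # Prints something like 0.1.1.dev20250102030405, depending on the clock.
    print(dev_version("0.1.1", datetime.now(timezone.utc)))
```

Because the timestamp only grows, each TestPyPI upload gets a new, higher dev version, which avoids re-uploading a version number that already exists.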
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+name: cogstack-es release-build
+
+on:
+  push:
+    tags:
+      - 'cogstack-es/v*.*.*'
+
+permissions:
+  id-token: write
+
+defaults:
+  run:
+    working-directory: ./cogstack-es
+
+jobs:
+  test-and-publish-to-PyPI:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout main
+        uses: actions/checkout@v5
+
+      - name: Release Tag
+        # If GITHUB_REF=refs/tags/cogstack-es/v0.1.2, this returns v0.1.2. Note it's including the "v" though it probably shouldnt
+        run: echo "RELEASE_VERSION=${GITHUB_REF##refs/*/}" >> $GITHUB_ENV
+
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: '3.10'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install --upgrade build
+
+      - name: Install client package in development mode
+        run: |
+          pip install -e ".[dev,ES9,OS]"
+
+      - name: Test
+        run: |
+          pytest tests
+
+      - name: Build client package
+        run: |
+          python -m build
+
+      - name: Publish production distribution to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          packages_dir: cogstack-es/dist
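
The "Release Tag" step above relies on the shell expansion `${GITHUB_REF##refs/*/}`, which removes the longest prefix matching `refs/*/`. The sketch below reproduces that behaviour in Python for illustration. Both function names are hypothetical and not part of this commit. The second helper shows one way to drop the leading `v` that the workflow comment mentions.

```python
def strip_ref_prefix(ref: str) -> str:
    """Mimic ${GITHUB_REF##refs/*/}: drop the longest prefix matching 'refs/*/'.

    The shell glob '*' also matches '/', so the longest match ends at the
    last '/' in the string. Hypothetical helper, for illustration only.
    """
    if not ref.startswith("refs/"):
        return ref
    last_slash = ref.rfind("/")
    if last_slash < len("refs/"):
        # No further '/' after 'refs/': the pattern does not match.
        return ref
    return ref[last_slash + 1:]


def release_version(ref: str) -> str:
    """Return the tag's version without a leading 'v' (an assumed refinement)."""
    return strip_ref_prefix(ref).removeprefix("v")


if __name__ == "__main__":
    print(strip_ref_prefix("refs/tags/cogstack-es/v0.1.2"))
    print(release_version("refs/tags/cogstack-es/v0.1.2"))
```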

cogstack-es/ReadMe.md

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
2+
# Login and search
3+
This project is responsible for logging in and performing a search for Elasticsearch or Opensearch.
4+
5+
# Installation
6+
7+
This package is distributed through PyPI and can be installed using one of:
8+
```
9+
pip install cogstack-es[ES9] # For Elasticsearch 9
10+
pip install cogstack-es[ES8] # For Elasticsearch 8
11+
pip install cogstack-es[OS] # For Opensearch
12+
```
13+
14+
PS:
15+
After installation, the import still remains `import cogstack` even though the installed package is called `cogstack-es`.
16+
17+
## Login details
18+
You need to get your login details and host from your administrator.
19+
This is usually an API key.
20+
There is also a mechanism for reading hosts and credentials from environmental variables:
21+
```python
22+
from cogstack import read_from_env, CogStack
23+
hosts, api_key, (username, password) = read_from_env()
24+
# subsequently use one of
25+
cs = CogStack.with_api_key_auth(hosts=hosts, api_key=api_key)
26+
#cs = CogStack.with_basic_auth(hosts=hosts, username=username, password=password)
27+
```
28+
The `read_from_env` method will read the data from the following environmental variables:
29+
30+
| Environmetnal variable name | Description | Example value |
31+
| ------------------------------ | ----------------------------------- | ------------- |
32+
| `COGSTACK_HOSTS` | The host addresses, comma separated | `http://localhost:9200,http://localhost:9201` |
33+
| `COGSTACK_USERNAME` | The username for basic auth | `user123` |
34+
| `COGSTACK_PASSWORD` | The password for basic auth | `sup3rsecur3-pw#946` |
35+
| `COGSTACK_API_KEY_ID` | The API key ID for authentiaction | `l0cGtvtlw1lbsyClOm6w` |
36+
| `COGSTACK_API_KEY` | The unencoded API key for authentiaction with the ID | `I01NJf4Z6yvXyXThh1676g` |
37+
| `COGSTACK_API_KEY_ENCODED` | The encoded API key for authentiaction with just the API key | `ZZpwMtW3ky6Tw9KEtfavVzTP0JcrC7iLnVf7zXbqAh70A15VKJwHd5YX3J==` |
38+
39+
40+
__Note__: If these fields are left blank then the user will be prompted to enter the details themselves.
41+
42+
If you are unsure about the above information please contact your CogStack system administrator.
43+
44+
## How to build a Search query
45+
46+
A core component of cogstack is Elasticsearch which is a search engine built on top of Apache Lucene.
47+
48+
Lucene has a custom query syntax for querying its indexes (Lucene Query Syntax). This query syntax allows for features such as Keyword matching, Wildcard matching, Regular expression, Proximity matching, Range searches.
49+
50+
Full documentation for this syntax is available as part of Elasticsearch [query string syntax](https://www.elastic.co/guide/en/elasticsearch/reference/8.5/query-dsl-query-string-query.html#query-string-syntax).
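
To make the query features listed in the ReadMe more concrete, here is a small sketch of Lucene query strings wrapped in a standard Elasticsearch `query_string` search body. The field names (`body`, `age`), the search terms and the helper `build_search_body` are assumptions made for illustration. None of them come from this commit, and the sketch does not show how the `cogstack` module itself builds queries.

```python
# One illustrative query string per feature named in the ReadMe.
# Field names and terms are hypothetical.
EXAMPLE_QUERIES = {
    "keyword": 'body:"heart failure"',    # exact phrase match
    "wildcard": "body:diabet*",           # any term starting with "diabet"
    "regex": "body:/metform[iy]n/",       # regular expression between slashes
    "proximity": 'body:"chest pain"~5',   # both words within 5 positions
    "range": "age:[18 TO 65]",            # inclusive range
}


def build_search_body(query_string: str, size: int = 10) -> dict:
    """Wrap a Lucene query string in an Elasticsearch `query_string` query.

    Hypothetical helper; the resulting dict follows the documented
    Elasticsearch Query DSL layout.
    """
    return {
        "size": size,
        "query": {"query_string": {"query": query_string}},
    }


if __name__ == "__main__":
    for feature, query in EXAMPLE_QUERIES.items():
        print(feature, build_search_body(query))
```

A body like this could be passed to whichever search call is in use; which parameters the client accepts depends on the installed Elasticsearch or OpenSearch version.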

cogstack-es/data/.keep

Whitespace-only changes.

cogstack-es/data/cogstack_search_results/.keep

Whitespace-only changes.

cogstack-es/pyproject.toml

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
[build-system]
2+
requires = ["setuptools>=61.0", "wheel", "setuptools_scm>=8"]
3+
build-backend = "setuptools.build_meta"
4+
5+
[project]
6+
name = "cogstack-es"
7+
dynamic = ["version"]
8+
description = "ElasticSearch or OpenSearch wrapper for CogStack deployments"
9+
readme = "ReadMe.md"
10+
authors = [{ name = "Mart Ratas", email = "[email protected]" }]
11+
license = { text = "Apache-2.0" }
12+
requires-python = ">=3.10"
13+
classifiers = [
14+
"Development Status :: 3 - Alpha",
15+
"Intended Audience :: Science/Research",
16+
"Topic :: Scientific/Engineering :: Artificial Intelligence",
17+
"Programming Language :: Python :: 3",
18+
"Programming Language :: Python :: 3.10",
19+
"Programming Language :: Python :: 3.11",
20+
"Programming Language :: Python :: 3.12",
21+
"Programming Language :: Python :: 3.13",
22+
"License :: OSI Approved :: Apache Software License"
23+
]
24+
25+
dependencies = [
26+
"tqdm>=4.64,<5.0",
27+
"pandas>=2.2,<3.0",
28+
"ipython",
29+
]
30+
31+
[project.optional-dependencies]
32+
dev = [
33+
"mypy",
34+
"pandas-stubs",
35+
"types-tqdm",
36+
"pytest",
37+
"ruff",
38+
"nbconvert",
39+
]
40+
ES8 = [
41+
"elasticsearch>=8.0.0,<9.0",
42+
]
43+
ES9 = [
44+
"elasticsearch>=9.0.0,<10.0",
45+
]
46+
OS = [
47+
"opensearch-py>=2.0.0,<3.0",
48+
]
49+
50+
[tool.setuptools]
51+
package-dir = {"" = "src"}
52+
53+
[project.urls]
54+
Homepage = "https://github.com/CogStack/cogstack-nlp/tree/main/cogstack-es"
55+
Repository = "https://github.com/CogStack/cogstack-nlp/tree/main/cogstack-es"
56+
Issues = "https://github.com/CogStack/cogstack-nlp/issues"
57+
58+
[tool.setuptools_scm]
59+
root = ".."
60+
tag_regex = "^cogstack-es/v(?P<version>[0-9]+(?:\\.[0-9]+)*)$"
61+
fallback_version = "0.1.0.dev0"
