Releases: globalbioticinteractions/elton
0.14.3
Features
n/a
Improvements
- upgrade to globi-lib v0.27.6; include default interaction type mappings in provenance graph; preparation for #61
- support config override for [preston stream]; related to #61 and Big-Bee-Network/bif#11, to allow setting custom, reusable interaction type mappings when stream processing (possibly large, GBIF-sized) collections of datasets to extract species interaction claims.
With this, you can do:
preston track "https://api.gbif.org/v1/dataset/d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0" \
 | elton stream \
 --data-dir data \
 --config hash://sha256/02682fdd62a3e985dc06236662299f00ec5453c4e6f707d02efa93628f927649 \
 --remote https://softwareheritage.org \
 | head -2 \
 | mlr --itsvlite --oxtab cat
producing
argumentTypeId https://en.wiktionary.org/wiki/support
sourceOccurrenceId 4a98a813-85bc-41fc-bca4-0c9937ba6966
sourceCatalogNumber UCSB-IZC 00000304
sourceCollectionCode IZC
sourceCollectionId b03a3f0c-bfa5-4e02-b5d3-56ff38626302
sourceInstitutionCode UCSB
sourceTaxonId 82680
sourceTaxonName Hemiptera
sourceTaxonRank
sourceTaxonPathIds
sourceTaxonPath Animalia | Arthropoda | Insecta
sourceTaxonPathNames kingdom | phylum | class
sourceBodyPartId
sourceBodyPartName
sourceLifeStageId
sourceLifeStageName
sourceSexId
sourceSexName
interactionTypeId http://purl.obolibrary.org/obo/RO_0002437
interactionTypeName interactsWith
targetOccurrenceId
targetCatalogNumber
targetCollectionCode
targetCollectionId
targetInstitutionCode
targetTaxonId
targetTaxonName Calandrinia menziesii
targetTaxonRank
targetTaxonPathIds
targetTaxonPath
targetTaxonPathNames
targetBodyPartId
targetBodyPartName
targetLifeStageId
targetLifeStageName
targetSexId
targetSexName
basisOfRecordId
basisOfRecordName PreservedSpecimen
http://rs.tdwg.org/dwc/terms/eventDate 2017-03-17T00:00:00Z
decimalLatitude 34.4095
decimalLongitude -119.8491
localityId
localityName UCSB Campus Lagoon Island
referenceDoi
referenceUrl https://ecdysis.org/collections/individual/index.php?occid=826935
referenceCitation UCSB-IZC 00000304 https://ecdysis.org/collections/individual/index.php?occid=826935
namespace local
citation University of California Santa Barbara Invertebrate Zoology Collection. 2025-02-03. Ecdysis Portal - 00efd88c-9458-4860-acfd-eaa043f77b7c.
archiveURI hash://sha256/14289a70968588a29f8e566053c4509e0784bded38b6ab7172569ac7ceb7cae7
lastSeenAt 2025-02-07T00:54:18.471Z
contentHash
eltonVersion 0.14.3-SNAPSHOT
Where
preston track "https://api.gbif.org/v1/dataset/d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"
tracks the DwC-A associated with the UCSB-IZC GBIF registration at https://www.gbif.org/dataset/d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0 .
elton stream --data-dir data --config hash://sha256/02682fdd62a3e985dc06236662299f00ec5453c4e6f707d02efa93628f927649 --remote https://softwareheritage.org
takes the provenance logs (or manifests) that describe UCSB-IZC and attempts to extract interaction data from them.
--config
points to a specific configuration (i.e., globi.json), identified by hash://sha256/026..., of which copies exist in https://github.com/jhpoelen/ucsb-izc-config and in the https://softwareheritage.org archive. In this case, if the referenced configuration is not available locally (in the data/ folder), it is retrieved from the https://softwareheritage.org archives.
head -2 | mlr --itsvlite --oxtab cat
takes the first two rows (the header and one line of data) and presents them in the vertically oriented XTAB format using the mlr tool.
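To make the last step concrete, here is a minimal sketch of what the header-plus-one-row transposition does, written with awk for cases where mlr is not installed. The function name tsv_first_row_vertical and the sample rows are made up for illustration, and the output is only an approximation: mlr aligns values into columns, while this sketch separates name and value with a single space.

```shell
# Rough stand-in for: head -2 | mlr --itsvlite --oxtab cat
# Reads TSV on stdin and prints "name value" for each column of the first data row.
# tsv_first_row_vertical is a hypothetical helper, not part of elton, preston, or mlr.
tsv_first_row_vertical() {
  head -2 | awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; n = NF }   # remember header names
    NR == 2 { for (i = 1; i <= n; i++) print name[i] " " $i }    # pair each name with its value
  '
}

# Hypothetical sample input reusing three column names from the output shown above.
printf 'sourceTaxonName\tinteractionTypeName\ttargetTaxonName\nHemiptera\tinteractsWith\tCalandrinia menziesii\n' \
  | tsv_first_row_vertical
```

Columns with no value in the data row are printed as the name followed by a trailing space, which mirrors the many empty fields in the record above.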
Bugs
n/a
0.14.2
Features
- support remote repositories for stream processing to facilitate the review of species interaction claims in large data corpora (e.g., GBIF, iDigBio) on regular hardware (e.g., a laptop) #52 globalbioticinteractions/globalbioticinteractions#1030 fyi @seltmann @zedomel
Example 1. Extract all interaction claims found in the GIB (GBIF, iDigBio, BioCASe, see https://linker.bio#use-case-3-studying-pine-pests-caused-by-weevils-curculionoidea ) corpus as seen on 2024-04-01 and described by https://linker.bio/hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
GIB_VERSION=hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
CONTENT_REPO=https://linker.bio
preston cat --remote $CONTENT_REPO $GIB_VERSION --no-cache\
| elton stream --data-dir data --remote $CONTENT_REPO --no-cache
Example 2. Review interaction claims found in the GIB (GBIF, iDigBio, BioCASe, see https://linker.bio#use-case-3-studying-pine-pests-caused-by-weevils-curculionoidea ) corpus as seen on 2024-04-01 and described by https://linker.bio/hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
GIB_VERSION=hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2
CONTENT_REPO=https://linker.bio
preston cat --remote $CONTENT_REPO $GIB_VERSION --no-cache\
| elton stream --data-dir data --remote $CONTENT_REPO --record-type review --no-cache
Note that both Example 1 and Example 2 stream content provided by https://linker.bio . If you'd like to keep the content (much more than a GiB), remove the --no-cache option and you'll have a copy of a large corpus of biodiversity data available for reproducible offline processing after an initial "sync/pull" from https://linker.bio .
Improvements
globalbioticinteractions/globalbioticinteractions#1030
Bugs
0.14.1
Features
- support stream processing of rdf/quads towards #52 and globalbioticinteractions/globalbioticinteractions#1030
- introduce elton tee to copy resources described in an rdf/n-quads stream into a preston-compatible, content-addressed data dir.
- introduce --prov-mode to let elton commands (e.g., elton prov, elton interactions, elton names, elton review, elton stream, elton nanopubs, and elton ls --online) detail the processing methods and their inputs/outputs in a machine-readable rdf/n-quads stream.
Example 1 - append a tracked interaction dataset and its dependencies to a preston archive.
elton track --prov-mode globalbioticinteractions/template-dataset\
| elton tee\
| preston append\
| tail -1
yielding
<urn:uuid:76ae2794-b9a2-4a27-b235-927377d77370> <http://www.w3.org/ns/prov#endedAtTime> "2025-01-07T23:53:04.402Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <urn:uuid:76ae2794-b9a2-4a27-b235-927377d77370> .
Example 2 - generate an interaction table from the latest version of a preston archive generated via Example 1.
preston head\
| preston cat\
| elton stream --record-type interaction --data-dir data\
| mlr --itsvlite --oxtab cat\
| tail
yielding
localityName
referenceDoi 10.1007/s13127-011-0039-1
referenceUrl https://doi.org/10.1007/s13127-011-0039-1
referenceCitation Gittenberger, A., Gittenberger, E. (2011). Cryptic, adaptive radiation of endoparasitic snails: sibling species of Leptoconchus (Gastropoda: Coralliophilidae) in corals. Org Divers Evol, 11(1), 21–41. doi:10.1007/s13127-011-0039-1
namespace globalbioticinteractions/template-dataset
citation Jorrit H. Poelen. 2014. Species associations manually extracted from literature.
archiveURI hash://sha256/5b4ee64e7384bdf3d75b1d6617edd5d82124567b4ec52b47920ea332837ff060
lastSeenAt 2025-01-07T23:55:26.792Z
contentHash
eltonVersion 0.14.0-SNAPSHOT
Example 3 - generate a review report of a tracked dataset, append its inputs (datasets) and outputs (review table) to a preston archive, then save the review report to a file review.tsv .
elton track --prov-mode globalbioticinteractions/template-dataset\
| elton tee\
| preston append\
| elton stream --record-type review --data-dir data > review.tsv
Improvements
- upgrade to globi-lib v0.27.0 to help improve elton <> preston integration
- upgrade to preston v0.10.2
- allow for separate configuration of work-dir, prov-dir and data-dir.
- reduce repeated cache updates; use hash uris to avoid leaking local paths in prov logs; related to globalbioticinteractions/globalbioticinteractions#1030 #52
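As background on the hash uris mentioned above: a hash://sha256/... identifier is derived from the SHA-256 digest of the content bytes, so it stays the same wherever the file is stored. The sketch below shows one way to compute such an identifier locally. It assumes GNU coreutils' sha256sum is available, and content_hash_uri is a hypothetical helper name, not an elton or preston command.

```shell
# Derive a hash://sha256/... URI from the bytes of a file, or from stdin when no file is given.
# content_hash_uri is a hypothetical helper; assumes sha256sum (GNU coreutils) is on the PATH.
content_hash_uri() {
  local digest
  # sha256sum prints "<hex digest>  <name>"; keep only the digest
  digest="$(sha256sum "${1:--}" | cut -d' ' -f1)"
  printf 'hash://sha256/%s\n' "$digest"
}

# Example: identifier for a short piece of content read from stdin.
printf 'hello\n' | content_hash_uri
```

Comparing the printed identifier against a known hash://sha256/... value is a quick way to check that a local copy matches the referenced content.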
Bugs
0.13.9
Features
n/a
Improvements
- upgrade to globi-lib v0.26.6 to pick up the most recent datasets (or dataset configurations) deposited with Zenodo. globalbioticinteractions/globalbioticinteractions#1017
Bugs
n/a
0.13.8
Features
n/a
Improvements
- upgrade to globi-lib v0.26.5 to work towards addressing globalbioticinteractions/globalbioticinteractions#999
- make cache dir configurable; globalbioticinteractions/globalbioticinteractions#999
Bugs
n/a
0.13.7
Features
n/a
Improvements
- upgrade to globi-lib v0.26.4 to help address globalbioticinteractions/globalbioticinteractions#999
- add ability to do streaming reviews, in addition to streaming interaction/name records.
example of creating streaming reports -
using single line globi.json file:
{ "namespace": "hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf", "citation": "hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf", "format": "dwca", "url": "https://linker.bio/hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf" }
to stream review records via
cat globi.json | elton stream --record-type review > review.tsv
Bugs
n/a
0.13.6
Features
- introduce [elton stream] to help stream all interactions from a versioned GBIF/iDigBio graph Big-Bee-Network/bif#1 and https://github.com/Big-Bee-Network/bif .
example usage:
using single line globi.json file:
{ "namespace": "hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf", "citation": "hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf", "format": "dwca", "url": "https://linker.bio/hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf" }
to do
cat globi.json | elton stream > interactions.tsv
Note that multi-line json would stream many datasets into the same interactions.tsv .
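Because each input line is treated as one dataset configuration, it helps to generate the configuration on a single line. The following sketch prints a one-line globi.json for a DwC-A identified by a content hash, mirroring the field layout of the example above. The function name make_globi_json and the default repository argument are assumptions made for illustration.

```shell
# Print a single-line globi.json for a DwC-A identified by a content hash.
# make_globi_json is a hypothetical helper; the fields mirror the example above.
# $1: hash URI of the archive; $2: content repository (defaults to https://linker.bio)
make_globi_json() {
  local hash_uri="$1" repo="${2:-https://linker.bio}"
  printf '{ "namespace": "%s", "citation": "%s", "format": "dwca", "url": "%s/%s" }\n' \
    "$hash_uri" "$hash_uri" "$repo" "$hash_uri"
}

# Example: reproduce the configuration shown above and confirm it is one line long.
make_globi_json "hash://sha256/9cd053d40ef148e16389982ea16d724063b82567f7ba1799962670fc97876fbf" | wc -l
```

Redirecting the function's output to globi.json gives a file with the same content as the one used above. Appending further lines, one per dataset, would stream those datasets into the same output.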
Improvements
- upgrade to globi-lib v0.26.2
Bugs
n/a
0.13.4
Features
n/a
Improvements
- upgrade to globi-lib v0.26.0 related to globalbioticinteractions/globalbioticinteractions#982 for support of primaryKey/foreignKey relations across tables of an interaction dataset.
Bugs
n/a
0.13.3
Features
n/a
Improvements
- upgrade to preston 0.7.8; related to #52
- add "track" as alias for update/pull/sync
- Bump xalan:xalan from 2.7.2 to 2.7.3
- update to globi libs v0.25.17
Bugs
n/a