Set up Workflow breakouts #41

Open
4 tasks
richard-jones opened this issue Jul 15, 2024 · 7 comments

Comments

@richard-jones
Collaborator

  • specify what we are going to achieve with the breakouts
  • identify stakeholders to attend
  • organise and run breakouts
  • collate and review results

Acceptance Criteria

List the criteria which must be met for the issue to be considered complete

  • A successful breakout session around Workflows organised and run
  • Documented outputs produced to inform Workflow activities
  • Subsequent issues raised on Workflow implementation
@richard-jones
Collaborator Author

@LalithaKambhammettu @npapantonis will identify the stakeholders to be involved in this.

@richard-jones to write a short summary of what we need to achieve by the end

@richard-jones
Collaborator Author

These are the points I would like to address during the breakouts:

  1. For each type of object going into the repository, what is its origin and route into the repository?
  2. What volume of incoming content are we expecting?
  3. Which stakeholders would be responsible for curating incoming content and looking after it long term?
  4. Are there any rules in the College that published content must meet/adhere to?
  5. What tasks need to be done to objects as they move from source to the repository?
  6. Are there any long-term curation workflows (e.g. preservation)?

@richard-jones
Collaborator Author

@richard-jones to review the original workflow documentation to get an idea of what is needed

@npapantonis
Collaborator

@richard-jones I am proposing Wednesday 11th September from 10:00 to 12:00 for this initial session. Please can you confirm whether this works for you all, so that I can send an invite out? I may not be able to attend, but I am working on Lalitha & Wayne's availability. I am also attaching the original drafts we created on Workflows.

FAIR_Data_ServiceRepository-Workflow_Ingest.pdf

FAIR_Data_ServiceRepository-Workflow_Access.pdf

@richard-jones
Collaborator Author

Notes from today's meeting:

  • We're focused primarily on datasets, not considering software at this stage
  • The MVP will focus on metadata-only records and open items with regular-sized files
  • Data model will need to support a "data location" field, which may point to, for example, Zenodo. This would be a "by reference" deposit for the files.
  • We considered the following record structures:
    • metadata only - all metadata public
    • metadata only - all metadata embargoed
    • metadata only - some metadata public, some private. Agreed that this would be difficult to implement reliably, and should not be supported. Would also be very rare.
  • During deposit we want to ensure that users get access to the appropriate auto-complete values: funders, related publications, etc. This could be done with integrations to college systems; it was noted that flows from the data lake could probably be provided fairly easily
  • Deposits will be researcher-led
  • For the MVP we do not need to consider automated deposit support (e.g. pulling data from other systems)
  • It was noted that many of the features of workflow in Invenio are tied to the communities feature, so we will look at the best way to utilise it, while also minimising its presence for the end users (see Remove/obscure communities feature #26)
  • Estimated volume of data is 5 - 10 submissions per day in due course; some will be metadata only, some full data
  • The standard review mechanism of a list of items to be reviewed should therefore be sufficient
  • DOIs are DataCite DOIs, which is the default in InvenioRDM
  • During submission, file checksums should be captured, which is the default in Invenio
  • Discussed the possibility of file format verification (DROID) and Archivematica integration: agreed that these were probably beyond the MVP
  • The possibility of linking to Symplectic was discussed as a way to keep metadata up to date
  • It would be useful to be able to add associations to metadata records post-submission (e.g. related items). A workflow to do this involving users is likely to be clunky and create effort for administrators. For MVP consider this to be an administrator-only activity
  • Embargo settings during deposit will need to support file-level embargoes and record-level embargoes, and both should be available on a single record (e.g. a record as a whole could be embargoed, then the metadata could be published, while one or more of the files remains in embargo).
  • Embargo will probably not be the underlying mechanism for sensitive data: they will be more like by-reference deposits, where access to the sensitive data is handled out-of-band
  • Depositors will access the system via SSO, and a set of rules for using AD records to determine deposit permissions will be identified and used to configure accounts
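
To make the embargo and "data location" points above concrete, here is a minimal sketch in plain Python. It is an illustration only: `Record`, `FileEntry`, `embargo_until` and `data_location` are hypothetical names chosen for this example, not InvenioRDM classes or fields, and the rule that a file is visible only once both its own embargo and the record's embargo have lifted is an assumption drawn from the example in the notes.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class FileEntry:
    # Hypothetical file entry, not an InvenioRDM class.
    key: str
    embargo_until: Optional[date] = None  # file-level embargo, if any


@dataclass
class Record:
    # Hypothetical record, not an InvenioRDM class.
    title: str
    embargo_until: Optional[date] = None  # record-level embargo (covers the metadata)
    data_location: Optional[str] = None   # set for "by reference" deposits
    files: list[FileEntry] = field(default_factory=list)


def _lifted(embargo_until: Optional[date], today: date) -> bool:
    """An embargo is lifted if it was never set or its end date has been reached."""
    return embargo_until is None or embargo_until <= today


def metadata_is_public(record: Record, today: date) -> bool:
    return _lifted(record.embargo_until, today)


def file_is_public(record: Record, file: FileEntry, today: date) -> bool:
    # Assumption: a file is public only when both the record-level and the
    # file-level embargo have lifted.
    return _lifted(record.embargo_until, today) and _lifted(file.embargo_until, today)


# A record embargoed as a whole, with one file embargoed for longer.
rec = Record(
    title="Example dataset",
    embargo_until=date(2025, 1, 1),
    files=[
        FileEntry("data.csv", embargo_until=date(2026, 1, 1)),
        FileEntry("readme.txt"),
    ],
)

# A "by reference" deposit: metadata here, files elsewhere (placeholder URL).
by_ref = Record(title="Referenced dataset", data_location="https://example.org/dataset")
```

With `rec`, the metadata and `readme.txt` would become public once the record embargo ends, while `data.csv` stays closed until its own later date. Whether InvenioRDM's access settings can express this combination directly would still need to be checked as part of the investigations.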

Outline solution:

  • Depositors will begin by creating metadata and setting appropriate embargo, then select their access route (non-sensitive, sensitive, metadata only). For both sensitive and non-sensitive records, they would then choose regular file upload or large file support.
  • The library team will handle reviews, and look at metadata for correctness and sensitivity of data (among other things). Changes to items before they are acceptable would ideally be done by returning the item to the depositor with comments (stored in the system), and allowing them to make the changes. This could theoretically happen several times.
  • Users will send an access request to access embargoed or sensitive data, and this would be tracked in the system, and sent to the relevant places by administrators
  • Depositors will log in via SSO and be given permission to deposit based on their AD settings
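
The review to-and-fro in the outline can be sketched as a small state machine. This is a rough illustration under stated assumptions: the state names, the action names (`submit`, `return`, `accept`) and the `Deposit` class are hypothetical and do not correspond to the InvenioRDM requests or review API; the rule that a return must carry a comment is also an assumption, based on the note that comments would be stored in the system.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessRoute(Enum):
    # The three routes named in the outline solution.
    NON_SENSITIVE = "non-sensitive"
    SENSITIVE = "sensitive"
    METADATA_ONLY = "metadata only"


class State(Enum):
    # Hypothetical state names for illustration.
    DRAFT = "draft"
    IN_REVIEW = "in review"
    RETURNED = "returned"
    PUBLISHED = "published"


# Allowed (state, action) pairs and the state each leads to.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.IN_REVIEW,
    (State.IN_REVIEW, "return"): State.RETURNED,   # library team sends it back
    (State.RETURNED, "submit"): State.IN_REVIEW,   # depositor resubmits
    (State.IN_REVIEW, "accept"): State.PUBLISHED,
}


@dataclass
class Deposit:
    route: AccessRoute
    state: State = State.DRAFT
    comments: list[str] = field(default_factory=list)

    def apply(self, action: str, comment: str = "") -> None:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} from state {self.state.value!r}")
        if action == "return" and not comment:
            # Assumption: a returned item always carries reviewer comments.
            raise ValueError("returning a deposit requires a comment")
        if comment:
            self.comments.append(comment)
        self.state = TRANSITIONS[key]


# One round of review: submit, return with a comment, resubmit, accept.
deposit = Deposit(route=AccessRoute.NON_SENSITIVE)
deposit.apply("submit")
deposit.apply("return", "Please add the funder")
deposit.apply("submit")
deposit.apply("accept")
```

The return/resubmit loop can repeat any number of times before acceptance, which matches the "could theoretically happen several times" point. Whether the built-in review cycle supports this is one of the investigations listed below.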

Investigations required:

  • Look at the review cycle, and determine if it can be used for comment and to-and-fro with the depositor
  • Determine how many community administrators/reviewers there can be, and if there's an administrator super-user for the community
  • What options do we have for filtering the list of reviews to be done? For example, can those with a "flag" to say they are waiting for large files be filtered out from day-to-day reviews?
  • Determine at what point InvenioRDM mints DOIs. Can it do the "register" and then "confirm" cycle indicated in the Imperial workflows?
  • Review the new "request" feature in Invenio 12, to find out if it is suitable to make requests for embargoed or sensitive data to repository administrators (by default it seems to be for submitters or record owners)

@richard-jones
Collaborator Author

@Steven-Eardley @J4bbi to do the investigations from above by Monday 23rd

@npapantonis
Collaborator

npapantonis commented Sep 17, 2024 via email

Projects
Status: In review
5 participants