Create 2024-12-18.md #834

40 changes: 40 additions & 0 deletions ai/2024/2024-12-18.md
@@ -0,0 +1,40 @@
# SPDX AI Team Meeting 2024-12-18

## Attendees

- Arthit Suriyawongkul
- Brian Warner
- Elyas
- Gopi
- Helen Oakley
- Karen Bennet
- Kate Stewart
- Swami Sambasivam
- Victor Lu

## Agenda

- Licensing / Legal discussions - Guest: Warner, Brian
- Roundtable Updates - Everyone

## Notes
- Brian Warner's presentation was great and provided an overview of the AI model and data license landscape and ideas that Fidelity has used to resolve some of the challenges.
- Brian highlighted that LLM 'code' is different from what we are used to in AI and software code today, and that open source is 'the' standard for governance processes.
- Lessons learned: had to get structured about evaluating training data as part of the model, so that developers and legal were working from common information.
  - Understanding characteristics: the process introduced new questions to be queried.
  - Considering internal use vs. external facing: Who can prompt? Who can use output?
- High warmup cost. Requires in-depth knowledge and evaluation of models to be part of the work, with a view of BOM metadata-type information and opinionated data that needs legal review (i.e. there is no standard way to capture this information).
- What's missing?
- Most helpful to have public model & dataset heritage information for key datasets that have facts established, with the introduction of BOM fields like Successors and Predecessors.
- Link to known predecessors is useful; the rest is opinion. Complication comes from interpretation, not data. (Predecessors here means parent/child, predecessor/successor relationships.)
- Challenges with LLMs: LLMs change, so understanding when a change occurred matters; this can maybe be determined by weight changes.
- Different usage based on phase of development. Kick the tires or production?
- May want to link license analysis from SBOM (inside company as adjacent) rather than pass onwards.
- Large number of licenses that don't meet SPDX License List criteria but are probably worth tracking, possibly with version & date (example: RAIL licenses). Enable outside integration.
- Linking facts together with composition analysis: how far back do we recommend going? What granularity are we talking about? It depends on how far back you can reasonably get.
- Introduce the concept of pre-approved and approval-required licenses for fields like intended use, distribution, who can prompt, who can use output, and what the restrictions of the license are.
- Can't rule out what to do with generated data. Need to understand what is being used, and what has been seen.
- Do you tie vulnerability with the licensing? Probably information to keep separate.
- Because opinions of legal individuals differ, it can lead to inconsistencies in how AI systems can be used; so standardizing on terms/metadata to use OSI-approved licenses and/or the SPDX License List would help.
- Many model/data licenses are company-specific licenses, so getting to a more consolidated list that many of these companies need would be helpful. This is like the early days of open source code licensing, when everyone had their own license.
- Brian Warner's video of this material: https://youtu.be/PutO8IZV3xw?si=OUiWNTVe_cTQ25qE
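
The notes on predecessor links and on detecting model change through weights could be sketched roughly as follows. This is only an illustration under assumptions: `ModelRecord`, its field names, `has_changed`, `lineage`, and the sample model names are all hypothetical and are not SPDX fields or anything presented in the meeting. It records a digest of the weights plus links to known predecessors, and walks only as far back as established facts allow.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    """Hypothetical heritage record; field names are illustrative, not SPDX fields."""
    name: str
    weights_sha256: str
    predecessors: list[str] = field(default_factory=list)


def digest(weights: bytes) -> str:
    """Return the SHA-256 hex digest of serialized weights."""
    return hashlib.sha256(weights).hexdigest()


def has_changed(record: ModelRecord, weights: bytes) -> bool:
    """True when the given weights no longer match the recorded digest."""
    return digest(weights) != record.weights_sha256


def lineage(name: str, registry: dict[str, ModelRecord]) -> list[str]:
    """Follow known predecessor links, skipping any name with no record."""
    seen: list[str] = []
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen or current not in registry:
            continue
        seen.append(current)
        stack.extend(registry[current].predecessors)
    return seen


# Made-up sample data: a base model and one model derived from it.
base = ModelRecord("base-model", digest(b"weights-v1"))
tuned = ModelRecord("tuned-model", digest(b"weights-v2"), ["base-model"])
registry = {m.name: m for m in (base, tuned)}
```

Keeping only the digest and the predecessor names in the record matches the point that links to known predecessors are facts, while interpretation of those facts stays with legal review and is left out of the structure.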