Commit
Darren Edge committed Apr 5, 2024
1 parent ef75829, commit fe4855e
Showing 6 changed files with 34 additions and 29 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,57 +1,56 @@ | ||
# Intelligence Toolkit | ||
|
||
#### What is Intelligence Toolkit? | ||
#### What is Intelligence Toolkit? | ||
|
||
The Intelligence Toolkit is a suite of interactive workflows for creating AI intelligence reports from real-world data sources. The toolkit is designed to help users identify patterns, answers, relationships, and risks within complex datasets, with generative AI ([OpenAI GPT models](https://platform.openai.com/docs/models/)) used to create reports on findings of interest. The project page can be found at [github.com/microsoft/intelligence-toolkit](https://github.com/microsoft/intelligence-toolkit/). | ||
|
||
#### What can Intelligence Toolkit do? | ||
|
||
The Intelligence Toolkit aims to help domain experts make sense of real-world data at a speed and scale that wouldn't otherwise be possible. It was specifically designed for analysis of case data and entity data: | ||
- Case Data | ||
- Units are structured records describing individual people | ||
- Examples include users, respondents, patients, victims | ||
- Analysis aims to inform *policy* while preserving *privacy* | ||
- Entity Data | ||
- Units are records or documents describing real-world entities | ||
- Examples include organizations, countries, products, suppliers | ||
- Analysis aims to understand *risks* carried by *relationships* | ||
#### What are Intelligence Toolkit's intended uses? | ||
|
||
- **Case Data** | ||
- Units are structured records describing individual people | ||
- Examples include users, respondents, patients, victims | ||
- Analysis aims to inform *policy* while preserving *privacy* | ||
- **Entity Data** | ||
- Units are records or documents describing real-world entities | ||
- Examples include organizations, countries, products, suppliers | ||
- Analysis aims to understand *risks* carried by *relationships* | ||
|
||
#### What are Intelligence Toolkit's intended uses? | ||
|
||
The Intelligence Toolkit is designed to be used by domain experts who are familiar with the data and the intelligence they want to derive from it. Users should be independently capable of evaluating the quality of data insights and AI interpretations before taking action, e.g., sharing intelligence outputs or making decisions informed by these outputs. | ||
|
||
It supports a variety of interactive workflows, each designed to address a specific type of intelligence task: | ||
|
||
Case Intelligence Workflows | ||
- **Data Synthesis** generates differentially-private datasets and summaries from sensitive case records | ||
- **Attribute Patterns** generates reports on attribute patterns detected in streams of case records | ||
- **Group Narratives** generates reports by defining and comparing groups of case records | ||
It supports a variety of interactive workflows, each designed to address a specific type of intelligence task: | ||
|
||
Entity Intelligence Workflows | ||
- **Record Matching** generates reports on record matches detected across entity datasets | ||
- **Risk Networks** generates reports on risk exposure for networks of related entities | ||
- **Question Answering** generates reports from an entity-rich document collection | ||
- **Case Intelligence Workflows** | ||
- **Data Synthesis** generates differentially-private datasets and summaries from sensitive case records | ||
- **Attribute Patterns** generates reports on attribute patterns detected in streams of case records | ||
- **Group Narratives** generates reports by defining and comparing groups of case records | ||
- **Entity Intelligence Workflows** | ||
- **Record Matching** generates reports on record matches detected across entity datasets | ||
- **Risk Networks** generates reports on risk exposure for networks of related entities | ||
- **Question Answering** generates reports from an entity-rich document collection | ||
|
||
#### How was Intelligence Toolkit evaluated? | ||
#### How was Intelligence Toolkit evaluated? | ||
|
||
The Intelligence Toolkit was designed, refined, and evaluated in the context of the [Tech Against Trafficking (TAT)](https://techagainsttrafficking.org/) accelerator program with [Issara Institute](https://www.issarainstitute.org/) and [Polaris](https://polarisproject.org/) (2023-2024). It includes and builds on prior accelerator outputs developed with [Unseen](https://www.unseenuk.org/) (2021-2022) and [IOM](https://www.iom.int/)/[CTDC](https://www.ctdatacollaborative.org/) (2019-2020). | ||
|
||
#### What are the limitations of Intelligence Toolkit? How can users minimize the impact of these limitations when using the system? | ||
#### What are the limitations of Intelligence Toolkit? How can users minimize the impact of these limitations when using the system? | ||
|
||
- The Intelligence Toolkit aims to detect and explain patterns, relationships, and risks in data provided by the user. It is not designed to make decisions or take actions based on these findings. | ||
- The statistical "insights" that it detects may not be insightful or useful in practice, and will inherit any biases, errors, or omissions present in the data collecting/generating process. These may be further amplified by the AI interpretations and reports generated by the toolkit. | ||
- The generative AI model may itself introduce additional statistical or societal biases, or fabricate information not present in its grounding data, as a consequence of its training and design. | ||
- Users should be experts in their domain, familiar with the data, and both able and willing to evaluate the quality of the insights and AI interpretations before taking action. | ||
- The system should not be used in highly regulated domains where inaccurate reports could suggest actions that lead to injury (e.g., if medical advice is inaccurate) or negatively impact an individual’s legal, financial, or life opportunities. | ||
- The system should not be used in highly regulated domains where inaccurate reports could suggest actions that lead to injury (e.g., if medical advice is inaccurate) or negatively impact an individual's legal, financial, or life opportunities. | ||
- The system was designed and tested for the processing of English language data and the creation of English language outputs. Performance in other languages may vary and should be assessed by someone who is both an expert on the data and a native speaker of that language. | ||
|
||
#### What operational factors and settings allow for effective and responsible use of Intelligence Toolkit? | ||
#### What operational factors and settings allow for effective and responsible use of Intelligence Toolkit? | ||
|
||
- The Intelligence Toolkit is designed for moderate-sized datasets (e.g., hundreds of thousands of records, hundreds of PDF documents). Larger datasets will take longer to process and may exceed the memory limits of the execution environment. | ||
- Responsible use of personal case data requires that the data be de-identified prior to uploading and then converted into anonymous data using the Data Synthesis workflow. Any subsequent analysis of the case data should be done using the synthetic case data, not the original (sensitive/personal) case data. | ||
- It is the user's responsibility to ensure that any data sent to generative AI models is not personal/sensitive/secret/confidential and that use of generative AI models is consistent with the terms of service of the model provider. Such use incurs per-token costs charged to the OpenAI account linked to the user-provided API key, so understanding [usage costs](https://openai.com/pricing#language-models) and setting a [billing cap](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization) is recommended. | ||
|
||
#### What data is collected? | ||
|
||
Intelligence Toolkit may be deployed as a desktop application or a cloud service. The application supports short, end-to-end workflows from input data to output reports. As such, it stores no data beyond the use of a caching mechanism for text embeddings that avoids unnecessary recomputation costs. | ||
Intelligence Toolkit may be deployed as a desktop application or a cloud service. The application supports short, end-to-end workflows from input data to output reports. As such, it stores no data beyond the use of a caching mechanism for text embeddings that avoids unnecessary recomputation costs. |
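
The README text above mentions a caching mechanism for text embeddings that avoids recomputation costs, but the diff does not show how it works. The following is a minimal sketch of one way such a cache could behave, not the toolkit's actual implementation. The names `EmbeddingCache` and `toy_embed` are hypothetical, and `toy_embed` is a stand-in for a real (billable) embedding call.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Hypothetical in-memory cache keyed by a hash of the input text."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the embedding function was called

    @staticmethod
    def _key(text: str) -> str:
        # Hashing keeps keys a fixed size regardless of text length.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key not in self._store:
            # Cache miss: compute once and remember the result.
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


def toy_embed(text: str) -> List[float]:
    # Assumption: placeholder for a real embedding API call.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


cache = EmbeddingCache(toy_embed)
first = cache.get("supplier risk report")
second = cache.get("supplier risk report")  # served from the cache
```

In this sketch the second lookup does not call the embedding function again, which is the cost saving the README describes. A persistent cache would write `_store` to disk or a database instead of keeping it in memory.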
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,8 @@ | |
{instructions} | ||
=== TASK === | ||
Group data: | ||
{data} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,7 +23,7 @@ | |
"##### Evaluation of Entity Network <Network ID>" | ||
DATA: | ||
=== TASK === | ||
Selected entity: {entity_id} | ||
|
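
The prompt fragments in these two diffs use placeholders such as `{instructions}`, `{data}`, and `{entity_id}`. As a rough sketch, assuming the placeholders are filled with Python `str.format`-style substitution (the diff does not confirm this), a template of this shape could be completed as follows. The template text is abbreviated from the fragment shown above, and `fill_prompt` is a hypothetical helper.

```python
# Abbreviated from the fragment in the diff; the full template is not shown there.
GROUP_PROMPT = """\
{instructions}

=== TASK ===

Group data:

{data}
"""


def fill_prompt(template: str, **values: str) -> str:
    """Substitute named placeholders; raises KeyError if one is missing."""
    return template.format(**values)


prompt = fill_prompt(
    GROUP_PROMPT,
    instructions="Summarize the differences between groups.",
    data="group,count\nA,12\nB,7",
)
```

A missing placeholder value raises `KeyError` rather than leaving `{data}` in the text sent to the model, which makes incomplete prompts easier to catch early.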