
Commit b09ec8f

Add a Kaggle MCP usage example (#1209)
Add a Kaggle MCP usage example (#1209)

The Kaggle protected MCP server has additional auth requirements that are addressed in this PR. Changes include:

1. A Kaggle MCP example showing usage with an `api_key` authentication provider.
2. A fix to not pass `None` fields in the `tool_args`. This change is limited to MCP functions only (although it may work universally). The fix is optional, but without it tool calls to the Kaggle MCP server fail.
3. A fix for Pydantic `SecretStr` handling in the MCP client implementation.
4. A fix to the validations done in `APIKeyAuthProviderConfig`: custom header config is only required if `auth_scheme` is `HeaderAuthScheme.CUSTOM`.

Note: The tool descriptions on the Kaggle MCP server are not very verbose, which makes it challenging for agents to call the tools with the right format. To work around this, you can either supply the tool schema to the agent or update the tool description via `mcp_client` function group tool overrides. See the `search_datasets` override in the sample config file:

<img width="582" height="734" alt="image" src="https://github.com/user-attachments/assets/e1920d75-55f9-4b03-8bc7-a97da388b390" />

Closes #1207

## By Submitting this PR I confirm:

- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

## Summary by CodeRabbit

* **New Features**
  * CLI: add bearer-token auth via flag or environment variable.
  * Example: add Kaggle MCP example with README, workflow config, and packaging.
* **Bug Fixes / Improvements**
  * Client calls validate inputs and omit null fields before sending.
  * Secret values are unwrapped where required.
  * Validation enforces custom-scheme requirements and prevents conflicting auth/direct usage.
* **Tests**
  * Add tests covering bearer-token flows and error cases.
* **Documentation**
  * Update tutorial text and add Kaggle README with usage and troubleshooting.

Authors:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
- Eric Evans II (https://github.com/ericevans-nv)

URL: #1209
1 parent ab1930a commit b09ec8f

File tree

14 files changed: +430 −18 lines changed

ci/markdown-link-check-config.json

Lines changed: 3 additions & 0 deletions
```diff
@@ -17,6 +17,9 @@
     },
     {
       "pattern": "^https://arize\\.com"
+    },
+    {
+      "pattern": "^https://milvus\\.io"
     }
   ]
 }
```

ci/vale/styles/config/vocabularies/nat/accept.txt

Lines changed: 1 addition & 0 deletions
```diff
@@ -78,6 +78,7 @@ isort
 Jama
 Jira
 jsonlines
+[Kk]aggle
 Langfuse
 LangChain
 LangGraph
```
(Two additional files, 3 additions and 0 deletions each; diffs did not load.)

docs/source/tutorials/create-a-new-workflow.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -115,7 +115,7 @@ Examining the `webquery_tool` function (`examples/getting_started/simple_web_que
     docs = [document async for document in loader.alazy_load()]
 ```
 
-For the new tool, instead of the `WebBaseLoader` class, use the [`langchain_community.document_loaders.DirectoryLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.directory.DirectoryLoader.html) and [`langchain_community.document_loaders.TextLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.text.TextLoader.html) classes.
+For the new tool, instead of the `WebBaseLoader` class, use the `langchain_community.document_loaders.DirectoryLoader` and `langchain_community.document_loaders.TextLoader` classes.
 
 ```python
 (ingest_dir, ingest_glob) = os.path.split(config.ingest_glob)
````

examples/MCP/kaggle_mcp/README.md

Lines changed: 173 additions & 0 deletions
New file contents:

````markdown
<!--
SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Kaggle MCP Example

This example demonstrates how to use the Kaggle MCP server with NVIDIA NeMo Agent Toolkit to interact with Kaggle's datasets, notebooks, models, and competitions.

## Prerequisites

- NeMo Agent Toolkit installed with MCP support (`nvidia-nat-mcp` package)
- A Kaggle account and API token

### Getting Your Kaggle Bearer Token

The Kaggle MCP server uses bearer token authentication. Obtain your Kaggle bearer token from [Kaggle Account Settings](https://www.kaggle.com/settings/account).

## Configuration

The `config.yml` file uses the built-in `api_key` authentication provider with the Bearer token scheme:

```yaml
authentication:
  kaggle:
    _type: api_key
    raw_key: ${KAGGLE_BEARER_TOKEN}
    auth_scheme: Bearer
```

### Environment Variables

Set the following environment variable:

```bash
export KAGGLE_BEARER_TOKEN="your_kaggle_api_key_here"
```

## Usage

Run the workflow with a query:

```bash
nat run --config_file examples/MCP/kaggle_mcp/configs/config.yml \
  --input "Find the most popular datasets about natural language processing"
```

Example queries:
- "What is the titanic dataset about?"
- "What competitions are currently active?"

## Configuration Details

### MCP Client Setup

The configuration connects to Kaggle's MCP server using:
- **Transport**: `streamable-http` (recommended for HTTP-based MCP servers)
- **URL**: `https://www.kaggle.com/mcp`
- **Authentication**: Bearer token via the built-in `api_key` authentication provider

## CLI Commands

You can use the following CLI commands to interact with the Kaggle MCP server. This is useful for prototyping and debugging.

### Discover Tools (No Authentication Required)

To list available tools from the Kaggle MCP server:

```bash
nat mcp client tool list --url https://www.kaggle.com/mcp
```

### Get Tool Schema (No Authentication Required)

To validate the tool schema:

```bash
nat mcp client tool list --url https://www.kaggle.com/mcp --tool search_datasets
```

### Authenticated Tool Calls

The Kaggle MCP server requires bearer token authentication for some tool calls.

#### Using Environment Variable (Recommended)

```bash
# Set your Kaggle bearer token
export KAGGLE_BEARER_TOKEN="your_kaggle_api_key_here"

# Search for Titanic datasets
nat mcp client tool call search_datasets \
  --url https://www.kaggle.com/mcp \
  --bearer-token-env KAGGLE_BEARER_TOKEN \
  --json-args '{"request": {"search": "titanic"}}'
```

#### Using Direct Token

```bash
# Search for Titanic datasets with direct token (less secure)
nat mcp client tool call search_datasets \
  --url https://www.kaggle.com/mcp \
  --bearer-token "your_kaggle_api_key_here" \
  --json-args '{"request": {"search": "titanic"}}'
```

**Note**: The `--bearer-token-env` approach is more secure because it doesn't expose the token in command history or process lists.

## Troubleshooting

### Agent Uses Wrong Parameter Names

**Problem**: The agent generates tool calls with incorrect parameter names, such as using `query` instead of `search` for `search_datasets`.

**Cause**: The default tool descriptions from Kaggle MCP are generic and don't specify parameter names, causing the LLM to infer incorrect names.

**Solution**: Check the tool schema and add tool overrides in your `config.yml` to provide explicit parameter guidance:

```bash
nat mcp client tool list --url https://www.kaggle.com/mcp --tool search_datasets
```

After getting the tool schema, add the following tool overrides to your `config.yml`:

```yaml
function_groups:
  kaggle_mcp_tools:
    tool_overrides:
      search_datasets:
        description: >
          Search for datasets on Kaggle. Use the 'search' parameter (not 'query')
          to search by keywords. Example: {"request": {"search": "titanic"}}
```

### Permission Denied Errors

**Problem**: Tool calls fail with "Permission 'datasets.get' was denied" or similar errors.

**Cause**: Your Kaggle API token lacks the required permissions for certain operations.

**Solution**:
- Ensure you're using a valid Kaggle API key from https://www.kaggle.com/settings/account
- Some operations require dataset ownership or special permissions
- Use `search_datasets` for browsing (requires minimal permissions)
- Use `list_dataset_files` only for datasets you own or have access to

### CLI Tool Calls Work but Workflow Fails

**Problem**: `nat mcp client tool call` succeeds but `nat run` with a workflow fails with the same tool.

**Possible causes**:
1. **Parameter validation**: The CLI bypasses some validation that workflows enforce
2. **Agent parameter inference**: The agent might use wrong parameter names (see "Agent Uses Wrong Parameter Names" above)

**Solution**: Use `--direct` mode to test the raw MCP server behavior, then add tool overrides to guide the agent.

## References

- [Kaggle MCP Documentation](https://www.kaggle.com/docs/mcp)
- [NeMo Agent Toolkit MCP Documentation](../../../docs/source/workflows/mcp/index.md)
````
examples/MCP/kaggle_mcp/configs/config.yml

Lines changed: 52 additions & 0 deletions

New file contents:

```yaml
# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

llms:
  # Tell NeMo Agent Toolkit which LLM to use for the agent
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0

function_groups:
  kaggle_mcp_tools:
    _type: mcp_client
    server:
      transport: streamable-http
      url: https://www.kaggle.com/mcp
      auth_provider: kaggle
    tool_overrides:
      search_datasets:
        description: >
          Search for datasets on Kaggle. Use the 'search' parameter to search by keywords.
          Returns a list of datasets with metadata including
          title, owner, download count, and URL. Example: {"request": {"search": "titanic"}}

authentication:
  kaggle:
    _type: api_key
    raw_key: ${KAGGLE_BEARER_TOKEN}
    auth_scheme: Bearer

workflow:
  # Use an agent that 'reasons' and 'acts'
  _type: react_agent
  # Give it access to our kaggle MCP tools
  tool_names: [kaggle_mcp_tools]
  # Tell it which LLM to use
  llm_name: nim_llm
  # Make it verbose
  verbose: true
  # Retry up to 3 times
  parse_agent_response_max_retries: 3
```
examples/MCP/kaggle_mcp/pyproject.toml

Lines changed: 24 additions & 0 deletions

New file contents:

```toml
[build-system]
build-backend = "setuptools.build_meta"
requires = ["setuptools >= 64", "setuptools-scm>=8"]

[tool.setuptools]
packages = []

[tool.setuptools_scm]
git_describe_command = "git describe --long --first-parent"
root = "../../.."

[project]
name = "nat_kaggle_mcp"
dynamic = ["version"]
dependencies = [
  "nvidia-nat[mcp]~=1.4",
]
requires-python = ">=3.11,<3.14"
description = "Kaggle MCP integration example with bearer token authentication"
keywords = ["ai", "mcp", "protocol", "agents", "kaggle", "datasets"]
classifiers = ["Programming Language :: Python"]

[tool.uv.sources]
nvidia-nat = { path = "../../..", editable = true }
```

packages/nvidia_nat_mcp/src/nat/plugins/mcp/auth/auth_provider.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -34,6 +34,7 @@
 from nat.authentication.interfaces import AuthProviderBase
 from nat.authentication.oauth2.oauth2_auth_code_flow_provider_config import OAuth2AuthCodeFlowProviderConfig
 from nat.data_models.authentication import AuthResult
+from nat.data_models.common import get_secret_value
 from nat.plugins.mcp.auth.auth_flow_handler import MCPAuthenticationFlowHandler
 from nat.plugins.mcp.auth.auth_provider_config import MCPOAuth2ProviderConfig

@@ -371,7 +372,7 @@ async def _discover_and_register(self, response: httpx.Response | None = None):
             # Manual registration mode
             self._cached_credentials = OAuth2Credentials(
                 client_id=self.config.client_id,
-                client_secret=self.config.client_secret,
+                client_secret=get_secret_value(self.config.client_secret),
             )
             logger.info("Using manual client_id: %s", self._cached_credentials.client_id)
         else:
```
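The underlying issue is that Pydantic `SecretStr` fields mask their value when rendered as a string, so the raw secret has to be unwrapped explicitly before being handed to the credentials object. A minimal sketch using Pydantic's own `SecretStr.get_secret_value()` (the `OAuthConfig` model here is a hypothetical stand-in; NAT's `get_secret_value` helper is assumed to perform this unwrapping, possibly also accepting plain strings):

```python
from pydantic import BaseModel, SecretStr


class OAuthConfig(BaseModel):
    # Hypothetical config model for illustration only
    client_id: str
    client_secret: SecretStr


cfg = OAuthConfig(client_id="my-client", client_secret="s3cr3t")

# Passing the field directly leaks only the masked placeholder, not the secret
print(str(cfg.client_secret))  # **********

# The raw value must be unwrapped explicitly
print(cfg.client_secret.get_secret_value())  # s3cr3t
```

This is why passing `self.config.client_secret` straight through broke authentication: downstream code received the masked placeholder instead of the actual secret.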

packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_impl.py

Lines changed: 11 additions & 3 deletions
```diff
@@ -450,11 +450,19 @@ async def _response_fn(tool_input: BaseModel | None = None, **kwargs) -> str:
 
             # Preserve original calling convention
             if tool_input:
-                args = tool_input.model_dump()
+                args = tool_input.model_dump(exclude_none=True, mode='json')
                 return await session_tool.acall(args)
 
-            _ = session_tool.input_schema.model_validate(kwargs)
-            return await session_tool.acall(kwargs)
+            # kwargs arrives with all optional fields set to None because NAT's framework
+            # converts the input dict to a Pydantic model (filling in all Field(default=None)),
+            # then dumps it back to a dict. We need to strip out these None values because
+            # many MCP servers (e.g., Kaggle) reject requests with excessive null fields.
+            # We re-validate here (yes, redundant) to leverage Pydantic's exclude_none with
+            # mode='json' for recursive None removal in nested models.
+            # Reference: function_info.py:_convert_input_pydantic
+            validated_input = session_tool.input_schema.model_validate(kwargs)
+            args = validated_input.model_dump(exclude_none=True, mode='json')
+            return await session_tool.acall(args)
         except Exception as e:
             logger.warning("Error calling tool %s", tool.name, exc_info=True)
             return str(e)
```
