2 changes: 1 addition & 1 deletion docs/project.json
@@ -1 +1 @@
-{ "name": "nemo-guardrails-toolkit", "version": "0.17.0" }
+{ "name": "nemo-guardrails-toolkit", "version": "0.18.0" }
27 changes: 27 additions & 0 deletions docs/release-notes.md
@@ -12,6 +12,33 @@ The following sections summarize and highlight the changes for each release.
For a complete record of changes in a release, refer to the
[CHANGELOG.md](https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/CHANGELOG.md) in the GitHub repository.

---

(v0-18-0)=

## 0.18.0

(v0-18-0-features)=

### Key Features

- In-memory caching of guardrail model calls for reduced latency and cost savings.
NeMo Guardrails now supports per-model caching of guardrail responses using an LFU (Least Frequently Used) cache.
This feature is particularly effective for safety models such as NVIDIA NemoGuard [Content Safety](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-content-safety), [Topic Control](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-topic-control), and [Jailbreak Detection](https://build.nvidia.com/nvidia/nemoguard-jailbreak-detect), where identical inputs are common.
For more information, refer to [](model-memory-cache). A minimal sketch of the LFU idea appears after this list.
- NeMo Guardrails extracts the reasoning traces from the LLM response and emits them as `BotThinking` events before the final `BotMessage` event.
For more information, refer to [](bot-thinking-guardrails).
- New community integration with [Cisco AI Defense](https://www.cisco.com/site/ca/en/products/security/ai-defense/index.html).
- New embedding integrations with Azure OpenAI, Google, and Cohere.
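
To make the caching idea concrete, here is a minimal, self-contained sketch of an LFU cache wrapped around a safety check. This is not the NeMo Guardrails implementation: the `LFUCache` class, the `capacity` value, and `call_safety_model` are hypothetical stand-ins, and the real feature is configured per model rather than hand-rolled.

```python
from collections import defaultdict

class LFUCache:
    """Illustrative LFU cache: evicts the least frequently used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values = {}
        self.counts = defaultdict(int)

    def get(self, key):
        if key not in self.values:
            return None
        self.counts[key] += 1              # every hit raises the key's frequency
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the entry accessed least frequently so far.
            victim = min(self.values, key=self.counts.__getitem__)
            del self.values[victim], self.counts[victim]
        self.values[key] = value
        self.counts[key] += 1


def call_safety_model(text: str) -> str:
    # Stand-in for a real NemoGuard Content Safety request (hypothetical).
    return "safe"


cache = LFUCache(capacity=1024)

def check_content_safety(text: str) -> str:
    cached = cache.get(text)
    if cached is not None:
        return cached                      # cache hit: no model call at all
    verdict = call_safety_model(text)
    cache.put(text, verdict)
    return verdict

# Identical inputs are common for safety checks, so the second call is a hit.
print(check_content_safety("Is this message safe?"))
print(check_content_safety("Is this message safe?"))
```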

(v0-18-0-fixed-issues)=

### Fixed Issues

- Implemented validation of content safety and topic control guardrail configurations at creation time, providing immediate error reporting if required prompt templates or parameters are missing (see the sketch below).

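A hedged sketch of the new behavior, assuming the validation error surfaces as an exception when the configuration is loaded: `RailsConfig.from_path` and `LLMRails` are the library's standard entry points, but the config path and the exact exception type here are illustrative assumptions.

```python
from nemoguardrails import LLMRails, RailsConfig

try:
    # Loading a config that enables a content safety rail but omits its
    # prompt template should now fail here, at creation time ...
    config = RailsConfig.from_path("./config_missing_prompt")  # hypothetical path
    rails = LLMRails(config)
except Exception as err:  # the exact exception type is an assumption
    print(f"configuration rejected early: {err}")
# ... rather than on the first generate() call at request time.
```
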
---

(v0-17-0)=

## 0.17.0
2 changes: 2 additions & 0 deletions docs/user-guides/advanced/bot-thinking-guardrails.md
@@ -1,3 +1,5 @@
(bot-thinking-guardrails)=

# Guardrailing Bot Reasoning Content

Reasoning-capable large language models (LLMs) expose their internal thought process as reasoning traces. These traces reveal how the model arrives at its conclusions, providing transparency into the decision-making process. However, they may also contain sensitive information or problematic reasoning patterns that need to be monitored and controlled.
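
As a rough illustration, the sketch below shows how a consumer might separate reasoning traces from the final reply. Only the `BotThinking` and `BotMessage` event names come from the release notes; the dictionary payloads (`content`, `text`) and the event-list shape are assumptions made for this example.

```python
# Illustrative only: the exact payload shape of these events is an assumption.
events = [
    {"type": "BotThinking", "content": "The user asks about refunds; policy allows..."},
    {"type": "BotThinking", "content": "No sensitive data involved; safe to answer."},
    {"type": "BotMessage", "text": "You can request a refund within 30 days."},
]

# Reasoning traces arrive as BotThinking events before the final BotMessage,
# so a rail can inspect (or block) the traces separately from the reply.
for event in events:
    if event["type"] == "BotThinking":
        trace = event["content"]      # hand to a guardrail for checking
        print(f"[thinking] {trace}")
    elif event["type"] == "BotMessage":
        print(f"[reply] {event['text']}")
```
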
4 changes: 4 additions & 0 deletions docs/versions1.json
@@ -1,6 +1,10 @@
[
  {
    "preferred": true,
    "version": "0.18.0",
    "url": "../0.18.0/"
  },
  {
    "version": "0.17.0",
    "url": "../0.17.0/"
  },