Explaining billed tokens and why they're different.

cohere-ai · Feb 5, 2025 · efdac31
1 parent bb86604
commit efdac31
Showing 1 changed file with 19 additions and 0 deletions.
diff --git a/fern/pages/going-to-production/how-does-cohere-pricing-work.mdx b/fern/pages/going-to-production/how-does-cohere-pricing-work.mdx
@@ -19,6 +19,25 @@ Our Rerank models are priced based on the quantity of searches, and our Embeddin
 
 You can find up-to-date prices on our [dedicated pricing page](https://cohere.com/pricing).
 
+### What's the Difference Between "billed" Tokens and Generic Tokens?
+
+In certain workflows you'll see an output like this:
+
+```json JSON
+{
+  "billed_units": {
+    "input_tokens": 6772,
+    "output_tokens": 248
+  },
+  "tokens": {
+    "input_tokens": 7596,
+    "output_tokens": 645
+  }
+}
+```
+
+It may not be obvious why the values under `billed_units` differ from those under `tokens`. As the name suggests, the _billed_ input and output tokens are the tokens that you're actually _billed_ for. These values can differ from the overall `tokens` values because there are situations in which Cohere adds tokens under the hood, and others in which a particular model has been trained to do so (i.e., when outputting special tokens). Since you don't have control over these tokens, *you are not charged for them.*
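
To make the gap concrete, here is a minimal sketch that computes how many counted tokens were not billed, using the sample payload above. It relies only on the Python standard library; the helper name `unbilled_tokens` is hypothetical and not part of any Cohere SDK, and the sketch assumes the payload has exactly the `billed_units` and `tokens` shape shown in the example.

```python
import json

# Sample usage payload copied from the example in this section.
usage = json.loads("""
{
  "billed_units": {"input_tokens": 6772, "output_tokens": 248},
  "tokens": {"input_tokens": 7596, "output_tokens": 645}
}
""")


def unbilled_tokens(usage: dict) -> dict:
    """Hypothetical helper: counted tokens minus billed tokens, per direction."""
    return {
        key: usage["tokens"][key] - usage["billed_units"][key]
        for key in ("input_tokens", "output_tokens")
    }


diff = unbilled_tokens(usage)
print(diff)  # {'input_tokens': 824, 'output_tokens': 397}
```

For this payload, 824 input tokens and 397 output tokens were counted but not charged.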
+
 ## Trial Usage and Production Usage
 
 Cohere makes a distinction between "trial" and "production" usage of an API key.