Add a walkthrough of an actual graph example to explain content #30013


Open

estherk15 wants to merge 2 commits into master from esther/docs-update-rollup-guide

Contributor

estherk15 commented Jun 18, 2025 •

edited


What does this PR do? What is the motivation?

- Adds a walkthrough, using actual graphs, of what users are seeing
- Supplements the existing content of the guide
Editorial review: https://datadoghq.atlassian.net/browse/DOCS-11272

Merge instructions

Merge readiness:

Ready for merge


          Add a walkthrough of an actual graph to explain content

1221e9d

estherk15 requested a review from a team as a code owner

June 18, 2025 16:43

estherk15 added the WORK IN PROGRESS label

github-actions bot added Images Guide labels

Contributor

github-actions bot commented Jun 18, 2025 •

edited


📝 Documentation Team Review Required

This pull request requires approval from the @DataDog/documentation team before it can be merged.

Please ensure your changes follow our documentation guidelines and wait for a team member to review and approve your changes.

Contributor

github-actions bot commented Jun 18, 2025

Preview links (active after the `build_preview` check completes)

Modified Files

estherk15 commented


content/en/dashboards/guide/rollup-cardinality-visualizations.md (outdated)

estherk15 added the editorial review label

estherk15 requested a review from edanaher

June 18, 2025 17:00


          Update content/en/dashboards/guide/rollup-cardinality-visualizations.md

d117554

cswatt requested changes


Contributor

cswatt left a comment

Sorry for this annoying review! I started out understanding it but then began to get confused. I think I need to see an equation for the computations here.

content/en/dashboards/guide/rollup-cardinality-visualizations.md


		## Understanding cardinality in timeseries

		### Unique vs Distinct Users

Contributor

cswatt Jun 20, 2025

Suggested change

      
            ### Unique vs Distinct Users
          
            ### Unique versus distinct users

content/en/dashboards/guide/rollup-cardinality-visualizations.md

		Consider a scenario where you track distinct users visiting a website. Each day for seven days, you observe 100 unique users, leading you to assume a total of 700 users. However, the actual number of distinct users over the week might be 400, as many users visit the site on multiple days. This discrepancy arises because each time frame (such as each day) independently counts unique users, inflating the total when compared to a single, longer rollup timeframe.

		### How Rollup Affects Averages

Contributor

cswatt Jun 20, 2025

Suggested change

      
            ### How Rollup Affects Averages
          
            ### How rollup affects averages

content/en/dashboards/guide/rollup-cardinality-visualizations.md

+                 - In shorter time periods, averages might be lower as we only catch users in that exact moment.
+                 - In longer time periods, averages might be higher as we catch more instances of users using different devices.
+                 - This isn't a bug, it's a natural result of how users interact with your service over time.
               This counterintuitive result is due to cardinality, which refers to how unique elements in a dataset are counted. The cardinality for each time bucket can be complex. When analyzing unique users, consider the question: "How many unique users were there each day this week?" If a user visits on two separate days, they count as unique for each day.

Contributor

cswatt Jun 20, 2025

Seems like this paragraph should be moved up a section, as it pertains to unique vs distinct users
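
For reference, the arithmetic in the "unique versus distinct users" scenario can be checked with a small sketch. The user IDs, pool size, and overlap pattern below are hypothetical, chosen only so that the totals match the 700 and 400 figures in the text; this is plain Python, not a Datadog query.

```python
# Hypothetical data: seven days of visits, 100 unique users per day,
# drawn from a pool of 400 users so that consecutive days overlap.
POOL_SIZE = 400
DAILY_USERS = 100
STEP = 50  # each day's block of user IDs shifts by 50

daily_visitors = [
    {(day * STEP + i) % POOL_SIZE for i in range(DAILY_USERS)}
    for day in range(7)
]

# Summing per-day cardinalities counts returning users once per day.
sum_of_daily_uniques = sum(len(users) for users in daily_visitors)

# A single week-long rollup counts each user once.
weekly_distinct = len(set().union(*daily_visitors))

print(sum_of_daily_uniques)  # 700
print(weekly_distinct)       # 400
```

The gap between the two numbers comes entirely from users who visit on more than one day.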

content/en/dashboards/guide/rollup-cardinality-visualizations.md

               Consider a scenario where you track distinct users visiting a website. Each day for seven days, you observe 100 unique users, leading you to assume a total of 700 users. However, the actual number of distinct users over the week might be 400, as many users visit the site on multiple days. This discrepancy arises because each time frame (such as each day) independently counts unique users, inflating the total when compared to a single, longer rollup timeframe.
+              ### How Rollup Affects Averages
+              The rollup function also significantly impacts how averages are calculated and displayed in visualizations:

Contributor

cswatt Jun 20, 2025

I think most users are coming to this page with an idea of what the rollup function is, but in case they don't, let's link "rollup function" to https://docs.datadoghq.com/dashboards/functions/rollup

content/en/dashboards/guide/rollup-cardinality-visualizations.md

Comment on lines +31 to +38

+1. **Smoothing Effect**:
+                 - Longer time periods (30-minute rollups) create smoother graphs.
+                 - Shorter time periods (5-minute rollups) show more detailed spikes and variations.
+2. **Average Calculations**:
+                 - In shorter time periods, averages might be lower as we only catch users in that exact moment.
+                 - In longer time periods, averages might be higher as we catch more instances of users using different devices.
+                 - This isn't a bug, it's a natural result of how users interact with your service over time.

Contributor

cswatt Jun 20, 2025

Suggested change

      
            1. **Smoothing Effect**:
          
               - Longer time periods (30-minute rollups) create smoother graphs.
          
               - Shorter time periods (5-minute rollups) show more detailed spikes and variations.
          
            2. **Average Calculations**:
          
               - In shorter time periods, averages might be lower as we only catch users in that exact moment.
          
               - In longer time periods, averages might be higher as we catch more instances of users using different devices.
          
               - This isn't a bug, it's a natural result of how users interact with your service over time.
          
            - **Smoothing effect**:
          
               - Shorter time periods (5-minute rollups) show more detailed spikes and variations.
          
               - Longer time periods (30-minute rollups) create smoother graphs.
          
            - **Average calculations**:
          
               - In shorter time periods, averages might be lower because Datadog only catches users in that exact moment.
          
               - In longer time periods, averages might be higher because Datadog catches more instances of users using different devices.

The suggestions I've made:

- Changed from an ordered list to an unordered list. There isn't a hierarchy between smoothing effect and average calculations; they're both items at the same level.
- Used sentence case for both "Smoothing effect" and "Average calculations".
- Under "Smoothing effect," switched the order of the two bullet points. This matches the order under "Average calculations," and intuitively, short time periods come before long time periods.
- Removed the last bullet point, which seems a bit odd and unnecessary.
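
To make the smoothing effect in these bullets concrete, here is a minimal sketch that averages the same per-minute series over 5-minute and 30-minute windows. The series and the `rollup_avg` helper are hypothetical illustrations, not Datadog's implementation.

```python
def rollup_avg(values, window):
    """Average consecutive, non-overlapping windows of `window` points."""
    return [
        sum(values[i:i + window]) / window
        for i in range(0, len(values), window)
    ]

# Hypothetical spiky series: one value per minute for an hour.
per_minute = [0, 10] * 30

five_min = rollup_avg(per_minute, 5)     # alternates between 4.0 and 6.0
thirty_min = rollup_avg(per_minute, 30)  # flat at 5.0

print(five_min)
print(thirty_min)
```

The 5-minute rollup still shows variation from bucket to bucket, while the 30-minute rollup flattens it out.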

content/en/dashboards/guide/rollup-cardinality-visualizations.md


		However, when you group by users, the two graphs don't overlap: the 30-minute graph is significantly higher than the 5-minute graph. This might look like a bug at first glance, but it's actually showing us how users interact with the service over different time periods.

		{{< img src="/dashboards/guide/rollup-cardinality-visualizations/users_mobile_rollup_5_30min.png" alt="Users mobile rollup comparison between 5 and 30 minute intervals" style="width:100%;" >}}

Contributor

cswatt Jun 20, 2025 •

edited


Suggested change

      
            {{< img src="/dashboards/guide/rollup-cardinality-visualizations/users_mobile_rollup_5_30min.png" alt="Users mobile rollup comparison between 5 and 30 minute intervals" style="width:100%;" >}}
          
            {{< img src="/dashboards/guide/rollup-cardinality-visualizations/users_mobile_rollup_5_30min.png" alt="Line graph displaying percentage of users on mobile rolled up every 5 minutes (blue line) compared to 30 minutes (purple line). The smooth purple line is higher than the spiky blue line." style="width:100%;" >}}

content/en/dashboards/guide/rollup-cardinality-visualizations.md

+              {{< img src="/dashboards/guide/rollup-cardinality-visualizations/user_mobile_rollup_5_30min_config.png" alt="Configuration for users mobile rollup comparison" style="width:100%;" >}}
+              {{% /collapse-content %}}
+              Looking at the individual graphs, you'll see the numbers align in the following way. The 30-minute rollups are, of course, larger than the 5-minute rollups. When you scale them down by a factor of 0.75, the total number of distinct users roughly aligns with the 5-minute rollup, while the number of mobile distinct users is significantly higher. Why?

Contributor

cswatt Jun 20, 2025

Suggested change

      
            Looking at the individual graphs, you'll see the numbers align in the following way. The 30-minute rollups are, of course, larger than the 5-minute rollups. When you scale them down by a factor of 0.75, the total number of distinct users roughly aligns with the 5-minute rollup, while the number of mobile distinct users is significantly higher. Why?
          
            The following graph looks at 5-minute versus 30-minute rollups for mobile distinct users and total distinct users. Because the 30-minute rollups are naturally larger than the 5-minute rollups, this graph displays the 30-minute rollups scaled down by a factor of 0.75. For total distinct users, the 5-minute and 30-minute rollups roughly align. However, for mobile distinct users, the 30-minute rollup is significantly higher than the 5-minute rollup. Why?

I was struggling to understand this paragraph, so I rewrote it.

content/en/dashboards/guide/rollup-cardinality-visualizations.md


		Looking at the individual graphs, you'll see the numbers align in the following way. The 30-minute rollups are, of course, larger than the 5-minute rollups. When you scale them down by a factor of 0.75, the total number of distinct users roughly aligns with the 5-minute rollup, while the number of mobile distinct users is significantly higher. Why?

		{{< img src="/dashboards/guide/rollup-cardinality-visualizations/total_users_scaled.png" alt="Scaled rollup comparison showing distinct users" style="width:100%;" >}}

Contributor

cswatt Jun 20, 2025

Suggested change

      
            {{< img src="/dashboards/guide/rollup-cardinality-visualizations/total_users_scaled.png" alt="Scaled rollup comparison showing distinct users" style="width:100%;" >}}
          
            {{< img src="/dashboards/guide/rollup-cardinality-visualizations/total_users_scaled.png" alt="Line graph showing four lines: total distinct users (5-minute rollup), total distinct users (30-minute rollup), mobile distinct users (5-minute rollup), mobile distinct users (30-minute rollup)." style="width:100%;" >}}

More alt text. It's less descriptive this time because the interactions between the lines are explained in the text.

content/en/dashboards/guide/rollup-cardinality-visualizations.md

+              {{< img src="/dashboards/guide/rollup-cardinality-visualizations/total_users_scaled_config.png" alt="Configuration for scaled rollup comparison" style="width:100%;" >}}
+              {{% /collapse-content %}}
+              This occurs because when a user appears multiple times during a rollup window, they appear once in the denominator but multiple times in the numerator. In this case, a user may be using both mobile and desktop. The following graph shows two offset graphs for a single user. The bottom graph indicates whether the user appeared on mobile during the 30-second or 5-minute interval, while the top graph indicates whether the user appeared at all.

Contributor

cswatt Jun 20, 2025

Suggested change

      
            This occurs because when a user appears multiple times during a rollup window, they appear once in the denominator but multiple times in the numerator. In this case, a user may be using both mobile and desktop. The following graph shows two offset graphs for a single user. The bottom graph indicates whether the user appeared on mobile during the 30-second or 5-minute interval, while the top graph indicates whether the user appeared at all.
          
            This occurs because when a user appears multiple times during a rollup window, they appear once in the denominator but multiple times in the numerator. In this case, a user may be using both mobile and desktop. The following visualization shows two offset graphs for a single user. The top graph indicates whether the user appeared at all, while the bottom graph indicates whether the user appeared on mobile.

Edits:

- Changed "graph" to "visualization". This is nitpicky; I just don't like the phrase "the graph shows two graphs".
- Reordered the clauses in the last sentence. English readers scan top to bottom, left to right, so the top graph should be explained before the bottom graph.

But I'm also having some trouble understanding the first sentence, about the user appearing once in the denominator and multiple times in the numerator. Why is there a numerator and a denominator? What are we calculating, and what do the numerator and the denominator each represent?

I'm guessing that we are calculating the average percentage of total users on mobile? I think it would help me, at least, to see what this equation actually is.

Contributor

cswatt Jun 20, 2025

haha so after looking at the notebook, I'm understanding this more. Let's write out the equation `cardinality:@usr.name[@type:session @device.type:Mobile] / cardinality:@usr.name[@type:session] * 100`, and make it clear that this is the numerator/denominator we're talking about.
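
One way to read that equation is sketched below in plain Python: for each rollup bucket, count distinct mobile users, divide by distinct users overall, and multiply by 100. The `events` list, the `(minute, user, device)` tuple layout, and the `mobile_percentage` helper are assumptions made for illustration; they are not part of the notebook or of any Datadog API.

```python
from collections import defaultdict

def mobile_percentage(events, window_minutes):
    """Per bucket: distinct mobile users / distinct users * 100."""
    all_users = defaultdict(set)     # denominator: every user seen in the bucket
    mobile_users = defaultdict(set)  # numerator: users seen on mobile in the bucket
    for minute, user, device in events:
        bucket = minute // window_minutes
        all_users[bucket].add(user)
        if device == "Mobile":
            mobile_users[bucket].add(user)
    return {
        bucket: 100 * len(mobile_users[bucket]) / len(users)
        for bucket, users in sorted(all_users.items())
    }

# Hypothetical sessions: three users on desktop in every 5-minute bucket,
# two of whom each open a single mobile session at different times.
events = [
    (minute, user, "Desktop")
    for minute in range(0, 30, 5)
    for user in ("a", "b", "c")
]
events += [(2, "a", "Mobile"), (17, "b", "Mobile")]

five = mobile_percentage(events, 5)
thirty = mobile_percentage(events, 30)

print(five)    # one value per 5-minute bucket
print(thirty)  # a single 30-minute bucket
```

In this made-up data the denominator is 3 in every bucket at both rollup sizes, but the numerator is at most 1 in any 5-minute bucket and 2 in the 30-minute bucket, so the 30-minute percentage is higher than any 5-minute value.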

content/en/dashboards/guide/rollup-cardinality-visualizations.md


		This occurs because when a user appears multiple times during a rollup window, they appear once in the denominator but multiple times in the numerator. In this case, a user may be using both mobile and desktop. The following graph shows two offset graphs for a single user. The bottom graph indicates whether the user appeared on mobile during the 30-second or 5-minute interval, while the top graph indicates whether the user appeared at all.

		Since the user appeared during most minutes, but only occasionally on mobile, they appear more often on mobile in longer time frames.

Contributor

cswatt Jun 20, 2025

I can't parse this statement. The user appeared only occasionally on mobile, so they appear more often on mobile?
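
One possible reading of the sentence, sketched under stated assumptions: if a user is active in every window but on mobile only in a few scattered minutes, the share of windows in which they count as a mobile user grows as the windows get longer. The minutes and the helper name below are hypothetical.

```python
def share_of_windows_on_mobile(mobile_minutes, total_minutes, window_minutes):
    """Fraction of rollup windows in which the user was seen on mobile.

    Assumes the user is active (on some device) in every window.
    """
    n_windows = total_minutes // window_minutes
    windows_with_mobile = {m // window_minutes for m in mobile_minutes}
    return len(windows_with_mobile) / n_windows

# Hypothetical user: active all hour, on mobile only at minutes 7 and 41.
mobile_minutes = [7, 41]

share_5 = share_of_windows_on_mobile(mobile_minutes, 60, 5)    # 2 of 12 windows
share_30 = share_of_windows_on_mobile(mobile_minutes, 60, 30)  # 2 of 2 windows

print(share_5, share_30)
```

If this is the intended meaning, the guide could say that longer windows are more likely to contain at least one of the user's occasional mobile sessions.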

Contributor

cswatt commented Jun 21, 2025

Forgot this note about the graphics. Let's clone/edit the notebook and then take new screenshots:

- It's really not obvious what each colored line represents, and the displayed key is useless. Let's add aliases to the lines, so each graphic has an informative key that says the blue line is the 5-min rollup, the purple line is the 30-min rollup, etc.
- These are not fractions; they're percentages! Let's replace "Fraction" in each graph title with "Percentage".
- For the last two graphs, it's kind of hard to tell the color difference between the blues. I know it's the classic color palette, but it doesn't make much sense for two groups of two lines, where we're trying to emphasize a difference in mobile vs. total. Maybe we can keep the blue/purple for mobile, and then do a red/yellow for total.


Labels

editorial review Guide Images WORK IN PROGRESS