Add a walkthrough of an actual graph example to explain content #30013
base: master
📝 Documentation Team Review Required
This pull request requires approval from the @DataDog/documentation team before it can be merged. Please ensure your changes follow our documentation guidelines and wait for a team member to review and approve your changes.
Preview links (active after the
|
content/en/dashboards/guide/rollup-cardinality-visualizations.md
Sorry for this annoying review! I started out understanding it and began to get confused—I think I need to see an equation for the computations here.

## Understanding cardinality in timeseries

### Unique vs Distinct Users

Suggested change:
- ### Unique vs Distinct Users
+ ### Unique versus distinct users

Consider a scenario where you track distinct users visiting a website. Each day for seven days, you observe 100 unique users, leading you to assume a total of 700 users. However, the actual number of distinct users over the week might be 400, as many users visit the site on multiple days. This discrepancy arises because each time frame (such as each day) independently counts unique users, inflating the total when compared to a single, longer rollup timeframe.
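
To make the 700-versus-400 gap concrete, here is a minimal Python sketch. The user IDs and the overlap pattern (each day's 100 visitors shifted by 50 from the previous day) are invented for illustration and chosen only so the totals match the numbers in the paragraph above.

```python
# Hypothetical data: 7 days, 100 unique visitors per day, with overlap between days.
def daily_visitors(day, per_day=100, step=50):
    """Return the set of user IDs seen on a given day (illustrative pattern)."""
    start = day * step
    return {f"user-{i}" for i in range(start, start + per_day)}

days = [daily_visitors(d) for d in range(7)]

# Summing per-day unique counts counts returning users once per day they visit.
sum_of_daily_counts = sum(len(day) for day in days)  # 700

# A single weekly rollup counts each user once, no matter how many days they visit.
weekly_distinct = len(set().union(*days))  # 400

print(sum_of_daily_counts, weekly_distinct)
```

Unique counts are not additive across time buckets, so summing the daily values overstates the weekly distinct count whenever users return.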

### How Rollup Affects Averages

Suggested change:
- ### How Rollup Affects Averages
+ ### How rollup affects averages

- In shorter time periods, averages might be lower as we only catch users in that exact moment.
- In longer time periods, averages might be higher as we catch more instances of users using different devices.
- This isn't a bug, it's a natural result of how users interact with your service over time.

This counterintuitive result is due to cardinality, which refers to how unique elements in a dataset are counted. The cardinality for each time bucket can be complex. When analyzing unique users, consider the question: "How many unique users were there each day this week?" If a user visits on two separate days, they count as unique for each day.
Seems like this paragraph should be moved up a section, as it pertains to unique vs distinct users
Consider a scenario where you track distinct users visiting a website. Each day for seven days, you observe 100 unique users, leading you to assume a total of 700 users. However, the actual number of distinct users over the week might be 400, as many users visit the site on multiple days. This discrepancy arises because each time frame (such as each day) independently counts unique users, inflating the total when compared to a single, longer rollup timeframe.

### How Rollup Affects Averages

The rollup function also significantly impacts how averages are calculated and displayed in visualizations:
I think most users are coming to this page with an idea of what the rollup function is, but in case they don't, let's link "rollup function" to https://docs.datadoghq.com/dashboards/functions/rollup
1. **Smoothing Effect**:
   - Longer time periods (30-minute rollups) create smoother graphs.
   - Shorter time periods (5-minute rollups) show more detailed spikes and variations.

2. **Average Calculations**:
   - In shorter time periods, averages might be lower as we only catch users in that exact moment.
   - In longer time periods, averages might be higher as we catch more instances of users using different devices.
   - This isn't a bug, it's a natural result of how users interact with your service over time.

Suggested change:
- 1. **Smoothing Effect**:
-    - Longer time periods (30-minute rollups) create smoother graphs.
-    - Shorter time periods (5-minute rollups) show more detailed spikes and variations.
- 2. **Average Calculations**:
-    - In shorter time periods, averages might be lower as we only catch users in that exact moment.
-    - In longer time periods, averages might be higher as we catch more instances of users using different devices.
-    - This isn't a bug, it's a natural result of how users interact with your service over time.
+ - **Smoothing effect**:
+   - Shorter time periods (5-minute rollups) show more detailed spikes and variations.
+   - Longer time periods (30-minute rollups) create smoother graphs.
+ - **Average calculations**:
+   - In shorter time periods, averages might be lower because Datadog only catches users in that exact moment.
+   - In longer time periods, averages might be higher because Datadog catches more instances of users using different devices.
the suggestions I've made:
- Changed from an ordered list to an unordered list. There isn't a hierarchy between smoothing effect and average calculations, they're just both items at the same level.
- Sentence-case for both "Smoothing effect" and "Average calculations"
- Under "Smoothing effect," switched the order of the two bullet points. This matches the order under "Average calculations," and intuitively, short time periods come before long time periods
- Removed that last bullet point, seems a bit odd and unnecessary
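
For the smoothing effect bullets, a small Python sketch with made-up 5-minute values shows how averaging into 30-minute buckets flattens spikes. The numbers and the `rollup_avg` helper are hypothetical, and the sketch assumes an average aggregator for the rollup rather than describing Datadog's implementation.

```python
def rollup_avg(values, window):
    """Average consecutive groups of `window` points into one point each."""
    buckets = [values[i:i + window] for i in range(0, len(values), window)]
    return [sum(bucket) / len(bucket) for bucket in buckets]

def spread(values):
    """Difference between the highest and lowest point in a series."""
    return max(values) - min(values)

# Hypothetical metric values, one point per 5 minutes (one hour in total).
five_minute = [10, 40, 15, 60, 20, 35, 12, 55, 18, 45, 25, 30]

# Six 5-minute points make up one 30-minute bucket.
thirty_minute = rollup_avg(five_minute, 6)

print(spread(five_minute), spread(thirty_minute))
```

The 30-minute series has a much smaller spread than the 5-minute series, which is the smoother line described in the bullet.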

However, when you group by users, the two graphs don't overlap: the 30-minute graph is significantly higher than the 5-minute graph. This might look like a bug at first glance, but it's actually showing us how users interact with the service over different time periods.

{{< img src="/dashboards/guide/rollup-cardinality-visualizations/users_mobile_rollup_5_30min.png" alt="Users mobile rollup comparison between 5 and 30 minute intervals" style="width:100%;" >}}

Suggested change:
- {{< img src="/dashboards/guide/rollup-cardinality-visualizations/users_mobile_rollup_5_30min.png" alt="Users mobile rollup comparison between 5 and 30 minute intervals" style="width:100%;" >}}
+ {{< img src="/dashboards/guide/rollup-cardinality-visualizations/users_mobile_rollup_5_30min.png" alt="Line graph displaying percentage of users on mobile rolled up every 5 minutes (blue line) compared to 30 minutes (purple line). The smooth purple line is higher than the spiky blue line." style="width:100%;" >}}

{{< img src="/dashboards/guide/rollup-cardinality-visualizations/user_mobile_rollup_5_30min_config.png" alt="Configuration for users mobile rollup comparison" style="width:100%;" >}}
{{% /collapse-content %}}

Looking at the individual graphs, you'll see the numbers align in the following way. The 30-minute rollups are, of course, larger than the 5-minute rollups. When you scale them down by a factor of 0.75, the total number of distinct users roughly aligns with the 5-minute rollup, while the number of mobile distinct users is significantly higher. Why?

Suggested change:
- Looking at the individual graphs, you'll see the numbers align in the following way. The 30-minute rollups are, of course, larger than the 5-minute rollups. When you scale them down by a factor of 0.75, the total number of distinct users roughly aligns with the 5-minute rollup, while the number of mobile distinct users is significantly higher. Why?
+ The following graph looks at 5-minute versus 30-minute rollups for mobile distinct users and total distinct users. Because the 30-minute rollups are naturally larger than the 5-minute rollups, this graph displays the 30-minute rollups scaled down by a factor of 0.75. For total distinct users, the 5-minute and 30-minute rollups roughly align. However, for mobile distinct users, the 30-minute rollup is significantly higher than the 5-minute rollup. Why?
was struggling to understand this paragraph, so rewrote it

Looking at the individual graphs, you'll see the numbers align in the following way. The 30-minute rollups are, of course, larger than the 5-minute rollups. When you scale them down by a factor of 0.75, the total number of distinct users roughly aligns with the 5-minute rollup, while the number of mobile distinct users is significantly higher. Why?

{{< img src="/dashboards/guide/rollup-cardinality-visualizations/total_users_scaled.png" alt="Scaled rollup comparison showing distinct users" style="width:100%;" >}}

Suggested change:
- {{< img src="/dashboards/guide/rollup-cardinality-visualizations/total_users_scaled.png" alt="Scaled rollup comparison showing distinct users" style="width:100%;" >}}
+ {{< img src="/dashboards/guide/rollup-cardinality-visualizations/total_users_scaled.png" alt="Line graph showing four lines: total distinct users (5-minute rollup), total distinct users (30-minute rollup), mobile distinct users (5-minute rollup), mobile distinct users (30-minute rollup)." style="width:100%;" >}}
More alt text. Less descriptive this time because the interactions of the lines are explained in the text
{{< img src="/dashboards/guide/rollup-cardinality-visualizations/total_users_scaled_config.png" alt="Configuration for scaled rollup comparison" style="width:100%;" >}}
{{% /collapse-content %}}

This occurs because when a user appears multiple times during a rollup window, they appear once in the denominator but multiple times in the numerator. In this case, a user may be using both mobile and desktop. The following graph shows two offset graphs for a single user. The bottom graph indicates whether the user appeared on mobile during the 30-second or 5-minute interval, while the top graph indicates whether the user appeared at all.

Suggested change:
- This occurs because when a user appears multiple times during a rollup window, they appear once in the denominator but multiple times in the numerator. In this case, a user may be using both mobile and desktop. The following graph shows two offset graphs for a single user. The bottom graph indicates whether the user appeared on mobile during the 30-second or 5-minute interval, while the top graph indicates whether the user appeared at all.
+ This occurs because when a user appears multiple times during a rollup window, they appear once in the denominator but multiple times in the numerator. In this case, a user may be using both mobile and desktop. The following visualization shows two offset graphs for a single user. The top graph indicates whether the user appeared at all, while the bottom graph indicates whether the user appeared on mobile.
edits:
- changed "graph" to "visualization". this is nitpicky, i just don't like the phrase "the graph shows two graphs"
- re-ordered clauses in the last sentence. english readers scan top to bottom, left to right. the top graph should be explained before the bottom graph.
But I'm also having some trouble understanding the first sentence, about the user appearing once in the denominator and multiple times in the numerator. Why is there a numerator and a denominator, what are we calculating, what does the numerator represent, what does the denominator represent?
I'm guessing that we are calculating average percent of total users on mobile? I think it would help me, at least, to see what this equation actually is.
haha so after looking at the notebook, I'm understanding this more. Let's write out the equation `cardinality:@usr.name[@type:session @device.type:Mobile] / cardinality:@usr.name[@type:session] * 100`, and make it clear that this is the numerator/denominator we're talking about
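
A rough Python sketch of the ratio in that query may help show why the 30-minute line sits higher than the 5-minute line. It is not how Datadog evaluates the query: the event tuples, the user names, and the `mobile_percent_per_bucket` helper are hypothetical, and each bucket is assumed to compute distinct mobile users divided by distinct users, times 100.

```python
from collections import defaultdict

def mobile_percent_per_bucket(events, bucket_minutes):
    """Per time bucket: distinct mobile users / distinct users * 100."""
    all_users = defaultdict(set)
    mobile_users = defaultdict(set)
    for minute, user, device in events:
        bucket = minute // bucket_minutes
        all_users[bucket].add(user)          # denominator: any session
        if device == "Mobile":
            mobile_users[bucket].add(user)   # numerator: mobile sessions only
    return [100 * len(mobile_users[b]) / len(all_users[b]) for b in sorted(all_users)]

# Hypothetical half hour: two users on desktop every minute,
# and one of them also opens the mobile app once, at minute 12.
events = []
for minute in range(30):
    events.append((minute, "alice", "Desktop"))
    events.append((minute, "bob", "Desktop"))
events.append((12, "alice", "Mobile"))

five = mobile_percent_per_bucket(events, 5)      # six 5-minute buckets
thirty = mobile_percent_per_bucket(events, 30)   # one 30-minute bucket

avg_five = sum(five) / len(five)
avg_thirty = sum(thirty) / len(thirty)
print(five, thirty, avg_five, avg_thirty)
```

In this toy data the denominator is 2 in every bucket regardless of bucket size, because both users are always present. The single mobile session lifts the numerator in only one of the six 5-minute buckets, but it lifts the numerator for the entire 30-minute bucket, so the 30-minute percentage is higher than the average of the 5-minute percentages.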

This occurs because when a user appears multiple times during a rollup window, they appear once in the denominator but multiple times in the numerator. In this case, a user may be using both mobile and desktop. The following graph shows two offset graphs for a single user. The bottom graph indicates whether the user appeared on mobile during the 30-second or 5-minute interval, while the top graph indicates whether the user appeared at all.

Since the user appeared during most minutes, but only occasionally on mobile, they appear more often on mobile in longer time frames.
I can't parse this statement. The user appeared only occasionally on mobile, so they appear more often on mobile?
Forgot this note about the graphics. Let's clone/edit the notebook and then take new screenshots:
What does this PR do? What is the motivation?
Merge instructions
Merge readiness: