-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-19737: ABFS: Add metrics to identify improvements with read and write aggressiveness #8056
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
anujmodi2021
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial Review. Will complete in some time
...c/main/java/org/apache/hadoop/fs/azurebfs/enums/AbfsWriteResourceUtilizationMetricsEnum.java
Show resolved
Hide resolved
...-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java
Outdated
Show resolved
Hide resolved
|
|
||
| /** | ||
| * Private constructor to prevent instantiation as this needs to be singleton. | ||
| * Initializes a new instance of {@code ReadBufferManagerV2} for the given ABFS client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the clients in JVM ae supposed to use same ReadBufferManager. This comment seems wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems pending
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV2.java
Outdated
Show resolved
Hide resolved
...ools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java
Outdated
Show resolved
Hide resolved
...p-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsWriteThreadPoolMetrics.java
Outdated
Show resolved
Hide resolved
...adoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AzureDFSIngressHandler.java
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
...rc/main/java/org/apache/hadoop/fs/azurebfs/enums/AbfsReadResourceUtilizationMetricsEnum.java
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/utils/TracingContext.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/utils/TracingContext.java
Outdated
Show resolved
Hide resolved
...-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java
Outdated
Show resolved
Hide resolved
...op-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsCounters.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV2.java
Outdated
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/utils/TracingHeaderVersion.java
Outdated
Show resolved
Hide resolved
| * Schema: version:clientCorrelationId:clientRequestId:fileSystemId | ||
| * :primaryRequestId:streamId:opType:retryHeader:ingressHandler | ||
| * :position:operatedBlobCount:operationSpecificHeader:httpOperationHeader | ||
| * :networkLibrary:operationMetrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 15?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be 16 with aggregated metrics and resource utilization metrics separate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier it was 13 right?
We are only adding one more in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Network library and aggregated metrics are not part of V1 as per javadoc, so 2 + the resourceUtilizationMetrics from this PR, hence made it 16
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/utils/TracingHeaderVersion.java
Outdated
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/WriteThreadPoolSizeManager.java
Outdated
Show resolved
Hide resolved
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
| public static final int DEFAULT_WRITE_HIGH_TIER_MEMORY_MULTIPLIER = 16; | ||
|
|
||
| /** Percentage threshold of heap usage at which memory pressure is considered high. */ | ||
| public static final int DEFAULT_WRITE_HIGH_MEMORY_USAGE_THRESHOLD_PERCENTAGE = 60; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have used PERCENT in the FileSystemConfiguration FS_AZURE_WRITE_HIGH_MEMORY_USAGE_THRESHOLD_PERCENT, should be better to rename this to DEFAULT_WRITE_HIGH_MEMORY_USAGE_THRESHOLD_PERCENT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| public static final int DEFAULT_WRITE_HIGH_MEMORY_USAGE_THRESHOLD_PERCENTAGE = 60; | ||
|
|
||
| /** Percentage threshold of heap usage at which memory pressure is considered low. */ | ||
| public static final int DEFAULT_WRITE_LOW_MEMORY_USAGE_THRESHOLD_PERCENTAGE = 30; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| * | ||
| * @return the metric name. | ||
| */ | ||
| public String getName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method and below one should be annotated with @OverRide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| * | ||
| * @return the metric name. | ||
| */ | ||
| public String getName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| abfsUriQueryBuilder, cachedSasToken); | ||
|
|
||
| // Retrieve the read thread pool metrics from the ABFS counters. | ||
| AbfsReadResourceUtilizationMetrics readResourceUtilizationMetrics = retrieveReadResourceUtilizationMetrics(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of code is common in both DFS and blob client, we can define a method in abfs client class which will add the metrics to tracing Context and that method will be called from both the places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already defined the common method retrieveReadResourceUtilizationMetrics in abfs client class, tracing context changes we always do in respective client methods only as per the previous API's. Can take this up if still needed
| case TWO_ID_FORMAT: | ||
| header = TracingHeaderVersion.getCurrentVersion() + COLON | ||
| + clientCorrelationID + COLON + clientRequestId; | ||
| metricHeader += !(metricResults.trim().isEmpty()) ? metricResults : EMPTY_STRING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have we removed this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the metricResults would be empty string unless anything added to it. As per latest version changes it should either be some string, else emptyString. Hence, I made this change:- metricResults + COLON + resourceUtilizationMetricResults;
|
|
||
| static ReadBufferManagerV2 getBufferManager() { | ||
| /** | ||
| * Returns the singleton instance of {@code ReadBufferManagerV2} for the given ABFS client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment still need to be fixed
| * @param abfsCounters the {@link AbfsCounters} used for managing read operations. | ||
| */ | ||
| private ReadBufferManagerV2() { | ||
| private ReadBufferManagerV2(AbfsCounters abfsCounters) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructor will be called only once for all the FSs
Are abfsCounters and readThreadPoolMetrics also singleton?
What if these are different for different Filesystems?
|
|
||
| /** | ||
| * Private constructor to prevent instantiation as this needs to be singleton. | ||
| * Initializes a new instance of {@code ReadBufferManagerV2} for the given ABFS client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems pending
| poolSizeManager.startCPUMonitoring(); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was that supposed to be taken separately?
| * Schema: version:clientCorrelationId:clientRequestId:fileSystemId | ||
| * :primaryRequestId:streamId:opType:retryHeader:ingressHandler | ||
| * :position:operatedBlobCount:operationSpecificHeader:httpOperationHeader | ||
| * :networkLibrary:operationMetrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier it was 13 right?
We are only adding one more in this PR.
|
💔 -1 overall
This message was automatically generated. |
Introduces new performance metrics in the ABFS driver to monitor and evaluate the effectiveness of read and write aggressiveness tuning. These metrics help in understanding how thread pool behavior, CPU utilization, and heap availability impact overall I/O throughput and latency. By capturing detailed statistics such as active thread count, pool size, and system resource utilization, this enhancement enables data-driven analysis of optimizations made to improve ABFS read and write performance under varying workloads.