Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: [NPM] Reduce/Refactor Noisy NPM Logs #3468

Open
wants to merge 35 commits into
base: master
Choose a base branch
from

Conversation

rayaisaiah
Copy link
Contributor

@rayaisaiah rayaisaiah commented Mar 3, 2025

Reason for Change:
Reduces noisy NPM logs (non-error/warning logs) and adds a loglevel property to the npm configmap which controls if the app insight logs print. This is important as the logs are printed every 5 minutes with the npm heartbeat and can quickly accumulate.

Will implement Zap logging framework in another pr. Commented out non-error/warning logs with a TODO statement that should be marked as debug when Zap is implemented.

NPM logs with loglevel info

I0305 23:31:31.156620       1 root.go:31] Using config file: /etc/azure-npm/azure-npm.json
I0305 23:31:31.156852       1 start.go:73] loaded config: {ResyncPeriodInMinutes:15 ListeningPort:10091 ListeningAddress:0.0.0.0 Transport:{Address: Port:0 ServicePort:0} WindowsNetworkName: ApplyMaxBatches:100 ApplyIntervalInMilliseconds:500 MaxBatchedACLsPerPod:30 MaxPendingNetPols:100 NetPolInvervalInMilliseconds:500 Toggles:{EnablePrometheusMetrics:true EnablePprof:true EnableHTTPDebugAPI:true EnableV2NPM:true PlaceAzureChainFirst:false ApplyIPSetsOnNeed:false ApplyInBackground:true NetPolInBackground:true EnableNPMLite:false} LogLevel:}
I0305 23:31:31.156895       1 start.go:78] NPM is running on Linux Dataplane
I0305 23:31:31.156900       1 start.go:80] starting NPM version 2 with image v1.5.45FinalTest
I0305 23:31:31.156906       1 start.go:89] initializing metrics
I0305 23:31:31.156956       1 prometheus-metrics.go:403] registered metric num_iptables_rules to registry cluster-metrics
I0305 23:31:31.156974       1 prometheus-metrics.go:403] registered metric num_ipsets to registry cluster-metrics
I0305 23:31:31.156982       1 prometheus-metrics.go:403] registered metric num_ipset_entries to registry cluster-metrics
I0305 23:31:31.156996       1 prometheus-metrics.go:403] registered metric ipset_counts to registry cluster-metrics
I0305 23:31:31.157057       1 prometheus-metrics.go:403] registered metric add_iptables_rule_exec_time to registry node-metrics
I0305 23:31:31.157094       1 prometheus-metrics.go:403] registered metric add_ipset_exec_time to registry node-metrics
I0305 23:31:31.157106       1 prometheus-metrics.go:403] registered metric num_policies to registry cluster-metrics
I0305 23:31:31.157121       1 prometheus-metrics.go:403] registered metric add_policy_exec_time to registry node-metrics
I0305 23:31:31.157134       1 prometheus-metrics.go:403] registered metric policy_exec_time to registry node-metrics
I0305 23:31:31.157145       1 prometheus-metrics.go:403] registered metric pod_exec_time to registry node-metrics
I0305 23:31:31.157153       1 prometheus-metrics.go:403] registered metric namespace_exec_time to registry node-metrics
I0305 23:31:31.157160       1 prometheus-metrics.go:403] registered metric pods_watched to registry cluster-metrics
I0305 23:31:31.157170       1 prometheus-metrics.go:323] initializing Linux metrics. will not register the newly created metrics in this function
I0305 23:31:31.157186       1 prometheus-metrics.go:195] registering linux metrics
I0305 23:31:31.157193       1 prometheus-metrics.go:403] registered metric iptables_restore_latency_seconds to registry node-metrics
I0305 23:31:31.157214       1 prometheus-metrics.go:403] registered metric iptables_delete_latency_seconds to registry node-metrics
I0305 23:31:31.157227       1 prometheus-metrics.go:403] registered metric iptables_restore_failure_total to registry node-metrics
I0305 23:31:31.157242       1 start.go:95] loading in cluster kubeconfig
2025/03/05 23:31:31 [1] Finished initializing all Prometheus metrics
2025/03/05 23:31:31 [1] GetAzureCloud querying url: http://169.254.169.254/metadata/instance/compute/azEnvironment?api-version=2018-10-01&format=text
2025/03/05 23:31:31 [1] [Utils] Initializing HTTP client with connection timeout: 7, response header timeout: 7
I0305 23:31:31.157747       1 start.go:121] Resync period for NPM pod is set to 17.
I0305 23:31:31.157766       1 ai-utils.go:28] LogLevel is info. Debugmode is set to: false.
I0305 23:31:31.179404       1 prometheus-metrics.go:161] metrics have already been initialized
I0305 23:31:31.179431       1 chain-management_linux.go:189] booting up iptables Azure chains
I0305 23:31:31.179437       1 chain-management_linux.go:253] first attempt detecting iptables version. looking for hint/canary chain in iptables-nft
I0305 23:31:31.179445       1 chain-management_linux.go:523] executing iptables command [iptables-nft] with args [-w 60 -L KUBE-IPTABLES-HINT -t mangle -n]
2025/03/05 23:31:31 [1] Initialized AppInsights handle
2025/03/05 23:31:31 [1] [Telemetry] Request metadata from wireserver
I0305 23:31:31.182089       1 const.go:269] setting iptables to nft
I0305 23:31:31.182109       1 chain-management_linux.go:301] detected nft iptables. cleaning up legacy iptables
I0305 23:31:31.182114       1 const.go:276] setting iptables to legacy
I0305 23:31:31.182121       1 chain-management_linux.go:523] executing iptables command [iptables] with args [-w 60 -D FORWARD -j AZURE-NPM]
I0305 23:31:31.183904       1 chain-management_linux.go:535] didn't delete deprecated jump rule from FORWARD chain to AZURE-NPM chain likely because AZURE-NPM chain doesn't exist. not able to run iptables command [iptables -w 60 -D FORWARD -j AZURE-NPM]. exit code: 2, output: iptables v1.8.4 (legacy): Couldn't load target `AZURE-NPM':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0305 23:31:31.183922       1 chain-management_linux.go:523] executing iptables command [iptables] with args [-w 60 -D FORWARD -j AZURE-NPM -m conntrack --ctstate NEW]
I0305 23:31:31.185988       1 chain-management_linux.go:535] didn't delete deprecated jump rule from FORWARD chain to AZURE-NPM chain likely because AZURE-NPM chain doesn't exist. not able to run iptables command [iptables -w 60 -D FORWARD -j AZURE-NPM -m conntrack --ctstate NEW]. exit code: 2, output: iptables v1.8.4 (legacy): Couldn't load target `AZURE-NPM':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0305 23:31:31.187968       1 chain-management_linux.go:349] no chains to cleanup
I0305 23:31:31.187993       1 chain-management_linux.go:310] cleaned up legacy iptables
I0305 23:31:31.187999       1 const.go:269] setting iptables to nft
I0305 23:31:31.188006       1 chain-management_linux.go:523] executing iptables command [iptables-nft] with args [-w 60 -D FORWARD -j AZURE-NPM]
I0305 23:31:31.190117       1 chain-management_linux.go:535] probably tried to delete a jump rule that didn't exist in nft tables. not able to run iptables command [iptables-nft -w 60 -D FORWARD -j AZURE-NPM]. exit code: 1, output: iptables: Bad rule (does a matching rule exist in that chain?).
2025/03/05 23:31:31 [1] Acquiring process lock
2025/03/05 23:31:31 [1] Acquired process lock with timeout value of 10s
2025/03/05 23:31:31 [1] Released process lock
I0305 23:31:31.357236       1 chain-management_linux.go:229] found 122 current chains in the default iptables
I0305 23:31:31.484964       1 ai-utils.go:109] [DataPlane] dataplane configured to add netpols in background every 500ms or every 100 calls to AddPolicy() - (NPM v2)
I0305 23:31:32.525812       1 npm.go:68] API server version: v1.30.9 AI metadata 014c22bd-4107-459e-8475-67909e96edcb
I0305 23:31:32.525938       1 ai-utils.go:109] starting NPM - (NPM v2)
I0305 23:31:32.526228       1 server.go:60] Starting NPM HTTP API on 0.0.0.0:10091... 

NPM logs with loglevel debug

I0305 23:39:30.259250       1 root.go:31] Using config file: /etc/azure-npm/azure-npm.json
I0305 23:39:30.259476       1 start.go:73] loaded config: {ResyncPeriodInMinutes:15 ListeningPort:10091 ListeningAddress:0.0.0.0 Transport:{Address: Port:0 ServicePort:0} WindowsNetworkName: ApplyMaxBatches:100 ApplyIntervalInMilliseconds:500 MaxBatchedACLsPerPod:30 MaxPendingNetPols:100 NetPolInvervalInMilliseconds:500 Toggles:{EnablePrometheusMetrics:true EnablePprof:true EnableHTTPDebugAPI:true EnableV2NPM:true PlaceAzureChainFirst:false ApplyIPSetsOnNeed:false ApplyInBackground:true NetPolInBackground:true EnableNPMLite:false} LogLevel:}
I0305 23:39:30.259516       1 start.go:78] NPM is running on Linux Dataplane
I0305 23:39:30.259522       1 start.go:80] starting NPM version 2 with image v1.5.45FinalTestDebug
I0305 23:39:30.259528       1 start.go:89] initializing metrics
I0305 23:39:30.259566       1 prometheus-metrics.go:403] registered metric num_iptables_rules to registry cluster-metrics
I0305 23:39:30.259583       1 prometheus-metrics.go:403] registered metric num_ipsets to registry cluster-metrics
I0305 23:39:30.259593       1 prometheus-metrics.go:403] registered metric num_ipset_entries to registry cluster-metrics
I0305 23:39:30.259607       1 prometheus-metrics.go:403] registered metric ipset_counts to registry cluster-metrics
I0305 23:39:30.259649       1 prometheus-metrics.go:403] registered metric add_iptables_rule_exec_time to registry node-metrics
I0305 23:39:30.259675       1 prometheus-metrics.go:403] registered metric add_ipset_exec_time to registry node-metrics
I0305 23:39:30.259683       1 prometheus-metrics.go:403] registered metric num_policies to registry cluster-metrics
I0305 23:39:30.259701       1 prometheus-metrics.go:403] registered metric add_policy_exec_time to registry node-metrics
I0305 23:39:30.259722       1 prometheus-metrics.go:403] registered metric policy_exec_time to registry node-metrics
I0305 23:39:30.259743       1 prometheus-metrics.go:403] registered metric pod_exec_time to registry node-metrics
I0305 23:39:30.259757       1 prometheus-metrics.go:403] registered metric namespace_exec_time to registry node-metrics
I0305 23:39:30.259779       1 prometheus-metrics.go:403] registered metric pods_watched to registry cluster-metrics
I0305 23:39:30.259792       1 prometheus-metrics.go:323] initializing Linux metrics. will not register the newly created metrics in this function
I0305 23:39:30.259807       1 prometheus-metrics.go:195] registering linux metrics
I0305 23:39:30.259821       1 prometheus-metrics.go:403] registered metric iptables_restore_latency_seconds to registry node-metrics
I0305 23:39:30.259829       1 prometheus-metrics.go:403] registered metric iptables_delete_latency_seconds to registry node-metrics
I0305 23:39:30.259841       1 prometheus-metrics.go:403] registered metric iptables_restore_failure_total to registry node-metrics
2025/03/05 23:39:30 [1] Finished initializing all Prometheus metrics
I0305 23:39:30.259872       1 start.go:95] loading in cluster kubeconfig
I0305 23:39:30.260371       1 start.go:121] Resync period for NPM pod is set to 23.
I0305 23:39:30.260394       1 ai-utils.go:28] LogLevel is debug. Debugmode is set to true.
2025/03/05 23:39:30 [1] GetAzureCloud querying url: http://169.254.169.254/metadata/instance/compute/azEnvironment?api-version=2018-10-01&format=text
2025/03/05 23:39:30 [1] [Utils] Initializing HTTP client with connection timeout: 7, response header timeout: 7
2025/03/05 23:39:30 [1] [AppInsights] CloudName: AzurePublicCloud
2025/03/05 23:39:30 [1] Initialized AppInsights handle
I0305 23:39:30.285662       1 prometheus-metrics.go:161] metrics have already been initialized
I0305 23:39:30.285692       1 chain-management_linux.go:189] booting up iptables Azure chains
I0305 23:39:30.285699       1 chain-management_linux.go:253] first attempt detecting iptables version. looking for hint/canary chain in iptables-nft
I0305 23:39:30.285709       1 chain-management_linux.go:523] executing iptables command [iptables-nft] with args [-w 60 -L KUBE-IPTABLES-HINT -t mangle -n]
2025/03/05 23:39:30 [1] [Telemetry] Request metadata from wireserver
I0305 23:39:30.289168       1 const.go:269] setting iptables to nft
I0305 23:39:30.289193       1 chain-management_linux.go:301] detected nft iptables. cleaning up legacy iptables
I0305 23:39:30.289198       1 const.go:276] setting iptables to legacy
I0305 23:39:30.289203       1 chain-management_linux.go:523] executing iptables command [iptables] with args [-w 60 -D FORWARD -j AZURE-NPM]
I0305 23:39:30.290909       1 chain-management_linux.go:535] didn't delete deprecated jump rule from FORWARD chain to AZURE-NPM chain likely because AZURE-NPM chain doesn't exist. not able to run iptables command [iptables -w 60 -D FORWARD -j AZURE-NPM]. exit code: 2, output: iptables v1.8.4 (legacy): Couldn't load target `AZURE-NPM':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0305 23:39:30.290937       1 chain-management_linux.go:523] executing iptables command [iptables] with args [-w 60 -D FORWARD -j AZURE-NPM -m conntrack --ctstate NEW]
2025/03/05 23:39:30 [1] Acquiring process lock
2025/03/05 23:39:30 [1] Acquired process lock with timeout value of 10s
2025/03/05 23:39:30 [1] Released process lock
I0305 23:39:30.353073       1 chain-management_linux.go:535] didn't delete deprecated jump rule from FORWARD chain to AZURE-NPM chain likely because AZURE-NPM chain doesn't exist. not able to run iptables command [iptables -w 60 -D FORWARD -j AZURE-NPM -m conntrack --ctstate NEW]. exit code: 2, output: iptables v1.8.4 (legacy): Couldn't load target `AZURE-NPM':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I0305 23:39:30.355049       1 chain-management_linux.go:349] no chains to cleanup
I0305 23:39:30.355063       1 chain-management_linux.go:310] cleaned up legacy iptables
I0305 23:39:30.355068       1 const.go:269] setting iptables to nft
I0305 23:39:30.355074       1 chain-management_linux.go:523] executing iptables command [iptables-nft] with args [-w 60 -D FORWARD -j AZURE-NPM]
I0305 23:39:30.357352       1 chain-management_linux.go:535] probably tried to delete a jump rule that didn't exist in nft tables. not able to run iptables command [iptables-nft -w 60 -D FORWARD -j AZURE-NPM]. exit code: 1, output: iptables: Bad rule (does a matching rule exist in that chain?).
I0305 23:39:30.457521       1 chain-management_linux.go:229] found 122 current chains in the default iptables
I0305 23:39:30.581428       1 ai-utils.go:109] [DataPlane] dataplane configured to add netpols in background every 500ms or every 100 calls to AddPolicy() - (NPM v2)
I0305 23:39:31.608931       1 npm.go:68] API server version: v1.30.9 AI metadata 014c22bd-4107-459e-8475-67909e96edcb
I0305 23:39:31.609052       1 ai-utils.go:109] starting NPM - (NPM v2)
I0305 23:39:31.610467       1 server.go:60] Starting NPM HTTP API on 0.0.0.0:10091... 
2025/03/05 23:40:00 [1] [AppInsights] [Wed Mar  5 23:40:00 UTC 2025] --------- Transmitting 3 items ---------
2025/03/05 23:40:00 [1] [AppInsights] [Wed Mar  5 23:40:00 UTC 2025] Telemetry transmitted in 266.801416ms
2025/03/05 23:40:00 [1] [AppInsights] [Wed Mar  5 23:40:00 UTC 2025] Response: 200
2025/03/05 23:40:00 [1] [AppInsights] [Wed Mar  5 23:40:00 UTC 2025] Items accepted/received: 3/3
2025/03/05 23:45:00 [1] [AppInsights] [Wed Mar  5 23:45:00 UTC 2025] --------- Transmitting 1 items ---------
2025/03/05 23:45:00 [1] [AppInsights] [Wed Mar  5 23:45:00 UTC 2025] Telemetry transmitted in 234.931706ms
2025/03/05 23:45:00 [1] [AppInsights] [Wed Mar  5 23:45:00 UTC 2025] Response: 200
2025/03/05 23:45:00 [1] [AppInsights] [Wed Mar  5 23:45:00 UTC 2025] Items accepted/received: 1/1
2025/03/05 23:50:00 [1] [AppInsights] [Wed Mar  5 23:50:00 UTC 2025] --------- Transmitting 1 items ---------
2025/03/05 23:50:00 [1] [AppInsights] [Wed Mar  5 23:50:00 UTC 2025] Telemetry transmitted in 224.588262ms
2025/03/05 23:50:00 [1] [AppInsights] [Wed Mar  5 23:50:00 UTC 2025] Response: 200
2025/03/05 23:50:00 [1] [AppInsights] [Wed Mar  5 23:50:00 UTC 2025] Items accepted/received: 1/1

Issue Fixed:

Requirements:

Notes:
Resolves customer log storage issues in managed azure-npm.

@Copilot Copilot bot review requested due to automatic review settings March 3, 2025 22:16
@rayaisaiah rayaisaiah requested a review from a team as a code owner March 3, 2025 22:16
@rayaisaiah rayaisaiah requested a review from huntergregory March 3, 2025 22:16
@rayaisaiah rayaisaiah changed the title fix: [NPM] Reduce Noisy NPM Logs fix: [NPM] Reduce/Refactor Noisy NPM Logs Mar 3, 2025
@rayaisaiah rayaisaiah added npm Related to NPM. linux labels Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR aims to reduce the noisy NPM logs by removing numerous informational logging (klog.Infof) statements and refactoring logging in key components.

  • Removed debug/info logs from chain management, policy management, and IP set operations.
  • Refactored log statements in the dataplane and dpshim code to promote a cleaner production log output.
  • Eliminated extraneous log output in auxiliary components including the parser and dirty cache management.

Reviewed Changes

File Description
npm/pkg/dataplane/policies/chain-management_linux.go Removed multiple klog.Infof statements in bootup, cleanup, and chain positioning functions to reduce clutter.
npm/pkg/dataplane/types.go Eliminated excessive logging in updatePodCache and netPolQueue functions.
npm/pkg/dataplane/dpshim/dpshim.go Cleared out informational logs in HydrateClients, deleteIPSet, and IP set processing functions.
npm/pkg/dataplane/dataplane.go Removed several log messages in AddPolicy, applyDataPlaneNow, and related dataplane functions.
npm/pkg/dataplane/ipsets/ipsetmanager_linux.go Removed verbose logging in reset and destroy operations for IPSets.
npm/pkg/dataplane/parse/parser.go Stripped out debugging logs in command execution for iptables parsing.
npm/pkg/dataplane/ipsets/ipsetmanager.go Cleared extraneous info logs during IP set reconciliation and deletion.
npm/pkg/dataplane/dpshim/dirtycache.go Removed debug log calls in dirty cache clearing and contents printing.

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

@rayaisaiah rayaisaiah enabled auto-merge March 4, 2025 01:10
@rayaisaiah rayaisaiah disabled auto-merge March 4, 2025 18:44
Copy link
Contributor

@huntergregory huntergregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you confirm that changing configmap value causes desired effects?

@huntergregory
Copy link
Contributor

Can you update PR description too?

@rayaisaiah
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rayaisaiah
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rayaisaiah
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rayaisaiah rayaisaiah requested a review from a team as a code owner March 7, 2025 18:02
@rayaisaiah rayaisaiah requested a review from alexcastilio March 7, 2025 18:02
@rayaisaiah
Copy link
Contributor Author

/azp run NPM Conformance Tests

@rayaisaiah
Copy link
Contributor Author

/azp run NPM Scale Test

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rayaisaiah rayaisaiah requested a review from huntergregory March 8, 2025 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linux npm Related to NPM.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants