test: Changes to add KPIs to e2e action #1321

Merged on Jul 1, 2024 (63 commits)

Commits
336b6f0
Changes to add KPIs to e2e action
nathangeology Jun 14, 2024
2ba0670
Replaced awswrangler with pandas
nathangeology Jun 14, 2024
d21ba73
Updated input to KPI script to refer to TEMP_DIR
nathangeology Jun 14, 2024
2d93b57
Added some files for testing
nathangeology Jun 18, 2024
55b6de7
Rebase to simplify pull request. Changes to include simulation KPIs f…
njtran Jun 17, 2024
f8a2317
add drift test and adjust group
njtran Jun 17, 2024
9567d21
remove hack removal
njtran Jun 17, 2024
558ee2b
remove unnecessary #
njtran Jun 17, 2024
dad0898
committing collaborative code from njtran to my fork
nathangeology Jun 20, 2024
b39d1aa
Almost have the testing environment built to be able to finish the si…
nathangeology Jun 20, 2024
f6ebe9a
adjusted install-kwok.sh to enable the metrics server for kwok, this …
nathangeology Jun 21, 2024
c2b23c7
getting ready to try getting the github action working.
nathangeology Jun 21, 2024
ba6983c
Ok first try to get the e2e test action to run with the KPI processin…
nathangeology Jun 21, 2024
f96cc8e
Removed the effort to turn on the kwok metrics endpoint for now.
nathangeology Jun 21, 2024
af18243
make verify was failing due to pulling in the kpi script, moved it un…
nathangeology Jun 21, 2024
86e68c2
still getting make verify error removing quiet flag to see what it is
nathangeology Jun 21, 2024
fa54474
trying again to figure out why verify is freaking out during the apply
nathangeology Jun 21, 2024
48b12ac
trying again to figure out why verify is freaking out during the apply x2
nathangeology Jun 21, 2024
5d3b1f2
trying again to figure out why verify is freaking out during the apply x3
nathangeology Jun 21, 2024
2cd688f
removed @ symbol on 115 of makefile, it was causing the verify to fai…
nathangeology Jun 21, 2024
3250907
trying again....
nathangeology Jun 21, 2024
c4b6515
trying again.... x2
nathangeology Jun 21, 2024
d71a428
the karpenter pod was not started adjusted the install command and tr…
nathangeology Jun 21, 2024
8df8d4f
the cluster ping still shows Karpenter as not running, but it could b…
nathangeology Jun 21, 2024
201fc99
Changed the input for the KPI file running.
nathangeology Jun 21, 2024
7a44b5d
Changed which block the KPI script runs in because the TEMP DIR varia…
nathangeology Jun 24, 2024
f8e53f5
Prometheus is not configured to get Karpenter metrics in the e2e acti…
nathangeology Jun 24, 2024
cd0c1e7
Introduced a bug to the prometheus install, trying again.
nathangeology Jun 24, 2024
2adba62
Introduced a bug to the prometheus install, trying again x2
nathangeology Jun 24, 2024
07e82c4
Introduced a bug to the prometheus install, trying again x3
nathangeology Jun 24, 2024
4c91dba
Introduced a bug to the prometheus install, trying again x4
nathangeology Jun 24, 2024
b32eff0
Introduced a bug to the prometheus install, trying again x5
nathangeology Jun 24, 2024
285487a
Working now, added a missing python dependency that was preventing KP…
nathangeology Jun 24, 2024
9ae8d90
Working now, added a missing python dependency that was preventing KP…
nathangeology Jun 24, 2024
83a8323
Added tabulate library since that was throwing an error at the end of…
nathangeology Jun 25, 2024
9b3c17f
Changes to add KPIs to e2e action
nathangeology Jun 14, 2024
21c24a5
Replaced awswrangler with pandas
nathangeology Jun 14, 2024
78286af
Updated input to KPI script to refer to TEMP_DIR
nathangeology Jun 14, 2024
735c7ce
Merged in changes with working e2e tests. Still have two todos. One i…
nathangeology Jun 25, 2024
02740f6
Fixed version of karpenter evaluate, checkout, and setup python actions.
nathangeology Jun 25, 2024
06ee521
Fixing shell check error by changing exports to separate lines from t…
nathangeology Jun 25, 2024
b4e2bfe
Trying to get the commit ref correct for karpenter evaluate
nathangeology Jun 25, 2024
9af6247
Commit ref still not working for karpenter_evaluate checkout action, …
nathangeology Jun 25, 2024
5f6f74f
Not getting the single commit checkout to work, will discuss with njt…
nathangeology Jun 25, 2024
9d36c33
Removed a line from earlier single commit experiments that was causin…
nathangeology Jun 25, 2024
fdd8d57
Removed a line from earlier single commit experiments that was causin…
nathangeology Jun 25, 2024
060e2ff
Trying to adjust the test call in the makefile so you can see the tes…
nathangeology Jun 26, 2024
a2a31f7
Putting back to how it was because it didn't work
nathangeology Jun 26, 2024
346c3ad
Trying to put the inline export back in since that seemed to allow it…
nathangeology Jun 26, 2024
9b48182
Trying to suppress error code 1 on failed tests again in the e2e tests
nathangeology Jun 26, 2024
d1cc309
Trying to suppress error code 1 on failed tests again in the e2e test…
nathangeology Jun 26, 2024
4ff0b42
Issues that Nick id'd should be fixed. Going to test this and verify …
nathangeology Jun 26, 2024
2dc555f
Typo in the GA to make a temp dir
nathangeology Jun 26, 2024
4c9f202
Typo in the GA to make a temp dir x2
nathangeology Jun 26, 2024
b4259fc
Minor clean up for the pull request.
nathangeology Jun 26, 2024
d5bc785
Updated karpenter_evaluate commit version
nathangeology Jun 27, 2024
e8d6851
Update .github/workflows/kind-e2e.yaml
nathangeology Jun 28, 2024
9512ffd
Applying suggested changes from PR review, still need to change out a…
nathangeology Jun 28, 2024
b29bd7d
Merge remote-tracking branch 'origin/nick_tran_pairp' into nick_tran_…
nathangeology Jun 28, 2024
6936892
Going to test replacing awswrangler
nathangeology Jun 28, 2024
13ff293
Moving the assignment and export of OUTPUT_DIR back to separate lines…
nathangeology Jun 28, 2024
86d6f8b
Can't test all changes if the e2e tests fail, going to try to skip th…
nathangeology Jun 28, 2024
8957d2d
Ok removing the || true call now that I verified that KPIs are emitti…
nathangeology Jun 28, 2024
2 changes: 1 addition & 1 deletion .github/actions/install-prometheus/action.yaml
@@ -22,4 +22,4 @@ runs:
--set "kubelet.serviceMonitor.cAdvisorRelabelings[0].targetLabel=metrics_path" \
--set "kubelet.serviceMonitor.cAdvisorRelabelings[0].action=replace" \
--set "kubelet.serviceMonitor.cAdvisorRelabelings[0].sourceLabels[0]=__metrics_path__" \
--wait
--wait
118 changes: 112 additions & 6 deletions .github/actions/install-prometheus/values.yaml
@@ -44,9 +44,115 @@ prometheus:
tolerations:
- key: CriticalAddonsOnly
operator: Exists
serviceMonitorSelector:
matchLabels:
scrape: enabled
serviceMonitorNamespaceSelector:
matchLabels:
scrape: enabled
serviceMonitorSelector: { }
serviceMonitorNamespaceSelector: { }
additionalScrapeConfigs:
- job_name: serviceMonitor/kube-system/karpenter/0
honor_timestamps: true
track_timestamps_staleness: false
scrape_interval: 30s
scrape_timeout: 10s
scrape_protocols:
- OpenMetricsText1.0.0
- OpenMetricsText0.0.1
- PrometheusText0.0.4
metrics_path: /metrics
scheme: http
enable_compression: true
follow_redirects: true
enable_http2: true
relabel_configs:
- source_labels: [ job ]
separator: ;
regex: (.*)
target_label: __tmp_prometheus_job_name
replacement: $1
action: replace
- source_labels: [ __meta_kubernetes_service_label_app_kubernetes_io_instance, __meta_kubernetes_service_labelpresent_app_kubernetes_io_instance ]
separator: ;
regex: (karpenter);true
replacement: $1
action: keep
- source_labels: [ __meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name ]
separator: ;
regex: (karpenter);true
replacement: $1
action: keep
- source_labels: [ __meta_kubernetes_endpoint_port_name ]
separator: ;
regex: http-metrics
replacement: $1
action: keep
- source_labels: [ __meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name ]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [ __meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name ]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [ __meta_kubernetes_namespace ]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [ __meta_kubernetes_service_name ]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [ __meta_kubernetes_pod_name ]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [ __meta_kubernetes_pod_container_name ]
separator: ;
regex: (.*)
target_label: container
replacement: $1
action: replace
- source_labels: [ __meta_kubernetes_pod_phase ]
separator: ;
regex: (Failed|Succeeded)
replacement: $1
action: drop
- source_labels: [ __meta_kubernetes_service_name ]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: http-metrics
action: replace
- source_labels: [ __address__ ]
separator: ;
regex: (.*)
modulus: 1
target_label: __tmp_hash
replacement: $1
action: hashmod
- source_labels: [ __tmp_hash ]
separator: ;
regex: "0"
replacement: $1
action: keep
kubernetes_sd_configs:
- role: endpoints
kubeconfig_file: ""
follow_redirects: true
enable_http2: true
namespaces:
own_namespace: false
names:
- kube-system
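
The relabel rules above are easier to follow with a small model of how Prometheus evaluates them. The sketch below is illustrative only and is not part of this PR: the helper names (`joined`, `keep`, `hashmod`) are hypothetical, and the MD5 byte handling in `hashmod` is an assumption about Prometheus internals. What it shows is that source label values are joined with the `;` separator, the regex must match the whole joined string, and a `hashmod` with `modulus: 1` always yields 0, so the final `keep` on `__tmp_hash` retains every target.

```python
import hashlib
import re


def joined(labels, source_labels, separator=";"):
    """Join source label values the way a relabel rule does; missing labels become ''."""
    return separator.join(labels.get(name, "") for name in source_labels)


def keep(labels, source_labels, regex, separator=";"):
    """Model of `action: keep`: the regex must match the entire joined value."""
    return re.fullmatch(regex, joined(labels, source_labels, separator)) is not None


def hashmod(labels, source_labels, modulus, separator=";"):
    """Model of `action: hashmod`.

    Assumption: the joined value is hashed with MD5 and the low 8 bytes are
    reduced modulo `modulus`. With modulus 1 the result is 0 for any hash.
    """
    digest = hashlib.md5(joined(labels, source_labels, separator).encode()).digest()
    return int.from_bytes(digest[8:], "big") % modulus


INSTANCE_LABELS = [
    "__meta_kubernetes_service_label_app_kubernetes_io_instance",
    "__meta_kubernetes_service_labelpresent_app_kubernetes_io_instance",
]

# Hypothetical discovered targets, for illustration only.
karpenter_target = {
    "__meta_kubernetes_service_label_app_kubernetes_io_instance": "karpenter",
    "__meta_kubernetes_service_labelpresent_app_kubernetes_io_instance": "true",
    "__address__": "10.0.0.12:8000",
}
other_target = {
    "__meta_kubernetes_service_label_app_kubernetes_io_instance": "coredns",
    "__meta_kubernetes_service_labelpresent_app_kubernetes_io_instance": "true",
    "__address__": "10.0.0.13:9153",
}

kept = [
    t["__address__"]
    for t in (karpenter_target, other_target)
    if keep(t, INSTANCE_LABELS, r"(karpenter);true")
    and str(hashmod(t, ["__address__"], 1)) == "0"
]
print(kept)
```

Only the target whose service carries the `app.kubernetes.io/instance: karpenter` label survives the filter; the hashmod step is a no-op with a single shard.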
28 changes: 26 additions & 2 deletions .github/workflows/kind-e2e.yaml
@@ -15,6 +15,10 @@ jobs:
k8sVersion: ["1.23.x", "1.24.x", "1.25.x", "1.26.x", "1.27.x", "1.28.x", "1.29.x", "1.30.x"]
steps:
- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- name: Set up Python 3.10
uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5.1
with:
python-version: "3.10"
- uses: ./.github/actions/install-deps
with:
k8sVersion: ${{ matrix.k8sVersion }}
@@ -35,18 +39,38 @@ jobs:
run: |
make toolchain
make install-kwok
KWOK_REPO=kind.local KIND_CLUSTER_NAME=chart-testing make apply-with-kind
export KWOK_REPO=kind.local
export KIND_CLUSTER_NAME=chart-testing
make apply-with-kind
- name: ping cluster
shell: bash
run: |
sleep 15
kubectl get pods -n kube-system | grep karpenter
kubectl get nodepools
kubectl get pods -A
kubectl describe nodes
- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: nathangeology/karpenter_evaluate
path: ./karpenter_eval/ # Installs to a folder in the Karpenter repo for the test
ref: "3f4cebb703bd136f5034c8b5bf3e6c32e97d1ae6"
fetch-depth: 0
- name: install KPI report dependencies
shell: bash
run: |
pip install pandas==2.2.2
pip install pyarrow==16.1.0
pip install tabulate==0.9.0
pip install prometheus-api-client==0.5.5
pip install ./karpenter_eval/
- name: run test suites
shell: bash
run: |
run: |
OUTPUT_DIR=$(mktemp -d)
export OUTPUT_DIR
make e2etests
python ./karpenter_eval/main.py
- name: cleanup
shell: bash
run: |
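
For readers unfamiliar with the KPI step, here is a rough sketch of what a script run after `make e2etests` could do. It does not reproduce `karpenter_eval/main.py`, whose contents are not shown in this PR. It assumes only the standard Prometheus HTTP API (`/api/v1/query`) and the `OUTPUT_DIR` environment variable exported by the workflow; the function names, the report file name `kpi_report.json`, and the sample query are hypothetical. It uses the standard library rather than `pandas` or `prometheus-api-client` to stay self-contained.

```python
import json
import os
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_instant_vector(payload):
    """Flatten an instant-vector response into one dict per series."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    rows = []
    for sample in payload["data"]["result"]:
        timestamp, value = sample["value"]  # the value arrives as a string
        rows.append({**sample["metric"], "timestamp": float(timestamp), "value": float(value)})
    return rows


def fetch(base_url, promql, timeout=10):
    """Run the query against a live Prometheus (not called at import time)."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))


def write_report(rows, output_dir):
    """Write the rows as JSON into output_dir (hypothetical file name)."""
    path = os.path.join(output_dir, "kpi_report.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2)
    return path


# Sample payload in the shape documented for instant-vector results.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "karpenter"}, "value": [1719400000.0, "1"]}
        ],
    },
}
```

In the workflow this might be called as `write_report(fetch("http://localhost:9090", 'up{job="karpenter"}'), os.environ["OUTPUT_DIR"])`; the Prometheus address is an assumption and depends on how the service is exposed in the kind cluster.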
2 changes: 1 addition & 1 deletion Makefile
@@ -49,7 +49,7 @@ e2etests: ## Run the e2e suite against your local cluster
--ginkgo.focus="${FOCUS}" \
--ginkgo.timeout=30m \
--ginkgo.grace-period=5m \
--ginkgo.vv
--ginkgo.vv

# Run make install-kwok to install the kwok controller in your cluster first
# Webhooks are currently not supported in the kwok provider.
1 change: 1 addition & 0 deletions test/suites/perf/scheduling_test.go
@@ -147,4 +147,5 @@ var _ = Describe("Performance", func() {
env.TimeIntervalCollector.End("Drift")
})
})

})