Description
What happened?
When deploying a multi-node K8s cluster with kube_network_plugin_multus: true, the files registered in multus_manifest_1 (https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/network_plugin/multus/tasks/main.yml#L12) are sometimes not deployed.
After debugging, I believe the bug was introduced by the change in 84504d1#diff-b51c4bf9b67294b4b1bbccd6cd43ed628d16b206736d23dca73efe299dde3082.
Before the change, the Multus | Start resources task, which deploys Multus's K8s resources, ran in role: kubernetes-apps/network_plugin on hosts: kube_control_plane (https://github.com/kubernetes-sigs/kubespray/blob/v2.28.0/playbooks/cluster.yml#L88).
After the change, the same task runs in role: network_plugin on hosts: k8s_cluster (https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/playbooks/cluster.yml#L56).
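For context, the relevant plays look roughly like this (paraphrased from the linked cluster.yml files, not verbatim excerpts):

```yaml
# v2.28.0 cluster.yml (paraphrased): the Multus apply step ran in a play
# scoped to the control plane nodes, in user-defined inventory order.
- hosts: kube_control_plane
  roles:
    - { role: kubernetes-apps/network_plugin }

# v2.29.0 cluster.yml (paraphrased): the same step now runs in a play
# scoped to the whole k8s_cluster group, whose ordering is built dynamically.
- hosts: k8s_cluster
  roles:
    - { role: network_plugin }
```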
The ordering of hosts: kube_control_plane is defined by the user in the inventory (e.g. https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/inventory/sample/inventory.ini#L9).
The ordering of hosts: k8s_cluster is built by the playbook in https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/dynamic_groups/tasks/main.yml#L13, and that ordering appears to be RANDOM.
Multus | Start resources has run_once: true (https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/network_plugin/multus/tasks/main.yml#L47), so Ansible runs it only on the first node of hosts: k8s_cluster. Because that ordering is random, the task effectively runs on a RANDOM node of hosts: k8s_cluster.
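A minimal illustration of that interaction (hypothetical playbook, not part of kubespray): with run_once: true Ansible executes the task only on the first host of the play's host list, and when the play targets the dynamically built k8s_cluster group that first host is not necessarily groups['kube_control_plane'][0].

```yaml
# Hypothetical playbook to observe where a run_once task actually lands.
- hosts: k8s_cluster
  gather_facts: false
  tasks:
    - name: Show which host executes the run_once task
      ansible.builtin.debug:
        msg: >-
          run_once executed on {{ inventory_hostname }};
          groups['kube_control_plane'][0] is {{ groups['kube_control_plane'][0] }}
      run_once: true
```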
The problem is that multus_manifest_1 is only available on groups['kube_control_plane'][0] (https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/network_plugin/multus/tasks/main.yml#L13).
So when Multus | Start resources is NOT run on groups['kube_control_plane'][0], the executing host has no multus_manifest_1, and the manifests registered in multus_manifest_1 are not deployed (https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/network_plugin/multus/tasks/main.yml#L48).
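In other words, the registered variable is host-scoped: on any host other than groups['kube_control_plane'][0] the bare reference is undefined, and | default([]) silently turns it into an empty list. A minimal sketch of that effect (hypothetical debug task, not from the role):

```yaml
# Hypothetical task showing the host-scoped register: only the host that
# actually ran the template task has multus_manifest_1 in its hostvars.
- name: Report what each host sees in multus_manifest_1
  ansible.builtin.debug:
    msg: >-
      {{ inventory_hostname }} sees
      {{ multus_manifest_1.results | default([]) | length }} result(s) locally and
      {{ hostvars[groups['kube_control_plane'][0]].multus_manifest_1.results | default([]) | length }}
      result(s) via hostvars of groups['kube_control_plane'][0]
```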
The issue can be fixed by changing (multus_manifest_1.results | default([])) to (hostvars[groups['kube_control_plane'][0]].multus_manifest_1.results | default([])) in https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/network_plugin/multus/tasks/main.yml#L48, i.e. with_items: "{{ (hostvars[groups['kube_control_plane'][0]].multus_manifest_1.results | default([])) + (multus_nodes_list | map('extract', hostvars, 'multus_manifest_2') | map('default', []) | list | json_query('[].results')) }}".
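A sketch of the proposed change (the real task uses kubespray's internal kube module, so ansible.builtin.debug stands in here purely to show the loop expression; multus_nodes_list and multus_manifest_2 come from the existing role, and json_query needs the jmespath Python package on the controller):

```yaml
# Sketch only: the actual "Multus | Start resources" task keeps its existing
# module arguments; only with_items changes so the manifest list is always
# read from kube_control_plane[0], whichever host ends up running the task.
- name: Multus | Start resources (with_items sketch)
  ansible.builtin.debug:
    msg: "would apply {{ item.item | default(item) }}"
  run_once: true
  with_items: >-
    {{ (hostvars[groups['kube_control_plane'][0]].multus_manifest_1.results | default([]))
       + (multus_nodes_list | map('extract', hostvars, 'multus_manifest_2')
          | map('default', []) | list | json_query('[].results')) }}
```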
What did you expect to happen?
All Multus K8s resources are deployed when deploying a multi-node K8s cluster with kube_network_plugin_multus: true.
How can we reproduce it (as minimally and precisely as possible)?
Deploy a multi-node K8s cluster with kube_network_plugin_multus: true. Because the host ordering is random, the missing manifests may only appear on some runs.
OS
RHEL 8
Version of Ansible
Version of Python
Version of Kubespray (commit)
v2.29.0
Network plugin used
flannel