Some Multus K8s resources (multus_manifest_1) are sometimes not deployed #12755

Description

@tyeung-harmonicinc

What happened?

When deploying a multi-node K8s cluster with kube_network_plugin_multus: true, the files in multus_manifest_1 (https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/network_plugin/multus/tasks/main.yml#L12) are sometimes not deployed.

After debugging, I believe the bug was introduced by the change in 84504d1#diff-b51c4bf9b67294b4b1bbccd6cd43ed628d16b206736d23dca73efe299dde3082.

Before the change, Multus | Start resources, the task which deploys Multus's K8s resources, was run in role: kubernetes-apps/network_plugin on hosts: kube_control_plane (https://github.com/kubernetes-sigs/kubespray/blob/v2.28.0/playbooks/cluster.yml#L88).

After the change, Multus | Start resources is run in role: network_plugin on hosts: k8s_cluster (https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/playbooks/cluster.yml#L56).
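
Schematically (play targets and role names are taken from the linked playbooks; everything else is abbreviated):

```yaml
# v2.28.0 (before): the Multus resources are applied from a play whose host
# order is the user-defined kube_control_plane inventory order.
- hosts: kube_control_plane
  roles:
    - { role: kubernetes-apps/network_plugin }

# v2.29.0 (after): the same work happens in a play over k8s_cluster, a group
# assembled at runtime by the dynamic_groups role.
- hosts: k8s_cluster
  roles:
    - { role: network_plugin }
```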

The ordering of hosts: kube_control_plane is defined by the user in the inventory (e.g. https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/inventory/sample/inventory.ini#L9).

The ordering of hosts: k8s_cluster is built at runtime by the playbook in https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/dynamic_groups/tasks/main.yml#L13, and that ordering seems to be RANDOM.

Multus | Start resources has run_once: true (https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/network_plugin/multus/tasks/main.yml#L47), so Ansible will only run it on the first node of hosts: k8s_cluster. Since the ordering is random, Ansible effectively runs it on a RANDOM node in hosts: k8s_cluster.
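
To illustrate the run_once semantics at play, a minimal standalone example (the hostnames and local connection are just for the demo, not from Kubespray):

```yaml
# demo.yml - run with: ansible-playbook -i "a,b,c," -c local demo.yml
# With the default linear strategy, run_once executes the task on whichever
# host is first in the play's host order and skips the rest.
- hosts: all
  gather_facts: false
  tasks:
    - name: Show which host run_once selected
      debug:
        msg: "run_once executed on {{ inventory_hostname }}"
      run_once: true
```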

The problem is that multus_manifest_1 is only available on groups['kube_control_plane'][0] (https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/network_plugin/multus/tasks/main.yml#L13).

So when Multus | Start resources is NOT run on groups['kube_control_plane'][0], the selected host has no multus_manifest_1, and hence the manifests in multus_manifest_1 are not deployed (https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/network_plugin/multus/tasks/main.yml#L48).
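
The underlying Ansible behavior is that registered variables and facts are per-host; another host can only see them via hostvars. A minimal standalone demo of this (hostnames are placeholders, not from Kubespray):

```yaml
# demo.yml - run with: ansible-playbook -i "a,b," -c local demo.yml
- hosts: all
  gather_facts: false
  tasks:
    - name: Set a fact on the first host only
      set_fact:
        my_result: "generated on {{ inventory_hostname }}"
      when: inventory_hostname == groups['all'][0]

    - name: Direct access is undefined on the other host; hostvars works everywhere
      debug:
        msg: >-
          local={{ my_result | default('undefined') }},
          via_hostvars={{ hostvars[groups['all'][0]].my_result }}
```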

The issue can be fixed by updating (multus_manifest_1.results | default([])) to (hostvars[groups['kube_control_plane'][0]].multus_manifest_1.results | default([])) in https://github.com/kubernetes-sigs/kubespray/blob/v2.29.0/roles/network_plugin/multus/tasks/main.yml#L48, i.e. with_items: "{{ (hostvars[groups['kube_control_plane'][0]].multus_manifest_1.results | default([])) + (multus_nodes_list | map('extract', hostvars, 'multus_manifest_2') | map('default', []) | list | json_query('[].results')) }}".
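
Spelled out as a before/after (the "before" line is the current v2.29.0 expression with the first operand restored; only that operand changes):

```yaml
# before (v2.29.0): resolves multus_manifest_1 on whichever host run_once picked
with_items: "{{ (multus_manifest_1.results | default([])) + (multus_nodes_list | map('extract', hostvars, 'multus_manifest_2') | map('default', []) | list | json_query('[].results')) }}"

# after (proposed): always read multus_manifest_1 from the first control plane
# node via hostvars, so the result no longer depends on which host runs the task
with_items: "{{ (hostvars[groups['kube_control_plane'][0]].multus_manifest_1.results | default([])) + (multus_nodes_list | map('extract', hostvars, 'multus_manifest_2') | map('default', []) | list | json_query('[].results')) }}"
```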

What did you expect to happen?

All Multus K8s resources are deployed when deploying a multi-node K8s cluster with kube_network_plugin_multus: true.

How can we reproduce it (as minimally and precisely as possible)?

Deploy a multi-node K8s cluster with kube_network_plugin_multus: true.
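
A hedged sketch of a minimal setup (the hostnames, inventory layout, and playbook path are assumptions based on the links above, not verbatim from this cluster):

```yaml
# inventory.yml - hypothetical two-node inventory; at least two hosts are
# needed so the randomized k8s_cluster ordering can put a host other than
# node1 (the first control plane node) first. Deploy with something like:
#   ansible-playbook -i inventory.yml playbooks/cluster.yml \
#     -e kube_network_plugin_multus=true
all:
  children:
    kube_control_plane:
      hosts:
        node1:
    kube_node:
      hosts:
        node1:
        node2:
```

Because the k8s_cluster ordering is random, the bug is intermittent and may take several deployment attempts to reproduce.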

OS

RHEL 8

Version of Ansible

Version of Python

Version of Kubespray (commit)

v2.29.0

Network plugin used

flannel

Full inventory with variables

Command used to invoke ansible

Output of ansible run

Anything else we need to know
