Enabling SR-IOV
===

###### tags: `K8s` `CNI` `PCI-e`

# Prerequisites

* First, check the NIC models

```shell=
$ lspci | grep -i ethernet
```

![](https://minio.mcl.math.ncu.edu.tw:443/hackmd/uploads/upload_8109b15e925d45e1871f90c2fddc793a.png)

  The first two NICs do not support SR-IOV; the remaining eight do.

* Select the NIC you want to check

![](https://minio.mcl.math.ncu.edu.tw:443/hackmd/uploads/upload_07692af8c6161534cdfeb3649efc1865.png)

# Install SR-IOV to server

* First we need a driver with SR-IOV support. Here we use the Intel i40e driver to enable SR-IOV, but pick the driver that matches your NIC. I use an Intel Corporation I350 Gigabit Network Connection as the example (note that the I350 is normally driven by `igb`, while `i40e` targets the Intel 700 series).

### Install i40e driver

```shell=
sudo apt install -y make gcc libelf-dev
I40E_VER=2.4.10
wget https://downloadmirror.intel.com/28306/eng/i40e-${I40E_VER}.tar.gz && \
tar xvzf i40e-${I40E_VER}.tar.gz && cd i40e-${I40E_VER}/src && sudo make install && cd -
```

### Update GRUB Settings

```shell=
# Anchor on '^NAME=' so the second command does not also rewrite the _DEFAULT line
sudo sed -i '/^GRUB_CMDLINE_LINUX_DEFAULT=/c\GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"' /etc/default/grub
sudo sed -i '/^GRUB_CMDLINE_LINUX=/c\GRUB_CMDLINE_LINUX="intel_iommu=on"' /etc/default/grub
sudo update-grub
```

### Setup vfio-pci module auto-load on boot

```shell=
echo 'vfio-pci' | sudo tee /etc/modules-load.d/vfio-pci.conf
wget -qO- https://fast.dpdk.org/rel/dpdk-17.11.2.tar.xz | sudo tar -xJC /opt
sudo mv /opt/dpdk-* /opt/dpdk
```

### Create SR-IOV Script for systemctl

* We write a script to `/opt/scripts/sriov.sh`. It has to be run every time the server is deployed.

```shell=
sudo mkdir -p /sriov-cni /opt/scripts
# Write the script as root; the quoted "EOF" keeps $1 and $NUM_VFS unexpanded
sudo tee /opt/scripts/sriov.sh > /dev/null << "EOF"
#!/bin/bash
# Copied from infra/sriov.sh
# Usage: ./sriov.sh ens785f0
NUM_VFS=$(cat /sys/class/net/$1/device/sriov_totalvfs)
# Reset to 0 first, then create the maximum number of VFs
echo 0 | sudo tee /sys/class/net/$1/device/sriov_numvfs
echo $NUM_VFS | sudo tee /sys/class/net/$1/device/sriov_numvfs
sudo ip link set $1 up
for ((i = 0 ; i < ${NUM_VFS} ; i++ )); do ip link set $1 vf $i spoofchk off; done
for ((i = 0 ; i < ${NUM_VFS} ; i++ )); do ip link set dev $1 vf $i state enable; done
EOF
# Script perms
sudo chmod 744 /opt/scripts/sriov.sh
```

* After we have the script, we can write a `sriov.service` unit to
define a systemd service and enable it at boot with `sudo systemctl enable sriov`.

```shell=
# Systemd unit to run the above script
sudo tee /etc/systemd/system/sriov.service > /dev/null << "EOF"
[Unit]
Description=Create VFs for ens802f0

[Service]
Type=oneshot
ExecStart=/opt/scripts/sriov.sh ens802f0

[Install]
WantedBy=default.target
EOF
# Enable the SRIOV systemd unit
sudo systemctl enable sriov
```

# Concept of building SR-IOV supported Kubernetes

## Repositories to use

* We need the SR-IOV Container Network Interface (CNI) repository to build the `sriov` binary, and the SR-IOV Network Device Plugin repository to build the `sriovdp` binary.
    * [sriov-cni](https://github.com/intel/sriov-cni)
    * [sriov-network-device-plugin](https://github.com/intel/sriov-network-device-plugin)

### SR-IOV CNI

* The SR-IOV CNI has a simple job:
    * VF network plumbing
    * VF allocation to the pod network namespace
    * VF deallocation from the pod network namespace
* The SR-IOV CNI uses the PF name (`master`) to select a VF and assigns that VF to the pod as an interface:

```json=
{
    "name": "mynet",
    "type": "sriov",
    "master": "enp1s0f1",
    "ipam": {
        "type": "host-local",
        "subnet": "10.55.206.0/26",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ],
        "gateway": "10.55.206.1"
    }
}
```

### SR-IOV Network Device Plugin

* When VFs are allocated to containers and a node does not have enough VFs left, pods cannot be scheduled there because the VFs available to the SR-IOV CNI are exhausted. The SR-IOV network device plugin therefore needs to:
    1. Discover the SR-IOV network interfaces on the node
    2. Monitor the health of the VFs
    3. Enforce resource limits for deployments on the Kubernetes cluster
    4. Create a pod-specific network configuration for the allocated VF
* The SR-IOV network device plugin reads `/etc/pcidp/config.json` to determine which PCI addresses to use, then runs its Go program to allocate SR-IOV VFs to containers.

### Illustration of SR-IOV and SR-IOV Network Device Plugin

* Figure 1 below roughly explains how the kubelet cooperates with Multus, the SR-IOV CNI, and the network device plugin.
* The flow is as follows:
    * When the SR-IOV network device plugin is started on every node as a DaemonSet, it registers `intel.com/sriov` as a Kubernetes resource. Kubernetes then learns about the SR-IOV resources on each node through the plugin.
    * When we create a Deployment-based service, Kubernetes schedules it onto a node that still has free resources.
    * Once a node is chosen for the pod, Kubernetes sends a network setup request to Multus, and Multus passes the `POD ID` and `network configuration` to the SR-IOV CNI.
    * The SR-IOV CNI sets up the network for the VF and assigns the VF to the container as an interface.
* Figure 1

![](https://minio.mcl.math.ncu.edu.tw:443/hackmd/uploads/upload_71bf666a96d6632cae248a3428c11098.png)

## Deploying the cluster

* SR-IOV is enabled on the network interface.

```shell=
cord@node1:~$ ip link show dev ens802f0
4: ens802f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:1e:67:d2:ee:ea brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 1 MAC 16:17:47:f9:43:9a, spoof checking off, link-state auto, trust off, query_rss off
    vf 2 MAC fe:96:60:5f:d3:50, spoof checking off, link-state auto, trust off, query_rss off
    vf 3 MAC 36:91:06:87:0d:c6, spoof checking off, link-state auto, trust off, query_rss off
... snip
```

* The SR-IOV CNI binary is located in the CNI binary folder.

```shell=
cord@node2:~$ ls /opt/cni/bin
bridge centralip cnishim dhcp flannel host-local ipvlan loopback macvlan multus portmap ptp sample sriov tuning vlan
```

* The SR-IOV Network Device Plugin is running.

```shell=
$ kubectl -n kube-system logs sriov-device-plugin-5qfxc
I0107 23:19:30.469685 13614 server.go:132] ListAndWatch(sriov): send updated devices &ListAndWatchResponse{Devices:[&Device{ID:0000:04:10.0,Health:Healthy,} &Device{ID:0000:04:10.2,Health:Healthy,} &Device{ID:0000:04:10.4,Health:Healthy,} &Device{ID:0000:04:10.6,Health:Healthy,} ... snip],}
```

## Check that the resource exists and is discovered by Kubernetes

```shell=
$ kubectl get node node2 -o json | jq '.status.allocatable'
{
  "cpu": "40",
  "ephemeral-storage": "452695013856",
  "hugepages-1Gi": "32Gi",
  "intel.com/sriov": "63",
  "memory": "32210384Ki",
  "pods": "110"
}
```

# Conclusion

* This note mainly describes how Kubernetes works with the SR-IOV CNI and the SR-IOV Network Device Plugin. It does not focus on implementation details and only walks through the installation steps.
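
* As a supplementary check for the installation steps, the sysfs files that `sriov.sh` writes to (`sriov_totalvfs`, `sriov_numvfs`) can also be read directly to see whether an interface supports SR-IOV and how many VFs are currently enabled. The following is only a sketch under assumptions: the function name `sriov_status` and the `SYSFS_NET` override variable are hypothetical names introduced here for illustration, and are not part of sriov-cni or the device plugin.

```shell
#!/bin/bash
# Sketch only: sriov_status and SYSFS_NET are illustrative names, not part of
# any tool mentioned in this note.
# SYSFS_NET defaults to /sys/class/net; override it to point at a test tree.
sriov_status() {
    local iface="$1"
    local dev="${SYSFS_NET:-/sys/class/net}/${iface}/device"

    # Devices without SR-IOV capability do not expose sriov_totalvfs.
    if [ ! -r "${dev}/sriov_totalvfs" ]; then
        echo "${iface}: SR-IOV not supported"
        return 1
    fi

    local total current
    total=$(cat "${dev}/sriov_totalvfs")   # maximum VFs the PF can create
    current=$(cat "${dev}/sriov_numvfs")   # VFs currently enabled
    echo "${iface}: ${current}/${total} VFs enabled"
}
```

  For example, running `sriov_status ens802f0` after the `sriov` service has started should report the enabled count equal to the total, since `sriov.sh` sets `sriov_numvfs` to the value of `sriov_totalvfs`.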