Enabling SR-IOV
===
###### tags: `K8s` `CNI` `PCI-e`
# Prerequisites
* First, check the NIC models:
```shell=
$ lspci | grep -i ethernet
```
![](https://minio.mcl.math.ncu.edu.tw:443/hackmd/uploads/upload_8109b15e925d45e1871f90c2fddc793a.png)
The first two NICs do not support SR-IOV; the last eight do.
* Select the NIC you want to check:
![](https://minio.mcl.math.ncu.edu.tw:443/hackmd/uploads/upload_07692af8c6161534cdfeb3649efc1865.png)
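As a complement to reading the `lspci` output, SR-IOV capability can also be checked through sysfs: an SR-IOV capable device exposes a `sriov_totalvfs` file. The sketch below is a minimal example, not a definitive tool; the function name `list_sriov_nics` and its optional sysfs-root argument are assumptions added for illustration and testing.

```shell
# List network interfaces whose PCI device advertises SR-IOV capability,
# printing "<interface> <max VFs>" for each one.
# The optional first argument (sysfs root) is an assumption added for testing;
# it defaults to the real /sys/class/net.
list_sriov_nics() (
    net_root="${1:-/sys/class/net}"
    for dev in "$net_root"/*; do
        totalvfs_file="$dev/device/sriov_totalvfs"
        # Only SR-IOV capable devices expose sriov_totalvfs.
        [ -r "$totalvfs_file" ] || continue
        total=$(cat "$totalvfs_file")
        if [ "$total" -gt 0 ]; then
            echo "$(basename "$dev") $total"
        fi
    done
    return 0
)
```

Calling `list_sriov_nics` with no argument scans the live system.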
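# Install SR-IOV on the server
* First we need a driver that supports SR-IOV. This guide uses the Intel i40e driver to enable SR-IOV, but choose the driver that matches your NIC. The example NIC here is an Intel Corporation I350 Gigabit Network Connection. (Note: the I350 is normally handled by the `igb` driver, while `i40e` targets Intel 700-series adapters; `ethtool -i <interface>` shows which driver a NIC actually uses.)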
### Install i40e driver
```shell=
sudo apt install -y make gcc libelf-dev
I40E_VER=2.4.10
wget https://downloadmirror.intel.com/28306/eng/i40e-${I40E_VER}.tar.gz && \
tar xvzf i40e-${I40E_VER}.tar.gz && cd i40e-${I40E_VER}/src && sudo make install && cd -
```
### Update GRUB Settings
```shell=
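# Anchor each pattern on "^NAME=" so the second command does not also match
# (and overwrite) the GRUB_CMDLINE_LINUX_DEFAULT line.
sudo sed -i '/^GRUB_CMDLINE_LINUX_DEFAULT=/c\GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"' /etc/default/grub
sudo sed -i '/^GRUB_CMDLINE_LINUX=/c\GRUB_CMDLINE_LINUX="intel_iommu=on"' /etc/default/grub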
sudo update-grub
```
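The GRUB change only takes effect after a reboot. A quick way to confirm it is to look for the parameter on the running kernel's command line. The helper below is a minimal sketch; the name `has_kernel_param` and the optional file argument are assumptions for illustration.

```shell
# Succeed if the given parameter appears as a whole word on the kernel command line.
# The optional second argument (path to a cmdline file) is an assumption for testing;
# it defaults to /proc/cmdline.
has_kernel_param() (
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    set -f  # disable globbing while splitting the command line into words
    for word in $(cat "$cmdline_file"); do
        if [ "$word" = "$param" ]; then
            return 0
        fi
    done
    return 1
)
```

After rebooting, `has_kernel_param intel_iommu=on && echo "IOMMU requested"` reports whether the setting reached the kernel.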
### Setup vfio-pci module auto-load on boot
```shell=
echo 'vfio-pci' | sudo tee /etc/modules-load.d/vfio-pci.conf
wget -qO- https://fast.dpdk.org/rel/dpdk-17.11.2.tar.xz | sudo tar -xJC /opt
sudo mv /opt/dpdk-* /opt/dpdk
```
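To confirm after the next boot that the module was auto-loaded, one option is to look it up in `/proc/modules`. Note that the kernel lists module names with underscores, so `vfio-pci` appears as `vfio_pci`. This is a small sketch; `module_loaded` and its optional file argument are hypothetical names used for illustration.

```shell
# Succeed if a kernel module is currently loaded.
# /proc/modules lists modules with underscores, so "vfio-pci" shows up as "vfio_pci".
# The optional second argument (path to a modules listing) is an assumption for testing.
module_loaded() (
    name=$(printf '%s' "$1" | tr '-' '_')
    modules_file="${2:-/proc/modules}"
    # The first field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
)
```

For example, `module_loaded vfio-pci || echo "vfio-pci is not loaded"`.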
### Create SR-IOV Script for systemctl
* We write a script to `/opt/scripts/sriov.sh`; it has to be run whenever the server is brought up for deployment:
```shell=
sudo mkdir -p /sriov-cni /opt/scripts
sudo su
cat << "EOF" > /opt/scripts/sriov.sh
#!/bin/bash
# Copied from infra/sriov.sh
# Usage: ./sriov.sh ens785f0
NUM_VFS=$(cat /sys/class/net/$1/device/sriov_totalvfs)
echo 0 | sudo tee /sys/class/net/$1/device/sriov_numvfs
echo $NUM_VFS | sudo tee /sys/class/net/$1/device/sriov_numvfs
sudo ip link set $1 up
for ((i = 0 ; i < ${NUM_VFS} ; i++ )); do ip link set $1 vf $i spoofchk off; done
for ((i = 0 ; i < ${NUM_VFS} ; i++ )); do ip link set dev $1 vf $i state enable; done
EOF
exit
# Script perms
sudo chmod 744 /opt/scripts/sriov.sh
```
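After running `sriov.sh` against an interface, it can be useful to confirm that the number of enabled VFs matches the maximum the device reports. The following is a minimal sketch that reads the same sysfs files the script writes to; the function name `vf_status` and the optional sysfs-root argument are assumptions.

```shell
# Print "<iface>: <enabled>/<total> VFs enabled" and succeed only if all VFs are enabled.
# The optional second argument (sysfs root) is an assumption added for testing.
vf_status() (
    iface="$1"
    net_root="${2:-/sys/class/net}"
    dev_dir="$net_root/$iface/device"
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    enabled=$(cat "$dev_dir/sriov_numvfs") || return 1
    echo "$iface: $enabled/$total VFs enabled"
    [ "$enabled" -eq "$total" ]
)
```

For example, `vf_status ens785f0` after `sudo /opt/scripts/sriov.sh ens785f0`.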
* With the script in place, define a systemd unit `sriov.service` that runs it, then enable the unit with `sudo systemctl enable sriov` so the VFs are created at boot:
```shell=
sudo su
# Systemd unit to run the above script
cat << "EOF" > /etc/systemd/system/sriov.service
[Unit]
Description=Create VFs for ens802f0
[Service]
Type=oneshot
ExecStart=/opt/scripts/sriov.sh ens802f0
[Install]
WantedBy=default.target
EOF
exit
# Enable the SRIOV systemd unit
sudo systemctl enable sriov
```
# Concept of building SR-IOV supported Kubernetes
## Repositories to use
* We need two repositories: SR-IOV Container Network Interface (CNI), which builds the `sriov` binary, and SR-IOV Network Device Plugin, which builds the `sriovdp` binary.
* [sriov-cni](https://github.com/intel/sriov-cni)
* [sriov-network-device-plugin](https://github.com/intel/sriov-network-device-plugin)
### SR-IOV CNI
* The SR-IOV CNI has a small set of jobs:
    * VF network plumbing
    * VF allocation to the Pod network namespace
    * VF deallocation from the Pod network namespace
* The SR-IOV CNI uses the PF name to select a VF and assigns that VF to the Pod as its interface:
```json=
{
    "name": "mynet",
    "type": "sriov",
    "master": "enp1s0f1",
    "ipam": {
        "type": "host-local",
        "subnet": "10.55.206.0/26",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ],
        "gateway": "10.55.206.1"
    }
}
```
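Because the PF name in `master` differs from node to node, it can help to generate this configuration rather than edit it by hand. The sketch below prints the same structure with the PF name, subnet, and gateway filled in from arguments. The function name `render_sriov_cni_conf` is hypothetical, and the target path in the usage note is an assumption (`/etc/cni/net.d` is the conventional CNI configuration directory, but your cluster may differ).

```shell
# Print an SR-IOV CNI configuration for the given PF.
# Arguments: <PF name> <subnet in CIDR form> <gateway address>
# The function name is hypothetical; the JSON layout mirrors the example above.
render_sriov_cni_conf() (
    pf_name="$1"
    subnet="$2"
    gateway="$3"
    cat <<EOF
{
    "name": "mynet",
    "type": "sriov",
    "master": "$pf_name",
    "ipam": {
        "type": "host-local",
        "subnet": "$subnet",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ],
        "gateway": "$gateway"
    }
}
EOF
)
```

For example, `render_sriov_cni_conf enp1s0f1 10.55.206.0/26 10.55.206.1 | sudo tee /etc/cni/net.d/10-sriov.conf`, where the file name `10-sriov.conf` is only a placeholder.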
### SR-IOV Network Device Plugin
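* When a VF has to be allocated to a container but not enough VFs are available, the Pod cannot be scheduled because the SR-IOV resources are exhausted. The SR-IOV network device plugin therefore needs to:
    1. Discover the SR-IOV network interfaces on the node
    2. Monitor the health of the VFs
    3. Enforce resource limits for deployments on the Kubernetes cluster
    4. Create the Pod-specific network configuration for the allocated VFs
* The SR-IOV network device plugin reads `/etc/pcidp/config.json` to determine which PCI addresses to manage, then runs its Go binary to allocate SR-IOV VFs to containers.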
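To see which PCI addresses are involved on a node, sysfs exposes every VF of a PF as a `virtfnN` symlink that points at the VF's PCI device. The sketch below lists those addresses for one PF. It does not read or write `/etc/pcidp/config.json`; the function name `list_vf_pci_addresses` and the optional sysfs-root argument are assumptions.

```shell
# Print the PCI address of every VF that belongs to the given PF interface.
# Each VF appears in sysfs as <PF>/device/virtfnN, a symlink to its PCI device.
# The optional second argument (sysfs root) is an assumption added for testing.
list_vf_pci_addresses() (
    iface="$1"
    net_root="${2:-/sys/class/net}"
    for link in "$net_root/$iface"/device/virtfn*; do
        [ -L "$link" ] || continue
        # The symlink target ends in the VF's PCI address, e.g. ../0000:04:10.0
        basename "$(readlink "$link")"
    done
)
```

For example, `list_vf_pci_addresses ens802f0` prints one address per line.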
### Illustration of SR-IOV and SR-IOV Network Device Plugin
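* Figure 1 below outlines how kubelet works together with Multus, the SR-IOV CNI, and the network device plugin.
* The flow is as follows:
    * The SR-IOV network device plugin is started on every node as a DaemonSet. It registers `intel.com/sriov` as a Kubernetes resource, and Kubernetes learns the SR-IOV resources available on each node through the plugin.
    * When we create a Deployment-based service, Kubernetes schedules it onto a node that still has free resources.
    * Once a node has been chosen for the Pod, Kubernetes sends a network setup request to Multus, which then passes the `Pod ID` and `network configuration` to the SR-IOV CNI.
    * The SR-IOV CNI sets up the network for the VF and assigns the VF to the container as an interface.
* Figure 1: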
![](https://minio.mcl.math.ncu.edu.tw:443/hackmd/uploads/upload_71bf666a96d6632cae248a3428c11098.png)
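From the user's side, this flow is triggered by a Pod that requests the `intel.com/sriov` resource and names a Multus network. The sketch below prints such a manifest. It is an illustration under assumptions, not a tested deployment: the function name `render_sriov_pod`, the container name, and the `busybox` image are placeholders, and the network name passed in must match a network definition that exists in your cluster. The `k8s.v1.cni.cncf.io/networks` annotation is the one Multus reads.

```shell
# Print a Pod manifest that requests SR-IOV VFs and attaches a Multus network.
# Arguments: <pod name> <Multus network name> [VF count, default 1]
# Function name, container name, and image are placeholders for illustration.
render_sriov_pod() (
    pod_name="$1"
    network_name="$2"
    vf_count="${3:-1}"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $pod_name
  annotations:
    k8s.v1.cni.cncf.io/networks: $network_name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        intel.com/sriov: "$vf_count"
      limits:
        intel.com/sriov: "$vf_count"
EOF
)
```

The output can be piped to `kubectl apply -f -`; the scheduler then only places the Pod on a node that still advertises enough `intel.com/sriov` resources.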
## Deploying the cluster
* SR-IOV enabled on the network interface.
```shell=
cord@node1:~$ ip link show dev ens802f0
4: ens802f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether 00:1e:67:d2:ee:ea brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
vf 1 MAC 16:17:47:f9:43:9a, spoof checking off, link-state auto, trust off, query_rss off
vf 2 MAC fe:96:60:5f:d3:50, spoof checking off, link-state auto, trust off, query_rss off
vf 3 MAC 36:91:06:87:0d:c6, spoof checking off, link-state auto, trust off, query_rss off
... snip
```
* The SR-IOV CNI binary is located in the CNI binary folder:
```shell=
cord@node2:~$ ls /opt/cni/bin
bridge centralip cnishim dhcp flannel host-local ipvlan loopback macvlan multus portmap ptp sample sriov tuning vlan
```
* The SR-IOV Network Device Plugin is running:
```shell=
$ kubectl -n kube-system logs sriov-device-plugin-5qfxc
I0107 23:19:30.469685 13614 server.go:132] ListAndWatch(sriov): send updated devices &ListAndWatchResponse{Devices:[&Device{ID:0000:04:10.0,Health:Healthy,} &Device{ID:0000:04:10.2,Health:Healthy,} &Device{ID:0000:04:10.4,Health:Healthy,} &Device{ID:0000:04:10.6,Health:Healthy,} ... snip],}
```
## Check that the resource exists and is discovered by Kubernetes
```shell=
$ kubectl get node node2 -o json | jq '.status.allocatable'
{
"cpu": "40",
"ephemeral-storage": "452695013856",
"hugepages-1Gi": "32Gi",
"intel.com/sriov": "63",
"memory": "32210384Ki",
"pods": "110"
}
```
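To check a single resource across several nodes without reading the whole map, the same `jq` query can be narrowed to one key. This is a small sketch that relies on `jq`, as the check above does; the function name `allocatable_count` is hypothetical, and it prints `0` when the resource is not advertised.

```shell
# Print the allocatable count of one resource from node JSON read on stdin.
# Argument: [resource name, default intel.com/sriov]
# Requires jq; the function name is hypothetical.
allocatable_count() (
    resource="${1:-intel.com/sriov}"
    jq -r --arg r "$resource" '.status.allocatable[$r] // "0"'
)
```

For example, `kubectl get node node2 -o json | allocatable_count`.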
# Conclusion
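* This note mainly describes how Kubernetes works with the SR-IOV CNI and the SR-IOV Network Device Plugin. It does not focus on implementation details and only lays out the installation steps.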