Consider your pods (Azure)
Recently, we ran into some issues while running a training with several trainees on our SAP Data Hub environment, which runs on a Kubernetes cluster deployed on Azure. If you do a default deployment of Kubernetes with advanced networking, you end up with a limit of 30 pods per node. This is something you need to consider before installation, since it can only be set during the initial deployment of the cluster and cannot be changed afterward.
We deployed SAP Data Hub on a Kubernetes cluster on Azure that currently runs 3 nodes (8 vCPUs and 32 GB memory each). For the small exercises we had planned, this should have been plenty, but we still ran into issues. Expanding the cluster to 8 nodes temporarily resolved the issue, but soon we were in trouble again. Here is an explanation of why.
Every node in the cluster is limited to 30 pods. When a Kubernetes cluster starts up, Kubernetes itself already runs a number of pods on each node to provide services to the applications that will be deployed onto the cluster.
If you look at our 3-node deployment, 25 pods are already taken by Kubernetes itself.
Also, the Data Hub installation needs a considerable number of pods to provide all the core services needed to run the environment. If I look at the pods in the Kubernetes namespace “SAP DATA HUB” (the namespace where we deployed Data Hub), we have 51 pods running. The core system actually accounts for 49 of them, since 2 pods are only there because I used the SAP Data Hub launchpad and the System Management application, which I will explain later.
With 25 pods used by Kubernetes and 49 pods used by the Data Hub core system, 74 of the 90 pods available on our 3-node deployment are already taken (76 if you count the two launchpad and System Management pods), and I haven’t even really started using the application yet. When you launch the Data Hub launchpad, a pod is started on the cluster. As soon as you start one of the Data Hub applications by clicking on one of the tiles, at least one other pod is started on the cluster.
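The pod budget described so far can be written down as a small calculation. This is only a sketch using the figures from our cluster (3 nodes, 30 pods per node, 25 Kubernetes pods, 49 core Data Hub pods); the function name is made up for illustration.

```python
def remaining_pod_slots(nodes, max_pods_per_node, system_pods, core_pods):
    """Return how many pod slots are left for user workloads."""
    capacity = nodes * max_pods_per_node      # total slots in the cluster
    return capacity - system_pods - core_pods


# Figures from the 3-node deployment described above.
free_slots = remaining_pod_slots(nodes=3, max_pods_per_node=30,
                                 system_pods=25, core_pods=49)
print(f"Pod slots left for users: {free_slots}")
```

With these numbers only 16 slots remain for launchpad, application, and job pods across all users.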
If I start using Connection Management, Metadata Explorer, Modeler, Vora Tools, and System Management, you can find the corresponding pods in the Kubernetes dashboard. Using the launchpad and starting the applications launches pods that are dedicated to the user. This means that another user will spin up another set of pods dedicated to that user. The initial delay you experience when using the launchpad and applications for the first time as a new user is caused by the spin-up of the needed pods on the cluster.
Be aware that the launchpad and application pods stay active on the cluster even if you log out of Data Hub. A user who has logged in before will reuse the pods that were already started, and will no longer notice the initial start-up delay.
If you start profiling data via the “Metadata Explorer” or start executing graphs you created in the “Modeler”, the requested processes perform their tasks by submitting pods to the cluster. The next screenshot shows the additional pods launched by starting a profiling job on a dataset.
In the case of profiling, one coordinator pod is started that stays active and dedicated to the user; the other pods end when they are done and free up their pod slots on the Kubernetes node where they ran.
Once you run into your pod limit, new pods no longer start. They wait until pod slots free up, either because other pods complete or because you scale up the Kubernetes cluster by adding a new node. Another way to free up some pod slots is to delete application instances for some of the users in the System Management application, available via the launchpad. Deleting a user instance also deletes the related pod.
To work around the issue we faced during the training, we spun up some additional nodes, but because of the pod limit, two training users would completely fill one VM (8 vCPUs, 32 GB), leaving its resources underutilized. Looking at the CPU and memory requests on the node concerned, only about 1/3 of the CPU and memory was allocated. If you build your Kubernetes cluster with even more powerful VMs, the waste becomes even greater.
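To put a rough number on that waste: if two users fill a node's pod slots while only a third of its CPU and memory is requested, the same node could in principle host about three times as many users. The sketch below assumes that resource requests grow linearly with the number of users and ignores the share requested by system pods, so treat it as a back-of-the-envelope estimate only. The function name is hypothetical.

```python
import math
from fractions import Fraction


def users_if_only_resources_counted(users_at_pod_limit, fraction_requested):
    """Estimate how many users a node could host if only CPU and memory
    requests limited scheduling (assumes requests scale linearly per user)."""
    return math.floor(users_at_pod_limit / fraction_requested)


# Two users hit the pod limit while about 1/3 of CPU and memory is requested.
print(users_if_only_resources_counted(2, Fraction(1, 3)))
```

By this estimate the node has room for roughly six users, yet the pod limit stops it at two.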
So when deploying your Kubernetes cluster, you should probably consider a pod limit higher than 30. Azure allows a maximum pod limit of 250 per node, while Kubernetes recommends not going over 100.
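To get a feeling for what a higher limit buys you, here is a minimal planning sketch. The 49 core pods come from our installation; the 8 system pods per node (roughly 25 spread over 3 nodes), the 10 pods per user, and the 10 users are assumptions made purely for illustration, so measure your own values before relying on it.

```python
import math


def nodes_needed(users, pods_per_user, max_pods_per_node,
                 core_pods=49, system_pods_per_node=8):
    """Estimate the node count needed to fit all pods under a pod limit.

    system_pods_per_node and pods_per_user are illustrative assumptions.
    CPU and memory are ignored; only pod slots are counted.
    """
    usable = max_pods_per_node - system_pods_per_node  # slots left per node
    if usable <= 0:
        raise ValueError("pod limit leaves no room for workloads")
    workload = core_pods + users * pods_per_user
    return math.ceil(workload / usable)


for limit in (30, 100, 250):
    count = nodes_needed(users=10, pods_per_user=10, max_pods_per_node=limit)
    print(f"pod limit {limit}: {count} node(s)")
```

Under these assumptions a limit of 30 needs 7 nodes for ten users, a limit of 100 needs 2, and a limit of 250 needs 1, at which point CPU and memory rather than pod slots become the thing to size for.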
Blog by Pascal De Poorter