Highly Available Resilient Applications in Kubernetes 1 of 3
This is the first of three posts that outline High Availability (HA) and application resilience best practices for running a custom or third-party application hosted inside a Kubernetes (K8s) cluster. High availability and resilience allow us to handle infrastructure and application failures, cloud outages where the Kubernetes cluster is still functional, rolling updates of K8s, and rolling updates of applications. One of the guiding principles of Kubernetes is HA fault tolerance. However, Kubernetes only provides a platform on which to build applications that meet HA SLAs; it does not make applications fault tolerant by itself.
Whenever I think about architecture I follow a simple thought process: What do I need to do to design and deploy an application so that I am not woken up at 2am because a page goes off? And if I am woken up at 2am, how can I set up a system that will fail over and recover by itself by the time I log into the Kubernetes cluster?
TLDR;
These are not nice-to-haves, but must-haves for designing and deploying an application hosted in Kubernetes.
- Make sure your application stops gracefully when it gets a SIGTERM signal. See Gracefully Handling Container Stop Signal Handling within Kubernetes.
- Use an ENTRYPOINT with dumb-init so that signals are passed properly to your binary.
- Use PostStart or PreStop hooks if your binary needs more TLC to start or stop gracefully.
Covered in Part Two
- Use Deployment manifests for microservices, and use StatefulSets only if you need their features.
- Jobs and DaemonSets do not provide out of the box HA, but fill some use cases.
- Persistent Volumes are the way to make data persistent.
Covered in Part Three
- Use Liveness and Readiness Probes. Design your application to use and support them.
- Use Affinity and Anti-Affinity Selectors if Pods need to be distributed across nodes.
Application Lifecycle Within Kubernetes
A container, which hosts an application, can be made aware of events in its lifecycle. This information is essential for a hosted application to be alerted that it has started, or notified that it is stopping. A stop can happen in various scenarios, including a Pod eviction from a Kubernetes node, or the draining of a Kubernetes node before that node is destroyed.
When an event occurs, kubelet calls into any registered container hook for that event. The hook calls are synchronous within the context of the Pod that contains the container. Note that for a PostStart hook, the container entry point and the hook fire asynchronously with respect to each other. Hooks also impact the state of a container within the Kubernetes system. For example, if a PostStart hook fails, the container will not reach “running” state.
Container Hooks
Hooks execute as either an HTTP request or an execution of a command within the container. More detailed information is available in the Kubernetes documentation on Container Lifecycle Hooks.
PostStart Hook
This hook fires after a container is created, and often runs at the same time as the container's entry point. Since the hook runs asynchronously with respect to the entry point, the timing is not guaranteed.
PreStop Hook
This hook executes before a container is terminated, while its process is still running. The PreStop hook is blocking, and it completes before the call to delete the container is sent to the Docker daemon.
Container Hook Use
To make an application more resilient, some application tasks may need to be completed before it stops, and some runtime executables may need help with signal handling in order to stop. Often a container running a JVM will not handle a shutdown gracefully.
Adding the following example PreStop hook to JVM-based containers is often helpful. This example lives within the containers section of a Kubernetes manifest.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/bash
        - -c
        - PID=`pidof java` && kill -SIGTERM $PID && while ps -p $PID > /dev/null; do sleep 1; done;
Other applications, such as Nginx, stop immediately when they receive a SIGTERM signal, without waiting for in-flight requests to finish. To gracefully stop Nginx, use the following preStop hook.
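# apps/v1 replaces extensions/v1beta1, which stopped serving Deployments in Kubernetes 1.16
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  # apps/v1 requires a selector that matches the Pod template labels
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          lifecycle:
            preStop:
              exec:
                # use this command to gracefully shutdown
                command: ["/usr/sbin/nginx","-s","quit"]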
Gracefully Handling Container Stop Signal Handling within Kubernetes
When Kubernetes shuts down a container, two different Unix signals are involved: SIGTERM and SIGKILL. Here is an example of the workflow to stop a Pod and its container(s).
- The Kubernetes API receives a call to delete a container or Pod.
- The default grace period of 30s starts, unless otherwise configured.
- Pod status is set to “Terminating.”
- Kubelet starts the Pod shutdown process.
- If a preStop hook exists, it executes.
- The processes in the Pod’s containers are sent the SIGTERM signal.
- If the processes are still running after the grace period expires, a SIGKILL signal is sent to the processes.
- Kubelet updates the K8s API, removing the Pod, when kubelet has finished deleting the Pod and its container(s).
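The SIGTERM, grace period, SIGKILL order in the steps above can be tried out with plain processes, outside of Kubernetes. The following Python sketch is not how kubelet is implemented; it only mimics that order for a single child process. The function name stop_with_grace_period and its grace_seconds parameter are made up for this illustration.

```python
import signal
import subprocess


def stop_with_grace_period(proc, grace_seconds=30.0):
    """Ask a child process to stop, and force it if it does not comply.

    Hypothetical helper: mirrors the SIGTERM -> grace period -> SIGKILL
    order for one subprocess.Popen object. Returns the name of the last
    signal that had to be sent.
    """
    proc.send_signal(signal.SIGTERM)      # polite request to stop
    try:
        proc.wait(timeout=grace_seconds)  # the grace period
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL cannot be caught or ignored
        proc.wait()                       # collect the exit status
        return "SIGKILL"
```

A process that exits within the grace period never sees the second signal, which is the behavior a well-designed application should aim for.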
The application must receive the correct signals and handle them. Moreover, properly designed, properly behaving, and appropriately deployed applications should never get to the point where a SIGKILL signal is needed.
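On the application side, handling SIGTERM usually means recording the stop request in the signal handler and letting the main loop finish its current work and clean up. The following is a minimal Python sketch under that assumption. The names serve, do_work, and cleanup are hypothetical placeholders for whatever your application does; they are not part of any Kubernetes API.

```python
import signal
import threading

# Set once SIGTERM has been received.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; the main loop does the actual cleanup."""
    shutdown_requested.set()


def serve(do_work, cleanup, poll_seconds=0.1):
    """Call do_work() repeatedly until SIGTERM arrives, then call cleanup() once.

    Must be called from the main thread, because Python only allows
    signal handlers to be installed there.
    """
    signal.signal(signal.SIGTERM, handle_sigterm)
    while not shutdown_requested.is_set():
        do_work()
        # Sleep between iterations, but wake up early on shutdown.
        shutdown_requested.wait(poll_seconds)
    cleanup()
```

Keeping the handler itself tiny avoids doing complex work at an arbitrary point in the program; the cleanup then has to finish before the grace period runs out.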
Complexity of Signal Handling in Containers
There is a well-known process ID 1 problem that can add complexity to handling signals within containers. Whether you run into it depends on the executable used as the Docker ENTRYPOINT.
TLDR;
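Process ID 1, or PID 1, is a special process ID that the kernel reserves for the init process. Because an init system is normally not used within containers, having an application running as PID 1 can cause unexpected and obscure-looking issues. For example, the kernel does not apply the default signal actions to PID 1, so a SIGTERM is ignored unless the process installs a handler for it.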
Moreover, various implementations of the UNIX shell, /bin/sh, do not pass signals to their child processes. For instance, the default implementation of shell in the alpine base container does not send interrupts to its child processes.
A simple solution is to use a binary that acts as a signal proxy and starts a child process as PID 2 inside a container.
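To make the idea of a signal proxy concrete, here is a minimal Python sketch. It is an illustration only and not a replacement for a real init binary: it forwards a fixed set of signals to a single child and does not reap orphaned zombie processes. The function name run_with_signal_proxy and the FORWARDED tuple are assumptions made for this example.

```python
import signal
import subprocess

# Signals this sketch passes on to the child (an arbitrary selection).
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def run_with_signal_proxy(argv):
    """Start argv as a child process, forward signals to it, return its exit status."""
    child = subprocess.Popen(argv)

    def forward(signum, frame):
        try:
            child.send_signal(signum)  # hand the signal to the real application
        except ProcessLookupError:
            pass                       # the child has already exited

    # Install the forwarding handler and remember what was there before.
    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    try:
        return child.wait()            # negative value means killed by a signal
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)
```

A wrapper script could call run_with_signal_proxy(sys.argv[1:]) and exit with the returned status. Note that a signal arriving between the start of the child and the installation of the handlers is not forwarded in this sketch.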
dumb-init
Various binaries exist that assist with PID management and signal proxying within containers. One such tool that is used within Trebuchet is dumb-init (https://github.com/Yelp/dumb-init). Yelp open sourced this small C-based binary to solve the two problems listed above, and more:
- dumb-init starts as PID 1, and then starts a container application as PID 2.
- dumb-init proxies any UNIX signals, such as SIGTERM, to its child process PID 2.
- dumb-init reaps any zombie processes created.
One of our base Docker images, dumb-init, sets dumb-init as its ENTRYPOINT.
ENTRYPOINT ["/sbin/dumb-init"]
If your application uses the above container, add a CMD reference in the application's Dockerfile.
FROM "our-repo:dumb-init:0.ourversion"
CMD ["/my-app"]
When the above container executes its ENTRYPOINT, the CMD is passed to it as an argument, so dumb-init starts /my-app as its child process.
The next posts will cover manifest types and controlling scheduling.