How we run Kubernetes at game studio Pixel Federation

AUTHOR

Adam Hamšík

CEO and Co-Founder

At Pixel Federation, we currently run our infrastructure on a privately hosted OpenStack. After a while, we decided to check whether there were better options for running our workloads and deploying our code.

There were two obvious choices:

◦ Run everything in containers on Kubernetes

◦ See if public clouds make sense

After extensive research, we concluded that the answer to both questions was yes. So I’m happy to report that we are already running production on Kubernetes on Amazon Web Services (AWS). I will try to answer the following questions about our deployments:

How do we deploy Kubernetes?

What have we learned after running Kubernetes in production for a while (a.k.a. Day Two Kubernetes Operations)?

How do we stay out of trouble with Kubernetes?

Kubernetes Deployment

For Kubernetes deployment, we use a tool called Kops. It has excellent AWS support, and it also supports other platforms such as GCE and VMware vSphere. Kops can be used on its own, or it can generate CloudFormation/Terraform templates. We use the Terraform output, and so far it has worked great. Kubernetes is a great tool, and you can run almost anything on it.

Here are some things we learned:

Make your cluster highly available by running multiple masters and nodes in different Availability Zones (AZs) and by using autoscaling.

◦ For nodes, make sure you have at least 2 nodes per AZ.

◦ Don’t run stateful services inside Kubernetes; it’s best to run DB/Redis/ElastiCache outside the cluster as managed services.

◦ Use an ELK/EFK stack for log collection (it makes any troubleshooting easier).

◦ Deploy Prometheus/Grafana for monitoring.
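The "at least 2 nodes per AZ" rule is easy to check mechanically. Here is a minimal sketch, assuming you already have a mapping of node names to zones (in a real cluster that would come from each node's `topology.kubernetes.io/zone` label); the node names, zones, and the function name are hypothetical.

```python
from collections import Counter

MIN_NODES_PER_AZ = 2  # the "at least 2 nodes per AZ" rule of thumb


def under_provisioned_zones(node_zones, expected_zones, minimum=MIN_NODES_PER_AZ):
    """Return the AZs that have fewer than `minimum` nodes, sorted by name.

    node_zones: dict mapping node name -> availability zone
    expected_zones: every AZ the cluster is supposed to span
    """
    counts = Counter(node_zones.values())
    # Counter returns 0 for unseen keys, so completely empty zones are reported too.
    return sorted(zone for zone in expected_zones if counts[zone] < minimum)


# Hypothetical inventory; real data would come from each node's zone label.
nodes = {
    "node-a1": "eu-west-1a",
    "node-a2": "eu-west-1a",
    "node-b1": "eu-west-1b",
}
print(under_provisioned_zones(nodes, ["eu-west-1a", "eu-west-1b", "eu-west-1c"]))
# prints ['eu-west-1b', 'eu-west-1c']
```

Passing the expected zones explicitly matters: a zone that has lost all of its nodes would otherwise not appear in the data at all.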

Use Kubernetes addons to extend its functionality:

◦ External-DNS

◦ Cluster Autoscaler

◦ Ingress controllers

Application Deployment

Deploying applications to Kubernetes can be challenging, and we learned a lot by doing it.

◦ Health checks are really important: make sure your application has them and that they work correctly.

◦ Use one process per container; use sidecar containers if necessary.

◦ Make sure you set up HPA (Horizontal Pod Autoscaler) and CA (Cluster Autoscaler) properly.

◦ Use Helm for application manifest deployment.

◦ Use the deployment update method that best fits your application:

  ◦ Kubernetes built-in deployment strategies: Rolling Update, Recreate

  ◦ Helm-scripted strategies: Blue/Green, A/B, Canary
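Setting up HPA "properly" is easier once you know the rule it applies. The Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The sketch below is a simplified, single-metric version of that rule; it ignores stabilization windows, missing metrics, and pod readiness, and the replica bounds are example values only.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified single-metric HPA rule.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped to
    [min_replicas, max_replicas].
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target -> scale out.
print(hpa_desired_replicas(4, 90, 60))   # 6
print(hpa_desired_replicas(4, 63, 60))   # 4 (within tolerance)
print(hpa_desired_replicas(4, 20, 60))   # 2
print(hpa_desired_replicas(4, 300, 60))  # 10 (capped by max_replicas)
```

The last case is where HPA and the Cluster Autoscaler meet: if max_replicas is set too low, HPA stops scaling out, and if it is high enough but the cluster has no room, the new pods stay Pending until CA adds nodes.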

Day Two Kubernetes Operations

Running Kubernetes in production can be challenging, but we have found that following a few simple rules can really make your life easier.

Kubernetes Scheduler

It will run your servers into the ground. By default, the scheduler keeps placing pods on a node for as long as its resources allow. That can cause instability and trouble for your apps, so make sure you reserve some resources for the operating system. RAM is especially important. Currently, we reserve at least 15% of RAM on each node for OS use.
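Kubernetes models this reservation as Node Allocatable: the memory the scheduler may hand out is Capacity minus kube-reserved, minus system-reserved, minus the eviction threshold. The sketch below turns a 15% reserve into a kubelet `--system-reserved` value and computes what is left for pods. Putting the whole 15% into system-reserved (with kube-reserved set to 0) and the 16 GiB node size are assumptions for illustration; the 100 MiB eviction threshold matches the kubelet's default hard eviction setting.

```python
import math

RESERVE_FRACTION = 0.15  # "at least 15% of RAM" kept for the OS


def reserved_memory_mib(capacity_mib, fraction=RESERVE_FRACTION):
    """Memory to hold back for the OS, rounded up to a whole MiB."""
    return math.ceil(capacity_mib * fraction)


def allocatable_memory_mib(capacity_mib, kube_reserved_mib,
                           system_reserved_mib, eviction_threshold_mib):
    """Node Allocatable = Capacity - kube-reserved - system-reserved - eviction threshold."""
    return (capacity_mib - kube_reserved_mib
            - system_reserved_mib - eviction_threshold_mib)


capacity = 16 * 1024  # hypothetical 16 GiB node
system_reserved = reserved_memory_mib(capacity)
print(f"--system-reserved=memory={system_reserved}Mi")  # --system-reserved=memory=2458Mi
# 100 MiB corresponds to the default hard eviction threshold memory.available<100Mi.
print(allocatable_memory_mib(capacity, 0, system_reserved, 100))  # 13826
```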

Cluster Autoscaler

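Make sure you configure your app deployments properly: use a PodDisruptionBudget to limit how many pods can be evicted at once, and podAntiAffinity to distribute your pods across nodes. Make sure that the Auto Scaling groups running your nodes have AWS ASG rebalancing (the AZRebalance process) disabled; otherwise AWS may terminate nodes to rebalance zones without Kubernetes draining them first.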
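This matters for the Cluster Autoscaler because it scales down by draining nodes, and those evictions respect PodDisruptionBudgets. The sketch below is a simplified model of how a PDB with `minAvailable` turns into the number of pods that may be evicted right now; percentages are rounded up, as Kubernetes does. It ignores `maxUnavailable` and other PDB options, and the function names and numbers are hypothetical.

```python
import math


def resolve_min_available(min_available, expected_pods):
    """Turn minAvailable (an int, or a string such as "50%") into a pod count.

    Percentages are rounded up to the next whole pod.
    """
    if isinstance(min_available, str) and min_available.endswith("%"):
        return math.ceil(expected_pods * int(min_available[:-1]) / 100)
    return int(min_available)


def disruptions_allowed(min_available, expected_pods, healthy_pods):
    """How many pods may be voluntarily evicted right now (never negative)."""
    required = resolve_min_available(min_available, expected_pods)
    return max(0, healthy_pods - required)


# Hypothetical Deployment with 5 replicas and minAvailable: 50% (3 pods).
print(disruptions_allowed("50%", expected_pods=5, healthy_pods=5))  # 2
print(disruptions_allowed("50%", expected_pods=5, healthy_pods=3))  # 0
print(disruptions_allowed(4, expected_pods=5, healthy_pods=5))      # 1
```

When the result is 0, a node drain has to wait, which is exactly what protects the application during scale-down; it is also why a PDB that can never be satisfied will block the autoscaler from removing that node.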

Make sure you set your Auto Scaling group limits with some room to grow. Kubernetes can handle huge loads, but it still needs compute resources. Be aware of AWS’s own limits, too. There is nothing worse than having a spike and not being able to launch new nodes because you reached your limit for a particular region.
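A quick capacity check before a launch or event can catch both problems. Here is a minimal sketch, assuming you plan for a load spike of some multiple of today's node count; the function name and all numbers are hypothetical, and the regional limit is modeled as a plain instance count for simplicity, so adapt it to however your account's quota is actually expressed.

```python
import math


def headroom_problems(current_nodes, spike_factor, asg_max_size,
                      region_instance_limit, other_instances_in_region=0):
    """Check whether a load spike of `spike_factor`x could actually be served.

    Returns (nodes_needed, problems); an empty list means there is room to grow.
    """
    nodes_needed = math.ceil(current_nodes * spike_factor)
    problems = []
    if nodes_needed > asg_max_size:
        problems.append(
            f"ASG max size {asg_max_size} is below the {nodes_needed} nodes needed")
    # Other workloads in the same region count against the same limit.
    if nodes_needed + other_instances_in_region > region_instance_limit:
        problems.append(
            f"regional limit {region_instance_limit} would be exceeded")
    return nodes_needed, problems


# Hypothetical numbers: 10 nodes today, planning for a 2x spike.
needed, problems = headroom_problems(10, 2.0, asg_max_size=15,
                                     region_instance_limit=40,
                                     other_instances_in_region=25)
print(needed)  # 20
for problem in problems:
    print(problem)
```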

Node problems

Detecting problems on Kubernetes nodes as early as possible is critical. It helps you prevent trouble with misbehaving applications and makes your life easier. For node problem detection, we use node-problem-detector.
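node-problem-detector reports what it finds as node conditions, next to built-in ones like Ready and MemoryPressure, so alerting can work from a single rule: flag a node when Ready is not "True" or when any other condition is "True". The sketch below applies that rule to a hypothetical snapshot shaped like a Node's `status.conditions`; `KernelDeadlock` is used as an example of a condition node-problem-detector can set, and the function name is made up.

```python
def flag_unhealthy_nodes(node_conditions):
    """Return {node: [reasons]} for nodes that look unhealthy.

    node_conditions maps node name -> {condition type: status}, where status
    is "True", "False" or "Unknown". A node is flagged when Ready is not
    "True", or when any other condition is "True" (for MemoryPressure,
    DiskPressure and problem conditions reported by node-problem-detector,
    "True" means the problem is present).
    """
    flagged = {}
    for node, conditions in node_conditions.items():
        reasons = []
        ready = conditions.get("Ready", "Missing")
        if ready != "True":
            reasons.append(f"Ready={ready}")
        reasons += sorted(cond for cond, status in conditions.items()
                          if cond != "Ready" and status == "True")
        if reasons:
            flagged[node] = reasons
    return flagged


# Hypothetical snapshot of three nodes.
snapshot = {
    "node-1": {"Ready": "True", "MemoryPressure": "False", "KernelDeadlock": "False"},
    "node-2": {"Ready": "True", "KernelDeadlock": "True"},
    "node-3": {"Ready": "Unknown", "DiskPressure": "False"},
}
print(flag_unhealthy_nodes(snapshot))
# {'node-2': ['KernelDeadlock'], 'node-3': ['Ready=Unknown']}
```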

Application troubles

Make sure you define application pod limits/requests properly. It might be a pain at the beginning, as you have to understand how much RAM/CPU your application needs, but it will help the Kubernetes scheduler make proper decisions.
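The reason requests matter so much is that the scheduler places pods by comparing requests, not actual usage or limits, against what the node has left. The sketch below is a simplified version of that resource-fit check; the real scheduler runs many more filters and scoring steps, and the node size, pod sizes, and function name are hypothetical.

```python
def fits_on_node(pod_requests, scheduled_requests, allocatable):
    """Simplified resource-fit check: do the pod's requests fit in what is left?

    All arguments use dicts like {"cpu": millicores, "memory": MiB}.
    Only requests are considered; limits do not affect where a pod is placed.
    """
    for resource, requested in pod_requests.items():
        already_requested = sum(p.get(resource, 0) for p in scheduled_requests)
        if already_requested + requested > allocatable.get(resource, 0):
            return False
    return True


# Hypothetical node and pods.
allocatable = {"cpu": 3800, "memory": 13826}
scheduled = [{"cpu": 1000, "memory": 4096}, {"cpu": 1500, "memory": 6144}]
print(fits_on_node({"cpu": 1000, "memory": 2048}, scheduled, allocatable))  # True
print(fits_on_node({"cpu": 500, "memory": 4096}, scheduled, allocatable))   # False (memory)
```

A pod that requests far less than it really uses still "fits" here, which is how a node ends up overcommitted; a pod that requests far more wastes capacity that nothing else can be scheduled into.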

Make sure your health checks work properly and that they give you the right information. The worst case for you is a GREEN health check while the application is not able to serve customers.
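One way to avoid a misleading GREEN is to have the check exercise the dependencies the application actually needs. In Kubernetes, a failing readiness probe takes the pod out of Service endpoints, while a failing liveness probe restarts the container, so dependency checks like this usually belong behind the readiness probe. The sketch below only computes the status code such an endpoint could return; the check names and the failing cache check are hypothetical stand-ins for real database or Redis pings.

```python
from http import HTTPStatus


def health_status(checks):
    """Report 200 only if every dependency check passes, otherwise 503.

    checks maps a name to a zero-argument callable that returns True when
    that dependency actually works. A check that raises counts as failed,
    so a broken dependency cannot crash the health endpoint itself.
    """
    failures = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failures.append(name)
    status = HTTPStatus.OK if not failures else HTTPStatus.SERVICE_UNAVAILABLE
    return int(status), sorted(failures)


def cache_unreachable():
    # Hypothetical check standing in for a real Redis ping that fails.
    raise ConnectionError("cache unreachable")


print(health_status({"database": lambda: True}))  # (200, [])
print(health_status({"database": lambda: True, "cache": cache_unreachable}))  # (503, ['cache'])
```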

Container Security

Make sure you check your containers for security issues and update them often. Running three-year-old containers full of security issues is a really BAD practice.
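The "update them often" half of this advice can be enforced with a simple age check; finding actual vulnerabilities still needs a dedicated image scanner. Here is a minimal sketch, assuming you can read each image's build time from your registry's metadata; the 30-day policy, image names, and dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_IMAGE_AGE = timedelta(days=30)  # hypothetical rebuild policy


def stale_images(image_build_times, now, max_age=MAX_IMAGE_AGE):
    """Return the images built longer ago than max_age, sorted by name."""
    return sorted(image for image, built in image_build_times.items()
                  if now - built > max_age)


now = datetime(2019, 6, 1, tzinfo=timezone.utc)
images = {
    "game-api:1.4.2": datetime(2019, 5, 20, tzinfo=timezone.utc),
    "legacy-worker:0.9": datetime(2016, 5, 1, tzinfo=timezone.utc),
}
print(stale_images(images, now))  # ['legacy-worker:0.9']
```

Rebuilding regularly, even without code changes, is what pulls in patched base images; the age check just makes forgotten images visible.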

Future

◦ We are currently planning to deploy our first production cluster to China.

◦ We plan to evaluate service meshes such as Istio.

◦ We would like to test the AWS ALB Ingress Controller.

So this is our approach to Kubernetes. What is yours? Any tips or struggles you would like to share and discuss? Then join our Facebook group called Free to Play Game Developers and feel free to invite your fellow game developers as well.
