ETCD in Kubernetes: Day-10

In the intricate world of container orchestration, Kubernetes stands as a towering achievement of modern distributed systems. But beneath its powerful exterior lies a critical component that often goes unnoticed – ETCD. This distributed key-value store serves as the brain of every Kubernetes cluster, quietly maintaining the state that keeps your applications running smoothly.

Picture ETCD as the keeper of truth in a bustling digital city. While containers come and go like citizens through city streets, ETCD maintains the master record of what should be running where, how many instances should exist, and what resources they should have access to. Its inspiration draws from years of distributed systems research at Google, particularly the Chubby lock service, but with a modern twist that emphasizes simplicity and reliability.

In this deep dive, we’ll peel back the layers of ETCD to understand not just what it does, but why its design choices make it the perfect fit for Kubernetes’ distributed nature. Whether you’re a seasoned Kubernetes administrator or just beginning your container orchestration journey, understanding ETCD is crucial to grasping how Kubernetes maintains consistency in an inherently chaotic distributed environment.

Join me as we explore the architecture, consensus mechanisms, and real-world implications of this fascinating piece of technology that keeps the cloud-native world spinning.

The Inspiration Behind ETCD

Let’s start with the inspiration for ETCD. In a typical Linux environment, you’ve probably noticed that configuration files are stored in the /etc directory. This is where various applications and services store their configuration data.

For example, if you run ls -la /etc on a Linux system, you’ll see numerous files. These are configuration files for different services: PAM-related configurations, lab-related configurations if you’re using a lab environment, and so on. They’re all stored here.

ETCD takes inspiration from this concept, but with a twist. The 'etc' in ETCD comes from '/etc', and the 'd' stands for 'distributed'. Why? Let’s explore that next.

Why Distributed?

In a traditional Linux server, it makes sense to store all configurations directly on the disk of that server. However, in a distributed environment with potentially hundreds of servers, storing everything on one server isn’t ideal.

You need a system where configuration data is:

Distributed across multiple nodes
Redundant for fault tolerance
Highly available

This is the driving force behind ETCD – a distributed configuration store.

Understanding ETCD

So, what exactly is ETCD? It’s a distributed, reliable key-value store. Let’s break that down:

Distributed: It can run across multiple nodes.
Reliable: It ensures data consistency and fault tolerance.
Key-Value Store: Data is stored as key-value pairs.

In ETCD, you can create a hierarchy of keys, each with associated values. For example:

/myapp/database/url : "localhost:5432"
/myapp/database/user : "admin"

This hierarchical structure allows for organized and easily navigable configuration data.
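One way to picture this: etcd v3 keeps keys in a flat, sorted keyspace, and the hierarchy is a naming convention that becomes useful through prefix queries (with etcdctl, get --prefix). The sketch below is plain Python for illustration only, not etcd client code; the get_prefix helper and the extra /otherapp key are made up for the example.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for an etcd keyspace (not real client code).
store = {
    "/myapp/database/url": "localhost:5432",
    "/myapp/database/user": "admin",
    "/otherapp/cache/url": "localhost:6379",  # invented key for contrast
}

def get_prefix(store, prefix):
    """Return all (key, value) pairs whose key starts with prefix, in key order."""
    keys = sorted(store)
    start = bisect_left(keys, prefix)  # index of the first key >= prefix
    result = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append((key, store[key]))
    return result

database_settings = get_prefix(store, "/myapp/database/")
print(database_settings)
```

Because matching keys sit next to each other in sorted order, everything under /myapp/database/ can be read in one range scan.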

ETCD in Kubernetes

Now, let’s talk about etcd’s role in Kubernetes. In a Kubernetes cluster, etcd serves as the primary data store for all cluster data. This is crucial information, so let’s emphasize it:

All Kubernetes cluster information is stored in etcd.

What does this mean in practice? Let’s consider a few scenarios:

If your Kubernetes master has 100 worker nodes, where is this information stored? In etcd.
When you run kubectl get nodes, where does this data come from? From etcd, by way of the API server, which is the only component that talks to etcd directly.
When you create pods, deployments, or services, where is this information persisted? Again, etcd.

Hands-on with ETCD

Let’s do a quick demonstration. Imagine we have a single-node cluster running in our demo environment. We can interact with etcd using the etcdctl command-line utility, which is similar to kubectl for Kubernetes.

Here are some basic operations:

Setting a key-value pair:

etcdctl put instructor "Cecil"

etcdctl put course "Kubernetes"

Retrieving values:

etcdctl get instructor

etcdctl get course

These commands demonstrate the basic key-value nature of etcd storage.

Key Takeaways

Let’s recap the main points:

Kubernetes clusters store all their data in etcd.
Any information you retrieve using kubectl commands (like get pods, get nodes, etc.) is stored in etcd and served to you through the API server.
Any operation that changes the state of the cluster (creating resources, updating configurations) results in updates to etcd.
All cluster information, whether it’s about the master components or node-specific details, is stored in etcd.

etcd is a distributed, reliable key-value store that’s simple, secure, and fast. But what does that really mean? Let’s break it down.

Key-Value Store vs. Traditional Databases

Traditionally, we’ve used relational databases with tables, rows, and columns. For example:

NAME	AGE	OCCUPATION
Alice	28	Engineer
Bob	35	Designer
Charlie	42	Manager

But what if we want to add a “Salary” field? We’d need to alter the table’s schema, a change that applies to every row. This is where key-value stores shine.

In etcd, we might store this data like:

/employees/alice: {"name": "Alice", "age": 28, "occupation": "Engineer"}
/employees/bob: {"name": "Bob", "age": 35, "occupation": "Designer"}
/employees/charlie: {"name": "Charlie", "age": 42, "occupation": "Manager"}

Now, if we want to add a salary for Alice, we simply update her record without affecting others:

/employees/alice: {"name": "Alice", "age": 28, "occupation": "Engineer", "salary": 75000}

This flexibility is one of the key advantages of etcd.
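In code, this amounts to reading one value, changing it, and writing it back under the same key. Here is a minimal sketch that uses a Python dict to stand in for etcd; the add_field helper is hypothetical and exists only for this example.

```python
import json

# A dict standing in for etcd: each value is an opaque JSON string.
employees = {
    "/employees/alice": json.dumps({"name": "Alice", "age": 28, "occupation": "Engineer"}),
    "/employees/bob": json.dumps({"name": "Bob", "age": 35, "occupation": "Designer"}),
}

def add_field(store, key, field, value):
    """Decode one record, add a field, and write it back under the same key."""
    record = json.loads(store[key])
    record[field] = value
    store[key] = json.dumps(record)

add_field(employees, "/employees/alice", "salary", 75000)

alice = json.loads(employees["/employees/alice"])
bob = json.loads(employees["/employees/bob"])
print(alice)  # now carries a salary field
print(bob)    # untouched, no salary field
```

The trade-off is that the store does not enforce any schema, so the application has to cope with records that lack the new field.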

Getting Started with etcd

Let’s set up etcd and try some basic operations. First, download and install etcd:

wget https://github.com/etcd-io/etcd/releases/download/v3.4.16/etcd-v3.4.16-linux-amd64.tar.gz

tar xzvf etcd-v3.4.16-linux-amd64.tar.gz
cd etcd-v3.4.16-linux-amd64

Start the etcd server:

./etcd

In another terminal, let’s use etcdctl to interact with our etcd server:

# Set the API version
export ETCDCTL_API=3

# Store a key-value pair
./etcdctl put mykey "Hello, etcd!"

# Retrieve the value
./etcdctl get mykey

# Watch for changes
./etcdctl watch mykey

In another terminal, update the value:

./etcdctl put mykey "Hello, updated etcd!"

You’ll see the change immediately in the watching terminal!
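To make the idea behind watch concrete, here is a toy in-memory sketch in Python. It is not how etcd implements watches (a real watch is a stream served by the etcd server); the WatchableStore class and its callback shape are assumptions made for illustration.

```python
class WatchableStore:
    """Toy key-value store that notifies watchers on every put (not etcd itself)."""

    def __init__(self):
        self._data = {}
        self._watchers = {}  # key -> list of callbacks

    def watch(self, key, callback):
        """Register a callback that fires on each later put to key."""
        self._watchers.setdefault(key, []).append(callback)

    def put(self, key, value):
        self._data[key] = value
        for callback in self._watchers.get(key, []):
            callback("PUT", key, value)

events = []
store = WatchableStore()
store.put("mykey", "Hello, etcd!")          # happens before the watch starts
store.watch("mykey", lambda *event: events.append(event))
store.put("mykey", "Hello, updated etcd!")  # delivered to the watcher
print(events)
```

As in the terminal demo, the watcher only learns about writes made after it started watching.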

ETCD in Kubernetes

Now, let’s explore how etcd is used in Kubernetes.

Storing Cluster State

In Kubernetes, etcd stores all cluster data. Let’s see a real example:

In a Kubernetes cluster, create a pod:

kubectl run nginx --image=nginx

Now, let’s see how this is stored in etcd. First, port-forward to the etcd pod:

kubectl port-forward -n kube-system etcd-minikube 2379:2379

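Then, query etcd. The certificate paths below are where minikube keeps them inside its node, so copy the files to your machine first or adjust the paths to wherever your copies live: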

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/minikube/certs/etcd/ca.crt \
--cert=/var/lib/minikube/certs/apiserver-etcd-client.crt \
--key=/var/lib/minikube/certs/apiserver-etcd-client.key \
get /registry/pods/default/nginx -w json | jq .

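You’ll see the pod’s record as it is stored in etcd! With -w json, the key and value are base64-encoded, and Kubernetes stores most objects in a binary protobuf encoding, so expect an encoded blob rather than readable YAML.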
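If you want to read the JSON that etcdctl prints, keep in mind that keys and values come back base64-encoded. Below is a small Python sketch of decoding them. The sample payload is made up (key foo, value bar, header trimmed) and is not real cluster output; the decode_kvs helper is hypothetical.

```python
import base64
import json

# Made-up sample shaped like `etcdctl get foo -w json` output (header trimmed).
sample_output = (
    '{"kvs": [{"key": "Zm9v", "create_revision": 2, '
    '"mod_revision": 3, "version": 2, "value": "YmFy"}], "count": 1}'
)

def decode_kvs(raw_json):
    """Return a list of (key, value_bytes) pairs from etcdctl JSON output."""
    response = json.loads(raw_json)
    pairs = []
    for kv in response.get("kvs", []):
        key = base64.b64decode(kv["key"]).decode("utf-8")
        value = base64.b64decode(kv.get("value", ""))  # keep bytes: may be binary
        pairs.append((key, value))
    return pairs

decoded = decode_kvs(sample_output)
print(decoded)
```

The value is kept as bytes on purpose: for a Kubernetes object it is usually a binary encoding rather than text.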

High Availability in etcd

For production environments, we typically run etcd in a cluster for high availability. Let’s set up a 3-node etcd cluster:

On Node 1:

etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
--listen-peer-urls http://10.0.1.10:2380 \
--listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.10:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state new

On Node 2:

etcd --name infra1 --initial-advertise-peer-urls http://10.0.1.11:2380 \
--listen-peer-urls http://10.0.1.11:2380 \
--listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.11:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state new

On Node 3:

etcd --name infra2 --initial-advertise-peer-urls http://10.0.1.12:2380 \
--listen-peer-urls http://10.0.1.12:2380 \
--listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.12:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state new

Now you have a 3-node etcd cluster! This setup provides fault tolerance: etcd needs a majority of members (2 of 3 here) to keep accepting writes, so your cluster can survive if one node fails.
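The arithmetic behind that fault tolerance is simple: etcd uses the Raft consensus algorithm, and a write needs acknowledgement from a majority of members. A short Python sketch of the math follows; the function names are my own.

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1

def fault_tolerance(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)

for size in (1, 2, 3, 4, 5):
    print(size, quorum(size), fault_tolerance(size))
```

Going from 3 to 4 members does not let you lose any more nodes, which is why odd cluster sizes such as 3 or 5 are the usual choice.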

Advanced etcd Operations

Let’s explore some more advanced operations:

Versioning

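etcd keeps a revision history for keys. Every write bumps a cluster-wide revision number, and you can read a key as it was at an earlier revision (until that history is compacted). Let’s see how:

# Put a value
etcdctl put foo bar

# Update the value
etcdctl put foo bar2

# Show the current value with its metadata (create_revision, mod_revision, version)
etcdctl get foo -w json

# Read the key as it was at an earlier revision.
# --rev=0 means "latest"; replace 2 with a revision number from your own cluster.
etcdctl get foo --rev=2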
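To see how revisions behave, here is a simplified Python model of the bookkeeping. It is a sketch under assumptions, not etcd’s actual implementation: the RevisionedStore class is invented, and the counter is assumed to start at 1 so that the first write lands at revision 2.

```python
class RevisionedStore:
    """Simplified model of etcd-style revision bookkeeping (not etcd's code)."""

    def __init__(self):
        self.revision = 1     # store-wide counter; assumed to start at 1
        self._history = {}    # key -> list of (mod_revision, value)

    def put(self, key, value):
        self.revision += 1    # every write bumps the store-wide revision
        self._history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def get(self, key, rev=0):
        """Value of key as of revision rev; rev=0 means the latest revision."""
        target = self.revision if rev == 0 else rev
        value = None
        for mod_revision, stored in self._history.get(key, []):
            if mod_revision <= target:
                value = stored  # newest write that is not past the target
        return value

store = RevisionedStore()
first = store.put("foo", "bar")
second = store.put("foo", "bar2")
print(store.get("foo"))             # latest value
print(store.get("foo", rev=first))  # value as of the first write
```

A read at a given revision returns one value, the one that was current at that point, rather than a list of every version. Real etcd also compacts old revisions, after which they can no longer be read.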

Leases

Leases in etcd are useful for implementing things like service discovery:

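# Create a lease with a 60-second TTL.
# etcdctl prints "lease <id> granted with TTL(60s)"; the ID is hexadecimal,
# so take the second field instead of matching digits.
lease=$(etcdctl lease grant 60 | awk '{print $2}')

# Attach a key to the lease
etcdctl put --lease="$lease" my-service-key '{"host": "10.0.0.1", "port": 8080}'

# The key is deleted automatically after 60 seconds
# unless the lease is refreshed with: etcdctl lease keep-alive "$lease"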
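The lease mechanism can be sketched as a toy Python model. This is an illustration under assumptions, not etcd’s implementation: the LeaseStore class is invented, and the clock is passed in as a plain number so the example needs no real waiting.

```python
class LeaseStore:
    """Toy model of lease-bound keys; the clock is passed in explicitly."""

    def __init__(self):
        self._leases = {}  # lease id -> expiry time (seconds)
        self._keys = {}    # key -> (value, lease id)
        self._next_id = 1

    def grant(self, ttl, now):
        lease_id = self._next_id
        self._next_id += 1
        self._leases[lease_id] = now + ttl
        return lease_id

    def keep_alive(self, lease_id, ttl, now):
        """Refresh the lease, but only if it has not expired yet."""
        if self._leases.get(lease_id, 0) > now:
            self._leases[lease_id] = now + ttl

    def put(self, key, value, lease_id):
        self._keys[key] = (value, lease_id)

    def get(self, key, now):
        """Return the value, or None once the attached lease has expired."""
        if key not in self._keys:
            return None
        value, lease_id = self._keys[key]
        if self._leases[lease_id] <= now:
            del self._keys[key]  # an expired lease takes its keys with it
            return None
        return value

store = LeaseStore()
lease = store.grant(ttl=60, now=0)
store.put("my-service-key", "10.0.0.1:8080", lease)
print(store.get("my-service-key", now=30))   # still within the TTL
store.keep_alive(lease, ttl=60, now=50)      # expiry moves to t=110
print(store.get("my-service-key", now=100))  # alive thanks to the refresh
print(store.get("my-service-key", now=111))  # lease expired, key is gone
```

For service discovery, a healthy service keeps refreshing its lease; when it crashes, the refreshes stop and its registration disappears on its own.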

Transactions

etcd supports atomic transactions:

etcdctl txn --interactive

compares:
value("foo") = "bar"

success requests (get, put, delete):
put foo "bar2"

failure requests (get, put, delete):
put foo "bar3"

# This sets foo to "bar2" only if its current value is "bar"; otherwise it sets foo to "bar3"
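The compare-then-act pattern of a transaction can be sketched in Python as follows. The txn function here is a hypothetical helper operating on a plain dict, not the etcd client API; etcd applies the comparison and the chosen branch atomically, which this single-threaded sketch gets for free.

```python
def txn(store, key, expected, on_success, on_failure):
    """Compare store[key] with expected, then apply exactly one branch.

    Each branch is a list of (key, value) puts.
    Returns True if the comparison succeeded.
    """
    succeeded = store.get(key) == expected
    for put_key, put_value in (on_success if succeeded else on_failure):
        store[put_key] = put_value
    return succeeded

store = {"foo": "bar"}
first = txn(store, "foo", "bar", [("foo", "bar2")], [("foo", "bar3")])
after_first = store["foo"]
second = txn(store, "foo", "bar", [("foo", "bar2")], [("foo", "bar3")])
after_second = store["foo"]
print(first, after_first, second, after_second)
```

Running the same transaction twice shows both branches: the first run matches and writes bar2, so the second run no longer matches and takes the failure branch.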

Conclusion

We’ve covered a lot of ground today – from the basics of etcd as a key-value store, to its crucial role in Kubernetes, and even some advanced features. etcd’s simplicity, reliability, and speed make it an excellent choice for storing critical data in distributed systems.

Remember, whether you’re building a small application or managing a large Kubernetes cluster, understanding etcd can help you build more robust, scalable systems.

Thank you for joining me on this deep dive into etcd. Happy coding, and may your clusters always be in sync!

Mr Cloud Book