The Problem: How Do You Handle 10,000 Concurrent Clients?
Imagine you’re running a web server. 10,000 clients connect simultaneously, each wanting to store or retrieve data.
Naive Approach 1: One Thread Per Client
Client 1 → Thread 1
Client 2 → Thread 2
Client 3 → Thread 3
...
Client 10,000 → Thread 10,000

Problems:
- Creating 10,000 threads consumes massive memory (each thread needs a stack)
- Context switching between 10,000 threads is expensive
- If a thread is blocked waiting for I/O, that CPU core is idle
- Making code thread-safe is complex (locks, mutexes, race conditions)
Result: Your server becomes a resource hog and slows to a crawl.
Naive Approach 2: Process Per Client
Even worse than threads. Processes have even more overhead.
What We Actually Need:
One thread (or one process) that efficiently manages all 10,000 clients without blocking. This is where I/O multiplexing comes in.
Part 1: The Fundamental Problem With Blocking I/O
Blocking System Calls
Most I/O operations are blocking:
// This blocks until data arrives
int bytes_read = read(file_descriptor, buffer, 1024);
// This blocks until the socket is ready to accept data
write(socket_fd, data, length);

When a thread calls read(), it waits there until data arrives. Meanwhile, other clients are also waiting. If you have one thread per client, 9,999 threads are blocked, and only 1 is doing work.
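As an aside, a descriptor can be switched into non-blocking mode so that read() returns immediately instead of sleeping. Here is a minimal sketch using the standard POSIX fcntl() call (nothing server-specific); on its own this only turns waiting into busy-polling every client yourself, which is exactly the problem I/O multiplexing solves.

// Minimal sketch: put a descriptor into non-blocking mode with fcntl().
// After this, read() returns -1 with errno set to EAGAIN/EWOULDBLOCK
// instead of blocking when no data is available.
#include <fcntl.h>

int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);   // fetch current file status flags
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}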
The Flow of Data
Let’s trace what happens when data arrives from the network:
┌──────────────┐
│ Network Card │  ← Data arrives from client
└───────┬──────┘
        │
        ▼
┌──────────────────────────┐
│ Kernel Buffer            │  ← Kernel stores incoming data here
│ (part of OS memory)      │
└───────┬──────────────────┘
        │
        │ (Interrupt: "Hey, data is here!")
        │
        ▼
┌──────────────────────────┐
│ User Space (Your app)    │  ← You read data here when ready
└──────────────────────────┘

The key insight: The kernel knows when data has arrived (it stored it in the kernel buffer). Your application can ask: “Kernel, which of my file descriptors have data ready?”
This is what I/O multiplexing does.
Part 2: What Is I/O Multiplexing?
The Core Idea
Instead of blocking on one file descriptor, you ask the OS: “Tell me which of these 10,000 file descriptors are ready for I/O.”
The OS efficiently monitors all of them and tells you which ones have data.
Process:
1. Register 10,000 file descriptors with the OS
2. Ask OS: "Which ones are ready?" (This call blocks until at least one is ready)
3. OS tells you: "Descriptors 5, 42, and 1337 are ready"
4. You read from those 3 descriptors
5. Go back to step 2

This single-threaded loop can handle 10,000 clients because it never blocks on any individual client. It asks the OS to efficiently check all of them.
File Descriptors: Everything Is a File
In Unix-like systems, everything is a file:
- Disk files → File descriptor
- Network sockets → File descriptor
- Pipes → File descriptor
- Device files → File descriptor
A file descriptor is a small non-negative integer (a plain C int) that uniquely identifies an open file or socket within a process.
Example:
int fd = open("myfile.txt", O_RDONLY);            // Returns file descriptor (e.g., 3)
int socket_fd = socket(AF_INET, SOCK_STREAM, 0);  // Returns file descriptor (e.g., 4)

File descriptor 0 = stdin, 1 = stdout, 2 = stderr. Other descriptors are assigned sequentially.
The OS keeps a table mapping file descriptors to actual files/sockets. This is how it tracks everything.
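On Linux you can actually peek at this table: each entry under /proc/self/fd is a symlink from a descriptor number to whatever it refers to. A small sketch (Linux-specific; the descriptor number is just illustrative):

// Ask the kernel what descriptor 1 (stdout) currently refers to by reading
// the /proc/self/fd symlink for it (Linux-specific).
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char target[256];
    ssize_t n = readlink("/proc/self/fd/1", target, sizeof(target) - 1);
    if (n >= 0) {
        target[n] = '\0';
        printf("fd 1 refers to: %s\n", target);  // e.g., /dev/pts/0 or a regular file
    }
    return 0;
}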
Part 3: How I/O Multiplexing Works (The Three Approaches)
Different operating systems provide different system calls for I/O multiplexing. All work on the same principle, but with slightly different APIs.
Approach 1: epoll (Linux)
epoll stands for “event poll.” It’s the most efficient on Linux.
Step 1: Create an epoll instance
int epoll_fd = epoll_create1(0);

This creates an epoll object that will monitor file descriptors.
Step 2: Register file descriptors
struct epoll_event event;
event.events = EPOLLIN;         // Monitor for incoming data
event.data.fd = client_socket;
epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_socket, &event);

You tell epoll: “Monitor this socket for incoming data (EPOLLIN).”
You do this for every client that connects.
Step 3: Wait for events
struct epoll_event events[10]; // Space for 10 events
int num_ready = epoll_wait(epoll_fd, events, 10, -1);

This is a blocking call. The OS puts the process to sleep. When any of the monitored file descriptors becomes ready, the OS wakes up the process.
num_ready tells you how many file descriptors are ready.
Step 4: Process ready events
for (int i = 0; i < num_ready; i++) {
    int ready_fd = events[i].data.fd;

    if (events[i].events & EPOLLIN) {
        // This file descriptor has incoming data, read it
        read_data(ready_fd);
    }
}

The Loop (Event Loop Pseudocode):
while (1) {
    // Block until some file descriptors are ready
    int num_ready = epoll_wait(epoll_fd, events, 10, -1);

    // Process all ready file descriptors
    for (int i = 0; i < num_ready; i++) {
        int ready_fd = events[i].data.fd;

        if (events[i].events & EPOLLIN) {
            // Read data and process command
            handle_client(ready_fd);
        }
    }

    // Loop back and wait for next batch of ready events
}

This single loop handles all clients!
Approach 2: kqueue (macOS, BSD)
kqueue is BSD’s version. It works similarly, but with a different API.
int kq = kqueue();
struct kevent changes[1];
struct kevent events[10];
EV_SET(&changes[0], client_socket, EVFILT_READ, EV_ADD, 0, 0, NULL);
kevent(kq, changes, 1, events, 10, NULL);

Conceptually identical to epoll, just different function names and struct names. Both monitor file descriptors and report ready ones.
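The wait-and-dispatch loop looks much the same as with epoll. A rough sketch, assuming the sockets were already registered as above and handle_client() is a hypothetical handler:

// Block until something is ready, then dispatch each ready descriptor.
int n = kevent(kq, NULL, 0, events, 10, NULL);  // no changes to submit, just wait
for (int i = 0; i < n; i++) {
    int ready_fd = (int)events[i].ident;        // the fd we registered with EV_SET
    if (events[i].filter == EVFILT_READ) {
        handle_client(ready_fd);                // hypothetical handler
    }
}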
Approach 3: IOCP (Windows)
Windows uses IOCP (I/O Completion Ports). It follows a completion model (you start an I/O operation and are later told it has finished) rather than a readiness model, but the goal is the same: one thread efficiently handling many concurrent I/O operations.
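For flavor, here is a heavily simplified sketch of the waiting side (Windows-only; it assumes the sockets were opened for overlapped I/O and already associated with the port, and handle_completion() is a hypothetical callback):

// Wait for completed I/O operations and dispatch them (completion model:
// the read/write was started earlier; here we learn that it has finished).
#include <windows.h>

void completion_loop(HANDLE iocp) {
    for (;;) {
        DWORD bytes;     // how many bytes the finished operation transferred
        ULONG_PTR key;   // per-connection value supplied when associating the socket
        OVERLAPPED *ov;  // identifies which outstanding operation completed
        if (GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE)) {
            // handle_completion(key, ov, bytes);  // hypothetical handler
        }
    }
}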
Why Three Different Implementations?
Each OS has different underlying I/O mechanisms:
- Linux: Best with epoll (highly scalable, supports millions of file descriptors)
- macOS/BSD: Best with kqueue (elegant, flexible)
- Windows: Best with IOCP (integrates with Windows’ asynchronous I/O model)
A portable library like libuv (used by Node.js) abstracts over all three; Redis ships its own small event library (ae) that plays the same role.
Note
Why this matters: Understanding that epoll/kqueue/IOCP are just thin wrappers around OS mechanisms helps you understand that I/O multiplexing isn’t magic. The OS already knows which sockets have data ready. These system calls just expose that information to your application.
Part 4: The Data Flow In Detail
Let’s trace the complete flow when a client sends data.
Step 1: Network Card Receives Data
Network → Network Card → Kernel Buffer (OS memory)

The network card receives packets and DMAs them (direct memory access) into a kernel buffer. The OS allocates the memory, and the hardware writes directly to it—no CPU involved yet.
Step 2: Interrupt Handler
Network Card → Interrupt
      ↓
CPU (Interrupt Handler)
      ↓
"Data arrived for socket 5"

The network card raises an interrupt. The CPU stops what it’s doing and runs an interrupt handler (part of the OS kernel).
The interrupt handler updates internal OS structures: “Socket descriptor 5 now has data ready.”
Step 3: Unblock epoll_wait
epoll_wait(epoll_fd, events, 10, -1);
      ↓
[Process was sleeping]
      ↓
[Interrupt fired, OS updated socket state]
      ↓
[OS wakes up process and returns]

If your process was blocked in epoll_wait(), the OS wakes it up and returns the list of ready file descriptors.
Step 4: Application Reads Data
num_ready = epoll_wait(epoll_fd, events, 10, -1);  // Returns 1
int ready_fd = events[0].data.fd;                  // Socket 5
char buffer[1024];
int bytes = read(ready_fd, buffer, 1024);  // Read from kernel buffer into user space

The read() call copies data from the kernel buffer to your application’s user space memory.
Step 5: Process Command
// Parse the command
Command cmd = parse_redis_command(buffer);

// Execute (e.g., SET key value)
redis_set(cmd.key, cmd.value);

// Write response back to socket
write(ready_fd, response, response_len);

Part 5: Why This Is Efficient
Why Not Block on Individual File Descriptors?
If you blocked on individual file descriptors:
// Bad: What if client 1 is slow?
for (int i = 0; i < 10000; i++) {
    read(client_sockets[i], buffer, 1024);  // Blocks if client 1 has no data
    // Can't check clients 2-9999 until client 1 sends something
}

You’d check clients serially. If client 1 has no data, you’d wait forever.
Why I/O Multiplexing Is Better
// Good: Check all at once, process ready ones
epoll_wait(...);  // Returns only ready file descriptors

// Now process only the ones with data
for (int i = 0; i < num_ready; i++) {
    read(ready_fds[i], buffer, 1024);  // Never blocks, data is ready
}

You check all 10,000 simultaneously. The OS handles the work of monitoring. You only read from sockets that have data, so read() never blocks.
CPU Efficiency
With threads:
- 10,000 threads created = massive memory
- Most threads blocked = CPU cores idle
- Context switching between threads = CPU cycles wasted
With I/O multiplexing:
- 1 thread = minimal memory
- Thread never blocks = CPU always busy (or sleeping efficiently)
- No context switching = all CPU for actual work
- CPU sleeps when no I/O ready = power efficient
Part 6: Event Loops in Practice
An event loop is simply the infinite loop that uses I/O multiplexing.
Generic Event Loop Pseudocode
while (true) {
    // 1. Wait for I/O (blocking call)
    ready_fds = wait_for_io(epoll_fd);

    // 2. Process all ready file descriptors
    for (fd in ready_fds) {
        if (fd == server_socket) {
            // New client connecting
            new_client = accept(fd);
            register_with_epoll(new_client);
        } else {
            // Existing client has data
            read_and_process(fd);
        }
    }

    // 3. Go back to step 1
}

This is the heart of Redis, Node.js, Python’s asyncio, and every high-performance server.
How Redis Uses epoll
Redis implements its event loop with its own small C library, ae, which abstracts over epoll, kqueue, and select (libuv, used by Node.js, plays the same role there):

Redis Server Start
      ↓
Create event loop (epoll under the hood on Linux)
      ↓
Register server socket with epoll
      ↓
Enter event loop:
  ├─ Wait for client connections or client data (epoll_wait)
  ├─ When ready, accept new connections or read client commands
  ├─ Execute Redis commands (SET, GET, ZADD, etc.)
  ├─ Write responses back to clients
  └─ Loop back to wait

Because Redis processes all events in one loop, it’s single-threaded and efficient.
Part 7: Event Loop vs Threading Model
Threading Model
┌──────────────────────────────────────────────────────┐
│ Thread 1:     Client A (blocked waiting for I/O)     │
│ Thread 2:     Client B (blocked waiting for I/O)     │
│ Thread 3:     Client C (blocked waiting for I/O)     │
│ ...                                                   │
│ Thread 10000: Client J (blocked waiting for I/O)     │
└──────────────────────────────────────────────────────┘

Problem: Most threads are blocked, CPU cores are idle.
Event Loop Model
┌──────────────────────────────────────────┐
│ Single Thread: Event Loop                │
│                                          │
│ while (true) {                           │
│     ready = epoll_wait(...)              │
│     for (fd in ready) {                  │
│         process(fd)                      │
│     }                                    │
│ }                                        │
└──────────────────────────────────────────┘
All 10,000 clients are handled by one thread that never blocks on any individual client.

Part 8: Common Misconceptions
Misconception 1: “Event loops are faster because they’re single-threaded”
Reality: Event loops are faster because they don’t block. They’re single-threaded, but that’s not the speed advantage—it’s the non-blocking nature.
A multi-threaded server can be just as fast if implemented correctly. But it’s more complex (locks, synchronization).
Misconception 2: “If one command takes 10 seconds, all clients are stuck”
True: If you execute a command that takes 10 seconds, the event loop is blocked, and all other clients wait.
But for a well-designed server (like Redis), commands are fast (microseconds to milliseconds). The 10-second problem is rare.
Solution: Use worker threads or background processing for long operations.
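One common shape for this (a hedged, Linux-specific sketch; the eventfd approach and helper names are just for illustration) is to run the slow work on a detached thread and have it signal the event loop through an eventfd that epoll is already watching:

// Run a slow job off the event loop and wake the loop up when it finishes.
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

int done_efd;  // created once with eventfd(0, 0) and registered with epoll for EPOLLIN

void *slow_job(void *arg) {
    // ... expensive work here (the hypothetical 10-second operation) ...
    uint64_t one = 1;
    write(done_efd, &one, sizeof(one));  // makes done_efd readable, so epoll_wait returns
    return NULL;
}

void start_slow_job(void) {
    pthread_t t;
    pthread_create(&t, NULL, slow_job, NULL);
    pthread_detach(t);  // the event loop keeps serving other clients meanwhile
}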
Misconception 3: “epoll/kqueue can monitor an unlimited number of file descriptors”
Reality: There’s a limit. On Linux, a process can typically have on the order of a million open file descriptors (the ceiling is set by kernel and per-process limits, not by epoll itself). That’s plenty for most servers, but not unlimited.
Also, your system has limits:
ulimit -n   # Show max file descriptors (often 1024 or 65536)

Misconception 4: “I/O multiplexing only works for network sockets”
Reality: It works for any file descriptor: disks, pipes, devices, etc.
However, epoll on Linux has nuances: it works with sockets, pipes, and similar descriptors, but regular disk files can’t be registered with epoll at all (epoll_ctl fails with EPERM). Disk I/O is better handled with thread pools.
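You can see this limitation directly; a small Linux-only demo (the file path is arbitrary):

// Try to register a regular file with epoll; the kernel refuses with EPERM.
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

int main(void) {
    int epfd = epoll_create1(0);
    int fd = open("/etc/hostname", O_RDONLY);  // any regular file on disk
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        printf("epoll_ctl failed: %s\n", strerror(errno));  // "Operation not permitted"
    return 0;
}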
Note
Critical: I/O multiplexing is not a magic solution. It’s a specific tool for a specific problem: handling many concurrent I/O operations with one thread. If your workload is CPU-bound (lots of computation), I/O multiplexing won’t help. You need parallelism (threads or processes) for CPU-bound work.
Part 9: Practical Example: A Tiny Redis-Like Server
Let’s write pseudocode for a minimal server using epoll:
#include <sys/epoll.h>
#include <unistd.h>

int main() {
    // 1. Create server socket
    int server_socket = socket(AF_INET, SOCK_STREAM, 0);
    bind(server_socket, ...);
    listen(server_socket, 128);

    // 2. Create epoll
    int epoll_fd = epoll_create1(0);

    // 3. Register server socket
    struct epoll_event event;
    event.events = EPOLLIN;
    event.data.fd = server_socket;
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, server_socket, &event);

    // 4. Event loop
    struct epoll_event events[10];

    while (1) {
        // Wait for events (blocks until something happens)
        int num_ready = epoll_wait(epoll_fd, events, 10, -1);

        for (int i = 0; i < num_ready; i++) {
            int ready_fd = events[i].data.fd;

            if (ready_fd == server_socket) {
                // New client connection
                int client_socket = accept(server_socket, NULL, NULL);

                // Register client socket
                struct epoll_event client_event;
                client_event.events = EPOLLIN;
                client_event.data.fd = client_socket;
                epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_socket, &client_event);
            } else {
                // Existing client has data
                char buffer[1024];
                int bytes = read(ready_fd, buffer, 1024);

                if (bytes > 0) {
                    // Parse command (e.g., "SET key value")
                    char *response = process_command(buffer);
                    write(ready_fd, response, strlen(response));
                } else {
                    // Client disconnected
                    close(ready_fd);
                    epoll_ctl(epoll_fd, EPOLL_CTL_DEL, ready_fd, NULL);
                }
            }
        }
    }

    return 0;
}

This tiny server:
- Handles multiple clients
- Never blocks on individual clients
- Uses one thread
- Efficiently multiplexes I/O via epoll
This is the essence of how Redis works (with more features, of course).
Part 10: Scaling Limits
Why Does epoll Scale So Well?
epoll’s cost doesn’t grow with the number of descriptors you monitor. It doesn’t check all 10,000 descriptors sequentially; instead, the kernel maintains a ready list that is updated as events arrive (interrupt handlers update the data structures), and epoll_wait simply hands you that list. You pay only for the descriptors that are actually ready.
Compare to select() (older I/O multiplexing):
// select() is O(n) - checks all descriptors every time
fd_set rfds;
FD_ZERO(&rfds);
for (int i = 0; i < 10000; i++) {
    FD_SET(i, &rfds);
}
select(10000, &rfds, NULL, NULL, NULL);  // Slow: checks all 10,000
                                         // (and fd_set is usually capped at FD_SETSIZE = 1024)

vs epoll:
// epoll is O(1) - only returns ready ones
epoll_wait(epoll_fd, events, 10, -1);  // Fast: only returns ready ones

This is why epoll can handle 100,000+ concurrent connections, while select() struggles with 1,000.
When Does I/O Multiplexing Stop Working?
When you have 10 million concurrent connections and need to process commands in microseconds, even epoll might hit limits:
- OS memory for epoll structures
- Bandwidth (network I/O capacity)
- DNS queries, database latency (external I/O)
At this scale, you need:
- Multiple epoll instances (multiple processes or threads, each with its own epoll; see the sketch after this list)
- Load balancing
- Horizontal scaling
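For the first item, one common trick on Linux (an added suggestion, not something covered above) is SO_REUSEPORT: several worker processes bind the same port, each with its own listener and its own epoll loop, and the kernel spreads incoming connections across them. A minimal sketch:

// Each worker process calls this, then runs its own independent epoll loop
// on the returned listener (SO_REUSEPORT requires Linux 3.9+).
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

int make_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));  // allow shared binding

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 128);
    return fd;
}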
Conclusion: The Elegance of I/O Multiplexing
I/O multiplexing is simple yet powerful:
- Ask the OS: “Which file descriptors are ready?”
- Wait: (OS blocks you)
- OS Responds: “Here are the ready ones”
- Process: Read from ready descriptors
- Loop: Go back to step 1
This simple loop:
- Handles 100,000+ concurrent clients
- Uses minimal resources
- Requires no locks or synchronization
- Sleeps efficiently when no I/O is ready
It’s why:
- Redis is single-threaded and fast
- Node.js can handle thousands of concurrent requests
- Python’s asyncio works
- High-performance servers use event loops
Understanding I/O multiplexing unlocks a whole new way of thinking about concurrency. It’s not threads or processes—it’s event-driven programming at its finest.
Further Reading
Now you understand the magic behind high-performance servers. There’s no magic—just clever use of OS primitives.