I was caffeinated and wanted to explore how gRPC actually works under the hood. You know, the thing that "writers" on Medium and DEV.to never explain. Heck, most of the time it's a regurgitation of the official documentation for one of the two programming languages they know.
What's gRPC and what are Protocol Buffers? Let's break down how this actually works. No fluff, just the real mechanics.
Protobuf Serialization: How It Actually Works
You take a message from your .proto file, call SerializeToString() in C++ or toByteArray() in Java, and boom - compact binary stream. Fast and space-efficient. That's the whole point.
The Encoding Scheme
Protobuf serializes messages as a flat sequence of key-value pairs - roughly Tag-Length-Value (TLV), except the length part only exists for length-delimited types. There's no delimiter for the message as a whole; its end comes from the stream end or the transport layer (gRPC handles this).
Each field gets encoded as:
Key (Tag): Variable-length integer (varint). Formula: (field_number << 3) | wire_type
- field_number: Your field's unique number from the .proto file
- wire_type: The least significant 3 bits (values 0-5)
Packing the wire type into the low 3 bits means small field numbers (1 through 15) fit in a single tag byte. Smart. (See the sketch after the wire-type list below.)
Value (Payload): Depends on wire type. Six types:
- Varint (0): For integers, bools, enums
- 64-bit fixed (1): For fixed64, sfixed64, double
- Length-delimited (2): Strings, bytes, embedded messages, packed repeated fields
- Start group (3): Deprecated, don't use
- End group (4): Deprecated, don't use
- 32-bit fixed (5): For fixed32, sfixed32, float
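Here's a minimal sketch of the tag math in Python - helper names are illustrative, not any library's API:

```python
# Minimal sketch of protobuf tag encoding/decoding. Helper names are
# illustrative, not part of any protobuf library.

def encode_tag(field_number: int, wire_type: int) -> int:
    # Field number in the high bits, wire type in the low 3 bits
    return (field_number << 3) | wire_type

def decode_tag(tag: int) -> tuple[int, int]:
    return tag >> 3, tag & 0b111  # (field_number, wire_type)

assert encode_tag(1, 0) == 0x08   # field 1, varint
assert encode_tag(2, 2) == 0x12   # field 2, length-delimited
assert decode_tag(0x12) == (2, 2)
```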
Fields may serialize in any order (implementations typically write them in field-number order, not .proto declaration order), and parsers handle any order. Optional fields? Left out if unset. Unknown fields from future schemas? Kept during round-tripping. Solid backward compatibility.
Varint Encoding: The Foundation
Varints encode unsigned 64-bit integers in 1-10 bytes. Core primitive for tags, lengths, values.
How it works:
- Split value into 7-bit chunks (least significant first)
- Each byte: MSB is continuation flag (1 = more bytes, 0 = last byte)
- Remaining 7 bits = data chunk
- Little-endian when reassembling
Example: 150 is 10010110 in binary → two bytes: 10010110 (MSB=1, data bits 0010110) and 00000001 (MSB=0, data bits 0000001) → hex 96 01.
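In code, the whole algorithm is a handful of lines. A minimal Python sketch (unsigned values only, made-up helper names):

```python
# Minimal varint encoder/decoder sketch (unsigned values only).

def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        chunk = value & 0x7F          # low 7 bits
        value >>= 7
        if value:
            out.append(chunk | 0x80)  # MSB=1: more bytes follow
        else:
            out.append(chunk)         # MSB=0: last byte
            return bytes(out)

def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # little-endian reassembly
        if not byte & 0x80:
            return result, pos            # (value, offset of next byte)
        shift += 7

assert encode_varint(150) == b"\x96\x01"
assert decode_varint(b"\x96\x01") == (150, 2)
```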
Negative numbers with regular int32/int64? Bloat city. Two's complement means -1 becomes a full ten-byte varint (nine 0xFF bytes plus a final 0x01). Not great.
ZigZag Encoding (Fix for Negatives)
sint32/sint64 use ZigZag to map signed to unsigned efficiently:
- Positive n → 2*n (even numbers)
- Negative -n → 2*n - 1 (odd numbers)
Formula: (n << 1) ^ (n >> 31) for 32-bit, (n << 1) ^ (n >> 63) for 64-bit - where the right shift is arithmetic, so it smears the sign bit.
Then varint-encode the result. -2 becomes ZigZag 3, which encodes as the single byte 0x03. Beautiful.
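A quick Python sketch of the 32-bit version - the mask stands in for C's fixed-width arithmetic, since Python ints are arbitrary-precision:

```python
# ZigZag sketch for 32-bit values. The 0xFFFFFFFF mask mimics C's
# fixed-width wraparound in Python's arbitrary-precision ints.

def zigzag_encode32(n: int) -> int:
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF

def zigzag_decode32(z: int) -> int:
    return (z >> 1) ^ -(z & 1)

assert zigzag_encode32(0) == 0
assert zigzag_encode32(-1) == 1
assert zigzag_encode32(1) == 2
assert zigzag_encode32(-2) == 3
assert zigzag_decode32(3) == -2
```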
Length-Delimited Fields
Varint length L, then exactly L bytes. Used for:
- Strings: UTF-8 bytes
- Bytes: Raw binary
- Embedded messages: Recursively serialized sub-messages
No alignment, no padding. Pure efficiency.
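Putting the pieces together, here's how a single string field would land on the wire, reusing the encode_tag and encode_varint sketches from above:

```python
# Encoding a hypothetical `string name = 2;` field with value "hi",
# reusing encode_tag and encode_varint from the earlier sketches.

def encode_string_field(field_number: int, text: str) -> bytes:
    payload = text.encode("utf-8")
    return (
        encode_varint(encode_tag(field_number, 2))  # wire type 2
        + encode_varint(len(payload))               # length L
        + payload                                   # exactly L bytes, no padding
    )

assert encode_string_field(2, "hi") == b"\x12\x02hi"
```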
Repeated Fields
Two ways to handle these:
- Non-packed: Each element = separate key-value pair. Tags can repeat.
- Packed (default in proto3 for scalar numeric types): Single length-delimited field with concatenated values. Way more efficient for primitives.
Example: [3, 270, 86942] as a packed repeated int32 → tag (wire_type 2), length varint 06, then the varints 03, 8E 02, 9E A7 05 concatenated.
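Same deal in code, again reusing the earlier helpers (field number 4 is an arbitrary choice for illustration):

```python
# Packed repeated int32 sketch, reusing encode_tag and encode_varint.

def encode_packed_varints(field_number: int, values: list[int]) -> bytes:
    payload = b"".join(encode_varint(v) for v in values)
    return (
        encode_varint(encode_tag(field_number, 2))  # packed = length-delimited
        + encode_varint(len(payload))
        + payload
    )

# [3, 270, 86942] in field 4 -> tag 0x22, length 0x06, six payload bytes
assert encode_packed_varints(4, [3, 270, 86942]) == bytes.fromhex("22 06 03 8e 02 9e a7 05")
```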
Deserialization: Reading It Back
ParseFromString() reads sequentially. O(n) time, no backtracking.
Process (sketched in code after this list):
- Read varint tag → Extract field_number and wire_type
- Based on wire_type:
- Varint: Read varint, interpret by field type (ZigZag decode for sint)
- Fixed 32/64: Read 4/8 bytes, little-endian decode
- Length-delimited: Read varint length L, read L bytes. Recurse for sub-messages, unpack for repeated.
- Map to field: Set value if known (append for repeated), store raw if unknown
- Handle mismatches: Skip or fail on wire type errors, ignore extra fields
- Done when stream exhausted
Fast, deterministic, rock solid.
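Here's the promised sketch: a toy, schema-less walker over the wire format, reusing decode_varint from above. This is roughly what a parser does before it knows what any field means:

```python
# Toy wire-format walker, reusing decode_varint from the earlier sketch.
# Yields (field_number, wire_type, raw_value) without knowing the schema -
# the same way a parser carries unknown fields.
import struct

def walk_fields(data: bytes):
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0b111
        if wire_type == 0:                       # varint
            value, pos = decode_varint(data, pos)
        elif wire_type == 1:                     # 64-bit fixed
            (value,) = struct.unpack_from("<Q", data, pos); pos += 8
        elif wire_type == 2:                     # length-delimited
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:                     # 32-bit fixed
            (value,) = struct.unpack_from("<I", data, pos); pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value

# b"\x12\x02hi" from earlier -> field 2, wire type 2, b"hi"
assert list(walk_fields(b"\x12\x02hi")) == [(2, 2, b"hi")]
```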
gRPC's Transport Layer: HTTP/2 Framing
gRPC wraps protobuf messages in length-prefixed frames over HTTP/2. This handles multiplexing, bidirectional streaming, flow control. Serialization happens client-side before framing, deserialization server-side after unframing.
HTTP/2 Headers
Request headers (HPACK compressed):
- :method: POST
- :path: /service/method
- content-type: application/grpc+proto (or +json, whatever)
- grpc-encoding: identity (or gzip, deflate, snappy if compression enabled)
- grpc-accept-encoding: gzip,deflate,identity (client tells server what it supports)
- te: trailers (required)
- Plus auth, timeouts, custom headers
Response headers: :status: 200, content-type: application/grpc+proto, and grpc-encoding if compressed. Trailers at stream end: grpc-status: 0 (0 = OK) and grpc-message with error details.
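Most of this is invisible in application code - custom headers and trailers surface as metadata. A hedged sketch with grpc's Python API (EchoStub, Echo, and EchoRequest are hypothetical generated-stub names, not a real service):

```python
# Hedged sketch: attaching custom metadata (extra HTTP/2 headers) to a call
# and reading the trailers. EchoStub/EchoRequest are hypothetical names
# standing in for protoc-generated code.
import grpc

channel = grpc.insecure_channel("localhost:50051")
stub = EchoStub(channel)  # hypothetical generated stub

response, call = stub.Echo.with_call(
    EchoRequest(text="hi"),                  # hypothetical message type
    metadata=(("x-request-id", "abc123"),),  # sent as an HTTP/2 header
    timeout=2.0,                             # sent as the grpc-timeout header
)
print(call.code())               # from the grpc-status trailer
print(call.trailing_metadata())  # custom trailers, if any
```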
Message Framing: The 5-Byte Prefix
Each protobuf message gets wrapped:
- Compressed-Flag (1 byte): 0 = uncompressed, 1 = compressed with grpc-encoding algorithm
- Message-Length (4 bytes): Big-endian uint32. Length of message (post-compression if flagged)
- Message (Message-Length bytes): The actual serialized protobuf
This 5-byte prefix + message goes into HTTP/2 DATA frames. Multiple messages in streams just concatenate these. HTTP/2 handles fragmentation across frames. Compression applies only to the message payload, not the prefix. Custom compressors? Pluggable.
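The prefix is trivial to build. A Python sketch (frame_message is a made-up name; real gRPC libraries do this internally):

```python
# Sketch of gRPC's 5-byte message prefix: 1-byte compressed flag,
# 4-byte big-endian length, then the serialized protobuf.
import struct

def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    return struct.pack(">BI", int(compressed), len(payload)) + payload

framed = frame_message(b"\x12\x02hi")
assert framed == b"\x00\x00\x00\x00\x04\x12\x02hi"
```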
The Full Flow
Client-side serialization:
- Build protobuf message
- Serialize to bytes
- If compression beneficial: compress → set flag=1, length=compressed size
- Add 5-byte prefix
- Send as HTTP/2 DATA frames
Server-side deserialization:
- Receive HTTP/2 stream, validate headers
- Read DATA frames
- For each message: read 1-byte flag, 4-byte length (big-endian)
- Read exactly length bytes
- If flag=1: decompress using the header-specified codec
- Deserialize via protobuf ParseFromArray()
- Process the RPC, repeat for streams
Errors (invalid length, compression failures) → grpc-status: 13 (internal error).
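And the receiving side of that framing, as a hedged sketch: peel off 5-byte prefixes until the stream runs dry.

```python
# Sketch of the receiver side: split a byte stream into gRPC messages
# by reading each 5-byte prefix, then exactly that many payload bytes.
import struct

def read_frames(stream: bytes):
    pos = 0
    while pos < len(stream):
        flag, length = struct.unpack_from(">BI", stream, pos)
        pos += 5
        yield flag, stream[pos:pos + length]
        pos += length

assert list(read_frames(b"\x00\x00\x00\x00\x04\x12\x02hi")) == [(0, b"\x12\x02hi")]
```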
This framing gives you reliable message boundaries, supports huge messages (the 4-byte length allows up to ~4GB per message), and leverages HTTP/2's binary efficiency. Clean separation of concerns.
Why HTTP/2? Why Not HTTP/1.x?
HTTP/1.x is trash for modern RPC. Here's why gRPC went HTTP/2:
Head-of-Line Blocking
HTTP/1.x processes requests sequentially on a single connection. One slow response blocks everything behind it. Pipelining makes it worse - TCP-level HOL blocking stalls the entire connection.
gRPC's streaming RPCs with multiple messages? Dead on arrival with HTTP/1.x. Latency and throughput would tank.
Connection Overhead
HTTP/1.x concurrency means opening multiple TCP connections (6-8 per domain typically). Every connection = TCP handshake overhead, socket resources, buffer usage, risk of port exhaustion. gRPC needs long-lived, high-volume connections for cloud environments. HTTP/2's single-connection multiplexing crushes this. Way less latency, way less resource waste.
Inefficient Encoding
HTTP/1.x uses human-readable text for headers and bodies. Bigger payloads, slower parsing. No header compression means repeated headers in every RPC.
HTTP/2's binary framing and HPACK header compression? Perfect for gRPC's binary payloads. Lean and fast.
No Real Streaming Support
HTTP/1.x has zero native support for server pushes or bidirectional streams. You'd need hacks like long-polling or WebSockets (which aren't general-purpose).
gRPC requires robust streaming for real-time apps - chat services, data feeds, whatever. HTTP/2 has this built in.
Historical Context
gRPC evolved from Google's internal Stubby system. Development aligned with HTTP/2's standardization in 2015 (RFC 7540). HTTP/2 was literally engineered to "dramatically increase network efficiency and enable real-time communication." Perfect match for gRPC's goals: scalability, low latency, resiliency at massive scale.
Retrofitting gRPC onto HTTP/1.x would need workarounds - multiple connections, proxies - complicating everything and killing performance. Not happening.
HTTP/2 Features gRPC Actually Uses
Binary Framing Layer
HTTP/2: Binary protocol with frames (HEADERS, DATA, SETTINGS). 9-byte header (length, type, flags, stream ID) + payload. Machine-optimized, no CRLF delimiters.
gRPC usage: Protobuf messages go in HTTP/2 DATA frames with that 5-byte prefix. Large messages span multiple frames, small messages pack into one. Without binary framing, gRPC's high-frequency RPC efficiency dies.
Advantage: Eliminates text parsing overhead and reduces payload size.
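For a feel of how machine-friendly this is, here's a sketch that unpacks the 9-byte HTTP/2 frame header per RFC 7540:

```python
# Sketch: parsing the 9-byte HTTP/2 frame header (RFC 7540, section 4.1):
# 24-bit length, 8-bit type, 8-bit flags, 31-bit stream ID.
import struct

def parse_frame_header(header: bytes):
    length_hi, length_lo, ftype, flags, stream_id = struct.unpack(">BHBBI", header)
    length = (length_hi << 16) | length_lo                # 24-bit payload length
    return length, ftype, flags, stream_id & 0x7FFFFFFF   # clear reserved bit

# A DATA frame (type 0x0) on stream 1 carrying 13 bytes of payload:
assert parse_frame_header(b"\x00\x00\x0d\x00\x00\x00\x00\x00\x01") == (13, 0, 0, 1)
```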
Multiplexing and Streams
HTTP/2: Multiple independent streams share one TCP connection. Each stream has unique ID (odd for client-initiated). Streams carry bidirectional messages, interleaving prevents HOL blocking (except TCP packet level, fixed by HTTP/3/QUIC).
gRPC usage: Each RPC = one HTTP/2 stream. Unary RPCs use single stream for request (HEADERS + DATA) and response (HEADERS + DATA + trailers). Streaming RPCs exploit bidirectional streams - clients send multiple DATA frames in client-streaming, servers in server-streaming, both interleave in bidirectional for real-time updates.
gRPC channels multiplex multiple RPCs across streams on one or more connections. Supports thousands of concurrent RPCs without new TCP setups. Critical: gRPC's streaming types depend on this. HTTP/1.x fallback would need one connection per RPC. Doesn't scale.
Header Compression (HPACK)
HTTP/2: HPACK compresses headers using Huffman encoding and dynamic/static tables. Eliminates redundancy (repeated keys like "content-type").
gRPC usage: Metadata (key-value pairs for auth, timeouts) goes in HTTP/2 HEADERS frames (initial) and trailers (end-of-stream for status codes). Compression cuts overhead in metadata-heavy RPCs, especially in microservices with frequent calls.
gRPC mandates content-type: application/grpc+proto and uses pseudo-headers like :method: POST and :path: /service/method.
Essential for low latency. Uncompressed HTTP/1.x headers would bloat everything.
Flow Control
HTTP/2: Window-based control at connection and per-stream levels (initial 64KB). Updated via WINDOW_UPDATE frames to prevent buffer overflows.
gRPC usage: In streaming RPCs, gRPC respects HTTP/2 flow control for backpressure. Pauses message sends if receiver's window is exhausted. Ensures reliable delivery in high-volume streams without overwhelming endpoints.
Critical for bidirectional streaming stability. HTTP/1.x has zero granularity here.
PING Frames and Connection Health
HTTP/2: PING frames (type 0x6) test liveness, bypass flow control, need ACK responses.
gRPC usage: KeepAlive sends periodic PINGs to detect dead connections fast (seconds, not TCP's minutes). No ACK? Close and reconnect. Also prevents proxy timeouts (AWS ELB's 60s idle limit, etc.).
Health checking integrates with load balancers to redirect traffic from unhealthy connections. This enables gRPC's resiliency in long-lived connections. "Always healthy" abstraction depends on this.
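In grpc's Python API, these knobs surface as channel args. A hedged sketch - the option names are documented gRPC core settings, the values here are just illustrative:

```python
# Hedged sketch: tuning keepalive PINGs on a gRPC Python channel.
# Option names are gRPC core channel args; values are illustrative.
import grpc

channel = grpc.insecure_channel(
    "localhost:50051",
    options=[
        ("grpc.keepalive_time_ms", 30_000),         # PING every 30s
        ("grpc.keepalive_timeout_ms", 5_000),       # close if no ACK within 5s
        ("grpc.keepalive_permit_without_calls", 1), # ping even with no active RPCs
    ],
)
```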
Prioritization and Settings
HTTP/2: Streams have priorities (weight 1-256, dependencies) via PRIORITY frames. SETTINGS frames negotiate max concurrent streams, frame sizes, etc.
gRPC usage: Uses SETTINGS to configure max streams and frame sizes. Prioritization less emphasized but can influence RPC scheduling in resource-constrained environments.
Supports fine-tuned performance. Not available in HTTP/1.x.
Wrapping Up
gRPC's serialization and transport stack is tight: protobuf's binary encoding for compact, fast messages, HTTP/2's framing and multiplexing for efficient, scalable transport. No wasted bytes, no wasted connections, no wasted time.
Understanding these mechanics means you can actually optimize and debug gRPC systems instead of cargo-culting configs. That's the difference between using a tool and mastering it.
Now go build something fast.