I was caffeinated and wanted to explore how gRPC actually works under the hood. You know, the thing that "writers" on Medium and DEV.to never explain. Heck, most of the time it's a regurgitation of the official documentation for one of the two programming languages they know.
What's gRPC and what are Protocol Buffers? Let's break down how this actually works. No fluff, just the real mechanics.
Protobuf Serialization: How It Actually Works
You take a message from your .proto file, call SerializeToString() in C++ or toByteArray() in Java, and boom - compact binary stream. Fast and space-efficient. That's the whole point.
The Encoding Scheme
Protobuf serializes messages as a flat sequence of key-value pairs - roughly Tag-Length-Value (TLV), except the length part only exists for length-delimited types. There's no delimiter for the message as a whole; its end comes from the stream end or the transport layer (gRPC handles this).
Each field gets encoded as:
Key (Tag): Variable-length integer (varint). Formula: (field_number << 3) | wire_type
- field_number: Your field's unique number from the .proto file
- wire_type: The least significant 3 bits (values 0-5)
Packing the wire type into the low 3 bits means small field numbers (1 through 15) fit in a single tag byte. Smart. (See the sketch after the wire-type list below.)
Value (Payload): Depends on wire type. Six types:
- Varint (0): For integers, bools, enums
- 64-bit fixed (1): For fixed64, sfixed64, double
- Length-delimited (2): Strings, bytes, embedded messages, packed repeated fields
- Start group (3): Deprecated, don't use
- End group (4): Deprecated, don't use
- 32-bit fixed (5): For fixed32, sfixed32, float
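Here's a minimal sketch of the tag math in Python - helper names are illustrative, not any library's API:

```python
# Minimal sketch of protobuf tag encoding/decoding. Helper names are
# illustrative, not part of any protobuf library.

def encode_tag(field_number: int, wire_type: int) -> int:
    # Field number in the high bits, wire type in the low 3 bits
    return (field_number << 3) | wire_type

def decode_tag(tag: int) -> tuple[int, int]:
    return tag >> 3, tag & 0b111  # (field_number, wire_type)

assert encode_tag(1, 0) == 0x08   # field 1, varint
assert encode_tag(2, 2) == 0x12   # field 2, length-delimited
assert decode_tag(0x12) == (2, 2)
```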
Fields may serialize in any order (implementations typically write them in field-number order, not .proto declaration order), and parsers handle any order. Optional fields? Left out if unset. Unknown fields from future schemas? Kept during round-tripping. Solid backward compatibility.
Varint Encoding: The Foundation
Varints encode unsigned 64-bit integers in 1-10 bytes. Core primitive for tags, lengths, values.
How it works:
- Split value into 7-bit chunks (least significant first)
- Each byte: MSB is continuation flag (1 = more bytes, 0 = last byte)
- Remaining 7 bits = data chunk
- Little-endian when reassembling
Example: 150 is 10010110 in binary → two bytes: 10010110 (MSB=1, data bits 0010110) and 00000001 (MSB=0, data bits 0000001) → hex 96 01.
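In code, the whole algorithm is a handful of lines. A minimal Python sketch (unsigned values only, made-up helper names):

```python
# Minimal varint encoder/decoder sketch (unsigned values only).

def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        chunk = value & 0x7F          # low 7 bits
        value >>= 7
        if value:
            out.append(chunk | 0x80)  # MSB=1: more bytes follow
        else:
            out.append(chunk)         # MSB=0: last byte
            return bytes(out)

def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # little-endian reassembly
        if not byte & 0x80:
            return result, pos            # (value, offset of next byte)
        shift += 7

assert encode_varint(150) == b"\x96\x01"
assert decode_varint(b"\x96\x01") == (150, 2)
```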
Negative numbers with regular int32/int64? Bloat city. Two's complement means -1 becomes a full ten-byte varint (nine 0xFF bytes plus a final 0x01). Not great.
ZigZag Encoding (Fix for Negatives)
sint32/sint64 use ZigZag to map signed to unsigned efficiently:
- Positive n → 2*n (even numbers)
- Negative -n → 2*n - 1 (odd numbers)
Formula: (n << 1) ^ (n >> 31) for 32-bit, (n << 1) ^ (n >> 63) for 64-bit - where the right shift is arithmetic, so it smears the sign bit.
Then varint-encode the result. -2 becomes ZigZag 3, which encodes as the single byte 0x03. Beautiful.
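A quick Python sketch of the 32-bit version - the mask stands in for C's fixed-width arithmetic, since Python ints are arbitrary-precision:

```python
# ZigZag sketch for 32-bit values. The 0xFFFFFFFF mask mimics C's
# fixed-width wraparound in Python's arbitrary-precision ints.

def zigzag_encode32(n: int) -> int:
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF

def zigzag_decode32(z: int) -> int:
    return (z >> 1) ^ -(z & 1)

assert zigzag_encode32(0) == 0
assert zigzag_encode32(-1) == 1
assert zigzag_encode32(1) == 2
assert zigzag_encode32(-2) == 3
assert zigzag_decode32(3) == -2
```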
Length-Delimited Fields
Varint length L, then exactly L bytes. Used for:
- Strings: UTF-8 bytes
- Bytes: Raw binary
- Embedded messages: Recursively serialized sub-messages
No alignment, no padding. Pure efficiency.
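Putting the pieces together, here's how a single string field would land on the wire, reusing the encode_tag and encode_varint sketches from above:

```python
# Encoding a hypothetical `string name = 2;` field with value "hi",
# reusing encode_tag and encode_varint from the earlier sketches.

def encode_string_field(field_number: int, text: str) -> bytes:
    payload = text.encode("utf-8")
    return (
        encode_varint(encode_tag(field_number, 2))  # wire type 2
        + encode_varint(len(payload))               # length L
        + payload                                   # exactly L bytes, no padding
    )

assert encode_string_field(2, "hi") == b"\x12\x02hi"
```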
Repeated Fields
Two ways to handle these:
- Non-packed: Each element = separate key-value pair. Tags can repeat.
- Packed (default in proto3 for scalar numeric types): Single length-delimited field with concatenated values. Way more efficient for primitives.
Example: [3, 270, 86942] as a packed repeated int32 → tag (wire_type 2), length varint 06, then the varints 03, 8E 02, 9E A7 05 concatenated.
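Same deal in code, again reusing the earlier helpers (field number 4 is an arbitrary choice for illustration):

```python
# Packed repeated int32 sketch, reusing encode_tag and encode_varint.

def encode_packed_varints(field_number: int, values: list[int]) -> bytes:
    payload = b"".join(encode_varint(v) for v in values)
    return (
        encode_varint(encode_tag(field_number, 2))  # packed = length-delimited
        + encode_varint(len(payload))
        + payload
    )

# [3, 270, 86942] in field 4 -> tag 0x22, length 0x06, six payload bytes
assert encode_packed_varints(4, [3, 270, 86942]) == bytes.fromhex("22 06 03 8e 02 9e a7 05")
```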
Deserialization: Reading It Back
ParseFromString() reads sequentially. O(n) time, no backtracking.
Process (sketched in code after this list):
- Read varint tag → Extract field_number and wire_type
- Based on wire_type:
- Varint: Read varint, interpret by field type (ZigZag decode for sint)
- Fixed 32/64: Read 4/8 bytes, little-endian decode
- Length-delimited: Read varint length L, read L bytes. Recurse for sub-messages, unpack for repeated.
- Map to field: Set value if known (append for repeated), store raw if unknown
- Handle mismatches: Skip or fail on wire type errors, ignore extra fields
- Done when stream exhausted
Fast, deterministic, rock solid.
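Here's the promised sketch: a toy, schema-less walker over the wire format, reusing decode_varint from above. This is roughly what a parser does before it knows what any field means:

```python
# Toy wire-format walker, reusing decode_varint from the earlier sketch.
# Yields (field_number, wire_type, raw_value) without knowing the schema -
# the same way a parser carries unknown fields.
import struct

def walk_fields(data: bytes):
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0b111
        if wire_type == 0:                       # varint
            value, pos = decode_varint(data, pos)
        elif wire_type == 1:                     # 64-bit fixed
            (value,) = struct.unpack_from("<Q", data, pos); pos += 8
        elif wire_type == 2:                     # length-delimited
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:                     # 32-bit fixed
            (value,) = struct.unpack_from("<I", data, pos); pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value

# b"\x12\x02hi" from earlier -> field 2, wire type 2, b"hi"
assert list(walk_fields(b"\x12\x02hi")) == [(2, 2, b"hi")]
```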
gRPC's Transport Layer: HTTP/2 Framing
gRPC wraps protobuf messages in length-prefixed frames over HTTP/2. This handles multiplexing, bidirectional streaming, flow control. Serialization happens client-side before framing, deserialization server-side after unframing.
HTTP/2 Headers
Request headers (HPACK compressed):
- :method: POST
- :path: /service/method
- content-type: application/grpc+proto (or +json, whatever)
- grpc-encoding: identity (or gzip, deflate, snappy if compression enabled)
- grpc-accept-encoding: gzip,deflate,identity (client tells server what it supports)
- te: trailers (required)
- Plus auth, timeouts, custom headers
Response headers: :status: 200, content-type: application/grpc+proto, and grpc-encoding if compressed. Trailers at stream end: grpc-status: 0 (0 = OK) and grpc-message with error details.
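Most of this is invisible in application code - custom headers and trailers surface as metadata. A hedged sketch with grpc's Python API (EchoStub, Echo, and EchoRequest are hypothetical generated-stub names, not a real service):

```python
# Hedged sketch: attaching custom metadata (extra HTTP/2 headers) to a call
# and reading the trailers. EchoStub/EchoRequest are hypothetical names
# standing in for protoc-generated code.
import grpc

channel = grpc.insecure_channel("localhost:50051")
stub = EchoStub(channel)  # hypothetical generated stub

response, call = stub.Echo.with_call(
    EchoRequest(text="hi"),                  # hypothetical message type
    metadata=(("x-request-id", "abc123"),),  # sent as an HTTP/2 header
    timeout=2.0,                             # sent as the grpc-timeout header
)
print(call.code())               # from the grpc-status trailer
print(call.trailing_metadata())  # custom trailers, if any
```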
Message Framing: The 5-Byte Prefix
Each protobuf message gets wrapped:
- Compressed-Flag (1 byte): 0 = uncompressed, 1 = compressed with grpc-encoding algorithm
- Message-Length (4 bytes): Big-endian uint32. Length of message (post-compression if flagged)
- Message (Message-Length bytes): The actual serialized protobuf
This 5-byte prefix + message goes into HTTP/2 DATA frames. Multiple messages in streams just concatenate these. HTTP/2 handles fragmentation across frames. Compression applies only to the message payload, not the prefix. Custom compressors? Pluggable.
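The prefix is trivial to build. A Python sketch (frame_message is a made-up name; real gRPC libraries do this internally):

```python
# Sketch of gRPC's 5-byte message prefix: 1-byte compressed flag,
# 4-byte big-endian length, then the serialized protobuf.
import struct

def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    return struct.pack(">BI", int(compressed), len(payload)) + payload

framed = frame_message(b"\x12\x02hi")
assert framed == b"\x00\x00\x00\x00\x04\x12\x02hi"
```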
The Full Flow
Client-side serialization:
- Build protobuf message
- Serialize to bytes
- If compression beneficial: compress → set flag=1, length=compressed size
- Add 5-byte prefix
- Send as HTTP/2 DATA frames
Server-side deserialization:
- Receive HTTP/2 stream, validate headers
- Read DATA frames
- For each message: read 1-byte flag, 4-byte length (big-endian)
- Read exactly length bytes
- If flag=1: decompress using the header-specified codec
- Deserialize via protobuf ParseFromArray()
- Process the RPC, repeat for streams
Errors (invalid length, compression failures) → grpc-status: 13 (internal error).
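And the receiving side of that framing, as a hedged sketch: peel off 5-byte prefixes until the stream runs dry.

```python
# Sketch of the receiver side: split a byte stream into gRPC messages
# by reading each 5-byte prefix, then exactly that many payload bytes.
import struct

def read_frames(stream: bytes):
    pos = 0
    while pos < len(stream):
        flag, length = struct.unpack_from(">BI", stream, pos)
        pos += 5
        yield flag, stream[pos:pos + length]
        pos += length

assert list(read_frames(b"\x00\x00\x00\x00\x04\x12\x02hi")) == [(0, b"\x12\x02hi")]
```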
This framing gives you reliable message boundaries, supports huge messages (the 4-byte length allows up to ~4GB per message), and leverages HTTP/2's binary efficiency. Clean separation of concerns.
Why HTTP/2? Why Not HTTP/1.x?
HTTP/1.x is trash for modern RPC. Here's why gRPC went HTTP/2:
Head-of-Line Blocking
HTTP/1.x processes requests sequentially on a single connection. One slow response blocks everything behind it. Pipelining makes it worse - TCP-level HOL blocking stalls the entire connection.
gRPC's streaming RPCs with multiple messages? Dead on arrival with HTTP/1.x. Latency and throughput would tank.
Connection Overhead
HTTP/1.x concurrency means opening multiple TCP connections (6-8 per domain typically). Every connection = TCP handshake overhead, socket resources, buffer usage, risk of port exhaustion. gRPC needs long-lived, high-volume connections for cloud environments. HTTP/2's single-connection multiplexing crushes this. Way less latency, way less resource waste.
Inefficient Encoding
HTTP/1.x uses human-readable text for headers and bodies. Bigger payloads, slower parsing. No header compression means repeated headers in every RPC.
HTTP/2's binary framing and HPACK header compression? Perfect for gRPC's binary payloads. Lean and fast.
No Real Streaming Support
HTTP/1.x has zero native support for server pushes or bidirectional streams. You'd need hacks like long-polling or WebSockets (which aren't general-purpose).
gRPC requires robust streaming for real-time apps - chat services, data feeds, whatever. HTTP/2 has this built in.
Historical Context
gRPC evolved from Google's internal Stubby system. Development aligned with HTTP/2's standardization in 2015 (RFC 7540). HTTP/2 was literally engineered to "dramatically increase network efficiency and enable real-time communication." Perfect match for gRPC's goals: scalability, low latency, resiliency at massive scale.
Retrofitting gRPC onto HTTP/1.x would need workarounds - multiple connections, proxies - complicating everything and killing performance. Not happening.
HTTP/2 Features gRPC Actually Uses
Binary Framing Layer
HTTP/2: Binary protocol with frames (HEADERS, DATA, SETTINGS). 9-byte header (length, type, flags, stream ID) + payload. Machine-optimized, no CRLF delimiters.
gRPC usage: Protobuf messages go in HTTP/2 DATA frames with that 5-byte prefix. Large messages span multiple frames, small messages pack into one. Without binary framing, gRPC's high-frequency RPC efficiency dies.
Advantage: Eliminates text parsing overhead and reduces payload size.
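For a feel of how machine-friendly this is, here's a sketch that unpacks the 9-byte HTTP/2 frame header per RFC 7540:

```python
# Sketch: parsing the 9-byte HTTP/2 frame header (RFC 7540, section 4.1):
# 24-bit length, 8-bit type, 8-bit flags, 31-bit stream ID.
import struct

def parse_frame_header(header: bytes):
    length_hi, length_lo, ftype, flags, stream_id = struct.unpack(">BHBBI", header)
    length = (length_hi << 16) | length_lo                # 24-bit payload length
    return length, ftype, flags, stream_id & 0x7FFFFFFF   # clear reserved bit

# A DATA frame (type 0x0) on stream 1 carrying 13 bytes of payload:
assert parse_frame_header(b"\x00\x00\x0d\x00\x00\x00\x00\x00\x01") == (13, 0, 0, 1)
```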
Multiplexing and Streams
HTTP/2: Multiple independent streams share one TCP connection. Each stream has unique ID (odd for client-initiated). Streams carry bidirectional messages, interleaving prevents HOL blocking (except TCP packet level, fixed by HTTP/3/QUIC).
gRPC usage: Each RPC = one HTTP/2 stream. Unary RPCs use single stream for request (HEADERS + DATA) and response (HEADERS + DATA + trailers). Streaming RPCs exploit bidirectional streams - clients send multiple DATA frames in client-streaming, servers in server-streaming, both interleave in bidirectional for real-time updates.
gRPC channels multiplex multiple RPCs across streams on one or more connections. Supports thousands of concurrent RPCs without new TCP setups. Critical: gRPC's streaming types depend on this. HTTP/1.x fallback would need one connection per RPC. Doesn't scale.
Header Compression (HPACK)
HTTP/2: HPACK compresses headers using Huffman encoding and dynamic/static tables. Eliminates redundancy (repeated keys like "content-type").
gRPC usage: Metadata (key-value pairs for auth, timeouts) goes in HTTP/2 HEADERS frames (initial) and trailers (end-of-stream for status codes). Compression cuts overhead in metadata-heavy RPCs, especially in microservices with frequent calls.
gRPC mandates content-type: application/grpc+proto and uses pseudo-headers like :method: POST and :path: /service/method.
Essential for low latency. Uncompressed HTTP/1.x headers would bloat everything.
Flow Control
HTTP/2: Window-based control at connection and per-stream levels (initial 64KB). Updated via WINDOW_UPDATE frames to prevent buffer overflows.
gRPC usage: In streaming RPCs, gRPC respects HTTP/2 flow control for backpressure. Pauses message sends if receiver's window is exhausted. Ensures reliable delivery in high-volume streams without overwhelming endpoints.
Critical for bidirectional streaming stability. HTTP/1.x has zero granularity here.
PING Frames and Connection Health
HTTP/2: PING frames (type 0x6) test liveness, bypass flow control, need ACK responses.
gRPC usage: KeepAlive sends periodic PINGs to detect dead connections fast (seconds, not TCP's minutes). No ACK? Close and reconnect. Also prevents proxy timeouts (AWS ELB's 60s idle limit, etc.).
Health checking integrates with load balancers to redirect traffic from unhealthy connections. This enables gRPC's resiliency in long-lived connections. "Always healthy" abstraction depends on this.
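In grpc's Python API, these knobs surface as channel args. A hedged sketch - the option names are documented gRPC core settings, the values here are just illustrative:

```python
# Hedged sketch: tuning keepalive PINGs on a gRPC Python channel.
# Option names are gRPC core channel args; values are illustrative.
import grpc

channel = grpc.insecure_channel(
    "localhost:50051",
    options=[
        ("grpc.keepalive_time_ms", 30_000),         # PING every 30s
        ("grpc.keepalive_timeout_ms", 5_000),       # close if no ACK within 5s
        ("grpc.keepalive_permit_without_calls", 1), # ping even with no active RPCs
    ],
)
```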
Prioritization and Settings
HTTP/2: Streams have priorities (weight 1-256, dependencies) via PRIORITY frames. SETTINGS frames negotiate max concurrent streams, frame sizes, etc.
gRPC usage: Uses SETTINGS to configure max streams and frame sizes. Prioritization less emphasized but can influence RPC scheduling in resource-constrained environments.
Supports fine-tuned performance. Not available in HTTP/1.x.
Wrapping Up
gRPC's serialization and transport stack is tight: protobuf's binary encoding for compact, fast messages, HTTP/2's framing and multiplexing for efficient, scalable transport. No wasted bytes, no wasted connections, no wasted time.
Understanding these mechanics means you can actually optimize and debug gRPC systems instead of cargo-culting configs. That's the difference between using a tool and mastering it.
Now go build something fast.