Building a Hybrid Logical Clock (HLC) from Scratch in Go

Causality-aware timestamping, bounded uncertainty, and production-ready APIs for distributed systems in Go.


Distributed systems force engineers to confront an uncomfortable fact: time is unreliable. On a single machine, time.Now() feels good enough. Across regions, networks, and failure modes, it quickly becomes a source of subtle, expensive bugs.

This post walks through building a Hybrid Logical Clock (HLC) in Go, why it exists, and how systems like Spanner and CockroachDB use similar ideas to order events safely at scale.


Why wall‑clock time is not enough

Consider two services handling writes:

  • Service A’s clock: 10:00:00.100

  • Service B’s clock: 09:59:59.900

If B processes a write after A while B’s clock is behind, B’s timestamp appears earlier than A’s. Any “last write wins” heuristic based purely on wall-clock time can then choose the wrong value and silently lose data.
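The failure mode above can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical Write record and lastWriteWins helper (neither is part of the package described in this post), with the two clock readings expressed as milliseconds since midnight:

```go
package main

import "fmt"

// Write is a hypothetical record stamped with the writer's local wall clock.
type Write struct {
	Value      string
	WallMillis int64 // local wall-clock reading in milliseconds
}

// lastWriteWins keeps whichever write carries the larger wall-clock timestamp.
func lastWriteWins(a, b Write) Write {
	if b.WallMillis > a.WallMillis {
		return b
	}
	return a
}

func main() {
	// A writes first; its clock reads 10:00:00.100.
	a := Write{Value: "from-A", WallMillis: 36_000_100}
	// B writes afterwards in real time, but its clock reads 09:59:59.900.
	b := Write{Value: "from-B", WallMillis: 35_999_900}

	winner := lastWriteWins(a, b)
	fmt.Println(winner.Value) // prints "from-A": the later write from B is lost
}
```

Nothing in the comparison is buggy in isolation; the input timestamps simply do not reflect the order in which the writes happened.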

This pattern shows up everywhere:

  • Multi-region databases and caches

  • Event-driven systems and logs

  • Replication, CDC, and conflict resolution

The core problem: we want a consistent ordering of events, but our clocks do not agree.


Logical clocks: ordering without time

Lamport clocks address ordering by attaching a monotonically increasing counter to each event.

  • Each local event increments a counter.

  • Each message carries the sender’s counter.

  • On receive, the node sets its counter to max(local, received) + 1.

This guarantees: if event A causally happens before event B, then Lamport(A) < Lamport(B). The converse does not hold: a smaller counter does not prove that one event caused or preceded the other. Lamport clocks also intentionally ignore real time; the value has no direct relationship to wall-clock timestamps.
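The three rules above fit in a small type. The following is a minimal sketch; the LamportClock name and its Tick and Receive methods are illustrative assumptions, not part of the HLC package:

```go
package main

import (
	"fmt"
	"sync"
)

// LamportClock is an illustrative logical clock guarded by a mutex.
type LamportClock struct {
	mu      sync.Mutex
	counter uint64
}

// Tick records a local event (including a send) and returns the new counter.
func (c *LamportClock) Tick() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counter++
	return c.counter
}

// Receive merges the counter carried by an incoming message:
// counter = max(local, received) + 1.
func (c *LamportClock) Receive(received uint64) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if received > c.counter {
		c.counter = received
	}
	c.counter++
	return c.counter
}

func main() {
	var a, b LamportClock

	a.Tick()               // a's counter is now 1
	sent := a.Tick()       // a's counter is now 2; this value travels with the message
	b.Tick()               // b's counter is now 1
	got := b.Receive(sent) // max(1, 2) + 1 = 3

	fmt.Println(sent, got) // prints "2 3"
}
```

The receive on b is stamped strictly after the send on a, even though neither node consulted a wall clock.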

For many systems, you need both:

  • A notion of causality.

  • A timestamp that roughly tracks real time for observability, snapshots, and user-facing semantics.


Physical clocks: real time with sharp edges

Physical clocks give you:

  • Real-world alignment (2026-01-04T21:45:00Z).

  • Integration with existing monitoring, logs, and SLAs.

But they also come with well-known issues:

  • Drift between machines.

  • NTP corrections that jump clocks backward or forward.

  • Lack of strict guarantees about monotonicity across hosts.

On their own, physical clocks cannot reliably express “happened-before” relationships in a distributed system.


Hybrid Logical Clocks in practice

Hybrid Logical Clocks combine:

  • Physical time (wall clock).

  • Logical counter (to break ties and enforce monotonicity).

The high-level idea:

  • Prefer the physical clock when it is safe.

  • Use the logical component to preserve ordering when clocks disagree or collide.

Databases like CockroachDB maintain an HLC timestamp for each node and attach it to every transaction. This allows the system to:

  • Order transactions consistently.

  • Stay close to real time for observability and snapshot reads.

  • Tolerate moderate clock skew and NTP behavior.


Extending HLC with bounded uncertainty

Google Spanner’s TrueTime API returns an interval [earliest, latest], representing the range in which the current time may lie. The width of this interval is the uncertainty bound and is derived from how precisely the system can synchronize clocks (GPS + atomic clocks in Spanner’s case).

This uncertainty lets the system say:
“I know the real time is somewhere in this interval, and I will make decisions that remain correct for any time inside it.”

Inspired by this, the HLC in this project adds an uncertainty window around the physical component:

type Timestamp struct {
    Physical    int64  // wall-clock time in milliseconds
    Logical     uint16 // logical counter for concurrency
    Uncertainty int64  // ± uncertainty window in ms
}

  • Physical: Best estimate of real time.

  • Logical: Breaks ties and ensures monotonicity.

  • Uncertainty: Captures “how wrong” the physical time might be.

This additional field allows safer comparisons in the presence of clock skew and variable network delays.
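To make the role of each field concrete, here is a small sketch of helpers one might hang off the struct. The Earliest, Latest, and Less methods are hypothetical names chosen for illustration; the actual package may expose different ones:

```go
package main

import "fmt"

// Timestamp mirrors the struct shown above.
type Timestamp struct {
	Physical    int64  // wall-clock time in milliseconds
	Logical     uint16 // logical counter for concurrency
	Uncertainty int64  // ± uncertainty window in ms
}

// Earliest is the lower edge of the interval the real time may lie in.
func (t Timestamp) Earliest() int64 { return t.Physical - t.Uncertainty }

// Latest is the upper edge of that interval.
func (t Timestamp) Latest() int64 { return t.Physical + t.Uncertainty }

// Less gives the plain HLC order: physical time first, logical counter as
// the tie-breaker. It deliberately ignores the uncertainty window.
func (t Timestamp) Less(o Timestamp) bool {
	if t.Physical != o.Physical {
		return t.Physical < o.Physical
	}
	return t.Logical < o.Logical
}

func main() {
	a := Timestamp{Physical: 1000, Logical: 0, Uncertainty: 5}
	b := Timestamp{Physical: 1000, Logical: 1, Uncertainty: 5}

	fmt.Println(a.Earliest(), a.Latest()) // prints "995 1005"
	fmt.Println(a.Less(b))                // prints "true": same millisecond, lower counter
}
```

Less yields a total order that is useful for sorting, while the interval edges support the more conservative comparisons that take skew into account.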


Designing the Go clock API

The package exposes a Clock type configured with a maximum tolerated clock drift:

clock := hlc.New(hlc.Config{
    MaxClockDriftMillis: 5,
})

Recommended usage:

  • Create one clock per process and keep it for the process lifetime.

  • Generate timestamps on the server side only.

  • Propagate HLC timestamps across RPC boundaries.
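One way to carry a timestamp across an HTTP hop is to serialize it into a request header. The sketch below is an assumption-laden illustration: the header name, the dot-separated text format, and the Encode and Decode helpers are all hypothetical and not taken from the package:

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// Timestamp mirrors the struct shown earlier in this post.
type Timestamp struct {
	Physical    int64
	Logical     uint16
	Uncertainty int64
}

// headerName is a hypothetical header chosen for this example.
const headerName = "X-Hlc-Timestamp"

// Encode renders a timestamp as "physical.logical.uncertainty".
func Encode(t Timestamp) string {
	return fmt.Sprintf("%d.%d.%d", t.Physical, t.Logical, t.Uncertainty)
}

// Decode parses the format produced by Encode.
func Decode(s string) (Timestamp, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return Timestamp{}, fmt.Errorf("hlc: want 3 fields, got %d", len(parts))
	}
	physical, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return Timestamp{}, fmt.Errorf("hlc: physical: %w", err)
	}
	logical, err := strconv.ParseUint(parts[1], 10, 16)
	if err != nil {
		return Timestamp{}, fmt.Errorf("hlc: logical: %w", err)
	}
	uncertainty, err := strconv.ParseInt(parts[2], 10, 64)
	if err != nil {
		return Timestamp{}, fmt.Errorf("hlc: uncertainty: %w", err)
	}
	return Timestamp{Physical: physical, Logical: uint16(logical), Uncertainty: uncertainty}, nil
}

func main() {
	ts := Timestamp{Physical: 1767563100000, Logical: 7, Uncertainty: 5}

	// Sender side: attach the timestamp to outgoing request headers.
	h := http.Header{}
	h.Set(headerName, Encode(ts))

	// Receiver side: parse it back before merging it into the local clock.
	got, err := Decode(h.Get(headerName))
	fmt.Println(got == ts, err)
}
```

A text encoding keeps the value easy to read in logs; a fixed-width binary encoding would be a reasonable alternative where header size matters.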

Generating timestamps

ts := clock.Now()

Now() enforces:

  • Monotonicity: time never goes backward from the perspective of this clock.

  • Logical increments when multiple events share the same physical time or when the local clock appears to move backward.

  • Bounded uncertainty: the uncertainty field never exceeds configured limits.

Conceptually:

  • Read local wall clock in milliseconds.

  • Compare it to the clock’s last seen physical time.

  • If the wall clock is ahead, advance the physical time to the wall-clock reading and reset the logical counter.

  • If it is behind or equal, keep the physical time and bump the logical counter to maintain ordering.
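The steps above can be sketched as follows. This is not the package's implementation: the Clock fields, the NewClock constructor, and the injectable wallMillis source are assumptions made so the example runs deterministically, and overflow of the 16-bit logical counter is left unhandled:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Timestamp struct {
	Physical    int64
	Logical     uint16
	Uncertainty int64
}

// Clock is an illustrative HLC state holder.
type Clock struct {
	mu          sync.Mutex
	wallMillis  func() int64 // wall-clock source; replaceable in tests (assumption)
	physical    int64
	logical     uint16
	uncertainty int64
}

// NewClock is a hypothetical constructor using the real wall clock.
func NewClock(maxDriftMillis int64) *Clock {
	return &Clock{
		wallMillis:  func() int64 { return time.Now().UnixMilli() },
		uncertainty: maxDriftMillis,
	}
}

// Now returns a timestamp strictly greater than any it returned before.
func (c *Clock) Now() Timestamp {
	c.mu.Lock()
	defer c.mu.Unlock()

	wall := c.wallMillis()
	if wall > c.physical {
		// Wall clock moved ahead: adopt it and reset the counter.
		c.physical = wall
		c.logical = 0
	} else {
		// Same millisecond or a backward jump: keep physical, bump logical.
		c.logical++ // overflow handling omitted in this sketch
	}
	return Timestamp{Physical: c.physical, Logical: c.logical, Uncertainty: c.uncertainty}
}

func main() {
	// Fake wall-clock readings: a repeat, then a backward jump, then progress.
	readings := []int64{100, 100, 90, 101}
	i := 0

	c := NewClock(5)
	c.wallMillis = func() int64 { r := readings[i]; i++; return r }

	for range readings {
		ts := c.Now()
		fmt.Println(ts.Physical, ts.Logical)
	}
	// prints:
	// 100 0
	// 100 1
	// 100 2
	// 101 0
}
```

The backward jump to 90 never surfaces in the output: the clock stays at physical time 100 and only the logical counter advances.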


Merging remote time

Distributed systems constantly exchange timestamps: through messages, replication streams, logs, or RPC metadata. To incorporate remote information, the clock exposes:

clock.Update(remoteTimestamp, rttMillis)

When a message with remoteTimestamp arrives:

  • The local clock computes an effective remote uncertainty as remote.Uncertainty + rttMillis/2.

  • The local uncertainty becomes the maximum of the local and this propagated remote uncertainty.

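  • The physical component becomes the maximum of the local wall clock, the clock’s last physical time, and the remote physical time. The logical counter then follows the usual HLC rules: it is bumped past whichever side supplied that maximum, and reset only when the local wall clock alone is ahead.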

This ensures:

  • Causal ordering across services that exchange timestamps.

  • Uncertainty reflects both local drift and network-induced ambiguity.

  • No “backward” movement of the effective clock state.
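A merge along these lines might look like the sketch below. It follows the standard HLC receive rule plus the uncertainty propagation described above; the Clock layout, the injectable wallMillis source, and the Timestamp return value are assumptions, and capping the uncertainty at a configured limit is not shown:

```go
package main

import (
	"fmt"
	"sync"
)

type Timestamp struct {
	Physical    int64
	Logical     uint16
	Uncertainty int64
}

// Clock holds the last issued timestamp. wallMillis is an injectable
// wall-clock source (an assumption, used to keep the example deterministic).
type Clock struct {
	mu         sync.Mutex
	wallMillis func() int64
	last       Timestamp
}

// Update merges a remote timestamp received after a round trip of rttMillis.
func (c *Clock) Update(remote Timestamp, rttMillis int64) Timestamp {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Widen the remote uncertainty by half the round trip, then keep the max.
	propagated := remote.Uncertainty + rttMillis/2
	if propagated > c.last.Uncertainty {
		c.last.Uncertainty = propagated
	}

	// Merged physical time: max of wall clock, local state, and remote.
	physical := c.wallMillis()
	if c.last.Physical > physical {
		physical = c.last.Physical
	}
	if remote.Physical > physical {
		physical = remote.Physical
	}

	switch {
	case physical == c.last.Physical && physical == remote.Physical:
		// Three-way tie: move past both logical counters.
		if remote.Logical > c.last.Logical {
			c.last.Logical = remote.Logical
		}
		c.last.Logical++
	case physical == c.last.Physical:
		c.last.Logical++
	case physical == remote.Physical:
		c.last.Logical = remote.Logical + 1
	default:
		// The local wall clock alone is ahead.
		c.last.Logical = 0
	}
	c.last.Physical = physical
	return c.last
}

func main() {
	c := &Clock{
		wallMillis: func() int64 { return 100 },
		last:       Timestamp{Physical: 100, Logical: 3, Uncertainty: 5},
	}
	fmt.Println(c.Update(Timestamp{Physical: 120, Logical: 7, Uncertainty: 4}, 10)) // prints "{120 8 9}"
	fmt.Println(c.Update(Timestamp{Physical: 120, Logical: 2, Uncertainty: 1}, 2))  // prints "{120 9 9}"
}
```

In the first call the remote clock is ahead, so the local state jumps to its physical time and one past its logical counter. In the second call the physical times tie, so only the logical counter advances and the wider uncertainty is retained.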


Comparing timestamps safely

A critical primitive for practical systems is:

func DefinitelyAfter(a, b Timestamp) bool

This function answers: “Is a guaranteed to have occurred after b, given their uncertainty intervals?”

Intuitively:

  • Each timestamp defines an interval [physical − uncertainty, physical + uncertainty].

  • DefinitelyAfter(a, b) returns true only if a’s earliest possible time is strictly greater than b’s latest possible time.
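Read literally, that rule is a single comparison of interval edges. The sketch below uses the signature shown above; the body is an assumption derived from the description, not a copy of the package's code:

```go
package main

import "fmt"

type Timestamp struct {
	Physical    int64
	Logical     uint16
	Uncertainty int64
}

// DefinitelyAfter reports whether a's earliest possible time is strictly
// later than b's latest possible time.
func DefinitelyAfter(a, b Timestamp) bool {
	return a.Physical-a.Uncertainty > b.Physical+b.Uncertainty
}

func main() {
	base := Timestamp{Physical: 1000, Uncertainty: 5}  // interval [995, 1005]
	far := Timestamp{Physical: 1020, Uncertainty: 5}   // interval [1015, 1025]
	close := Timestamp{Physical: 1008, Uncertainty: 5} // interval [1003, 1013]

	fmt.Println(DefinitelyAfter(far, base))   // prints "true": 1015 > 1005
	fmt.Println(DefinitelyAfter(close, base)) // prints "false": 1003 is not > 1005
	fmt.Println(DefinitelyAfter(base, close)) // prints "false": the intervals overlap
}
```

When the function returns false in both directions, as for the last two calls, neither ordering is proven and the pair has to be treated as ambiguous.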

This is essential for:

  • Conflict resolution: only declare “last writer wins” when the newer write is definitely later.

  • Replicated state machines: avoiding reordering that would violate causal dependencies.

  • Snapshot semantics: safely evaluating “all events definitely before T.”

If intervals overlap, the system must treat the ordering as ambiguous and apply a more conservative strategy (e.g., additional coordination, retries, or multi-version reads).


When (and when not) to use HLC

HLC is a causality and ordering mechanism, not a general-purpose replacement for NTP or a business-facing timestamp.

Good uses:

  • Distributed caches that need consistent eviction or write ordering.

  • Event sourcing or CQRS systems that require stable event ordering across nodes.

  • Multi-region write paths where naive time.Now() breaks conflict resolution.

  • Systems that compare candidate (user) vs proctor (system) timelines and must reason about skew.

Avoid:

  • Treating HLC as the source of truth for user-visible timestamps.

  • Allowing clients to generate or mutate HLC timestamps.

  • Using it as a database timestamp for compliance or reporting.

Instead:

  • Generate HLC timestamps on trusted backend services.

  • Persist them along with domain data for ordering, snapshotting, or reconciliation.

  • Expose user-facing times from sanitized, business-appropriate sources.


What building this taught me

Implementing HLC with bounded uncertainty in Go reinforces several lessons that many large-scale systems embody.

  • Time is a coordination problem: clocks alone are not enough; you need protocols that move information about time between nodes.

  • Ordering matters more than raw timestamps: many invariants depend on “what definitely happened before what,” not on the literal milliseconds.

  • Clients should never be time authorities: trust lives on the server side, where clocks, configuration, and invariants are controlled.

  • Uncertainty is a feature: explicitly modeling “how wrong we could be” leads to safer APIs than pretending time is exact.

The HLC described here provides a practical building block to bring these ideas into Go services without specialized hardware, while staying conceptually aligned with systems like Spanner and CockroachDB.


Where this goes next

There are several natural extensions to explore:

  • Propagating HLC timestamps over HTTP and gRPC as part of request metadata.

  • Using HLC in persistence layers for snapshot reads and conflict resolution.

  • Comparing this approach with Spanner’s TrueTime in terms of guarantees, hardware requirements, and operational complexity.

The full implementation, tests, and examples live in the repository:

distributed-systems-journal (Hybrid Logical Clock package in Go).

Distributed Systems Journal

Part 2 of 2

A deep-dive series on designing and implementing distributed systems primitives from first principles. Each article focuses on production-grade behavior under scale, churn, and concurrency, with concrete code, benchmarks, and explicit trade-offs.
