Thread by @el_bhs, 0/ This is a thread about why tracing will gradually replace most [...]

0/ This is a thread about why tracing will gradually replace most logging, at least where distributed or cloud-native architectures are concerned. And we’re going to explore this through the lens of a relational data model.

It’s going to be fun!

Thread: 👇


1/ The best logging is always *structured* logging. That is, logging statements are most useful if they encode key:value pairs which can then be queried and *analyzed* in the aggregate.

(Even for plain, textual logs, NLP and stats can extract basic structure.)

2/ A structured log implicitly defines a *relational table*, with the keys for each attribute defining the columns, and the values for each log line defining rows in this (theoretical) table.

Like this:
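A rough sketch of the idea in Python, assuming JSON-formatted log lines (the events and field names are made up for illustration): the union of keys becomes the column set, and each log line becomes one row.

```python
import json

# Hypothetical structured log lines, one JSON object per line.
LOG_LINES = [
    '{"event": "checkout", "customer_id": "c-17", "latency_ms": 120}',
    '{"event": "checkout", "customer_id": "c-42", "latency_ms": 340}',
    '{"event": "login", "customer_id": "c-17"}',
]

def to_table(lines):
    """Return (columns, rows): keys become columns, each line becomes a row."""
    records = [json.loads(line) for line in lines]
    columns = sorted({key for record in records for key in record})
    # A key missing from a line becomes None, much like NULL in a table.
    rows = [tuple(record.get(col) for col in columns) for record in records]
    return columns, rows

columns, rows = to_table(LOG_LINES)
print(columns)
for row in rows:
    print(row)
```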

3/ And, naturally, there are a number of implicit columns in our table as well. Things like host, timestamp, etc:
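As a small sketch of where those implicit columns come from (the `log_row` helper is hypothetical, not any particular logging library's API): the call site supplies the explicit key:value pairs, and the logging layer quietly adds the rest.

```python
import socket
import time

def log_row(**fields):
    """Build one structured log row: implicit columns plus the explicit ones."""
    row = {
        "timestamp": time.time(),      # implicit: when the line was emitted
        "host": socket.gethostname(),  # implicit: where it was emitted
    }
    row.update(fields)                 # explicit key:value pairs from the call site
    return row

row = log_row(event="checkout", customer_id="c-17", latency_ms=120)
print(sorted(row))
```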

4/ Now, to be clear, we’re talking about the “abstract idea” of relational tables here, and not actually inserting every log line into MySQL or similar – that would be a disaster at scale. :)

Just think of each line of logging instrumentation as a “table schema.”

5/ Once we realize this, we can write queries with *most* SQL niceties (WHERE filters, GROUP BY aggregations, etc).
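Purely to make the abstract table concrete (and not as a suggestion to ship production logs into a SQL database), here is a sketch using Python's built-in `sqlite3` with a handful of made-up rows. The table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (host TEXT, event TEXT, customer_id TEXT, latency_ms INTEGER)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        ("web-1", "checkout", "c-17", 120),
        ("web-1", "checkout", "c-42", 340),
        ("web-2", "checkout", "c-17", 200),
        ("web-2", "login", "c-99", 15),
    ],
)

# WHERE filters down to one kind of log line; GROUP BY aggregates per host.
checkout_by_host = conn.execute(
    """
    SELECT host, COUNT(*) AS n, AVG(latency_ms) AS avg_ms
    FROM logs
    WHERE event = 'checkout'
    GROUP BY host
    ORDER BY host
    """
).fetchall()
print(checkout_by_host)
```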

But what about “JOIN”? How does *that* work in logging systems? The long answer won’t fit here.

The short answer? “Poorly.” Bummer. :-/

6/ Why is it a bummer? Well, because when we’re instrumenting a microservice, by definition *we only have access to data from that microservice!*

What about version numbers of peer services? Or request customer_ids? Or downstream feature flags? Surely those could be relevant…

7/ But relevant or not, that data lives *in other services.* Which means it’s not there to log. What’s an eng to do??

Faced with this conundrum, engineers stuck with logs will inevitably/sadly hack something together rather than address the underlying structural issue. (😭)

8/ E.g., have you ever seen a customer_id painstakingly propagated across function and *process* boundaries just so someone can add it to instrumentation?

That’s an error-prone *and* expensive way of implementing log JOINs via app code (rather than automatically via tracing).

9/ When we implement JOIN manually in this way, we are taking on *literally the hardest part of distributed tracing instrumentation* (namely, “context propagation”) and trying to manage it via one-off hacks. It doesn’t end well. (TL;DR “use @opentelemetry instead”)
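To make the pain concrete, here is a sketch of hand-rolled propagation. Every function name and the `X-Customer-Id` header are hypothetical; the "service" boundary is simulated with a plain function call.

```python
# Hypothetical hand-rolled propagation: every layer has to carry customer_id
# purely so that the last one can put it in a log line.

def handle_request(customer_id, cart):
    return price_cart(customer_id, cart)

def price_cart(customer_id, cart):
    total = sum(cart.values())
    # Crossing a process boundary: the ID is serialized into a header by hand.
    headers = {"X-Customer-Id": customer_id}
    return charge_service(headers, total)

def charge_service(headers, total):
    # Stands in for a separate service, which must know to parse the header.
    customer_id = headers.get("X-Customer-Id")  # None if any hop forgot it
    return {"event": "charge", "total": total, "customer_id": customer_id}

log_line = handle_request("c-17", {"book": 12, "pen": 3})
print(log_line)
```

Every new field you want to "JOIN" in means touching every signature and every hop again, and a single forgotten hop quietly turns the column into nulls.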

10/ So again, “that’s wasteful.” And ineffective.

The right way to solve this problem is to leverage distributed tracing to perform a much (much) more powerful JOIN.

Let’s imagine that your system looks like this:

11/ Now, when a truly modern observability solution “assembles a trace,” it’s *really* executing a JOIN across the entire *distributed* transaction, and thus populating a wider and more powerful table: one with columns from every Span that participates in the trace.

Like this:
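A toy sketch of that JOIN, assuming each span is a dict that shares only a `trace_id` with its peers (the services, keys, and values are made up, and real trace assembly is far more involved):

```python
from collections import defaultdict

# Hypothetical spans from three services; only trace_id is shared.
SPANS = [
    {"trace_id": "t1", "service": "frontend", "customer_id": "c-17"},
    {"trace_id": "t1", "service": "cart", "version": "v2.3", "latency_ms": 310},
    {"trace_id": "t1", "service": "payments", "feature_flag": "new_ledger"},
    {"trace_id": "t2", "service": "frontend", "customer_id": "c-42"},
    {"trace_id": "t2", "service": "cart", "version": "v2.2", "latency_ms": 90},
]

def assemble_traces(spans):
    """JOIN spans on trace_id into one wide row per trace."""
    rows = defaultdict(dict)
    for span in spans:
        row = rows[span["trace_id"]]
        for key, value in span.items():
            if key in ("trace_id", "service"):
                continue
            # Prefix with the service name so columns from different spans
            # do not collide.
            row[f'{span["service"]}.{key}'] = value
    return dict(rows)

wide = assemble_traces(SPANS)
print(wide["t1"])
```

No service had to know about `customer_id` except the one that already had it; the join key did the rest.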

12/ Now, when people think about tracing, they tend to think about this giant table “one trace (or row) at a time.”

Imagine restricting a logging system to display only one log-line at a time. This is just as bad… perhaps worse. And yet it passes for “tracing.” :-/
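By contrast, here is a sketch of querying the wide table in the aggregate: grouping a latency measured in one service by a version number that lives in another. The rows and column names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical wide rows, one per trace, with columns from several services.
TRACE_ROWS = [
    {"frontend.customer_id": "c-17", "cart.version": "v2.3", "payments.latency_ms": 310},
    {"frontend.customer_id": "c-42", "cart.version": "v2.2", "payments.latency_ms": 90},
    {"frontend.customer_id": "c-17", "cart.version": "v2.3", "payments.latency_ms": 290},
    {"frontend.customer_id": "c-99", "cart.version": "v2.2", "payments.latency_ms": 110},
]

def mean_by(rows, group_col, value_col):
    """GROUP BY group_col, then average value_col within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {key: mean(values) for key, values in groups.items()}

by_version = mean_by(TRACE_ROWS, "cart.version", "payments.latency_ms")
print(by_version)
```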

13/ It’s really only in the past few years that observability technology has developed to the point that these massive, *distributed*, tables can be hydrated both dynamically and in real-time.

14/ And all of that data engineering is worth it! Because when the relational tables are as wide as your distributed system is deep, amazing things are possible – and I don’t see how logging will ever be able to catch up.

PS/ For example applications of these sorts of dynamic, relational tables, see any of the following (or play with http://lightstep.com/sandbox )

https://twitter.com/el_bhs/status/1364282343196827650
https://twitter.com/el_bhs/status/1227358990968877056
https://lightstep.com/blog/announcing-lightsteps-change-intelligence/
