Thread by @karlstoney

1/15 Service Mesh: A bit of a thread.
I've seen a few posts recently questioning the value of tools like @IstioMesh and @linkerd, with a bunch of people saying they have yet to see a good business use case. So I figured I'd share a little bit about what it looks like at AutoTrader.

2/15 Specific Features
Something I like to point out a lot is that the feature set is extremely large, which makes implementation complicated. You can get a huge amount of value from cherry-picking just the parts that are most valuable to your org. For us, they are:

3/15 Language agnostic, Black Box Metrics
We have almost 400 services written in 13 different languages, and within those languages there are N different language versions and N frameworks (and framework versions). We have exactly the same metrics for all of them, without needing to touch app code.
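
To make the "same metrics, no app code" idea concrete, here is a minimal Python sketch. It is not Istio's implementation. A hypothetical `MetricsSidecar` wraps any app handler and times it from the outside, and every service writes into one hypothetical `MetricsRegistry` with the same shape, keyed by (service, response code). The names only loosely echo Istio's standard `istio_requests_total` and `istio_request_duration_milliseconds` metrics.

```python
import time
from collections import Counter, defaultdict
from typing import Callable, DefaultDict, List, Tuple

App = Callable[[str], int]  # stand-in for any app: takes a request path, returns an HTTP status
Key = Tuple[str, int]       # (destination service, response code)


class MetricsRegistry:
    """One shared store, so every service reports the same metric shape."""

    def __init__(self) -> None:
        self.requests_total: Counter = Counter()
        self.duration_ms: DefaultDict[Key, List[float]] = defaultdict(list)


class MetricsSidecar:
    """Sits in front of an app and measures it from the outside (black box)."""

    def __init__(self, service: str, app: App, registry: MetricsRegistry,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.service = service
        self.app = app
        self.registry = registry
        self.clock = clock  # injectable so timings can be tested deterministically

    def handle(self, path: str) -> int:
        start = self.clock()
        status = self.app(path)  # the app is called as-is; nothing inside it changes
        elapsed_ms = (self.clock() - start) * 1000.0
        key = (self.service, status)
        self.registry.requests_total[key] += 1
        self.registry.duration_ms[key].append(elapsed_ms)
        return status


if __name__ == "__main__":
    registry = MetricsRegistry()
    # Two hypothetical services; their implementation language is irrelevant to the sidecar.
    search = MetricsSidecar("search", lambda path: 200, registry)
    adverts = MetricsSidecar("adverts", lambda path: 404 if path == "/missing" else 200, registry)
    search.handle("/cars")
    adverts.handle("/missing")
    print(dict(registry.requests_total))
```

The design point is that the wrapper only sees requests and responses. Adding a 401st service in a 14th language changes nothing about the metric schema.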

4/15 Black Box Tracing
Much like the metrics, enabling traces without needing to instrument or configure your applications delivers a whole load of value. This is particularly useful as we have a broad microservice architecture.

5/15 Consistency enables Operational Tooling
Consistent observability data enables you to easily build platform-level tooling which benefits everyone. Automation can collate relevant information and reduce your MTTR (mean time to recovery).

6/15 Chaos Engineering
We encourage our teams to think about faults, and the service mesh gives us a simple way of enabling those teams to inject faults into their services, without needing to touch app code.
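
As a rough illustration of injecting faults outside the app, here is a minimal Python sketch. It is not Istio's code. It is loosely modelled on the idea of delaying or aborting a configured percentage of requests, which Istio exposes as fault injection. `FaultConfig` and `FaultInjector` are hypothetical names, and the 503 default is an assumption.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FaultConfig:
    """Illustrative knobs: what share of requests to delay or abort."""
    delay_percent: float = 0.0   # % of requests to delay, 0-100
    delay_seconds: float = 0.0   # how long each delayed request waits
    abort_percent: float = 0.0   # % of requests failed without reaching the app, 0-100
    abort_status: int = 503      # HTTP status returned for aborted requests (assumed default)


class FaultInjector:
    """Sits in front of an app handler, so injecting faults needs no app code change."""

    def __init__(self, app: Callable[[str], int], config: FaultConfig,
                 rng: Optional[random.Random] = None,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.app = app
        self.config = config
        self.rng = rng if rng is not None else random.Random()
        self.sleep = sleep  # injectable so tests do not actually wait

    def _hit(self, percent: float) -> bool:
        # random() is in [0, 1): 0% never fires, 100% always fires
        return self.rng.random() < percent / 100.0

    def __call__(self, request: str) -> int:
        if self._hit(self.config.delay_percent):
            self.sleep(self.config.delay_seconds)
        if self._hit(self.config.abort_percent):
            return self.config.abort_status  # the app never sees this request
        return self.app(request)


if __name__ == "__main__":
    # Hypothetical experiment: fail 10% of calls to a healthy app and delay 20% by 50 ms.
    flaky = FaultInjector(lambda path: 200,
                          FaultConfig(delay_percent=20, delay_seconds=0.05, abort_percent=10))
    results = [flaky("/cars") for _ in range(100)]
    print(results.count(503), "of 100 requests aborted")
```

Because the faults live in the wrapper, a team can switch an experiment on or off by changing config alone, and the service being tested stays untouched.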

7/15 Mutual TLS with frequently rotated certificates
No pictures needed here: 400 microservices, all transparently using mutual TLS to communicate, again without needing to touch the app code.

8/15 Smarter Traffic Routing
We run our clusters across multiple availability zones. By using locality-aware routing rather than kube-proxy's round-robin, we save about $40k/year in zonal egress costs, improve latency, and reduce the impact of issues within a single zone.
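
To show why locality-aware routing cuts zonal egress, here is a minimal Python sketch. It is not Envoy's or Istio's actual load-balancing algorithm. With endpoints spread evenly over three zones, zone-blind round-robin sends about two thirds of requests across zones. A balancer that prefers healthy same-zone endpoints, and fails over to other zones only when none are left, sends almost none. The zone names, addresses and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Endpoint:
    address: str
    zone: str
    healthy: bool = True


class RoundRobinBalancer:
    """Baseline: cycle through every healthy endpoint, ignoring zones."""

    def __init__(self, endpoints: Sequence[Endpoint]) -> None:
        self.endpoints = list(endpoints)
        self._next = 0

    def _pool(self) -> List[Endpoint]:
        return [e for e in self.endpoints if e.healthy]

    def pick(self) -> Endpoint:
        pool = self._pool()
        if not pool:
            raise RuntimeError("no healthy endpoints")
        endpoint = pool[self._next % len(pool)]
        self._next += 1
        return endpoint


class LocalityAwareBalancer(RoundRobinBalancer):
    """Prefer healthy endpoints in the caller's zone; fail over only if none are left."""

    def __init__(self, endpoints: Sequence[Endpoint], caller_zone: str) -> None:
        super().__init__(endpoints)
        self.caller_zone = caller_zone

    def _pool(self) -> List[Endpoint]:
        healthy = super()._pool()
        local = [e for e in healthy if e.zone == self.caller_zone]
        return local or healthy  # empty local list means fail over to every healthy zone


def cross_zone_fraction(balancer: RoundRobinBalancer, caller_zone: str, requests: int) -> float:
    """Share of requests that leave the caller's zone (i.e. would incur zonal egress)."""
    crossings = sum(balancer.pick().zone != caller_zone for _ in range(requests))
    return crossings / requests


if __name__ == "__main__":
    endpoints = [Endpoint("10.0.1.5", "zone-a"), Endpoint("10.0.2.5", "zone-b"),
                 Endpoint("10.0.3.5", "zone-c")]
    print("round-robin:", cross_zone_fraction(RoundRobinBalancer(endpoints), "zone-a", 300))
    print("locality-aware:", cross_zone_fraction(LocalityAwareBalancer(endpoints, "zone-a"), "zone-a", 300))
```

The failover branch is what preserves availability. Reducing the blast radius of a bad zone additionally depends on unhealthy endpoints actually being marked as such, which is where outlier detection comes in.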

9/15 Consistent Fault Detection
We use @EnvoyProxy Outlier Detection to quickly detect and evict unhealthy endpoints. As this is done at the platform level, we benefit from consistent alerting and monitoring when this happens.
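
For intuition on how outlier detection evicts endpoints, here is a minimal Python sketch loosely modelled on Envoy's consecutive-5xx detection. A host that returns N server errors in a row is ejected for a base time multiplied by how many times it has been ejected, and any non-5xx response resets the streak. The class name and default values are illustrative assumptions. Envoy's real detector has more knobs, such as a cap on the share of a cluster that can be ejected.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HostState:
    consecutive_5xx: int = 0
    times_ejected: int = 0
    ejected_until: float = 0.0  # timestamp (seconds) when the host may receive traffic again


class OutlierDetector:
    """Ejects hosts after a streak of 5xx responses; ejections get longer each time."""

    def __init__(self, consecutive_5xx_threshold: int = 5,
                 base_ejection_seconds: float = 30.0) -> None:
        self.threshold = consecutive_5xx_threshold
        self.base_ejection = base_ejection_seconds
        self.hosts: Dict[str, HostState] = {}

    def record(self, host: str, status: int, now: float) -> None:
        state = self.hosts.setdefault(host, HostState())
        if 500 <= status <= 599:
            state.consecutive_5xx += 1
            if state.consecutive_5xx >= self.threshold:
                state.times_ejected += 1
                # repeat offenders stay out longer: base time x number of ejections
                state.ejected_until = now + self.base_ejection * state.times_ejected
                state.consecutive_5xx = 0
        else:
            state.consecutive_5xx = 0  # any non-5xx response breaks the streak

    def is_ejected(self, host: str, now: float) -> bool:
        state = self.hosts.get(host)
        return state is not None and now < state.ejected_until


if __name__ == "__main__":
    detector = OutlierDetector()
    for _ in range(5):
        detector.record("10.0.1.5", 503, now=0.0)  # hypothetical failing endpoint
    print(detector.is_ejected("10.0.1.5", now=10.0), detector.is_ejected("10.0.1.5", now=30.0))
```

Since every proxy runs the same detector, an ejection is a uniform platform-level event. That uniformity is what makes consistent alerting on it possible.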

10/15 Running Cost
Operationally, I've covered this over on https://twitter.com/karlstoney/status/1288454799537704960; however, from a people perspective, we have about 8 people working on our platform and some 200 devs deploying applications on it.

11/15 Complexity
This is a biggie: Kubernetes, Istio and everything else are complex. We took the decision to abstract product teams away from that and build a PaaS-style platform on top; the level of consistency this has given us has been key.

12/15 You're probably thinking, "That's lovely, but what did it help you achieve?" And you're right to ask! There's no point in investing all this effort unless we're driving the business forward too, right?

13/15 Getting Products Live
Product teams don't have to worry about many of the CFRs (cross-functional requirements) any more, as they get them for free. Getting an application in front of customers is automated end to end: we can literally have code in front of customers in production within minutes.

14/15 Cloud Migration
Two years ago we were almost entirely in physical data centres on VMs; we are currently about 90% in the cloud on Kubernetes. All of this black box observability accelerated that migration and removed so much fear of the unknown when moving services.

15/15 Summary
I do believe that attempting to get all of these capabilities via other means would have required a lot more effort than the transparent proxy/mesh approach. There is complexity you should be conscious of, but the gains can be great, so it shouldn't put you off.
