I saw this tweet by @hltbra and spent some time thinking about what he said about SRE practices and small teams, so here are some musings on it. 1/ https://twitter.com/hltbra/status/1147850375124398080
Think of an SLO as your bearing: it points in the direction you want to go, such as "we want this service to answer 99.9% of all queries in under 100ms". Once you have set it, start collecting metrics. 2/
Your metrics will show whether you are following your desired bearing. If you deviate from it, take action to correct the course *and* to prevent yourself from veering off course again. 3/
Error rate too high? Find out why and fix the cause. Latency increased? Investigate what made your service slower. Your metrics tell you the course you are actually on, and you must keep it aligned with the planned course. 4/
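To make the bearing idea concrete, here is a minimal sketch of checking measured latencies against the example SLO ("99.9% of all queries under 100ms"). The names `SLO`, `compliance`, and `on_course` are made up for illustration, and the latencies are assumed to be a plain list of milliseconds for one measurement window. A real setup would read them from your monitoring system.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO description: a target ratio and a latency threshold."""
    target_ratio: float   # e.g. 0.999 for "99.9% of all queries"
    threshold_ms: float   # e.g. 100.0 for "under 100ms"


def compliance(latencies_ms: Sequence[float], slo: SLO) -> float:
    """Fraction of requests answered strictly under the threshold."""
    if not latencies_ms:
        # Assumption: an empty window counts as compliant.
        return 1.0
    good = sum(1 for ms in latencies_ms if ms < slo.threshold_ms)
    return good / len(latencies_ms)


def on_course(latencies_ms: Sequence[float], slo: SLO) -> bool:
    """True if the measured window is following the bearing set by the SLO."""
    return compliance(latencies_ms, slo) >= slo.target_ratio


if __name__ == "__main__":
    slo = SLO(target_ratio=0.999, threshold_ms=100.0)
    window = [50.0] * 998 + [250.0] * 2   # two slow requests out of 1000
    print(compliance(window, slo), on_course(window, slo))
```

If `on_course` returns False, that is the signal to investigate and correct the course. It says you drifted, not why.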
One important point: your SLO is your bearing, but not your destination. You will never arrive there, because everything is always in motion. There is no destination, no arrival. You are sailing all the time. 5/
IMHO, this is not affected or limited by your team size. Size must only be taken into account when you measure how much time you spend on toil (fixing problems, handling tickets) and how much time you spend improving your tools, automation, etc. 6/
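The toil measurement can be sketched the same way. This assumes you track hours per category for some period. The `toil_fraction` helper and the 0.5 default cap are illustrative assumptions, not something the thread prescribes, so pick the limit that makes sense for your team.

```python
def toil_fraction(toil_hours: float, engineering_hours: float) -> float:
    """Share of tracked time spent on toil (fixing problems, handling tickets)."""
    total = toil_hours + engineering_hours
    if total == 0:
        return 0.0
    return toil_hours / total


def too_much_toil(toil_hours: float, engineering_hours: float,
                  max_toil: float = 0.5) -> bool:
    """True when toil exceeds the chosen cap (max_toil is an assumed default)."""
    return toil_fraction(toil_hours, engineering_hours) > max_toil


if __name__ == "__main__":
    # Hypothetical week: 30 hours of toil, 10 hours of tooling and automation.
    print(toil_fraction(30, 10), too_much_toil(30, 10))
```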
This also means that a small team will not be able to manage a lot of services that veer off course most of the time, but it may be able to handle many mostly well-behaved systems, especially if you have automated your work. 7/
As with any other change, you should try this even on a small team, but do it incrementally. Choose one or two services, apply the SRE principles to the way you manage them, and let the team get used to the new practices before moving to the next target. 8/
@hltbra I hope you find this useful. Ping me if you want to talk more about it. 9/9