God knows I'm sick of churning out COVID-19 data myself, so I'm going to post a bit of a thread that hopefully might help other people to at least interpret it themselves without me sticking loads of words over it.
This will feature some general principles useful to interpreting data. Most of them are just that - general principles. They will not always apply, are non-exhaustive, and the relevant things you should bear in mind will vary according to the question you're trying to answer.
First: when examining data, the first and most important thing to understand is (a rough outline sketch of) the data generating process - the way in which observed reality brings this data into being. For COVID-19 I think there are a few important general outline points here.
Data is necessarily a simplification of reality. When you count things you miss things and you flatten things. There is in general terms no single correct definition of things. The definition you use will depend upon the availability of data & your question. More on this later.
Broadly speaking, what is the data generating process at work? There is a lag between events. Epidemics are dynamic processes at the individual level and at the population level. Broadly speaking, infections precede cases which precede hospitalisations which precede deaths.
Just thinking about this alone is useful as a starting point: it will help to contextualise what you are seeing in view of what has happened and what might happen. So if hospitalisations rise today, infections probably rose over a week ago, and deaths will probably rise in future.
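A toy sketch of that lag idea in Python. Everything here is made up for illustration: the infection numbers, the 10- and 20-day lags, and the fractions going on to hospital or dying are assumptions, not estimates of anything real.

```python
def lagged(series, lag_days, fraction):
    """Shift a daily series later by lag_days and scale it by fraction."""
    kept = series[: len(series) - lag_days]
    return [0.0] * lag_days + [x * fraction for x in kept]

# Hypothetical daily infections over 30 days, peaking on day 3.
infections = [100, 200, 400, 800, 400, 200, 100] + [0] * 23

# Assumed lags and fractions, purely for illustration.
hospitalisations = lagged(infections, lag_days=10, fraction=0.05)
deaths = lagged(infections, lag_days=20, fraction=0.01)

def peak_day(series):
    """Index of the day with the largest value."""
    return series.index(max(series))

print("infections peak on day", peak_day(infections))
print("hospitalisations peak on day", peak_day(hospitalisations))
print("deaths peak on day", peak_day(deaths))
```

Real lags are distributions rather than single fixed numbers, so the real curves are smeared out as well as shifted.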
At each stage here there is some imprecision both in the counting process (we miscount some numbers) and in aggregating (we aggregate things that aren't the same). Lots of really annoying arguments revolve around not accepting the generality of this point. Again more on this later.
At the individual level, someone gets infected, and may later become a case (if they test positive) and/or - hopefully not - appear in hospitalisation stats (if they get admitted to hospital) and/or appear in death statistics (if they die).
At the population level, someone becomes infected, then a few days later may be able to infect other people, who themselves a few days later may be able to infect other people. Together these two things create the population level trends in infections/cases/hospitalisations/deaths.
Now we've thought about these events - infections, cases, hospitalisations, deaths - let's think about general principles for how best to interpret them. Since this is a dynamic process, we're not interested just in how many of each there are, but when they happened.
So again, just by thinking through the process, we stumble across another useful general principle: not only do we want counts, but we want the most useful dates possible attached to these: date of infection, date of case diagnosis, date of hospitalisation, date of death.
Now let's think about each type of event. Infections are obviously the most difficult one at a base level to assess: they are determined by random chance, can only ever be estimated, and are in large part much less subject to our chosen definitions.
Because we cannot count every single infection that happens, estimates are generally done through random sampling. The ONS produces a weekly report that tests a random sample, attempts to estimate the population prevalence, and from this daily infections. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/latest
Because this is done through sampling and modelling, estimates are subject to uncertainty, which researchers try to quantify. Confidence intervals and credible intervals are horrible to explain, but IMO fair to call them "what the researchers think are plausible ranges".
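To give a rough feel for where such a range comes from, here is a minimal Python sketch using a Wilson score interval for a simple random sample. This is an assumption for illustration only: the ONS uses more involved modelling, and the survey numbers below are invented.

```python
import math

def wilson_interval(positives, sample_size, z=1.96):
    """Approximate 95% Wilson score interval for a sample proportion."""
    p_hat = positives / sample_size
    denom = 1 + z**2 / sample_size
    centre = (p_hat + z**2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(
            p_hat * (1 - p_hat) / sample_size + z**2 / (4 * sample_size**2)
        )
        / denom
    )
    return centre - half_width, centre + half_width

# Hypothetical survey: 50 positives among 20,000 randomly sampled people.
positives, sample_size = 50, 20_000
low, high = wilson_interval(positives, sample_size)
print(f"point estimate: {positives / sample_size:.3%}")
print(f"plausible range: {low:.3%} to {high:.3%}")
```

The same proportion from a smaller sample gives a wider range, which is the main thing to take from it.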
Cases are people testing positive. Again it's useful to think about the process here: to test positive, someone has to come forward to say they're ill, be accepted for a test, get a test, then test positive. Tests are not perfect, and have false positives and false negatives.
Just thinking this through will again be useful to answering lots of questions - e.g. why are cases now leading to fewer deaths than in March?

* people didn't know about COVID in March
* to get a test you had to have a rare characteristic: e.g. recent travel to Wuhan, or being very very ill
Compared to this, until a few weeks ago tests were very very easy to get: (most people could) go on the govt website, phone 119, and get a home kit / appointment at a testing centre. So lots of less sick people getting tests means lots of cases which means a smaller % dying.
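The arithmetic behind that, as a Python sketch with invented numbers. It assumes the same underlying epidemic in both scenarios and, as a simplification, that everyone who dies was among the people tested.

```python
# Hypothetical epidemic: identical in both scenarios.
infections = 10_000
deaths = 100

def deaths_per_case(deaths, infections, share_tested_positive):
    """Deaths divided by detected cases, given the share of infections detected."""
    cases = infections * share_tested_positive
    return deaths / cases

# Assumed detection rates, purely for illustration.
restricted_testing = deaths_per_case(deaths, infections, share_tested_positive=0.05)
wide_testing = deaths_per_case(deaths, infections, share_tested_positive=0.40)

print(f"restricted testing: {restricted_testing:.1%} of cases die")
print(f"wide testing: {wide_testing:.1%} of cases die")
```

Nothing about the disease changed between the two lines; only who counts as a case did.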
So it's pretty daft to look at this diagram and think it means we're at the situation in late March again: it is nothing like the situation then, because the thing we call a case is in general not really the same thing as it was.
Cases are the first thing we want to think about with regard to date of reporting: by date of reporting in England, the increase looks a lot more consistent than it is by date of specimen (when the person was tested).
Another point here is that if we use date of specimen we probably want to think about a lag: tomorrow's data for 18 September will probably look different to today's, as more data come in. After 5 days (until recently) this is generally complete.
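A small Python sketch of why recent specimen dates keep changing. The records are invented pairs of (specimen date, report date); the point is only that the count for a given specimen date depends on when you look.

```python
from collections import Counter
from datetime import date

# Hypothetical test results: (specimen_date, report_date).
records = [
    (date(2020, 9, 14), date(2020, 9, 15)),
    (date(2020, 9, 14), date(2020, 9, 16)),
    (date(2020, 9, 15), date(2020, 9, 16)),
    (date(2020, 9, 15), date(2020, 9, 18)),
    (date(2020, 9, 16), date(2020, 9, 17)),
    (date(2020, 9, 16), date(2020, 9, 18)),
    (date(2020, 9, 16), date(2020, 9, 19)),
    (date(2020, 9, 17), date(2020, 9, 18)),
]

def cases_by_specimen_date(records, as_of):
    """Count cases by specimen date, using only results reported by as_of."""
    return Counter(specimen for specimen, reported in records if reported <= as_of)

early_view = cases_by_specimen_date(records, as_of=date(2020, 9, 17))
later_view = cases_by_specimen_date(records, as_of=date(2020, 9, 19))

day = date(2020, 9, 16)
print(f"{day}: {early_view[day]} cases as seen on 17 Sept, {later_view[day]} as seen on 19 Sept")
```

So a dip at the end of a by-specimen-date chart is usually incomplete data rather than a real fall.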
There is another point here: for the last few weeks there have been problems with cases. We don't really know what this means precisely, but we do know that interpreting cases is very difficult in this light.
Hospitalisations: there are generally two statistics here. Hospitalisations represent someone newly in hospital with a diagnosis of COVID-19 that day. Patients in hospital are an aggregate of these so far, minus discharges and deaths. This is generally done by event date.
What is important here? The thing we absolutely want to avoid is an overload of hospital capacity, so for this question numbers in hospital are most important. But if we want to know about the recent state of the pandemic, numbers newly admitted are probably more informative.
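How the two hospital statistics relate, as a Python sketch with invented daily numbers (and assuming, for simplicity, that the hospital starts empty):

```python
from itertools import accumulate

# Hypothetical daily figures for five consecutive days.
admissions = [10, 12, 15, 20, 18]
discharges = [2, 3, 5, 8, 12]
deaths = [0, 1, 1, 2, 2]

# Each day's net change: new admissions minus people leaving hospital.
net_change = [a - d - x for a, d, x in zip(admissions, discharges, deaths)]

# Patients in hospital is the running total of the net changes.
in_hospital = list(accumulate(net_change))

print("admissions: ", admissions)
print("in hospital:", in_hospital)
```

On the last day admissions fall but the number in hospital still rises, which is why the two series answer different questions.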
Deaths - these are probably the statistics subject to the most dispute. This is because there is a question - "how many people died due to COVID-19?" that has no one single correct answer. There are lots of ways in which this is defined as a result, and lots of unseemly rows.
How can we do this?

(1) Anyone who ever tested positive - this is what the govt did until fairly recently. Back in March, this was fine as far as it goes: if you die shortly after a COVID-19 positive test, it probably contributed to your death. But over time this is less useful.
(2) Anyone who tested positive in the last 28(/60) days. This is useful to get a current state of the pandemic, as it does not include people who tested positive months ago and who died of something else, but would exclude people with long-term problems who died partly due to it.
(3) Anyone who died with COVID-19 on their death certificate. In these cases a doctor has certified that they believe that COVID-19 at least contributed to the patient's death. This is generally a very good estimate of numbers who died due to COVID-19.
(4) Anyone who died with COVID-19 as the underlying cause of death. In these cases, COVID-19 did not just contribute to death but as certified by a doctor, was (roughly speaking) the primary recent cause of it.
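To see how definitions (1)-(4) can give different totals for the same people, here is a Python sketch over four invented death records. The field names are hypothetical, chosen for this example only.

```python
from datetime import date

# Hypothetical records; "first_positive" is None if the person never tested positive.
deaths = [
    {"died": date(2020, 4, 8), "first_positive": date(2020, 4, 1),
     "on_certificate": True, "underlying": True},
    # Tested positive months before dying of something else.
    {"died": date(2020, 8, 20), "first_positive": date(2020, 4, 1),
     "on_certificate": False, "underlying": False},
    # Never tested, but a doctor certified COVID-19 as the underlying cause.
    {"died": date(2020, 4, 10), "first_positive": None,
     "on_certificate": True, "underlying": True},
    # COVID-19 contributed but was not the underlying cause.
    {"died": date(2020, 4, 12), "first_positive": date(2020, 4, 5),
     "on_certificate": True, "underlying": False},
]

def positive_within(record, days):
    """True if the person tested positive no more than `days` before death."""
    tested = record["first_positive"]
    return tested is not None and 0 <= (record["died"] - tested).days <= days

counts = {
    "(1) ever tested positive": sum(r["first_positive"] is not None for r in deaths),
    "(2) positive within 28 days": sum(positive_within(r, 28) for r in deaths),
    "(3) COVID-19 on certificate": sum(r["on_certificate"] for r in deaths),
    "(4) COVID-19 underlying cause": sum(r["underlying"] for r in deaths),
}

for definition, total in counts.items():
    print(definition, total)
```

Four deaths, four definitions, and no two of them count exactly the same set of people.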
In all of (1)-(4) if you are interested in looking at trends in deaths you should be looking at the date of death, not the date of report, but remember to account for the lag. Peak deaths by date of death were April 8, but reported numbers kept rising until a week or two later.
A favourite trick of people who want to mislead on this point is to say "nobody is dying" when there is a low number on Sunday/Monday, or "you're all dead next week!" when there is a high number on Tuesday as the reported numbers catch up after the weekend.
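One common way to stop the day-of-week pattern fooling you (my suggestion here, on top of using date of death) is a 7-day rolling average. A Python sketch with invented reported numbers, where the underlying level is flat at 20 a day but reporting dips at the weekend and catches up on Tuesday:

```python
def rolling_mean(values, window=7):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical deaths by date of report, Monday to Sunday, for two weeks.
one_week = [12, 34, 22, 20, 20, 18, 14]
reported = one_week * 2

smoothed = rolling_mean(reported)

print("lowest single day: ", min(reported))
print("highest single day:", max(reported))
print("7-day averages:    ", smoothed)
```

The single-day numbers swing between 12 and 34 while the 7-day average does not move at all.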
(5) Excess deaths. This gives an estimate of the number of people who died over and above the number who would have been expected to die - usually compared to a five-year average, but sometimes adjusted for age / frailty profiles.
In general, differences here cannot be isolated to COVID-19, but to COVID-19 and everything that has happened around it. The choice of baseline is non-trivial and can substantially determine conclusions.
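A Python sketch of how much the baseline matters. The weekly death counts are invented, and the two baselines (five-year average versus last year only) are just two of many possible choices.

```python
from statistics import mean

# Hypothetical deaths in the same calendar week of each earlier year.
previous_years = {2015: 10_200, 2016: 10_000, 2017: 10_400, 2018: 9_800, 2019: 9_600}
observed_2020 = 12_000

# Baseline A: average of the previous five years.
five_year_baseline = mean(previous_years.values())
# Baseline B: the most recent year only.
last_year_baseline = previous_years[2019]

excess_five_year = observed_2020 - five_year_baseline
excess_last_year = observed_2020 - last_year_baseline

print("excess vs five-year average:", excess_five_year)
print("excess vs last year only:   ", excess_last_year)
```

Same observed deaths, and the excess differs by 400 depending only on what you decided "expected" means.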
I think that is all I want to say as a rough starting point. I think it is fair and I hope it helps people interpret data for themselves, and shows what to look out for if you are trying to examine what's going on.
(There are lots and lots of other things that could be said here - my list of ways in which each type of data is a simplification of reality is itself a simplification of the ways in which data is a simplification of reality. It is non-exhaustive but I think covers key points.)
I've probably missed things but please god don't @ me with your attempt to convince me that deaths are zero or that everyone is dead.
I don't have a Patreon or a Soundcloud or even a charity to plug but I do have one hope: that people try to pause, bear things like these in mind, and don't just RT the scariest thing they see, or the thing most insisting that COVID-19 is a hoax, or whatever.
You can follow @danielhowdon.