So I have been thinking about this a lot, especially with the French Covid stats, and wanted to investigate if something is happening with the testing pattern. Here are the results of my investigations. https://twitter.com/Sime0nStylites/status/1248952606241611777
First, there is official data available, but it has limitations, as described here: https://www.data.gouv.fr/fr/datasets/donnees-relatives-aux-tests-de-depistage-de-covid-19-realises-en-laboratoire-de-ville/#_
Also bear in mind that the data only starts one month ago, so statistically it is not great. Still, what does the data say?
Logically, the number of tests can increase either because there are more people to test with the same severity of symptoms (the pandemic is growing), or because we are testing more systematically, including people with less severe symptoms.
In the first case I would expect the % positive to stay roughly unchanged; in the second I would expect it to go down. What is actually happening?
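A minimal sketch of that logic, with made-up numbers (not the French data). The 10% positivity assumed for the milder cases is an arbitrary illustration, and `positivity` is just a helper name for this example.

```python
# Toy numbers, invented purely for illustration (NOT the French data).
def positivity(positives, tests):
    """Share of tests that come back positive, as a percentage."""
    return 100.0 * positives / tests

# Baseline: 1000 tests, 300 of them positive.
baseline = positivity(300, 1000)

# Scenario A: the epidemic doubles, testing criteria unchanged.
# Twice as many people with the same symptoms get tested.
scenario_a = positivity(600, 2000)

# Scenario B: same epidemic, but testing widens to milder symptoms.
# Assumption: the 1000 extra tests come back positive only 10% of the time.
scenario_b = positivity(300 + 100, 2000)

print(baseline, scenario_a, scenario_b)
```

In scenario A the % positive does not move even though tests double; in scenario B it falls, which is the signature I am looking for.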
What I see here is a lot of noise, but no trend suggesting more widespread testing. Also, there doesn’t seem to be a big difference between males (blue) and females (green). So more investigation is needed.
So what about the number of tests? The raw data is here. It doesn’t take a genius to notice a pattern: almost no testing on Sundays.
This also affects the weekly seasonality of cases, but less markedly. (Because I use cases and tests by notification date, there is no lag.)
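For anyone who wants to check the weekday pattern on their own extract, here is a rough sketch of the grouping. The daily counts below are invented for illustration, and `mean_by_weekday` is a hypothetical helper, not the code behind the chart.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def mean_by_weekday(start, counts):
    """Average a list of daily counts by day of the week.

    `start` is the date of the first count; counts are consecutive days.
    """
    buckets = defaultdict(list)
    for offset, n in enumerate(counts):
        day = start + timedelta(days=offset)
        buckets[WEEKDAYS[day.weekday()]].append(n)
    return {name: sum(v) / len(v) for name, v in buckets.items()}

# Invented counts: two weeks starting Monday 2020-03-16,
# with almost no tests recorded on the two Sundays.
start = date(2020, 3, 16)
counts = [900, 950, 1000, 980, 940, 400, 20,
          1100, 1150, 1200, 1180, 1120, 450, 30]

by_day = mean_by_weekday(start, counts)
quietest = min(by_day, key=by_day.get)
print(quietest, by_day[quietest])
```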
So, back to my earlier question: I want to understand how the testing pattern biases the reported cases. I looked at all the relevant variables (imo): number of cases, hospitalizations, ICU, tests, % positive tests and % of cases reported via testing.
I then ran a principal component analysis on those time series to try to identify a pattern. The scree plot of explained variance makes it clear that two factors are enough to explain what’s happening.
So what are those two factors? Here is the variable plot, which shows how strongly each of the variables I’m studying is influenced by these two factors.
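If you want to reproduce this kind of scree plot and variable plot, a bare-bones PCA can be sketched with NumPy as below. This is not the code behind my plots: the series are synthetic, built on purpose from two hidden drivers, and every name here (`pca`, `size`, `other`, the column order) is an assumption for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 30

# Synthetic stand-ins for the real series (invented, NOT the French data).
size = np.linspace(0.0, 1.0, n_days)   # hidden "epidemic size" driver
other = rng.normal(size=n_days)        # a second, unrelated driver

def noise():
    return 0.05 * rng.normal(size=n_days)

names = ["cases", "tests", "hosp", "icu", "tauxpos"]
X = np.column_stack([
    size + noise(),    # cases follow the first driver
    size + noise(),    # tests follow the first driver
    other + noise(),   # hospitalizations follow the second driver
    other + noise(),   # ICU follows the second driver
    -size + noise(),   # share of positives moves against the first driver
])

def pca(X):
    """PCA on standardized columns via SVD.

    Returns the explained-variance ratios (what a scree plot shows) and the
    correlations between each variable and each component (what a variable
    plot shows).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    loadings = Vt.T * s / np.sqrt(len(Z))
    return explained, loadings

explained, loadings = pca(X)
print(np.round(explained, 3))
for name, row in zip(names, loadings):
    print(name, np.round(row[:2], 2))
```

Because the toy data has two drivers by construction, the first two components carry nearly all the variance. On real data, that is exactly what the scree plot has to tell you.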
We can see three important things.
The first factor is obviously the “size” of the epidemic. The number of tests, of positives and of cases are strongly correlated. (But the causation is hard to assess from this, see later.)
The second is that hospitalization and ICU (“reanimation”) are not that strongly correlated with this factor; I suspect this is because of a time lag, but it is hard to tell because the data series is not long enough!
The third observation is the most important one imo: the share of positive tests (“tauxpos”) decreases as the epidemic grows. Or, to put it another way, part of the increase in the number of tests comes from a change in the testing pattern.
The more tests we run, the lower the share that comes back positive, i.e. France is testing less severe cases, which suggests that my estimates of R are too high (good news!)
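On the time lag I suspect for hospitalization and ICU: once the series is longer, one simple check would be a lagged correlation. The sketch below uses an invented curve where "hospitalizations" are just "cases" shifted by 7 days. The 7-day shift and the `best_lag` helper are assumptions for illustration, not an estimate from the data.

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Find the shift (in days) of y relative to x that maximizes correlation.

    For each lag, x[t] is compared with y[t + lag].
    """
    scores = {}
    for lag in range(max_lag + 1):
        a = x[:len(x) - lag]
        b = y[lag:]
        scores[lag] = np.corrcoef(a, b)[0, 1]
    return max(scores, key=scores.get), scores

# Invented series: a bump of cases, and the same bump 7 days later.
t = np.arange(60)
cases = np.exp(-((t - 25) / 8.0) ** 2)
hosp = np.exp(-((t - 7 - 25) / 8.0) ** 2)

lag, scores = best_lag(cases, hosp, max_lag=14)
print(lag, round(scores[0], 3), round(scores[lag], 3))
```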
My next goal is to take this into account for my R estimates!
That's all folks!