Interesting paper: a guide to reading machine learning articles in @JAMA_current: https://jamanetwork.com/journals/jama/article-abstract/2754798
ICYI, I have a couple of thoughts to share about this paper
1/n
TL;DR: overall, I think this article is quite a useful beginner’s guide
To start, I like the explanation of the terminology and concepts. Nice use of text boxes, imo
2/n
I particularly like the attention to calibration in addition to discrimination performance, and the attention to the importance of continued testing and updating of algorithms; algorithms are indeed high maintenance, and a single “validation study” generally won’t do
3/n
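To make the calibration vs. discrimination distinction concrete, here is a minimal sketch (not from the article). The toy outcomes and predicted risks are made up, and the function names are hypothetical. It shows that distorting predicted risks monotonically leaves discrimination (the c-statistic) untouched while calibration gets worse:

```python
from itertools import product

def c_statistic(y_true, y_prob):
    """Discrimination: share of (event, non-event) pairs ranked correctly; ties count half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = sum(1.0 if e > n else 0.5 if e == n else 0.0
                for e, n in product(events, non_events))
    return score / (len(events) * len(non_events))

def calibration_in_the_large(y_true, y_prob):
    """Mean predicted risk minus observed event rate; 0 means no systematic over/under-prediction."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)

# Made-up toy data: observed outcomes and predicted risks (assumption, for illustration only)
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]

# Monotone distortion: the ranking of patients is unchanged, but every risk is pushed upwards
p_inflated = [x ** 0.5 for x in p]

print(c_statistic(y, p), c_statistic(y, p_inflated))
print(calibration_in_the_large(y, p), calibration_in_the_large(y, p_inflated))
```

The two c-statistics are identical, while the inflated risks overshoot the observed event rate: a model can rank patients well and still give misleading risk estimates.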
Also like the reference to the @TRIPODStatement reporting guideline. Reading (and peer review) of machine learning articles would be so much more pleasant if people would just stick to the reporting guidelines
https://www.equator-network.org/reporting-guidelines/tripod-statement/
4/n
ICYI: the TRIPOD guideline will be updated soon for more specific guidance on reporting of machine learning algorithms
https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)30037-6/fulltext
5/n
I’ll choose to ignore the fact that logistic regression is called a “simpler machine learning system” :)
6/n
There is an accompanying editorial with the article that points out a few interesting things
https://jamanetwork.com/journals/jama/article-abstract/2754776
7/n
As the editorial also points out, the article focusses almost completely on deep learning of medical images. Surprising? It is arguably the most promising area of machine learning applications in medicine, but certainly not the only one
8/n
The attention to medical imaging and deep learning may have something to do with the authors’ interests (all are affiliated with Google Health), but a somewhat broader perspective would have been useful for a beginner’s guide to machine learning, imho
9/n
The section “How Much Data Are Required for Recent Machine Learning Methods?” is the part of the article that I don’t particularly like
10/n
Again a reference to the 5 to 10 events per variable rule for logistic regression. Ugh. More info and references here: https://discourse.datamethods.org/t/reference-collection-to-push-back-against-common-statistical-myths/1787
11/n
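For readers who have not met the rule of thumb: a minimal sketch of what “events per variable” (EPV) computes, to show how crude it is, not to endorse it. The function name and the numbers are hypothetical. The sketch assumes the common definition in which the smaller outcome group is the limiting sample size and the denominator counts candidate parameters:

```python
def events_per_variable(n_events, n_non_events, n_candidate_parameters):
    """EPV: the smaller outcome group divided by the number of candidate parameters."""
    limiting_sample_size = min(n_events, n_non_events)
    return limiting_sample_size / n_candidate_parameters

# Hypothetical study: 1000 patients, 60 with the outcome, 12 candidate parameters
epv = events_per_variable(n_events=60, n_non_events=940, n_candidate_parameters=12)
print(epv)  # 5.0
```

A single ratio like this ignores the outcome prevalence, the strength of the predictors, and the modelling strategy, which is part of why the linked collection pushes back on it.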
While I agree with the authors arguing that deep learning algorithms with millions of parameters probably do not need 10s of millions of events to become useful, the suggestion that regularization will save the day seems a bit optimistic
12/n
Regularization may in fact not save the day when you need it most: https://arxiv.org/abs/1907.11493
13/n
To the authors’ credit, the article does state that tens of thousands of images may be required for a deep learning algorithm to do well!
But do we really know how much data we need for developing and validating reliable complex algorithms? Probably not
14/n
Thanks to the people who pointed me to this article today, including @boback and @ADAlthousePhD
/end