Interesting paper: a guide to reading machine learning articles in @JAMA_current: https://jamanetwork.com/journals/jama/article-abstract/2754798

ICYI, I have a couple of thoughts to share about this paper

1/n
TL;DR: overall, I think this article is quite a useful beginner's guide

To start, I like the explanation of the terminology and concepts. Nice use of text boxes, imo

2/n
I particularly like the attention to calibration in addition to discrimination performance, and to the importance of continued testing and updating of algorithms. Algorithms are indeed high maintenance; a single “validation study” generally won’t do

3/n
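For readers new to the calibration vs discrimination distinction: below is a minimal sketch of my own (not from the article), using scikit-learn on synthetic data, that reports the AUC for discrimination and estimates a calibration slope and intercept by logistic recalibration on the log-odds. All names and data are made up for illustration.

```python
# Minimal sketch (not from the article): checking discrimination AND calibration
# of a binary risk model on held-out data. Synthetic data, illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
p = model.predict_proba(X_val)[:, 1]  # predicted risks on validation data

# Discrimination: can the model separate events from non-events?
print("AUC (c-statistic):", roc_auc_score(y_val, p))

# Calibration: do predicted risks agree with observed risks?
# Estimate a calibration slope/intercept by regressing the outcome on the
# log-odds of the predicted risks (effectively unpenalized via large C).
eps = 1e-12
logit_p = np.log(np.clip(p, eps, 1 - eps) / np.clip(1 - p, eps, 1 - eps))
recal = LogisticRegression(C=1e6, max_iter=1000).fit(logit_p.reshape(-1, 1), y_val)
print("calibration slope:", recal.coef_[0][0])                  # ideally ~1
print("calibration intercept (approx):", recal.intercept_[0])   # ideally ~0
```

A high AUC with a calibration slope far from 1 is exactly the kind of problem a single discrimination-only “validation study” would miss.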
I also like the reference to the @TRIPODStatement reporting guideline. Reading (and peer reviewing) machine learning articles would be so much more pleasant if people would just stick to the reporting guidelines

https://www.equator-network.org/reporting-guidelines/tripod-statement/

4/n
I’ll choose to ignore the fact that logistic regression is called a “simpler machine learning system” :)

6/n
As the editorial also points out, the article focuses almost completely on deep learning for medical imaging. Surprising? It is arguably the most promising area of machine learning applications in medicine, but it is certainly not the only one

8/n
The attention to medical imaging and deep learning may have something to do with the authors’ interests (all are affiliated with Google Health), but a somewhat broader perspective would have been useful for a beginner's guide to machine learning, imho

9/n
The section “How Much Data Are Required for Recent Machine Learning Methods?” is the part of the article that I don’t particularly like

10/n
While I agree with the authors’ argument that deep learning algorithms with millions of parameters probably do not need tens of millions of events to become useful, the suggestion that regularization will save the day seems a bit optimistic

12/n
Regularization may in fact not save the day when you need it most: https://arxiv.org/abs/1907.11493 

13/n
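For context on what “regularization” means in practice here: a minimal sketch of my own (not from the article or the linked preprint) of ridge-penalized logistic regression with the penalty strength tuned by cross-validation. The concern is that this data-driven tuning can be very unstable in small samples, so the amount of shrinkage you end up with may be far from what you need.

```python
# Minimal sketch (not from the article): ridge-penalized logistic regression with
# the penalty strength chosen by cross-validation. With small samples the chosen
# penalty varies substantially across replicate datasets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

selected_Cs = []
for seed in range(5):
    # deliberately small sample: 150 patients, 30 candidate predictors
    X, y = make_classification(n_samples=150, n_features=30, n_informative=5,
                               random_state=seed)
    model = LogisticRegressionCV(Cs=20, cv=5, penalty="l2",
                                 scoring="neg_log_loss", max_iter=5000).fit(X, y)
    # C_ is the inverse penalty strength; smaller C = heavier shrinkage
    selected_Cs.append(model.C_[0])

print("selected inverse penalty strength per replicate:", selected_Cs)
# The wide spread across these small replicate datasets illustrates how
# unreliable tuning of the penalty can be exactly when data are scarce.
```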
To the authors’ credit, the article does state that tens of thousands of images may be required for a deep learning algorithm to do well!

But do we really know how much data we need for developing and validating reliable complex algorithms? Probably not

14/n
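One pragmatic (if partial) way to approach the “how much data” question is an empirical learning curve: validation performance as a function of training-set size. A minimal sketch of my own below, again on made-up data; this is an illustration, not a method from the article.

```python
# Minimal sketch (not from the article): an empirical learning curve, i.e.
# validation AUC as a function of training-set size. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=50, n_informative=10,
                           random_state=1)
X_pool, X_val, y_pool, y_val = train_test_split(X, y, test_size=5000,
                                                random_state=1)

for n in [100, 300, 1000, 3000, 10000]:
    model = LogisticRegression(max_iter=2000).fit(X_pool[:n], y_pool[:n])
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    print(f"n_train={n:>6}  validation AUC={auc:.3f}")
# If the curve has clearly flattened, more of the same data is unlikely to help;
# if it is still rising, you probably need more (or better) data.
```

This only tells you about the data and model at hand, of course; it does not settle the general question of how much data reliable complex algorithms need.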
Thanks to the people who pointed me to this article today, including @boback and @ADAlthousePhD

/end