Thread by @MaartenvSmeden

Interesting paper: a guide to reading machine learning articles in @JAMA_current: https://jamanetwork.com/journals/jama/article-abstract/2754798

ICYI, I have a couple of thoughts to share about this paper

1/n

TL;DR: overall, I think this article is quite a useful beginner's guide

To start, I like the explanation of the terminology and concepts. Nice use of text boxes, imo

2/n

I particularly like the attention to calibration in addition to discrimination performance, and the attention to the importance of continued testing and updating of algorithms; algorithms are indeed high maintenance, a single “validation study” generally won’t do

3/n
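
A small illustration of why calibration deserves attention alongside discrimination. This is only a sketch on simulated data, not anything taken from the JAMA article: the helper names `c_statistic` and `calibration_in_the_large` and the simulated risks are assumptions made for the example. Halving every predicted risk keeps the ranking of patients intact, so the c-statistic (AUC) is unchanged, while the average predicted risk drifts far from the observed event rate.

```python
import random


def c_statistic(outcomes, predictions):
    """Share of event/non-event pairs where the event has the higher prediction.

    Ties count as half a concordant pair. This equals the area under the ROC curve.
    """
    events = [p for y, p in zip(outcomes, predictions) if y == 1]
    non_events = [p for y, p in zip(outcomes, predictions) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def calibration_in_the_large(outcomes, predictions):
    """Mean predicted risk minus observed event rate (0 means no systematic bias)."""
    return sum(predictions) / len(predictions) - sum(outcomes) / len(outcomes)


# Simulated data (an assumption for illustration): each patient has a true risk,
# and the outcome is drawn from that risk.
random.seed(0)
true_risk = [random.random() for _ in range(1000)]
outcomes = [1 if random.random() < r else 0 for r in true_risk]

# A "model" that ranks patients perfectly well but underestimates every risk by half.
halved_risk = [r / 2 for r in true_risk]

auc_true = c_statistic(outcomes, true_risk)
auc_halved = c_statistic(outcomes, halved_risk)
citl_true = calibration_in_the_large(outcomes, true_risk)
citl_halved = calibration_in_the_large(outcomes, halved_risk)

print(f"c-statistic: true risks {auc_true:.3f}, halved risks {auc_halved:.3f}")
print(f"calibration-in-the-large: true risks {citl_true:+.3f}, halved risks {citl_halved:+.3f}")
```

The two c-statistics are identical because halving is a strictly increasing transformation, yet the halved predictions are clearly miscalibrated. A report that shows only discrimination would not reveal the difference.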

Also like the reference to the @TRIPODStatement reporting guideline. Reading (and peer review) of machine learning articles would be so much more pleasant if people would just stick to the reporting guidelines

https://www.equator-network.org/reporting-guidelines/tripod-statement/

4/n

ICYI: the TRIPOD guideline will be updated soon for more specific guidance on reporting of machine learning algorithms

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)30037-6/fulltext

5/n

I’ll choose to ignore the fact that logistic regression is called a “simpler machine learning system” :)

6/n

There is an accompanying editorial with the article that points out a few interesting things

https://jamanetwork.com/journals/jama/article-abstract/2754776

7/n

As the editorial also points out, the article focusses almost completely on deep learning of medical images. Surprising? It is arguably the most promising area of machine learning applications in medicine, but certainly not the only one

8/n

The attention to medical imaging and deep learning may have something to do with the authors' interests (all are affiliated with Google Health), but a somewhat broader perspective would have been useful for a beginner's guide to machine learning, imho

9/n

The section “How Much Data Are Required for Recent Machine Learning Methods?” is the part of the article that I don’t particularly like

10/n

Again a reference to the 5 to 10 events per variable rule for logistic regression. Ugh. More info and references here: https://discourse.datamethods.org/t/reference-collection-to-push-back-against-common-statistical-myths/1787

11/n

While I agree with the authors arguing that deep learning algorithms with millions of parameters probably do not need 10s of millions of events to become useful, the suggestion that regularization will save the day seems a bit optimistic

12/n
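
To make the arithmetic behind that point concrete, here is a minimal sketch of what the 5 to 10 events per variable rule would imply if applied literally. It only reproduces the rule's own calculation and is not a sample size recommendation. The function name `events_implied_by_epv_rule` and both parameter counts (8 for a logistic regression, 5 million for a deep network) are hypothetical values chosen for illustration.

```python
def events_implied_by_epv_rule(n_parameters: int, events_per_variable: int) -> int:
    """Number of outcome events the events-per-variable rule of thumb would ask for."""
    return n_parameters * events_per_variable


# Hypothetical model sizes, chosen only for illustration.
logistic_parameters = 8
deep_net_parameters = 5_000_000

for label, n_parameters in [
    ("logistic regression", logistic_parameters),
    ("deep network", deep_net_parameters),
]:
    low = events_implied_by_epv_rule(n_parameters, 5)
    high = events_implied_by_epv_rule(n_parameters, 10)
    print(f"{label}: {low:,} to {high:,} events")
```

Applied to a network with millions of parameters, the rule asks for tens of millions of events, which is why it cannot sensibly be carried over to deep learning. That does not by itself show how much data is actually enough.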

Regularization may in fact not save the day when you need it most: https://arxiv.org/abs/1907.11493

13/n

To the authors' credit, the article does state that tens of thousands of images may be required for a deep learning algorithm to do well!

But do we really know how much data we need for developing and validating reliable complex algorithms? Probably not

14/n

Thanks to the people that pointed me to this article today, including @boback and @ADAlthousePhD

/end
