How should we create knowledge?
We know "The Scientific Method" well: we ask questions then seek answers.
But what about having many answers, and no questions?
(Paper summary + more)
"Here is the evidence, now what is the hypothesis?"
This commentary was written in 2003 by Kell & Oliver, after the Human Genome project was completed.
They saw a proliferation of data in many fields, but also saw resistance to data-driven exploration.
Science was mostly driven by hypotheses.
That is, we begin with ideas/questions, then look for the data to validate them.
Data-driven exploration was sometimes rejected and criticized as "merely a fishing expedition".
To understand why, we need to visit logic.
We build knowledge through inference, i.e. by reasoning about cause/effect relations.
There are two (main) types of inference:
- Deductive: it rained (cause) -> grass is wet (effect)
- Inductive: grass is wet + we didn't water -> it rained
Hypothesis-driven science is deductive; data-driven science is inductive.
There is a reason why scientists prefer deduction:
Induction is 'philosophically insecure', because there could be counterexamples that we haven't seen.
"The great tragedy of Science: the slaying of a beautiful hypothesis by an ugly fact." (T.H. Huxley)
Hume's problem of induction is often illustrated with the turkey example:
You see a turkey being fed every day at 9am.
You assume it is fed at 9am every day.
Except one day, it's not fed; it's slaughtered.
This is one caveat of inductive science: it is laden with assumptions.
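To make the turkey's reasoning concrete, here is a toy Python sketch. It is our own illustration, not from the paper, and `induce_rule` is a hypothetical name. A rule induced from 100 consistent observations fits every one of them, yet nothing in the data guarantees it on day 101. The hidden assumption is that the future resembles the past.

```python
# Toy model of naive induction based on the turkey story.
# Hypothetical setup: 100 observed feeding days, then an unseen day 101.

def induce_rule(observations):
    """Hypothetical naive inducer: if every observed day had the same outcome,
    assume it always will (the implicit assumption: the future resembles the past)."""
    outcomes = set(observations.values())
    if len(outcomes) == 1:
        only_outcome = outcomes.pop()
        return lambda day: only_outcome
    return None  # no single constant rule fits all observations

# Days 1..100: fed at 9am every time.
observed = {day: "fed at 9am" for day in range(1, 101)}
rule = induce_rule(observed)

# The rule agrees with every observation it was built from...
assert all(rule(day) == outcome for day, outcome in observed.items())

# ...but the world is not obliged to follow it on an unseen day.
actual_day_101 = "slaughtered"
print(rule(101), "vs", actual_day_101)
```

No amount of additional agreeing days changes this. More data only makes the turkey more confident.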
David Wolpert formalized a related idea as the No Free Lunch (NFL) theorem.
It's now widely used as an anchor when considering models. NFL means you can induce, but you must acknowledge your assumptions.
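The NFL idea can be checked by brute force on a tiny problem. The sketch below is a toy setup of our own, not Wolpert's formal statement, and all function names are hypothetical. It enumerates every possible Boolean target on 3-bit inputs and measures accuracy only on inputs the learner never saw. Averaged over all targets, a sensible learner and a deliberately contrarian one tie at exactly 1/2. Any advantage has to come from assuming some targets are more likely than others.

```python
from fractions import Fraction
from itertools import product

INPUTS = list(product([0, 1], repeat=3))  # all 8 binary inputs of length 3
TRAIN = INPUTS[:4]                         # hypothetical split: first 4 inputs are "seen"
TEST = INPUTS[4:]                          # off-training-set inputs

def majority_learner(train_data):
    """Predict the most common training label everywhere (ties go to 0)."""
    ones = sum(label for _, label in train_data)
    guess = 1 if ones * 2 > len(train_data) else 0
    return lambda x: guess

def nearest_neighbor_learner(train_data):
    """Predict the label of the closest training input by Hamming distance."""
    def predict(x):
        closest = min(train_data, key=lambda pair: sum(a != b for a, b in zip(pair[0], x)))
        return closest[1]
    return predict

def contrarian_learner(train_data):
    """Deliberately 'bad' learner: predict the opposite of nearest neighbor."""
    nn = nearest_neighbor_learner(train_data)
    return lambda x: 1 - nn(x)

def mean_off_training_accuracy(learner):
    """Average accuracy on unseen inputs over every possible target function."""
    targets = list(product([0, 1], repeat=len(INPUTS)))  # all 2^8 = 256 labelings
    total = Fraction(0)
    for labels in targets:
        target = dict(zip(INPUTS, labels))
        model = learner([(x, target[x]) for x in TRAIN])
        correct = sum(model(x) == target[x] for x in TEST)
        total += Fraction(correct, len(TEST))
    return total / len(targets)

for learner in (majority_learner, nearest_neighbor_learner, contrarian_learner):
    print(learner.__name__, mean_off_training_accuracy(learner))
```

Each learner scores exactly 1/2. For any fixed training labels, every unseen label is 0 in half of the remaining targets and 1 in the other half, whatever the learner predicts.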
You can hear Wolpert's discussion in @sfiscience's recent podcast.
https://complexity.simplecast.com/episodes/45
Also, here's an excellent essay on how to understand the No Free Lunch Theorem by @amuellerml
https://peekaboo-vision.blogspot.com/2019/07/dont-cite-no-free-lunch-theorem.html
Anyway, we've had fruitful attempts at hypothesis-free (i.e. inductive) science.
Examples:
- Epidemiology
→ many diseases were studied through exploration.
- Astronomy
→ Kepler took logged observations of planetary orbits (largely Tycho Brahe's) and induced a mathematical relationship from them.
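Kepler's third law is a good example of this kind of induction: the relationship was extracted from logged orbits, with no prior hypothesis about why it holds. Below is a minimal Python sketch of the same move. It uses approximate modern orbital values rather than the observations Kepler worked from, and fits a straight line in log-log space. The fitted slope should recover the exponent in T^2 ∝ a^3.

```python
import math

# Approximate modern values (not Kepler's own data):
# semi-major axis a in AU, orbital period T in years.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.457),
}

# Induce an exponent k in T = a^k by least-squares fitting log T against log a.
xs = [math.log(a) for a, _ in planets.values()]
ys = [math.log(t) for _, t in planets.values()]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# A slope near 1.5 means T^2 is proportional to a^3 (Kepler's third law).
print(f"T ~ a^{slope:.3f}, intercept {intercept:.4f}")
```

Fitting in log space turns a power law into a straight line, so ordinary least squares can "discover" the exponent directly from the data.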
We lose a lot by ignoring inductive methods.
Inductive methods give us the freedom to look at the system as a whole and derive insights we might not have constructed otherwise.
Example: Molecular vs. Systems Biology on Genes
Molecular biologists would seek gene functions, then find the genes.
Systems biologists would start with all the data.
It turned out that the function-first approach missed 40% of the genes that were uncovered, because it was geared toward functions only.
So much has happened since 2003.
We have the computational power to navigate different flavors of big data. We can even automate some hypothesis discovery.
Data science has evolved and been formalized as a proactive arm that unearths insights from big data.
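Here is a rough illustration of automated hypothesis discovery. It is a generic correlation screen, not a method from the paper, and the variable names and data are hypothetical. The sketch scans every pair of variables in a table and ranks the pairs by absolute Pearson correlation. The output is a list of candidate hypotheses, not conclusions. With many variables, some strong correlations appear by chance; that is exactly the "fishing expedition" worry. Each candidate therefore still needs hypothesis-driven testing.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_candidate_hypotheses(table, top=3):
    """Hypothetical helper: score every variable pair by |r|, strongest first."""
    scored = [
        (abs(pearson(table[a], table[b])), a, b)
        for a, b in combinations(sorted(table), 2)
    ]
    scored.sort(reverse=True)
    return [(a, b, round(r, 3)) for r, a, b in scored[:top]]

# Hypothetical measurements; names and values are made up for illustration.
data = {
    "dose":     [1, 2, 3, 4, 5, 6, 7, 8],
    "response": [3, 5, 7, 9, 11, 13, 15, 17],  # exactly 2 * dose + 1
    "noise":    [4, 1, 5, 1, 5, 9, 2, 6],
}

for a, b, r in rank_candidate_hypotheses(data):
    print(f"candidate: {a} ~ {b} (|r| = {r})")
```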
Takeaway message:
We need both hypothesis-driven and data-driven sciences.
They are complementary.
Data-driven science is laden with assumptions,
but it helps us view the system as a whole.
[paper: https://cpb-us-w2.wpmucdn.com/sites.gsu.edu/dist/d/2411/files/2017/10/Kell_et_al-2004-BioEssays-1sin81j.pdf]