Recently, I learned of a research study with #MachineLearning that is incredible!

It’s from the field of materials science but it can be widely applied to rapidly accelerate new discoveries.

I will try to summarize this incredible work below (THREAD - 1/n)
Some very clever materials science researchers asked the following question: Can we use #MachineLearning to predict new thermoelectric materials not yet discovered by combing through very large swaths of the existing scientific literature?

2/n
Here’s what they did:

They used an existing AI technique known as “word embeddings” to train the system on thermoelectric materials.

“Word embeddings” is a technique that represents each word as a list of numbers (a vector).

3/n
While complex, this numeric assignment to words allows the algorithm to apply “math” to words. This helps it learn & understand language.

For example, this allows the algorithm to learn that the word “apple” is closer to the word “orange” than to the word “steel”.

4/n
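Here is a minimal sketch of what “closer” can mean, assuming closeness is measured with cosine similarity (a common choice for word vectors). The three-number vectors are made up for illustration; real embeddings are learned from text and have many more dimensions.

```python
import math

# Made-up 3-dimensional vectors, purely for illustration.
# Real embeddings are learned from text, not written by hand.
toy_vectors = {
    "apple": [0.9, 0.8, 0.1],
    "orange": [0.8, 0.9, 0.2],
    "steel": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


fruit_score = cosine_similarity(toy_vectors["apple"], toy_vectors["orange"])
metal_score = cosine_similarity(toy_vectors["apple"], toy_vectors["steel"])

print(f"apple vs orange: {fruit_score:.3f}")
print(f"apple vs steel:  {metal_score:.3f}")
```

With these toy numbers, “apple” scores much higher against “orange” than against “steel”, which is the kind of “math on words” described above.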
In this study, the algorithm is learning the scientific language as it relates to thermoelectric materials.
The study uses an algorithm known as Word2Vec (originally published by @Google in 2013).

5/n
The researchers trained it by having it comb through 3.3 million materials science abstracts from 1922 to 2018. (That’s a lot of abstracts!)

The researchers cleverly chose to use abstracts only given their widespread availability.

What did they find?

6/n
When they ran the algorithm through the abstracts, it learned properties pertaining to thermoelectric materials.

Based on its learning, it generated a large list of materials that had never been reported as thermoelectrics but that may be suitable candidates.

7/n
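A rough sketch of how such a candidate list could be produced, assuming candidates are ranked by cosine similarity to the word “thermoelectric” and that materials already linked to thermoelectrics are filtered out. The material names, vectors, and the `rank_candidates` helper are all hypothetical; this is not the researchers' actual code.

```python
import math

# Hypothetical 2-D vectors standing in for learned word embeddings.
embeddings = {
    "thermoelectric": [1.0, 0.1],
    "material_A": [0.9, 0.2],
    "material_B": [0.2, 0.9],
    "material_C": [0.7, 0.4],
    "material_D": [0.95, 0.15],
}

# Assumed set of materials already reported as thermoelectrics.
already_studied = {"material_D"}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(vectors, target, exclude):
    """Rank every word except the target and the excluded ones by similarity."""
    target_vec = vectors[target]
    scores = [
        (name, cosine_similarity(vec, target_vec))
        for name, vec in vectors.items()
        if name != target and name not in exclude
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(embeddings, "thermoelectric", already_studied)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The materials at the top of the list are the ones whose vectors sit closest to “thermoelectric” without having been studied as thermoelectrics yet.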
These materials had no mention as thermoelectrics in the existing literature, but the algorithm “discovered” them as potential candidates!

Basically, it discovered potential candidates in the 1922 - 2018 literature that humans missed.

What’s even more impressive is what they did next.

8/n
To test their algorithm further, they then trained it only on literature from before 2009 to see what it would predict as potential materials for the next 10 years (i.e., from 2009 - 2019).

They compared the algorithm's output to the actual discoveries from 2009 - 2019.

9/n
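This kind of “look back” test can be sketched as follows, assuming we score it by counting how many of the top predictions later showed up as real discoveries. The ranked list, the set of later discoveries, and both helper functions are hypothetical placeholders, and the scoring rule here is my own simplification rather than the one used in the study.

```python
# Hypothetical ranked output of a model trained only on pre-2009 abstracts.
predicted_ranking = [
    "material_A",
    "material_C",
    "material_B",
    "material_E",
    "material_F",
]

# Hypothetical materials reported as thermoelectrics between 2009 and 2019.
later_reported = {"material_A", "material_B", "material_G"}


def hits_in_top_k(ranking, later, k):
    """Return the top-k predictions that were later reported, in rank order."""
    return [name for name in ranking[:k] if name in later]


def precision_at_k(ranking, later, k):
    """Return the fraction of the top-k predictions that were later reported."""
    return len(hits_in_top_k(ranking, later, k)) / k


top3_hits = hits_in_top_k(predicted_ranking, later_reported, 3)
top3_precision = precision_at_k(predicted_ranking, later_reported, 3)

print("Top-3 hits:", top3_hits)
print(f"Precision at 3: {top3_precision:.2f}")
```

The key design point is that the model never sees anything published after the cutoff, so any hit is a genuine prediction rather than a lookup.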
Incredibly, it predicted many of the materials that were eventually discovered! It predicted some of the best materials from a thermoelectric perspective, and it would have done so well before their actual discoveries!

10/n
This is mind-blowing!

Basically, this method can comb large amounts of data to make discoveries that humans fail to “see”, at a rate much faster than we can. Think of what this can mean for so many different applications!

11/n
Medical literature can be searched for new discoveries (therapies, drug targets, etc.). Genetic databases can be scanned for potential approaches to diseases & molecular pathways, leading to new therapies.

12/n
Environmental solutions for our planet can benefit from discoveries that humans have not appreciated to date.

Heck, with all the COVID-19 research that has come out in the last several months, who knows what an algorithm like this can teach us if we apply it.

13/n
You can follow @NeilMaha.