Kicking around different ways of thinking about scientific infrastructure:
* A journal article is a body of structured knowledge.
* It's used by getting the info you need back out of it.
* Let's think about open science practices as improving the API of journal articles.
Fields you can query from PLOS articles:

http://api.plos.org/solr/search-fields/

I was pleasantly surprised that you can get, e.g., just the methods (I have no opinion on how usable it is; maybe @daoteajing does?)
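To make that concrete, here's a minimal sketch of building a PLOS Search API request that asks for only the methods text. The endpoint and the `materials_and_methods` field name are my reading of the search-fields list linked above, not something I've verified end to end — check that list before relying on them.

```python
from urllib.parse import urlencode

# Base endpoint for the PLOS Search API (assumed from the linked docs).
BASE = "http://api.plos.org/search"

def methods_query(term: str) -> str:
    """Build a request URL asking only for id, title, and methods text."""
    params = {
        "q": f'everything:"{term}"',             # full-text match on the term
        "fl": "id,title,materials_and_methods",  # fields to return
        "wt": "json",                            # response format
    }
    return f"{BASE}?{urlencode(params)}"

url = methods_query("working memory")
```

The nice thing about the `fl` parameter is that it *is* the "API of the article" idea in miniature: you name the parts you want back.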

What else do we want to 'get back' from articles in addition to special paragraphs?
Hypotheses, datasets, analysis specifications, stimuli, etc.?
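Just to make the wish list tangible: here's a purely hypothetical record type for what an article "API response" could expose beyond special paragraphs. Every field name below is an invention for illustration, not an existing standard.

```python
from dataclasses import dataclass, field

@dataclass
class ArticleRecord:
    """Hypothetical structured view of one article (all field names invented)."""
    doi: str
    hypotheses: list[str] = field(default_factory=list)      # stated predictions
    datasets: list[str] = field(default_factory=list)        # e.g. repository URLs
    analysis_specs: list[str] = field(default_factory=list)  # scripts, prereg links
    stimuli: list[str] = field(default_factory=list)         # files or descriptions

record = ArticleRecord(doi="example-doi")
```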

(Note - this is just coming at some of the big metadata projects @aeryn_thrace @lakens @LisaDeBruine from the other direction, starting with what you want to ask the paper for, instead of a complete map of the paper)
One problem you run into pretty quickly is internal, relational structure! If each article conveyed exactly one fact/result, we could just keep adding to something like the PLOS API, and all(!) we'd need to argue about is which properties/facts are most important to make available.
"Oh wow thanks Melissa academic articles are complex structured webs"

Not a new idea, I know. But maybe thinking from the other end helps. If I want to get back 'stimuli' from a paper, what do I expect to get?
A set of files? A verbal description? A link? A link *and* a verbal description? - this is an open question, my point here is just that this is a *design choice*.
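One way to make that design choice explicit is to let a 'stimuli' response carry any combination of files, a link, and a verbal description, rather than picking one. A sketch, with all names invented:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StimuliResponse:
    """Hypothetical 'stimuli' payload; the design choice is which fields exist."""
    files: list[str] = field(default_factory=list)  # bundled stimulus files
    link: Optional[str] = None                      # external repository URL
    description: Optional[str] = None               # verbal description from the paper

resp = StimuliResponse(description="400 ms pure tones at 440 Hz")
```

The point isn't this particular shape; it's that whoever defines the API has to commit to *some* shape, and that commitment is where the design argument lives.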

So is the choice about how to treat the internal structure of papers. Maybe in actuality papers usually contain reasonably few sets of results, so if I ask for 'data' I can be happy just getting back all of the data, and now it's on me to decide how to use it. Or maybe I want to insist on being able to ask for the data *just* from Experiment 2 or whatever. More design choices!
Same story for hypotheses. Do I want to be able to get back just a list of H & results, or do I want to map between them? Do I need to know how hypotheses are contingent on one another, or is an unordered list fine?
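Three increasingly structured ways of handing back 'hypotheses', to make those options concrete (all names and values invented for illustration):

```python
# 1. An unordered list of hypotheses.
hypotheses = ["H1: load reduces recall", "H2: the effect grows with age"]

# 2. Hypotheses mapped to their results.
hypothesis_results = {
    "H1": {"test": "t-test", "p": 0.04, "supported": True},
    "H2": {"test": "interaction", "p": 0.21, "supported": False},
}

# 3. With dependency structure: H2 is only meaningful if H1 holds.
hypothesis_graph = {"H1": [], "H2": ["H1"]}
```

Each step up in structure costs the author more to provide and the reader less to reconstruct; that trade-off is the whole design question.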

The fact that it's *true* that papers have complex arguments makes this hard. I think we all have the intuition that "a pile of statistical tests from this paper" is not enough to characterize/understand it, and that certain things (like a sample size, t test, and p value) hang together.

But a useful API doesn't have to tell me the full structure of a paper; its job is to make clear what information I can get back, and how to ask for it (or how to add to it, if we're feeling fancy...)
I think I like this framework because it's a nice way to get back to what our open sci practices are for. It's not just 'data are available'. Someone wants that data. Who's that someone? (Maybe you in 6 months.) What do they want to do with it? What do they need to do that?
I still think the best first open science approach is 'share everything as openly as possible, as protected as necessary', and you and other humans can work out later what and how it's valuable.

The API idea helps me think about what might give the most bang for buck though.
Anyway, this thread brought to you by: @roger_p_levy sent an undergrad (who I can't find on Twitter, hi Ben!) to talk to me, the words "A journal article should have an API" came out of my mouth, and I wanted to think about how true it was.
You can follow @melissaekline.