Kicking around different ways of thinking about scientific infrastructure:
* A journal article is a body of structured knowledge.
* It's used by getting the info you need back out of it.
* Let's think about open science practices as improving the API of journal articles.
Fields you can query from PLOS articles:

http://api.plos.org/solr/search-fields/

I was pleasantly surprised that you can get, e.g., just the methods (I have no opinion on how usable it is; maybe @daoteajing does?)
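To make that concrete, here's a minimal sketch of building a PLOS Search API request that asks for only the methods text. The endpoint and the `materials_and_methods` field name are my reading of the search-fields list linked above, not something I've verified end to end — check that list before relying on them.

```python
from urllib.parse import urlencode

# Base endpoint for the PLOS Search API (assumed from the linked docs).
BASE = "http://api.plos.org/search"

def methods_query(term: str) -> str:
    """Build a request URL asking only for id, title, and methods text."""
    params = {
        "q": f'everything:"{term}"',             # full-text match on the term
        "fl": "id,title,materials_and_methods",  # fields to return
        "wt": "json",                            # response format
    }
    return f"{BASE}?{urlencode(params)}"

url = methods_query("working memory")
```

The nice thing about the `fl` parameter is that it *is* the "API of the article" idea in miniature: you name the parts you want back.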

What else do we want to 'get back' from articles in addition to special paragraphs?
Hypotheses, datasets, analysis specifications, stimuli, etc.?
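Just to make the wish list tangible: here's a purely hypothetical record type for what an article "API response" could expose beyond special paragraphs. Every field name below is an invention for illustration, not an existing standard.

```python
from dataclasses import dataclass, field

@dataclass
class ArticleRecord:
    """Hypothetical structured view of one article (all field names invented)."""
    doi: str
    hypotheses: list[str] = field(default_factory=list)      # stated predictions
    datasets: list[str] = field(default_factory=list)        # e.g. repository URLs
    analysis_specs: list[str] = field(default_factory=list)  # scripts, prereg links
    stimuli: list[str] = field(default_factory=list)         # files or descriptions

record = ArticleRecord(doi="example-doi")
```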

(Note - this is just coming at some of the big metadata projects @aeryn_thrace @lakens @LisaDeBruine from the other direction, starting with what you want to ask the paper for, instead of a complete map of the paper)
One problem you run into pretty quickly is internal, relational structure! If each article conveyed exactly one fact/result, we could just keep adding to something like the PLOS API, and all(!) we'd need to argue about is which properties/facts are most important to make available.
"Oh wow thanks Melissa academic articles are complex structured webs"

Not a new idea, I know. But maybe thinking from the other end helps. If I want to get back 'stimuli' from a paper, what do I expect to get?
A set of files? A verbal description? A link? A link *and* a verbal description? - this is an open question, my point here is just that this is a *design choice*.
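One way to make that design choice explicit is to let a 'stimuli' response carry any combination of files, a link, and a verbal description, rather than picking one. A sketch, with all names invented:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StimuliResponse:
    """Hypothetical 'stimuli' payload; the design choice is which fields exist."""
    files: list[str] = field(default_factory=list)  # bundled stimulus files
    link: Optional[str] = None                      # external repository URL
    description: Optional[str] = None               # verbal description from the paper

resp = StimuliResponse(description="400 ms pure tones at 440 Hz")
```

The point isn't this particular shape; it's that whoever defines the API has to commit to *some* shape, and that commitment is where the design argument lives.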

So is the choice about how to treat the internal structure of papers. Maybe in actuality papers usually contain reasonably few sets of results, so if I ask for 'data' I can be happy just getting back all of the data, and now it's on me to decide how to use it. Or maybe I want to insist on being able to ask for the data *just* from Experiment 2 or whatever. More design choices!
Same story for hypotheses. Do I want to be able to get back just a list of H & results, or do I want to map between them? Do I need to know how hypotheses are contingent on one another, or is an unordered list fine?
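Three increasingly structured ways of handing back 'hypotheses', to make those options concrete (all names and values invented for illustration):

```python
# 1. An unordered list of hypotheses.
hypotheses = ["H1: load reduces recall", "H2: the effect grows with age"]

# 2. Hypotheses mapped to their results.
hypothesis_results = {
    "H1": {"test": "t-test", "p": 0.04, "supported": True},
    "H2": {"test": "interaction", "p": 0.21, "supported": False},
}

# 3. With dependency structure: H2 is only meaningful if H1 holds.
hypothesis_graph = {"H1": [], "H2": ["H1"]}
```

Each step up in structure costs the author more to provide and the reader less to reconstruct; that trade-off is the whole design question.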

The fact that it's *true* that papers have complex arguments makes this hard. I think we all have the intuition that "a pile of statistical tests from this paper" is not enough to characterize/understand it, and that certain things (like a sample size, t test, and p value) hang together.

But a useful API doesn't have to tell me the full structure of a paper; its job is to make clear what information I can get back, and how to ask for it (or how to add to it, if we're feeling fancy...)
I think I like this framework because it's a nice way to get back to what our open sci practices are for. It's not just 'data are available'. Someone wants that data. Who's that someone? (Maybe you in 6 months.) What do they want to do with it? What do they need to do that?
I still think the best first open science approach is 'share everything as openly as possible, as protected as necessary', and you and other humans can work out later what and how it's valuable.

The API idea helps me think about what might give the most bang for buck though.
Anyway, this thread brought to you by: @roger_p_levy sent an undergrad (who I can't find on Twitter, hi Ben!) to talk to me, the words "A journal article should have an API" came out of my mouth, and I wanted to think about how true it was.
You can follow @melissaekline.