I know this is obvious, but you can't find the errors in your data and analysis if you don't actively look for them.
Step 2: Once you are nice and comfy with scripting, try to write nicer scripts.
- Scripts should be human-readable, not just machine-readable.
- Use annotations to explain yourself and the code.
- Make the machine-readable bits as human-readable as possible...
...even if that means writing more lines of code (gasp!)
- Use a consistent style guide. http://adv-r.had.co.nz/Style.html 
- Remember, a good script is a love letter to yourself in six months' time. It needs to be *written*. Clear. Concise. Complete.
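A small illustration of "more lines, but readable", sketched in Python (the thread's links point to R, but the idea carries over). The terse version is left as a comment, followed by a version with a named constant, a named function, a docstring, and descriptive variable names. The BMI example, `body_mass_index`, and the data values are all hypothetical.

```python
# A terse version that works, but is hard to read six months later:
# r = [w / (h / 100) ** 2 for w, h in zip(d[0], d[1])]

# A more readable version of the same calculation (hypothetical example).
CM_PER_METRE = 100


def body_mass_index(weight_kg, height_cm):
    """Return BMI in kg/m^2 from weight in kilograms and height in centimetres."""
    height_m = height_cm / CM_PER_METRE
    return weight_kg / height_m ** 2


weights_kg = [70.0, 85.5, 60.2]
heights_cm = [175.0, 180.0, 165.0]

# One BMI value per person, pairing each weight with the matching height.
bmi_values = [
    body_mass_index(weight, height)
    for weight, height in zip(weights_kg, heights_cm)
]
print(bmi_values)
```

It is longer than the one-liner, but the units and the intent are visible without having to reverse-engineer them.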
Step 3: "Look" at everything.
- Visualize the variables you use. Plot raw values, summaries, and distributions.
- Plot how variables relate to one another.
- Plot dates and times to see if they are ordered as you would expect.
- Calculate differences between things (including dates and times) and look at those.
- If you transform something, plot it against the original.
- Don't just plot though - think about what the data *should* look like.
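Plots need a plotting library, but the "calculate differences between dates and look at them" idea can be sketched with Python's standard library alone. This assumes hypothetical hospital records with an admission and a discharge date; a negative length of stay means the dates are out of order, and the admission dates are checked against the order you would expect them to be in.

```python
from datetime import date

# Hypothetical records: (patient_id, admission_date, discharge_date).
records = [
    ("p01", date(2020, 1, 3), date(2020, 1, 10)),
    ("p02", date(2020, 1, 5), date(2020, 1, 4)),  # discharge before admission
    ("p03", date(2020, 1, 7), date(2020, 2, 7)),
]

# Difference between the two dates, in days; negative values are impossible stays.
length_of_stay = {
    pid: (discharge - admission).days for pid, admission, discharge in records
}
suspicious = sorted(pid for pid, days in length_of_stay.items() if days < 0)

# If the file is supposed to be in admission order, check that it really is.
admissions = [admission for _, admission, _ in records]
in_order = admissions == sorted(admissions)

print("Length of stay (days):", length_of_stay)
print("Shortest and longest stay:", min(length_of_stay.values()), max(length_of_stay.values()))
print("Discharged before admitted:", suspicious)
print("Admission dates in order:", in_order)
```

Looking at the minimum and maximum of a difference like this is a cheap way to spot values that should not exist, even before plotting them.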
Step 4: Generate finished outputs, not just numbers.
- Try to eliminate (or greatly reduce) the amount of cutting and pasting of results, plots, and tables into other documents, like Word.
- Use "literate programming" tools like R Markdown to generate (and regenerate) reports that include your "final" plots, tables, and text. https://rmarkdown.rstudio.com/ 
- So when the data changes (and the data *will* change) you just re-run the report. No re-cutting-and-pasting. Fewer chances for mistakes.
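R Markdown does this properly for R; as a rough stand-in, here is a Python sketch of the same principle: the numbers in the report are computed from the data every time the script runs, never typed in by hand. The `render_report` function, the Markdown layout, and the `ages` data are hypothetical.

```python
import statistics


def render_report(ages):
    """Build a small Markdown report directly from the raw data."""
    lines = [
        "# Study summary",
        "",
        f"Participants: {len(ages)}",
        "",
        "| Statistic | Age (years) |",
        "|---|---|",
        f"| Mean | {statistics.mean(ages):.1f} |",
        f"| Median | {statistics.median(ages):.1f} |",
        f"| Min | {min(ages)} |",
        f"| Max | {max(ages)} |",
    ]
    return "\n".join(lines)


ages = [34, 51, 47, 29, 62]  # hypothetical data; re-run when it changes
report = render_report(ages)
print(report)
```

In practice you would write `report` to a file; when a new row arrives, re-running the script updates every number in the table at once.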
Step 6: Aspire to never repeat yourself (NRY).
- Every new keystroke is an opportunity to make an error.
- When you find yourself writing the same block of code over and over, write a little program (a function) that does it in a few keystrokes.
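A minimal Python sketch of turning a copy-pasted cleaning block into a function. It assumes, hypothetically, that several columns arrive as text with stray whitespace and the codes `""`, `"NA"`, and `"-99"` meaning "missing"; the column names and `clean_numeric` are made up for illustration.

```python
MISSING_CODES = {"", "NA", "-99"}  # hypothetical missing-value codes


def clean_numeric(values):
    """Strip whitespace, map missing-value codes to None, and convert the rest to float."""
    cleaned = []
    for raw in values:
        text = raw.strip()
        cleaned.append(None if text in MISSING_CODES else float(text))
    return cleaned


raw_data = {
    "weight_kg": [" 70.5", "NA", "82"],
    "height_cm": ["175", "-99 ", "180.0"],
}

# One call per column instead of a copy-pasted block per column.
clean_data = {column: clean_numeric(values) for column, values in raw_data.items()}
print(clean_data)
```

If the missing-value rules change, there is now exactly one place to fix them, instead of one per pasted copy.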
Step 7: Write tests into your scripts.
- This is the step I am trying to learn more about.
- Your computer will often happily give you the "wrong" results if you give it the wrong inputs.
- So build tests into your code to check that those inputs are correct.
- A simple example: You have 100 patients in your study. That means your dataset probably has 100 rows. So insert a bit of code that checks that this is true before running a model with those data.
- Or every time your code doesn't work and you eventually figure out why, write a little test to check for the underlying problem first.
- This is standard practice among programmers (I am not one of those), but I think the ideas translate very nicely to analysis scripts. https://r-pkgs.org/tests.html 
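The 100-patient example can be sketched as an input check that runs before any modelling. This is Python rather than R (in base R, `stopifnot()` plays a similar role). The duplicate-ID check stands in for the "write a test for the bug you just found" habit; `check_inputs`, `EXPECTED_PATIENTS`, the duplicated-merge bug, and `fit_model` are all hypothetical.

```python
EXPECTED_PATIENTS = 100  # hypothetical: the number of patients in the study


def check_inputs(rows, expected_n=EXPECTED_PATIENTS):
    """Raise an error if the data do not look the way the analysis assumes."""
    if len(rows) != expected_n:
        raise ValueError(f"expected {expected_n} rows, found {len(rows)}")
    ids = [row["patient_id"] for row in rows]
    # Added after a (hypothetical) past bug: a bad merge duplicated some patients.
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate patient_id values found")
    return True


rows = [{"patient_id": i, "age": 40 + i % 30} for i in range(EXPECTED_PATIENTS)]
check_inputs(rows)  # stops the script with an error if either check fails

# fit_model(rows)  # hypothetical modelling step, only reached if the checks pass
```

The point is that the script fails loudly at the check, rather than quietly fitting a model to the wrong data.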
Final points:
- If you are paid to analyze data, this is what professionalism looks like.
- If the work you do matters, take it seriously. Good science requires meticulous technique, not just genius ideas.
- You can't learn all of this at once. But you can learn it, no doubt.
*This thread was brought to you by mild procrastination.*
You can follow @statsepi.