I know this is obvious, but you can't find the errors in your data and analysis if you don't actively look for them.
A progression:
Step 1: Script everything. No point and click, ever. This is by far the most important step. I like R and @rstudio. You can do it. https://rstudio.cloud/learn/primers || https://education.rstudio.com/learn/beginner/
Step 2: Once you are nice and comfy with scripting, try to write nicer scripts.
- Scripts should be human-readable, not just machine-readable.
- Use annotations to explain yourself and the code.
- Make the machine-readable bits as human-readable as possible...
...even if that means writing more lines of code (gasp!)
- Use a consistent style guide. http://adv-r.had.co.nz/Style.html
- Remember, a good script is a love letter to yourself in six months' time. It needs to be *written*. Clear. Concise. Complete.
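A minimal sketch of the difference, written in Python rather than the R the thread recommends, using a hypothetical BMI calculation and made-up records. Both functions return the same numbers, but only the second explains itself.

```python
# Terse version: works, but hard to decode six months from now.
def f(d):
    return {a: round(b / (c / 100) ** 2, 1) for a, b, c in d}


# Readable version: more lines, descriptive names, and comments.
def bmi_by_patient(records):
    """Compute body-mass index (BMI) for each patient.

    records: iterable of (patient_id, weight_kg, height_cm) tuples.
    Returns a dict mapping patient_id -> BMI rounded to one decimal place.
    """
    bmi = {}
    for patient_id, weight_kg, height_cm in records:
        # BMI uses height in metres, so convert from centimetres first.
        height_m = height_cm / 100
        bmi[patient_id] = round(weight_kg / height_m ** 2, 1)
    return bmi


# Hypothetical example records.
records = [("p01", 70, 175), ("p02", 82, 180)]
assert f(records) == bmi_by_patient(records)
print(bmi_by_patient(records))
```

The comment about converting centimetres to metres records *why* the code does something, which the code alone cannot tell you.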
Step 3: "Look" at everything.
- Visualize the variables you use. Plot raw values, summaries, and distributions.
- Plot how variables relate to one another.
- Plot dates and times to see if they are ordered as you would expect.
- Calculate differences between things (including dates and times) and look at those.
- If you transform something, plot it against the original.
- Don't just plot though - think about what the data *should* look like.
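Plotting needs a graphics library, but the "calculate differences and look at them" part can be sketched with the standard library alone. The Python below (the thread itself uses R) computes the gap between consecutive visit dates and flags any that go backwards or exceed a threshold. The visit dates and the 365-day cutoff are hypothetical assumptions, not rules.

```python
from datetime import date


def date_gaps(dates):
    """Return the number of days between each consecutive pair of dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


def flag_suspicious_gaps(dates, max_gap_days=365):
    """Return positions i where the step from dates[i] to dates[i + 1]
    goes backwards in time or is longer than max_gap_days."""
    return [
        i
        for i, gap in enumerate(date_gaps(dates))
        if gap < 0 or gap > max_gap_days
    ]


# Hypothetical visit dates for one patient, in the order they were recorded.
visits = [date(2020, 1, 5), date(2020, 2, 3), date(2020, 1, 30), date(2023, 6, 1)]
print(date_gaps(visits))  # [29, -4, 1218]
print(flag_suspicious_gaps(visits))  # [1, 2]
```

Position 1 is a visit recorded after 3 February but dated 30 January; position 2 is a gap of more than three years. Either could be real, but both deserve a look (and a plot).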
Step 4: Generate finished outputs, not just numbers.
- Try to eliminate (or greatly reduce) the amount of cutting and pasting of results, plots, and tables into other documents, like Word.
- Use "literate programming" tools like R Markdown to generate (and re-generate) reports that include your "final" plots, tables, and text. https://rmarkdown.rstudio.com/
- So when the data changes (and the data *will* change) you just re-run the report. No re-cutting-and-pasting. Fewer chances for mistakes.
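R Markdown weaves code and prose into one document. As a rough stand-in for that idea, this Python sketch builds the report text directly from the data, so re-running it after the data change regenerates every number with no copying and pasting. The function name, the ages, and the report layout are all hypothetical.

```python
import statistics


def build_report(ages):
    """Build a plain-text study summary directly from the data.

    Every number is computed here, so re-running the script after the
    data change updates the report with no manual copying.
    """
    lines = [
        "Study summary",
        "=============",
        f"Participants: {len(ages)}",
        f"Mean age: {statistics.mean(ages):.1f} years",
        f"Age range: {min(ages)} to {max(ages)} years",
    ]
    return "\n".join(lines)


# Hypothetical ages; in a real project these would be read from the data file.
print(build_report([34, 51, 47, 29, 62]))
```

R Markdown goes further: it embeds the plots and tables too, and renders the whole thing to Word, HTML, or PDF.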
Step 5: Make your code *sharable* - and maybe even share it!
- You've written all that nice code. It would be a shame not to let other people look. They probably won't, but the very "threat" of it can be a powerful motivator to use best practices.
https://statsepi.substack.com/p/open-science-is-really-scary-yall

Step 6: Aspire to never repeat yourself (NRY). Programmers call this DRY: "don't repeat yourself."
- Every new keystroke is an opportunity to make an error.
- When you find yourself writing the same block of code over and over, write a little program (function) that does it in a few keystrokes.
- Save those little functions in separate scripts that you can reuse across many different analysis projects.
- It's easier than you think! https://www.earthdatascience.org/courses/earth-analytics/automate-science-workflows/write-efficient-code-for-science-r/
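A sketch of replacing a copy-pasted summary block with one small function, in Python rather than R, using hypothetical variable names and values. Missing values are assumed to be stored as None.

```python
import statistics


def summarize(values):
    """Summarize one numeric variable, ignoring missing values (None)."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": round(statistics.mean(present), 2) if present else None,
        "min": min(present, default=None),
        "max": max(present, default=None),
    }


# Hypothetical dataset: one list of values per variable.
data = {
    "age": [34, 51, None, 29],
    "weight_kg": [70.2, 88.5, 64.0, None],
}

# One loop instead of a copy-pasted block per variable.
for name, values in data.items():
    print(name, summarize(values))
```

Because every variable goes through the same function, a fix made once (say, to how missing values are counted) applies everywhere.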
Step 7: Write tests into your script
- This is the step I am trying to learn more about.
- Your computer will often happily give you the "wrong" results if you give it the wrong inputs.
- So build tests into your code to check that those inputs are correct.
- A simple example: You have 100 patients in your study. That means your dataset probably has 100 rows. So insert a bit of code that checks that this is true before running a model with those data.
- Or every time your code doesn't work and you eventually figure out why, write a little test to check for the underlying problem first.
- This is something that is often used by programmers (I am not one of those), but I think the ideas translate very nicely to analysis scripts. https://r-pkgs.org/tests.html
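The 100-patient check, sketched in Python with hypothetical column names (patient_id, age) and an assumed plausible age range of 18 to 110. In R, base stopifnot() does the same job in a line, and the r-pkgs chapter linked above covers the testthat package for fuller tests.

```python
def check_patient_data(rows, expected_n=100):
    """Stop with an informative error if the input data look wrong.

    rows is assumed to be a list of dicts with hypothetical keys
    "patient_id" and "age"; adapt the checks to your own dataset.
    """
    # One row per patient: the study enrolled expected_n people.
    if len(rows) != expected_n:
        raise ValueError(f"expected {expected_n} rows, found {len(rows)}")
    # Duplicated IDs usually mean a bad merge or a double import.
    ids = [row["patient_id"] for row in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate patient_id values found")
    # Hypothetical check added after tracking down a past bug:
    # ages must fall in an assumed plausible adult range.
    bad = [row["patient_id"] for row in rows if not 18 <= row["age"] <= 110]
    if bad:
        raise ValueError(f"implausible ages for patients: {bad}")


# Hypothetical, well-formed input: 100 patients, unique IDs, plausible ages.
patients = [{"patient_id": i, "age": 40 + i % 30} for i in range(100)]
check_patient_data(patients)  # passes silently

# Losing a row (say, in a failed import) now stops the analysis early.
try:
    check_patient_data(patients[:99])
except ValueError as err:
    print("Check failed:", err)
```

Raising an exception, rather than using Python's assert statement, keeps the checks active even when Python runs with the -O flag, which strips assert statements.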
Final points:
- If you are paid to analyze data, this is what professionalism looks like.
- If the work you do matters, take it seriously. Good science requires meticulous technique, not just genius ideas.
- You can't learn all of this at once. But you can learn it, no doubt.
*This thread was brought to you by mild procrastination