I know this is obvious, but you can't find the errors in your data and analysis if you don't actively look for them.
A progression:

Step 1: Script everything. No point and click, ever. This is by far the most important step. I like R and @rstudio. You can do it. https://rstudio.cloud/learn/primers || https://education.rstudio.com/learn/beginner/
Step 2: Once you are nice and comfy with scripting, try to write nicer scripts.
- Scripts should be human-readable, not just machine readable.
- Use annotations to explain yourself and the code.
- Make the machine-readable bits as human-readable as possible...
...even if that means writing more lines of code (gasp!)
- Use a consistent style guide. http://adv-r.had.co.nz/Style.html
- Remember, a good script is a love letter to yourself in 6 months' time. It needs to be *written*. Clear. Concise. Complete.
Step 3: "Look" at everything.
- Visualize the variables you use. Plot raw values, summaries, and distributions.
- Plot how variables relate to one another.
- Plot dates and times to see if they are ordered as you would expect.
- Calculate differences between things (including dates and times) and look at those.
- If you transform something, plot it against the original.
- Don't just plot though - think about what the data *should* look like.
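The "calculate differences and look at those" idea can be sketched in a few lines. The thread recommends R; this sketch uses Python purely for illustration, and the visit dates and the `day_gaps` helper are made up for the example, not taken from any real dataset.

```python
from datetime import date

# Hypothetical visit dates for one patient (invented for illustration).
visits = [date(2021, 1, 4), date(2021, 2, 1), date(2021, 1, 25), date(2021, 3, 1)]


def day_gaps(dates):
    """Return the number of days between each consecutive pair of dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


gaps = day_gaps(visits)

# A negative gap means the later entry is dated before the one preceding it.
out_of_order = [i + 1 for i, gap in enumerate(gaps) if gap < 0]

print("gaps in days:", gaps)
print("positions that go backwards in time:", out_of_order)
```

Looking at the gaps rather than the raw dates makes a misordered or mistyped date stand out as a negative or implausibly large number.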
Step 4: Generate finished outputs, not just numbers.
- Try to eliminate (or greatly reduce) the cutting and pasting of results, plots, and tables into other documents, like Word.
- Use "literate programming" tools like R Markdown to generate (and re-generate) reports that include your "final" plots, tables, and text. https://rmarkdown.rstudio.com/
- So when the data changes (and the data *will* change) you just re-run the report. No re-cutting-and-pasting. Fewer chances for mistakes.
Step 5: Make your code *sharable* - and maybe even share it!
- You've written all that nice code. It would be a shame not to let other people look. They probably won't, but the very "threat" of it can be a powerful motivator to use best practices. 😉 https://statsepi.substack.com/p/open-science-is-really-scary-yall
Step 6: Aspire to never repeat yourself (NRY).
- Every new keystroke is an opportunity to make an error.
- When you find yourself writing the same block of code over and over, write a little program (function) that does it with a few keystrokes.
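As a rough illustration of replacing a repeated block with a function (again in Python rather than the R the thread recommends), the sketch below wraps a set of summary statistics in one hypothetical helper, `summarize`, and applies it to two invented variables instead of retyping the same lines for each.

```python
from statistics import mean, median


def summarize(values):
    """Return n, mean, median, min, and max for a list of numbers."""
    if not values:
        raise ValueError("cannot summarize an empty list")
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical measurements (invented for illustration).
age = [34, 51, 47, 62, 29]
sbp = [118, 135, 142, 127, 121]

# One call per variable, instead of five copied-and-edited lines per variable.
for name, values in {"age": age, "sbp": sbp}.items():
    print(name, summarize(values))
```

If the summary ever needs to change, say to add a count of missing values, it changes in one place.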
Step 7: Write tests into your script.
- This is the step I am trying to learn more about.
- It is often true that your computer will happily give you the "wrong" results if you give it the wrong inputs.
- So build tests into your code to check that those inputs are correct.
- A simple example: You have 100 patients in your study. That means your dataset probably has 100 rows. So insert a bit of code that checks that this is true before running a model with those data.
- Or every time your code doesn't work and you eventually figure out why, write a little test to check for the underlying problem first.
- This is something that is often used by programmers (I am not one of those), but I think the ideas translate very nicely to analysis scripts. https://r-pkgs.org/tests.html
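The 100-patients example might look something like the sketch below. It is written in Python for illustration (in R, base `stopifnot()` serves a similar purpose); the `check_patient_rows` helper, the `patient_id` field, and the data are all assumptions made up for the example.

```python
def check_patient_rows(rows, expected_n):
    """Raise an error unless there is exactly one row per expected patient."""
    if len(rows) != expected_n:
        raise ValueError(f"expected {expected_n} rows, found {len(rows)}")
    ids = [row["patient_id"] for row in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate patient_id values found")
    return True


# Hypothetical dataset: one record per patient (invented for illustration).
patients = [{"patient_id": i, "age": 40 + i % 30} for i in range(1, 101)]

# Run the check first; the script stops here with an error if the input is wrong.
check_patient_rows(patients, expected_n=100)

# ... only after the check passes would the model be fitted on `patients`.
```

The duplicate-ID check is an extra assumption beyond the row count: 100 rows could still be 99 patients with one entered twice.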
Final points:
- If you are paid to analyze data, this is what professionalism looks like.
- If the work you do matters, take it seriously. Good science requires meticulous technique, not just genius ideas.
- You can& #39;t learn all of this at once. But you can learn it, no doubt.
*This thread was brought to you by mild procrastination
You can follow @statsepi.