Thread by @statsepi, I know this is obvious, but you can't find the errors in [...]

I know this is obvious, but you can't find the errors in your data and analysis if you don't actively look for them.

A progression:

Step 1: Script everything. No point and click, ever. This is by far the most important step. I like R and @rstudio. You can do it. https://rstudio.cloud/learn/primers || https://education.rstudio.com/learn/beginner/

Step 2: Once you are nice and comfy with scripting, try to write nicer scripts.
- Scripts should be human-readable, not just machine-readable.
- Use annotations to explain yourself and the code.
- Make the machine-readable bits as human-readable as possible, even if that means writing more lines of code (gasp!)
- Use a consistent style guide. http://adv-r.had.co.nz/Style.html
- Remember, a good script is a love letter to yourself in 6 months' time. It needs to be *written*. Clear. Concise. Complete.

Step 3: "Look" at everything.
- Visualize the variables you use. Plot raw values, summaries, and distributions.
- Plot how variables relate to one another.
- Plot dates and times to see if they are ordered as you would expect.
- Calculate differences between things (including dates and times) and look at those.
- If you transform something, plot it against the original.
- Don't just plot though - think about what the data *should* look like.
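
A few of these checks might look like this in base R. This is only a sketch: the data frame `visits` and its columns `visit_date` and `weight_kg` are made up for illustration, and the plots go to a temporary PDF so the script runs the same way every time.

```r
# Hypothetical example data, invented for illustration only.
visits <- data.frame(
  visit_date = as.Date(c("2020-01-06", "2020-01-13", "2020-01-20",
                         "2020-01-17", "2020-02-03")),
  weight_kg  = c(71.2, 70.8, 70.5, 7.1, 69.9)
)

# Send every plot to a file rather than to the screen.
pdf(file.path(tempdir(), "look_at_everything.pdf"))

# Raw values and their distribution.
plot(visits$weight_kg, main = "Raw weights, in row order")
hist(visits$weight_kg, main = "Distribution of weights")

# Dates in row order: a visit out of sequence shows up as a dip.
plot(seq_along(visits$visit_date), visits$visit_date, type = "b",
     xlab = "Row", ylab = "Visit date", main = "Visit dates, in row order")

# Differences between consecutive dates, in days.
gap_days <- as.numeric(diff(visits$visit_date))
plot(gap_days, type = "h", main = "Days between consecutive visits")
abline(h = 0, lty = 2)

# A transformed variable plotted against the original.
visits$log_weight <- log(visits$weight_kg)
plot(visits$weight_kg, visits$log_weight,
     main = "log(weight) against weight")

invisible(dev.off())

# What the data *should* look like: dates that never go backwards.
out_of_order <- which(gap_days < 0)
```

In this invented data the fourth visit is dated before the third, and one weight looks like a misplaced decimal point. Both are the kind of thing a plot makes hard to miss.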

Step 4: Generate finished outputs, not just numbers.
- Try to eliminate (or greatly reduce) the cutting and pasting of results, plots, and tables into other documents, like Word.
- Use "literate programming" tools like R Markdown to generate (and regenerate) reports that include your "final" plots, tables, and text. https://rmarkdown.rstudio.com/
- So when the data changes (and the data *will* change) you just re-run the report. No re-cutting-and-pasting. Fewer chances for mistakes.
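
The core idea is that every number in the text is computed by code, never typed or pasted by hand. Here is a minimal base R sketch of that idea, not of R Markdown itself. The data frame `trial` and its columns `arm` and `sbp` are made up for illustration; in an R Markdown report the same computed values would sit directly in the text rather than going through `sprintf()`.

```r
# Hypothetical data, invented for illustration only.
trial <- data.frame(
  arm = c("control", "control", "control", "treated", "treated", "treated"),
  sbp = c(142, 138, 150, 131, 129, 136)
)

# Compute the summaries once, from the data.
n_patients  <- nrow(trial)
mean_by_arm <- tapply(trial$sbp, trial$arm, mean)

# Build the sentence from the computed values, not from typed-in numbers.
results_text <- sprintf(
  "We analyzed %d patients. Mean SBP was %.1f mmHg (control) and %.1f mmHg (treated).",
  n_patients, mean_by_arm[["control"]], mean_by_arm[["treated"]]
)

# Write the text to a file; re-running the script regenerates it.
report_file <- file.path(tempdir(), "results.txt")
writeLines(results_text, report_file)
```

If a patient is added or a value corrected, the sentence updates itself the next time the script runs.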

Step 5: Make your code *sharable* - and maybe even share it!
- You've written all that nice code. It would be a shame not to let other people look. They probably won't, but the very "threat" of it can be a powerful motivator to use best practices. 😉 https://statsepi.substack.com/p/open-science-is-really-scary-yall

Step 6: Aspire to never repeat yourself (NRY).
- Every new keystroke is an opportunity to make an error.
- When you find yourself writing the same block of code over and over, write a little program (function) that does it with a few keystrokes.
- Save those little functions in separate scripts that you might use across many different analysis projects.
- It's easier than you think! https://www.earthdatascience.org/courses/earth-analytics/automate-science-workflows/write-efficient-code-for-science-r/
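
As a small illustration, here is a calculation that tends to get copied around, turned into a function. The name `calc_bmi` and its arguments are hypothetical, chosen only for this sketch.

```r
# Repeated by hand, each copy is a chance for a typo:
#   bmi_a <- study_a$weight_kg / (study_a$height_cm / 100)^2
#   bmi_b <- study_b$weight_kg / (study_b$height_cm / 100)^2

# Written once, as a function (hypothetical name and arguments).
calc_bmi <- function(weight_kg, height_cm) {
  height_m <- height_cm / 100   # convert centimeters to meters
  weight_kg / height_m^2        # body mass index in kg/m^2
}

# The same few keystrokes now work for any dataset.
bmi_example <- calc_bmi(weight_kg = c(70, 81), height_cm = c(175, 180))
```

Saved in its own file (say, a hypothetical `R/helpers.R`), the function can be loaded into any project with `source("R/helpers.R")`.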

Step 7: Write tests into your script
- This is the step I am trying to learn more about.
- It is often true that your computer will happily give you the "wrong" results if you give it the wrong inputs.
- So build tests into your code to check that those inputs are correct.
- A simple example: You have 100 patients in your study. That means your dataset probably has 100 rows. So insert a bit of code that checks that this is true before running a model with those data.
- Or every time your code doesn't work and you eventually figure out why, write a little test to check for the underlying problem first.
- This is something that is often used by programmers (I am not one of those), but I think the ideas translate very nicely to analysis scripts. https://r-pkgs.org/tests.html
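
The 100-patient example could be sketched like this with base R's `stopifnot()`, which halts the script when a condition is false. It is a simpler stand-in for the package-oriented testing tools described at the link above. The data frame `patients`, its columns, and the helper `check_patients` are all made up for illustration.

```r
# Hypothetical study data: 100 patients expected, one row per patient.
patients <- data.frame(
  id  = seq_len(100),
  age = rep(c(34, 51, 67, 45), times = 25)
)

# Hypothetical helper: stop with an error if the inputs are not as expected.
check_patients <- function(dat, n_expected) {
  stopifnot(
    nrow(dat) == n_expected,        # one row per patient
    anyDuplicated(dat$id) == 0,     # no patient appears twice
    !anyNA(dat$age)                 # no missing ages
  )
  invisible(dat)
}

# Run the checks first; the model is only fitted if they all pass.
check_patients(patients, n_expected = 100)
fit <- lm(age ~ 1, data = patients)
```

If a row goes missing upstream, the script stops at `check_patients()` instead of quietly fitting the model to 99 patients.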

Final points:
- If you are paid to analyze data, this is what professionalism looks like.
- If the work you do matters, take it seriously. Good science requires meticulous technique, not just genius ideas.
- You can't learn all of this at once. But you can learn it, no doubt.

*This thread was brought to you by mild procrastination