The good part of this idea is that, with the right platform, you can vibe your way towards a probably-working model locally, make it presentable, and then 🚢. The problem is, uh, everything else. https://twitter.com/jfhbrook/status/1314747331867291650
papermill gives you headless execution of a notebook and a cheap way to inject parameters - good for airflow - but you also need tools to pull datasets and do training, tuning, and testing, and they need to do sensible things in their respective environments automatically
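for reference, the papermill end of that really is about this simple - a sketch where the notebook and output filenames are made up:

```python
import papermill as pm

# execute the notebook headlessly, injecting values into its cell
# tagged "parameters"; papermill writes out the executed copy
pm.execute_notebook(
    "train.ipynb",           # hypothetical input notebook
    "train-output.ipynb",    # executed copy, outputs baked in
    parameters={"env": "prod", "sample_fraction": 1.0},
)
```

everything else - dataset pulls, training, tuning, reporting - is on you.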
for instance, when experimenting you might want the model to pull a fraction of the dataset into memory, but have it run against a gigantic dask cluster in production; and you're probably happy to keep test results in objects locally but might want them in a database in production
the only way this is really workable is to expose a custom framework in jupyter and make your abstractions switch underlying implementations based on environment variables, but that sounds like a huge pain in the ass versus just making people dump their model into a library
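a minimal sketch of what that switching might look like - `MODEL_ENV`, `load_dataset`, and the paths are all made up for illustration:

```python
import os

import pandas as pd

def load_dataset(path: str, sample_fraction: float = 0.01):
    """load training data, switching backends on a (hypothetical) MODEL_ENV var"""
    env = os.environ.get("MODEL_ENV", "dev")
    if env == "prod":
        # in production, read the full dataset lazily on the dask cluster
        # (assumes a dask distributed Client is already connected)
        import dask.dataframe as dd
        return dd.read_parquet(path)
    # locally, pull a small random sample into plain pandas; a real version
    # would sample at the source instead of reading everything first
    return pd.read_parquet(path).sample(frac=sample_fraction)
```

the same trick covers the results side - pickle to disk in dev, write to a database in prod - but every one of these switches is more framework you have to build and document.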
hyperparameter tuning makes this even more complicated. you basically don't want to do autoML when fiddling around because it blows up runtime by an order of magnitude. the TPOT docs say that it can take literally days for it to do its job!
and TPOT doesn't know shit about papermill - what it knows is sklearn pipeline objects - so lmao if you want to do test runs at that level of abstraction.
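so in practice you end up gating the autoML behind the same env switch - a sketch, reusing the hypothetical MODEL_ENV from above; max_time_mins is a real TPOT knob for capping the search:

```python
import os

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def build_pipeline(X_train, y_train):
    env = os.environ.get("MODEL_ENV", "dev")  # hypothetical env var, as above
    if env == "prod":
        # only pay for the autoML search in production runs
        from tpot import TPOTClassifier
        tpot = TPOTClassifier(
            generations=5,
            population_size=20,
            max_time_mins=60,   # cap the search so it can't run for days
            random_state=42,
        )
        tpot.fit(X_train, y_train)
        return tpot.fitted_pipeline_  # a plain sklearn Pipeline
    # in dev, skip the search entirely and use a fixed, fast pipeline
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    return pipeline.fit(X_train, y_train)
```

either way you hand back a plain sklearn pipeline, so the downstream notebook code doesn't have to care which path produced it.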
It's not a total lost cause though - you just have to find a balance between how much platform you build and how much inflexibility you accept. Even with these issues you'd still be able to use notebooks as the source of truth for models and get automatic runtime reports, which is pretty cool