A thread about choosing good data science projects, inspired by our method at @SharpestMindsAI
TL;DR:
- build products, not projects
- have a clear "win" condition
TL;DR:
- build products, not projects
- have a clear "win" condition

You can get hired with a single, high-quality, end-to-end project.
You can’t get hired with 6 jupyter notebooks that use pandas and sklearn to wrangle datasets you found on Kaggle.
Quantity makes you blend in.
Quality makes you stand out.
1/n
You can’t get hired with 6 jupyter notebooks that use pandas and sklearn to wrangle datasets you found on Kaggle.
Quantity makes you blend in.
Quality makes you stand out.
1/n
So, how do you build a quality project?
----> Build a *product* <----
2/n
----> Build a *product* <----
2/n
Build something that you could imagine being useful (ideally, to yourself)
This makes your project easier & more compelling to explain during interviews.
But there’s a much more important reason to build something useful:
**it’s what companies need you to do**
3/n
This makes your project easier & more compelling to explain during interviews.
But there’s a much more important reason to build something useful:
**it’s what companies need you to do**
3/n
Most people build open-ended projects by trying to optimize a metric (accuracy, AUC, etc). That’s a terrible idea.
Real world DS doesn’t work that way. Life is not a Kaggle competition
Instead of optimizing a metric all day, start by defining your project’s “win” condition
4/n
Real world DS doesn’t work that way. Life is not a Kaggle competition
Instead of optimizing a metric all day, start by defining your project’s “win” condition
4/n
How?
Suppose you’re building a fashion recommender system. Does it matter if your performance metric is 1% higher? Will users be able to tell? Probably not
So don’t make your win condition 0.99 AUC. You could add more value doing something besides obsessing over your model
5/n
Suppose you’re building a fashion recommender system. Does it matter if your performance metric is 1% higher? Will users be able to tell? Probably not
So don’t make your win condition 0.99 AUC. You could add more value doing something besides obsessing over your model
5/n
But suppose instead that you’re building a financial analytics tool. That 1% could mean millions of $.
Every % of performance is incredibly valuable. Definitely worth optimizing.
6/n
Every % of performance is incredibly valuable. Definitely worth optimizing.
6/n
Companies *really* need you to be able to define win conditions for their problem.
Otherwise you end up wasting your time and their money optimizing stuff that doesn’t need optimizing.
And not doing stuff that needs doing.
7/n
Otherwise you end up wasting your time and their money optimizing stuff that doesn’t need optimizing.
And not doing stuff that needs doing.
7/n
So an easy way to stand out is to design your project as a product, with a clear use case. Show that you can define sensible win conditions.
Companies need you to know how to do that. So they’ll be looking for hints that you can.
Give yourself the chance to tick that box.
fin/
Companies need you to know how to do that. So they’ll be looking for hints that you can.
Give yourself the chance to tick that box.
fin/