
Mo' data, mo' problems: the rise of increasingly sophisticated tools (data warehousing, data ingestion, transformation, etc.) has empowered modern data teams to work with large, complex, and disparate datasets.
1/

Large enterprises that lack a centralized data catalog find it challenging to:
- harness and curate massive data loads
- understand the provenance of the metadata on which their reports are built
- derive meaningful insights from the data
- trust the data
2/

To date, many tools (open source and otherwise) have emerged in this space from @NetflixOSS, @AirbnbData, @UberEng, @lyfteng, @LinkedInEng, @MarquezProject, and more recently, @googlecloud.
3/

Next-gen tools will not only serve as the foundational framework for data governance but will also:
- improve internal operational efficiency
- promote transparency and fairness
- provide users of all skill levels access to the data they need, when they need it.
4/

I continue to care a lot about the broader metadata management / data catalog space, and if you're an enterprise startup tackling this problem, I'd love to chat!
fin.