Check out my new post on the evolution of data discovery and metadata management https://medium.com/work-bench/work-bench-snapshot-the-evolution-of-data-discovery-catalog-2f6c0425616b
Mo' data, mo' problems: the rise of increasingly sophisticated tools (data warehousing, data ingestion, transformation, etc.) has empowered modern data teams to work with large, complex, and disparate datasets.
1/
Large enterprises that lack a centralized data catalog find it challenging to:
- harness and curate massive data loads
- understand the provenance of the metadata their reports are built on
- derive meaningful insights from the data
- trust the data
2/
To date, many tools (open source and otherwise) have emerged in this space from @NetflixOSS, @AirbnbData, @UberEng, @lyfteng, @LinkedInEng, @MarquezProject, and, more recently, @googlecloud.
3/
Next-gen tools will not only serve as the foundational framework for data governance but will also:
- improve internal operational efficiency
- promote transparency and fairness
- provide users of all skill levels access to the data they need, when they need it.
4/
I continue to care a lot about the broader metadata management / data catalog space. If you're an enterprise startup tackling this problem, I'd love to chat!
fin.