#QConSF - @criccomini shares that WePay data infrastructure is based on Airflow, Kafka and BigQuery.
The roadmap to data maturity
This looks nuts to people who know databases, but at first it works well! Very real time. But soon users and reports get in each other's way.
So you give users their own data warehouse. But now you have lots of loading jobs and that gets complex. Data quality may be an issue.
How do you know if real-time is for you?
This is what real time looks like at WePay
But now there are many systems and there is operational pain. We need more integration.
Can you do integration? SREs are key.
Now we stream all things!
You need automation! Not just for operations (everyone knows about that), but also automating data management! Do you have a data catalog?
"We use Terraform to manage Kafka topics and connectors. This topic has a compaction policy, which is an exciting policy to have when your system evolves". @criccomini at #QConSF
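For anyone wondering what a compaction policy buys you: a compacted topic keeps only the latest record per key. Here is a minimal Python sketch of that idea. It is an illustration only, not WePay's setup or Kafka's implementation; the `compact` function and the sample keys are made up for this example, and it drops deleted keys immediately, whereas real Kafka keeps tombstones around for a retention period first.

```python
from typing import Dict, List, Optional, Tuple

# A record is (key, value); a value of None stands in for a tombstone (delete marker).
Record = Tuple[str, Optional[str]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, in log order.

    Simplification (assumption): keys whose latest record is a tombstone
    are dropped right away.
    """
    latest_offset: Dict[str, int] = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later offsets overwrite earlier ones

    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest_offset[key] == offset and value is not None
    ]


# Hypothetical sample log: user-1 is updated, user-2 is deleted.
log = [
    ("user-1", "a@old.example"),
    ("user-2", "b@example"),
    ("user-1", "a@new.example"),
    ("user-2", None),
]
compacted = compact(log)
```

After compaction only the newest value for `user-1` is left, so a consumer replaying the topic sees current state rather than full history.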
You need a data catalog, or you'll spend all your time chasing compliance issues.
Shout out to the Amundsen data catalog. But there are many others (@mark_grover)
You need all your systems talking to the data catalog. You, the data engineer, shouldn't enter the data yourself.
You can monitor for sensitive data in wrong data sets and alert if this happens. GCP has tools to set this up.
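To make the monitoring idea concrete, here is a rough Python sketch that scans rows for values that look sensitive and reports where they turned up. It is a stdlib stand-in, not the GCP tooling mentioned in the talk; the patterns, the `find_sensitive` function and the `allowed_columns` parameter are all assumptions for illustration, and real detectors are far more thorough than two regexes.

```python
import re

# Hypothetical, deliberately simple detectors.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(rows, allowed_columns=frozenset()):
    """Return (row_index, column, label) for every suspicious value.

    Columns listed in allowed_columns are expected to hold sensitive
    data and are skipped.
    """
    findings = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if column in allowed_columns:
                continue
            for label, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    findings.append((index, column, label))
    return findings


# Made-up sample rows from a data set that should not contain PII.
rows = [
    {"note": "call 123-45-6789", "amount": 10},
    {"note": "ok", "contact": "a@b.com"},
]
findings = find_sensitive(rows)
```

Any non-empty result is what you would alert on; in practice the findings would go to whatever alerting system you already run.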
Are you ready to decentralize your data flows? Can you let users spin up their own micro-DWHs and load and populate them on their own?
My biggest takeaway from @criccomini's talk
Is this the future?
You can follow @gwenshap.