Thread by @GarrettLeeH, Just spent quite a bit of time seeing what I could find [...]

Garrett Hoffman

GarrettLeeH

Just spent quite a bit of time seeing what I could find on this. The answer is that it's kind of all/any of the above. I prefer the name "Feature Service" or "Feature Layer" over "Feature Store", because I think these names paint a better picture of what these actually are.

https://twitter.com/GarrettLeeH/status/1118584707875311616

Hopsworks' open source "Feature Store" seems to be a "Feature Store" in the literal sense: build a feature set locally, check these features into the store using an API, then pull them out later via the API to create your training set. https://www.youtube.com/watch?v=N1BjPk1smdg
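
To make the "literal store" idea concrete, here is a minimal sketch of the check-in / pull-out flow. It assumes an in-memory dictionary in place of a real backend; the class `InMemoryFeatureStore`, its methods, and the feature names are hypothetical and are not the Hopsworks API.

```python
from typing import Dict, List


class InMemoryFeatureStore:
    """Hypothetical stand-in for a feature store API (not the Hopsworks client)."""

    def __init__(self) -> None:
        # feature name -> {entity id -> feature value}
        self._features: Dict[str, Dict[str, float]] = {}

    def check_in(self, name: str, values: Dict[str, float]) -> None:
        """Register (or update) the values of one feature, keyed by entity id."""
        self._features.setdefault(name, {}).update(values)

    def training_set(self, names: List[str], entity_ids: List[str]) -> List[List[float]]:
        """Return one row per entity, with columns in the order given by `names`."""
        return [[self._features[n][e] for n in names] for e in entity_ids]


store = InMemoryFeatureStore()

# Features built locally are checked into the store...
store.check_in("avg_order_value", {"user_1": 12.5, "user_2": 40.0})
store.check_in("orders_last_30d", {"user_1": 3.0, "user_2": 1.0})

# ...and pulled out later to assemble a training set.
rows = store.training_set(["avg_order_value", "orders_last_30d"], ["user_1", "user_2"])
```

Nothing is computed inside the store here; it only holds values that were computed elsewhere, which is the distinction the rest of the thread draws against a "Feature Service".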

But the consensus seems to be that, for platforms being developed internally, this is evolving into a "Feature Service" that combines a framework, a computation engine and a cache/store.

This layer of the pipeline should standardize feature definitions, compute complex backfills, compute "point in time" features (e.g. page views in last 5 hours), allow production models to access features, enable feature discovery and abstract away engineering.
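
As a rough illustration of a "point in time" feature, the sketch below counts page views in the 5 hours before a chosen timestamp. The function name `page_views_in_window` and the sample timestamps are made up for the example; the point is that the same definition can be evaluated "as of" any moment, which is what makes backfills possible without leaking future events.

```python
from datetime import datetime, timedelta
from typing import List


def page_views_in_window(
    view_times: List[datetime],
    as_of: datetime,
    window: timedelta = timedelta(hours=5),
) -> int:
    """Count views in the half-open interval (as_of - window, as_of]."""
    start = as_of - window
    return sum(1 for t in view_times if start < t <= as_of)


# Hypothetical page-view timestamps for one user.
views = [
    datetime(2019, 4, 17, 8, 0),
    datetime(2019, 4, 17, 11, 30),
    datetime(2019, 4, 17, 13, 0),
    datetime(2019, 4, 17, 15, 0),
]

# Evaluated as of 14:00, the 08:00 view is too old and the 15:00 view is in the future.
count = page_views_in_window(views, datetime(2019, 4, 17, 14, 0))
```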

Here are the best talks I found on the subject.

Branches Feature Service is implemented as a Flask App with DynamoDB as the data store. Users write custom feature definitions by defining computation pipelines. Features are calculated at inference and written to the store for future use in training. https://www.youtube.com/watch?v=PkxX05n_DCE
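
A minimal sketch of the "compute at inference, log for training" pattern follows. It is not the implementation from the talk: a plain dictionary stands in for the DynamoDB table, the Flask layer is left out, and the pipeline steps and field names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Step = Callable[[Record], Record]


def add_spend_per_order(record: Record) -> Record:
    """Hypothetical pipeline step: derive average spend per order."""
    return {**record, "spend_per_order": record["total_spend"] / record["order_count"]}


def add_is_repeat_customer(record: Record) -> Record:
    """Hypothetical pipeline step: flag customers with more than one order."""
    return {**record, "is_repeat_customer": int(record["order_count"] > 1)}


# A feature definition is an ordered list of computation steps.
PIPELINE: List[Step] = [add_spend_per_order, add_is_repeat_customer]

# Stand-in for the data store: request id -> features seen at inference time.
FEATURE_LOG: Dict[str, Record] = {}


def features_at_inference(request_id: str, raw: Record) -> Record:
    """Run the pipeline on raw input and log the result for later training."""
    features = raw
    for step in PIPELINE:
        features = step(features)
    FEATURE_LOG[request_id] = features
    return features


result = features_at_inference("req-1", {"total_spend": 120.0, "order_count": 4})
```

Because the logged row is exactly what the model saw when it made the prediction, training on `FEATURE_LOG` avoids a mismatch between training and serving features.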

Airbnb created a service called Zipline where you define a feature and its computation using a JSON config. These features aren't pre-computed and stored; they are computed at runtime. They are planning to open source this sometime in 2019. https://www.youtube.com/watch?v=wi2bXvtJ42k
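
To show what "define a feature in config, compute at runtime" could look like, here is a small sketch. The config schema (`name`, `source_field`, `aggregation`) and the `compute_feature` helper are invented for illustration and are not Zipline's actual format.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical feature definition; the keys are assumptions, not Zipline's schema.
CONFIG: Dict[str, str] = json.loads(
    """
    {
      "name": "bookings_total_price",
      "source_field": "price",
      "aggregation": "sum"
    }
    """
)

# Aggregations the config is allowed to reference.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "count": len,
    "max": max,
}


def compute_feature(config: Dict[str, str], events: List[Dict[str, Any]]) -> float:
    """Evaluate the configured aggregation over raw events at request time."""
    values = [event[config["source_field"]] for event in events]
    return AGGREGATIONS[config["aggregation"]](values)


# Nothing is stored ahead of time: the value is derived from events when asked for.
events = [{"price": 100.0}, {"price": 250.0}]
value = compute_feature(CONFIG, events)
```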

GO-JEK defines feature specifications as YAML. The pipelines run in Airflow, where features are computed on a schedule and stored separately for training and for inference. They access the data for training via custom queries and for serving via API. https://www.youtube.com/watch?v=0iCXY6VnpCc
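
The split between a training store and a serving store can be sketched as below. This assumes a plain Python function in place of an Airflow task and a dictionary in place of the YAML spec; the spec fields, store layouts, and names such as `run_scheduled_job` are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical feature spec (would live in YAML in the setup described above).
SPEC = {"name": "completed_trips", "entity": "driver_id"}

# Training store: full history, one row per (run date, feature, entity).
offline_store: List[Tuple[date, str, str, int]] = []
# Serving store: only the latest value per (feature, entity).
online_store: Dict[Tuple[str, str], int] = {}


def run_scheduled_job(run_date: date, trips: List[Dict[str, str]]) -> None:
    """One scheduled run: count trips per entity and write to both stores."""
    counts: Dict[str, int] = {}
    for trip in trips:
        entity_id = trip[SPEC["entity"]]
        counts[entity_id] = counts.get(entity_id, 0) + 1
    for entity_id, value in counts.items():
        offline_store.append((run_date, SPEC["name"], entity_id, value))
        online_store[(SPEC["name"], entity_id)] = value


run_scheduled_job(date(2019, 4, 16), [{"driver_id": "d1"}, {"driver_id": "d1"}, {"driver_id": "d2"}])
run_scheduled_job(date(2019, 4, 17), [{"driver_id": "d1"}])

# Training reads history with an ad hoc query; serving does a key lookup.
training_rows = [row for row in offline_store if row[0] == date(2019, 4, 16)]
serving_value = online_store[("completed_trips", "d1")]
```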
