Data Storage #
Feature Store #
A feature store is a central repository for storing documented, curated, and access-controlled features.
A feature store lets the team reuse shared features and avoid redundant work.
Key points:
- Manages feature data at any scale, from a single person to an enterprise
- Provides scalable and performant access to features for training and serving
- Provides consistent and point-in-time correct access
- Enables discovery, documentation, and insights
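Point-in-time correct access means that a training row only sees feature values that were already known at the time of its event, so no future information leaks into training. A minimal sketch of the idea, assuming each feature is kept as a timestamp-sorted history (the data and the name `point_in_time_value` are hypothetical, for illustration only):

```python
from bisect import bisect_right

# Hypothetical history of one feature for one entity:
# (timestamp, value) pairs sorted by timestamp.
feature_history = [
    (1, 10.0),
    (5, 12.5),
    (9, 20.0),
]


def point_in_time_value(history, event_ts):
    """Return the latest value recorded at or before event_ts, or None."""
    timestamps = [ts for ts, _ in history]
    # Index of the first entry that is strictly newer than the event.
    idx = bisect_right(timestamps, event_ts)
    if idx == 0:
        return None  # nothing was known yet at event_ts
    return history[idx - 1][1]


# Hypothetical label events that need a feature value attached.
label_events = [0, 5, 7, 12]
training_rows = [(ts, point_in_time_value(feature_history, ts)) for ts in label_events]
```

The event at timestamp 7 receives the value written at timestamp 5, not the later one written at timestamp 9.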
Offline feature processing:
- feature engineering, like feature crossing
- data quality control
- offline storage
- discoverability
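Feature crossing, mentioned above as an example of feature engineering, combines two or more categorical features into a single one so that a model can learn from their interaction. A minimal sketch, assuming string-valued features; the column names and the `_x_` separator are illustrative choices, not a convention from any particular library:

```python
def cross_features(*values):
    """Combine several categorical values into one crossed feature value."""
    return "_x_".join(str(v) for v in values)


# Hypothetical rows with two categorical features.
rows = [
    {"country": "US", "device": "mobile"},
    {"country": "DE", "device": "desktop"},
]

# Add the crossed feature to every row.
for row in rows:
    row["country_x_device"] = cross_features(row["country"], row["device"])
```

In practice the crossed value is usually hashed or one-hot encoded before it is fed to a model.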
Online feature usage:
- low-latency access
- access to pre-computed features (computing aggregations and joins online is very time-consuming)
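The split between offline computation and online lookup can be sketched as follows. This is an illustration under simplifying assumptions: an in-memory dict stands in for a real low-latency key-value store, and the names `precompute_features` and `get_online_features` are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events: (user_id, purchase_amount).
events = [("u1", 20.0), ("u2", 5.0), ("u1", 30.0)]


def precompute_features(events):
    """Offline step: aggregate raw events into per-user features."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user_id, amount in events:
        totals[user_id] += amount
        counts[user_id] += 1
    return {
        user_id: {
            "purchase_count": counts[user_id],
            "avg_purchase": totals[user_id] / counts[user_id],
        }
        for user_id in totals
    }


# The dict plays the role of the online store (an assumption of this sketch).
online_store = precompute_features(events)


def get_online_features(user_id):
    """Online step: a single key lookup, with no aggregation or join."""
    return online_store.get(user_id)
```

At serving time, `get_online_features("u1")` is a constant-time lookup; the aggregation cost was already paid offline.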
Data Warehouses #
Key features:
- subject-oriented
- integrated (multiple data sources)
- non-volatile (access to history)
Data Lakes #
- Aggregates raw data from multiple sources
- Holds both structured and unstructured data
- The purpose of the data may not yet be determined
- Data is stored as-is, without any processing