Tags: feature-store, feast, mlops, data-engineering, online-serving, offline-store, python, cli
Sync Offline and Online Feature Stores with Feast
Use the Feast CLI to synchronize feature data from an offline store (such as Parquet files) to an online store (such as SQLite). The online store then serves the latest feature values for low-latency model inference, while the offline store remains the single source of truth.
Level: beginner | Time: 15 min | Steps: 5
The play
- Initialize a Feast Repository: Install Feast (`pip install feast`), then run `feast init` to scaffold a new feature repository. This guide uses a fully local setup (Parquet files offline, SQLite online); the starter code below writes the repository files directly instead of calling `feast init`.
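- Define Features and Sources: Edit the generated definitions file (named `example.py` or `example_repo.py`, depending on the Feast version) to declare your data source, entities, and feature views. A feature view is a logical group of features read from a single data source. These declarative definitions tell Feast which columns to sync, which entity key they are stored under, and how far back a sync looks (`ttl`).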
- Register and Materialize Features: Run `feast apply` to register your definitions in the feature store's registry. Then run `feast materialize-incremental <end-timestamp>` to sync: Feast loads the latest feature values from the offline source (Parquet) into the online store (SQLite). On the first run, the sync window starts at the current time minus the feature view's `ttl`; later runs continue from where the previous run ended.
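- Backfill Historical Features: To load features from a specific time range, for example on first setup or after correcting upstream data, run `feast materialize <start-timestamp> <end-timestamp>`. The online store keeps only the latest value per entity, so training datasets that need historical values should be built from the offline store (for example with `get_historical_features` in the Python SDK), not from the online store.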
- Fetch Online Features: Verify the sync by reading feature vectors back from the online store with the Feast Python SDK (`FeatureStore.get_online_features`). It returns the latest values for the requested entities, ready to pass to a model for inference.
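The sync behavior described in steps 3 and 4 can be pictured with a small, pure-Python model. This is a hypothetical sketch, not Feast's code: `Row`, `ToyFeatureStore`, and their methods are illustrative names, and the model tracks a single entity key (`driver_id`) and one feature. It mirrors two behaviors described above: a materialize call writes the latest value per entity within a time window, and an incremental run continues from the previous end time, or looks back one `ttl` on the first run (the model uses the end timestamp as "now").

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class Row:
    """One offline-store row: entity key, timestamps, and a single feature."""
    driver_id: int
    event_timestamp: datetime
    created: datetime
    conv_rate: float


@dataclass
class ToyFeatureStore:
    """Hypothetical model of an offline -> online sync (not Feast's code)."""
    offline_rows: List[Row]
    ttl: timedelta
    online: Dict[int, Row] = field(default_factory=dict)  # driver_id -> latest row
    last_end: Optional[datetime] = None

    def materialize(self, start: datetime, end: datetime) -> None:
        # Select rows whose event time falls in [start, end] (inclusive in this model).
        window = [r for r in self.offline_rows if start <= r.event_timestamp <= end]
        # Write oldest -> newest (ties broken by created time), so the last
        # write for each key is its latest value inside the window.
        for r in sorted(window, key=lambda row: (row.event_timestamp, row.created)):
            self.online[r.driver_id] = r
        self.last_end = end

    def materialize_incremental(self, end: datetime) -> None:
        # First run: look back one ttl from `end` (used as "now" here).
        # Later runs: continue from where the previous run stopped.
        start = self.last_end if self.last_end is not None else end - self.ttl
        self.materialize(start, end)


now = datetime(2024, 1, 10, 12, tzinfo=timezone.utc)
rows = [
    Row(1001, now - timedelta(days=3), now, 0.1),   # older than the 1-day ttl
    Row(1002, now - timedelta(hours=5), now, 0.2),
    Row(1002, now - timedelta(hours=2), now, 0.5),  # newer value for 1002
]

store = ToyFeatureStore(offline_rows=rows, ttl=timedelta(days=1))
store.materialize_incremental(end=now)
print(sorted(store.online), store.online[1002].conv_rate)

# An explicit window reaches rows that the first incremental run skipped.
store.materialize(start=now - timedelta(days=7), end=now)
print(sorted(store.online))
```

The first incremental run skips driver 1001 because its only row is older than `ttl`. The same effect leaves the online store empty if the sample data in a starter script is older than the feature view's `ttl`; an explicit `materialize` window (step 4) or a longer `ttl` avoids it.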
Starter code
#!/bin/bash
# This script sets up a complete local Feast project and demonstrates a feature sync.
set -e  # stop at the first failing command
# 1. Cleanup and Install
rm -rf feast_demo
mkdir feast_demo
cd feast_demo
pip install feast pandas pyarrow -q
# 2. Create sample offline data (Parquet file)
mkdir data
python -c "
import pandas as pd
from datetime import datetime, timedelta, timezone
# Use UTC timestamps and keep every event inside the 1-day ttl of the feature view:
# the first materialize-incremental run only loads rows newer than now minus ttl.
end_date = datetime.now(timezone.utc)
start_date = end_date - timedelta(hours=12)
data = {
    'event_timestamp': pd.to_datetime([start_date + timedelta(hours=i) for i in range(10)]),
    'driver_id': [1001, 1002, 1003, 1001, 1002, 1003, 1001, 1002, 1003, 1001],
    'conv_rate': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'acc_rate': [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
    'avg_daily_trips': [10, 12, 15, 11, 13, 16, 9, 11, 14, 10],
    'created': [end_date] * 10
}
df = pd.DataFrame(data)
df.to_parquet('data/driver_stats.parquet')
print('Offline data created at data/driver_stats.parquet')
"
# 3. Create Feast repository configuration
cat > feature_store.yaml <<EOL
project: feast_demo
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online.db
EOL
# 4. Create feature definitions
cat > definitions.py <<EOL
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64
driver_hourly_stats = FileSource(
    name="driver_hourly_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
driver = Entity(name="driver_id", join_keys=["driver_id"])
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    # The first materialize-incremental run looks back this far.
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_hourly_stats,
)
EOL
# 5. Apply definitions and sync data to online store
printf '\n--> Applying feature definitions...\n'
feast apply
printf '\n--> Syncing (materializing) latest features to online store...\n'
# Pass the current time in UTC so it lines up with the UTC sample timestamps.
feast materialize-incremental "$(date -u +%Y-%m-%dT%H:%M:%S)"
# 6. Fetch features from the online store to verify
printf '\n--> Fetching features from online store for verification...\n'
python -c "
from feast import FeatureStore
import pandas as pd
store = FeatureStore(repo_path='.')
feature_vector = store.get_online_features(
features=[
'driver_hourly_stats:conv_rate',
'driver_hourly_stats:acc_rate',
'driver_hourly_stats:avg_daily_trips',
],
entity_rows=[
{'driver_id': 1001},
{'driver_id': 1002},
],
).to_dict()
print(pd.DataFrame.from_dict(feature_vector))
"
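To know what the verification step should print, you can derive the expected online values from the sample data by hand: each driver's entry in the online store should hold the values from that driver's most recent row. The sketch below copies the sample lists from step 2 of the script and assumes all ten rows were materialized. Feast returns the `Float32` features as 32-bit values, so the printed `conv_rate` and `acc_rate` may differ from these by small rounding.

```python
# Sample values copied from step 2 of the starter script (row i = start_date + i hours).
driver_ids = [1001, 1002, 1003, 1001, 1002, 1003, 1001, 1002, 1003, 1001]
conv_rate = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
acc_rate = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
avg_daily_trips = [10, 12, 15, 11, 13, 16, 9, 11, 14, 10]

# Rows are already in event-time order, so a later row for the same driver
# overwrites an earlier one, leaving each driver's most recent values.
expected = {}
for i, driver_id in enumerate(driver_ids):
    expected[driver_id] = {
        "conv_rate": conv_rate[i],
        "acc_rate": acc_rate[i],
        "avg_daily_trips": avg_daily_trips[i],
    }

# The verification step queries drivers 1001 and 1002.
for driver_id in (1001, 1002):
    print(driver_id, expected[driver_id])
```

For driver 1001 the latest row is the last one (conv_rate 1.0, acc_rate 0.0, avg_daily_trips 10); for driver 1002 it is the eighth (0.8, 0.2, 11).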