PostgreSQL Persistence for Model Metadata
Note
Before starting the installation procedure, please download installation resources as explained here and make sure that all prerequisites are satisfied.
This page also assumes that the main Seldon components are installed.
We use PostgreSQL for persisting model metadata information.
Seldon Deploy Configuration
The PostgreSQL dependency in Seldon Deploy is enabled or disabled with the Helm variable metadata.pg.enabled. If it is set to false, Seldon Deploy will not attempt to connect to a PostgreSQL database, but all model metadata functionality will be unavailable. If metadata.pg.enabled is true, Seldon Deploy expects a metadata-postgres Kubernetes secret to be present in the namespace where Seldon Deploy is running.
This secret must contain the information for connecting to a PostgreSQL database. The structure of the secret is:
kind: Secret
apiVersion: v1
data:
  dbname: the_name_of_the_database_to_use_for_model_metadata
  host: the_database_host
  user: the_database_user_to_use_to_authenticate
  password: the_database_password_to_use_to_authenticate
  port: the_port_the_database_is_exposed_on
  sslmode: the_sslmode
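Kubernetes stores the values under data base64-encoded, so a manifest written by hand needs encoded values in place of the placeholders above (kubectl create secret with --from-literal does the encoding for you). A minimal sketch of the encoding step, where the helper name b64 and the sample values are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical helper: base64-encode one value for a Secret's `data` field.
# printf '%s' avoids encoding a trailing newline; tr strips any line
# wrapping that some base64 implementations add to long output.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Example values only; substitute those of your own database.
echo "dbname: $(b64 metadata)"   # prints "dbname: bWV0YWRhdGE="
echo "port: $(b64 5432)"         # prints "port: NTQzMg=="
```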
Installation
PostgreSQL can be installed in several ways, for example as a managed service from a cloud provider or running in Kubernetes.
Bringing your own PostgreSQL
One option is to use PostgreSQL outside of the Kubernetes cluster that runs Seldon Deploy. If you already have a database running on-premises or in the cloud that you want to use with Seldon Deploy, add its connection information to the metadata-postgres secret in the namespace where Seldon Deploy is running, substituting the values with those of your database:
kubectl create secret generic -n seldon-system metadata-postgres \
  --from-literal=user=your_user \
  --from-literal=password=your_password \
  --from-literal=host=your.postgres.host \
  --from-literal=port=5432 \
  --from-literal=dbname=metadata \
  --from-literal=sslmode=require \
  --dry-run=client -o yaml \
  | kubectl apply -n seldon-system -f -
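Before relying on the secret, it can help to check the same values by hand. The sketch below assembles a standard libpq connection URI from the six fields. The helper name pg_uri is an assumption for illustration, and this is only a manual check, not a description of how Seldon Deploy itself connects:

```shell
#!/bin/sh
# Hypothetical helper: assemble a libpq connection URI from the secret's fields.
# Arguments: user password host port dbname sslmode
# Note: values containing URI-reserved characters (for example "@" or "/")
# would need percent-encoding, which this sketch does not do.
pg_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s?sslmode=%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values matching the command above.
pg_uri your_user your_password your.postgres.host 5432 metadata require
```

From a machine that can reach the database, the resulting URI can be passed to psql, for example psql "$(pg_uri ...)" -c 'SELECT 1', to confirm that the host, credentials, and sslmode are accepted.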
In the next sections we explore how you can start using a managed PostgreSQL in AWS and GCP and connect it with Seldon Deploy.
Amazon RDS
Amazon RDS provides a managed PostgreSQL solution that can be used for Seldon Deploy’s Model Metadata Storage. For setting up RDS for the first time you can follow the docs here.
Some important points to remember while setting up RDS:
- Make sure the instance is accessible from Seldon Deploy. If Seldon Deploy is not on the same VPC, make sure the VPC used by RDS has a public subnet as discussed here.
- Make sure the security group used for accessing the RDS instance allows inbound and outbound traffic from and to Seldon Deploy. Setting up security groups for RDS is discussed here.
Once you have a running PostgreSQL instance with a database and a user created, you can configure Seldon Deploy by adding the metadata-postgres secret as described in the "Bringing your own PostgreSQL" section.
To manage backups see the official documentation. Here is more documentation on other best practices around RDS.
Google Cloud SQL
GCP provides a managed PostgreSQL solution that can be used for Seldon Deploy’s Model Metadata Storage. For setting up Google Cloud SQL for the first time you can follow the docs here.
For connection instructions follow the official documentation. Make sure that the instance is accessible from Seldon Deploy. If you use the public IP generated for the instance, make sure the network that runs Seldon Deploy is included in the Cloud SQL authorized networks by following this guide.
Once you have a running PostgreSQL instance with a database and a user created, you can configure Seldon Deploy by adding the metadata-postgres secret as described in the "Bringing your own PostgreSQL" section.
Running PostgreSQL in Kubernetes
You can also run PostgreSQL in the Kubernetes cluster that runs Seldon Deploy. We recommend using the Zalando PostgreSQL operator to manage the PostgreSQL installation and maintenance. The official documentation can be seen here. Below we show an example deployment of a PostgreSQL cluster.
To install the Zalando operator you can run:
git clone https://github.com/zalando/postgres-operator.git
cd postgres-operator
git checkout v1.6.1 # Use a tag to pin what we are using.
kubectl create namespace postgres || echo "namespace postgres exists"
helm install postgres-operator ./charts/postgres-operator --namespace postgres
If you want to install the operator UI you can do it by following this doc.
To install a minimal PostgreSQL setup you can run:
cat <<EOF | kubectl apply -f -
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: seldon-metadata-storage
  namespace: postgres
spec:
  teamId: "seldon"
  volume:
    size: 5Gi
  numberOfInstances: 2
  users:
    seldon:  # database owner
    - superuser
    - createdb
  databases:
    metadata: seldon  # dbname: owner
  postgresql:
    version: "13"
EOF
For a more complex setup consisting of more users, databases, replicas, etc. please refer to the official documentation of the operator here.
Once the database instances have been created by the Zalando operator, you can create the expected secret using the auto-generated password:
kubectl get secret seldon.seldon-metadata-storage.credentials.postgresql.acid.zalan.do -n postgres -o 'jsonpath={.data.password}' | base64 -d > db_pass
kubectl create secret generic -n seldon-system metadata-postgres \
  --from-literal=user=seldon \
  --from-file=password=./db_pass \
  --from-literal=host=seldon-metadata-storage.postgres.svc.cluster.local \
  --from-literal=port=5432 \
  --from-literal=dbname=metadata \
  --from-literal=sslmode=require \
  --dry-run=client -o yaml \
  | kubectl apply -n seldon-system -f -
rm db_pass
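The credentials secret name and the host used above follow from the user, cluster name, and namespace in the postgresql manifest. The sketch below shows that derivation for adapting the commands to a different cluster. It assumes the operator's default secret naming and the default cluster.local cluster domain, and the helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers showing how the names used above are derived.
# Assumes the operator's default secret naming and the cluster.local domain.

# Credentials secret: <user>.<cluster>.credentials.postgresql.acid.zalan.do
zalando_secret_name() {
  printf '%s.%s.credentials.postgresql.acid.zalan.do\n' "$1" "$2"
}

# In-cluster DNS name of the database service: <cluster>.<namespace>.svc.cluster.local
zalando_host() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# Values from the example manifest: user "seldon", cluster
# "seldon-metadata-storage", namespace "postgres".
zalando_secret_name seldon seldon-metadata-storage
zalando_host seldon-metadata-storage postgres
```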
Configuring Seldon Deploy
Once your PostgreSQL database and the secret with its credentials are ready, add the following to deploy-values.yaml:
metadata:
  pg:
    enabled: true
    secret: metadata-postgres
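Since Seldon Deploy expects all six connection fields in the secret, a quick pre-flight check before enabling the setting can catch a missing key. The following is a minimal sketch. The helper name missing_keys is an assumption, and it only inspects key names, not values:

```shell
#!/bin/sh
# Hypothetical helper: read key names (one per line) on stdin and print any
# of the six expected keys that are missing. Prints nothing if all are present.
missing_keys() {
  present=$(cat)
  for key in dbname host user password port sslmode; do
    # grep -x matches the whole line, so "user" does not match "username".
    printf '%s\n' "$present" | grep -qx "$key" || printf '%s\n' "$key"
  done
}

# Example: a key list lacking "port" and "sslmode".
printf 'dbname\nhost\nuser\npassword\n' | missing_keys
```

One possible way to feed it the real key names is kubectl get secret metadata-postgres -n seldon-system -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' | missing_keys.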
Production operations on self-managed PostgreSQL
One of the drawbacks of using self-hosted PostgreSQL rather than a managed solution is that you need to operate the PostgreSQL cluster yourself. Here is a list of resources covering best practices and common operations:
- Monitoring - deploying postgres exporter and hooking it up with your Prometheus monitoring solution is a common way of getting continuous monitoring of the instances.
- Backups - the Zalando operator can set up periodic backups to S3-compatible storage (https://postgres-operator.readthedocs.io/en/latest/administrator/#wal-archiving-and-physical-basebackups) and documents restoring state from backups (https://postgres-operator.readthedocs.io/en/latest/administrator/#restoring-physical-backups). We strongly recommend setting up backups if self-hosting PostgreSQL.
- Version update - the Zalando operator supports cloning and in-place version updates - https://postgres-operator.readthedocs.io/en/latest/administrator/#minor-and-major-version-upgrade
- Increase storage size - https://postgres-operator.readthedocs.io/en/latest/user/#increase-volume-size