Here you will learn how to configure the VMware Spring Cloud® Data Flow for Kubernetes (SCDF for Kubernetes) installation resources before deploying to the Kubernetes cluster. Before configuring the installation resources, verify that you have followed the steps described in Preparing to Install SCDF for Kubernetes and Installing Command-Line Tools.
Configuration of Spring Cloud Data Flow for Kubernetes involves a number of features:
The SCDF for Kubernetes installation process uses Kustomize, a tool for customizing Kubernetes resource configuration. For each resource, Kustomize takes a base configuration file and applies patch files that modify the base configuration to target different deployment environments. Kustomize is built into the kubectl CLI.
SCDF for Kubernetes provides Kustomize configuration files for a development environment and for a production environment. The development environment is intended for quick exploration of SCDF for Kubernetes. Its configuration provisions a RabbitMQ message broker and a PostgreSQL database.
The "Spring Cloud Data Flow for Kubernetes Legacy Installation" archive, downloaded from the Broadcom Support portal, contains the following directory structure:
bin
: Scripts for installing to the development environment, as well as a script for performing image relocationapps
: Kubernetes resource files and application configuration files to support deployment to multiple environmentsservices
: Kubernetes resource files for deploying RabbitMQ and PostgreSQL to the development environmentThe apps
directory contains the following directory structure, based on Kustomize naming conventions.
.
├── data-flow
│ ├── images
│ ├── kustomize
│ │ ├── base
│ │ └── overlays
│ └── schemas
├── ingress
│ └── kustomize
│ ├── base
│ └── overlays
└── skipper
├── images
├── kustomize
│ ├── base
│ └── overlays
└── schemas
Each application included as part of SCDF for Kubernetes has a directory: the data-flow
directory contains the Data Flow server, and the skipper
directory contains the Skipper server. The ingress
directory contains configuration files for an Ingress resource.
Each of the two application directories contains the following directory structure:
images
: Container images (only present if you have downloaded and extracted the "SCDF for Kubernetes installation images" archive)kustomize
: Kubernetes and application configuration files for use with Kustomizeschemas
: Database schemas (you can install these manually if you do not want each server to install them on startup)For each application, you can edit files in the kustomize
directory to configure the application and Kubernetes resources. The contents of the directory are shown below.
.
├── base
│ ├── deployment.yaml
│ ├── kustomization.yaml
│ ├── role-binding.yaml
│ ├── roles.yaml
│ ├── service-account.yaml
│ └── service.yaml
└── overlays
├── dev
│ ├── application.yaml
│ ├── application-database.yaml
│ ├── deployment-patch.yaml
│ ├── kustomization.yaml
│ └── service-patch.yaml
└── production
├── application.yaml
├── application-database.yaml
├── deployment-patch.yaml
├── kustomization.yaml
└── service-patch.yaml
The base
directory contains Kubernetes resource files with default configuration values. These files are then patched by files in the overlays
directory. The kustomization.yaml
file references the other YAML files in the directory.
The overlays
directory contains one subdirectory for each deployment environment: a dev
subdirectory and a production
subdirectory. Within each subdirectory, a kustomization.yaml
file references the kustomization.yaml
file in the base
directory, and adds additional patches to modify values that are unique to the target deployment environment.
You can set the target namespace to deploy, and add labels and name prefixes, by adding additional configuration fields to the kustomization.yaml
file. See the Kustomize documentation for information about the additional fields that are available.
Within each environment-specific subdirectory in the overlays
directory, the application.yaml
and application-database.yaml
files is a Spring Boot configuration file that you can edit to configure the application (the Data Flow server or Skipper server). For example, you can edit application.yaml
and application-database.yaml
to configure spring.datasource
properties or spring.cloud.dataflow.features
properties.
Also within each environment-specific subdirectory, the deployment-patch.yaml
and service-patch.yaml
files are used to patch the base
Kubernetes resource files, substituting values that are appropriate for the specific environment. For example, you might configure memory or CPU allocations and ingress configuration using one set of values for your dev
environment and another for your production
environment.
For information about how to configure specific settings, see the following sections.
You can customize database configuration settings in the application-database.yaml
files. These files are preconfigured to connect to a PostgreSQL database, which can be installed either by using the manifests provided in services/dev/postgresql
or by installing the bitnami/postgresql
Helm chart.
To use a different database, you must replace or remove the secret volume
and volumeMount
in the deployment-patch.yaml
file. If you wish to replace the secret, VMware recommends that you provide database-password
as the path for the password key. This minimizes the changes needed in the application.yaml
or application-database.yaml
file.
You must modify database settings for both the skipper
application and the data-flow
application. The following example uses a secret named postgresql
with a postgresql-password
key. (You can specify additional keys that are available in the secret, as long as they reference the paths that you specify in the application.yaml
or application-database.yaml
file.)
containers:
- name: skipper
volumeMounts:
- name: database
mountPath: /workspace/runtime/secrets/database
readOnly: true
...
volumes:
- name: database
secret:
secretName: postgresql
items:
- key: postgresql-password
path: database-password
In the application-database.yaml
file, you must edit settings under spring.datasource
. Update the url
, username
, password
, and driverClassName
values to match your database settings, as shown in the example below.
spring:
datasource:
url: jdbc:postgresql://database-host:5432/dataflow-db
username: postgres
password: ${database-password}
driverClassName: org.postgresql.Driver
testOnBorrow: true
validationQuery: "SELECT 1"
To initialize your database schema, you can use the DDL scripts provided in apps/skipper/schemas
and apps/data-flow/schemas
. SCDF for Kubernetes provides DDL scripts for IBM DB2, MySQL, Oracle, PostgreSQL, and Microsoft SQL Server. If applying multiple files, ensure that you apply them in the correct order (the V1-
file first, then the V2-
file, then the V3-
file, and so on).
If you manually initialize the schema for the database, you can disable automatic database initialization for the Skipper and Data Flow server applications. Add the following settings to the application.yaml
files for the applications:
spring:
...
jpa:
hibernate:
ddlAuto: none
flyway:
enabled: false
The published application artifacts for SCDF for Kubernetes include JDBC drivers for MySQL (via the MariaDB driver), HSQLDB, PostgreSQL, and embedded H2. See Spring Cloud Data Flow Artifacts for details. If you wish, you can instead use your own JDBC driver.
To add a JDBC driver, perform the following steps:
Download the "SCDF Pro Server jar file" release artifact from the Broadcom Support portal.
Download the server JAR files for Skipper and the Composed Task Runner:
$ wget https://repo1.maven.org/maven2/org/springframework/cloud/spring-cloud-skipper-server/2.11.2/spring-cloud-skipper-server-2.11.2.jar
$ wget https://repo1.maven.org/maven2/org/springframework/cloud/spring-cloud-dataflow-composed-task-runner/2.11.2/spring-cloud-dataflow-composed-task-runner-2.11.2.jar
Within each application directory (the apps/data-flow
and apps/skipper
directories), create a directory for the new JDBC driver:
$ mkdir -p BOOT-INF/lib
Copy your JDBC driver jar to the new directories.
Add the driver to the downloaded JAR files:
$ jar -u0f spring-cloud-skipper-server-2.11.2.jar BOOT-INF/lib/*.jar
$ jar -u0f scdf-pro-server-1.6.1.jar BOOT-INF/lib/*.jar
$ jar -u0f spring-cloud-dataflow-composed-task-runner-2.11.2.jar BOOT-INF/lib/*.jar
Create new container images using the updated server JAR files. The easiest way to build new container images is to use the pack CLI utility. Use a base image that is based on JDK 8.
First, create a registry_prefix
environment variable containing the prefix that you want to use:
$ export registry_prefix=registry.example.com/example
Then build the three container images.
To build the container image for the skipper
app, run the following command:
$ pack build --env BP_JVM_VERSION=8 \
--path spring-cloud-skipper-server-2.11.2.jar \
--builder gcr.io/paketo-buildpacks/builder:base \
--publish ${registry_prefix}/spring-cloud-skipper-server:2.11.2-jdbc
To build the container image for the data-flow
app, run the following command:
$ pack build --env BP_JVM_VERSION=8 \
--path scdf-pro-server-1.6.1.jar \
--builder gcr.io/paketo-buildpacks/builder:base \
--publish ${registry_prefix}/scdf-pro-server:1.6.1-jdbc
To build the container image for the composed-task-runner
app, run the following command:
$ pack build --env BP_JVM_VERSION=8 \
--path spring-cloud-dataflow-composed-task-runner-2.11.2.jar \
--builder gcr.io/paketo-buildpacks/builder:base \
--publish ${registry_prefix}/spring-cloud-dataflow-composed-task-runner:2.11.2-jdbc
Update your Kustomize manifests with the locations of the new images.
For the skipper
application, update the newName
and newTag
values in apps/skipper/kustomize/overlays/production/kustomization.yaml
. Update the digest
field with the new image digest, or remove the digest
field altogether.
images:
- name: springcloud/spring-cloud-skipper-server # Used for Kustomize matching
newName: registry.example.com/example/spring-cloud-skipper-server
newTag: 2.11.2-jdbc
configMapGenerator:
- name: skipper
files:
- bootstrap.yaml
- application.yaml
- application-database.yaml
bases:
- ../../base
patches:
- deployment-patch.yaml
- service-patch.yaml
For the data-flow
application, update the newName
and newTag
values in apps/data-flow/kustomize/overlays/production/kustomization.yaml
. Update the digest
field with the new image digest, or remove the digest
field altogether.
images:
- name: springcloud/spring-cloud-dataflow-server # Used for Kustomize matching
newName: registry.example.com/example/scdf-pro-server
newTag: 1.6.1-jdbc
configMapGenerator:
- name: scdf-server
files:
- bootstrap.yaml
- application.yaml
- application-database.yaml
bases:
- ../../base
patches:
- deployment-patch.yaml
- service-patch.yaml
For the composed-task-runner
image, update the apps/data-flow/kustomize/overlays/production/application.yaml
file. Update the spring.cloud.dataflow.task.composedTaskRunnerUri
entry to match your new image. The value should be prefixed with docker://
. You can provide either a tag or a digest that matches your new image. See the following example:
spring:
cloud:
config:
enabled: false
dataflow:
server:
uri: http://${SCDF_SERVER_SERVICE_HOST}:${SCDF_SERVER_SERVICE_PORT}
features:
schedules-enabled: true
task:
composedtaskrunner:
uri: docker://registry.example.com/my-project/spring-cloud-dataflow-composed-task-runner:2.11.2-jdbc
...
If you add your own JDBC driver, you should manually initialize the database schema and disable automatic database initialization for the Skipper and Data Flow server applications. See Initializing the Database Schema.
You can customize message broker configuration settings in the application.yaml
and the deployment-patch.yaml
files. These files are preconfigured to connect to a RabbitMQ broker, which can be installed either by using the manifests provided in services/dev/rabbitmq
or by installing the bitnami/rabbitmq
Helm chart.
If you want to use RabbitMQ as the message broker, review the skipper
application settings provided in the Kustomize overlays for the dev
and production
environments. As mentioned earlier, the default configuration connects to a RabbitMQ broker that runs in the same namespace as SCDF for Kubernetes.
If your RabbitMQ broker runs in a different namespace, or if you use an external broker, you must edit the deployment-patch.yaml
file in dev
or production
overlays. In the volume
and container.volumeMounts
sections, remove references to the rabbitmq
secret.
The RabbitMQ broker connection settings are located in the application.yaml
file for the Skipper application. This file is included in the dev
and production
overlays. When deploying a Spring Cloud Stream application, SCDF for Kubernetes passes the value of the environmentVariables
field to the application.
The configuration must include the following environment variables:
SPRING_RABBITMQ_HOST
SPRING_RABBITMQ_PORT
SPRING_RABBITMQ_USERNAME
SPRING_RABBITMQ_PASSWORD
If the RabbitMQ cluster runs in the same namespace as SCDF for Kubernetes, you can reference the service environment variables provided by Kubernetes. If not, replace these references with the actual values.
spring:
cloud:
skipper:
server:
platform:
kubernetes:
accounts:
default:
environmentVariables: 'SPRING_CLOUD_CONFIG_ENABLED=false,SPRING_RABBITMQ_HOST=${RABBITMQ_SERVICE_HOST},SPRING_RABBITMQ_PORT=${RABBITMQ_SERVICE_PORT_AMQP},SPRING_RABBITMQ_USERNAME=user,SPRING_RABBITMQ_PASSWORD=${rabbitmq-password}'
Note Setting SPRING_CLOUD_CONFIG_ENABLED
to false
prevents Spring Cloud Stream applications from attempting to discover a Spring Cloud Config server instance.
If you want to use Kafka as the message broker, review the skipper
application settings provided in the Kustomize overlays for the dev
and production
environments. Edit the deployment-patch.yaml
file. In the volume
and container.volumeMounts
sections, remove references to the rabbitmq
secret.
The Kafka broker connection settings are located in the application.yaml
file for the Skipper application. This file is included in the dev
and production
overlays.
You can either provide your own Kafka cluster, or use the bitnami/kafka
Helm chart to install one. The following example assumes that you used the Helm chart. If using a different Kafka installation, adjust the environment variables specified in the application.yaml
file.
Remove or comment out the line with environmentVariables
for RabbitMQ settings and uncomment the line with the corresponding Kafka settings. The value of the environmentVariables
property is passed to the deployed Spring Cloud Stream apps.
The configuration must include the following environment variables:
SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS
(value: the host and port of the Kafka broker)SPRING_CLOUD_STREAM_KAFKA_BINDER_ZK_NODES
(value: the host and port for ZooKeeper)If the Kafka cluster runs in the same namespace as SCDF for Kubernetes, you can reference the service environment variables provided by Kubernetes. If not, replace these references with the actual values.
spring:
cloud:
skipper:
server:
platform:
kubernetes:
accounts:
default:
environmentVariables: 'SPRING_CLOUD_CONFIG_ENABLED=false,SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT},SPRING_CLOUD_STREAM_KAFKA_BINDER_ZK_NODES=${KAFKA_ZOOKEEPER_SERVICE_HOST}:${KAFKA_ZOOKEEPER_SERVICE_PORT}'
Note Setting SPRING_CLOUD_CONFIG_ENABLED
to false
prevents Spring Cloud Stream applications from attempting to discover a Spring Cloud Config server instance.
If your Kubernetes cluster supports Kubernetes Services of type LoadBalancer
, you may use that type of service to provision a load balancer via an Ingress Controller such as NGINX or Contour.
If your cluster has an ingress controller configured, you can use an Ingress resource manager to manage external access to the Data Flow server service.
Edit the ingress-patch.yaml
file in either the dev
overlay (apps/ingress/kustomize/overlays/dev/
) or the production
overlay (apps/ingress/kustomize/overlays/production/
). Under spec.rules
, update the host
entry, providing the DNS name required for your ingress controller configuration.
If your DNS name is data-flow.example.com
, add that DNS name as the host:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: scdf-ingress
spec:
rules:
- host: data-flow.example.com
http:
paths:
- backend:
serviceName: scdf-server
servicePort: 80
path: /
The Spring Cloud Data Flow and Skipper servers use the Spring Security library to configure authentication and authorization with OAuth 2.0 and OpenID Connect. You can use this to integrate SCDF for Kubernetes into Single Sign-On (SSO) environments.
To configure the servers, edit the application.yaml
configuration file for each server. In the Spring Cloud Data Flow Reference Guide, the Security section describes how to configure the Data Flow server, and the Azure section describes the configuration to use with Azure Active Directory.
To configure the application.yaml
file for the Skipper server, follow the same steps, using the same documentation from the Spring Cloud Data Flow Reference Guide.
The Data Flow server displays Spring Boot application metadata, which is stored as a label in the container image. To retrieve this metadata, you can configure the Data Flow server to access the container registry for the application images that you will deploy.
Place this configuration under the spring.cloud.dataflow.container.registry-configurations
section of the application.yaml
file. For examples of how to configure this section for Docker Hub, JFrog Artifactory, Amazon Elastic Container Registry, Azure Container Registry, and Harbor, see the open-source Spring Cloud Data Flow documentation.
When launching a Spring Cloud Task or Spring Cloud Stream application, the Kubernetes nodes must be able to pull the corresponding container image. If the image is stored in a private repository, you must provide authentication information for the nodes to use in pulling the image from a registry.
You can provide this authentication information by creating a secret in the namespace where you will install SCDF for Kubernetes. To create the secret and configure the SCDF for Kubernetes server applications to use it, follow the steps below.
Set username
and password
environment variables. These variables should contain your login credentials for the registry that stores the application images.
$ registry=<your registry host>
$ username=<your user name>
$ password=<your password>
Run the kubectl create secret
command, using the variables set in the previous step.
$ kubectl create secret \
docker-registry scdf-apps-regcred \
--docker-server=${registry} \
--docker-username=${username} \
--docker-password=${password}
Add the name of the secret to the application.yaml
file for the Skipper application. This will be used for Spring Cloud Stream applications.
spring:
cloud:
skipper:
server:
platform:
kubernetes:
accounts:
default:
imagePullSecret: scdf-apps-regcred
NoteIf you want to use the Composed Task Runner (CTR), create a secret that contains the credentials for the appropriate registry (either the Tanzu Net registry, or if you relocated the CTR image, the registry to which you relocated the image).
Add the name of the secret to the application.yaml
file for the Data Flow application. This will be used for Spring Cloud Task applications.
spring:
cloud:
dataflow:
task:
platform:
kubernetes:
accounts:
default:
imagePullSecret: scdf-apps-regcred
SCDF for Kubernetes provides a configurable limit for the maximum number of concurrently-running tasks. This limit can prevent the saturation of IaaS or hardware resources. If the number of concurrently running tasks on a platform instance is greater than or equal to the limit, the next task launch request fails, and SCDF for Kubernetes returns an error message through the RESTful API, the shell, or the UI.
By default, the concurrent task limit is set to 20. You can configure the limit by setting the maximumConcurrentTasks
property in the account:
spring:
cloud:
dataflow:
task:
platform:
kubernetes:
accounts:
default:
namespace: default
maximumConcurrentTasks: 40
limits:
memory: 1024Mi
A successful task launch results in a running pod, which is expected to eventually either complete or fail. Because there is no convenient way to identify the pods that correspond to a task, SCDF for Kubernetes counts only pods that were launched by the Data Flow server. The task launcher provides the task ID as a label in the pod metadata, and SCDF for Kubernetes filters all running pods by this label.
For information about configuring monitoring for SCDF for Kubernetes, see Configuring monitoring for SCDF for Kubernetes.