See below for information about managing Data Flow service instances using the Cloud Foundry Command Line Interface tool (cf CLI). You can also manage Data Flow service instances using Apps Manager.
Note: To have read and write access to a Spring Cloud Data Flow for VMware Tanzu service instance, you must have the SpaceDeveloper role in the space where the service instance was created. If you have only the SpaceAuditor role in that space, you have read access but not write access to the service instance.
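If you need the SpaceDeveloper role, an org manager can grant it using the cf CLI. A minimal sketch; the username, org, and space names shown here are placeholders:

$ cf set-space-role user@example.com myorg development SpaceDeveloper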
When creating or updating a Spring Cloud Data Flow service instance, you can configure the service instance using parameters passed to the cf CLI commands. See the following sections for information about the supported parameters.
Each Data Flow service instance can be given the name of a buildpack to use for deploying stream and task apps. You can set the buildpack for the service instance using a buildpack parameter given to cf create-service or cf update-service. To create a service instance that uses a buildpack named custom-java-buildpack to deploy apps, you might run:
$ cf create-service p-dataflow standard data-flow -c '{"buildpack": "custom-java-buildpack"}'
You can configure settings for a service instance's backing Data Flow server and Skipper apps using parameters given to cf create-service or cf update-service.
| Parameter | Function |
|---|---|
| dataflow.disk | The disk used by the Data Flow server. |
| dataflow.memory | The memory used by the Data Flow server. |
| skipper.disk | The disk used by the Skipper backing app. |
| skipper.memory | The memory used by the Skipper backing app. |
For all disk and memory settings, the default unit is mebibytes (MiB). You can use other units by naming the unit in the value string (for example, "1G", "512MB", "2GiB", or "3gb").
To create a service instance with a Skipper backing app that uses 4 GiB of disk space, you might run:
$ cf create-service p-dataflow standard data-flow -c '{"skipper": { "disk": "4GiB" } }'
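The dataflow settings nest in the same way. For example, to give the Data Flow server 2 GB of memory and 4 GB of disk (hypothetical values; size these for your own workload), you might run:

$ cf create-service p-dataflow standard data-flow -c '{"dataflow": { "memory": "2G", "disk": "4G" } }'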
You can configure the domain used by a service instance's backing Data Flow server and Skipper apps using a domain parameter given to cf create-service or cf update-service. To create a service instance that uses the domain my-dataflow.example.com for its backing Data Flow server and Skipper apps, you might run:
$ cf create-service p-dataflow standard data-flow -c '{"domain": "my-dataflow.example.com"}'
You can configure Spring Cloud Skipper settings for a service instance's Skipper backing app by passing the settings as parameters to cf create-service or cf update-service. You can use this, for example, to configure the deployer health check timeout. To create a service instance that uses a health check timeout of five minutes (300,000 milliseconds), you might run:
$ cf create-service p-dataflow standard data-flow -c '{"spring.cloud.skipper.server.strategies.healthcheck.timeout-in-millis": 300000}'
By default, a Data Flow service instance does not cache artifacts downloaded from a Maven repository, because this caching can overwhelm app containers and cause the service instance's Data Flow or Skipper backing apps to crash. To enable caching of Maven artifacts, set a maven-cache parameter, passed to cf create-service or cf update-service, to true:
$ cf create-service p-dataflow standard data-flow -c '{"maven-cache": true}'
Each Data Flow service instance uses three dependent data services. Defaults for these services can be configured in the tile settings, and these defaults can be overridden for each individual service instance at create or update time.
Note: The service offerings with the plan proxy are proxy services used by Spring Cloud Data Flow for VMware Tanzu service instances. The Spring Cloud Data Flow service broker creates and deletes instances of these services automatically along with each Spring Cloud Data Flow service instance. Do not manually create or delete instances of these services.
General parameters used to configure dependent data services for a Data Flow service instance are listed below.
| Parameter | Function |
|---|---|
| relational-data-service.name | The name of the service to use for a relational database that stores Spring Cloud Data Flow metadata and task history. |
| relational-data-service.plan | The name of the service plan to use for the relational database service. |
| messaging-data-service.name | The name of the service to use for a RabbitMQ or Kafka server that facilitates event messaging. |
| messaging-data-service.plan | The name of the service plan to use for the RabbitMQ or Kafka service. |
| skipper-relational.name | The name of the service to use for a relational database used by the Skipper application. |
| skipper-relational.plan | The name of the service plan to use for the relational database used by the Skipper application. |
To create a Data Flow service instance that uses VMware Tanzu SQL [MySQL] for the Data Flow and Skipper relational databases and RabbitMQ for VMware Tanzu for the event messaging service, you might use a command such as the following:
$ cf create-service p-dataflow standard data-flow -c '{ "relational-data-service": { "name": "p.mysql", "plan": "med-db" }, "messaging-data-service": { "name": "p.rabbitmq", "plan": "high-vol" }, "skipper-relational": { "name": "p.mysql", "plan": "sm-db" } }'
To run composed tasks, Spring Cloud Data Flow uses a task app called the Composed Task Runner (CTR). By default, Data Flow downloads this app from the Maven Central repository. A different default URL for this app can be configured in the tile settings, and this default can be overridden for each individual service instance at create or update time. You can specify a different URL to use for downloading the app by using a parameter passed to the cf create-service or cf update-service command.
To create a service instance that downloads the CTR app from https://example.com/ctr.jar, you might run:
$ cf create-service p-dataflow standard data-flow -c '{ "composed-task-runner-uri": "https://example.com/ctr.jar" }'
Each Data Flow service instance can optionally be bound to other service instances. For example, you can configure a Data Flow service instance to be bound to an existing Spring Cloud Services Config Server service instance. To specify that a Data Flow service instance should be bound to another existing service instance, include that service instance's name in a JSON array called services and pass the array to the cf create-service or cf update-service command.
To create a Data Flow service instance that is bound to an existing Spring Cloud Services Config Server service instance named my-config-server, you might use a command such as the following:
$ cf create-service p-dataflow standard data-flow -c '{"services": ["my-config-server"] }'
When created, the data-flow service instance will be bound to the existing my-config-server service instance.
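Because services is a JSON array, it can list more than one service instance name. A sketch, assuming a second existing service instance named my-other-service:

$ cf create-service p-dataflow standard data-flow -c '{"services": ["my-config-server", "my-other-service"] }'  # my-other-service is a placeholder name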
You can use Grafana to view metrics for Spring Cloud Data Flow apps and streams. To enable this, use settings under spring.cloud.dataflow.grafana-info, passed to cf create-service or cf update-service.
To create a service instance that sends metrics to a Grafana dashboard located at https://grafana.example.com:443, you might run:
$ cf create-service p-dataflow standard data-flow -c '{"spring.cloud.dataflow.grafana-info.url": "https://grafana.example.com:443"}'
Note: Spring Cloud Data Flow for VMware Tanzu does not provide a Grafana installation. You must provide your own Grafana installation in order to use Spring Cloud Data Flow for VMware Tanzu with Grafana.
Each Data Flow service instance can optionally specify Maven configuration properties. For the complete list of properties that can be specified, see the "Maven" section in the OSS Spring Cloud Data Flow documentation.
Maven configuration properties can be set for each Data Flow service instance using parameters given to cf create-service or cf update-service. To set the maven.remote-repositories.repo1.url property, you might use a command such as the following:
$ cf create-service p-dataflow standard data-flow -c '{"maven.remote-repositories.repo1.url": "https://repo.spring.io/libs-snapshot"}'
To configure a private Maven repository that requires authentication, you can provide a username and password, such as the following:
$ cf create-service p-dataflow standard data-flow -c '{"maven.remote-repositories.repo1.url":"https://my.private.maven/repo","maven.remote-repositories.repo1.auth.username":"user","maven.remote-repositories.repo1.auth.password":"password"}'
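The repo1 segment of these property names identifies a repository, so additional repositories can presumably be configured under additional names. A sketch, assuming a second repository key repo2 and a hypothetical repository URL:

$ cf create-service p-dataflow standard data-flow -c '{"maven.remote-repositories.repo1.url": "https://repo.spring.io/libs-snapshot", "maven.remote-repositories.repo2.url": "https://repo.example.com/maven"}'  # the repo2 URL is hypothetical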
Spring Cloud Data Flow can integrate with VMware Tanzu Observability by Wavefront to monitor deployed event-streaming and batch applications. Default values for Wavefront settings can be set in the tile configuration, and these default values can be overridden for each individual service instance at create or update time.
To configure Wavefront settings for a Data Flow service instance, pass a wavefront parameter to the cf create-service or cf update-service command. This parameter is a JSON object with the fields listed below.
| Parameter | Function |
|---|---|
| uri | The URI of the Wavefront instance. |
| api-token | The user API token to use for Wavefront. |
| source | An arbitrary string used to identify the Data Flow service instance. |
To configure these settings for a new Data Flow service instance, you might use a command such as the following:
$ cf create-service p-dataflow standard data-flow -c '{"wavefront": {"uri": "https://wavefront.example.com", "api-token": "EXAMPLE_API_TOKEN", "source": "my-dataflow-si"} }'
All Wavefront settings are optional. If you do not supply a value for any particular setting, the Data Flow service instance will use the default value for that setting (the value specified in the tile settings).
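For example, to override only the source on an existing service instance and keep the tile defaults for uri and api-token, you might run a command like the following (my-new-source is a placeholder):

$ cf update-service data-flow -c '{"wavefront": {"source": "my-new-source"} }'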
Each Data Flow service instance enforces a maximum number of concurrently running tasks (the default limit is 10). You can configure this limit using a concurrent-task-limit parameter given to cf create-service or cf update-service:
$ cf create-service p-dataflow standard data-flow -c '{"concurrent-task-limit": 30}'
When the number of concurrent tasks reaches the specified limit, the Data Flow service instance will no longer launch new tasks until the number of running tasks is again below the limit.
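To raise the limit on an existing service instance, you might run:

$ cf update-service data-flow -c '{"concurrent-task-limit": 50}'  # 50 is an arbitrary example value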
Each Data Flow service instance can be configured to run tasks only (with stream support disabled). You can configure the service instance to enable only task support using a task-only parameter given to cf create-service or cf update-service:
$ cf create-service p-dataflow standard data-flow -c '{"task-only": true}'
With task-only set to true, the Spring Cloud Skipper backing app (with its associated relational database backing service instance and the messaging backing service instance) will not be deployed for the service instance, and the service instance's dashboard (see Using the Dashboard) will not display the Streams tab.
Each Data Flow service instance can be configured to run streams only (with task support disabled). You can configure the service instance to enable only stream support using a stream-only parameter given to cf create-service or cf update-service:
$ cf create-service p-dataflow standard data-flow -c '{"stream-only": true}'
With stream-only set to true, the service instance's dashboard (see Using the Dashboard) will not display the Tasks tab.
When creating or updating a Data Flow service instance, you can set the memory allocation for the associated Spring Cloud Skipper server deployed to VMware Tanzu Application Service for VMs (PAS). The default memory allocation for Skipper is 2 GB.
To configure a value for Skipper's memory allocation, you can pass a skipper parameter (a JSON object with a single memory key) to the cf create-service or cf update-service command:
$ cf create-service p-dataflow standard data-flow -c '{"skipper": { "memory": "8G" }}'
You can use the Scheduler service with Spring Cloud Data Flow for VMware Tanzu to schedule task executions (see the Spring Cloud Data Flow OSS documentation on Scheduling Tasks). If you configure a Data Flow service instance to use Scheduler, the Data Flow broker will create a new Scheduler service instance in the Data Flow service instance's backing space. This Scheduler service instance will be bound to the Data Flow server's backing application.
To configure a Data Flow service instance to use Scheduler, pass a scheduler parameter to the cf create-service or cf update-service command. This parameter is a JSON object with the fields listed below.
| Parameter | Function |
|---|---|
| name | The name of the scheduler service offering to use. Only the Scheduler service, scheduler-for-pcf, is supported at this time. |
| plan | The name of the service plan to use. |
| instance-name | The name of the service instance to create. Optional. |
To create a Data Flow service instance that uses a Scheduler service instance with the standard plan, named mysched, you might use a command such as the following:
$ cf create-service p-dataflow standard mydf -c '{"scheduler":{"name": "scheduler-for-pcf", "plan": "standard", "instance-name":"mysched"}}'
Begin by targeting the correct org and space.
$ cf target -o myorg -s development
api endpoint:   https://api.system.example.com
api version:    2.75.0
user:           user
org:            myorg
space:          development
You can view plan details for the Data Flow product using cf marketplace -s.
$ cf marketplace
Getting services from marketplace in org myorg / space development as user...
OK

service               plans      description
p-dataflow            standard   Deploys Spring Cloud Data Flow servers to orchestrate data pipelines
p-dataflow-mysql      proxy      Proxies to the Spring Cloud Data Flow MySQL service instance
p-dataflow-rabbitmq   proxy      Proxies to the Spring Cloud Data Flow RabbitMQ service instance

TIP: Use 'cf marketplace -s SERVICE' to view descriptions of individual plans of a given service.

$ cf marketplace -s p-dataflow
Getting service plan information for service p-dataflow as user...
OK

service plan   description     free or paid
standard       Standard Plan   free
Create the service instance using cf create-service. To create a Data Flow service instance that sets the Maven maven.remote-repositories.repo1.url property to https://repo.spring.io/libs-snapshot, you might run:
$ cf create-service p-dataflow standard data-flow -c '{ "maven.remote-repositories.repo1.url": "https://repo.spring.io/libs-snapshot" }'
Creating service instance data-flow in org myorg / space development as user...
OK
Create in progress. Use 'cf services' or 'cf service data-flow' to check operation status.
As the command output suggests, you can use the cf services or cf service commands to check the status of the service instance. When the service instance is ready, the cf service command will give a status of create succeeded:
$ cf service data-flow

Service instance: data-flow
Service: p-dataflow
Bound apps:
Tags:
Plan: standard
Description: Deploys Spring Cloud Data Flow servers to orchestrate data pipelines
Documentation url: https://cloud.spring.io/spring-cloud-dataflow/
Dashboard: https://p-dataflow.apps.example.com/instances/f09e5c77-e526-4f49-86d6-721c6b8e2fd9/dashboard

Last Operation
Status: create succeeded
Message: Created
Started: 2017-07-20T18:24:14Z
Updated: 2017-07-20T18:26:17Z
You can update settings on a Data Flow service instance using the cf CLI. The cf update-service command can be given a -c flag with a JSON object containing parameters used to configure the service instance.
Note: If you upgrade a Data Flow service instance created using Spring Cloud Data Flow for VMware Tanzu v1.3.x to the version included in v1.5.0, the upgrade will delete the metrics app and Redis analytics backing service instance for the Data Flow service instance. The metrics app and Redis service are no longer used in Spring Cloud Data Flow for VMware Tanzu v1.5.0.
Begin by targeting the correct org and space.
$ cf target -o myorg -s development
api endpoint:   https://api.system.example.com
api version:    2.75.0
user:           user
org:            myorg
space:          development
You can view all service instances in the space using cf services.
$ cf services
Getting services in org myorg / space development as user...
OK

name                                            service               plan       bound apps   last operation
data-flow                                       p-dataflow            standard                create succeeded
mysql-b3e76c87-c5ae-47e4-a83c-5fabf2fc4f11      p-dataflow-mysql      proxy                   create succeeded
rabbitmq-b3e76c87-c5ae-47e4-a83c-5fabf2fc4f11   p-dataflow-rabbitmq   proxy                   create succeeded
Run cf update-service SERVICE_NAME -c '{ "PARAMETER": "VALUE" }', where SERVICE_NAME is the name of the service instance, PARAMETER is a supported parameter (see Available Parameters), and VALUE is the value for the parameter. To upgrade a service instance to the latest version included in the tile, include the parameter upgrade with value true.
$ cf update-service data-flow -c '{"upgrade": true}'
Updating service instance data-flow as user...
OK
Update in progress. Use 'cf services' or 'cf service data-flow' to check operation status.
As the output from the cf update-service command suggests, you can use the cf services or cf service commands to check the status of the service instance. When the Data Flow service instance has been updated, the cf service command will give a status of update succeeded:
$ cf service data-flow
Showing info of service data-flow in org myorg / space dev as user...

name:          data-flow
service:       p-dataflow
bound apps:
tags:
plan:          standard
description:   Deploys Spring Cloud Data Flow servers to orchestrate data pipelines
documentation:
dashboard:     https://p-dataflow.apps.example.com/instances/1cf8ff5b-4a65-469d-bee7-36e6541ac241/dashboard

Showing status of last operation from service data-flow...

status:    update succeeded
message:   Updated
started:   2018-06-19T19:26:09Z
updated:   2018-06-19T19:29:17Z
Deleting a Data Flow service instance will result in deletion of all of its dependent service instances.
Begin by targeting the correct org and space.
$ cf target -o myorg -s development
api endpoint:   https://api.system.example.com
api version:    2.75.0
user:           user
org:            myorg
space:          development
You can view all service instances in the space using cf services.
$ cf services
Getting services in org myorg / space development as user...
OK

name                                            service               plan       bound apps   last operation
data-flow                                       p-dataflow            standard                create succeeded
mysql-b3e76c87-c5ae-47e4-a83c-5fabf2fc4f11      p-dataflow-mysql      proxy                   create succeeded
rabbitmq-b3e76c87-c5ae-47e4-a83c-5fabf2fc4f11   p-dataflow-rabbitmq   proxy                   create succeeded
Delete the Data Flow service instance using cf delete-service. When prompted, enter y to confirm the deletion.
$ cf delete-service data-flow

Really delete the service data-flow?> y
Deleting service data-flow in org myorg / space development as user...
OK

Delete in progress. Use 'cf services' or 'cf service data-flow' to check operation status.
The dependent service instances for the Data Flow server service instance are deleted first, and then the Data Flow server service instance itself is deleted.
As the output from the cf delete-service command suggests, you can use the cf services or cf service commands to check the status of the service instance. When the Data Flow service instance and its dependent service instances have been deleted, the cf services command will no longer list the service instance:
$ cf services
Getting services in org myorg / space development as user...
OK

No services found