Here you will learn how to get started using VMware Spring Cloud® Data Flow for Kubernetes (SCDF for Kubernetes) to quickly create a data pipeline.
First, download and start the SCDF shell. For information about using the shell to connect to the SCDF for Kubernetes server, see Use the SCDF Shell.
$ java -jar spring-cloud-dataflow-shell-2.9.2.jar --dataflow.uri=http://data-flow.example.com
...
dataflow:>
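If you start the shell without the --dataflow.uri option, you can also connect from within the shell by using the dataflow config server command, passing your own server URL in place of the example URL shown here:
dataflow:>dataflow config server http://data-flow.example.com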
Next, import the Spring Cloud Stream application starters, using the SCDF shell app import command. You can choose from two versions of the starters, depending on your message broker: "RabbitMQ + Docker" or "Apache Kafka + Docker". For more information about the application starters, see the Register Supported Applications and Tasks section of the Spring Cloud Data Flow Reference Guide.
To use the RabbitMQ starters, run the following command:
dataflow:>app import https://dataflow.spring.io/rabbitmq-docker-latest
To use the Apache Kafka starters, run the following command:
dataflow:>app import https://dataflow.spring.io/kafka-docker-latest
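To verify that the starters were registered, run the app list command. The applications listed depend on which set of starters you imported:
dataflow:>app list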
After importing the application starters, you can use three of the imported applications (the http source, the splitter processor, and the log sink) to create a stream that consumes data through an HTTP POST request, processes it by splitting it into words, and outputs the results to the log.
Create the stream using the SCDF shell stream create command:
dataflow:>stream create --name words --definition "http | splitter --expression=payload.split(' ') | log"
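Before deploying, you can optionally check that each application named in the definition is registered and resolvable by using the stream validate command:
dataflow:>stream validate words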
Finally, deploy the stream using the SCDF shell stream deploy command. To enable HTTP requests to reach the http application, you must specify a deployment property that requests a LoadBalancer for the application's service:
dataflow:>stream deploy words --properties deployer.http.kubernetes.createLoadBalancer=true
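The --properties option accepts a comma-separated list, so you can combine the LoadBalancer request with other Kubernetes deployer properties. For example, the following command also sets a memory limit for the http application (the 1024Mi value is only an example; choose a limit that suits your cluster):
dataflow:>stream deploy words --properties "deployer.http.kubernetes.createLoadBalancer=true,deployer.http.kubernetes.limits.memory=1024Mi"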
To see the application pods deployed as part of the stream, run the following command in a different terminal window or tab:
$ kubectl get pods -l role=spring-app
NAME                                 READY   STATUS    RESTARTS   AGE
words-http-v1-7c977c4965-5q27m       1/1     Running   0          5m8s
words-log-v1-855f6ddd69-slmnn        1/1     Running   0          5m8s
words-splitter-v1-5466fdf6d4-lrz2c   1/1     Running   0          5m8s
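If the pods are not yet in the Running state, you can watch their progress with the -w flag, or inspect an individual pod for scheduling or image-pull errors (substitute a pod name from your own output):
$ kubectl get pods -l role=spring-app -w
$ kubectl describe pod words-http-v1-7c977c4965-5q27m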
At the SCDF shell, run the stream list command to check the status of the stream:
dataflow:>stream list
╔═══════════╤═══════════╤═══════════════════════════════════════════════════════╤═════════════════════════════════════════╗
║Stream Name│Description│ Stream Definition │ Status ║
╠═══════════╪═══════════╪═══════════════════════════════════════════════════════╪═════════════════════════════════════════╣
║words │ │http | splitter --expression="payload.split(' ')" | log│The stream has been successfully deployed║
╚═══════════╧═══════════╧═══════════════════════════════════════════════════════╧═════════════════════════════════════════╝
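For more detail than this one-line status, you can run the stream info command, which also shows the deployment properties in effect:
dataflow:>stream info words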
To view the log output of the log sink, run the following command in a new terminal window or tab:
$ kubectl logs -f deployment/words-log-v1
Next, look up the IP address assigned to the http application's Service resource:
$ kubectl get service words-http-v1
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)          AGE
words-http-v1   LoadBalancer   10.47.244.113   104.197.218.242   8080:31383/TCP   4m4s
You can access the http application via the IP address shown under EXTERNAL-IP, using port 8080.
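If your cluster cannot provision LoadBalancer services (for example, a local Minikube or Kind cluster), the EXTERNAL-IP column may remain <pending>. In that case, you can instead forward a local port to the service and send requests to localhost:8080:
$ kubectl port-forward service/words-http-v1 8080:8080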
At the SCDF shell, send a POST request to the words-http-v1 application, using the assigned IP address:
dataflow:>http post --target http://104.197.218.242:8080 --data "This is a test"
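You can also send the request from outside the SCDF shell, using any HTTP client. For example, with curl (substituting your own external IP address):
$ curl -X POST -H "Content-Type: text/plain" -d "This is a test" http://104.197.218.242:8080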
In the terminal window or tab that is viewing the log output of the log sink application, watch for the processed data to appear:
2020-06-03 19:56:59.163 INFO 1 --- [container-0-C-1] log-sink : This
2020-06-03 19:56:59.169 INFO 1 --- [container-0-C-1] log-sink : is
2020-06-03 19:56:59.171 INFO 1 --- [container-0-C-1] log-sink : a
2020-06-03 19:56:59.255 INFO 1 --- [container-0-C-1] log-sink : test
At the SCDF shell, run the stream undeploy command to undeploy the stream:
dataflow:>stream undeploy words
Then run the stream destroy command to destroy the stream definition:
dataflow:>stream destroy words
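To confirm that the stream's pods have been removed, rerun the earlier pod query. After cleanup completes, it should report that no resources were found:
$ kubectl get pods -l role=spring-app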
To learn more about creating streams, see the Getting Started with Stream Processing guide. To learn about creating tasks, see the Getting Started with Task Processing guide.