This topic describes issues and solutions related to using VMware Tanzu GemFire for Tanzu Application Service.

Acquire Artifacts for Troubleshooting

Gather Cluster Logs

Cluster statistics and log files may be obtained by using the gfsh export logs command. See Export gfsh Logs for details.

View Statistics Files and Logs

You can visualize the performance of your cluster by downloading the statistics files from your servers. These files are located in the persistent store on each VM. To copy these files to your workstation, run the command:

bosh2 -e BOSH-ENVIRONMENT -d DEPLOYMENT-NAME scp server/0:/var/vcap/store/gemfire-server/statistics.gfs /tmp

Alternatively, use bosh ssh to access GemFire for Tanzu Application Service service instance server VMs and directly obtain the Tanzu GemFire logs that reside within the directory /var/vcap/sys/log/gemfire-server.

See Visual Statistics Display (VSD) for information about loading the statistics files into VSD.

Acquire Thread Dumps

Thread dumps may be useful for debugging. Take at least three thread dumps on each VM, separating them by about one second.

  1. To list your VMs, run:

    bosh -e ENV -d DEPLOYMENT vms
    

    Acquire the Deployment Name instructs how to acquire the string to substitute for DEPLOYMENT.

  2. Use the VM in a bosh ssh command to ssh in to the GemFire for Tanzu Application Service VM where you want to produce the thread dumps. GemFire for Tanzu Application Service VMs can be referenced using a 0-based index, for example server/0, or locator/2:

    bosh -e ENV -d DEPLOYMENT ssh server/0
    
  3. Get into the Bosh Process Manager (bpm) shell by running

    sudo /var/vcap/packages/bpm/bin/bpm shell JOB-NAME
    

    where JOB-NAME is either gemfire-server or gemfire-locator, depending on which GemFire for Tanzu Application Service VM you are on.

  4. Find the process ID (PID) that is running the Tanzu GemFire Java process by running

    ps -aux | grep java
    

    Typically, the PID is 1.

  5. As you take multiple thread dumps, redirect the output of each to a uniquely named file. This example uses the file name threaddump1.txt:

    /var/vcap/packages/jdk8/bin/jcmd 1 Thread.print > /tmp/threaddump1.txt
    

    Files in /tmp will be accessible on the VM in directory /var/vcap/data/gemfire-server/tmp or /var/vcap/data/gemfire-locator/tmp.

  6. Move the files to the /tmp directory on the VM by running

    mv /var/vcap/data/gemfire-server/tmp/threaddump1.txt /tmp/
    

    Or:

    mv /var/vcap/data/gemfire-locator/tmp/threaddump1.txt /tmp/
    
  7. Files can be copied to your local machine using bosh scp command. From your local machine, run:

    bosh -d DEPLOYMENT scp VM:/tmp/threaddump1.txt .
    

    For example:

    $ bosh -d service-instance_1fd2850e-b754-4c5e-aa5c-ddb54ee301e6 scp server/0:/tmp/threaddump1.txt .
    

Acquire the Deployment Name

The DEPLOYMENT name is needed in several troubleshooting procedures. To acquire the DEPLOYMENT name:

  1. Use the Cloud Foundry CLI. Target the space where the service instance runs.

  2. Discover the globally unique identifier (GUID) for the service instance:

    cf service INSTANCE-NAME --guid
    

    The output is the GUID. For example:

    $ cf service dev-instance --guid
    1fd2850e-b754-4c5e-aa5c-ddb54ee301e6
    
  3. Prefix the GUID with the string service-instance_ to obtain the DEPLOYMENT name. For the example GUID, the DEPLOYMENT name is service-instance_1fd2850e-b754-4c5e-aa5c-ddb54ee301e6.

Troubleshooting for Operators

Smoke Test Failures

  • Error message: “Creating p-cloudcache SERVICE-NAME failed”

    Cause of the Problem: The smoke tests could not create an instance of Tanzu GemFire.

    Action: To troubleshoot why the deployment failed, use the CF CLI to create a new service instance using the same plan and download the logs of the service deployment from BOSH.

  • Error message: “Deleting SERVICE-NAME failed”

    Cause of the Problem: The smoke test attempted to clean up a service instance it created and failed to delete the service using the cf delete-service command.

    Action: Run BOSH logs to view the logs on the broker or the service instance to see why the deletion may have failed.

  • Error message: “Cannot connect to the cluster SERVICE-NAME”

    Cause of the Problem: The smoke test was unable to connect to the cluster.

    Action: Review the logs of your load balancer, and review the logs of your CF Router to ensure the route to your GemFire for Tanzu Application Service cluster is properly registered.

    You also can create a service instance and try to connect to it using the gfsh CLI. This requires creating a service key.

  • Error message: “Could not perform create/put on locator LOCATOR-ADDRESS”

    Cause of the Problem: LOCATOR-ADDRESS is the BOSH DNS name of the locator. The smoke test was unable to write data to the cluster. The user may not have permissions to create a region or write data.

General Connectivity

  • Problem: Client-to-server communication

    Cause of the Problem: GemFire for Tanzu Application Service clients communicate to GemFire for Tanzu Application Service servers on port 40404 and with locators on port 55221. Both of these ports must be reachable from the VMware Tanzu Application Service for VMs network to service the network.

  • Problem: Membership port range

    Cause of the Problem: GemFire for Tanzu Application Service servers and locators communicate with each other using UDP and TCP. The current port range for this communication is 49152-65535.

    Solution: If you have a firewall between VMs, ensure this port range is open.

  • Problem: Port range usage across a WAN

    Cause of the Problem: Gateway receivers and gateway senders communicate across WAN-separated service instances. Each GemFire for Tanzu Application Service service instance uses Tanzu GemFire defaults for the gateway receiver ports. The default is the inclusive range of port numbers 5000 to 5499.

    Solution: Ensure this port range is open when WAN-separated service instances will communicate.

Troubleshooting for Developers

  • Problem: An error occurs when creating a service instance or when running a smoke test. The service creation issues an error message that starts with

    Instance provisioning failed: There was a problem completing your request.
    

    Tanzu GemFire server logs at /var/vcap/sys/log/gemfire-server/gemfire/server-<N>.log will contain a disk-access error with the string

    A DiskAccessException has occurred
    

    And a stack trace similar to this one that begins with:

    org.apache.geode.cache.persistence.ConflictingPersistentDataException
      at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:743)
      at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:819)
      at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
      at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1178)
      at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1059)
      at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3089)
    

    Cause of the Problem: The GemFire for Tanzu Application Service VMs are under-provisioned; the quantity of disk space is too low.

    Solution: Use Ops Manager to provision VMs of at least the minimum size. See Configure Service Plans for minimum-size details.

check-circle-line exclamation-circle-line close-line
Scroll to top icon