Cinder Backup Fails Under High Concurrency

The default VMware Integrated OpenStack configuration may be insufficient for Cinder backup operations with high concurrency or large volumes.

Problem

When you increase the concurrency of Cinder backup operations or the size of Cinder volumes, operations may fail and GetResourceFailure errors may be displayed in the logs.

Solution

Scale out the control plane and the number of Cinder backup pods.
Each controller node can run only one Cinder backup pod, so add controller nodes before you add backup pods.
1. Increase the number of controller nodes in your deployment.
  See Add Controller Nodes to Your Deployment.
2. Increase the number of Cinder backup pods in your deployment.
  See Scale OpenStack Services.

Log in to the management server as the root user.

ssh root@mgmt-server-ip

Update the RPC response timeout and executor thread pool size for Cinder.

Modify the Cinder configuration.

viocli update cinder

In the DEFAULT section, add the rpc_response_timeout parameter and set its value to 6000.

Add the executor_thread_pool_size parameter and set its value to 640.

The configuration file now looks similar to the following.

conf:
  backends:
    [...]
  cinder:
    DEFAULT:
      [...]
      rpc_response_timeout: 6000
      executor_thread_pool_size: 640
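
As a quick sanity check after editing, the nested settings can be verified programmatically. The following is a minimal sketch, assuming the edited configuration has already been loaded into a Python dictionary (for example, with a YAML parser). The helper name `missing_or_wrong` and the `EXPECTED` table are hypothetical and are not part of viocli.

```python
# Hypothetical check: compare a loaded configuration against the values
# recommended in this procedure. The sample dictionary stands in for the
# parsed Cinder configuration; nothing here is read from viocli.
EXPECTED = {
    ("conf", "cinder", "DEFAULT", "rpc_response_timeout"): 6000,
    ("conf", "cinder", "DEFAULT", "executor_thread_pool_size"): 640,
}


def missing_or_wrong(config, expected):
    """Return a list of (dotted_path, found_value) for settings that differ."""
    problems = []
    for path, wanted in expected.items():
        node = config
        for key in path:
            # Walk one level down; stop early if a section is absent.
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if node != wanted:
            problems.append((".".join(path), node))
    return problems


if __name__ == "__main__":
    # Sample with one parameter deliberately left out.
    sample = {"conf": {"cinder": {"DEFAULT": {"rpc_response_timeout": 6000}}}}
    for path, found in missing_or_wrong(sample, EXPECTED):
        print(f"{path}: recommended value not set (found {found!r})")
```

An empty result means that both parameters are present with the recommended values.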

Update the database timeout and maximum connection parameters.

Modify the MariaDB configuration.

viocli update mariadb

In the conf section, add the connect_timeout parameter and set its value to 5.
Add the max_connections parameter and set its value to 5000.
Add the net_read_timeout parameter and set its value to 1200.
Add the net_write_timeout parameter and set its value to 1200.
In the conf section, add the ingress section.
In the ingress section, add the proxy-read-timeout parameter and set its value to 1200.
Add the proxy-send-timeout parameter and set its value to 1200.

Add the proxy-stream-timeout parameter and set its value to 3600s.

The configuration file now looks similar to the following.

conf:
  connect_timeout: 5
  max_connections: 5000
  net_read_timeout: 1200
  net_write_timeout: 1200
  ingress:
    proxy-read-timeout: "1200"
    proxy-send-timeout: "1200"
    proxy-stream-timeout: 3600s

Update the pool sizes and allocation ratios for Nova.

Modify the Nova configuration.

viocli update nova

In the nova section, add the DEFAULT section.
In the DEFAULT section, add the cpu_allocation_ratio parameter and set its value to 30.
Add the executor_thread_pool_size parameter and set its value to 640.
Add the ram_allocation_ratio parameter and set its value to 6.
In the nova section, add the database section.
In the database section, add the max_pool_size parameter and set its value to 50.

The configuration file now looks similar to the following.

conf:
  nova:
    DEFAULT:
      cpu_allocation_ratio: 30
      executor_thread_pool_size: 640
      ram_allocation_ratio: 6
    database:
      max_pool_size: 50
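
To illustrate what the allocation ratios mean, the sketch below computes the schedulable virtual capacity for a hypothetical compute resource. The figures of 16 physical cores and 128 GB of RAM are invented for the example, and the calculation ignores any reserved host resources.

```python
# Illustrative arithmetic only: the physical capacity figures are hypothetical.
CPU_ALLOCATION_RATIO = 30  # cpu_allocation_ratio from the Nova configuration
RAM_ALLOCATION_RATIO = 6   # ram_allocation_ratio from the Nova configuration


def schedulable_capacity(physical_cores, physical_ram_gb):
    """Return (vCPUs, virtual RAM in GB) that can be allocated at these ratios."""
    return (
        physical_cores * CPU_ALLOCATION_RATIO,
        physical_ram_gb * RAM_ALLOCATION_RATIO,
    )


if __name__ == "__main__":
    vcpus, ram_gb = schedulable_capacity(physical_cores=16, physical_ram_gb=128)
    print(f"vCPUs: {vcpus}, virtual RAM: {ram_gb} GB")
    # prints "vCPUs: 480, virtual RAM: 768 GB"
```

Overcommitting at these ratios assumes that most instances are idle most of the time, so it is worth checking that the figures suit your workload.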

Update the token expiration and Web Server Gateway Interface (WSGI) parameters for Keystone.

Modify the Keystone configuration.

viocli update keystone

In the conf section, add the keystone section.
In the keystone section, add the wsgi_processes parameter and set its value to 8.
Add the wsgi_threads parameter and set its value to 15.
In the keystone section, add the token section.

In the token section, add the expiration parameter and set its value to 28800.

The configuration file now looks similar to the following.

conf:
  keystone:
    wsgi_processes: 8
    wsgi_threads: 15
    token:
      expiration: 28800
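
The effect of these Keystone values can be worked out with simple arithmetic. This is a minimal sketch, assuming that each WSGI process runs the configured number of threads and that the token expiration is expressed in seconds, as in standard Keystone. The function name `keystone_summary` is hypothetical.

```python
# Values taken from the Keystone configuration in this procedure.
WSGI_PROCESSES = 8
WSGI_THREADS = 15
TOKEN_EXPIRATION_SECONDS = 28800


def keystone_summary(processes, threads, expiration_seconds):
    """Return (concurrent request handlers, token lifetime in hours)."""
    # Assumption: threads are counted per process, so total handlers multiply.
    handlers = processes * threads
    lifetime_hours = expiration_seconds / 3600
    return handlers, lifetime_hours


if __name__ == "__main__":
    handlers, hours = keystone_summary(
        WSGI_PROCESSES, WSGI_THREADS, TOKEN_EXPIRATION_SECONDS
    )
    print(f"Request handlers: {handlers}, token lifetime: {hours:g} hours")
    # prints "Request handlers: 120, token lifetime: 8 hours"
```

Under these assumptions, an eight-hour token lifetime reduces the chance that a token expires while a long-running backup is still in progress.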
