The Management Pack for IBM DB2 creates alerts (and in some cases provides recommended actions) based on various symptoms it detects in your IBM DB2 Environment. See the table below for the list of alerts available in the Management Pack.

Alerts List

Name Description Symptom Recommendation
Database Operational State The state of the database can restrict activity or tasks that can be performed. The state can be one of the following: ACTIVE, QUIESCE PENDING, QUIESCED, or ROLLFORWARD. A change from Active to another state may generate an Attention alert. Database Operational State Refer to the DB2 Information Center for information on these states.
Shared Sort Memory Utilization A sample event has been detected Shared Sort Memory Utilization Sorting is considered healthy if there is sufficient heap space in which to perform sorting and sorts do not overflow unnecessarily. This indicator tracks the utilization of the shared sort memory. The SHEAPTHRES_SHR database configuration parameter is a soft limit. If the allocation is close to the limit, an alert may be generated. The indicator is calculated using the formula: (db.sort_shrheap_allocated / SHEAPTHRES_SHR) * 100. The Maximum Shared Sort Memory Used snapshot monitor element maintains a shared sort memory high-water mark for the database. The value of this indicator, shown in the Additional Information, indicates the maximum amount of shared sort memory that has been in use at any one point in time since the database has been active. This value can be used to help determine an appropriate value for the shared sort memory threshold. This health indicator's threshold checking is not enabled by default in new installations where self tuning memory is expected to be used.
Percentage of Sorts That Overflowed A sample event has been detected Percentage of Sorts That Overflowed Sorting is considered healthy if there is sufficient heap space in which to perform sorting and sorts do not overflow unnecessarily. Sorts that overflow to disk can cause significant performance degradation. Some level of sort overflows however, may be unavoidable in such cases where the size of the sort exceeds physical memory or the maximum available memory for the database. If a high percentage of sorts spill, an alert may be generated. The indicator is calculated using the formula: (db.sort_overflows / db.total_sorts)*100. The system monitor data element db.sort_overflows is the total number of sorts that ran out of sort heap and may have required disk space for temporary storage. The data element db.total_sorts is the total number of sorts that have been executed. This health indicator's threshold checking is not enabled by default in new installations where self tuning memory is expected to be used.
Long Term Shared Sort Memory Utilization A sample event has been detected Long Term Shared Sort Memory Utilization Sorting is considered healthy if there is sufficient heap space in which to perform sorting and sorts do not overflow unnecessarily. This indicator tracks an over-configured shared sort heap, whose memory could be better used somewhere else in DB2. An alert may be generated when the percentage usage is low. The indicator is calculated using the formula: (db.max_shr_sort_mem / SHEAPTHRES_SHR)*100. This health indicator's threshold checking is not enabled by default in new installations where self tuning memory is expected to be used.
Log Utilization A sample event has been detected Log Utilization This indicator tracks the total amount of active log space used in bytes in the database. Log utilization is measured as the percentage of space consumed, where a high percentage may generate an alert. The indicator is calculated using the formula: (db.total_log_used / (db.total_log_used + db.total_log_available))*100. The values for the log-related database configuration parameters, shown in the additional information, display the current allocations for logs. The additional information also includes the application id for the application which has the oldest active transaction. This application can be forced to free up log space.
Log Filesystem Utilization A sample event has been detected Log Filesystem Utilization Log Filesystem Utilization tracks the fullness of the filesystem on which the transaction logs reside. DB2 may not be able to create a new log file if there is no room on the filesystem. Log utilization is measured as the percentage of space consumed. If the amount of free space in the filesystem is minimal (i.e. high percentage for utilization), an alert may be generated. The indicator is calculated using the formula: (fs.log_fs_used / fs.log_fs_total)*100 where fs is the filesystem on which the log resides. The values for the log-related database configuration parameters, shown in the additional information, display the current allocations for logs. The additional details also shows if log archiving is enabled. If Block on Log Disk Full, shown in the additional details, is set to yes and utilization is at 100%, you should resolve any alerts as soon as possible to limit the impact to applications which cannot commit transactions until the log file is successfully created.
Deadlock Rate A sample event has been detected Deadlock Rate Deadlock rate tracks the rate at which deadlocks are occurring in the database and the degree to which applications are experiencing contention problems. Deadlocks may be caused by the following situations: o Lock escalations are occurring for the database o An application may be locking tables explicitly when system- generated row locks may be sufficient o An application may be using an inappropriate isolation level when binding o Catalog tables are locked for repeatable read o Applications are getting the same locks in different orders, resulting in deadlock. The indicator is calculated using the formula: (db.deadlocks<t> - db.deadlocks<t-1>) where 't' is the current snapshot and 't-1' is the last snapshot "5" "minutes" ago. A high rate of deadlocks results in a greater degree of contention which may generate an alert.
Lock List Utilization A sample event has been detected Lock List Utilization This indicator tracks the amount of lock list memory that is being used. There is one lock list per database and it contains the locks held by all applications concurrently connected to the database. There is a set limit on lock list memory. Once the limit is reached, performance degrades because of the following situations: o Lock escalation converts row locks to table locks, thereby reducing concurrency on shared objects in the database. o More deadlocks between applications can occur since applications are waiting for a limited number of table locks. As a result, transactions are rolled back. An error is returned to the application when the maximum number of lock requests has reached the limit set for the database. The indicator is calculated using the formula: (db.lock_list_in_use / (LOCKLIST * 4096)) * 100. Utilization is measured as a percentage of memory consumed, where a high percentage represents an unhealthy condition. This health indicator's threshold checking is not enabled by default in new installations where the self tuning memory feature is expected to be used.
Lock Escalation Rate A sample event has been detected Lock Escalation Rate This indicator tracks the rate at which locks have been escalated from row locks to a table lock thereby impacting transaction concurrency. A lock is escalated when the total number of locks held by an application reaches the maximum amount of lock list space available to the application, or the lock list space consumed by all applications is approaching the total lock list space. The amount of lock list space available is determined by the MAXLOCKS and LOCKLIST database configuration parameters. When an application reaches the maximum number of locks allowed and there are no more locks to escalate, the application uses the space in the lock list allocated for other applications. There is one lock list per database and it contains the locks held by all applications concurrently connected to the database. When the entire lock list is full an error occurs. The indicator is calculated using the formula: (db.lock_escals<t> - db.lock_escals<t-1>) where 't' is the current snapshot and 't-1' is the last snapshot "5" "minutes" ago. A high rate of deadlocks results in a greater degree of contention which may generate an alert. This health indicator's threshold checking is not enabled by default in new installations where the self tuning memory feature is expected to be used.
Percentage of Applications Waiting on Locks A sample event has been detected Percentage of Applications Waiting on Locks This indicator measures the percentage of all currently executing applications that are waiting on locks. A high percentage can indicate that applications are experiencing concurrency problems which can negatively affect performance. The indicator is calculated using the formula: (db.locks_waiting / db.appls_cur_cons) *100). This health indicator's threshold checking is not enabled by default in new installations where the self tuning memory feature is expected to be used.
Package Cache Hit Ratio A sample event has been detected Package Cache Hit Ratio The hit ratio is a percentage indicating how well the package cache is helping to avoid reloading packages and sections for static SQL from the system catalogs as well as helping to avoid recompiling dynamic SQL statements. A high ratio indicates it is successful in avoiding these activities. The indicator is calculated using the formula: (1-(db.pkg_cache_inserts/db.pkg_cache_lookups))*100. This health indicator's threshold checking is not enabled by default in new installations where the self tuning memory feature is expected to be used.
Catalog Cache Hit Ratio A sample event has been detected Catalog Cache Hit Ratio The hit ratio is a percentage indicating how well the catalog cache is helping to avoid actual accesses to the catalog on disk. A high ratio indicates it is successful in avoiding actual disk I/O accesses. The indicator is calculated using the formula: (1-(db.cat_cache_inserts/db.cat_cache_lookups))*100.
Shared Workspace Hit Ratio A sample event has been detected Shared Workspace Hit Ratio he hit ratio is a percentage indicating how well the shared SQL workspace is helping to avoid having to initialize sections for SQL statements that are about to be executed. A high ratio indicates it is successful in avoiding this action. The indicator is calculated using the formula: (1-(db.shr_workspace_section_inserts/db.shr_workspace_section_lookups))*100.
Database Heap Utilization A sample event has been detected Database Heap Utilization This indicator tracks the consumption of the monitor heap memory, based on the memory pool with the ID SQLM_HEAP_DATABASE. The utilization is calculated using the formula (db.pool_cur_size / db.pool_max_size) * 100 for the Memory Pool Identifier SQLM_HEAP_DATABASE. Once this percentage reaches the maximum, 100%, queries and operations may fail because there is no heap available.
Reorganization Required A sample event has been detected Reorganization Required This health indicator tracks the need to reorganize tables or indexes within a database. Tables or all indexes defined on a table require reorganization to eliminate fragmented data. The reorganization is accomplished by compacting the information and reconstructing the rows or index data. This may result in a performance improvement and space in the table or indexes being freed. The tables considered by this health indicator can be limited using an SQL query. The scope in the additional information displays the subselect clause on system tables for this query. Refer to the DB2 Information Center for how to update this query. An attention alert might be generated to indicate that reorganization is required. Reorganization can be automated by setting the AUTO_REORG database configuration parameter to ON. If automatic reorganization is enabled, the attention alert indicates that one or more automatic reorganizations could not complete successfully. Refer to the collection details of this health indicator for the list of objects that need attention.
HADR Operational status A sample event has been detected HADR Operational status This health indicator tracks the DB2 High Availability Disaster Recovery (HADR) operational state of the database. The state between primary and standby servers can be one of the following: CONNECTED, CONGESTED or DISCONNECTED. A change from CONNECTED to another state might generate an Attention alert. Refer to the DB2 Information Center for information on these states.
HADR Log Delay A sample event has been detected HADR Log Delay This health indicator tracks the current average delay (in minutes) between the data changes on the primary database and the replication of those changes on the standby database. The large delay value represents a possibility of a data loss in case of failover when a failure on primary occurs. Moreover, the large delay value may also mean longer downtime when takeover is required, since the standby is not yet a failover candidate, as the current primary is ahead of the standby.
Database Backup Required A sample event has been detected Database Backup Required This health indicator tracks the need for a backup on the database. Backups should be taken regularly as part of a recovery strategy to protect your data against the possibility of loss in the event of a hardware or software failure. This health indicator determines when a database backup is required based on the time elapsed and amount of data changed since the last backup. Refer to the DB2 Information Center for information about how to update these settings. An attention alert might be generated to indicate that a database backup is required. Database backups can be automated by setting the AUTO_DB_BACKUP database configuration parameter to ON. If automatic database backups are enabled
Nickname Status A sample event has been detected Nickname Status This health indicator checks all of the nicknames defined in a federated database to determine if there are any invalid nicknames. A nickname may be invalid if the data source object was dropped or changed or if the user mapping is incorrect. An attention alert might be generated if any nicknames defined in the federated database are invalid. Refer to the collection details of this health indicator for the list of objects that need attention.
Data Source Server Status A sample event has been detected Data Source Server Status This health indicator checks all of the data source servers defined in a federated database to determine if any are unavailable. A data source server may be unavailable if the data source server was stopped, no longer exists, or was incorrectly configured. An attention alert might be generated if any data source servers defined in the federated database are invalid. Refer to the collection details of this health indicator for the list of objects that need attention.
Statistics Collection Required A sample event has been detected Statistics Collection Required This health indicator tracks the need to collect statistics for tables and their indexes within a database. Tables and all indexes defined on a table require statistics to improve query execution time. The tables considered by this health indicator can be limited using an SQL query. The scope in the additional information displays the subselect clause on system tables for this query. Refer to the DB2 Information Center on how to update this query. An attention alert may be generated to indicate that statistics collection is required. Statistics can be automatically collected by setting the AUTO_RUNSTATS database configuration parameter to ON. If automatic statistics collection is enabled, the attention alert indicates that one or more automatic statistics collection actions could not complete successfully. Refer to the collection details of this health indicator for the list of objects that need attention.
Database Automatic Storage Utilization A sample event has been detected Database Automatic Storage Utilization This health indicator tracks the consumption of storage for the defined database storage paths. When automatic storage table spaces are created, containers will be allocated automatically for these table spaces on the database storage paths. If there is no more space on any of the filesystems on which the database storage paths are defined, automatic storage table spaces will be unable to increase in size and may become full. This indicator is calculated using the formula: (db.auto_storage_used / db.auto_storage_total) * 100 where db.auto_storage_used and db.auto_storage_total are the sum of used and total space respectively across all physical filesystems identified in the list of database storage paths. Database automatic storage path utilization is measured as a percentage of the space consumed on the database storage path filesystems, where a high percentage indicates less than optimal function for this indicator. The calculation of time remaining to fullness in the additional information is a prediction of how much time is remaining until all free space is consumed.
Table Space Operational State A sample event has been detected Table Space Operational State The state of a table space can restrict activity or tasks that can be performed. A change from normal to another state might generate an Attention alert. For a complete list of possible table space states, refer to the DB2 Information Center.
Table Space Utilization A sample event has been detected Table Space Utilization This health indicator tracks the consumption of storage for each DMS table space. The DMS table space is considered full when all containers are full. If automatic resize is enabled on the table space this alert condition may correct itself when the table space resizes. The indicator is calculated using the formula: (ts.used / ts.useable) * 100 where ts.used and ts.useable are the system monitor data elements Used Pages in Table Space and Useable Pages in Table Space, respectively. Table space utilization is measured as the percentage of space consumed, where a high percentage indicates less than optimal function for this indicator. The short term and long term growth rates, included in the additional information for this indicator, can be used to determine if current rate of growth is a short term aberration or consistent with longer term growth. The calculation of time remaining to fullness in the additional information is a prediction of how much time is remaining until all free space is consumed.
Table Space Automatic Resize Status A sample event has been detected Table Space Automatic Resize Status This health indicator identifies whether or not tablespace resize operations are succeeding for DMS tablespaces which have automatic resize enabled. When a DMS tablespace with automatic resize enabled fails to increase in size it is effectively full. This condition may be due to lack of free space on the filesystems on which the tablespace containers are defined, or a result of the tablespace automatic resize settings. For example, the defined maximum size may have been reached, or the increase amount may be set too high to be accommodated by the remaining free space. Refer to the additional information for details about the tablespace and the automatic resize settings.
Automatic Resize Table Space Utilization A sample event has been detected Automatic Resize Table Space Utilization This health indicator tracks the consumption of table space storage for each automatic resize DMS table space on which a maximum size has been defined. The DMS table space is considered full when the maximum size has been reached. The indicator is calculated using the formula: ((ts.used * ts.page_size) / ts.max_size) * 100 where ts.used, ts.page_size and ts.max_size are the system monitor data elements Used Pages in Table Space, Table Space Page Size and Maximum Table Space Size, respectively. Automatic resize DMS Table Space utilization is measured as a percentage of the maximum table space storage consumed. A high percentage indicates the table space is approaching fullness. The short term and long term growth rates, included in the additional information for this indicator, can be used to determine if current rate of growth is a shot term aberration or consistent with long term growth. The calculation of time remaining to fullness in the additional information is a prediction of how much time there is remaining until the maximum size has been reached.