This section describes how to write an XML-based SSH (Remote Shell) custom collector.

The SSH collector runs scripts on remote hosts to collect performance data for the timeseries pipeline. You can configure one or more hosts to poll, and group hosts so that common scripts run on all of them.

You can compose scripts, organized by operating system, that return the desired data. Each line of script output contains a timestamp, a value, and related properties. You can test scripts before deploying them, and map scripts to hosts or host groups so that the scripts run on those hosts.

To write a custom SSH collector, define templates for the following in the collector-package/templates directory:

  1. collector-manager

    1. conf

      1. collecting.xml.ftl

  2. kafka-connector

    1. conf

      1. kafka-connector.xml.ftl

  3. remote-shell-collector

    1. conf

      1. remote-shell-collector.xml.ftl

      2. poller-definition.xml.ftl

      3. hosts-definition.xml.ftl

      4. scripts-definition.xml.ftl

    2. scripts

      1. linux/<script 1>

      2. linux/<script 2> . . .

      3. windows/<script 1>

      4. windows/<script 2> . . .

In addition, create the following files in the collector-package directory:

  1. meta_en.properties - Metadata for the collector.

  2. questions.txt - Defines the user-configurable parameters for the collector.

  3. questions_en.properties - Defines the values and descriptions for the questions in questions.txt.

  4. spb.properties - Defines the dependent data collector modules, along with their version numbers.

  5. default.txt - Defines default values for the user-configurable parameters.

  6. config.json - Defines sample input for testing the collector.

  7. config_input_schema.json - Defines the JSON schema for the collector input.

Collector-Manager Configuration

To create an SSH collector, follow the collector-manager configuration example below (collecting.xml.ftl):

[#ftl]
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://www.watch4net.com/APG/Collecting"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.watch4net.com/APG/Collecting collecting.xsd">
    <connectors>
        <connector enabled="false" name="File" type="File-Connector" config="conf/file-connector.xml" />
        <connector enabled="true" name="Kafka" type="Kafka-Connector" config="Kafka-Connector/${module['kafka-connector'].instance}/conf/kafka-connector.xml" />
    </connectors>
    <filters></filters>
    <collectors>
        <collector enabled="true" name="RemoteShellCollector" next="File Kafka" config="Remote-Shell-Collector/${module['remote-shell-collector'].instance}/conf/remote-shell-collector.xml" />
    </collectors>
</config>

SSH Collector Configuration

There are four main configuration files that you must configure to use the SSH Collector, in addition to any scripts that you compose. The scripts must be available in the directory templates/remote-shell-collector/scripts/<linux|windows>.

  1. remote-shell-collector.xml - Points to the other configuration files and the scripts directory, and contains collector-specific parameters. This is the file that is pointed to from the collecting.xml when referencing the Remote Shell Collector in the collecting pipeline.

  2. poller-definition.xml - Maps hosts and host groups defined in hosts-definition.xml to scripts listed in scripts-definition.xml, with their polling frequency. This enables you to assign scripts to be run on hosts, collect metrics at each polling interval, and create logical polling groups.

  3. hosts-definition.xml - Defines the hosts and host groups on which scripts are executed to collect performance data.

  4. scripts-definition.xml - Defines all the scripts to be run on the hosts defined in the hosts-definition.xml file. The script filenames are included as references, and are taken from the scripts directory defined in the remote-shell-collector.xml configuration file.

remote-shell-collector.xml

<configuration
    xmlns="http://www.watch4net.com/Remote-Shell-Collector"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.watch4net.com/Remote-Shell-Collector remote-shell-collector.xsd">
    <poller-definition-file>conf/poller-definition.xml</poller-definition-file>
    <scripts-defintion-file>conf/scripts-definition.xml</scripts-defintion-file>
    <hosts-definition-file>conf/hosts-definition.xml</hosts-definition-file>
    <scripts-directory>scripts/</scripts-directory>
    <property-refresh-frequency>600</property-refresh-frequency>
    <maximum-polling-thread>10</maximum-polling-thread>
</configuration>
  • <poller-definition-file>: The path to the file where hosts are mapped to scripts to be executed on them. By default, this file is named poller-definition.xml.

  • <scripts-defintion-file>: The path to, and name of, the file which includes the names of scripts that will be executed on hosts to retrieve performance data. By default, this file is named scripts-definition.xml.

  • <hosts-definition-file>: The path to, and name of, the file that defines the hosts, and groups of hosts, on which scripts are run to retrieve performance data. By default, this file is named hosts-definition.xml.

  • <scripts-directory>: The directory path where the scripts referenced in scripts-definition.xml are located. Scripts can be placed in subdirectories of this directory, nested to any depth. By default, scripts are divided into directories by operating system.

  • <property-refresh-frequency>: The Remote Shell Collector caches the properties of each raw value sent to the next component, so that the backend does not have to refresh the properties of a variable when they have not changed. The cache is reset periodically, at the interval specified here, so that it does not grow too large. The frequency is in seconds; the default is 600 (10 minutes).

  • <maximum-polling-thread>: The maximum number of threads that can run concurrently in order to execute the scripts on the remote hosts. The default value is 10.
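
As an illustration of how these elements and their defaults fit together, the following Python sketch reads a remote-shell-collector.xml document and falls back to the documented defaults (600 seconds, 10 threads) when the corresponding elements are missing. The helper name load_collector_settings and the sample document are hypothetical and are not part of the collector. Whether the schema actually allows these elements to be omitted is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example configuration above.
NS = {"rsc": "http://www.watch4net.com/Remote-Shell-Collector"}

# Hypothetical sample: property-refresh-frequency is deliberately left out.
SAMPLE = """<configuration xmlns="http://www.watch4net.com/Remote-Shell-Collector">
    <poller-definition-file>conf/poller-definition.xml</poller-definition-file>
    <hosts-definition-file>conf/hosts-definition.xml</hosts-definition-file>
    <scripts-directory>scripts/</scripts-directory>
    <maximum-polling-thread>4</maximum-polling-thread>
</configuration>"""


def load_collector_settings(xml_text):
    """Return selected settings as a dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    def text_of(tag, default=None):
        node = root.find(f"rsc:{tag}", NS)
        return node.text.strip() if node is not None and node.text else default

    return {
        "poller_file": text_of("poller-definition-file"),
        "hosts_file": text_of("hosts-definition-file"),
        "scripts_dir": text_of("scripts-directory"),
        # Defaults documented above: 600 seconds and 10 threads.
        "refresh_seconds": int(text_of("property-refresh-frequency", "600")),
        "max_threads": int(text_of("maximum-polling-thread", "10")),
    }


settings = load_collector_settings(SAMPLE)
print(settings["refresh_seconds"], settings["max_threads"])
```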

hosts-definition.xml

<hosts-definition
    xmlns="http://www.watch4net.com/Remote-Shell-Collector"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.watch4net.com/Remote-Shell-Collector hosts-definition.xsd">
    <hosts>
       <host id="Host-1" pollingType="ssh" os="linux" 
             hostname="10.106.125.240" port="22" username="root"  
             password="test123" />
       <host id="Host-2" pollingType="ssh" os="windows" 
             hostname="10.106.125.242" port="22" 
             username="administrator" password="test123" />
    </hosts>
    <host-groups>
        <host-group name="group1">
            <host alias="Host-1" />
            <host alias="Host-2" />
        </host-group>
    </host-groups>
</hosts-definition>

The first section of the hosts-definition.xml file, enclosed in <hosts> tags, lists hosts on which scripts are to be executed to collect performance data, and their connection parameters.

Hosts Section:

  • <host>: Each <host> tag encloses the definition and connection parameters for a single host, upon which common scripts are to be run.

  • id: A unique ID for the host. This ID can be used in two places:

    • in the poller-definition.xml file, to indicate that this individual host is to be polled.

    • in the hosts-definition.xml file, to define the groups of hosts to be collectively polled.

  • pollingType: The type of communication method used between the server and the remote host. Typically, ssh is used for Linux hosts, and windows for Windows hosts.

  • os: The operating system of the host to be polled. This can be any name, as long as the corresponding directory under the scripts directory has exactly the same name.

  • hostname: The hostname or IP address of the host to be polled.

  • port: The port on the remote host to communicate on. For example, for SSH, this is 22 by default, regardless of whether the host runs Linux or Windows (although the administrator can change it). The port attribute is optional because it is not used in some cases, such as with the Windows pollingType.

  • username: The username for the user account you want to use to access the remote host. You can use the ondemand-script-execution tool to validate how your scripts are behaving on the remote host with a selected user account to check results.

  • password: The password corresponding to the username.

  • pemFile: The file containing a DSA or RSA private key of the user (in PEM format). When you use this parameter, the password is optional; if the PEM file is encrypted, the password is used as the passphrase. (This parameter is only available for the Linux os.)

Groups Section:

The second section of the hosts-definition.xml file groups hosts defined in the first section.

  • <host-group>: Each <host-group> tag encloses a list of hosts, defined in the first section of the hosts-definition.xml file, upon which common scripts are to be run.

  • name: The name of the group of hosts. The value of this attribute will be what is used in the poller-definition.xml file to indicate this group.

  • <host>: Each <host> tag encloses a single host, defined in the first section of the hosts-definition.xml file, upon which common scripts are to be run.

  • alias: A reference to the id attribute of a host defined in the first section of the hosts-definition.xml file.
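
The relationship between the two sections can be sketched in Python as follows. The sketch parses a hosts-definition.xml document like the one above and resolves each host group to the hostnames of its members by matching each alias against a host id. The helper name resolve_groups is hypothetical, and the error raised for an unknown alias is an assumption about reasonable behavior, not a description of what the collector does.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example configuration above.
NS = {"rsc": "http://www.watch4net.com/Remote-Shell-Collector"}

HOSTS_XML = """<hosts-definition xmlns="http://www.watch4net.com/Remote-Shell-Collector">
    <hosts>
        <host id="Host-1" pollingType="ssh" os="linux" hostname="10.106.125.240" port="22" username="root" password="test123" />
        <host id="Host-2" pollingType="ssh" os="windows" hostname="10.106.125.242" port="22" username="administrator" password="test123" />
    </hosts>
    <host-groups>
        <host-group name="group1">
            <host alias="Host-1" />
            <host alias="Host-2" />
        </host-group>
    </host-groups>
</hosts-definition>"""


def resolve_groups(xml_text):
    """Map each group name to the hostnames of its members (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Index the first section by host id.
    hosts = {h.get("id"): h.attrib for h in root.findall("rsc:hosts/rsc:host", NS)}
    groups = {}
    for group in root.findall("rsc:host-groups/rsc:host-group", NS):
        members = []
        for ref in group.findall("rsc:host", NS):
            alias = ref.get("alias")
            if alias not in hosts:
                raise KeyError(f"group {group.get('name')!r} references unknown host {alias!r}")
            members.append(hosts[alias]["hostname"])
        groups[group.get("name")] = members
    return groups


print(resolve_groups(HOSTS_XML))
```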

scripts-definition.xml

<scripts-definition
    xmlns="http://www.watch4net.com/Remote-Shell-Collector"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.watch4net.com/Remote-Shell-Collector scripts-definition.xsd" >
    <script id="SvrPerf-CPU-Mem" file="SvrPerf-CPU-Mem" />
    <glob-script id="script1" glob="*load*" />
    <pattern-script id="script2" pattern=".*" />
</scripts-definition>

The scripts-definition.xml file references script files. These script files must be located in the scripts directory defined in the remote-shell-collector.xml file.

  • <script>: Each script file referenced is enclosed in its own <script> tags.

  • id: The unique id corresponding to the script file referenced. This is the value used in the poller-definition.xml file to map the script to the host or host group to be polled.

  • file: The name of the script file located in the scripts directory that this <script> tag references. All subdirectories under the scripts directory defined in the remote-shell-collector.xml file are searched. Script filenames are given without file extensions. If two files have the same name but different file extensions, the first one found is used and a warning is issued.

  • forced-script: Forces a script to be used. The script path needs to specify the file extension. This can be useful for using a single script for all the hosts, even if they don’t all have the same OS.

  • <glob-script>: Each glob-script finds scripts that match the glob.

    • id - This is the same as above.

    • glob - The glob to use to find scripts.

  • <pattern-script>: Each pattern-script finds scripts that match the regex.

    • id - This is the same as above.

    • pattern - The regex pattern to use to find scripts.
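
The difference between the three selection styles can be illustrated with a short Python sketch. It assumes that matching is done against the bare script name (without extension), that globs use shell-style wildcards, and that a regex must match the whole name. These are assumptions for illustration; the collector's actual matching rules may differ. The script names in AVAILABLE, other than SvrPerf-CPU-Mem, are hypothetical.

```python
import fnmatch
import re

# Hypothetical script names (file extensions already stripped).
AVAILABLE = ["SvrPerf-CPU-Mem", "cpu-load", "load-average", "disk-usage"]


def select_by_file(name, available):
    """<script file="...">: exact name match."""
    return [s for s in available if s == name]


def select_by_glob(glob, available):
    """<glob-script glob="...">: shell-style wildcard match (case-sensitive)."""
    return [s for s in available if fnmatch.fnmatchcase(s, glob)]


def select_by_pattern(pattern, available):
    """<pattern-script pattern="...">: regular expression applied to the whole name."""
    compiled = re.compile(pattern)
    return [s for s in available if compiled.fullmatch(s)]


print(select_by_file("SvrPerf-CPU-Mem", AVAILABLE))
print(select_by_glob("*load*", AVAILABLE))
print(select_by_pattern(".*", AVAILABLE))
```

With the sample values from the configuration above, the glob *load* selects only the names containing "load", while the pattern .* selects every script.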

poller-definition.xml

<poller-definition name="main-poller"
    xmlns="http://www.watch4net.com/Remote-Shell-Collector"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.watch4net.com/Remote-Shell-Collector poller-definition.xsd">
    <polling-groups>
        <polling-group name="polling-group1" pollingPeriod="60">
            <scripts>
                <script alias="script1" />
                <script alias="script2" />
                <script alias="script3" />
            </scripts>
            <polled-hosts>
                <host alias="acme-server1" />
                <host group-alias="group1" />
            </polled-hosts>
        </polling-group>
        <polling-group name="polling-group2" pollingPeriod="120">
            <scripts>
                <script alias="script4" />
                <script alias="script5" />
                <script alias="script1" />
            </scripts>
            <polled-hosts>
                <host alias="acme-server2" />
                <host group-alias="group2" />
            </polled-hosts>
        </polling-group>
    </polling-groups>
</poller-definition>

The poller-definition.xml file maps scripts defined in the scripts-definition.xml file to hosts and host groups defined in the hosts-definition.xml file. It specifies which scripts are executed on which hosts, and how often, to collect performance data.

  • <polling-group>: The <polling-group> tag defines scripts that will run on the hosts and groups of hosts indicated.

  • name: The unique name of the polling group.

  • pollingPeriod: The interval, in seconds, at which the indicated scripts are executed on the indicated hosts or groups of hosts.

  • <scripts>: The <scripts> tag wraps multiple instances of the <script> tag, each of which references a script defined in the scripts-definition.xml file.

  • <script>: Each <script> tag references a single script defined in the scripts-definition.xml file. Each script is executed on all hosts and host groups defined in the polling group.

  • alias: The alias attribute references a single script defined in the scripts-definition.xml file through its ID indicated there.
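
To make the mapping concrete, the following Python sketch expands a poller definition into one (host, script, period) entry per pairing, which is one way to picture the work that each polling group describes. The helper name expand_jobs and the GROUPS dictionary (standing in for group membership read from hosts-definition.xml) are hypothetical. The sketch says nothing about how the collector actually schedules or orders its polling.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example configuration above.
NS = {"rsc": "http://www.watch4net.com/Remote-Shell-Collector"}

POLLER_XML = """<poller-definition name="main-poller" xmlns="http://www.watch4net.com/Remote-Shell-Collector">
    <polling-groups>
        <polling-group name="polling-group1" pollingPeriod="60">
            <scripts>
                <script alias="script1" />
                <script alias="script2" />
            </scripts>
            <polled-hosts>
                <host alias="acme-server1" />
                <host group-alias="group1" />
            </polled-hosts>
        </polling-group>
    </polling-groups>
</poller-definition>"""

# Hypothetical group membership, as it would come from hosts-definition.xml.
GROUPS = {"group1": ["Host-1", "Host-2"]}


def expand_jobs(xml_text, groups):
    """Return (host id, script id, period in seconds) for every pairing."""
    root = ET.fromstring(xml_text)
    jobs = []
    for pg in root.findall("rsc:polling-groups/rsc:polling-group", NS):
        period = int(pg.get("pollingPeriod"))
        scripts = [s.get("alias") for s in pg.findall("rsc:scripts/rsc:script", NS)]
        hosts = []
        for ref in pg.findall("rsc:polled-hosts/rsc:host", NS):
            if ref.get("group-alias") is not None:
                # A group reference expands to all of its member hosts.
                hosts.extend(groups[ref.get("group-alias")])
            else:
                hosts.append(ref.get("alias"))
        jobs.extend((h, s, period) for h in hosts for s in scripts)
    return jobs


for job in expand_jobs(POLLER_XML, GROUPS):
    print(job)
```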

Tips to Write Scripts

In general, there are two types of data that can be returned and parsed by SSH Collector:

  • Raw Value data

  • CSV data

The following is an example of data returned from a script that the SSH collector can parse using the RAWVALUE data format:

LOG: Success Polling
VERSION: 1.0
DATAVERSION: 1.0
DATATYPE: TIMESERIES
DATAFORMAT: RAWVALUE
1692856149 GROUP-TEST DEFAULT 1Min 0.01 unit=load source=DEFAULT name=1MinLoad
1692856149 GROUP-TEST DEFAULT 5Min 0.16 unit=load source=DEFAULT name=5MinLoad
1692856149 GROUP-TEST DEFAULT 15Min 0.24 unit=load source=DEFAULT name=15MinLoad

Header:

You can include optional header information in scripts, for example by formatting it at the beginning of the file:

echo 'DATAFORMAT: CSV'

Not all of this information is currently used, but it is a good idea to include it. The only header element with a default value is DATAFORMAT, which is set to RAWVALUE if it is not explicitly indicated.
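
The header handling described above can be sketched in Python as follows. The sketch separates "KEY: value" header lines from metric lines and applies the documented default of RAWVALUE when DATAFORMAT is absent. The helper name split_header is hypothetical, and the set of header keys is taken only from the sample output shown earlier, so treating it as the complete list is an assumption.

```python
# Sample script output without a DATAFORMAT line; fields are tab-separated.
SAMPLE_OUTPUT = """LOG: Success Polling
VERSION: 1.0
DATATYPE: TIMESERIES
1692856149\tGROUP-TEST\tDEFAULT 1Min\t0.01\tunit=load
"""

# Header keys seen in the sample output (assumed, not an official list).
HEADER_KEYS = {"LOG", "VERSION", "DATAVERSION", "DATATYPE", "DATAFORMAT"}


def split_header(output):
    """Separate 'KEY: value' header lines from metric lines."""
    header = {"DATAFORMAT": "RAWVALUE"}  # documented default
    metrics = []
    for line in output.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in HEADER_KEYS:
            header[key] = value.strip()
        elif line.strip():
            metrics.append(line)
    return header, metrics


header, metrics = split_header(SAMPLE_OUTPUT)
print(header["DATAFORMAT"], len(metrics))
```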

Metrics Format:

Metrics can be returned in one of two formats, raw value, or CSV.

In the case of raw values, the fields must be tab-delimited, as in the following example:

1224251948 GROUP-TEST DEFAULT 15Min 0.24 unit=load source=DEFAULT name=15MinLoad

For quick reference, the components of the raw value format in the example above are: Unix timestamp (in seconds): 1224251948; aggregation group: GROUP-TEST; variable name: DEFAULT 15Min; variable value: 0.24; followed by any number of property names and values: unit=load, source=DEFAULT, name=15MinLoad, and so on.
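
As a minimal sketch of this layout, the following Python function splits one tab-delimited raw value line into the components listed above. The function name parse_raw_value and the returned dictionary keys are hypothetical and chosen for illustration only. Because the variable name contains a space, the line is split on tab characters rather than on whitespace in general.

```python
def parse_raw_value(line):
    """Split one tab-delimited raw value line into its components."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 tab-separated fields, got {len(fields)}")
    timestamp, group, variable, value = fields[:4]
    # Remaining fields are name=value properties; split on the first '=' only.
    properties = dict(field.split("=", 1) for field in fields[4:])
    return {
        "timestamp": int(timestamp),
        "group": group,
        "variable": variable,
        "value": float(value),
        "properties": properties,
    }


line = "1224251948\tGROUP-TEST\tDEFAULT 15Min\t0.24\tunit=load\tsource=DEFAULT\tname=15MinLoad"
record = parse_raw_value(line)
print(record["variable"], record["value"], record["properties"])
```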

In the case of CSV values, you can set the delimiter, as in the following example, which uses the same values as above with a semicolon as the delimiter:

1224251948;GROUP-TEST;DEFAULT 15Min;0.24;unit=load;source=DEFAULT;name=15MinLoad
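
Going in the other direction, a script author might build such a line from its components. The Python sketch below joins the fields with a chosen delimiter, reusing the values from the raw value example. The function name format_metric is hypothetical, and rejecting fields that contain the delimiter is an assumption, since this section does not describe any quoting or escaping rules. How the delimiter is declared to the collector is also not covered here.

```python
def format_metric(timestamp, group, variable, value, properties, delimiter=";"):
    """Join the metric components into one delimited line."""
    fields = [str(int(timestamp)), group, variable, str(value)]
    fields += [f"{name}={val}" for name, val in properties.items()]
    for field in fields:
        # No escaping rules are documented, so refuse ambiguous fields.
        if delimiter in field:
            raise ValueError(f"field {field!r} contains the delimiter {delimiter!r}")
    return delimiter.join(fields)


line = format_metric(
    1224251948, "GROUP-TEST", "DEFAULT 15Min", 0.24,
    {"unit": "load", "source": "DEFAULT", "name": "15MinLoad"},
)
print(line)
```

Passing delimiter="\t" to the same function produces the tab-delimited raw value layout.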