CSPs can enable the Telco Cloud Infrastructure platform for day 1 and day 2 operations after the platform is deployed in the cloud provider topology. The platform is integrated with an operations management suite that provides health monitoring, issue isolation, security, and remediation capabilities for the infrastructure and VNFs.

The operations management framework defines and packages a five-step approach to make day 1 and day 2 workflows operational.

  • Service operations onboarding.

  • Service launch and monitoring.

  • Dynamic optimizations.

  • Issue isolation.

  • Demand planning and expansion.

The integrated operational intelligence adapts to the dynamic characteristics of the infrastructure to ensure service quality and issue resolution. Some of the key characteristics include:

  • Dynamic resource discovery: Distributed and complex topologies, together with workloads in motion, require dynamic resource and service discovery. The platform provides continuous visibility into service provisioning, workload migrations, auto-scaling, elastic networking, and network-sliced multitenancy across VNFs, hosts, clusters, and sites.

  • SLA management: Continuous operational intelligence and alert notifications enable proactive service optimization and capacity scale-out or scale-in, and surface SLA violations, configuration and compliance gaps, and security vulnerabilities.

  • Remediation: Reduced MTTU and timely issue isolation improve service reliability and availability. Prioritized alerting, recommendations, and advanced log searching enable isolation of service issues across physical and overlay networks.

  • Security and policy controls: Multivendor services operating in a shared resource pool can create security risks within the virtual environment. The platform addresses these risks with the following capabilities:

    • Ability to profile and monitor traffic segments, types, and destinations to recommend security rules and policies for north-south and east-west traffic.

    • Identification of security policy and configuration violations, performance impacts, and traffic routes.

  • Capacity planning and forecasting: New business models and flexible networks demand efficient capacity planning and forecasting, rather than the traditional approach of over-provisioning, which is costly and unrealistic.
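The capacity planning and forecasting characteristic can be illustrated with a simple trend-based estimate. The sketch below is not the platform's forecasting algorithm: it fits an ordinary least-squares line to daily utilization samples and estimates how many days remain before a resource crosses a threshold. The function name, the one-sample-per-day interval, and the example values are assumptions made for illustration.

```python
from typing import Optional, Sequence


def days_until_threshold(samples: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate days until utilization reaches `threshold` (hypothetical helper).

    `samples` holds one utilization percentage per day, oldest first.
    Fits y = slope * x + intercept by ordinary least squares and solves for
    the x at which y == threshold. Returns None when the trend is flat or
    decreasing, because the threshold is then never reached.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day index where the fitted line crosses the threshold, expressed
    # relative to the most recent sample (index n - 1); 0.0 if already crossed.
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (n - 1))


if __name__ == "__main__":
    # Hypothetical cluster CPU utilization growing 2 points per day.
    print(days_until_threshold([50, 52, 54, 56, 58], 80.0))  # 11.0
```

A linear trend is the simplest possible model; seasonal or bursty workloads would need a richer forecasting method, so treat this only as a picture of the idea.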

The framework continuously collects data from local and distributed agents, and correlates and analyzes it to enable day 2 operations. Third-party components such as existing assurance engines, Network Management Systems (NMS), EMS, OSS/BSS, VNFM, and NFVO can also query and trigger the analytical intelligence for closed-loop remediation.
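Closed-loop remediation can be pictured as alerts from the analytics layer being routed to the external components that subscribe to them. The minimal sketch below assumes a hypothetical `Alert` record, a hypothetical `ClosedLoopDispatcher`, and a placeholder `scale_out` handler standing in for a VNFM or NFVO call. None of these are product APIs; the real integration points depend on the products involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Alert:
    """Simplified alert as an analytics engine might emit it (hypothetical fields)."""
    resource: str  # e.g. a VNF or cluster identifier
    kind: str      # e.g. "cpu_saturation", "link_down"
    severity: str  # e.g. "critical", "warning"


# A remediation handler receives an alert and returns a description of its action.
Handler = Callable[[Alert], str]


class ClosedLoopDispatcher:
    """Routes alerts to subscribed remediation handlers (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, kind: str, handler: Handler) -> None:
        # A third-party component (for example a VNFM or NFVO adapter)
        # registers interest in one alert kind.
        self._handlers.setdefault(kind, []).append(handler)

    def dispatch(self, alert: Alert) -> List[str]:
        # Invoke every handler registered for this alert kind. Alerts with no
        # subscriber produce no actions and are left for manual triage.
        return [handler(alert) for handler in self._handlers.get(alert.kind, [])]


def scale_out(alert: Alert) -> str:
    # Hypothetical placeholder for a call to an orchestrator's scaling API.
    return f"scale-out requested for {alert.resource}"


if __name__ == "__main__":
    dispatcher = ClosedLoopDispatcher()
    dispatcher.subscribe("cpu_saturation", scale_out)
    print(dispatcher.dispatch(Alert("vnf-upf-01", "cpu_saturation", "critical")))
    # ['scale-out requested for vnf-upf-01']
```

Keeping the dispatcher unaware of what each handler does lets assurance engines, NMS, or orchestrators be added as subscribers without changing the alerting side.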

Figure 1. Analytics and Monitoring Overview

CSPs can deploy the operations management components in the Management Pod and centralize them across the cloud topology, assuming that inter-site latency constraints are met.

  • vRealize Operations Manager collects compute, storage, and networking data, providing performance and fault visibility over hosts, hypervisors, VMs, clusters, and sites.

  • vRealize Log Insight captures unstructured data from the environment, providing log analysis and analytics for issue isolation. Platform component logs and events are ingested, tokenized, and mined for intelligence so that they can be searched, filtered, aggregated, and alerted on.

  • vRealize Network Insight provides layer 2, 3, and 4 visibility into the virtual and physical networks and into security policy gaps. The engine is integrated with the networking fabric and ingests data ranging from performance metrics and device and network configuration to IPFIX flows and SNMP. It discovers gaps in network traffic optimization, micro-segmentation, compliance, security violations, traffic routing, and performance.
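Flow-based policy recommendation, as in profiling traffic by segment, type, and destination to suggest micro-segmentation rules, can be sketched as grouping observed flow records by source group, destination group, protocol, and port, and proposing one allow rule per distinct tuple. The `Flow` record is a heavily simplified stand-in for an IPFIX record, and the IP-to-group mapping and `recommend_rules` function are hypothetical. vRealize Network Insight's actual recommendation logic is not described in this section.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Flow:
    """Simplified flow record with a few IPFIX-style fields (not the IPFIX schema)."""
    src_ip: str
    dst_ip: str
    protocol: str  # e.g. "TCP", "UDP"
    dst_port: int


RuleKey = Tuple[str, str, str, int]  # (source group, destination group, protocol, port)


def recommend_rules(
    flows: Iterable[Flow], groups: Dict[str, str], min_flows: int = 1
) -> List[Tuple[RuleKey, int]]:
    """Propose candidate allow rules from observed traffic (hypothetical helper).

    Each IP is mapped to a logical group (for example a VNF tier); IPs missing
    from `groups` fall into "unknown". Tuples seen at least `min_flows` times
    become candidate rules, most frequent first.
    """
    counts: Counter = Counter()
    for f in flows:
        key = (
            groups.get(f.src_ip, "unknown"),
            groups.get(f.dst_ip, "unknown"),
            f.protocol,
            f.dst_port,
        )
        counts[key] += 1
    return [(key, n) for key, n in counts.most_common() if n >= min_flows]


if __name__ == "__main__":
    groups = {"10.0.0.1": "web", "10.0.0.2": "web", "10.0.1.1": "db"}
    flows = [
        Flow("10.0.0.1", "10.0.1.1", "TCP", 5432),
        Flow("10.0.0.2", "10.0.1.1", "TCP", 5432),
        Flow("10.0.0.1", "10.0.9.9", "UDP", 53),
    ]
    for rule, seen in recommend_rules(flows, groups):
        print(rule, seen)
```

Flows that land in the "unknown" group are a natural place to look for policy gaps, since they reach destinations that no defined segment covers.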