The Domain Name System (DNS) is crucially important to both workloads and the users of those workloads.
DNS provides an abstraction layer so that IP addresses don’t need to be tracked and used directly, and more human-friendly names can be used. TLS certificates are also correlated with DNS, helping establish trust between systems within virtual infrastructures. As such, most organizations depend heavily on DNS, and require that their DNS servers be made highly available. Your organization’s intentions for an SDDC will determine the methods you use to ensure DNS availability.
VMware Cloud SDDCs provision resolvers and DNS records during deployment for use by the infrastructure. This ensures that the infrastructure is always reachable internally through the DNS names.
DNS is a powerful system, and the hierarchy in it can be used to help track other information, such as where a workload is running, which makes IT operations easier. However, as workloads move between clouds other systems (like a Configuration Management Database) may need to be kept up to date, too. Location information in DNS can also leak information to attackers about where your organization’s facilities are. Security is always a tradeoff; ensure your organization is comfortable with the risks.
Ideas to consider:
DNS records help determine how network traffic flows. If a domain name resolves to an IP address internal to your organization then traffic will flow on internal links and VPN connections between sites. If a domain name resolves to a public IP address then traffic will flow across the public Internet.
Latency of DNS resolution has a profound impact on the performance of workloads and services, from response time to overall system load. Keeping DNS resolution as close to a workload as possible ensures the best performance as well as site resiliency if network links become unavailable.
Do the DNS servers you use supply authoritative name services for customer-facing and external services? If those systems are unavailable will your customers be unable to reach you? Many cloud providers have DNS services that can be used across local clouds and global public cloud regions. This can add resilience while simplifying management & workload migrations with hybrid deployments.
“Split-brain” DNS methods, where one view of an environment is available to some clients, and another view is available to others, can be a useful tool for organizations. It also can be very confusing and lead to errors if there are multiple sources of authoritative information. The phrase “security through obscurity” was coined many years ago to describe the act of hiding things to secure them. This is not a legitimate approach to security.
The method in which your organization implements DNS will determine how it can be made available in a hybrid or disaster recovery scenarios. Some DNS server software is fine with being cloned & replicated. Other software is not. One example of that is Microsoft Active Directory, where the best practices for deployments state that systems supporting an Active Directory should not be cloned, but instead installed as fresh deployments.
How will your workloads reach the DNS server? Does your organization employ “service IPs” for DNS that are highly available, moving between DNS servers, or an anycast routing scheme to direct traffic to the nearest DNS resolver?
How will workloads that move between an on-premises deployment and a VMware Cloud SDDC find their DNS resolvers? Will moving a workload require reconfiguring the network settings? How will reconfiguring many workloads during an incident affect your RTO?
Do you have your authoritative DNS servers and DNS recursive resolvers separated, according to best practices for DNS operations? If so, you may need to employ different availability and security methods for each type of server. In general, authoritative DNS servers should not answer recursive DNS requests from clients, and recursive DNS resolvers should not be accessible publicly.
Do your systems permit access based on domain names? For example, many Linux systems are configured with hosts.allow and hosts.deny files that can contain either IP address ranges or domain names that are permitted. Similar configurations can be achieved with guest OS firewall rules, too. During an outage where DNS is potentially affected will authorized administrators be able to connect to systems to repair them?