<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>GSLB on Barash Helvadzhaoglu</title><link>https://barashhelvadzhaoglu.com/en/tags/gslb/</link><description>Recent content in GSLB on Barash Helvadzhaoglu</description><generator>Hugo -- 0.160.1</generator><language>en</language><lastBuildDate>Wed, 15 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://barashhelvadzhaoglu.com/en/tags/gslb/index.xml" rel="self" type="application/rss+xml"/><item><title>F5 GTM &amp; GSLB Deep Dive: Global Traffic Management and DNS Failover</title><link>https://barashhelvadzhaoglu.com/en/technology/f5-gtm-deep-dive/</link><pubDate>Wed, 15 Apr 2026 00:00:00 +0000</pubDate><guid>https://barashhelvadzhaoglu.com/en/technology/f5-gtm-deep-dive/</guid><description>F5 GTM (BIG-IP DNS) deep dive — Wide IPs, iQuery, topology routing, TTL strategy, multi-DC failover, and active-active vs active-standby patterns.</description><content:encoded><![CDATA[<h1 id="f5-gtm--gslb-deep-dive-global-traffic-management-and-dns-failover">F5 GTM &amp; GSLB Deep Dive: Global Traffic Management and DNS Failover</h1>
<p>This article is part of the F5 BIG-IP series.</p>
<blockquote>
<p><strong>New to F5?</strong> Start with the platform overview first: <a href="/en/technology/f5-bigip-application-delivery-platform-overview/">F5 BIG-IP Is Not a Load Balancer — It&rsquo;s an Application Delivery Platform</a></p>
</blockquote>
<p>If you already understand the big picture and want to go deep on GTM — iQuery, Wide IPs, topology routing, TTL strategy, and multi-DC design — you&rsquo;re in the right place.</p>
<hr>
<h2 id="the-problem-gtm-solves">The Problem GTM Solves</h2>
<p>LTM manages application traffic within a single data center. It distributes connections across backend servers, monitors health, and provides HA within one site. But LTM has no visibility into what happens outside its data center — it cannot direct traffic between sites.</p>
<p>This is exactly the gap GTM fills. When an organization has two data centers (say, a primary DC and a DR site) or globally distributed infrastructure, the question is: <em>how do clients know which data center to connect to, and what happens automatically when one goes offline?</em></p>
<p>Without GTM, the answer is usually manual DNS changes — slow, error-prone, and completely unsuitable for automated failover. GTM solves this at the DNS layer.</p>
<hr>
<h2 id="the-dns-flow-how-gtm-decides">The DNS Flow: How GTM Decides</h2>
<p>GTM acts as the <strong>authoritative DNS server</strong> for your application zones. When a client resolves <code>webapp.company.com</code>, the query reaches GTM instead of a standard DNS server:</p>
<pre tabindex="0"><code>1. Client: &#34;What is the IP for webapp.company.com?&#34;
2. Query reaches GTM (authoritative for company.com zone)
3. GTM evaluates in real time:
   → Is DC-1 LTM healthy? (via iQuery)
   → Is DC-2 LTM healthy? (via iQuery)
   → What load balancing policy applies?
   → Where is this client geographically?
4. GTM responds: &#34;webapp.company.com = 10.10.1.100&#34; (DC-1 VIP)
5. Client connects to 10.10.1.100 → LTM handles from here
</code></pre><p>The intelligence is in step 3. GTM is not a passive DNS server returning a static record — it makes a real-time decision based on current infrastructure health and policy every time it answers a query.</p>
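<p>The decision in step 3 can be sketched as a small pure function. This is illustrative only (the names <code>resolve</code>, <code>health</code>, and <code>policy_order</code> are hypothetical, not F5's internals): given per-DC health from iQuery and a preference order, return the first healthy VIP.</p>

```python
# Illustrative sketch of GTM's per-query decision (step 3 above).
# Hypothetical names -- not F5's internal logic or API.

def resolve(health, policy_order):
    """Return the VIP of the first healthy DC in preference order.

    health:       dict of DC name -> bool (conceptually, from iQuery)
    policy_order: list of (dc_name, vip) tuples in preference order
    """
    for dc, vip in policy_order:
        if health.get(dc):
            return vip
    return None  # no healthy DC: GTM would fall through to another pool

order = [("DC-1", "10.10.1.100"), ("DC-2", "10.20.1.100")]
print(resolve({"DC-1": True, "DC-2": True}, order))   # 10.10.1.100 (DC-1 preferred)
print(resolve({"DC-1": False, "DC-2": True}, order))  # 10.20.1.100 (DC-1 down)
```

<p>With Global Availability, this "first healthy wins" shape is essentially the whole policy; the other methods change only how the candidate order is produced.</p>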
<hr>
<h2 id="gtm-object-hierarchy">GTM Object Hierarchy</h2>
<p>GTM uses a five-level hierarchy that mirrors the DNS structure:</p>
<pre tabindex="0"><code>Data Center  →  Server  →  Virtual Server  →  GTM Pool  →  Wide IP
</code></pre><p><strong>Data Center:</strong> A logical grouping for a physical site (DC-Istanbul, DC-Frankfurt). Contains all GTM-registered servers at that location.</p>
<p><strong>Server:</strong> A BIG-IP device (typically an LTM) registered with GTM. GTM communicates with it via iQuery for real-time health data.</p>
<p><strong>Virtual Server:</strong> A specific LTM virtual server (VIP) on a registered server. The actual endpoint GTM can direct traffic to.</p>
<p><strong>GTM Pool:</strong> A group of virtual servers across one or multiple LTMs. GTM distributes DNS responses across pool members.</p>
<p><strong>Wide IP:</strong> The DNS name (<code>webapp.company.com</code>) that clients resolve. A Wide IP references one or more GTM pools and defines the resolution policy.</p>
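<p>The containment relationships above can be modeled with a few plain data classes. This is a conceptual sketch only; the class and field names are invented for illustration and do not mirror the TMOS object schema.</p>

```python
# Conceptual model of the GTM object hierarchy -- names are illustrative.
from dataclasses import dataclass, field

@dataclass
class VirtualServer:      # an LTM VIP, e.g. vs_webapp_443
    name: str
    address: str

@dataclass
class Server:             # a BIG-IP (LTM) registered with GTM
    name: str
    virtual_servers: list = field(default_factory=list)

@dataclass
class DataCenter:         # logical grouping for a physical site
    name: str
    servers: list = field(default_factory=list)

@dataclass
class GtmPool:            # members reference (server, virtual server) pairs
    name: str
    members: list = field(default_factory=list)

@dataclass
class WideIP:             # the DNS name clients actually resolve
    fqdn: str
    pools: list = field(default_factory=list)

vs   = VirtualServer("vs_webapp_443", "10.10.1.100:443")
ltm  = Server("ltm-dc1", [vs])
dc   = DataCenter("DC-Istanbul", [ltm])
pool = GtmPool("gtm_pool_webapp_primary", [(ltm.name, vs.name)])
wip  = WideIP("webapp.company.com", [pool])
print(wip.pools[0].members[0])  # ('ltm-dc1', 'vs_webapp_443')
```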
<hr>
<h2 id="iquery-why-gtm-actually-understands-application-health">iQuery: Why GTM Actually Understands Application Health</h2>
<p><strong>iQuery</strong> is the proprietary F5 protocol that gives GTM genuine application-awareness. This is what distinguishes GTM from external DNS load balancers that simply ping a VIP or check TCP reachability.</p>
<p>Without iQuery, GTM would only know if an IP address responds to a probe. With iQuery, GTM knows:</p>
<ul>
<li>Whether the LTM virtual server is enabled and active</li>
<li>Whether the pool behind that virtual server has healthy, active members</li>
<li>Current active connection count on the LTM</li>
<li>Current CPU and memory utilization</li>
</ul>
<p><strong>The critical implication:</strong> When a backend application fails and LTM marks its pool down, iQuery propagates this to GTM <strong>immediately</strong> — before any client has a chance to connect to a broken endpoint. GTM stops answering DNS queries with that DC&rsquo;s VIP right away.</p>
<p>A TCP health probe from GTM to a VIP IP would never detect this — the VIP IP remains reachable even when the application behind it is completely broken. iQuery is what makes the difference.</p>
<h3 id="registering-an-ltm-with-gtm">Registering an LTM with GTM</h3>
<pre tabindex="0"><code>Server: ltm-dc1
  Product:         BIG-IP (enables iQuery)
  Address:         10.10.0.1 (LTM management IP)
  Data Center:     DC-Istanbul
  Virtual Servers: [auto-discovered via iQuery]
    - vs_webapp_443  (10.10.1.100:443)
    - vs_api_443     (10.10.1.101:443)
    - vs_admin_443   (10.10.1.102:443)
</code></pre><p>Once registered, GTM automatically discovers and monitors all virtual servers on that LTM. No manual virtual server configuration needed in GTM — iQuery handles discovery.</p>
<hr>
<h2 id="wide-ips-and-gtm-pools">Wide IPs and GTM Pools</h2>
<p>A Wide IP ties a DNS name to a resolution policy:</p>
<pre tabindex="0"><code>Wide IP: webapp.company.com (type: A)
  Pool: gtm_pool_webapp_primary
    Load Balancing Method: Global Availability
    Members:
      - ltm-dc1 / vs_webapp_443  (Order: 1)
      - ltm-dc2 / vs_webapp_443  (Order: 2)
  Fallback Pool:    gtm_pool_webapp_dr
  Last Resort Pool: [return SERVFAIL if all pools exhausted]
</code></pre><p>A Wide IP can have multiple pools in priority order. If the primary pool has no available members, GTM automatically tries the next pool. This enables tiered failover architectures without manual intervention.</p>
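<p>The tiered fallback amounts to "first available member of the first pool that has one." A minimal sketch, with hypothetical pool and member names:</p>

```python
# Tiered pool selection on a Wide IP -- illustrative names only.

def pick_pool(pools, available):
    """pools:     list of (pool_name, [member, ...]) in priority order
    available: set of members currently up (conceptually, per iQuery)"""
    for name, members in pools:
        live = [m for m in members if m in available]
        if live:
            return name, live[0]
    return None, None  # every pool exhausted

pools = [
    ("gtm_pool_webapp_primary", ["dc1/vs_webapp_443", "dc2/vs_webapp_443"]),
    ("gtm_pool_webapp_dr",      ["dr/vs_webapp_443"]),
]
# Primary pool fully down -> GTM falls through to the DR pool.
print(pick_pool(pools, {"dr/vs_webapp_443"}))  # ('gtm_pool_webapp_dr', 'dr/vs_webapp_443')
```

<p>The <code>(None, None)</code> case corresponds to the exhausted state above, where GTM has nothing left to answer with.</p>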
<hr>
<h2 id="load-balancing-methods">Load Balancing Methods</h2>
<p>GTM load balancing operates at the DNS response level — each method determines which pool member&rsquo;s IP to return for a given query.</p>
<h3 id="global-availability">Global Availability</h3>
<p>Always return the first available member&rsquo;s IP. Move to the next member only when the current one is unavailable:</p>
<pre tabindex="0"><code>Members (in priority order):
  1. ltm-dc1 / vs_webapp_443  ← all traffic when healthy
  2. ltm-dc2 / vs_webapp_443  ← traffic only when DC-1 fails
</code></pre><p><strong>Best for:</strong> Primary/DR architectures. All traffic runs on the primary site; DR activates only during genuine outages. This was the standard method in the banking environment — cost-effective, simple to reason about, and predictable under failure.</p>
<h3 id="round-robin">Round Robin</h3>
<p>Alternates DNS responses between pool members:</p>
<pre tabindex="0"><code>Query 1 → DC-1 VIP
Query 2 → DC-2 VIP
Query 3 → DC-1 VIP
</code></pre><p><strong>Best for:</strong> Active-active multi-DC architectures with equal capacity at each site. Common in large-scale web applications where both sites should share load continuously.</p>
<h3 id="ratio">Ratio</h3>
<p>Weighted distribution:</p>
<pre tabindex="0"><code>ltm-dc1 / vs_webapp_443  Ratio: 7  ← 70% of DNS responses
ltm-dc2 / vs_webapp_443  Ratio: 3  ← 30% of DNS responses
</code></pre><p><strong>Best for:</strong> Active-active when data centers have different capacities. Allows capacity-proportional load distribution.</p>
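<p>One way to picture Ratio is a repeating cycle of 7 + 3 = 10 slots. The sketch below is deterministic for clarity; GTM's actual scheduler interleaves responses rather than bursting, but the long-run proportions are the same idea:</p>

```python
# Ratio distribution sketch: DC-1 weight 7, DC-2 weight 3.
# Deterministic toy scheduler -- GTM's real interleaving differs.
from collections import Counter

def ratio_pick(n, members):
    """Pick a VIP for the n-th query by walking cumulative weights."""
    total = sum(w for _, w in members)
    slot = n % total
    for vip, weight in members:
        if slot < weight:
            return vip
        slot -= weight

members = [("10.10.1.100", 7), ("10.20.1.100", 3)]
counts = Counter(ratio_pick(n, members) for n in range(100))
print(counts)  # 70 responses carry the DC-1 VIP, 30 the DC-2 VIP
```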
<h3 id="topology">Topology</h3>
<p>Routes DNS responses based on the geographic location of the client&rsquo;s DNS resolver IP:</p>
<pre tabindex="0"><code>Topology Records (evaluated top-to-bottom, first match wins):
  subnet 10.0.0.0/8         → DC-Istanbul   (all internal users)
  ISP: Turk Telekom         → DC-Istanbul
  region: Europe            → DC-Frankfurt
  region: Middle East       → DC-Istanbul
  default                   → DC-Frankfurt
</code></pre><p><strong>Best for:</strong> Global applications with users in multiple regions. Reduces latency by routing users to the nearest DC. Also enables data residency compliance — ensuring EU users&rsquo; requests are processed in EU data centers.</p>
<p>GTM uses an IP geolocation database to map resolver IPs to regions. Keep this database updated — ISPs regularly reallocate IP ranges, and stale geolocation data sends users to wrong regions.</p>
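<p>Topology evaluation is a first-match walk down the record list. The sketch below uses Python's standard <code>ipaddress</code> module for the subnet test and a fabricated two-entry table in place of the real geolocation database (the sample IPs are documentation ranges, and their region mapping is invented):</p>

```python
# First-match topology evaluation -- toy geolocation, illustrative only.
import ipaddress

RECORDS = [                      # evaluated top-to-bottom, first match wins
    ("subnet",  "10.0.0.0/8",  "DC-Istanbul"),   # internal users
    ("region",  "Europe",      "DC-Frankfurt"),
    ("region",  "Middle East", "DC-Istanbul"),
    ("default", None,          "DC-Frankfurt"),
]

# Fabricated stand-in for the geolocation database (TEST-NET sample IPs).
TOY_GEO = {"198.51.100.7": "Europe", "203.0.113.9": "Middle East"}

def route(resolver_ip):
    for kind, value, dc in RECORDS:
        if kind == "subnet" and ipaddress.ip_address(resolver_ip) in ipaddress.ip_network(value):
            return dc
        if kind == "region" and TOY_GEO.get(resolver_ip) == value:
            return dc
        if kind == "default":
            return dc

print(route("10.5.5.5"))      # DC-Istanbul  (internal subnet)
print(route("198.51.100.7"))  # DC-Frankfurt (Europe)
print(route("192.0.2.1"))     # DC-Frankfurt (no match -> default)
```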
<h3 id="least-connections">Least Connections</h3>
<p>Routes to the pool member with the fewest active connections, obtained in real time via iQuery:</p>
<pre tabindex="0"><code>Current state via iQuery:
  ltm-dc1: 4,521 active connections
  ltm-dc2: 3,102 active connections
  → GTM returns DC-2 VIP for next query
</code></pre><p><strong>Best for:</strong> Active-active architectures where dynamic load balancing based on actual connection counts is preferred over static weights.</p>
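<p>At the DNS layer this is simply a minimum over the per-DC connection counts that iQuery reports. A one-line sketch (the function name is hypothetical):</p>

```python
# Least Connections at the DNS layer: smallest active-connection count wins.

def least_connections(members):
    """members: dict of VIP -> active connection count (via iQuery)."""
    return min(members, key=members.get)

state = {"10.10.1.100": 4521, "10.20.1.100": 3102}
print(least_connections(state))  # 10.20.1.100 -- DC-2 gets the next answer
```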
<hr>
<h2 id="dns-ttl-strategy-the-most-misunderstood-aspect">DNS TTL Strategy: The Most Misunderstood Aspect</h2>
<p>TTL determines how long DNS resolvers cache GTM&rsquo;s response. This is widely misunderstood:</p>
<p><strong>TTL does not control failover speed for clients that have already resolved the name.</strong> A client that resolved <code>webapp.company.com</code> 10 seconds ago continues using the cached IP until TTL expires — regardless of what GTM is currently responding with.</p>
<p>TTL controls how quickly <strong>new DNS lookups</strong> reflect the current infrastructure state.</p>
<h3 id="ttl-trade-offs">TTL Trade-offs</h3>
<table>
  <thead>
      <tr>
          <th>TTL</th>
          <th>Failover visibility</th>
          <th>DNS query load</th>
          <th>Use case</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>30 sec</td>
          <td>Fast</td>
          <td>High</td>
          <td>Critical payment systems, auth services</td>
      </tr>
      <tr>
          <td>60 sec</td>
          <td>Good</td>
          <td>Moderate</td>
          <td>Most production applications</td>
      </tr>
      <tr>
          <td>300 sec</td>
          <td>Moderate</td>
          <td>Low</td>
          <td>Internal tools, monitoring</td>
      </tr>
      <tr>
          <td>3600 sec</td>
          <td>Slow</td>
          <td>Very low</td>
          <td>Static content, rarely-changing records</td>
      </tr>
  </tbody>
</table>
<p>In the banking environment:</p>
<ul>
<li><strong>30 seconds</strong> for payment processing and authentication services</li>
<li><strong>60 seconds</strong> for general banking applications</li>
<li><strong>300 seconds</strong> for internal dashboards and monitoring tools</li>
</ul>
<h3 id="calculating-minimum-effective-failover-time">Calculating Minimum Effective Failover Time</h3>
<pre tabindex="0"><code>Worst-case failover time ≈ Health check interval + TTL
</code></pre><p>With a 5-second iQuery check interval and 30-second TTL: worst case is ~35 seconds. Some clients shift to the new DC within seconds after failure detection; others wait for TTL expiry.</p>
<p>For applications with strict RTO requirements, combine low TTL with application-level retry logic to handle the transition window gracefully.</p>
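<p>The arithmetic is trivial but worth making explicit, since both terms are tunable. A small helper (hypothetical name) using the intervals from the text:</p>

```python
# Worst-case failover visibility = detection time + resolver cache lifetime.

def worst_case_failover(check_interval_s, ttl_s):
    # A failure can occur just after a health check, so detection takes up
    # to one full interval; a resolver can then cache the stale answer for
    # up to one full TTL after that.
    return check_interval_s + ttl_s

print(worst_case_failover(5, 30))   # 35  -- payment/auth tier
print(worst_case_failover(5, 300))  # 305 -- internal tooling tier
```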
<hr>
<h2 id="multi-dc-architecture-patterns">Multi-DC Architecture Patterns</h2>
<h3 id="active-standby-primary--dr">Active-Standby (Primary / DR)</h3>
<pre tabindex="0"><code>Normal:
  GTM → DC-1 LTM → App Servers (DC-1)
  DC-2 is hot standby — running, synchronized, no traffic

DC-1 failure:
  iQuery: DC-1 LTM virtual server marked down
  GTM: stops responding with DC-1 VIP immediately
  New DNS queries: receive DC-2 VIP
  Traffic shifts: within TTL window
</code></pre><p>Requirements:</p>
<ul>
<li>Identical application deployment on both DCs</li>
<li>Database replication (synchronous for zero RPO, asynchronous for lower replication overhead at the cost of a non-zero RPO)</li>
<li>GTM method: <strong>Global Availability</strong></li>
</ul>
<h3 id="active-active-load-sharing">Active-Active (Load Sharing)</h3>
<pre tabindex="0"><code>Normal:
  GTM → DC-1 (50% of DNS responses)
  GTM → DC-2 (50% of DNS responses)

DC-1 failure:
  GTM routes 100% to DC-2
</code></pre><p>Requirements:</p>
<ul>
<li>Session state shared between DCs, or stateless application design</li>
<li>Each DC must sustain 100% traffic independently (plan for this capacity)</li>
<li>Database writable from both DCs simultaneously</li>
<li>GTM method: <strong>Round Robin</strong>, <strong>Ratio</strong>, or <strong>Least Connections</strong></li>
</ul>
<p>Active-active delivers better resource utilization and faster effective failover (no cold-start DR site). The architectural complexity is higher — particularly around database write conflicts and session state management.</p>
<h3 id="topology-based-geographic-distribution">Topology-Based Geographic Distribution</h3>
<pre tabindex="0"><code>EU users    → resolver IP in Europe      → DC-Frankfurt
MENA users  → resolver IP in Middle East → DC-Istanbul
Internal    → RFC1918 source             → DC-Istanbul (nearest)
Default                                  → DC-Frankfurt
</code></pre><p>Each DC handles its region&rsquo;s traffic under normal conditions. Fallback topology records route all regions to the surviving DC during an outage.</p>
<p>This pattern combines performance benefits (lower latency) with compliance requirements (data residency) and capacity optimization (each DC sized for its region).</p>
<hr>
<h2 id="gtm-monitors-vs-iquery-when-to-use-each">GTM Monitors vs. iQuery: When to Use Each</h2>
<p><strong>iQuery (for BIG-IP LTMs):</strong></p>
<ul>
<li>Real-time application health data from LTM</li>
<li>No additional monitoring traffic</li>
<li>Knows application health, not just IP reachability</li>
<li>Automatic virtual server discovery</li>
<li>Use for all BIG-IP LTM endpoints</li>
</ul>
<p><strong>GTM-native monitors (for non-BIG-IP endpoints):</strong></p>
<ul>
<li>GTM sends its own health probes directly to endpoints</li>
<li>Supports TCP, HTTP, HTTPS, ICMP</li>
<li>Use for third-party load balancers, cloud endpoints, origin servers in hybrid environments</li>
</ul>
<p>In most enterprise deployments, iQuery handles all BIG-IP-to-BIG-IP monitoring. GTM-native monitors are reserved for hybrid architectures where some endpoints are not F5 devices.</p>
<hr>
<h2 id="dns-persistence-in-gtm">DNS Persistence in GTM</h2>
<p>GTM does not maintain TCP state — it only influences DNS responses. However, it supports <strong>DNS persistence</strong> to ensure repeated queries from the same resolver return the same IP for a configurable period:</p>
<pre tabindex="0"><code>Wide IP Persistence:
  Enabled: Yes
  TTL:     300 seconds
  Type:    by Source IP (resolver IP)
</code></pre><p>With persistence enabled, GTM returns the same VIP to the same resolver for 300 seconds regardless of load balancing method. This reduces session disruption when clients re-resolve frequently.</p>
<p><strong>Important caveat:</strong> DNS persistence is based on the <strong>resolver IP</strong>, not the end-client IP. When many end users share a single ISP recursive resolver, they all appear as the same &ldquo;client&rdquo; to GTM. Persistence may concentrate disproportionate traffic to one DC in these cases. Evaluate whether persistence genuinely helps before enabling it.</p>
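<p>The mechanism can be sketched as a small table keyed on resolver IP, with an explicit clock so the TTL behavior is visible. The class and method names here are invented for illustration, not F5's implementation:</p>

```python
# Source-IP DNS persistence sketch -- illustrative names only.
import itertools

class PersistenceTable:
    def __init__(self, ttl_s):
        self.ttl = ttl_s
        self.entries = {}  # resolver_ip -> (vip, stored_at)

    def lookup(self, resolver_ip, now, pick):
        """pick() is the underlying load balancing decision."""
        hit = self.entries.get(resolver_ip)
        if hit and now - hit[1] < self.ttl:
            return hit[0]                     # within TTL: sticky answer
        vip = pick()                          # expired or new: decide again
        self.entries[resolver_ip] = (vip, now)
        return vip

rr = itertools.cycle(["10.10.1.100", "10.20.1.100"])  # round robin underneath
table = PersistenceTable(ttl_s=300)

print(table.lookup("203.0.113.53", now=0,   pick=lambda: next(rr)))  # 10.10.1.100
print(table.lookup("203.0.113.53", now=120, pick=lambda: next(rr)))  # 10.10.1.100 (sticky)
print(table.lookup("203.0.113.53", now=400, pick=lambda: next(rr)))  # 10.20.1.100 (expired)
```

<p>Note how the single resolver IP <code>203.0.113.53</code> stands in for potentially thousands of end users behind one ISP resolver, which is exactly the caveat above.</p>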
<hr>
<h2 id="key-takeaways">Key Takeaways</h2>
<ul>
<li>GTM solves <strong>multi-DC DNS failover</strong> — the problem LTM cannot address alone.</li>
<li><strong>iQuery</strong> is GTM&rsquo;s most important feature: real application-health awareness, not just IP reachability. When LTM marks an application pool down, GTM reacts immediately.</li>
<li><strong>Global Availability</strong> is the right method for primary/DR. <strong>Round Robin</strong> or <strong>Ratio</strong> for active-active.</li>
<li><strong>Topology</strong> routing reduces latency and enables data residency compliance for global deployments.</li>
<li><strong>TTL controls new lookups only</strong> — not existing connections. Minimum effective failover = health check interval + TTL.</li>
<li>Use <strong>iQuery for BIG-IP LTMs</strong>; use GTM-native monitors for third-party endpoints.</li>
</ul>
<hr>
<h2 id="this-series">This Series</h2>
<ul>
<li>📖 <a href="/en/technology/f5-bigip-application-delivery-platform-overview/">F5 BIG-IP Platform Overview — All Modules</a> ← Start here if you&rsquo;re new to F5</li>
<li>🔧 <a href="/en/technology/f5-ltm-deep-dive-virtual-servers-irules-ha/">F5 LTM Deep Dive</a></li>
<li>🛡️ <a href="/en/technology/f5-waf-asm-advanced-waf-application-security/">F5 WAF Deep Dive</a></li>
</ul>
<h2 id="related-articles">Related Articles</h2>
<ul>
<li>🏗️ <a href="/en/architecture/it-infrastructure-not-a-collection-of-products/">IT Infrastructure Is Not a Collection of Products</a> — Systems thinking for multi-DC design</li>
<li>🔐 <a href="/en/architecture/zero-trust-mindset-engineering-security-as-an-architecture-not-a-product/">The Zero Trust Mindset</a> — Identity-aware access across distributed infrastructure</li>
<li>📊 <a href="/en/architecture/monitoring-not-just-seeing/">Monitoring Done Right</a> — Monitoring GTM and LTM health proactively</li>
</ul>
]]></content:encoded></item></channel></rss>