Configuration note for ip multicast multipath

last updated 03/04/06

History:
   29/11/00  Initial version
   10/09/03  Revisited, added IPv6, polarization, (*,G) info, ...
   10/15/03  Corrected notes about static mroutes, added note on sort order
   04/05/05  Detailed BGP section, added Quick checklist
   03/04/06  Added Hash function source code example

-----------------------------------------------------------------------------

Content

I. Quick checklist

1. Load Splitting IPv4 multicast traffic across equal-cost paths
   1.1 Default behavior - no load splitting
   1.2 Load splitting with "ip multicast multipath"
       o Load splitting vs. Load Balancing
       o No per-group load splitting for (S,G)
       o Possibility to engineer load-splitting
       o Load splitting and BGP
       o Load splitting with static mroutes
       o (Non-)consideration of PIM neighbor Query/Hellos for RPF path select.
       o Convergence
       o No load splitting for Asserts and DF election
       o Assert can invalidate load-splitting even in PIM-SM/SSM
       o Polarization problem

2. Load Splitting IPv6 multicast traffic across equal-cost paths
       o New hash mechanism to avoid polarization and improve route stability

3. Load splitting across tunnels or L2 link bundles
   3.1 Load splitting across tunnels.
   3.2 Load splitting IP multicast traffic across bundle interfaces

Appendix A. IPv4 Hash calculation example C-code


I. Quick checklist

If you are trying to load split multicast traffic across multiple paths
and fail to do so, here is a quick checklist of what you might have done
wrong. All the points listed here are explained in more detail below:

- Enable "ip multicast multipath" on the router that is supposed to be
  the _receiver_ for traffic from more than one incoming interface. This
  is the opposite of unicast: in unicast, multipath is active on the
  _sending_ router connecting to more than one outgoing interface!

- Traffic is only load split between different sources, not between
  different groups today.
  Make sure you have a larger number of sources. Load splitting is
  statistical, based on the source address: if you only have two sources,
  they may end up using the same link!

- Use PIM-SSM or PIM-SM with shortest path tree forwarding. In PIM-SM,
  check that you do have (S,G) state with forwarding (T-bit) set. If you
  need to use shared tree forwarding (Bidir-PIM or PIM-SM RPT
  forwarding), read more details below.

- Make sure that you actually have multiple paths for the source
  addresses in question: use "show ip route" for those addresses to
  validate. If you do not see multiple paths in its output, then
  ip multicast multipath will not work. Use "show ip rpf" for all the
  different source addresses you intend to use to validate the
  effectiveness of multipath.

- Some routing protocols, most notably BGP, do not install multiple equal
  cost paths by default. Use "maximum-paths" to configure multipath (for
  example in BGP).


1. Load Splitting IPv4 multicast traffic across equal-cost paths

Load splitting for IPv4 multicast traffic across equal cost paths is
disabled by default. It can be enabled via the global configuration
command "ip multicast multipath". This command was introduced with
12.0(7), 12.0(5)S, 12.0(8)T. All documentation in the following section
applies to all Cisco IOS versions since then, unless otherwise explicitly
noted.

Related commands:

   show ip rpf

1.1 Default behavior - no load splitting

If two or more equal-cost routes are available, by default IPv4 multicast
traffic for PIM-SM, PIM-SSM and PIM-DM groups will RPF towards the PIM
neighbor with the highest IP address. This is in accordance with RFC2362.

Example 1:

    Sources
               |                          |
   (S1,G1) ----+    +--+s0    s0+--+      +--- Rcvr
   (S1,G2)     |  e0|  +--------+  |e0    |
               +----+R1|s1    s1|R2+------+
               |    |  +--------+  |      |
   (S2,G1) ----+    +--+        +--+
   (S2,G2)     |

On the left side of the picture, two sources, S1 and S2, send traffic to
IPv4 multicast groups G1 and G2, which are either PIM-SM, PIM-SSM or
PIM-DM.
If PIM-SM is used, then we assume that the default
"ip pim spt-threshold 0" is being used on R2, i.e., that (S,G) state is
being established. Some IGP is run, and "show ip route S1" or
"show ip route S2" will show R1 via s0 and R1 via s1 as equal cost
next-hop PIM neighbors.

Without further configuration, IPv4 multicast traffic in this setup will
always flow across the one serial interface (s0 or s1) on which R1 has
the higher IP address. For example, assume the IP address of R1 on s0 is
10.1.1.1 and on s1 it is 10.1.2.1. In this case (with PIM-SM/SSM) R2 will
always send its PIM join messages towards 10.1.2.1 and thus receive the
IPv4 multicast traffic on s1, for all sources and groups shown. In the
case of PIM-DM the same happens, except that no PIM join messages are
used; instead R2 will prune the traffic across s0, because it chooses to
only receive it via s1, where R1 has the higher IP address.

This "highest PIM neighbor" mechanism is not dependent on this specific
topology (i.e., it does not only work with N parallel paths between two
routers). It works in any topology whenever a downstream router sees
equal cost paths toward a source or RP: RPF election is a purely local
decision of this router. The result shown would for example be the same
if s0 and s1 on R2 connected via two different routers towards the
sources, as long as unicast routing has equal cost routes towards these
sources.

The default IPv4 RPF selection method of "highest IP address PIM
neighbor" applies both to (S,G) state and to PIM-SM (*,G) state. For
PIM-SM (*,G) state, the router looks for the highest IP address PIM
neighbor towards the RP for the group G.
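The default rule described above can be sketched in a few lines of C.
This is only an illustration, not IOS code: the names ipv4() and
default_rpf_index() are made up for this note, and addresses are assumed
to be held as host-order 32-bit integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a host-order 32-bit value from a.b.c.d. */
static uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/*
 * Hypothetical sketch of the default rule: among n (n >= 1) equal-cost
 * PIM neighbors, return the index of the one with the highest address.
 */
static size_t default_rpf_index(const uint32_t *nbr, size_t n)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++) {
        if (nbr[i] > nbr[best])
            best = i;
    }
    return best;
}
```

With the two neighbors of example 1 stored as { 10.1.1.1, 10.1.2.1 },
default_rpf_index() returns 1, i.e. 10.1.2.1 on s1, independent of the
source or group.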
1.2 Load splitting with "ip multicast multipath"

To enable load splitting of IPv4 multicast traffic between equal-cost
paths, configure the following command in global configuration mode:

Command                   Purpose
-------------------------------------------------------------------------
ip multicast multipath    Enable ip multicast load-splitting over
                          equal-cost paths.

With "ip multicast multipath", the RPF interface for each (*,G) or (S,G)
state is selected from among the available equal cost paths depending on
the RPF address to which the state resolves. For an (S,G) state, this is
the address S of the source; for a (*,G) state, it is the address of the
RP associated with the group G of the state. As a result, multicast
traffic for different states can be received across more than one of the
equal-cost interfaces.

The method applied by Cisco IOS IPv4 multicast in this case is quite
similar in principle to the default per-flow load splitting in IPv4 CEF
or the load splitting used with (Fast/Gig)-Etherchannels (but see the
discussion on polarization below). It is hash-based as follows:

With "ip multicast multipath" enabled, Cisco IOS IPv4 multicast
determines the RPF neighbor for a particular (*,G) or (S,G) state as
follows:

Assume the RPF lookup has resulted in a list I = 0..(N-1) of N equal cost
next-hops. In example 1 above, there are 2 equal cost next-hops for R2:

   I = 0: interface serial 0, neighbor 10.1.1.1
   I = 1: interface serial 1, neighbor 10.1.2.1

The hash function used is the one given in Appendix A:

   hash(A.B.C.D) = A XOR B XOR C XOR D

When R2 determines for a particular (S,G) state which of these two paths
to take, it calculates

   I = (hash(S) modulo N)

which returns a result I = 0..(N-1) and indicates which path to use for
this (S,G) state. Likewise, if the router needs to determine the RPF
interface for a PIM-SM (*,G) state, it calculates

   I = (hash(RP) modulo N)

where RP is the address of the RP associated with the group.
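As a small worked sketch of the index calculation, the following C
fragment follows the hash in the Appendix A code (XOR of the four address
octets). The function names mpath_hash() and mpath_index() are invented
for this note, and addresses are assumed to be host-order 32-bit
integers.

```c
#include <stdint.h>

/* XOR of the four octets of an IPv4 address, as in the Appendix A code. */
static unsigned mpath_hash(uint32_t addr)
{
    return ((addr >> 24) ^ (addr >> 16) ^ (addr >> 8) ^ addr) & 0xFFu;
}

/*
 * Index I = hash(rpf_addr) modulo n into the list of n equal-cost
 * next hops. rpf_addr is S for (S,G) state or the RP for (*,G) state.
 * n must be at least 1.
 */
static unsigned mpath_index(uint32_t rpf_addr, unsigned n)
{
    return mpath_hash(rpf_addr) % n;
}
```

For example, source 192.168.1.10 (0xC0A8010A) gives 192 XOR 168 XOR 1
XOR 10 = 99, so with N = 2 the index is 1 (serial 1, neighbor 10.1.2.1 in
example 1). Source 192.168.1.11 gives 98 and therefore index 0.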
The result of this RPF selection can always be verified with the
"show ip rpf" command. It shows not only the chosen RPF for actually
established state, but also the RPF interface/neighbor that Cisco IOS
IPv4 multicast would use for sources or RPs for which no state has been
built: the show command simply executes the RPF calculation described
above for the address argument given, whether it is an RP address or a
source address.

IP multicast multipath in more complex scenarios:

o Load splitting vs. Load Balancing

Load splitting is not load balancing. If there are just a few (S,G) or
(*,G) states flowing across a set of equal-cost links, the chance that
they are well balanced is quite low unless the source addresses (for
(S,G) states) or the RP addresses (for (*,G) states) are "engineered"
(i.e., manually calculated) so that a reasonable form of balancing is
achieved. This limitation applies equally to the per-flow load splitting
in CEF or with Etherchannels: as long as there are only a few flows, none
of these methods will result in good load distribution without manual
engineering.

o No per-group load splitting for (S,G)

There is no load splitting for multiple (S,G1), (S,G2),... (S,Gn) states
because the group is not taken into consideration for the RPF selection
of (S,G) states. If you have a single server sending traffic to many
multicast groups, you today need to use (*,G) forwarding with PIM-SM or
Bidir-PIM and engineer the RP addresses to achieve load splitting.

o Possibility to engineer load-splitting

The method used by Cisco IOS IPv4 multicast allows for consistent load
splitting in a network where the same number of equal cost paths (e.g., 2
paths as in the example above) is present in multiple places of the
topology. If RP or source addresses are calculated once to have flows
split onto N paths, then they will be split across these N paths in the
same way in all places in the topology.
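When engineering addresses as described above, it can help to count how a
candidate set of source or RP addresses would be distributed. The
following C sketch does that with the Appendix A hash. The names
mpath_hash() and split_counts() are made up for this note, and addresses
are assumed to be host-order 32-bit integers.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR of the four octets of an IPv4 address, as in the Appendix A code. */
static unsigned mpath_hash(uint32_t addr)
{
    return ((addr >> 24) ^ (addr >> 16) ^ (addr >> 8) ^ addr) & 0xFFu;
}

/*
 * Count how many of the nsrc addresses land on each of the n paths.
 * counts must have room for n entries; n must be at least 1.
 */
static void split_counts(const uint32_t *src, size_t nsrc,
                         unsigned n, unsigned *counts)
{
    for (unsigned p = 0; p < n; p++)
        counts[p] = 0;

    for (size_t i = 0; i < nsrc; i++)
        counts[mpath_hash(src[i]) % n]++;
}
```

With N = 2, the eight sources 10.0.0.1 to 10.0.0.8 split 4/4, while the
two sources 10.0.0.1 and 10.0.0.3 both land on path 1. The second case is
the "few flows" situation in which addresses need to be chosen by hand.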
o Load splitting and BGP

"ip multicast multipath" will work with RPF information learned via BGP
in the same way as with RPF information learned from other protocols: it
will choose one path out of the multiple paths installed by the protocol.
The only difference with BGP is that BGP in Cisco IOS by default installs
just a single path:

When a BGP speaker learns two identical EBGP paths for a prefix, it will
choose the path with the lowest router-id as the best path. This best
path is installed in the IP routing table. If BGP multipath support is
enabled and the EBGP paths are learned from the same neighboring AS, then
instead of picking one best path, multiple paths are installed in the IP
routing table.

To leverage "ip multicast multipath" for BGP learned prefixes you thus
need to enable BGP multipath:

   router bgp 200
    ....
    maximum-paths

When BGP provides remote next-hops, the RPF lookup recurses to find the
best next-hop towards that BGP next-hop, as in unicast. If for example
there is only a single BGP path for a given prefix, but two IGP paths to
reach that BGP next hop, then multicast RPF will correctly load split
between the two different IGP paths.

Note: this functionality was broken before 2002 - see CSCdu59373,
"ip multicast multipath" did not load split after doing a route
recursion.

o Load splitting with static routes / mroutes

If it is not possible to use an IGP to install equal cost routes for
certain sources or RPs, static routes can be used to configure equal cost
multiple paths.

You can not use static mroutes ("ip mroute") to configure equal cost
multi-paths, because Cisco IOS only allows one static mroute to be
configured per prefix.
There are some workarounds for this limitation via recursive route
lookups, but they can not be applied to equal cost multi-path routing, as
explained in the following URL:

   ftp://ftpeng.cisco.com/ipmulticast/config-notes/static-mroutes.txt

o (Non-)consideration of PIM neighbor Query/Hellos for RPF path selection

When "ip multicast multipath" is NOT enabled, and there are multiple
equal-cost paths towards an RP or a source, Cisco IOS IPv4 multicast will
first elect the highest IP address PIM neighbor. A PIM neighbor is a
router from which we have received PIM Hello (or PIMv1 Query) messages.

Example: Consider a router that has two equal cost paths learned via an
IGP or configured via two static routes. The next hops of these two paths
are 10.1.1.1 and 10.1.2.1.

   - If both of these next-hop routers send PIM hello messages, then
     10.1.2.1 is selected as the highest IP address PIM neighbor.
   - If only 10.1.1.1 sends PIM hello messages, then 10.1.1.1 is
     selected.
   - If neither of these routers sends PIM hello messages, then 10.1.2.1
     is selected.

This deference to PIM hello messages makes it possible to construct
certain types of dynamic failover scenarios with only static routes. It
is otherwise not very useful. See the discussion in the above
configuration note (".../static-mroutes.txt") for a useful example.

When "ip multicast multipath" is enabled, the presence of PIM Hello
messages from neighbors is not considered, i.e., the RPF neighbor chosen
does not depend on whether or not PIM hello messages are received from
that neighbor. It only depends on the presence or absence of an equal
cost path route entry.

o Convergence

When unicast routing changes, all IP multicast routing states will
quickly reconverge to the newly available unicast routing information.
Specifically, if one path goes down, reconvergence to the remaining paths
happens immediately, and when the path comes up again, multicast
forwarding will also reconverge to the same RPF paths that were used
before the path failed.
This immediate re-convergence works whether "ip multicast multipath" is
configured or not.

Note that convergence in the face of path failures is not minimal in
Cisco IOS IPv4 multicast if "ip multicast multipath" is being used (more
states may change their RPF path than strictly necessary). Refer to the
description below for IPv6 for further explanations.

o No load splitting for Asserts and DF election

"ip multicast multipath" only changes the RPF selection on the downstream
router. It does not have an effect on the DF election in Bidir-PIM or on
the assert processing on upstream routers. In example 1 shown above this
makes no difference, but consider the following example:

Example 2:

      Sources                  Ethernet1
               |  +--+s0  s0+--+   |              |
   (S1,G1)     |  |  +------+R3+---+    +--+      +--- Rcvr
   (S1,G2) ----+  |  |      +--+   |  e1|  |e0    |
               |  |R1|             +----+R2+------+
               +--+  |s0  s0+--+   |    |  |      |
   (S2,G1)     |  |  +------+R4+---+    +--+
   (S2,G2) ----+  |  |      +--+   |
               |  +--+

In example 2, R2 has two equal cost paths to S1, S2 and the RP addresses
that we expect to be on R1. Both paths are across e1, one towards R3, one
towards R4.

For PIM-SSM, and for PIM-SM (*,G) as well as (S,G) RPF selection, there
is no difference in the behavior of R2 compared to example 1 above. There
is, however, a difference when using PIM-DM or Bidir-PIM.

In PIM-DM, both R3 and R4 will start flooding traffic for the states onto
Ethernet1. They will see each other's forwarded traffic and will use the
PIM assert process to elect one router amongst them to forward the
traffic and avoid duplicates. As both routers R3 and R4 will have the
same route cost, the router with the higher IP address on Ethernet1 will
always win this assert process. As a result, traffic will NOT be load
split across R3 and R4 in this configuration with PIM-DM.

In Bidir-PIM, a process called the DF election will take place between
R2, R3 and R4 on Ethernet1, electing the one single router for each RP
that is supposed to forward traffic for any groups using this particular
RP onto Ethernet1.
Even if multiple RPs are used (for example one for G1 and another one for
G2), the DF election for all those RPs will always be won by the router
(either R3 or R4) which has the higher IP address on Ethernet1. The
election rules are pretty much the same as the ones used in PIM assert;
only the protocol mechanisms to negotiate them are more refined, to get
to the same result faster. As a result, with Bidir-PIM being used in this
example, there will be no load splitting across Ethernet1.

The reason why "ip multicast multipath" influences the RPF selection but
not the assert process or the PIM DF election is that both are
cooperative processes that need to be implemented consistently between
participating routers. Changing them would require some form of protocol
change that also needs to be agreed upon by the participating routers.
RPF selection is a purely router-local policy and can thus be enabled or
disabled individually on each router without protocol changes.

In summary, "ip multicast multipath" for PIM-DM or Bidir-PIM will only
give the desired results in topologies where the equal-cost multiple
paths are not multiple upstream PIM neighbors on the same LAN, but
instead neighbors on different LANs or point-to-point links (e.g., serial
interfaces).

o Assert can invalidate load-splitting even in PIM-SM/SSM

There are also cases where "ip multicast multipath" can become
ineffective because the above mentioned PIM assert takes over, even when
using PIM-SM with (*,G) or (S,G) forwarding or PIM-SSM with (S,G)
forwarding:

Example 3:

      Sources                  Ethernet1
               |  +--+s0  s0+--+   |              |
   (S1,G1)     |  |  +------+R3+---+    +--+      +--- Rcvr1
   (S1,G2) ----+  |  |      +--+   +----+R2+------+
               |  |R1|             |    +--+      |
               +--+  |s0  s0+--+   |
   (S2,G1)     |  |  +------+R4+---+    +--+      |
   (S2,G2) ----+  |  |      +--+   +----|R5+------+
               |  +--+             |    +--+      +--- Rcvr2

In this example, another router R5 is added to the topology, also joining
to the same traffic as R2.
If both R2 and R5 are Cisco IOS routers consistently configured with the
same setting of "ip multicast multipath", then load splitting will
continue to work as expected. Both routers will have R3 and R4 as
equal-cost next hops, both will have this list sorted in the same way
(sorted by IP address), and when applying the multipath hash function
they will thus choose the same RPF neighbor R3 or R4 for each (S,G) or
(*,G) state and send their PIM joins to this neighbor.

If instead R5 and R2 are inconsistently configured (e.g., one has
"ip multicast multipath", the other not), or if R5 is a non Cisco IOS
router, then R2 and R5 may choose different RPF neighbors for some (*,G)
or (S,G) states. For example R2 chooses R3, and R5 chooses R4. In this
situation, R3 and R4 would both start to forward traffic for this state
onto Ethernet1, they would see each other's forwarded traffic, and to
avoid this duplication of traffic they would start the assert process. As
a result, for this state the one router of R3 or R4 with the higher IP
address on Ethernet1 would forward the traffic. Both R2 and R5 track this
so called winner of the assert election and would henceforth send their
PIM joins for this state to this assert winner, even if this assert
winner is not the same router as the one that they calculated in their
RPF selection.

In summary, in PIM-SM or PIM-SSM, the operation of
"ip multicast multipath" can only be guaranteed when all downstream
routers on a LAN are consistently configured Cisco IOS routers.

Note: Even with R2 and R5 being consistently configured Cisco IOS
routers, asserts can happen if the unicast routing table entries for the
routes (R3,R4) are not in ascending order of IP address. Older versions
of Cisco IOS did not have this sort order, and in effect R2 and R5 would
calculate the same hash index I, but it might not point to the same path!
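The effect of the sort order mentioned in the note above can be sketched
in C. Two routers compute the same hash index, but the index only selects
the same neighbor if both lists are in the same order. All names here
(mpath_hash, cmp_addr, pick_unsorted, pick_sorted) are made up for this
illustration, and the hash is the one from the Appendix A code.

```c
#include <stdint.h>
#include <stdlib.h>

/* XOR of the four octets of an IPv4 address, as in the Appendix A code. */
static unsigned mpath_hash(uint32_t addr)
{
    return ((addr >> 24) ^ (addr >> 16) ^ (addr >> 8) ^ addr) & 0xFFu;
}

/* qsort() comparator: ascending order of next-hop address. */
static int cmp_addr(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/* Pick entry hash(rpf_addr) modulo n from the list in its given order. */
static uint32_t pick_unsorted(const uint32_t *nh, size_t n, uint32_t rpf_addr)
{
    return nh[mpath_hash(rpf_addr) % n];
}

/* Sort the list by ascending address first, then pick the entry. */
static uint32_t pick_sorted(uint32_t *nh, size_t n, uint32_t rpf_addr)
{
    qsort(nh, n, sizeof nh[0], cmp_addr);
    return nh[mpath_hash(rpf_addr) % n];
}
```

If R2 holds the list { R3, R4 } and R5 holds { R4, R3 }, pick_unsorted()
returns different neighbors on the two routers for the same source,
which is the situation that triggers the assert. pick_sorted() returns
the same neighbor on both.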
o Polarization problem

The hash mechanism used in Cisco IOS IPv4 multicast to split multicast
traffic is subject to a problem usually called "polarization":

Example 4:

      Sources   Ethernet1
               |  +--+s0 s0+--+
               +--+R1+-----+  |
               |  +--+     |  |
               |           |R5|     +--+
   (S1,G)  ----+  +--+s0 s1|  |s2 s0|  |     |
               +--+R2+-----+  +-----+  |     +--- Rcvr
     ...       |  +--+     +--+     |  |     |
               |                    |R7+-----+
   (S10,G) ----+  +--+s0 s0+--+s2 s1|  |     |
               +--+R3+-----+  +-----+  |     |
               |  +--+     |  |     +--+
               |           |R6|
               |  +--+s0 s1|  |
               +--+R4+-----+  |
               |  +--+     +--+

Consider the topology shown in example 4. R7 has two equal cost paths
towards S1,...S10 via R5 and R6. It applies equal-cost load splitting to
the 10 (S,G) states, and ends up for example choosing s0/R5 for S1...S5
and s1/R6 for S6...S10.

The problem of polarization now occurs in R5 and R6. R5 has two equal
cost paths for S1...S5 via s0/R1 and s1/R2. Because R5 applies the same
hash function to select which of the two paths to use, it will end up
using just one of these two upstream paths for all sources S1...S5, i.e.,
either all traffic will flow across R1/R5 or all of it via R2/R5. In this
topology it is never possible to utilize both R1/R5 and R2/R5. The same
applies to R3/R6 and R4/R6.

Polarization is a direct result of the design goal to be able to engineer
load-splitting in Cisco IOS IPv4 multicast!


2. Load Splitting IPv6 multicast traffic across equal-cost paths

The following description refers back to the explanations given for IPv4
multicast load splitting and focuses on the differences.

Cisco IOS supports IPv6 multicast starting with Cisco IOS versions
12.0(26)S (for Cisco 12000 series routers), 12.2(18)S and 12.3(2)T.

In Cisco IOS IPv6 multicast, load-splitting is enabled by default and can
not be disabled in currently available Cisco IOS versions. There is no
command to configure/unconfigure load-splitting.
You can use "show ipv6 rpf" to determine and predict the effects of
load-splitting in the same way as in IPv4 [Note: In Cisco IOS 12.3T and
12.2S, the command "show ipv6 rpf" is introduced in later releases than
12.2(18)S and 12.3(2)T].

The explanations of operations for IPv4 multicast load-splitting via
ip multicast multipath apply in general, with the following
modifications:

o New hash mechanism to avoid polarization and improve route stability

The hash mechanism used with Cisco IOS IPv6 multicast does not produce
polarization, and it also maintains more RPF stability in the face of
failing paths. These benefits come at the cost that manual tweaking of
source IP or RP IP addresses can no longer be used to reliably predict
and engineer the effects of load-splitting.

The reason for this change is that many customers do have equal cost
multipath topologies but do not need to manually engineer the
load-splitting. Instead, they expect a default behavior of IP multicast
similar to IP unicast: use the equal cost multiple paths on a best effort
basis. The method used in Cisco IOS IPv4 multicast could not be enabled
by default because of the anomaly of polarization. Load splitting in
Cisco IOS CEF unicast also uses a method that does not exhibit
polarization and likewise can not be used to predict/engineer load
splitting results.

The hash method used by Cisco IOS IPv6 multicast works as follows:

For each actual PIM neighbor that is a next-hop nh on a best path towards
a source or RP, a hash value is calculated as follows:

   bsr_hash(addr, nh) =
      (1103515245 * ((1103515245 * addr + 12345) ^ nh) + 12345)
         & 0x7fffffff;

The nh chosen as the RPF neighbor is the one with the largest returned
hash value. addr is the last 32 bits of the IPv6 source or RP address for
which RPF information is needed. This hash function is the same hash
function used in PIM-SM with BSR.
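The selection rule above can be sketched in C as follows. This is an
illustration under stated assumptions, not IOS code: the arithmetic is
assumed to be unsigned 32-bit with wrap-around, nh is assumed to be a
32-bit value derived from the neighbor address (this note only states how
addr is derived), ties are resolved in favor of the first entry, and the
name select_rpf() is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hash from the formula above, using unsigned 32-bit arithmetic. */
static uint32_t bsr_hash(uint32_t addr, uint32_t nh)
{
    return (1103515245u * ((1103515245u * addr + 12345u) ^ nh) + 12345u)
           & 0x7fffffffu;
}

/*
 * Return the index of the next hop with the largest hash value for
 * addr. n must be at least 1. On a tie the first entry is kept
 * (an assumption of this sketch).
 */
static size_t select_rpf(uint32_t addr, const uint32_t *nh, size_t n)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++) {
        if (bsr_hash(addr, nh[i]) > bsr_hash(addr, nh[best]))
            best = i;
    }
    return best;
}
```

Because each next hop is scored on its own and the number of paths does
not enter the calculation, removing a next hop that was not selected
leaves the selected one unchanged.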
This hash function approach avoids polarization because it introduces
the IP address of the actual next-hop PIM neighbor into the calculation,
so the hash results are different on each router. In effect, there is no
problem of polarization, e.g., in example 4 all paths would be utilized.

In addition to avoiding polarization, this hash mechanism also increases
the stability of the RPF paths chosen in the face of path failures.
Consider a router with 4 equal cost paths and a large number of states
that are load-split across these paths. Now assume one of these paths
fails, leaving only 3 usable paths. With the hash mechanism used in IPv4,
the RPF paths of likely all states will reconverge/change between those
three paths, including those of states that were already using one of
those three paths before: these states may unnecessarily change their RPF
interface/next-hop neighbor. The reason for this problem is simply that
the algorithm takes the total number of available paths into
consideration when determining the chosen path, so once that number
changes, the RPF selection for all states is subject to change too.

With the hash mechanism used in Cisco IOS IPv6 multicast, only the states
that were previously RPFing toward the broken path need to reconverge
onto one of the three remaining paths. All states that already used one
of those paths will not change. Of course, once the fourth path comes up
again, the states that initially used it will immediately reconverge
back to it (and the other states stay as they were).

o Static (m)routes supported with equal cost paths

As described above, if you only want to use static routes, then you can
not do this in Cisco IOS IPv4 for so called "static mroutes", which only
apply to multicast (you can do it for "static routes" used by both
unicast and multicast).
In Cisco IOS IPv6 there is no such restriction: equal cost multipath can
be configured for static routes that are usable by unicast only,
multicast only, or both unicast and multicast.


3. Load splitting across tunnels or L2 link bundles

3.1 Load splitting across tunnels.

Load splitting for IP multicast traffic can also be achieved by
consolidating multiple parallel links into a single tunnel over which the
multicast traffic is then routed. The methodology for using tunnels for
load splitting is described in the IP Multicast configuration guides on
CCO.

This method is more complex to set up, and with the availability of the
"ip multicast multipath" command, tunnels typically only need to be used
if per-packet load sharing is needed (Note: you need to configure CEF
per-packet load balancing, or else the GRE packets will not be load
balanced per packet).

One case where this might be helpful is the above mentioned case where
the total number of (S,G) or (*,G) states is so small, and the bandwidth
carried by each state varies so much, that even manual engineering of the
source or RP addresses can not guarantee appropriate load splitting of
the traffic.

3.2 Load splitting IP multicast traffic across bundle interfaces

IP multicast traffic can be load split across bundle interfaces like
Fast/Gig-Etherchannel interfaces or MLPPP link bundles. In fact, the
above mentioned GRE or other types of tunnels can also constitute such a
form of L2 link bundle. Before using such an L2 mechanism, make sure you
understand how unicast/multicast traffic is load split.

In the case of MLPPP for example, load splitting is on a per-packet basis
and re-sequencing on the receive side is used to guarantee packet order.
This is an ideal mechanism to load-split/load-balance traffic across
multiple links without having to consider any form of traffic
engineering. Unfortunately, it comes at the expense of higher CPU load on
the routers that have to perform MLPPP processing.
Using GRE tunnels as L2 mechanisms with per-packet load balancing is
similar, but there is no reordering on the receive end of the tunnel, so
packet reordering may happen. This may result in interoperability
problems with badly written IP multicast applications that can not handle
reordered packets.

Using Gig/Fast-Etherchannels typically results in per-flow
load-splitting. How flows are split is hardware/platform dependent and
needs to be determined individually for each platform used.


Appendix A. IPv4 Hash calculation example C-code

/*
 * Cisco IOS IPv4 multicast "ip multicast multipath" selects
 * the path to which to RPF an individual source address by a simple
 * hash function.
 *
 * This test program shows which path an IP source address would use.
 *
 * Equal cost paths are sorted by increasing ip address of next-hop in IOS.
 *
 * Compile with -lsocket -lnsl (on solaris)
 */

#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* XOR of the four address octets (the result is byte-order independent) */
int hash (in_addr_t addr)
{
    return (int) (((addr >> 24) ^ (addr >> 16) ^ (addr >> 8) ^ addr) & 0xff);
}

int main (int argc, char *argv[])
{
    struct in_addr a;
    int n, l;

    if (argc != 3) {
        fprintf(stderr, "usage: %s ipaddr n\n", argv[0]);
        fprintf(stderr, "       Returns path [1..n] used for ipaddr\n");
        exit(1);
    }

    a.s_addr = inet_addr(argv[1]);
    n = atoi(argv[2]);
    if (n < 1) {
        /* guard against division by zero in the modulo below */
        fprintf(stderr, "n must be at least 1\n");
        exit(1);
    }

    /* Hash function calculation from IOS */
    l = hash(a.s_addr) % n;

    printf("%s will RPF to link %d [1..%d]\n", inet_ntoa(a), l + 1, n);
    return 0;
}