I tried this with a 12-DataNode cluster split 2:1 between two data centers about 120 miles apart. The latency between the data centers was ~4 ms over 2 x 1GbE.
Two racks were configured at site A and one rack at site B. Each "rack" had 4 machines in it. We were mainly testing site B as a "DR" site. The replication factor was set to 3.
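For illustration, a layout like this is usually described to Hadoop through a rack topology script (the script named by the `net.topology.script.file.name` property). The following is only a minimal sketch of such a script: the host names `dn01` to `dn12` and the rack paths are hypothetical, not the ones we used.

```python
#!/usr/bin/env python3
"""Hypothetical rack topology script: prints one rack path per host argument."""
import sys

# Assumed layout (hypothetical names): 4 nodes per rack,
# two racks at site A and one rack at site B.
RACKS = {
    "/siteA/rack1": ["dn01", "dn02", "dn03", "dn04"],
    "/siteA/rack2": ["dn05", "dn06", "dn07", "dn08"],
    "/siteB/rack1": ["dn09", "dn10", "dn11", "dn12"],
}
HOST_TO_RACK = {host: rack for rack, hosts in RACKS.items() for host in hosts}
DEFAULT_RACK = "/default-rack"


def resolve(host: str) -> str:
    """Return the rack path for a host name, falling back to the default rack."""
    if host in HOST_TO_RACK:
        return HOST_TO_RACK[host]
    # Try the short name in case a fully qualified name was passed.
    # IP addresses would need their own entries in the table.
    short_name = host.split(".")[0]
    return HOST_TO_RACK.get(short_name, DEFAULT_RACK)


if __name__ == "__main__":
    # Hadoop passes one or more hosts as arguments and reads one rack per host.
    print("\n".join(resolve(h) for h in sys.argv[1:]))
```

With a mapping like this, the NameNode can tell that a third of the nodes sit behind the inter-site link, which is what makes block placement across sites possible in the first place.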
In short, it works, but the performance was really, really poor. You definitely need to compress your source data and your map and reduce output to cut down on I/O, and if the links between sites are used for anything else, you will get timeouts while data is being transferred. TCP windowing effectively limited our throughput to about 4 Mbps instead of the potential 100 Mbps+ on the 1 Gbps line.
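To see why the TCP window matters on a link like this, here is a rough back-of-the-envelope sketch. It assumes the ~4 ms figure is a round-trip time and uses the textbook relation throughput = window / RTT; the function names are made up for this example.

```python
def window_limited_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on throughput of one TCP connection with a fixed window."""
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Window size needed to keep a link of the given speed full."""
    return link_bps * rtt_seconds / 8


RTT_SECONDS = 0.004  # assumption: the ~4 ms latency is round-trip
LINK_BPS = 1e9       # 1 Gbps line

# Window needed to fill the 1 Gbps line at 4 ms RTT (about 500 KB).
needed_window = bandwidth_delay_product_bytes(LINK_BPS, RTT_SECONDS)

# Ceiling for a single connection with an unscaled 64 KB window (about 131 Mbps).
ceiling_64k = window_limited_throughput_bps(65535, RTT_SECONDS)

print(f"window to fill the link: {needed_window / 1000:.0f} KB")
print(f"ceiling with a 64 KB window: {ceiling_64k / 1e6:.0f} Mbps")
```

This simple model ignores packet loss and contention from other traffic, which probably mattered here: on its own, it would put a 4 Mbps ceiling at an effective window of only about 2 KB.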
Save yourself the headache and just use distcp jobs to replicate the data.
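As a rough sketch of what that could look like, the helper below only builds a `hadoop distcp` command line; it does not run anything. The NameNode addresses, paths, and the helper name are hypothetical. `-update`, `-m`, and `-bandwidth` are standard DistCp options, but check them against the DistCp version you run.

```python
def build_distcp_command(src, dst, max_maps=None, bandwidth_mb=None):
    """Build an argument list for an incremental distcp copy from src to dst."""
    cmd = ["hadoop", "distcp", "-update"]  # -update copies only missing or changed files
    if max_maps is not None:
        cmd += ["-m", str(max_maps)]  # cap the number of simultaneous copy maps
    if bandwidth_mb is not None:
        cmd += ["-bandwidth", str(bandwidth_mb)]  # per-map bandwidth limit in MB/s
    cmd += [src, dst]
    return cmd


# Hypothetical NameNode hosts and paths for the primary and DR sites.
cmd = build_distcp_command(
    "hdfs://nn-site-a:8020/data",
    "hdfs://nn-site-b:8020/data",
    max_maps=10,
    bandwidth_mb=5,
)
print(" ".join(cmd))
```

Capping the number of maps and the per-map bandwidth is one way to keep the copy from saturating an inter-site link that is shared with other traffic. The list can be handed to `subprocess.run` from a scheduled job.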