Usually there is some kind of "presence server": devices register with it ("I'm here!"), and calls are set up through it (when you say "I want to connect to the device (555) 555-1234", that connection request is sent through the presence servers).
Once the call is established and real-time media is streaming, that traffic is usually peer-to-peer (bypassing any central server), unless there is a complication, such as both devices being behind firewalls.
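To make the flow concrete, here is a rough sketch of a toy in-memory presence server. Everything in it (`Device`, `PresenceServer`, `request_call`, the `behind_firewall` flag, the addresses) is made up for illustration and is not any real protocol or library. It also assumes that when both devices are behind firewalls the media falls back to a relay, which is one common way to handle that complication but not the only one.

```python
from dataclasses import dataclass


@dataclass
class Device:
    number: str                    # e.g. "(555) 555-1234"
    address: str                   # hypothetical "ip:port" the device reported
    behind_firewall: bool = False  # assumption: the device knows and reports this


class PresenceServer:
    """Toy presence server: tracks who is online and brokers call setup."""

    def __init__(self):
        self._online = {}  # phone number -> Device

    def register(self, device):
        # The "I'm here!" step.
        self._online[device.number] = device

    def unregister(self, number):
        self._online.pop(number, None)

    def request_call(self, caller_number, callee_number):
        # Call setup goes through the server; the media itself does not.
        caller = self._online.get(caller_number)
        callee = self._online.get(callee_number)
        if caller is None or callee is None:
            return None  # one side is not registered, so no call
        # Assumption: only the "both behind firewalls" case needs a relay.
        if caller.behind_firewall and callee.behind_firewall:
            route = "relay"
        else:
            route = "direct"
        return {"route": route, "caller": caller.address, "callee": callee.address}


if __name__ == "__main__":
    server = PresenceServer()
    server.register(Device("(555) 555-0000", "198.51.100.7:5060"))
    server.register(Device("(555) 555-1234", "203.0.113.9:5060"))
    print(server.request_call("(555) 555-0000", "(555) 555-1234"))
```

The server only hands each side the other's address and a suggested route. After that, a "direct" call would stream between the two addresses without touching the server again.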