I feel Paul's answer might be a bit heavy on jargon, so here is my attempt.
There are lots of display devices (VGA monitors, composite video, HDMI, etc.), and they are usually driven directly by hardware, e.g. a dedicated graphics processor (GPU).
Rather than having our applications talk directly to this hardware, we use drivers (which live inside the operating system's kernel). Different hardware needs different drivers, but all drivers can be given instructions through the same interface, for example OpenGL:
App --OpenGL--> Driver --> Hardware --VGA--> Screen
Of course, like most standards, there is actually a whole bunch of different ones! OpenGL is supported by most drivers on most operating systems; its subset “OpenGL ES” is suited to mobile phones, and there are “software drivers” that can produce images from OpenGL instructions (all drivers can draw images, although this is much slower than having real OpenGL support). OpenGL's main competitor is DirectX, but that only works on Windows and Xbox.
Rendering to OpenGL is great for something like a full-screen 3D game, but the *NIX graphics system (known as "X") offers two major features: drawing multiple applications on one screen, and drawing across a network. To do this, a server process gets access to the screen, and applications ("clients") talk to this server using the "X11 protocol" (the "11" is just a version number):
App A ----------OpenGL-------+
                             |
App B --+                    |
        |                    |
        +--X11--> X server --+----> Driver --> Hardware --> Screen
        |
App C --+
        |
   ...network...
        |
App D --+
X has direct access to the drivers since it has been around longer than OpenGL, but that is not too important.
The X11 protocol works by having applications create windows that they are allowed to draw in. X can arrange these windows on the screen, including overlapping them. Applications using OpenGL can have their commands “pass through” X straight to the driver, and X will still position the window just like any other (this will not work over a network, though, since it bypasses X11's networking capabilities).
Usually we have an application dedicated to positioning, hiding/showing and closing windows for us, called the window manager. Optionally, the window manager can create some thin windows around the edges of others so that it can draw title bars, resize handles, etc.
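To make the "drawing across a network" part a little more concrete: a client normally finds its server through a DISPLAY string of the form host:display.screen. Below is a minimal sketch of how such a string could be decoded. The names `DisplayTarget` and `parse_display` are made up for this illustration and are not part of any X library; the port and socket path follow the usual X11 conventions (TCP port 6000 plus the display number, and a Unix socket under /tmp/.X11-unix for local servers). Many servers have TCP listening switched off, so treat the port as where a server would listen if it listens at all.

```python
from dataclasses import dataclass

X11_TCP_BASE_PORT = 6000  # by convention, display N uses TCP port 6000 + N


@dataclass(frozen=True)
class DisplayTarget:
    """Hypothetical helper type: where a client should look for its X server."""
    host: str      # empty string means "this machine"
    display: int   # which X server on that host
    screen: int    # which screen of that server (0 if not given)

    @property
    def is_local(self) -> bool:
        return self.host == ""

    @property
    def tcp_port(self) -> int:
        return X11_TCP_BASE_PORT + self.display

    @property
    def unix_socket(self) -> str:
        return f"/tmp/.X11-unix/X{self.display}"


def parse_display(value: str) -> DisplayTarget:
    """Split a DISPLAY string of the form [host]:display[.screen]."""
    host, sep, rest = value.rpartition(":")
    if not sep or not rest:
        raise ValueError(f"not a DISPLAY string: {value!r}")
    display_text, _, screen_text = rest.partition(".")
    return DisplayTarget(host, int(display_text), int(screen_text or 0))
```

With this sketch, `parse_display(":0")` describes a local server reached through `/tmp/.X11-unix/X0`, while `parse_display("remotehost:10.0")` describes display 10 on another machine, conventionally TCP port 6010.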
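Here is a toy model of that arrangement, purely as an illustration: the server keeps a stack of windows, works out which application owns a given pixel, and lets a window manager change the stacking order. `ToyServer` and `Window` are invented for this sketch and do not correspond to real X11 requests or data structures.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare windows by identity, not by field values
class Window:
    owner: str   # name of the client application (illustrative)
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


@dataclass
class ToyServer:
    # Bottom-most window first, top-most window last.
    stack: list = field(default_factory=list)

    def create_window(self, owner, x, y, width, height) -> Window:
        window = Window(owner, x, y, width, height)
        self.stack.append(window)  # new windows start on top
        return window

    def raise_window(self, window: Window) -> None:
        # The kind of request a window manager might make on a click.
        self.stack.remove(window)
        self.stack.append(window)

    def owner_at(self, px: int, py: int):
        # Search from the top of the stack down; the first hit is visible.
        for window in reversed(self.stack):
            if window.contains(px, py):
                return window.owner
        return None  # nothing but background at this pixel
```

The point of the sketch is that the applications never need to know about each other: each one only draws in its own window, and the server decides what ends up visible.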
The X11 protocol includes commands for drawing shapes, rendering fonts, etc., and there are applications that use them directly, for example, the xterm program and the twm window manager:
xterm --+
        |
        +--X11--> X --> Driver --> ...
        |
twm   --+
However, most modern applications find raw X11 too tedious; rather than drawing lines and shapes, we would rather draw whole widgets (buttons, menus, icons, etc.). Toolkits were created for this. The two most famous are Qt and GTK+ (the GIMP Toolkit, since it was originally created for the GIMP); others include Motif, Lesstif, ETK, Tk and FLTK. We can ask a toolkit to draw a button, and it will send all the X11 commands needed to draw it. It also takes care of size and position, redraws the button if something overlaps it and then moves away, and tells our code when the button has been pressed. Some toolkits even allow the look and feel of the widgets to be changed using themes. Some toolkits are also cross-platform, so they will send X11 commands on Linux and different commands on Windows, OSX, etc.
Rhythmbox --> GTK+ --+
                     |
GIMP      --> GTK+ --+
                     |
Amarok    --> Qt   --+--X11--> X --> Driver --> ...
                     |
Skype     --> Qt   --+
                     |
aMSN      --> Tk   --+
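As a rough sketch of what "ask the toolkit for a button" means, here is a toy function that expands one widget into a handful of primitive drawing requests. Everything in it is hypothetical: the request names (`fill_rectangle`, `draw_line`, `draw_text`), the default size and the text offsets are invented for illustration and are not the actual commands or metrics used by GTK+, Qt or X11.

```python
from dataclasses import dataclass


@dataclass
class Button:
    label: str
    x: int
    y: int
    width: int = 80    # arbitrary default size for the sketch
    height: int = 24


def draw_button(button: Button) -> list:
    """Expand one widget into a list of primitive drawing requests.

    The request names are made up; they only stand in for the kind of
    shape and text commands a display server accepts.
    """
    left, top = button.x, button.y
    right, bottom = left + button.width, top + button.height
    return [
        ("fill_rectangle", left, top, button.width, button.height),
        ("draw_line", left, top, right, top),        # top edge
        ("draw_line", left, top, left, bottom),      # left edge
        ("draw_line", right, top, right, bottom),    # right edge
        ("draw_line", left, bottom, right, bottom),  # bottom edge
        ("draw_text", left + 8, top + 16, button.label),
    ]
```

The application only writes `draw_button(Button("OK", 10, 20))`; the six low-level requests are the toolkit's problem. A cross-platform toolkit could keep the same `Button` and swap in a different expansion per platform.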
Some toolkits offer functionality on top of others; for example, wxWidgets gets GTK+ to do its drawing (on Linux; on Windows and OSX it is "native"), and XUL, as used by Firefox, also uses GTK+ to do its drawing:
Audacity --> wxWidgets --> GTK+ --+
                                  |
Firefox  --> XUL       --> GTK+ --+--X11--> X --> Driver --> ...
                                  |
GIMP     ----------------> GTK+ --+
It is important to note that X11's shape and text commands are not really used very much, since they are very primitive. Many toolkits actually draw their widgets as images, and then get X to draw those images. The new Wayland system is trying to replace X by throwing away the drawing commands and letting applications and toolkits use OpenGL directly, which should make things much faster.
You mentioned the different desktop environments, such as GNOME and KDE, and whether they work together. These are basically big collections of applications that work together. It just so happens that GNOME applications are written using GTK+, while KDE applications are written using Qt.
If you look at the arrows in my diagrams above, you will notice that each Qt application talks to X separately, each GTK+ application talks to X separately, and so on. Not only will a Qt application and a GTK+ application work side by side; as far as X is concerned, that is the same as two Qt applications or two GTK+ applications!
The only thing to worry about when mixing desktops is whether two applications are competing to do the same job, for example if you try to run two window managers or two desktop panels. Note that this is not a problem with the graphics, toolkits, etc., since I would get the same problems if I used two desktops built on the same toolkit (for example, lxpanel and gnome-panel are both written with GTK+, but they would still get in each other's way!)
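For window managers in particular, the "competing for the same job" clash is enforced by X itself: only one client at a time may ask to have window-management requests on the root window redirected to it, and a second client that tries gets a BadAccess error. The toy model below only imitates that rule; `ToyRootWindow` and its methods are invented for this sketch and are not real Xlib calls. Desktop panels have no such built-in referee, which is why two of them simply end up fighting over the same strip of screen.

```python
class BadAccess(Exception):
    """Stands in for the X error a second window manager receives."""


class ToyRootWindow:
    """Hypothetical model of the one-window-manager-at-a-time rule."""

    def __init__(self):
        self.redirect_owner = None  # client currently managing windows

    def select_substructure_redirect(self, client: str) -> None:
        # Only one client at a time may hold this role.
        if self.redirect_owner not in (None, client):
            raise BadAccess(
                f"{self.redirect_owner} is already managing windows")
        self.redirect_owner = client

    def release(self, client: str) -> None:
        # The role becomes free again when its holder gives it up.
        if self.redirect_owner == client:
            self.redirect_owner = None
```

In this model, the second window manager to start is refused until the first one lets go, regardless of which toolkit either of them was written with.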