What you're describing is known in the embedded world as a bare-metal application. These are very common on chips like the ARM Cortex-M3, which might sit inside (say) a debit card authentication unit or an interactive toy, and which doesn't have enough memory or the features needed to run a full operating system. So instead of an "ARM/Linux" compiler, which compiles an application to run under Linux on an ARM processor, you get an "ARM bare-metal" compiler, which compiles everything needed to run on an ARM processor with no operating system at all. (I'm using ARM rather than x86 as the example because bare-metal x86 applications are very rare these days.)
As your question and the other answers point out, your application then has to do a number of things that the operating system would otherwise take care of.
First, it has to initialize the memory system, the interrupt vectors, and various other bits of the board. The bare-metal compiler usually does this for you, although if you have an unusual board you may have to tell it how. This is what gets things from the point where the board powers on to the point where your main() function starts.
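To make the memory part of that startup step a bit more concrete, here is a minimal sketch that runs on an ordinary computer. It assumes the common arrangement where initial values of global variables live in flash and must be copied into RAM, and where zero-initialized globals must be cleared. The function names are made up for illustration; a real toolchain's startup file would get the addresses from linker-script symbols, set up the vector table and clocks, and then call main().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the initial values of global variables from their load image
 * (normally in flash/ROM) into their run-time home in RAM. */
static void copy_data(uint32_t *ram, const uint32_t *image, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        ram[i] = image[i];
    }
}

/* Clear the region that holds zero-initialized globals (often called .bss). */
static void zero_bss(uint32_t *bss, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        bss[i] = 0;
    }
}

/* Hypothetical helper: the memory-related part of a reset handler.
 * A real one would take these addresses from linker-script symbols
 * and would finish by calling main(). */
void startup_prepare_ram(uint32_t *ram, const uint32_t *image,
                         size_t data_words, uint32_t *bss, size_t bss_words)
{
    /* Host-side sanity check only; real startup code has no assert. */
    assert(ram != NULL && image != NULL && bss != NULL);
    copy_data(ram, image, data_words);
    zero_bss(bss, bss_words);
}
```

On a host you can pass plain arrays standing in for flash and RAM; on real hardware this work has to happen before any C code that relies on global variables.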
Next, you need to interact with things outside the CPU and RAM. An operating system includes all kinds of functions for this: disk I/O, screen output, keyboard and mouse input, networking, and so on. Without an operating system, you have to get these from somewhere else. Some come as libraries from your hardware manufacturer; for example, the board I was recently playing with has a 40x200-pixel LED screen and comes with a library containing code to turn it on and set individual pixel values. And there are several companies that sell libraries implementing a TCP/IP stack and the like, for networking and so on.
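As a rough idea of what such a display library might look like inside, here is a small sketch of a one-bit-per-pixel framebuffer with set and get functions. This is not the vendor's actual library: the names led_set_pixel and led_get_pixel, the packing scheme, and the choice of 200 as the width are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry, based on a 40x200 panel; which dimension is the
 * width is a guess. */
#define LED_WIDTH  200
#define LED_HEIGHT 40

/* One bit per pixel, eight pixels packed into each byte of a row.
 * On real hardware this buffer would then be pushed out to the
 * display controller over whatever bus the board uses. */
static uint8_t framebuffer[LED_HEIGHT][LED_WIDTH / 8];

/* Hypothetical API: switch a single pixel on or off. */
void led_set_pixel(int x, int y, bool on)
{
    assert(x >= 0 && x < LED_WIDTH && y >= 0 && y < LED_HEIGHT);
    uint8_t mask = (uint8_t)(1u << (x % 8));
    if (on) {
        framebuffer[y][x / 8] |= mask;
    } else {
        framebuffer[y][x / 8] &= (uint8_t)~mask;
    }
}

/* Hypothetical API: read back the state of a single pixel. */
bool led_get_pixel(int x, int y)
{
    assert(x >= 0 && x < LED_WIDTH && y >= 0 && y < LED_HEIGHT);
    return ((framebuffer[y][x / 8] >> (x % 8)) & 1u) != 0;
}
```

A real driver would also need initialization code and a routine to transfer the buffer to the panel, both of which depend entirely on the hardware.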
Note that this makes even a basic printf hard to do. When you have an operating system, printf just sends a message to the operating system saying "put this string on the console", and the operating system finds the current cursor position on the console and does everything needed to work out which pixels to change on the screen and which processor instructions to use to change them. Without an operating system, your own code (or a library you link in) has to do all of that.
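One common way around this on small boards is to send text out of a serial port instead of drawing it on a screen. The sketch below assumes that approach: it formats the text with the standard vsnprintf and then pushes it out one byte at a time. The names console_printf and uart_putc are made up, and uart_putc here only appends to a buffer so the code can run on an ordinary computer; on real hardware it would write each byte to the UART's data register.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the serial port. On real hardware uart_putc() would wait
 * until the transmitter is ready and then write the byte to a
 * memory-mapped data register; here it just records the byte. */
static char uart_log[128];
static size_t uart_len = 0;

static void uart_putc(char c)
{
    if (uart_len + 1 < sizeof uart_log) {
        uart_log[uart_len++] = c;
        uart_log[uart_len] = '\0';
    }
}

/* Hypothetical printf replacement: format into a small buffer, then
 * send the result out one character at a time. Output longer than the
 * buffer is truncated. Returns the number of characters sent. */
int console_printf(const char *fmt, ...)
{
    char buf[64];
    va_list ap;

    assert(fmt != NULL); /* host-side sanity check */
    va_start(ap, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0) {
        return n;
    }

    size_t len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
    for (size_t i = 0; i < len; i++) {
        uart_putc(buf[i]);
    }
    return (int)len;
}
```

Many bare-metal C libraries let you keep the ordinary printf by supplying a low-level output routine of your own; the details vary by toolchain, so check yours.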
Oh, and did we mention that you first have to work out how to get the program onto the CPU? A typical computer has a bit of programmable ROM from which it loads instructions when it first starts up. On x86 this is the BIOS, and it usually already contains a handy program that starts up the processor, sets up the display, looks for disks, and loads a program from the disk it finds. On an embedded system, that ROM is usually where your own program goes, which means you need some way of getting your program into it. Often this means you have a device called a "debugger" physically attached to the embedded board, which loads the program and can also do things like pause the processor and read out its state, so you can step through your program just as if you were running it in a software debugger on your computer. But I digress.
Anyway, to answer your second question: the executable you create will be stored in that ROM on the embedded board. Or maybe only part of it is stored in the ROM (which is pretty small, after all) and the rest goes on a flash drive, with the bit in the ROM containing instructions for fetching the rest from the flash drive. It will probably also be stored as a file on your main computer (that is, the Linux or Windows computer where you build it), but that's just for storage; it doesn't run there.
You'll notice that once you put a lot of these libraries together, they do a fair share of what an operating system does, and there's a space in between a bunch of libraries and a real operating system. In that space is what's called an RTOS, a "real-time operating system". The smaller ones are really just collections of libraries that work together to do all the operating-system sorts of things, and sometimes also include what you need to run multiple threads at the same time (and then you can use different threads like different programs), although it's all still compiled into the same single "program", and the RTOS is nothing more than a library you've included. The larger ones start keeping parts of the code in different places, and I think some of them can even load pieces of code from disk, just as Windows and Linux do when they start a program. It's a continuum, not an either/or.
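To illustrate the "RTOS is just a library you included" idea, here is a deliberately tiny sketch of a round-robin task runner. It is a much simpler model than a real RTOS: tasks here are plain functions that run to completion on each turn, whereas real systems usually give each task its own stack and can switch between them from a timer interrupt. All of the names (sched_add_task, sched_run_rounds, counter_task) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 4

typedef void (*task_fn)(void *arg);

struct task {
    task_fn fn;  /* function to call on each turn */
    void *arg;   /* private data handed to the task */
};

static struct task tasks[MAX_TASKS];
static size_t task_count = 0;

/* Register a task; returns 0 on success, -1 if the table is full. */
int sched_add_task(task_fn fn, void *arg)
{
    assert(fn != NULL); /* host-side sanity check */
    if (task_count >= MAX_TASKS) {
        return -1;
    }
    tasks[task_count].fn = fn;
    tasks[task_count].arg = arg;
    task_count++;
    return 0;
}

/* Give every registered task one turn per round. An embedded system
 * would normally loop forever instead of stopping after a count. */
void sched_run_rounds(unsigned rounds)
{
    for (unsigned r = 0; r < rounds; r++) {
        for (size_t i = 0; i < task_count; i++) {
            tasks[i].fn(tasks[i].arg);
        }
    }
}

/* Example task: increment the counter it was given. */
void counter_task(void *arg)
{
    int *counter = arg;
    (*counter)++;
}
```

Everything here, the "scheduler" and the "programs" alike, ends up linked into one executable, which is the point of the paragraph above.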
FreeRTOS is an open-source RTOS at the smaller end of that space; it may be a good place to look at some of this if you're interested in more. It has some x86 examples that will give you an idea of which x86 systems you would run a bare-metal or RTOS application on, and how you would compile something to run on one; the link is here: http://www.freertos.org/a00090.html#186 .