(1) Not only is this possible, it is documented for almost every generation of x86 processor. Go back to the 8088 and work your way forward, generation by generation: clock for clock, each new processor was slower for the applications and operating systems of its day (including Linux). The switch from 32 to 64 bits does not help either, and more cores at lower clock speeds make it even worse. And it is all true for the same reason.
(2) The bet baked into your binaries either pays off or it does not. Sometimes you get lucky; most of the time you do not. Yes, there are new instructions, and supporting them on an older part would probably mean taking an undefined-instruction trap and emulating the instruction in software, which would be terribly slow, and the lack of demand means that emulation is probably not done well or simply not there at all. Optimization may use the new instructions, but more than that, the bulk of the optimization I think you are asking about is reordering instructions so that the various pipelines do not stall. Arrange them to run fast on one generation of processor and they will be slower on another, because within the x86 family the cores change too much. AMD had a good run there for a while, because they could make the same code simply run faster rather than trying to invent new processors that would eventually be faster once the software caught up. These days both AMD and Intel are struggling just to keep the chips from crashing.
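To make the undefined-instruction point concrete, here is a minimal sketch of the usual workaround, runtime dispatch, assuming gcc or clang on x86; the checksum functions and buffer are made up for illustration, while the target attribute and __builtin_cpu_supports are real gcc features. The binary only takes the path that may use newer instructions when the CPU actually reports them, instead of faulting (or falling back to dog-slow emulation) on an older part.

    /* dispatch.c - minimal sketch of runtime CPU-feature dispatch (gcc/clang, x86).
     * checksum_new/checksum_old are hypothetical example functions. */
    #include <stdio.h>

    __attribute__((target("sse4.2")))      /* this path may be compiled with SSE4.2 */
    static long checksum_new(const unsigned char *p, long n)
    {
        long s = 0;
        for (long i = 0; i < n; i++)
            s += p[i];
        return s;
    }

    static long checksum_old(const unsigned char *p, long n)   /* baseline path */
    {
        long s = 0;
        for (long i = 0; i < n; i++)
            s += p[i];
        return s;
    }

    int main(void)
    {
        unsigned char buf[4096] = { 1 };

        __builtin_cpu_init();               /* populate the CPU feature flags */
        if (__builtin_cpu_supports("sse4.2"))
            printf("%ld\n", checksum_new(buf, sizeof buf));
        else
            printf("%ld\n", checksum_old(buf, sizeof buf));
        return 0;
    }

Of course every such check costs a branch and doubles the amount of code to maintain, which is exactly why most shipped binaries just target some lowest common baseline instead.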
(3) Generally yes. For example, gcc is a horrible compiler: one size fits all fits no one well, and it can never be, and never will be, any good at optimizing. For example, gcc 4.x code is slower than gcc 3.x code for the same processor (yes, this is all subjective; it all depends on the specific application being compiled). The target-specific compilers I have used were hands-down faster than the cheap or free ones (and I am not limiting myself to x86 here). Are they worth the price? That is the question.
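If you want to check that sort of claim yourself rather than take my word for it, here is a crude sketch of the experiment, with a made-up kernel: build the same file with each compiler (or each gcc version) at the same optimization level and time it on the same machine. Whether the result matches what I saw depends entirely on the application, which is the whole point.

    /* bench.c - crude, illustrative timing kernel only.
     * Build with each compiler under test, e.g. (binary names are whatever
     * your installs happen to be called):
     *   gcc-3 -O2 bench.c -o bench3
     *   gcc-4 -O2 bench.c -o bench4
     * then run both on the same machine and compare. */
    #include <stdio.h>
    #include <time.h>

    #define N 100000000L

    int main(void)
    {
        volatile long sink = 0;             /* keep the loop from being deleted */
        clock_t t0 = clock();

        for (long i = 0; i < N; i++)
            sink += i ^ (i >> 3);           /* arbitrary integer work */

        printf("%.3f s (sink=%ld)\n",
               (double)(clock() - t0) / CLOCKS_PER_SEC, (long)sink);
        return 0;
    }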
Overall, thanks to awful new programming languages, cheap memory and storage, and layer upon layer of caching, software engineering skills are at an all-time low. That means the pool of engineers capable of writing a good compiler, let alone a good optimizing compiler, shrinks over time, and this has been going on for at least ten years. So even the in-house compilers degrade with time, or the companies simply have their employees work on and contribute to open-source tools rather than maintain an in-house tool. On top of that, the tools the hardware engineers use are degrading for the same reason, so we now have processors that we merely hope will run without failures rather than processors that try very hard to optimize. There are so many bugs and chip errata that most of the compiler work goes into avoiding the bugs. Bottom line: gcc has ruined the compiler world.
(4) See (2) above. Do not lose sleep over it. The operating system you want to run will most likely not even install on the older processor, sparing you the pain. For the same reason that binaries optimized for your Pentium III run slower on your Pentium 4 and vice versa, code written to take advantage of multi-core processors will run slower on single-core processors than the same application optimized for a single core would.
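To make that multi-core point concrete, here is a minimal sketch using POSIX threads; the array, the chunking, and the thread count are made up for illustration. On a multi-core machine the chunks genuinely run in parallel; on a single core the same work gets serialized anyway, and you still pay for creating, scheduling, and joining the threads, so a plain serial loop over the same data would have been faster.

    /* threads.c - illustrative sketch; build with: gcc -O2 threads.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000L
    #define NTHREADS 4

    static long data[N];

    struct chunk { long lo, hi, sum; };

    static void *sum_chunk(void *arg)         /* each thread sums one slice */
    {
        struct chunk *c = arg;
        for (long i = c->lo; i < c->hi; i++)
            c->sum += data[i];
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        struct chunk c[NTHREADS];
        long total = 0;

        for (long i = 0; i < N; i++)
            data[i] = i & 0xff;

        for (int t = 0; t < NTHREADS; t++) {  /* fork: one thread per chunk */
            c[t].lo  = t * (N / NTHREADS);
            c[t].hi  = (t + 1) * (N / NTHREADS);
            c[t].sum = 0;
            pthread_create(&tid[t], NULL, sum_chunk, &c[t]);
        }
        for (int t = 0; t < NTHREADS; t++) {  /* join and combine */
            pthread_join(tid[t], NULL);
            total += c[t].sum;
        }
        printf("total = %ld\n", total);
        return 0;
    }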
The root of the problem is that the x86 instruction set is horrible. There are plenty of far better instruction sets out there that do not need hardware tricks to make them faster with every generation. But the market swung the other way and created two monopolies, and nobody else can get a foothold. My friends keep reminding me that these x86 machines are microcoded, so you do not really see the real instruction set inside. Which irritates me even more: the horrible ISA is just an interpretation layer. It is rather like using Java. The problems you have outlined in your question will continue for as long as Intel stays on top; and if the replacement does not itself become a monopoly, then we will be stuck forever with the Java model, where one way or another you live with the lowest common denominator: either you emulate a common platform on your particular hardware, or you write applications and compile them for a common platform.