The scientific method always applies: first check your assumptions. If the systems differ, the problem may lie in an implicit default that is set differently on each of them, or in a different implementation of some function.
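One way to check those assumptions is to capture the same handful of settings on each machine and diff them. This is only a rough sketch: the `snapshot` and `diff_snapshots` names and the choice of fields are my own, so extend the dictionary with whatever your application actually depends on.

```python
import os
import platform
import sys


def snapshot():
    """Collect a few settings that often differ silently between systems."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "lang": os.environ.get("LANG", ""),
        "path": os.environ.get("PATH", ""),
        "cwd": os.getcwd(),
    }


def diff_snapshots(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Run `snapshot()` on both systems, save the results (for example as JSON), and pass the two dictionaries to `diff_snapshots` to see only what is different.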
In any debugging process, localization is the key. You must first isolate the area of the problem. If your OS, patches, libraries, and main software are identical, then the cause is probably in the system settings (limits on sockets, file descriptors, etc.). If you know that you have enough inodes, free disk space, and memory, then it is not a resource problem. If the machine barely responds to interactive input, the load is too high or you have runaway processes. Remember that every process has to start and run, so make sure each one gets the resources it needs.
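The limits and resources mentioned above can be read from inside the process itself, which is useful because the values a service sees may differ from what your login shell reports. A minimal sketch, assuming a Unix-like system (the `resource` module, `os.statvfs`, and `os.getloadavg` are not available on Windows); `resource_report` is a name I made up:

```python
import os
import resource


def resource_report(path="/"):
    """Report file descriptor limits, free disk space and inodes, and load."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    st = os.statvfs(path)
    load1, _load5, _load15 = os.getloadavg()
    return {
        "nofile_soft": soft,                           # per-process open file limit
        "nofile_hard": hard,
        "disk_free_bytes": st.f_bavail * st.f_frsize,  # space available to non-root
        "inodes_free": st.f_favail,                    # inodes available to non-root
        "load_1min": load1,
        "cpu_count": os.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in resource_report().items():
        print(f"{key}: {value}")
```

A 1-minute load average well above `cpu_count` is a hint that processes are queuing for CPU. Some filesystems report 0 for the inode counts, so treat that field with care.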
It may also be code that simply cannot cope with the load of the production system. Locking mechanisms are a very common cause of problems that differ between production and dev/test systems, simply because you cannot create as many test cases as production gives you for free.
Logging is easy to overlook, but I would also add plenty of debug values to the code to make debugging easier. I can't even count how many times a particular environment variable, path, or broken symbolic link ruined my day, only for me to realize that it would have been a trivial fix had I looked at the values of the variables at run time and not just at the static code.
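As an illustration of logging run-time values, here is a small sketch that dumps selected environment variables and classifies paths, including the broken-symlink case. The function names and the logger name are hypothetical; pass in whatever variables and paths your program relies on.

```python
import logging
import os

log = logging.getLogger("debugvalues")


def describe_path(path):
    """Classify a path as it looks at run time."""
    # islink() is true for the link itself; exists() follows it to the target.
    if os.path.islink(path) and not os.path.exists(path):
        return "broken symlink -> " + os.readlink(path)
    if os.path.exists(path):
        return "ok"
    return "missing"


def log_runtime_values(env_names, paths):
    """Log the actual values the process sees, not what the code implies."""
    for name in env_names:
        log.debug("env %s=%r", name, os.environ.get(name))
    for path in paths:
        log.debug("path %s: %s", path, describe_path(path))
```

Calling `logging.basicConfig(level=logging.DEBUG)` once at startup and then `log_runtime_values(["PATH", "HOME"], ["/tmp"])` prints the values as the running process sees them.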
If all else fails, ltrace and strace are the best way to really see what happens under the hood. Their output is not easy to read, but once you get used to locating and interpreting the return values of system calls, you have a very powerful debugging tool.
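To get started with reading the return values, it helps to filter a saved trace down to the calls that failed. strace prints a failed call as `name(args) = -1 ERRNO (message)`, so a rough filter can look like the sketch below. The regular expression is my own approximation and will miss unfinished or resumed calls in multi-threaded traces, and the file name in the comment is an invented sample.

```python
import re

# Matches lines in the usual strace format for a failed call, e.g.
#   openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
# An optional "[pid N] " or "N " prefix (as printed with -f) is skipped.
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+-1\s+"
    r"(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>.*)\)\s*$"
)


def failed_syscalls(lines):
    """Return (syscall, errno, args) for every failed call in strace output."""
    failures = []
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            failures.append(
                (match.group("name"), match.group("errno"), match.group("args"))
            )
    return failures
```

For example, save a trace with `strace -f -o trace.log yourprogram` and then run `failed_syscalls(open("trace.log"))`. Many failures are harmless (programs probe for optional files all the time), so look for the ones that involve paths or sockets you actually care about.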
Marcin