I am currently debugging C++ code written in the late '90s that parses scripts to load data, perform simple operations, print results, and so on.
The people who wrote the code used functors to map string keywords in the file being parsed to actual function calls, and the functors are templated (with a maximum of 8 arguments) to handle the many function interfaces that the user can call from a script.
For the most part this all works fine, except that in recent years it has started to segfault on some of our 64-bit build systems. Running it through valgrind, to my surprise, I found that the errors seem to occur inside "printf", which is one of the registered functors. Here are some code snippets to show how this works.
First, the script being parsed contains the following line:
printf( "%5.7f %5.7f %5.7f %5.7f\n", cos( j / 10 ), tan( j / 10 ), sin( j / 10 ), sqrt( j / 10 ) );
where cos, tan, sin and sqrt are also functors corresponding to libm (this part is not important: if I replace them with fixed numeric values, I get the same result).
When it comes to calling printf, this is done as follows. First, the template functor:
    template<class R, class T1, class T2, class T3, class T4, class T5, class T6, class T7, class T8>
    class FType
    {
    public :
        FType( const void * f ) { _f = (R (*)(T1,T2,T3,T4,T5,T6,T7,T8))f; }
        R operator()( T1 a1,T2 a2,T3 a3,T4 a4,T5 a5,T6 a6,T7 a7,T8 a8 )
        {
            return _f( a1,a2,a3,a4,a5,a6,a7,a8);
        }
    private :
        R (*_f)(T1,T2,T3,T4,T5,T6,T7,T8);
    };
The code that calls it is inside another templated class. Here are the prototype and the relevant code fragment that uses FType (as well as some additional code that I inserted for debugging):
    template<class T1, class T2, class T3, class T4, class T5, class T6, class T7, class T8>
    static Token evalF( const void * f, unsigned int nrargs,
                        T1 a1, T2 a2, T3 a3, T4 a4, T5 a5, T6 a6, T7 a7, T8 a8,
                        vtok & args, const Token & returnType )
    {
        Token result;
        printf("Count: %i\n",++_count);
        if( _count == 2 )
        {
            const char *fmt = *((const char **) &a1);
            result = printf(fmt,a2,a3,a4,a5,a6,a7,a8);
            FType<int, const void*,T2,T3,T4,T5,T6,T7,T8> f1(f);
            result = f1("Hello, world.\n",a2,a3,a4,a5,a6,a7,a8);
            result = f1("Hello, world2 %5.7f\n",a2,a3,a4,a5,a6,a7,a8);
            result = f1(fmt,a2,a3,a4,a5,a6,a7,a8);
        }
        else
        {
            FType<int, T1,T2,T3,T4,T5,T6,T7,T8> f1(f);
            result = f1(a1,a2,a3,a4,a5,a6,a7,a8);
        }
    }
I inserted the if( _count == 2 ) bit (since this function is called several times). Under normal circumstances, only the else clause runs: it constructs FType (with the return type templated as int) from "f", which is the pointer to printf (checked in the debugger). Once f1 is constructed, it invokes the overloaded call operator with all the templated arguments, and valgrind starts complaining:
    ==29358== Conditional jump or move depends on uninitialised value(s)
    ==29358==    at 0x92E3683: __printf_fp (printf_fp.c:406)
    ==29358==    by 0x92E05B7: vfprintf (vfprintf.c:1629)
    ==29358==    by 0x92E88D8: printf (printf.c:35)
    ==29358==    by 0x5348C45: FType<int, void const*, double, double, double, double, void const*, void const*, void const*>::operator()(void const*, double, double, double, double, void const*, void const*, void const*) (Interpreter.cc:321)
    ==29358==    by 0x51BAB6D: Token evalF<void const*, double, double, double, double, void const*, void const*, void const*>(void const*, unsigned int, void const*, double, double, double, double, void const*, void const*, void const*, std::vector<Token, std::allocator<Token> >&, Token const&) (Interpreter.cc:542)
This led to the experiments inside the if() clause. First, I tried calling printf directly with the same arguments (note the casting trick with parameter a1, the format string, to get it to compile; otherwise the compiler complains about the many template instantiations where T1 is not (char *), as printf expects). This works fine.
Then I tried to call f1 with a replacement format string in which there are no variables (Hello, world). This also works great.
Then I added one of the variables (Hello, world2 %5.7f), and at that point I start to see the valgrind errors shown above.
If I run this code on a 32-bit system, it is valgrind clean (with otherwise identical versions of glibc and gcc).
Running on several different Linux systems (all 64-bit), sometimes I get a segfault (e.g. RHEL5.8 / libc2.5 and openSUSE11.2 / libc-2.10.1) and sometimes not (e.g. libc2.15 with Fedora 17 and Ubuntu 12.04), but valgrind always complains in the same way on all systems, which makes me think it is a fluke whether it works or not.
All this makes me suspect some kind of bug in the 64-bit glibc, although I would be much happier if someone could find something wrong with this code!
One hunch I had was that it is somehow related to parsing variable argument lists. How exactly do those play with templates? It's actually not clear to me how this works, because the format string is not known until runtime, so how does the compiler know which specific instantiations of the template to generate? However, this does not explain why everything seems fine in the 32-bit version.
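To make the template question concrete, here is a minimal sketch of how instantiation and the format string relate. The template parameters are deduced at compile time from the static types of the arguments at the call site (this matches the valgrind trace above, where evalF is instantiated with double and void const* types); the format string is only read at run time by the callee. The helper format_two and the typedef SnprintfLike are hypothetical names made up for this illustration, not part of the original code, and std::snprintf is used instead of printf only so that the result can be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// The trailing "..." is part of the function type. A call made through a
// pointer of this type is compiled as a variadic call.
typedef int (*SnprintfLike)(char *, std::size_t, const char *, ...);

// Hypothetical helper, not from the original code base. T2 and T3 are deduced
// at compile time from the argument types at the call site; the format string
// is not consulted until the callee parses it at run time.
template <class T2, class T3>
std::string format_two(SnprintfLike fn, const char *fmt, T2 a2, T3 a3)
{
    char buf[128];
    const int n = fn(buf, sizeof buf, fmt, a2, a3);
    // A negative result means an output error; n >= sizeof buf means truncation.
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Calling format_two(&std::snprintf, "%5.7f %5.7f\n", 1.0, 0.0) instantiates format_two<double, double>. Nothing checks that the format string agrees with those types, so a mismatch between the two is undefined behaviour rather than a compile error.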
Update in response to comments
Thanks to everyone for this useful discussion. I think the answer from ora regarding the %al register is probably the correct explanation, although I have not verified it yet. Regardless, for the benefit of the discussion, here is a complete, minimal program that reproduces the error on my 64-bit system and that others can play with. If you #define _VOID_PTR at the top, it uses void * pointers to pass around function pointers, as in the original code (and produces the valgrind errors). If you comment out #define _VOID_PTR, it uses properly prototyped function pointers instead, as suggested by WhosCraig. The problem in that case is that I could not simply write int (*f)(const char *, double, double) = &printf; since the compiler complains about a prototype mismatch (maybe I'm just being thick and there is a way to do this? I assume this is the problem the original author was trying to work around with void * pointers). To handle this particular case, I created the wrap_printf() function with the correct explicit argument list. When I run this version of the code, it is valgrind clean. Unfortunately, this does not tell us whether the problem is storing a function pointer in a void * or something related to %al; I think most of the evidence points to the latter, and I suspect that wrapping printf() in a function with a fixed argument list made the compiler do the right thing:
    #include <cstdio>

    #define _VOID_PTR // set if using void pointers to pass around function pointers

    template<class R, class T1, class T2, class T3>
    class FType
    {
    public :
    #ifdef _VOID_PTR
        FType( const void * f ) { _f = (R (*)(T1,T2,T3))f; }
    #else
        typedef R (*FP)(T1,T2,T3);
        FType( R (*f)(T1,T2,T3 )) { _f = f; }
    #endif
        R operator()( T1 a1,T2 a2,T3 a3) { return _f( a1,a2,a3); }
    private :
        R (*_f)(T1,T2,T3);
    };

    template <class T1, class T2, class T3>
    int wrap_printf( T1 a1, T2 a2, T3 a3 )
    {
        const char *fmt = *((const char **) &a1);
        return printf(fmt, a2, a3);
    }

    int main( void )
    {
    #ifdef _VOID_PTR
        void *f = (void *)printf;
    #else
        // this doesn't work because function pointer arguments don't match printf prototype:
        // int (*f)(const char *, double, double) = &printf;
        // Use this wrapper instead:
        int (*f)(const char *, double, double) = &wrap_printf;
    #endif
        char a1[]="%5.7f %5.7f\n";
        double a2=1.;
        double a3=0;
        FType<int, const char *, double, double> f1(f);
        printf(a1,a2,a3);
        f1(a1,a2,a3);
        return 0;
    }
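On the question of whether printf can be assigned directly: a pointer declared with printf's real prototype, int (*f)(const char *, ...) = &printf;, does compile, because the ellipsis is part of the function type. The sketch below is a hypothetical variant of FType (the name VariadicFType is made up here) that stores such a pointer, so the call in operator() is compiled as a variadic call. This may matter on x86-64: the System V ABI expects the caller of a variadic function to put an upper bound on the number of vector registers used into %al, and a call through R (*)(T1,T2,T3) is compiled as an ordinary call, which gives the compiler no reason to set it. I have not verified that this is what goes wrong in the original program.

```cpp
#include <cstdio>

// Hypothetical variant of FType for functions whose real prototype is
// R f(T1, ...). Because the stored pointer type carries the "...",
// the call in operator() is emitted as a variadic call.
template <class R, class T1, class T2, class T3>
class VariadicFType
{
public:
    typedef R (*FP)(T1, ...);

    explicit VariadicFType(FP f) : _f(f) {}

    // a2 and a3 are passed through the ellipsis, so they undergo the default
    // argument promotions (for example float becomes double).
    R operator()(T1 a1, T2 a2, T3 a3) { return _f(a1, a2, a3); }

private:
    FP _f;
};
```

With this, VariadicFType<int, const char *, double, double> f1(&std::printf); followed by f1("%5.7f %5.7f\n", 1.0, 0.0); prints both values and returns the number of characters written. Only types that are safe to pass through an ellipsis (arithmetic types and pointers) should be used for T2 and T3.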