In C++, how do I calculate the mean of a vector of integers using a vector view and gsl_stats_mean?

My program manipulates STL vectors of integers, but from time to time I need to calculate some statistics on them, so I use the GSL functions. To avoid copying the STL vector into a GSL vector, I create a GSL vector view and pass it to the GSL functions, as in this code fragment:

    #include <iostream>
    #include <vector>
    #include <gsl/gsl_vector.h>
    #include <gsl/gsl_statistics.h>

    using namespace std;

    int main( int argc, char* argv[] )
    {
        vector<int> stl_v;
        for( int i=0; i<5; ++i )
            stl_v.push_back( i );

        gsl_vector_int_const_view gsl_v = gsl_vector_int_const_view_array( &stl_v[0], stl_v.size() );

        for( int i=0; i<stl_v.size(); ++i )
            cout << "gsl_v_" << i << "=" << gsl_vector_int_get( &gsl_v.vector, i ) << endl;

        cout << "mean=" << gsl_stats_mean( (double*) gsl_v.vector.data, 1, stl_v.size() ) << endl;
    }

After compiling (gcc -lstdc++ -lgsl -lgslcblas test.cpp), this code prints:

    gsl_v_0=0
    gsl_v_1=1
    gsl_v_2=2
    gsl_v_3=3
    gsl_v_4=4
    mean=5.73266e-310

The vector view is created correctly, but I do not understand why the mean is wrong (it should be 10/5 = 2). Any ideas? Thanks in advance.

+7
6 answers

Use the integer versions of the statistics functions:

 cout << "mean=" << gsl_stats_int_mean( gsl_v.vector.data, 1, stl_v.size() ) << endl; 

Note the use of gsl_stats_int_mean instead of gsl_stats_mean, and that the cast is gone.
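To make the three arguments in that call concrete, here is a minimal sketch of a strided mean over ints. The function strided_int_mean is a hypothetical helper written for illustration, not part of GSL, and it is not GSL's actual implementation; it only mirrors the (data, stride, n) argument order used in the call above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of GSL: mirrors the (data, stride, n)
// argument order of the call above to show what each argument means.
double strided_int_mean(const int data[], std::size_t stride, std::size_t n) {
    assert(n > 0 && stride > 0);  // the mean of zero elements is undefined
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Each int is converted to double by value before being added;
        // the element visited is data[0], data[stride], data[2*stride], ...
        sum += static_cast<double>(data[i * stride]);
    }
    return sum / static_cast<double>(n);
}
```

With a stride of 1 every element is visited, which is what the question needs. A stride of 2 would visit every other element, which is useful for interleaved data.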

+3

The cast to double* is very suspicious.

Any time you are tempted to use a cast, think again. Then look for a way to do it without a cast (perhaps by introducing a temporary variable if the conversion is implicit). Then think a third time before casting.

Since the memory does not actually contain double values, the code simply interprets the bit patterns there as if they represented doubles, with predictably undesirable results. Casting int* to double* is VERY different from casting each element of the array.
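The difference can be sketched without GSL. The two helper names below are hypothetical, and the sketch assumes the common case of a 4-byte int and an 8-byte double. It uses memcpy to reinterpret the bytes so that the demonstration itself stays well defined; reading through a (double*) cast as in the question is undefined behavior on top of producing garbage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies the leading bytes of the int storage into a double. This is what
// reading the first element through a (double*) cast amounts to: the bits
// are reused as they are, and no value conversion takes place.
double first_bytes_as_double(const std::vector<int>& v) {
    assert(!v.empty());
    double d = 0.0;
    // Never copy more bytes than the vector actually owns.
    const std::size_t n = std::min(sizeof(double), v.size() * sizeof(int));
    std::memcpy(&d, v.data(), n);
    return d;
}

// Converts each element by value: the int 3 becomes the double 3.0.
std::vector<double> convert_each(const std::vector<int>& v) {
    return std::vector<double>(v.begin(), v.end());
}
```

For the vector 0, 1, 2, 3, 4 the first function returns a tiny denormal number that has nothing to do with the stored values, while the second returns 0.0, 1.0, 2.0, 3.0, 4.0. Note also that five ints occupy only 20 bytes under these assumptions, so asking for five doubles through the cast pointer reads 40 bytes, well past the end of the data.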

+4

Unless you are doing a lot of statistics much more complicated than the mean, I would ignore GSL and just use the standard algorithms (std::accumulate is declared in <numeric>):

 double mean = std::accumulate(stl_v.begin(), stl_v.end(), 0.0) / stl_v.size(); 

If and when a statistics library is warranted, your first choice should probably be something better designed (e.g. Boost Accumulators).

If for some reason you decide that you really need to use the double version of the GSL function, it looks like you will have to copy the int array into a double array first and then pass that to GSL. This is obviously quite inefficient, especially if you are dealing with a lot of data, hence the earlier advice to use something else.
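As a rough sketch of how far the standard library alone can go, the following computes the mean and the sample variance directly on the int vector, with no copy and no external library. Both function names are hypothetical, and dividing by n - 1 (the sample variance) is an assumption about which variance is wanted.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: arithmetic mean, with the sum accumulated in double.
double mean_of(const std::vector<int>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) /
           static_cast<double>(v.size());
}

// Hypothetical helper: sample variance (divides by n - 1), in two passes.
double sample_variance_of(const std::vector<int>& v) {
    assert(v.size() > 1);
    const double m = mean_of(v);
    double sum_sq = 0.0;
    for (int x : v) {
        const double d = x - m;  // x is converted to double by value here
        sum_sq += d * d;
    }
    return sum_sq / static_cast<double>(v.size() - 1);
}
```

For the question's data (0, 1, 2, 3, 4) this gives a mean of 2 and a sample variance of 2.5.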

+2

Although I am not familiar with GSL, the expression (double*) gsl_v.vector.data looks extremely suspicious. Are you sure it is correct to reinterpret_cast the pointer in order to get double data?

+1

Casting to double* corrupts your data. It does not convert the data to double; it simply treats the binary data of the ints as if it were doubles.

+1

According to http://www.gnu.org/software/gsl/manual/html_node/Mean-and-standard-deviation-and-variance.html , the gsl_stats_mean function takes an array of double. You are taking a vector of int and telling it to use the raw bytes as double, which will not work correctly.

You will need to create a temporary vector of double to pass in:

    // Assumes that there is at least one item in stl_v.
    std::vector<double> tempForStats(stl_v.begin(), stl_v.end());
    gsl_stats_mean(&tempForStats[0], 1, tempForStats.size());

EDIT: you can also use the standard library algorithms to compute the mean of the ints directly:

    // Assumes that there is at least one item in stl_v.
    double total = std::accumulate(stl_v.begin(), stl_v.end(), 0);
    double mean = total / stl_v.size();
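One detail worth spelling out: std::accumulate keeps its running sum in the type of the initial value. With 0 the sum is an int, which is fine for small data but can overflow for large values; with 0.0 the sum is a double. A minimal sketch follows, with hypothetical function names and assuming int is at least 32 bits wide:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Initial value 0 is an int, so the running sum is an int. This is fine
// for small inputs, but signed overflow is undefined behavior.
double mean_with_int_sum(const std::vector<int>& v) {
    assert(!v.empty());
    const double total = std::accumulate(v.begin(), v.end(), 0);
    return total / static_cast<double>(v.size());
}

// Initial value 0.0 is a double, so the running sum is a double and does
// not overflow for sums that would exceed the range of int.
double mean_with_double_sum(const std::vector<int>& v) {
    assert(!v.empty());
    const double total = std::accumulate(v.begin(), v.end(), 0.0);
    return total / static_cast<double>(v.size());
}
```

Both versions return 2 for the question's data. For a vector holding two elements of 2000000000 each, only the second version is safe to call, because the int sum would exceed the range of a 32-bit int.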
+1