How to interpolate

I have very little data for my analysis, so I want to generate more data points using interpolation.

My dataset contains 23 independent attributes and 1 dependent attribute. How should I go about this interpolation?

EDIT:

My main problem is the lack of data; I have to increase the size of my data set. Also, the attributes are categorical (for example, attribute A can be low, medium, or high), so is interpolation the right approach for this or not?

5 answers

This is a mathematical problem, but there is too little information in the question to answer it properly. Depending on the distribution of your real data, you could try to find a function that fits it. You could also try to interpolate the data using an artificial neural network, but that will be difficult. The catch is that to find an interpolation you need to analyze the data you already have, and that defeats the purpose. There is probably more to this problem than has been explained. What is the nature of the data? Can you embed it in an n-dimensional space? What do you expect to get from the analysis?


Roughly speaking, to interpolate an array:

double[] data = LoadData();

// Set to the index you want, e.g. 1.25 to interpolate between the
// values at data[1] and data[2].
double requestedIndex = /* ... */;

int previousIndex = (int)requestedIndex;        // in the example, 1
int nextIndex = previousIndex + 1;              // in the example, 2
double factor = requestedIndex - previousIndex; // in the example, 0.25

// In the example, this gives 75% of data[1] plus 25% of data[2].
double result = data[previousIndex] * (1.0 - factor)
              + data[nextIndex] * factor;

This is really pseudocode: it performs no range checking, assumes your data is in an indexable array, and so on.

Hope this helps you get started; if you have any questions, please leave a comment.


If the 23 independent variables are sampled on a regular grid, you can partition the space into hypercubes and linearly interpolate the dependent value starting from the vertex of the enclosing hypercube closest to the origin, along the edge vectors leading out of that vertex. In general, you project the interpolation point onto each of these edge vectors, which gives you a new "coordinate" in that local space; the new value is then computed by multiplying each coordinate by the corresponding difference in the dependent variable along that edge, summing the results, and adding the dependent value at the local origin. For hypercubes this projection is simple (you just subtract the coordinates of the nearest lower vertex), as the sketch below shows.
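
As a concrete illustration, here is the two-dimensional case of that scheme (bilinear interpolation on a regular unit grid). This is a minimal sketch of my own, not code from the answer; the names grid, x, and y are illustrative.

// Bilinear interpolation on a regular unit grid: the 2-D case of the
// hypercube scheme above. grid[i, j] holds the dependent value at the
// integer grid point (i, j).
static double Bilinear(double[,] grid, double x, double y)
{
    int i = (int)x;                 // vertex of the cell nearest the origin
    int j = (int)y;
    double fx = x - i;              // local "coordinates" inside the cell
    double fy = y - j;

    // Interpolate along x on the bottom and top edges, then along y.
    double bottom = grid[i, j]     * (1.0 - fx) + grid[i + 1, j]     * fx;
    double top    = grid[i, j + 1] * (1.0 - fx) + grid[i + 1, j + 1] * fx;
    return bottom * (1.0 - fy) + top * fy;
}

The same pattern extends to higher dimensions, though the number of hypercube vertices involved grows as 2^n.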

If your samples are not evenly distributed, the problem is much harder, since you will need to choose an appropriate partitioning of the space before you can interpolate linearly. Delaunay triangulation does generalize to N dimensions, but it is not easy to implement, and the resulting geometric objects are much harder to reason about and interpolate over than a simple hypercube.

One thing you might consider is whether your dataset naturally lends itself to projection, so that you can reduce the number of dimensions. For example, if two of your independent variables dominate, you can reduce the problem to two dimensions, which is much easier to solve. Another option is to place the sample points in a matrix, compute its SVD, and look at the singular values. If a few singular values dominate, you can project onto the hyperplane defined by the corresponding basis vectors and reduce the dimensionality of your interpolation. In effect, if your data varies mostly along a small set of directions, you can interpolate along those dominant directions, since there is little real information in the others.
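
For the SVD step, a minimal sketch, assuming the MathNet.Numerics package (my choice of library; the answer does not name one, and the data values are made up for illustration):

using MathNet.Numerics.LinearAlgebra;

// Stack the samples as rows (one column per attribute). For a real
// analysis you would typically mean-center each column first.
var samples = Matrix<double>.Build.DenseOfArray(new double[,]
{
    { 1.0, 2.1, 0.1 },
    { 2.0, 3.9, 0.2 },
    { 3.1, 6.0, 0.1 },
    { 4.0, 8.2, 0.2 },
});

var svd = samples.Svd(true);

// Inspect the singular values: a sharp drop-off after the first few
// suggests the data mostly lives in a lower-dimensional subspace.
Vector<double> singularValues = svd.S;

// Keep, say, the first two right singular vectors and project onto
// them to obtain a 2-D representation for interpolation.
var basis = svd.VT.SubMatrix(0, 2, 0, samples.ColumnCount).Transpose();
var reduced = samples * basis;   // one 2-D point per original sample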

I agree with the other commenters, however, that your premise may be off. Usually you do not want to interpolate in order to perform an analysis, because different interpolation schemes will produce different data, and that choice distorts the analysis. It only makes sense if you have good reason to believe that a particular interpolation is physically consistent and you simply need additional points for a particular algorithm.


May I suggest cubic spline interpolation: http://www.coastrd.com/basic-cubic-spline-interpolation

Unless you have special needs, it is easy to implement, and splines plot nicely.
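
In case it helps, here is a minimal sketch of a natural cubic spline in C# (the standard tridiagonal formulation, as derived on pages like the one linked above). This is my own illustration, not code from the link, and the names are illustrative.

// Compute second derivatives m[i] at the knots (x[i], y[i]) for a
// natural cubic spline (m[0] = m[n-1] = 0), via a tridiagonal solve.
static double[] SecondDerivatives(double[] x, double[] y)
{
    int n = x.Length;
    var m = new double[n];   // defaults to 0, giving the natural boundary
    var u = new double[n];
    for (int i = 1; i < n - 1; i++)
    {
        double sig = (x[i] - x[i - 1]) / (x[i + 1] - x[i - 1]);
        double p = sig * m[i - 1] + 2.0;
        m[i] = (sig - 1.0) / p;
        u[i] = (y[i + 1] - y[i]) / (x[i + 1] - x[i])
             - (y[i] - y[i - 1]) / (x[i] - x[i - 1]);
        u[i] = (6.0 * u[i] / (x[i + 1] - x[i - 1]) - sig * u[i - 1]) / p;
    }
    for (int i = n - 2; i >= 0; i--)   // back-substitution
        m[i] = m[i] * m[i + 1] + u[i];
    return m;
}

// Evaluate the spline at t.
static double Evaluate(double[] x, double[] y, double[] m, double t)
{
    // Find the interval [x[k], x[k+1]] containing t (linear scan for
    // brevity; use binary search if you evaluate many points).
    int k = 0;
    while (k < x.Length - 2 && t > x[k + 1]) k++;

    double h = x[k + 1] - x[k];
    double a = (x[k + 1] - t) / h;
    double b = (t - x[k]) / h;
    return a * y[k] + b * y[k + 1]
         + ((a * a * a - a) * m[k] + (b * b * b - b) * m[k + 1]) * h * h / 6.0;
}

Call SecondDerivatives once per dataset, then Evaluate for each point you want to interpolate.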


Look at the regression methods presented in The Elements of Statistical Learning; most of them can be tried out in R. There are many models to choose from: linear regression, local models, etc.
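
As a starting point, here is a minimal sketch of ordinary least squares for a single predictor in plain C# (the book covers the general multivariate case; the names here are illustrative, not from the book):

using System.Linq;

// Fit y ≈ intercept + slope * x by the closed-form least-squares
// solution for one predictor.
static (double intercept, double slope) FitLine(double[] x, double[] y)
{
    double meanX = x.Average();
    double meanY = y.Average();

    double covXY = 0.0, varX = 0.0;
    for (int i = 0; i < x.Length; i++)
    {
        covXY += (x[i] - meanX) * (y[i] - meanY);
        varX  += (x[i] - meanX) * (x[i] - meanX);
    }

    double slope = covXY / varX;
    return (meanY - slope * meanX, slope);
}

With 23 predictors you would solve the normal equations or use a library routine instead, but the idea is the same.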

