Interpolating and extrapolating randomly scattered data onto a regular grid in 3D

I have a 256 x 256 x 32 grid with regular spacing in x, y and z, and a corresponding variable "a" at each node. I also have a group of randomly scattered points in a more limited region of x, y, z space, with an associated variable "b". What I want to do is interpolate and extrapolate my scattered data onto the regularly spaced grid that corresponds to cube "a", as shown below:

Visual representation.

So far I have used scipy's griddata for the interpolation, which seems to work fine, but it cannot handle extrapolation (as far as I know), and the result is abruptly cut off to "nan" values. Researching this problem, I came across a couple of people using griddata a second time with "nearest" as the interpolation method to populate the "nan" values. I tried this, but the results do not seem reliable. More plausible results are obtained if I use fill_value with the "linear" mode, but at the moment that is more of a fudge, because fill_value has to be constant.

I noticed that MATLAB has a scatteredInterpolant class that seems to do what I want, but I can't find an equivalent class in Python and don't know how to efficiently implement such a procedure in 3D. Any help is appreciated.

The code I use for interpolation is below:

 x, y, z, b = np.loadtxt(scatteredfile, unpack = True)

 # Create cube to match aCube dimensions
 xi = np.linspace(-xmax_aCube, xmax_aCube, 256)
 yi = np.linspace(-ymax_aCube, ymax_aCube, 256)
 zi = np.linspace(zmin_aCube, zmax_aCube, 32)

 # Interpolate scattered points
 X, Y, Z = np.meshgrid(xi, yi, zi)
 bCube = griddata((x, y, z), b, (X, Y, Z), method = 'linear')
1 answer

This discussion applies in any dimension. For your 3D case, let's first talk about computational geometry, to understand why part of the domain gives NaN from griddata .

The scattered points in your volume make up a convex hull; a geometric shape with the following properties:

  • The surface is always convex (as the name implies)
  • The volume of the shape is the smallest possible while remaining convex
  • The surface (in 3D) is triangulated and closed

Less formally, the convex hull (which you can easily calculate with scipy ) is like stretching a balloon over a frame, where the frame corners are the outermost points of your scattered cluster.

At a regular grid point inside the balloon, you are surrounded by known points. You can interpolate to these locations. Outside it, you have to extrapolate.
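If you want to see in advance where the NaN region will be, here is a minimal sketch of that inside/outside test, assuming the x, y, z and X, Y, Z arrays from your question:

 import numpy as np
 from scipy.spatial import Delaunay

 # Triangulate the scattered cloud; its outer surface is the convex hull
 tri = Delaunay(np.column_stack((x, y, z)))

 # find_simplex returns -1 for query points outside the hull, so this mask
 # is True exactly where griddata can interpolate rather than extrapolate
 grid_pts = np.column_stack((X.ravel(), Y.ravel(), Z.ravel()))
 inside = (tri.find_simplex(grid_pts) >= 0).reshape(X.shape)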

Extrapolation is hard. There is no general rule for how to do it: that's the problem. In this region, algorithms such as griddata return NaN - that is the safest way to inform the scientist that he / she must choose a sensible way to extrapolate.

Let's look at some ways to do this.

1. [WORST] Botch it

Assign some scalar value outside the hull. In the scipy docs you will see that this can be done with:

 s = np.mean(b)
 bCube = griddata((x, y, z), b, (X, Y, Z), method = 'linear', fill_value = s)

Cons: this creates a sharp discontinuity in the interpolated field at the hull boundary, strongly distorts the mean value of the scalar field and takes no account of the functional form of the data.

2. [NEXT WORST] "Blended" botch

Suppose you pin some value at the corners of your domain. This might be the mean of the scalar field associated with your scattered points.

Apologies if the numpy below is slightly off, as I don't really use numpy, but it should be pretty clear.

 # With a unit cube, and a chosen scalar value s
 x, y, z, b = np.loadtxt(scatteredfile, unpack = True)
 s = np.mean(b)
 x = np.append(x, [0, 0, 0, 0, 1, 1, 1, 1])
 y = np.append(y, [0, 0, 1, 1, 0, 0, 1, 1])
 z = np.append(z, [0, 1, 0, 1, 0, 1, 0, 1])
 b = np.append(b, [s] * 8)
 # ... then drop in the rest of your code

Cons: this creates a sharp discontinuity in the gradient of the interpolated field at the hull boundary, still distorts the mean value of the scalar field and takes no account of the functional form of the data.

3. [STILL PRETTY BAD] Nearest neighbor

For each of the regular NaN points, find the nearest non-NaN point and assign its value. This is efficient and stable, but crude, because your field can end up with patterned artefacts (for example, stripes or rays emanating from the hull) that are often visually unappealing or, worse, unacceptable in terms of data smoothness.
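A minimal sketch of that grid-point variant, assuming the bCube, X, Y, Z arrays from your question (scipy.spatial.cKDTree handles the nearest-point lookup):

 import numpy as np
 from scipy.spatial import cKDTree

 # Fill each NaN grid point with the value of the nearest non-NaN grid point
 mask = np.isnan(bCube)
 good = np.column_stack((X[~mask], Y[~mask], Z[~mask]))
 bad = np.column_stack((X[mask], Y[mask], Z[mask]))
 _, nearest = cKDTree(good).query(bad)
 bCube[mask] = bCube[~mask][nearest]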

Depending on the data density, you can instead use the nearest point of the scattered data set rather than the nearest non-NaN regular point. This can be done simply:

 bCube = griddata((x, y, z), b, (X, Y, Z), method = 'linear', fill_value = np.nan)
 bCubeNearest = griddata((x, y, z), b, (X, Y, Z), method = 'nearest')
 mask = np.isnan(bCube)
 # Use nearest interpolation outside the hull, keeping linear interpolation inside
 bCube[mask] = bCubeNearest[mask]

Users of MATLAB's delaunay-based approaches will know more powerful one-line ways of achieving something similar, but numpy/scipy looks a bit limited here.

4. [NOT ALWAYS TERRIBLE] Naturally Weighted

Sorry for the poor explanation in this section; I've never written the algorithm out myself, but I'm sure some research on the natural neighbour method will take you far.

Use a distance weighting function with some parameter D , which might be comparable to, or twice (say), the length of your domain. You can tune it. For each NaN location, determine the distance to each of the scattered points.

 # Don't do it this way for anything but small matrices - this is O(NM) # and it can be done much more effectively (eg MATLAB has a quick # natural weighting option), but for illustrative purposes: for each NaN point 1:N for each scattered point 1:M calculate a basis function using inverse distance from NaN to point, normalised on D, and store in a [1 x M] vector of weights Multiply weights by the b value, summate and divide by M 

You basically want a function that transitions smoothly to the mean intensity of b at a distance D from the hull, but matches the hull values at its boundary. Away from the boundary, the nearest scattered points carry the most weight.
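One possible way to get that roll-off toward the mean - just a sketch, reusing mask and the filled values from the loop above, and using the distance to the nearest scattered point as a cheap stand-in for distance to the hull:

 from scipy.spatial import cKDTree

 tree = cKDTree(np.column_stack((x, y, z)))
 d, _ = tree.query(np.column_stack((X[mask], Y[mask], Z[mask])))
 blend = np.clip(d / D, 0.0, 1.0)  # 0 at the hull, 1 at distance D and beyond
 bCube[mask] = (1 - blend) * bCube[mask] + blend * np.mean(b)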

Pros: stable and reasonably continuous. Because of the weighting, it is more robust to noise at individual data points than nearest neighbour.

5. [HEROIC ROCKSTAR] Assumption of functional form

What do you know about the physics? Assume a functional form that represents what you expect from the physics, then do a least-squares (or some equivalent) fit of that form to the scattered data. Use the fitted function to stabilise the extrapolation.
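A minimal sketch of the fitting step with scipy.optimize.curve_fit, assuming (purely for illustration) a Gaussian profile in x and y with a linear trend in z - substitute whatever form your physics actually suggests:

 import numpy as np
 from scipy.optimize import curve_fit

 # Illustrative assumed form: Gaussian in x, y plus a linear trend in z
 def model(coords, A, sigma, c):
     x, y, z = coords
     return A * np.exp(-(x**2 + y**2) / (2 * sigma**2)) + c * z

 # Fit the form to the scattered data, then use it to fill the NaN region
 popt, _ = curve_fit(model, (x, y, z), b, p0 = (b.max(), 1.0, 0.0))
 mask = np.isnan(bCube)
 bCube[mask] = model((X[mask], Y[mask], Z[mask]), *popt)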

Some questions that can help you construct such a function:

  • Do you expect symmetry or periodicity?
  • Is b a component of a vector field with a known property, such as zero divergence?
  • Directionality: do you expect all directions to behave alike? Or perhaps linear variation in one direction?
  • Is b a field at a single instant in time - perhaps smoothed measurement time series could be used to build a basis function?
  • Is there a known shape already, such as a Gaussian or a quadratic?

Some examples:

  • b represents the intensity of a laser beam passing through a volume. You expect the entry side to be nominally identical to the exit side, and the remaining four boundaries to have zero intensity. The intensity will have a concentric Gaussian profile.

  • b is one component of a velocity field in an incompressible fluid. The flow must be divergence free, so any field constructed in the NaN region should also be divergence free, and you apply this constraint.

  • b represents the temperature in a room. You expect the temperature to be higher at the top, because hot air rises.

  • b is the lift on an aerofoil, tested over three independent variables. You can easily take lift measurements up to the stall, so you know exactly what happens in some parts of the space.

Pros/Cons: get it right and it will be awesome. Get it wrong, especially with nonlinear functional forms, and it will go very badly and can lead to highly unstable results.

Health warning: you cannot assume a functional form, get good results, and then use them to prove the correctness of that functional form. That's just bad science. The form should be well behaved and well known, independently of your data analysis.
