How to choose a uniformly distributed subset of a partially dense data set?

Question

P is an n-by-d matrix holding n d-dimensional samples. In some regions P is several times denser than in others. I want to select a subset of P in which the distance between any pair of samples is greater than d0, and the subset should be spread over the whole area. All samples have the same priority, and nothing needs to be optimized (for example, the covered area or the sum of pairwise distances).

Here is sample code that does this, but it is very slow. I need something more efficient, because I have to call it several times.

%% generating sample data
n_4 = 1000; n_2 = n_4*2;n = n_4*4;
x1=[ randn(n_4, 1)*10+30; randn(n_4, 1)*3 + 60];
y1=[ randn(n_4, 1)*5 + 35; randn(n_4, 1)*20 + 80 ];
x2 = rand(n_2, 1)*(max(x1)-min(x1)) + min(x1);
y2 = rand(n_2, 1)*(max(y1)-min(y1)) + min(y1);
P = [x1,y1;x2, y2];
%% eliminating close ones
tic
d0 = 1.5;
D = pdist2(P, P);D(1:n+1:end) = inf;
E = zeros(n, 1); % eliminated ones
for i=1:n-1
    if ~E(i)
        CloseOnes = (D(i,:)<d0) & ((1:n)>i) & (~E');
        E(CloseOnes) = 1;
    end
end
P2 = P(~E, :);
toc
%% plotting samples
subplot(121); scatter(P(:, 1), P(:, 2)); axis equal;
subplot(122); scatter(P2(:, 1), P2(:, 2)); axis equal;

Edit: how large should the subset be?

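As j_random_hacker pointed out, selecting only P(1, :) already satisfies the distance condition, so the size of the subset has to be constrained. Let the constraint be: "select m samples, if possible". With m=n this asks for the largest subset the method can find.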

+4

algorithm matlab

saastn, Apr 15 '16 at 13:22

2 answers

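The code below tests the points in random order and selects a point only if it is at least d0 away from every point already selected. It stops once m points have been selected or all points have been tried.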

%%
figure;
subplot(121); scatter(P(:, 1), P(:, 2)); axis equal;

d0 = 1.5;

m_range = linspace(1, 2000, 100);
m_time = NaN(size(m_range));

for m_i = 1:length(m_range);
    m = m_range(m_i)

    a = tic;
    % Test points in random order.
    r = randperm(n);
    r_i = 1;

    S = false(n, 1); % selected ones
    for i=1:m
        found = false;

        while ~found
            if r_i > n
                % We have tried all points. Nothing else can be valid.
                % (Checked before indexing so the last point is tested too.)
                break;
            end
            j = r(r_i);
            r_i = r_i + 1;
            if sum(S) == 0
                % This is the first point.
                found = true;
            else
                % Get the points already selected
                P_selected = P(S, :);
                % Exclude points >= d0 along either axis - they cannot have
                % a Euclidean distance less than d0.
                P_valid = (abs(P_selected(:, 1) - P(j, 1)) < d0) & (abs(P_selected(:, 2) - P(j, 2)) < d0);
                if sum(P_valid) == 0
                    % There are no points that can be < d0.
                    found = true;
                else
                    % Implement Euclidean distance explicitly rather than
                    % using pdist - this makes a large difference to
                    % timing.
                    found = min(sqrt(sum((P_selected(P_valid, :) - repmat(P(j, :), sum(P_valid), 1)) .^ 2, 2))) >= d0;
                end
            end
        end
        if found
            % We found a valid point - select it.
            S(j) = true;
        else
            % Nothing found, so we must have exhausted all points.
            break;
        end
    end
    P2 = P(S, :);
    m_time(m_i) = toc(a);
    subplot(122); scatter(P2(:, 1), P2(:, 2)); axis equal;
    drawnow;
end
%%
figure
plot(m_range, m_time);
hold on;
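% original_time is not defined in this script. It is assumed to hold the
% elapsed time of the code in the question (e.g. original_time = toc; there).
plot(m_range([1 end]), ones(2, 1) * original_time);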
hold off;

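Here original_time is the time taken by the original method from the question. The plot compares it with the timing of this method, with the number of selected points m along the x axis.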

The timing also depends on d0 (the original figure here was for d0=0.1).

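Note that the Euclidean distance is implemented explicitly rather than with pdist. In Matlab this makes a large difference to the timing.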
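As a minimal sketch of what "explicit" means here, the distance from one candidate point to a set of already selected points can be written with basic array operations only. The helper name dist_to_rows and the sample data are hypothetical and not part of the answer; bsxfun is used so that the sketch does not depend on implicit expansion.

```matlab
% Hypothetical helper (not from the answer): Euclidean distance from one
% query point q (1-by-d) to every row of X (k-by-d), without pdist/pdist2.
dist_to_rows = @(X, q) sqrt(sum(bsxfun(@minus, X, q) .^ 2, 2));

X = [0 0; 3 4; 6 8];       % stand-in for the already selected points
q = [0 0];                 % stand-in for the candidate point
d = dist_to_rows(X, q);    % k-by-1 column vector of distances

% The candidate is acceptable only if it is at least d0 from every row.
d0 = 1.5;
is_valid = all(d >= d0);
```

In this sketch the candidate coincides with the first row of X, so it would be rejected.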

+2

zelanix, Apr 15 '16 at 16:06

Peter · Accepted Answer · 2016-04-15T17:09:20+0000

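One option is to use a delaunay triangulation.

In a Delaunay triangulation every point is joined to its nearest neighbour by an edge, so if any two points are closer than d0, some edge is shorter than d0. It is therefore enough to look for edges shorter than d0, delete one endpoint of each, and repeat until no such edge is left.

Here is some code: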

dt = delaunayTriangulation(P(:,1), P(:,2));
d0 = 1.5;

while 1
    edge = edges(dt);  % vertex ids in pairs

    % Lookup the actual locations of each point and reorganize
    pwise = reshape(dt.Points(edge.', :), 2, size(edge,1), 2);
    % Compute length of each edge
    difference = pwise(1,:,:) - pwise(2,:,:);
    edge_lengths = sqrt(difference(1,:,1).^2 + difference(1,:,2).^2);

    % Find edges less than minimum length
    idx = find(edge_lengths < d0);
    if(isempty(idx))
        break;
    end

    % pick first vertex of each too-short edge for deletion
    % This could be smarter to avoid overdeleting
    points_to_delete = unique(edge(idx, 1));

    % remove them.  triangulation auto-updates
    dt.Points(points_to_delete, :) = [];

    % repeat until no edge is too short
end

P2 = dt.Points;
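To check a result such as P2, one could compute the smallest pairwise distance and compare it with d0. The following is a sketch under stated assumptions: Q is a small stand-in for P2, the variable names are hypothetical, and only base functions are used so that no toolbox is assumed.

```matlab
% Hypothetical check (not from the answer): is every pair of rows of Q
% at least d0 apart? Q stands in for the selected subset P2.
Q  = [0 0; 2 0; 0 2; 2 2];
d0 = 1.5;
m  = size(Q, 1);

% Squared pairwise distances via the Gram matrix:
% |a - b|^2 = |a|^2 + |b|^2 - 2*a.b
G  = Q * Q.';
sq = bsxfun(@plus, diag(G), diag(G).') - 2 * G;

% Ignore self-distances on the diagonal, as the question's code does.
sq(1:m+1:end) = inf;

% Clamp tiny negative values caused by rounding before taking the root.
min_dist = sqrt(max(min(sq(:)), 0));
ok = min_dist >= d0;
```

For data sets of the size used in the question (n = 4000) the full m-by-m matrix is affordable; for much larger sets a pairwise matrix would not be, and this check would need a different formulation.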
