How to translate an R wrapper around a C++ function to Python/NumPy

The R package Ckmeans.1d.dp relies on C++ code to do 99% of its work.

I want to use this functionality in Python without relying on RPy2. To do that, I want to "translate" the R wrapper into an analogous Python wrapper that operates on NumPy arrays the same way the R code operates on R vectors. Is this possible? I think it should be, since the C++ code itself seems (to my untrained eye) to stand on its own.

However, the Cython documentation does not really cover this use case of wrapping existing C++ code for Python. It is mentioned briefly here and here, but I am in over my head, since I have never worked with C++ before.

Here is my attempt, which fails with the error "Cannot assign type 'double' to 'double *'":

Directory structure

.
├── Ckmeans.1d.dp  # clone of https://github.com/cran/Ckmeans.1d.dp
├── ckmeans
│   ├── __init__.py
│   └── _ckmeans.pyx
├── setup.py
└── src
    └── Ckmeans.1d.dp_pymain.cpp

src/Ckmeans.1d.dp_pymain.cpp

#include "../Ckmeans.1d.dp/src/Ckmeans.1d.dp.h"
static void Ckmeans_1d_dp(double *x, int* length, double *y, int * ylength,
                          int* minK, int *maxK, int* cluster,
                          double* centers, double* withinss, int* size)
{
    // Call C++ version one-dimensional clustering algorithm*/
    if(*ylength != *length) { y = 0; }

    kmeans_1d_dp(x, (size_t)*length, y, (size_t)(*minK), (size_t)(*maxK),
                    cluster, centers, withinss, size);

    // Change the cluster numbering from 0-based to 1-based
    for(size_t i=0; i< *length; ++i) {
        cluster[i] ++;
    }
}

ckmeans/__init__.py

from ._ckmeans import ckmeans

ckmeans/_ckmeans.pyx

cimport numpy as np
import numpy as np
from .ckmeans import ClusterResult

cdef extern from "../src/Ckmeans.1d.dp_pymain.cpp":
    void Ckmeans_1d_dp(double *x, int* length,
                       double *y, int * ylength,
                       int* minK, int *maxK,
                       int* cluster, double* centers, double* withinss, int* size)

def ckmeans(np.ndarray[np.double_t, ndim=1] x, int* min_k, int* max_k):
    cdef int n_x = len(x)
    cdef double y = np.repeat(1, N)
    cdef int n_y = len(y)
    cdef double cluster
    cdef double centers
    cdef double within_ss
    cdef int sizes
    Ckmeans_1d_dp(x, n_x, y, n_y, min_k, max_k, cluster, centers, within_ss, sizes)
    return (np.array(cluster), np.array(centers), np.array(within_ss), np.array(sizes))
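
As far as I can tell, the error message refers to the places where a scalar (for example cdef double cluster) is passed to a parameter declared as a pointer (double*). The sketch below is not Cython and does not call the real library. It uses ctypes from the standard library, and stand_in_wrapper is a hypothetical pure-Python placeholder for a compiled function with pointer arguments. It only illustrates that such a function expects caller-allocated buffers, plus the address of the length, rather than plain scalars.

```python
import ctypes

def stand_in_wrapper(x_ptr, n_ptr, cluster_ptr, centers_ptr):
    """Hypothetical stand-in for a compiled function taking double*/int* arguments.

    It puts every point in cluster 1 and writes the mean as the only center.
    """
    n = n_ptr[0]                   # dereference the int* length
    total = 0.0
    for i in range(n):
        total += x_ptr[i]          # read through the input buffer
        cluster_ptr[i] = 1         # write into the caller's int buffer
    centers_ptr[0] = total / n     # write into the caller's double buffer

values = [1.0, 2.0, 3.0, 6.0]
n = ctypes.c_int(len(values))

# The caller allocates every buffer; the callee only fills them in.
x = (ctypes.c_double * len(values))(*values)
cluster = (ctypes.c_int * len(values))()   # zero-initialised output buffer
centers = (ctypes.c_double * 1)()

stand_in_wrapper(x, ctypes.pointer(n), cluster, centers)

print(list(cluster), list(centers))
```

In Cython, I assume the same idea would mean declaring the outputs as arrays, passing the address of their first element (for example &x[0]), and passing the integer arguments by address (for example &n_x), but I have not verified that this resolves every error in the attempt above.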