Why clustering? This doesn't look like a clustering problem. You can run cluster analysis as a preprocessing step to identify several user groups (or skip that step), but you still need to make a numerical prediction: both the number of payments and the number of days are numbers, so how would you get those numbers from clusters?
I suggest you use regression for this task. Linear regression should fit your needs. If the dependent variables (number of payments and days) depend on the other attributes non-linearly, you can try polynomial regression or even algorithms such as M5', which first build a decision tree and then fit a regression model at each leaf of that tree.
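To make the regression suggestion concrete, here is a minimal sketch of ordinary least squares with a single predictor, in plain Python. The attribute (months since sign-up) and the data values are made up for illustration; your real attributes will differ, and in practice you would let Weka or RapidMiner do the fitting.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training data: months since sign-up -> number of payments
months = [1, 2, 3, 4, 5]
payments = [2, 4, 6, 8, 10]

intercept, slope = fit_line(months, payments)

# Predict the number of payments for a user at month 6
predicted = intercept + slope * 6
print(predicted)
```

The same idea extends to several attributes (multiple regression) and to polynomial terms, where you add x squared, x cubed and so on as extra predictors.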
If you have non-numeric attributes, you can also try classification. In that case you need to define the possible classes manually (for example, number of payments: from 3 to 5, from 6 to 10, etc.) and then use any classification algorithm (C4.5, SVM, Naive Bayes, to mention a few).
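Creating the classes by hand could look roughly like this. The bin boundaries and labels below are only an illustration (the 3 to 5 and 6 to 10 ranges come from the example above, the others are assumed), and payment_class is a hypothetical helper name.

```python
def payment_class(n_payments):
    """Map a payment count to a hand-made class label (bins are illustrative)."""
    bins = [(0, 2, "0-2"), (3, 5, "3-5"), (6, 10, "6-10")]
    for low, high, label in bins:
        if low <= n_payments <= high:
            return label
    # Anything above the last bin falls into an open-ended class
    return "11+"


# Hypothetical payment counts for four users
labels = [payment_class(n) for n in [1, 4, 7, 25]]
print(labels)
```

The resulting labels become the target attribute that you feed to the classifier in place of the raw number.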
Actually, I don't think you have that much data. If the total is less than 50 MB, there is no need for monsters such as Mahout, which are designed to handle really large amounts of data. You can use Weka or RapidMiner instead. Even if they cannot process your data with the default configuration, simply increase the memory available to the JVM (for example with the -Xmx option), and in 99% of cases they will be fine.
ffriend