How to approach machine learning problems with a dynamic-size dataset?

I have run into a problem trying to classify a data sample as good or bad with machine learning.

The sample data is stored in a relational database. Each sample has attributes such as id, name, number of up-votes (indicating good or bad quality), number of comments, etc. There is also a table of elements with foreign keys pointing to the sample id. Each element has a weight and a name. Taken together, the elements pointing to a sample characterize it, and can usually help classify it. The problem is that the number of elements pointing to one foreign key differs between samples.

I want to feed a machine learning model, for example a neural network, with the elements that characterize a specific data sample. The problem is that I do not know the number of elements in advance, so I do not know how many input nodes I need.

Q1) Can neural networks be used when the input dimension is dynamic? If so, how?

Q2) Are there any recommendations for feeding a network a list of tuples when the length of the list is unknown?

Q3) Are there any recommendations for applying machine learning to relational databases?

5 answers

There is a machine learning field called Inductive Logic Programming that deals exclusively with relational data. In your case, if you want to use a neural network, you would want to convert your relational data set into a propositional one (a single table), that is, a table with a fixed number of attributes that can be fed to a neural network or any other propositional learner. These methods usually construct so-called first-order features that capture data from the secondary tables. Moreover, you only need to do this once, to induce your model; once you have the features and a trained model, you can evaluate the features for new data points on the fly.
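A minimal sketch of this propositionalization idea, using a hypothetical schema (the `samples` and `elements` tables, their columns, and the chosen aggregates are all invented for illustration): the variable-size set of elements pointing at each sample is collapsed into a fixed number of summary attributes.

```python
from statistics import mean

# Hypothetical relational data: each sample has a variable number of
# element rows pointing at it via sample_id (mimicking the foreign key).
samples = {1: {"name": "a", "upvotes": 10}, 2: {"name": "b", "upvotes": 2}}
elements = [
    {"sample_id": 1, "name": "x", "weight": 0.5},
    {"sample_id": 1, "name": "y", "weight": 1.5},
    {"sample_id": 2, "name": "x", "weight": 2.0},
]

def propositionalize(sample_id):
    """Aggregate a variable-size element set into fixed-size features."""
    ws = [e["weight"] for e in elements if e["sample_id"] == sample_id]
    return {
        "upvotes": samples[sample_id]["upvotes"],
        "n_elements": len(ws),
        "weight_sum": sum(ws),
        "weight_mean": mean(ws) if ws else 0.0,
        "weight_max": max(ws) if ws else 0.0,
    }

features = propositionalize(1)  # same keys for every sample, any element count
```

Every sample now yields the same feature keys regardless of how many elements point at it, so the result can be fed to any fixed-input learner.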

Here is a review article on some methods that can be used for this kind of problem. If you have further questions, ask away.


Neural networks are not designed to work with a dynamic input size; indeed, most machine learning methods assume a constant dimension. If your heart is set on using neural networks, I think the easiest way to handle this is to compute summary statistics over your variable-sized instances: for example, given an arbitrary number of inputs, calculate their mean (and variance, and anything else you like), which gives you a fixed-size representation.
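For instance, a fixed-length vector of summary statistics can be computed from any variable-length numeric input (a sketch; which statistics to include is your choice):

```python
from statistics import mean, pvariance

def summarize(values):
    # Variable-length input -> fixed 5-dimensional vector:
    # (count, mean, variance, min, max).
    return (len(values), mean(values), pvariance(values), min(values), max(values))

summarize([1, 2, 3])       # a 5-dimensional vector
summarize([4, 8, 15, 16])  # also 5-dimensional, despite more inputs
```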

There is a class of models well suited to what you want to do: Bayesian nonparametrics. This is a particularly elegant class of models that can grow to infinite size, yet always uses a finite number of parameters to explain a finite amount of data. Model updates are well defined as the volume of data grows (the number of parameters in the model simply increases as much as necessary to fit it).

However, there are two big caveats:

  • These models are complex. If you don't have a decent background in machine learning and statistics, getting one may take a significant amount of time.
  • Inference in infinite models is intractable. It is usually handled with MCMC methods, which are fairly computationally intensive. Despite recent progress in variational inference for nonparametric models, this is still a young area of research, and you certainly will not find ready-made implementations for a wide range of models.
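To give a flavor of the "finite parameters for finite data" property, here is a toy sampler for the Chinese Restaurant Process, the building block behind Dirichlet process mixtures: the number of clusters ("tables") is not fixed in advance but grows with the data. This is an illustrative sketch, not an inference procedure.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample a partition from a CRP: clusters appear as needed."""
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers at table k
    assignments = []  # assignments[n] = table of customer n
    for n in range(n_customers):
        # Open a new table with probability alpha / (n + alpha),
        # otherwise join an existing table proportionally to its size.
        r = rng.uniform(0, n + alpha)
        if r < alpha:
            tables.append(1)
            assignments.append(len(tables) - 1)
        else:
            r -= alpha
            for k, size in enumerate(tables):
                if r < size:
                    tables[k] += 1
                    assignments.append(k)
                    break
                r -= size
    return assignments

partition = chinese_restaurant_process(50, alpha=1.0)
```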

I do not know the answer to all the questions, but maybe this will help:

Q1) You could try using a dimensionality reduction technique, such as principal component analysis (PCA), to map all of your original inputs to a common dimension. To do this, you would select all data points of length N and use only those to learn a map from dimension N to dimension M.

Example: suppose you have inputs of dimension 3, 4, and 5. You would learn a map from dimension 5 to dimension 3 using all your dimension-5 points, and a map from dimension 4 to dimension 3 using all your dimension-4 points.
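A rough sketch of one such per-dimension map with NumPy (assuming, for illustration, that the dimension-5 points have been collected into one matrix; the dimension-4 map would be learned the same way from the dimension-4 points):

```python
import numpy as np

def fit_pca(X, m):
    """Learn a linear map from dimension N to dimension m via PCA (SVD)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:m]  # mean and top-m principal directions

def project(x, mu, components):
    """Apply the learned map to one point."""
    return (x - mu) @ components.T

# Hypothetical dimension-5 data: learn a 5 -> 3 map.
X5 = np.random.RandomState(0).randn(20, 5)
mu, comps = fit_pca(X5, 3)
z = project(X5[0], mu, comps)  # a fixed-size, 3-dimensional representation
```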

I do not expect this to work very well.

Q2) If Q1 is solved, this should no longer be a problem.

Q3) I am not sure about this, but maybe you could map the database to a graph and use the many algorithms that are available for learning on graphs?


As far as I know, there are no standard classification methods that work directly with dynamic-size input collections. Dimensionality reduction works by mapping high-dimensional but fixed-size data to lower dimensions, so it does not really fit what you need.

One way this is handled in machine learning for binary classification (which seems to be the problem you are interested in) is to build histograms. For example, you can classify texts (of different lengths) by building histograms of the words that appear in each text. Several extensions, such as histograms of bigrams and n-grams, have been proposed, but they are based on the same idea.
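The word-histogram idea fits in a few lines (the vocabulary here is an invented example; in practice you would build it from the training corpus):

```python
from collections import Counter

def word_histogram(text, vocabulary):
    """Map variable-length text to a fixed-size vector of word counts."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]

vocab = ["good", "bad", "great"]
word_histogram("good good bad", vocab)  # -> [2, 1, 0]
word_histogram("great great", vocab)    # -> [0, 0, 2]
```

Texts of any length map to a vector of length `len(vocab)`, ready for a fixed-input classifier.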

Another type of idea is structured prediction. A good example is part-of-speech tagging: you have a sentence and need to determine the part of speech of each word. In this kind of setting, each word has a label, and the interactions between labels are very important. Well-understood methods for this type of problem include structural SVMs, CRFs, and Max-Margin Markov Networks.
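To make the label-interaction point concrete, here is a toy Viterbi decoder for a two-tag HMM (all tags and probabilities are made up for illustration; CRFs and structural SVMs learn richer scoring functions but decode in a similar dynamic-programming fashion):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely label sequence: each label interacts with its neighbor."""
    V = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        V = {  # best (probability, path) ending in each state
            s: max(
                (V[prev][0] * trans_p[prev][s] * emit_p[s][o], V[prev][1] + [s])
                for prev in states
            )
            for s in states
        }
    return max(V.values())[1]

states = ("Noun", "Verb")
start = {"Noun": 0.6, "Verb": 0.4}
trans = {"Noun": {"Noun": 0.3, "Verb": 0.7}, "Verb": {"Noun": 0.6, "Verb": 0.4}}
emit = {"Noun": {"dogs": 0.8, "bark": 0.2}, "Verb": {"dogs": 0.2, "bark": 0.8}}
tags = viterbi(["dogs", "bark"], states, start, trans, emit)  # -> ["Noun", "Verb"]
```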


My apologies for adding a second answer, but it is significantly different from my first.

One possibility that could work is the following: suppose your input can have size 3, 4, or 5. You give your neural network 5 input nodes (the maximum input size). Then, if a point of size 3 arrives, you feed its values to the first 3 nodes and provide a dummy value for the remaining nodes.

Let's discuss a concrete example: suppose your inputs are points in R^3, R^4, or R^5, and they are binary: each coordinate can take the value 0 or 1. If the point (0,1,0,0,1) arrives, you simply feed these values into the 5 input nodes of the network. If the point (0,1,1) arrives, you feed (0,1,1,-1,-1) into the network, where -1 is a dummy value. This passes on to the network the information that "the last two nodes are special."

In a linear classifier, dummy values would be very dangerous; however, since a neural network is nonlinear, it can (in principle) learn any function if you give it the necessary information and have sufficient training data.
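The padding step is trivial to sketch (the dummy value and the maximum width are, of course, choices you would tune):

```python
def pad_input(point, max_dim=5, dummy=-1):
    """Pad a variable-size point to the network's fixed input width."""
    if len(point) > max_dim:
        raise ValueError("point exceeds the network input size")
    return list(point) + [dummy] * (max_dim - len(point))

pad_input((0, 1, 1))        # -> [0, 1, 1, -1, -1]
pad_input((0, 1, 0, 0, 1))  # -> [0, 1, 0, 0, 1]
```

Since the coordinates are binary, -1 is unambiguous as a "missing" marker here.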

