Hmm, I think there is a misunderstanding about what "testing" means for neural networks. You cannot test your network with just one sample. So I will try to explain what I know about testing neural networks. It is a long statistical process that involves some reflection on "real data" and "expected behavior"; you cannot verify anything with 10-20 data points and a single check.
As a rule, when training a neural network, you should work with three sets (a rough code sketch of such a split is given after the list):
- First, the training set is the input to the algorithm; it is used to set the weights of the network. It is simply the data required to run the learning algorithm.
- The second set, the validation set, is used to select the right algorithm for your problem and to limit overfitting. You compare the performance of the different candidates on it and choose the best one (an overfitted network will not perform well on it at all).
- The test set: this is the last phase. After choosing an algorithm and its parameters, you run it on a set of new data (ideally taken from the real world) and check whether it does what it should (something like a sanity check).
(source: https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set )
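Here is a minimal sketch of such a 60/20/20 split in Python (assuming NumPy and a plain array-based dataset; the helper name `split_dataset` is just for illustration):

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle once, then cut the data into training / validation / test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))        # shuffle so the three sets stay independent
    n_train = int(0.6 * len(X))          # 60% for training
    n_val = int(0.2 * len(X))            # 20% for validation, the rest for testing
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```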
For example, suppose we are building an algorithm that predicts whether a person has a "chance to become rich". Here is how you would build and validate your neural network:
- First, we ask 10,000 people whether they are rich or not, and we record some parameters about them (age, location, ...). This gives us the "source dataset".
- We split this list of 10,000 people into 3 sets (6,000, 2,000 and 2,000): a training set, a validation set and a test set (note: the proportions may vary depending on the validation procedure).
- We take the training set (the first 6,000 records) and use it to train our various neural networks (call them A, B, C and D).
- We take the validation set (the next 2,000 records) to compare the performance of the four networks. This is the step that catches overfitting. Suppose network A is not really a network at all, just a recorder: it memorizes each record and its class but cannot predict anything new. This "dummy algorithm" would score 100% if we evaluated it on the first 6,000 people, but it fails completely on the validation set. After this step, we can choose the "best algorithm"; let's say we pick C. (A rough sketch of this comparison step is shown after the list.)
- Now we run C on the remaining data (the test set, or fresh new data if possible, which is always better). If we see that C behaves in a very strange and unpredictable way (which can be caused by some human error, for example building sets that are not really independent, or data that is no longer valid, say because it was collected in 1996), we choose another algorithm or investigate whether the problem lies in the data or in the algorithm.
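To make the selection step concrete, here is a rough Python sketch. It reuses the `split_dataset` helper from the sketch above and assumes scikit-learn is available; the candidate models and the synthetic "rich or not" data are invented purely for illustration, they are not real survey data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

class Memorizer:
    """Network "A" from the example above: it only records training pairs."""
    def fit(self, X, y):
        self.memory = {tuple(row): label for row, label in zip(X, y)}
        return self
    def score(self, X, y):
        # Return the stored class if the exact sample was seen during training,
        # otherwise guess class 0 -- so it is perfect on the training set only.
        preds = [self.memory.get(tuple(row), 0) for row in X]
        return float(np.mean(np.array(preds) == y))

# Invented source dataset: 10,000 people, two numeric features, rich yes/no.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=10_000) > 0).astype(int)

(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_dataset(X, y)

candidates = {
    "A (memorizer)": Memorizer(),
    "B (logistic regression)": LogisticRegression(),
    "C (small neural network)": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000),
}
for name, model in candidates.items():
    model.fit(train_X, train_y)
    print(name, "train:", model.score(train_X, train_y),
          "validation:", model.score(val_X, val_y))

# Choose the candidate with the best validation score, then do one final
# sanity check on the test set (data that never influenced the choice).
best = max(candidates.values(), key=lambda m: m.score(val_X, val_y))
print("test score of the chosen model:", best.score(test_X, test_y))
```

On data like this, the memorizer should score 1.0 on its own training set but around chance level on the validation set, which is exactly the behavior the validation step is meant to expose.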
That is how you can create a reliable neural network (do not forget that the two main pitfalls are failing to check the final result and overfitting).
Since overfitting is such a key concept, I will try to define it a bit and give an example. Overfitting produces an algorithm that can build very close approximations of the data it has seen, but cannot predict anything new (what I called a "dummy algorithm" above).
Compare, for example, a linear fit and a polynomial of very high degree (say degree 1,000,000). Our polynomial almost certainly fits the data very well (extreme overfitting matches all our data points exactly). But it cannot predict anything.
For example, if we have the points (2, -2) and (-1, 2) in our test set (extracted from real-world data), we can conclude that our polynomial interpolation has clearly overfitted, because it predicts values like (-1, 10) and (2, 20) instead. The linear fit should be much closer.
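Here is a small sketch of that comparison (the original figure is not reproduced; the data below is made up, with a roughly linear decreasing trend, just to show the idea):

```python
import numpy as np

# Made-up training sample with a roughly linear, decreasing trend plus noise.
rng = np.random.default_rng(1)
train_x = np.sort(rng.uniform(-3, 3, size=8))
train_y = -train_x + rng.normal(scale=0.3, size=train_x.size)

linear = np.polyfit(train_x, train_y, deg=1)   # 2 coefficients: the "simple" model
wiggly = np.polyfit(train_x, train_y, deg=7)   # interpolates all 8 points exactly

# Held-out test points, in the spirit of (-1, 2) and (2, -2) from the text above.
test_x = np.array([-1.0, 2.0])
print("linear fit predicts:  ", np.polyval(linear, test_x))
print("degree-7 fit predicts:", np.polyval(wiggly, test_x))
# The degree-7 residuals on its own training points are essentially zero,
# yet its predictions between those points usually swing much further from
# the underlying trend than the straight line does.
print("degree-7 training residuals:", train_y - np.polyval(wiggly, train_x))
```

The exact numbers depend on the random noise, but the pattern is the point: near-zero training error combined with erratic predictions on held-out points is the signature of overfitting.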

Hope this helps. (Note that I am not an expert in this domain, but I tried to give a very clear and simple answer, so if something is wrong, do not hesitate to comment :))