The choice between algorithms

I am sure there are many software testing engineers and algorithm evaluation engineers on Stack Overflow. Can someone please tell me how to proceed in the following scenario?

Say we have a mammogram and 5 different algorithms that take this mammogram as input and determine whether the patient has cancer. If 3 out of 5 algorithms say the patient has cancer and 2 say the patient does not, which algorithm should I believe? How should I continue testing these algorithms? Is there a statistical concept used in such scenarios?

I was asked this question in an interview for an algorithm evaluation engineer position. I believe they were trying to understand how I think in such a scenario. How was I supposed to answer?

Thank you for your time.

-Sashi

+7
Tags: comparison, statistics
16 answers

You cannot say anything with only this information. What if some of the 5 algorithms reuse parts of the others? Then they may be prone to the same defects.

Let's say A, B and C actually use the same sub-algorithm for data preprocessing, and that sub-algorithm gives poor results on a particular image. The badly preprocessed image then causes the later phases of all three to produce incorrect results, and it no longer matters that you have three algorithms saying the same thing.

You need more specific data on how the algorithms are related and what their statistical error characteristics are before you can perform any meaningful analysis.

+7

This is actually quite difficult to answer. Each algorithm is probably good at picking up different kinds of input features. Most likely you will need a statistical analysis to determine what each algorithm typically flags as cancer. You could even go as far as building something like a Bayesian model that estimates whether a patient has cancer from the combined algorithm outputs.

You may find that 3 of the algorithms consistently miss a certain type of cancer that the other two are moderately good at picking up. You may also find patterns such as: when Algorithms 2, 3, and 5 say there is no cancer, Algorithm 1 says there is, and Algorithm 4 is inconclusive, the case usually turns out to be a benign spot of a particular shape and intensity that is worth analyzing but is probably not cancer.
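
To make the idea concrete, here is a minimal sketch of that kind of per-algorithm analysis, assuming you have a labeled history where each case records the true diagnosis, a cancer subtype, and each algorithm's vote; the record layout and the sample values are purely hypothetical.

    from collections import defaultdict

    # Hypothetical labeled history: each record holds the true label, a cancer
    # subtype (when present), and the yes/no vote of each of the 5 algorithms.
    history = [
        {"truth": "cancer", "subtype": "calcification",
         "votes": {"A": 1, "B": 0, "C": 0, "D": 0, "E": 1}},
        {"truth": "cancer", "subtype": "mass",
         "votes": {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}},
        # ... many more expert-labeled cases ...
    ]

    # Count, per algorithm and per subtype, how often a true cancer was missed.
    misses = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for case in history:
        if case["truth"] != "cancer":
            continue
        totals[case["subtype"]] += 1
        for algo, vote in case["votes"].items():
            if vote == 0:  # this algorithm said "no cancer" on a real cancer
                misses[algo][case["subtype"]] += 1

    for algo in sorted(misses):
        for subtype, n in misses[algo].items():
            print(f"{algo} missed {n}/{totals[subtype]} {subtype} cases")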

+4

Choosing the best classifier for a job, or combining different classifiers, is a field in itself. This general article on classification is as good a start as any for learning how to choose the best classifier for a task, and this article on classifier ensembles is a good place to start studying how to combine classifiers.

To give the basis of an answer to your (rather broad) question: the best classifier for a task depends on several factors:

  • The required classification quality (in your case it needs to be high)
  • The allowable classification cost (i.e. whether you can afford a computation that takes days, or need a response within milliseconds); I assume time is not a limitation in your case
  • The cost associated with misclassification. This is a very important factor in your case. If you tell people they have cancer when they don't, you cause a lot of stress, but (one hopes) further testing, which costs money, will eventually show they are healthy. On the other hand, if you miss a patient's cancer, the patient may die. This means the "best" classifier (the one that makes the fewest mistakes) may not be the best one for your problem.

On this last point: say 1 in 1,000 women has cancer, and I have the following classifiers:

  • One misses 20% of cancer cases and says a healthy woman has cancer in 2% of cases. This classifier makes about 200 errors in a population of 10,000 people.
  • One just says "This person has no cancer" in all cases. Only 10 errors in 10,000 cases!
  • One just says "This person has cancer" in all cases, which leads to 9,990 errors in 10,000 cases.

The second classifier makes the fewest mistakes, but a few months after you deploy it, people who could have been saved start to die. The third classifier sends everyone on to the next test (which will have the same problem as this one), or perhaps subjects 9,990 healthy people to a pointless, life-changing operation. The first classifier is a compromise: two people may become very sick or even die, and 198 people go through painful and stressful procedures and operations. Presumably, in your case, all five classifiers resemble classifier 1 with slightly different percentages. In such cases you need to trade off missed cancer cases against subjecting healthy people to further procedures (and their cost!). The standard starting point for exploring this trade-off is the receiver operating characteristic (ROC) curve.
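
As a sanity check on those numbers, here is a small sketch that recomputes the error counts and adds a cost-weighted comparison; the prevalence and error rates are taken from the answer above, while the cost weights are invented purely for illustration.

    # Recompute the error counts above and add a cost-weighted comparison.
    POPULATION = 10_000
    PREVALENCE = 1 / 1000        # 1 in 1,000 women has cancer
    COST_FALSE_NEGATIVE = 100.0  # hypothetical: a missed cancer weighted 100x
    COST_FALSE_POSITIVE = 1.0    # worse than an unnecessary follow-up

    cancers = POPULATION * PREVALENCE   # 10 true cancer cases
    healthy = POPULATION - cancers      # 9,990 healthy women

    classifiers = {
        # name: (miss rate on cancer cases, false-positive rate on healthy cases)
        "classifier 1 (20% miss, 2% FP)": (0.20, 0.02),
        "always 'no cancer'":             (1.00, 0.00),
        "always 'cancer'":                (0.00, 1.00),
    }

    for name, (miss_rate, fp_rate) in classifiers.items():
        false_negatives = cancers * miss_rate
        false_positives = healthy * fp_rate
        errors = false_negatives + false_positives
        cost = (false_negatives * COST_FALSE_NEGATIVE
                + false_positives * COST_FALSE_POSITIVE)
        print(f"{name}: {errors:.0f} errors, weighted cost {cost:.0f}")

With these made-up weights, classifier 1 has the lowest weighted cost even though the "always no cancer" classifier makes far fewer raw errors.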

+4

Put your interview hat on: this is a psychological assessment. Questions like this algorithm-evaluation one have more than one correct answer. I learned about this kind of question from my wife, who worked as a recruiter for 5 years. The interviewer wants to see how you react. It is best to state your assumptions and reason to a logical conclusion. Don't say "I don't know", don't argue, and don't ask too many questions, or you will come across as difficult and argumentative (like many programmers).

Now that you know this is not a programming question, consider asking about it on careeroverflow.com. I like these questions because they test your ability to adapt and not be rigid.

"Why are manhole covers round?" <- the Microsoft version

+3

Well, obviously false negatives are much more serious than false positives here, so all else being equal we might prefer the algorithms that find more cancers.

If we run more mammograms through the software and find that a subset of the algorithms tends to agree across a large sample of mammograms, then we might prefer those algorithms, because their results are corroborated by more of the others.

Something like that.

+2

All things being equal, you could say the patient has a 60% chance of cancer. To give a better answer, you need to know more about how each algorithm works. Some points to consider:

  • It is possible that some algorithms are newer or less reliable than others. It would help to know the accuracy of each algorithm on historical mammogram data labeled "Cancer" and "No Cancer" (see the sketch after this list).
  • Each person's cancer is slightly different: maybe there are signs that a particular algorithm is better at identifying? Would a domain expert be needed to determine which diagnosis is correct, based on the algorithms' findings and the mammogram (image?) data?
  • As mentioned above, it is possible that some algorithms use the same methods as others, so they may share the same bias.
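
For the first bullet, a minimal sketch of measuring each algorithm's accuracy against labeled history might look like the following; the labels and votes are hypothetical placeholders.

    # Measure each algorithm's accuracy on historical, labeled mammograms.
    # The ground-truth labels and the per-algorithm votes are made up.
    truth = ["cancer", "no cancer", "cancer", "no cancer", "no cancer"]
    votes = {
        "algo_1": ["cancer", "no cancer", "cancer", "cancer", "no cancer"],
        "algo_2": ["cancer", "no cancer", "no cancer", "no cancer", "no cancer"],
        # ... algorithms 3-5 ...
    }

    for name, predictions in votes.items():
        correct = sum(p == t for p, t in zip(predictions, truth))
        print(f"{name}: accuracy {correct / len(truth):.2f}")
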
+2

This is not a trivial problem and depends heavily on what risks you are willing to take.

Formalisms such as decision theory and Bayesian inference are indeed worth considering here. They let you account for the different probabilities of false positives and false negatives, and for whether you want to weight them differently.
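
As a minimal illustration of the decision-theoretic part, the sketch below picks the action with the lowest expected cost; the probability values and the cost ratio are hypothetical.

    # Pick the action with the lowest expected misclassification cost.
    def decide(p_cancer, cost_fn=100.0, cost_fp=1.0):
        """cost_fn: cost of missing a cancer; cost_fp: cost of a false alarm."""
        expected_cost_if_we_say_healthy = p_cancer * cost_fn
        expected_cost_if_we_say_cancer = (1 - p_cancer) * cost_fp
        if expected_cost_if_we_say_cancer < expected_cost_if_we_say_healthy:
            return "refer for further testing"
        return "report no cancer"

    # Weighting false negatives 100x more heavily means even a small
    # posterior probability of cancer triggers a referral.
    print(decide(0.02))    # -> refer for further testing
    print(decide(0.005))   # -> report no cancer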

+2

I do not think you were supposed to answer in any particular way. The interviewer most likely wanted to analyze how you approach the problem, not your final answer. In other words, they were probably interested in your own algorithm for making the decision.

In real life, I can't think of any serious way to choose among 5 cancer-detection algorithms from this information alone, especially when they give different results.

+2

This is a good opportunity to implement what is sometimes called an "expert system". Take a large sample of your data (in your case, mammogram images plus the outputs of the various algorithms) and run it past a panel of real-life, flesh-and-blood experts in the field (here, oncologists or lab technicians). Record each expert's answer for each image along with the algorithms' outputs. Eventually you should have enough data to map the algorithm outputs onto the expert outputs. To make sure the mapping works, run a set of held-out test images through your system (samples that were not part of the original data set) and have your expert panel double-check the results. Ideally, the experts should agree with your system a very large percentage of the time.

Without knowing anything about the algorithms themselves, it is difficult to make a decision based on 3 "yes" and 2 "no" results (especially for something as important as cancer screening). Getting as close as possible to the results a trained specialist would produce is your goal (at least at first), and such systems can sometimes be made more accurate by building on the knowledge and experience of domain experts rather than on mathematical algorithms alone.
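
Here is a minimal sketch of that kind of mapping, assuming each training record pairs the five algorithms' votes with the label a human expert gave the same image; the vote patterns and labels shown are hypothetical.

    from collections import Counter, defaultdict

    # Each record pairs the 5 algorithms' votes (1 = "cancer") with the label
    # a human expert assigned to the same image. The data here is made up.
    training = [
        ((1, 1, 1, 0, 0), "cancer"),
        ((1, 1, 1, 0, 0), "cancer"),
        ((1, 0, 1, 0, 0), "no cancer"),
        ((0, 0, 0, 0, 1), "no cancer"),
        # ... many more expert-labeled cases ...
    ]

    # For every observed vote pattern, remember the most common expert verdict.
    by_pattern = defaultdict(Counter)
    for pattern, expert_label in training:
        by_pattern[pattern][expert_label] += 1

    def predict(pattern):
        """Return the expert consensus for a vote pattern, if we have seen it."""
        if pattern not in by_pattern:
            return "unknown pattern - refer to a human expert"
        return by_pattern[pattern].most_common(1)[0][0]

    print(predict((1, 1, 1, 0, 0)))  # -> cancer
    print(predict((0, 1, 0, 1, 0)))  # -> unknown pattern - refer to a human expert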

+2

I would ask whether using a computer to determine if someone has cancer is the right course of action in the first place, given that the algorithms are error-prone.

But if for some reason a set of algorithms must be used, then a person (for example, a doctor) should examine the mammogram personally whenever there is uncertainty. The doctor can then decide whether additional tests are needed when the algorithms disagree.

One thing we tend to overlook as programmers is that people can solve problems we cannot anticipate; imagine the doctor notices something on the mammogram that the algorithms were never designed to detect.

+1

I think that if you had statistical information about each algorithm's previous runs (how many times it was right or wrong over a number of trials), you could estimate the probability that each algorithm is correct. Then you could somehow combine those probabilities to get the chance that the person has cancer. Just speculation...

+1

To make progress in this situation, you usually want a gold standard: for example, a doctor's opinion on whether a mammogram shows cancer, or historical data where you know one set of mammograms shows cancer and the other does not. Along with this, if possible, you want information on which indicators each algorithm uses in a particular case.

With the gold standard you can start to evaluate which algorithms are more accurate (that is, which most often agree with the expert opinion). The information on indicators lets you talk in more detail about the circumstances under which each algorithm seems more or less accurate, so you can begin to judge when to trust each one. With that, you can (at least hopefully) combine the results of the five existing algorithms into a single overall result that (with care and perhaps a little luck) is more accurate than any of them individually.

+1

In principle, if you know that the algorithms' results are conditionally independent (i.e., independent given the true but unknown class label), then Naive Bayes is the optimal meta-classifier.

Otherwise, the question cannot be answered without knowing the structure of the conditional dependencies among the classifiers. For example, if classifiers A, B, C, and D are weak, identical classifiers (i.e., they always give the same result) with an accuracy of 0.51, while classifier E is conditionally independent of A, B, C, and D and has an accuracy of 0.99, then it's pretty obvious that majority voting is a bad idea.
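
A minimal sketch of the Naive Bayes combination under the conditional-independence assumption, using the accuracies from the example (treated here as both sensitivity and specificity, with an invented prior):

    import math

    # Each base classifier is summarized by its sensitivity P(vote=1 | cancer)
    # and specificity P(vote=0 | no cancer); 0.51 and 0.99 come from the
    # example above, the 0.001 prior is invented.
    classifiers = {
        "A": (0.51, 0.51), "B": (0.51, 0.51), "C": (0.51, 0.51), "D": (0.51, 0.51),
        "E": (0.99, 0.99),
    }

    def posterior_cancer(votes, prior=0.001):
        """Combine votes assuming conditional independence (Naive Bayes)."""
        log_odds = math.log(prior / (1 - prior))
        for name, vote in votes.items():
            sensitivity, specificity = classifiers[name]
            if vote == 1:   # this classifier says "cancer"
                log_odds += math.log(sensitivity / (1 - specificity))
            else:           # this classifier says "no cancer"
                log_odds += math.log((1 - sensitivity) / specificity)
        return 1 / (1 + math.exp(-log_odds))

    # A, B, C and D vote "no cancer"; the much stronger E votes "cancer".
    # Majority voting would say "no cancer" 4 to 1, but the Naive Bayes
    # posterior is dominated by E and comes out well above the prior.
    print(posterior_cancer({"A": 0, "B": 0, "C": 0, "D": 0, "E": 1}))

Note that if A, B, C and D are really copies of one another, even this combination double-counts their evidence, which is exactly the conditional-dependence problem described above.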

+1

Since each algorithm answers yes or no, this is fairly easy. To compare your algorithms, you need actual test data. You should probably collect long-term data on the success rates of the various heuristics and do a statistical analysis of which are more likely to be correct.

Validating things like Google's search algorithm, where there is no single "right" answer, would be much harder.

+1

Go back and look at the historical data for each of the algorithms: how often A was correct in the past, how often B was correct, and so on. A typical route here would be to run all the algorithms and apply some Bayesian weighting scheme, but I think that approach is too general, because it depends heavily on the quality of the source data. Since each algorithm suits a different type of input, a more specialized approach would be to create a filter that screens the source data for markers that suit a specific algorithm. For example, if the image comes from an older machine, you would not want to rely on an algorithm that handles image noise poorly. A specialist in mammography technology would be a great help in identifying more specific markers. After this filtering step you could apply a weighting system to produce a more accurate confidence estimate.

+1

Based on the information provided, you cannot answer. You would need to take all 5 algorithms and test them on patients diagnosed with cancer as well as on patients known to be cancer-free. That would let you determine which algorithm is the most accurate.

You could also make a sixth algorithm out of the 5 (assuming all of them are good, valid algorithms) by going with whichever answer gets more votes. This may or may not be a valid sixth algorithm, depending on how good the first 5 are.
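
A minimal sketch of that majority-vote "sixth algorithm", assuming each base algorithm returns True for "cancer" and False for "no cancer":

    def sixth_algorithm(votes):
        """Return True ("cancer") if more than half of the base algorithms say so."""
        return sum(votes) > len(votes) / 2

    print(sixth_algorithm([True, True, True, False, False]))    # -> True
    print(sixth_algorithm([False, True, False, False, False]))  # -> False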

0
