How to evaluate a search engine?

I am a student doing research to improve my existing search engine algorithm.

I want to know how to evaluate the search engine I have modified, in order to quantify how much the algorithm has improved.

How can I compare the old and new algorithms?

thanks

+7
seo search-engine pagerank
10 answers

This is usually done by creating a test suite of queries and then evaluating how well the search results answer those queries. In some cases the answers should be unambiguous (if you type slashdot into a search engine, you expect slashdot.org as the top hit), so you can think of these as a class of queries with "correct" answers.
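A minimal sketch of such a test suite for the unambiguous class of queries, in Python; the query list, the expected domains, and the `search(query)` function are placeholders for your own engine and data:

```python
# Minimal sketch of a test suite for queries with known "correct" answers.
# `search(query)` is a stand-in for your engine; it is assumed to return a
# ranked list of result URLs.
from urllib.parse import urlparse

NAVIGATIONAL_QUERIES = {
    "slashdot": "slashdot.org",        # expected top-hit domain (illustrative)
    "python docs": "docs.python.org",
}

def top_hit_accuracy(search, queries=NAVIGATIONAL_QUERIES):
    """Fraction of queries whose expected domain is the #1 result."""
    hits = 0
    for query, expected_domain in queries.items():
        results = search(query)                       # ranked list of URLs
        if results and urlparse(results[0]).netloc.endswith(expected_domain):
            hits += 1
    return hits / len(queries)

# Usage: run the same query set against both algorithms.
# print(top_hit_accuracy(old_search), top_hit_accuracy(new_search))
```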

Most other queries are inherently subjective. To minimize bias, you should ask several users to try your search engine and rate the results, for comparison against the original engine. Here is an example of a computer-science paper that does something similar:

http://www.cs.uic.edu/~liub/searchEval/SearchEngineEvaluation.htm

As for comparing the algorithms specifically, what you measure depends on what you are interested in knowing. For example, you could compare computational performance, memory usage, crawling overhead, or time to return results. If you are trying to produce very specific behaviour, such as running a specialist search (for example, a literature search) with certain parameters, then you need to test for that explicitly.
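If raw cost is what you care about, a rough harness like the following compares wall-clock time and peak memory of two ranking functions over the same query set; `rank_old`, `rank_new`, and the query file are assumptions standing in for your own code and data:

```python
# Rough sketch: compare wall-clock time and peak memory of two ranking
# functions over an identical query set.
import time
import tracemalloc

def profile(rank_fn, queries):
    tracemalloc.start()
    start = time.perf_counter()
    for q in queries:
        rank_fn(q)                        # results discarded; we only measure cost
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak

# queries = [line.strip() for line in open("queries.txt")]
# for name, fn in [("old", rank_old), ("new", rank_new)]:
#     secs, peak = profile(fn, queries)
#     print(f"{name}: {secs:.2f}s total, {peak / 1e6:.1f} MB peak")
```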

Heuristics for relevance are also a useful check. For example, when someone uses search terms that are probably programming-related, do they tend to get more results from stackoverflow.com? Are the results better when they do? If you assign trust weightings to specific sites or domains (for example, treating .edu or .ac.uk domains as more reliable for technical results), then you need to test whether those weightings actually help.
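One simple way to make such weightings testable is to keep them as data, so the same evaluation can be re-run with and without them. A sketch under that assumption; the weights and the `base_score` function are illustrative, not real values:

```python
# Sketch: apply per-domain trust weights to a base relevance score so the
# weighting can be switched on and off for evaluation.
from urllib.parse import urlparse

TRUST_WEIGHTS = {
    ".edu": 1.2,           # treat academic domains as slightly more reliable
    ".ac.uk": 1.2,
    "stackoverflow.com": 1.1,
}

def trust_factor(url):
    host = urlparse(url).netloc
    for suffix, weight in TRUST_WEIGHTS.items():
        if host.endswith(suffix):
            return weight
    return 1.0

def score(url, query, base_score, use_trust=True):
    """base_score is your engine's own relevance function (hypothetical here)."""
    s = base_score(url, query)
    return s * trust_factor(url) if use_trust else s
```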

+11

First, let me start by saying that what you want is to apply traditional research methods to search engine results. Many SEOs have done this before you and, as a rule, kept it to themselves, since sharing "amazing findings" usually means you can no longer exploit them or keep the upper hand. That said, I will share what pointers I can and what you need to look out for.

  • Determine which part of the algorithm you are trying to improve.

Different kinds of searches exercise different parts of the algorithm.

Broad searches

For broad searches, for example, engines tend to return a wide variety of results. Common components of these results include

  • News feeds
  • Products
  • Images
  • Blog posts
  • Local results (based on a GeoIP lookup).

Which of these result types gets thrown into the mix varies depending on the word searched.

Example: "cats" returns images of cats and news; "shoes" returns local shopping results for shoes. (This is based on my Chicago IP address as of October 6th.)

The goal of returning results for a broad term is to provide a little of everything for everyone so that everyone is happy.

Regional modifiers

As a rule, any time a regional term is attached to a search it will dramatically change the results. If you search for "Chicago Web Design", because the word "Chicago" is attached the results will start with a block of the top 10 regional listings (the one-line listings next to the map); after those ten listings, the ordinary organic results are displayed.

The results in that local top ten are, as a rule, very different from the organic listings below them. This is because the local results (drawn from Google Maps) rely on completely different data for their ranking.

Example: having a phone number with a Chicago area code on your website will help in the local results ... but NOT in the organic results. The same goes for the address, a Yellow Pages listing, and so on.

Speed of results

Currently (as of 10/06/09) Google is beta-testing Caffeine. The main feature of this build is that it returns results in nearly half the time. Although you could hardly call Google slow right now ... speeding up the algorithm matters when millions of queries happen every hour.

Reducing spam listings

We have all experienced a search that was riddled with spam. Over the past 10+ years, one of the biggest battles on the Internet has been between search engines and search engine optimizers: gaming Google (and the other engines) is very profitable, and fighting it is something Google spends a great deal of its time on.

A good example is the new Google Caffeine release ( http://www2.sandbox.google.com/ ). So far, my research, along with that of several others in the SEO field, has found that this is the first build in the last five years to give more weight to on-site elements (keywords, internal site linking, and so on) than previous builds did. Before this, each "release" seemed to favour inbound links more and more ... this is the first step back towards "content".

Ways to test the algorithm.

  • Compare two builds of the same engine. This is currently possible by comparing Caffeine (see the link above, or search Google for "google caffeine") against current Google.

  • Compare local results across regions. Try search terms, such as "web design", that return local results without a local keyword attached. Then use proxies (found through Google) to search from different locations. You will want to make sure you know the location of each proxy (find a site through Google that tells you your IP address and the city of your GeoIP). Then you can see how different regions return different results.

Warning ... DON'T pick the term "locksmith" ... and be careful with any term whose results contain a lot of spam listings. Google Local is fairly easy to spam, especially in competitive markets.

  1. As mentioned in a previous answer, compare the number of clicks users need to find their result. You should know that at present none of the major engines uses "bounce rate" as an indicator of a site's relevance. This is probably because it would be easy to make it look as though your result has a bounce rate in the 4-8% range without actually having one that low ... in other words, it would be easy to game.

  2. Track how many search refinements a user needs, on average, for a given term before finding the desired result. This is a good indicator of how well the engine guesses the intent behind a query (as mentioned in another answer). A rough sketch of measuring this from a query log follows below.
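A minimal sketch of that measurement, assuming you keep a log of (session_id, query, clicked_result) events; the log format is an assumption about your own instrumentation:

```python
# Minimal sketch: from a query log, count how many query refinements a user
# makes in a session before clicking a result.
from collections import defaultdict
from statistics import mean

def refinements_before_click(log_events):
    """log_events: iterable of (session_id, query, clicked_url or None)."""
    queries_seen = defaultdict(int)
    refinement_counts = []
    for session_id, query, clicked_url in log_events:
        queries_seen[session_id] += 1
        if clicked_url is not None:
            # number of reformulations issued before the satisfying click
            refinement_counts.append(queries_seen[session_id] - 1)
            queries_seen[session_id] = 0          # start counting afresh
    return mean(refinement_counts) if refinement_counts else None

# log = [("s1", "web design", None),
#        ("s1", "chicago web design", "http://example.com"),
#        ("s2", "shoes", "http://example.org")]
# print(refinements_before_click(log))   # -> 0.5
```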

** Disclaimer: these views are based on my experience as of October 6th, 2009. One thing about SEO and the engines is that they change EVERY DAY. Google could release Caffeine tomorrow and that would change a great deal ... that said, that is what makes SEO fun to research!

Cheers

+10

To evaluate something, you must define what you expect from it. That will help you determine how to measure it.
Then you can measure the improvement.

For a search engine, I assume you would measure its ability to find things, and its accuracy in returning what is relevant.

This is an interesting task.

+2

I do not think you will find a definitive mathematical solution, if that is your goal. To rate an algorithm, you need standards and goals that must be met.

  • What is your baseline for comparison?
  • What do you classify as "superior"?
  • What do you consider a "successful search"?
  • How big is your test group?
  • What are your tests?

For example, if your goal is to improve the page-ranking process, decide whether you are evaluating the algorithm's efficiency or its accuracy. Judging efficiency means timing your code against a consistent, large data set and recording the results; you then work on your algorithm to improve that time.

If your goal is to improve accuracy, you need to define what counts as "inaccurate". If you search for "Cup", you can only say the first site returned is "the best" if you can yourself determine exactly what the best answer for "Cup" is.

My suggestion would be to narrow the scope of your experiment. Identify one or two qualities of the search engine that you believe need improvement, and work on those.

+2

Researchers typically use precision and recall as the two competing quality metrics for an information-retrieval system (such as a search engine).

So you could measure your search engine's effectiveness relative to Google's, for example, by counting the number of relevant results in the top ten (call this precision) and the number of important pages for that query which you think should have been in the top ten but were not (call this recall).

You will still have to judge the results from each search engine manually for some set of queries, but at least you will have metrics to evaluate them with. The balance of the two is also important: otherwise you could get very high precision by returning hardly any results, or perfect recall by returning every page on the Internet.

The Wikipedia article on precision and recall is not bad (and defines the F-measure, which takes both into account).
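A minimal sketch of those three metrics, assuming you have hand-labelled which of the returned URLs are relevant for each query:

```python
# Sketch: precision, recall and F-measure for one query, given a ranked list
# of returned URLs and a hand-labelled set of relevant URLs.
def precision_recall_f1(returned, relevant, k=10):
    top_k = returned[:k]
    hits = sum(1 for url in top_k if url in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# returned = my_engine.search("cup")                    # ranked URLs (yours)
# relevant = {"http://en.wikipedia.org/wiki/Cup"}       # judged by hand
# print(precision_recall_f1(returned, relevant))
```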

+2

In the comments you said, "I heard about a way to measure the quality of search engines by counting how many times a user needs to hit the back button before finding the link they want, but I can't use that because you need users to test your search engine, and that's a headache in itself." Well, if you put your engine on the web for free for a few days and advertise a little, you will probably get at least a couple of dozen trial users. Give those users either the old or the new version at random and measure those clicks.

Another possibility: take Google as perfect by definition and compare your answers with its answers for the same queries. (Perhaps the sum of the distances of your top ten links from their counterparts in Google's results; for example, if your second link is Google's twelfth link, that is a distance of 10.) This is a huge assumption, but it is much easier to implement.
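A sketch of that rank-distance idea, under the stated assumption that the reference engine's ranking is the gold standard; how you obtain the reference results (API, scraping, by hand) is left open, and the penalty for missing links is an arbitrary choice:

```python
# Sketch: sum of rank distances between your top-k results and a reference
# ranking (assumed "perfect"). Links absent from the reference list get a
# fixed penalty.
def rank_distance(my_results, reference_results, k=10, missing_penalty=50):
    ref_rank = {url: i for i, url in enumerate(reference_results, start=1)}
    total = 0
    for my_pos, url in enumerate(my_results[:k], start=1):
        if url in ref_rank:
            total += abs(ref_rank[url] - my_pos)  # e.g. your #2 is their #12 -> 10
        else:
            total += missing_penalty
    return total  # lower is better; 0 means the top-k orderings match exactly
```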

+1

You need to clearly identify the positive and negative qualities, such as how quickly users reach the answer they are looking for, or how many "wrong" answers they get along the way. Is it an improvement if the correct answer is number 5 but the results come back 20 times faster? Such trade-offs will differ for every application. The correct answer may matter more when searching a corporate knowledge base, while a fast answer may be essential for a phone-support application.

Without defined parameters, no test can be declared a winner.

0

Accept the fact that the quality of search results is ultimately subjective. For the comparison you should have several scoring algorithms: the old one, the new one, and a few controls (for example, scoring by URI length or page size, or some similarly deliberately broken notion). Now pick a bunch of queries that exercise your algorithms, say a hundred or so. Suppose you end up with 4 algorithms in total. Build a 4x5 grid showing the first 5 results for each query under each algorithm. (You could do the top ten, but the first five matter most.) Remember to randomize which algorithm appears in which column. Then sit a person in front of this thing and have them pick which of the 4 result sets they like best. Repeat across the full set of queries, and repeat with as many people as you can stand. This should give you a fair comparison based on total wins per algorithm.
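A rough sketch of the bookkeeping for that kind of blind side-by-side test; the four ranking functions and the way the judge's choice is collected are placeholders for your own setup:

```python
# Rough sketch: blind side-by-side preference test of several ranking
# algorithms. Columns are shuffled per query so the judge cannot tell
# which algorithm produced which column.
import random
from collections import Counter

def run_preference_test(algorithms, queries, ask_judge, top_n=5):
    """algorithms: dict name -> search_fn; ask_judge shows the columns and
    returns the index (0..len-1) of the column the judge preferred."""
    wins = Counter()
    for query in queries:
        names = list(algorithms)
        random.shuffle(names)                          # hide column identities
        columns = [algorithms[name](query)[:top_n] for name in names]
        chosen = ask_judge(query, columns)             # human picks a column
        wins[names[chosen]] += 1
    return wins

# wins = run_preference_test(
#     {"old": old_search, "new": new_search,
#      "by_url_length": control_a, "by_page_size": control_b},
#     queries, ask_judge=show_grid_and_get_choice)
# print(wins.most_common())
```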

0

http://www.bingandgoogle.com/

Create an application like this one that retrieves and compares results. Then run a test with 50 different things you need to search for, and compare the output against the results you wanted.

0

I have had to test a search engine professionally. Here is what I did.

The search involved fuzzy matching. The user would enter "Kari Trigger" into a web page, and the search engine would retrieve entries such as "Gary Trager", "Trager, C", "Corey Trager", etc., each with a score from 0 to 100, so they could be ranked from most probable to least probable.

First, I refactored the code so it could run detached from the web page, in batch mode, using a large file of search queries as input. For each line of the input file, the batch run writes out the top search result and its score. I harvested thousands of real search queries from our production system and ran them through this batch rig to establish a baseline.

From then on, every time I changed the search logic, I would run the batch again and then diff the new output against the baseline. I also wrote tools to make it easier to see the interesting parts of the diff. For example, I didn't care that the old logic returned "Corey Trager" with a score of 82 while the new logic returned it with 83, so my tools would filter those out.

I could not have crafted these test cases by hand. I simply would not have had the imagination and insight to create good test data; the real-world data was far richer.

So, to recap:

1) Create a mechanism to diff the results of running the new logic against the results of the previous logic.
2) Test with lots of realistic data.
3) Build tools that help you work with the diff, filtering out the noise and amplifying the signal.
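A minimal sketch of that baseline-diff idea; the file format (one "query, top result, score" line per query, tab-separated) is an assumption, and the threshold plays the role of the noise filter described above:

```python
# Minimal sketch: diff a new batch run against a baseline run, ignoring
# small score changes (noise) and flagging real differences (signal).
# Assumed file format: one "query<TAB>top_result<TAB>score" line per query.
def load_run(path):
    run = {}
    with open(path) as f:
        for line in f:
            query, result, score = line.rstrip("\n").split("\t")
            run[query] = (result, float(score))
    return run

def diff_runs(baseline_path, new_path, score_threshold=5.0):
    baseline, new = load_run(baseline_path), load_run(new_path)
    for query in baseline:
        old_result, old_score = baseline[query]
        new_result, new_score = new.get(query, (None, 0.0))
        if new_result != old_result:
            yield query, "result changed", old_result, new_result
        elif abs(new_score - old_score) >= score_threshold:
            yield query, "score changed", old_score, new_score
        # an 82 -> 83 score shift stays below the threshold and is ignored

# for row in diff_runs("baseline.tsv", "new_run.tsv"):
#     print(*row)
```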

0
