Why don't search engines use MySQL?

Question


Search engines (and similar web services) use flat files and NoSQL databases. The structure of an inverted index is simpler than a many-to-many relationship, yet it seems it could be handled efficiently as one: for several billion web pages and millions of keywords, two tables should be enough. I tested a table with 50 million rows, and MySQL's speed can be comparable to BerkeleyDB's.
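To make the comparison concrete, here is a minimal sketch of the same keyword-to-page mapping built both ways. It is only an illustration under stated assumptions: the three sample pages and all function, table, and column names are hypothetical, and Python's built-in sqlite3 module stands in for MySQL so the example runs without a server.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample pages: page ID -> page text.
docs = {
    1: "mysql is a relational database",
    2: "search engines build an inverted index",
    3: "an inverted index maps each keyword to pages",
}

def build_inverted_index(documents):
    """Inverted index as a plain mapping: word -> set of page IDs."""
    index = defaultdict(set)
    for page_id, text in documents.items():
        for word in text.split():
            index[word].add(page_id)
    return index

def build_relational_index(documents):
    """The same mapping as two tables (keyword, posting) in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE keyword (id INTEGER PRIMARY KEY, word TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE posting (keyword_id INTEGER, page_id INTEGER, "
        "PRIMARY KEY (keyword_id, page_id))")
    for page_id, text in documents.items():
        for word in set(text.split()):
            # INSERT OR IGNORE is SQLite syntax; MySQL would use INSERT IGNORE.
            conn.execute(
                "INSERT OR IGNORE INTO keyword (word) VALUES (?)", (word,))
            conn.execute(
                "INSERT INTO posting (keyword_id, page_id) "
                "SELECT id, ? FROM keyword WHERE word = ?", (page_id, word))
    return conn

def lookup_relational(conn, word):
    """Return the sorted page IDs that contain the given word."""
    rows = conn.execute(
        "SELECT p.page_id FROM posting p "
        "JOIN keyword k ON k.id = p.keyword_id "
        "WHERE k.word = ? ORDER BY p.page_id", (word,))
    return [row[0] for row in rows]

index = build_inverted_index(docs)
conn = build_relational_index(docs)
print(sorted(index["inverted"]))
print(lookup_relational(conn, "inverted"))
```

Both lookups return the same page IDs. The sketch says nothing about how either approach behaves at billions of rows, which is the actual question.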

I think the problems with a large MySQL database appear with operations like ALTER TABLE (which is not the case here). This workload is read-intensive, and MySQL is not bad at that. For a single-row SELECT, I found no significant difference between a table with a few rows and one with several million rows. Does that change when the table has billions of rows?

NOTE: I am not asking about Google or Bing specifically (or about advanced features such as full-text search); I am discussing the concept.

+4

database mysql search search-engine inverted-index

Googlebot Oct 16 '11 at 10:33


1 answer

AlexanderMP · Accepted Answer · 2011-10-16T11:01:59+0000

AFAIK, NoSQL offers flexibility that regular relational database engines do not. I don't know which database engines search engines actually use, but I can think of several advantages of NoSQL (not flat files; I don't see why anyone would use those for a complex application).

Now, if you simply match criteria and return results in no particular order, any relational database will do. But as soon as you want to return the most relevant results first, there are many criteria to take into account. You could:

Prioritize results whose content is similar to results the user has selected before.
List first the results that are more relevant to the person based on location, language, and other known facts.
List the more popular results first (again, more popular within a particular group, such as an age group, a class, or another grouping based on known facts about the user).

These are just the basic sorting criteria that come to mind. Once someone starts developing and maintaining the engine, hundreds of other criteria will come up and be worth implementing. Now think about how each one would be implemented. There may be thousands of fields describing each resource, and each new feature will require additional data.
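As a rough illustration of why each criterion needs its own data, here is a minimal sketch that combines three of the criteria above into a single score. Everything in it is an assumption made for the example: the Result fields, the weights, and the additive scoring formula are hypothetical and not how any real search engine ranks.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    language: str
    popularity: float         # hypothetical score between 0 and 1
    similar_to_clicked: bool  # resembles results the user chose before

def score(result, user_language, weights):
    """Combine several ranking criteria into one number (higher is better)."""
    total = weights["popularity"] * result.popularity
    if result.language == user_language:
        total += weights["language"]
    if result.similar_to_clicked:
        total += weights["similarity"]
    return total

def rank(results, user_language, weights):
    """Return the results sorted from most to least relevant."""
    return sorted(
        results,
        key=lambda r: score(r, user_language, weights),
        reverse=True,
    )

# Hypothetical weights and candidate results.
weights = {"popularity": 1.0, "language": 0.5, "similarity": 0.5}
results = [
    Result("a.example", "en", 0.9, False),
    Result("b.example", "de", 0.3, True),
    Result("c.example", "de", 0.2, False),
]
ranked = rank(results, "de", weights)
print([r.url for r in ranked])
```

Every new criterion adds at least one field to Result and one term to score, which is the growth in per-resource data described above.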

You can do this with the EAV (entity-attribute-value) pattern in a relational database, which gives you some flexibility, or you can use NoSQL, which is built for exactly this kind of use.
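For readers who have not met EAV, here is a minimal sketch of the pattern. It assumes a single hypothetical table named page_attribute and uses Python's built-in sqlite3 module in place of MySQL; the helper names are invented for the example. The point is that a new attribute becomes a new row, so no ALTER TABLE is needed.

```python
import sqlite3

def make_eav_store():
    """Create one table that holds (entity, attribute, value) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_attribute ("
        "page_id INTEGER, attribute TEXT, value TEXT, "
        "PRIMARY KEY (page_id, attribute))")
    return conn

def set_attribute(conn, page_id, attribute, value):
    """Add or overwrite one attribute of a page; values are stored as text."""
    conn.execute(
        "INSERT OR REPLACE INTO page_attribute VALUES (?, ?, ?)",
        (page_id, attribute, str(value)))

def get_page(conn, page_id):
    """Reassemble all attributes of a page into a dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM page_attribute WHERE page_id = ?",
        (page_id,))
    return dict(rows)

conn = make_eav_store()
set_attribute(conn, 1, "language", "en")
set_attribute(conn, 1, "popularity", 0.8)
# A criterion invented later is just another row, not a schema change.
set_attribute(conn, 1, "region", "EU")
page = get_page(conn, 1)
print(page)
```

The dict returned by get_page has roughly the shape a document store would keep directly, which is the flexibility the NoSQL option offers without the reassembly step. Note that this sketch stores every value as text, one of the usual costs of EAV.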

Again, this is just one reason to use NoSQL. I know many more reasons to use an RDBMS.
