MySQL JOIN Abuse? How bad is it?

I've read a lot about relational databases that use many JOIN statements in every SELECT. However, I was wondering whether overusing this approach causes any long-term performance problems.

For example, let's say we have a users table. I usually put the "most used" data directly in it instead of doing extra JOINs. By "most used" data I mean, for example, the username, display image and location.

This data is always needed whenever any user interaction is displayed on the website, for example on every comment shown under an article. Instead of JOINing users and users_profiles to get "location" and "display", I just use the information in the users table.
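To make the comparison concrete, here is a minimal sketch of the two designs using Python's built-in sqlite3 module as a stand-in for MySQL (the SQL itself is plain enough to carry over). All table and column names (users, users_profiles, comments, display_image, location, article_id) are hypothetical. Note that both designs still join comments to users; putting the "most used" fields in users only removes the second join to the 1:1 profile table.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Design A: the "most used" fields live directly in users.
    CREATE TABLE users (
        id            INTEGER PRIMARY KEY,
        username      TEXT NOT NULL,
        display_image TEXT,
        location      TEXT
    );
    -- Design B: the same fields kept in a separate 1:1 profile table.
    CREATE TABLE users_profiles (
        user_id       INTEGER PRIMARY KEY REFERENCES users(id),
        display_image TEXT,
        location      TEXT
    );
    CREATE TABLE comments (
        id         INTEGER PRIMARY KEY,
        article_id INTEGER NOT NULL,
        user_id    INTEGER NOT NULL REFERENCES users(id),
        body       TEXT NOT NULL
    );
    CREATE INDEX idx_comments_article_id ON comments(article_id);

    INSERT INTO users VALUES (1, 'alice', 'alice.png', 'Berlin');
    INSERT INTO users_profiles VALUES (1, 'alice.png', 'Berlin');
    INSERT INTO comments VALUES (1, 10, 1, 'Nice article');
""")

# Design A: a single join (comments -> users) per comment listing.
design_a = conn.execute("""
    SELECT c.body, u.username, u.display_image, u.location
    FROM comments c
    JOIN users u ON u.id = c.user_id
    WHERE c.article_id = ?
""", (10,)).fetchall()

# Design B: one extra join to the 1:1 profile table.
design_b = conn.execute("""
    SELECT c.body, u.username, p.display_image, p.location
    FROM comments c
    JOIN users u          ON u.id = c.user_id
    JOIN users_profiles p ON p.user_id = u.id
    WHERE c.article_id = ?
""", (10,)).fetchall()

print(design_a == design_b)  # True
```

Both queries return the same rows; the question is only whether the extra join in design B is worth having.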

That is my approach, but I know there are many excellent, experienced programmers who can give me some advice on this matter.

My questions:

Should I try to be conservative with JOINs, or should I use them more? Why?

Are there long-term performance issues when overusing JOINs?

Note: I should clarify that I am not trying to avoid JOINs entirely; I use them only when necessary. In this example, that means things like the authors of comments/articles, additional profile information that is displayed only on user profile pages, etc.

5 answers

My advice on data modeling:

  • You should generally favour optional (nullable) columns over 1:1 joins. There are still cases where 1:1 makes sense, usually revolving around subtypes. Oddly, people tend to be more squeamish about nullable columns than about joins;
  • Don't make a model too indirect unless it is really justified (more on this below);
  • Favour joins over aggregation. This can vary, so it needs to be tested. See Oracle vs MySQL vs SQL Server: Aggregation vs Joins for an example of this;
  • Joins are better than N+1 selects. An N+1 select is, for example, selecting an order from a database table and then issuing a separate query to get all the line items for that order;
  • The scalability of joins is usually only an issue when you are doing mass selects. If you select a single row and then join it to a few things, this is rarely a problem (but sometimes it is);
  • Foreign keys should always be indexed unless you are dealing with a trivially small table.
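The N+1 point can be sketched with Python's sqlite3 module standing in for MySQL. The orders/order_items schema and the helper names `items_n_plus_one` and `items_joined` are hypothetical; the query counter is only there to show how many round trips each approach makes.

```python
import sqlite3

# Hypothetical orders/line-items schema; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_items (
        id       INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        product  TEXT NOT NULL
    );
    -- Index the foreign key, as the last bullet recommends.
    CREATE INDEX idx_order_items_order_id ON order_items(order_id);
    INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO order_items VALUES (1, 1, 'pen'), (2, 1, 'ink'), (3, 2, 'paper');
""")


def items_n_plus_one(conn):
    """N+1 pattern: one query for the orders, then one extra query per order."""
    orders = conn.execute("SELECT id, customer FROM orders ORDER BY id").fetchall()
    queries = 1
    result = {}
    for order_id, customer in orders:
        rows = conn.execute(
            "SELECT product FROM order_items WHERE order_id = ? ORDER BY id",
            (order_id,),
        ).fetchall()
        queries += 1
        result[customer] = [product for (product,) in rows]
    return result, queries


def items_joined(conn):
    """Join pattern: a single query, grouped into a dict on the client."""
    rows = conn.execute("""
        SELECT o.customer, i.product
        FROM orders o
        JOIN order_items i ON i.order_id = o.id
        ORDER BY o.id, i.id
    """).fetchall()
    result = {}
    for customer, product in rows:
        result.setdefault(customer, []).append(product)
    return result, 1


print(items_n_plus_one(conn))  # ({'alice': ['pen', 'ink'], 'bob': ['paper']}, 3)
print(items_joined(conn))      # ({'alice': ['pen', 'ink'], 'bob': ['paper']}, 1)
```

With N orders the first version issues N+1 queries while the second always issues one. The inner join drops orders that have no items; a LEFT JOIN would keep them.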

Read more in Database Development Mistakes Made by AppDevelopers.

Now, as for the directness of a model, let me give you an example. Let's say you are designing a system for user authentication and authorization. An over-engineered solution might look something like this:

  • Alias (id, username, user_id);
  • User (id, ...);
  • Email (id, user_id, email_address);
  • Login (id, user_id, ...);
  • Login Role (id, login_id, role_id);
  • Role (id, name);
  • Role Privilege (id, role_id, privilege_id);
  • Privilege (id, name).

So you need 6 joins to get from the username entered to the actual privileges. Of course, there might be a real requirement for this, but more often than not this kind of system exists because some developer worried they might need it someday, even though every user has only one alias, user to login is 1:1, and so on. A simpler solution is:

  • User (id, username, email_address, user_type)

and, well, that's it. Perhaps you will need a complex role system, but it's also quite possible that you won't, and if you do, it's reasonably easy to slot in later (user type becomes a foreign key into a user types or roles table), or it's usually straightforward to map the old schema to the new one.
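A rough sketch of both designs, again with Python's sqlite3 standing in for MySQL. The snake_case table names, the sample data and the `user_type` value `'admin'` are assumptions; the Email table is left out because it is not on the path from username to privileges.

```python
import sqlite3

# Over-engineered schema from the first list (names are assumptions).
heavy = sqlite3.connect(":memory:")
heavy.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE alias (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL,
                        user_id INTEGER NOT NULL REFERENCES users(id));
    CREATE TABLE login (id INTEGER PRIMARY KEY,
                        user_id INTEGER NOT NULL REFERENCES users(id));
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE login_role (id INTEGER PRIMARY KEY,
                             login_id INTEGER NOT NULL REFERENCES login(id),
                             role_id INTEGER NOT NULL REFERENCES role(id));
    CREATE TABLE privilege (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE role_privilege (id INTEGER PRIMARY KEY,
                                 role_id INTEGER NOT NULL REFERENCES role(id),
                                 privilege_id INTEGER NOT NULL REFERENCES privilege(id));

    INSERT INTO users VALUES (1);
    INSERT INTO alias VALUES (1, 'alice', 1);
    INSERT INTO login VALUES (1, 1);
    INSERT INTO role VALUES (1, 'admin');
    INSERT INTO login_role VALUES (1, 1, 1);
    INSERT INTO privilege VALUES (1, 'delete_posts'), (2, 'edit_posts');
    INSERT INTO role_privilege VALUES (1, 1, 1), (2, 1, 2);
""")

# Six joins to get from the username that was typed in to the privileges.
heavy_privileges = heavy.execute("""
    SELECT p.name
    FROM alias a
    JOIN users u           ON u.id = a.user_id
    JOIN login l           ON l.user_id = u.id
    JOIN login_role lr     ON lr.login_id = l.id
    JOIN role r            ON r.id = lr.role_id
    JOIN role_privilege rp ON rp.role_id = r.id
    JOIN privilege p       ON p.id = rp.privilege_id
    WHERE a.username = ?
    ORDER BY p.name
""", ("alice",)).fetchall()

# The simpler single-table design: one lookup on a unique column, no joins.
simple = sqlite3.connect(":memory:")
simple.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL,
                        email_address TEXT, user_type TEXT NOT NULL);
    INSERT INTO users VALUES (1, 'alice', 'alice@example.com', 'admin');
""")
simple_type = simple.execute(
    "SELECT user_type FROM users WHERE username = ?", ("alice",)
).fetchone()[0]

print(heavy_privileges)  # [('delete_posts',), ('edit_posts',)]
print(simple_type)       # admin
```

If a real role system turns up later, `user_type` can become a foreign key into a roles table without touching the rest of the model.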

That's the thing about complexity: it's easy to add and hard to remove. It usually takes constant vigilance against unintended complexity, which is bad enough without making it worse by adding unnecessary complexity.

Some bright man once said:

Normalize until it hurts, denormalize until it works!

It all depends on the type of joins and the join conditions, but there is nothing wrong with them. Joins on table1.PK = table2.FK are very efficient.
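One way to see why a PK = FK join is cheap is to ask the planner. This sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module; MySQL's `EXPLAIN` gives the equivalent information in a different format. The table names `table1`/`table2` follow the answer, while the columns and the index name `idx_table2_table1_id` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (
        id        INTEGER PRIMARY KEY,
        table1_id INTEGER NOT NULL REFERENCES table1(id),
        payload   TEXT
    );
    -- Index on the foreign key side of the join.
    CREATE INDEX idx_table2_table1_id ON table2(table1_id);
""")

query = """
    SELECT t1.name, t2.payload
    FROM table1 t1
    JOIN table2 t2 ON t2.table1_id = t1.id
    WHERE t1.id = ?
"""

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1,))]
for step in plan:
    print(step)
```

The plan shows a primary-key lookup on `t1` followed by an index lookup on `t2` via `idx_table2_table1_id`, so neither table is scanned in full. The exact wording of each step varies between SQLite versions.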

If the data is 1:1 (1 ↔ 1) and you don't have many null fields, don't over-normalize. You can list just the fields you need ("most used data") in your SELECT statements.
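A small sketch of that advice with Python's sqlite3 standing in for MySQL: the 1:1 profile data stays in the users table as nullable columns, and each page selects only the columns it needs. The column names (including `bio` as an example of profile-page-only data) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id            INTEGER PRIMARY KEY,
        username      TEXT NOT NULL,
        display_image TEXT,   -- optional, so nullable
        location      TEXT,   -- optional, so nullable
        bio           TEXT    -- only shown on the profile page
    )
""")
conn.executemany(
    "INSERT INTO users (id, username, display_image, location, bio) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, "alice", "alice.png", "Berlin", "Long bio..."),
     (2, "bob", None, None, None)],
)

# Comment listings: name only the "most used" columns; no profile join needed.
card_rows = conn.execute(
    "SELECT username, display_image, location FROM users ORDER BY id"
).fetchall()

# Profile page: fetch everything for one user.
profile = conn.execute("SELECT * FROM users WHERE id = ?", (1,)).fetchone()

print(card_rows)  # [('alice', 'alice.png', 'Berlin'), ('bob', None, None)]
print(profile)    # (1, 'alice', 'alice.png', 'Berlin', 'Long bio...')
```

Users who never fill in their profile simply have NULLs in those columns, which is the "nullable columns over 1:1 joins" trade-off from the first answer.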

Don't be afraid of joins. The relational model is powerful and you should use it. Someone already brought up N+1, but also consider, in your context, always joining to the users table for security purposes, since the same query can additionally verify that the user exists, the user's status, a valid session, and the expected fields.

Many large sites go as far as having a session table and an HTTP request table that are joined to each other on every page request. The advantage is that parameters are always tied to sessions, sessions to the right users, the user's status is always checked, and so on; beyond that, it provides some other interesting advantages.
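The session-to-user part of that idea can be sketched as follows, with Python's sqlite3 standing in for MySQL. The `sessions`/`users` schema, the `status` values and the `current_user` helper are hypothetical, and the HTTP request table is left out to keep the example short. The point is that the join itself enforces the checks: a missing, expired or foreign session, or a non-active user, simply produces no row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        status   TEXT NOT NULL            -- hypothetical: 'active' or 'banned'
    );
    CREATE TABLE sessions (
        token      TEXT PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES users(id),
        expires_at INTEGER NOT NULL       -- Unix timestamp
    );
    CREATE INDEX idx_sessions_user_id ON sessions(user_id);
""")


def current_user(conn, token, now):
    """Return the username for a valid session token, or None.

    The session must exist and be unexpired, and it must belong to a
    user whose status is 'active'; all of this is checked in one query.
    """
    row = conn.execute("""
        SELECT u.username
        FROM sessions s
        JOIN users u ON u.id = s.user_id
        WHERE s.token = ? AND s.expires_at > ? AND u.status = 'active'
    """, (token, now)).fetchone()
    return row[0] if row else None


now = 1_700_000_000
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "alice", "active"), (2, "mallory", "banned")])
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", [
    ("tok-alice", 1, now + 3600),
    ("tok-expired", 1, now - 1),
    ("tok-mallory", 2, now + 3600),
])

print(current_user(conn, "tok-alice", now))    # alice
print(current_user(conn, "tok-expired", now))  # None
print(current_user(conn, "tok-mallory", now))  # None
```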

Long story short: do it wisely, but don't skimp on joins.

As others have said, joins are not something to avoid at all. In fact, in most models it is not unusual for every query the application runs to contain a few joins.

Even in the largest queries they are usually not a performance problem, and they often fix performance problems that would otherwise arise from having redundant, duplicated data everywhere.

However, keep in mind that under the covers, a database only ever joins two tables at a time. So a multi-table join takes several steps inside the database that are invisible to the developer. For each join, it has to make several decisions about how to do it:

  • Go through all the values in the left table and then match them one by one with the values on the right?
  • Do the opposite?
  • Sort the keys from both tables and walk through them at the same time?
  • Build hashes of the keys on both sides?
  • Apply filter criteria before or after this join?
  • etc.
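The first strategies in that list correspond to the classic nested-loop, sort-merge and hash joins. Below are toy in-memory versions in Python, each joining two lists of `(key, value)` rows; they are illustrations of the techniques, not how any particular database implements them (real engines work on pages and indexes and pick a strategy from cost estimates). The sample rows are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def nested_loop_join(left, right):
    """For every left row, scan all right rows looking for equal keys."""
    out = []
    for lk, lv in left:
        for rk, rv in right:
            if lk == rk:
                out.append((lk, lv, rv))
    return out


def sort_merge_join(left, right):
    """Sort both inputs by key, then walk through them in step."""
    left, right = sorted(left), sorted(right)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row that has the same key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                for r in range(j, j_end):
                    out.append((lk, left[i][1], right[r][1]))
                i += 1
            j = j_end
    return out


def hash_join(left, right):
    """Build a hash table on one input (right), then probe it with the other."""
    buckets = defaultdict(list)
    for rk, rv in right:
        buckets[rk].append(rv)
    out = []
    for lk, lv in left:
        for rv in buckets.get(lk, ()):
            out.append((lk, lv, rv))
    return out


users = [(1, "alice"), (2, "bob"), (3, "carol")]
comments = [(2, "hi"), (1, "first!"), (2, "again"), (4, "orphan")]

results = [sorted(join(users, comments))
           for join in (nested_loop_join, sort_merge_join, hash_join)]
print(results[0])  # [(1, 'alice', 'first!'), (2, 'bob', 'again'), (2, 'bob', 'hi')]
print(results[0] == results[1] == results[2])  # True
```

All three produce the same rows (possibly in a different order); what differs is the cost profile, which is exactly what the optimizer has to estimate. For reference, MySQL relied on nested-loop joins (usually driven by index lookups) until hash joins were added in MySQL 8.0.18.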

So, if your joins are complex, their efficiency will ultimately be determined by the sophistication of your optimizer/planner, as well as by how fresh and detailed your statistics are. MySQL is not a strong contender here, so I usually keep my model and SQL a bit simpler than I would with something else. But a few joins per query should always be fine.
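As a small illustration of "statistics", this sketch runs SQLite's `ANALYZE` through Python's sqlite3 module and reads back what the planner will use; in MySQL the rough equivalent is `ANALYZE TABLE`. The `comments` table, its index name and the generated rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT);
    CREATE INDEX idx_comments_user_id ON comments(user_id);
""")
# 1000 hypothetical comments spread evenly over 10 users.
conn.executemany(
    "INSERT INTO comments (user_id, body) VALUES (?, ?)",
    [(i % 10, f"comment {i}") for i in range(1000)],
)
conn.commit()

# Gather statistics; SQLite stores them in the sqlite_stat1 table.
conn.execute("ANALYZE")
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE idx = 'idx_comments_user_id'"
).fetchall()
print(stats)
```

The `stat` string starts with the row count, followed by the average number of rows per distinct indexed value; the planner uses these numbers to decide, for instance, which table should drive a join. If the data changes a lot and the statistics are not refreshed, those decisions are made on stale numbers.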
