Can I optimize my database by breaking one large table into many small ones?

Suppose I have one large table with three columns: "user_name", "user_property", "value_of_property". Let us also suppose that I have many users (say, 100,000) and many properties (say, 10,000). Then the table will be huge (1 billion rows).

When I retrieve information from the table, I always need information about a specific user, so I use, for example, where user_name='Albert Gates' . Thus, each time the MySQL server needs to scan 1 billion rows to find those that contain Albert Gates as the user name.

Wouldn't it be wise to split the large table into many small ones, one per user?

+6
optimization split mysql
4 answers

No, I do not think this is a good idea. The best approach is to add an index on the user_name column, and possibly another index on (user_name, user_property) for looking up a single property. Then the database does not need to scan all the rows; it just needs to find the matching entries in the index, which is stored as a B-Tree, making it possible to locate a record very quickly.
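
A minimal sketch, assuming the big table is named user_properties with the three columns from the question (the table name is hypothetical):

    -- the composite index also covers plain user_name lookups via its leftmost prefix
    alter table user_properties
      add index idx_user_prop (user_name, user_property);

    -- both of these can now be answered via the index instead of a full scan:
    select * from user_properties where user_name = 'Albert Gates';
    select value_of_property from user_properties
    where user_name = 'Albert Gates' and user_property = 'age';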

If your application is still slow even after indexing correctly, it can sometimes be a good idea to partition your largest tables.
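
For illustration only, a hypothetical partitioning of the same table; note that MySQL requires the partitioning columns to be part of every unique key on the table (including the primary key), so this assumes user_name is part of the primary key:

    -- spread the rows across 16 partitions by a hash of user_name
    alter table user_properties
      partition by key (user_name)
      partitions 16;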

Another thing you might consider is normalizing your database so that the user name is stored in a separate table and an integer foreign key is used instead. This can reduce storage requirements and improve performance. The same goes for user_property .

+5

You should normalize your design as follows:

    drop table if exists users;
    create table users
    (
      user_id int unsigned not null auto_increment primary key,
      username varbinary(32) unique not null
    )
    engine=innodb;

    drop table if exists properties;
    create table properties
    (
      property_id smallint unsigned not null auto_increment primary key,
      name varchar(255) unique not null
    )
    engine=innodb;

    drop table if exists user_property_values;
    create table user_property_values
    (
      user_id int unsigned not null,
      property_id smallint unsigned not null,
      value varchar(255) not null,
      primary key (user_id, property_id),
      key (property_id)
    )
    engine=innodb;

    insert into users (username) values ('f00'),('bar'),('alpha'),('beta');

    insert into properties (name) values ('age'),('gender');

    insert into user_property_values values
      (1,1,'30'),(1,2,'Male'),
      (2,1,'24'),(2,2,'Female'),
      (3,1,'18'),
      (4,1,'26'),(4,2,'Male');
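
With that schema in place, pulling everything about one user is a straightforward three-way join (a usage sketch against the sample data above):

    select u.username, p.name, upv.value
    from user_property_values upv
    inner join users u on u.user_id = upv.user_id
    inner join properties p on p.property_id = upv.property_id
    where u.username = 'f00';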

In terms of performance, the InnoDB clustered index holds up well in this similar example (a cold run, with no warm caches):

    select count(*) from product
    count(*)
    ========
    1,000,000 (1M)

    select count(*) from category
    count(*)
    ========
    500,000 (500K)

    select count(*) from product_category
    count(*)
    ========
    125,431,192 (125M)

    select c.*, p.*
    from product_category pc
    inner join category c on pc.cat_id = c.cat_id
    inner join product p on pc.prod_id = p.prod_id
    where pc.cat_id = 1001;

    0:00:00.030: Query OK (0.03 secs)
+3

Correct indexing of your database is the number one factor for performance. I once had a query that ran for half an hour (on a large dataset, but still). Then we found out that the tables had no indexes. After indexing, the query took less than 10 seconds.
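
A quick way to check whether an index is being used is EXPLAIN: a full scan shows type: ALL, while an indexed lookup shows type: ref together with the chosen key (a sketch, assuming the question's table is named user_properties and indexed as in the first answer):

    explain select * from user_properties where user_name = 'Albert Gates';
    -- before indexing: type = ALL, rows is roughly the whole table (full scan)
    -- after indexing:  type = ref, key = idx_user_prop, rows is roughly one user's properties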

+2

Why do you need this table structure? My problem with it is that you have to cast the property value every time you want to use it. That is bad in my opinion, and storing numbers as text is wasteful, given that it is all binary underneath anyway. For example, how will you have required fields? Or fields that need constraints based on other fields, such as a start and end date?

Why not just have the properties as columns instead of a many-to-many relationship?

Have one flat table. When your business rules start to show that properties should be grouped, you can consider moving them to other tables with several 1:0-1 relationships to the user table. But this is not normalization, and it will slightly hurt performance because of the extra join (though the self-documenting nature of the table names will greatly help any developers).
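
A minimal sketch of that layout, with hypothetical table and column names: a flat user table holding the 1:1 properties as plain columns, plus an optional one-row-per-user table for one group of related fields:

    create table users
    (
      user_id int unsigned not null auto_increment primary key,
      username varchar(32) not null,
      age smallint unsigned null,   -- plain columns instead of EAV rows
      gender char(1) null
    ) engine=innodb;

    -- 1:0-1 split: a row exists only for users who have billing details
    create table user_billing_details
    (
      user_id int unsigned not null primary key,
      card_start date not null,
      card_end date not null,       -- start/end dates as real date columns
      foreign key (user_id) references users (user_id)
    ) engine=innodb;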

One case where I regularly see database performance completely crippled is a generic

Id, PropertyType, PropertyName, PropertyValue table.

It is really lazy and extremely flexible, but it completely kills performance. In fact, at a new job where performance is poor, I always ask whether they have a table with this structure; it invariably becomes the centre point of the database and runs slowly. The whole point of a relational database is that the relationships are defined in advance. This is just a technique that speeds up development at a huge cost in application speed. It also pushes the enforcement of business logic entirely up to the application layer, which is not defensive at all. Eventually you will find that you want to use the properties in a keyed way, which leads to all kinds of casting and degrades performance even further.
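
For example (using the hypothetical three-column table from the question), filtering on a numeric property forces a cast on every candidate row, so no index on the value column can be used:

    select user_name
    from user_properties
    where user_property = 'age'
      and cast(value_of_property as unsigned) > 25;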

If the data is related 1:1 to the entity, then it should be a field in the same table. If your table grows past 30 or so fields, then think about moving some of them to another table, but do not call it normalization, because it is not. It is a technique that helps development teams group fields together, at the expense of performance, in an attempt to aid understanding.

I do not know if MySQL has an equivalent, but SQL Server 2008 has sparse columns, where null values take up no space.

I'm not saying that the EAV approach is always wrong, but I think using a relational database for this approach is probably not the best choice.

+1
