MySQL or NoSQL? Recommended way to handle large amounts of data

I have a database that will be used by a large number of users to store random long strings (up to 100 characters). The table columns will be: userid, stringid, and the actual long string.

So, it will look something like this:

(image: sample table rows with userid, stringid, and string columns)

Userid will be unique and stringid will be unique for each user.

The application is a simple todo-list application, so each user will have an average of 50 todos. I use stringid so that users can delete a specific task at any given time.

I guess this todo app could end up with 7 million tasks in 3 years, and that scares me off using MySQL.

So my question is: is this really the recommended way to work with large amounts of data in one long table (each new task gets a new row)? And is MySQL the right database solution for such a project?

I don't have experience with large amounts of data, and I'm trying to save my future self some pain.

+4
4 answers

This is a fairly simple relational use case. I would not use NoSQL here.

The table you described should work fine, but I would personally question the need for the composite primary key as you imagine it. Instead of a composite primary key over userid and stringid, I would probably put the primary key on stringid alone, just to ensure the uniqueness of all entries, and then add a regular secondary index on userid.

The reason for this is that if you ever want to query by stringid alone (e.g. for deletions or updates), you are not forced to always supply both fields in order to use your index, and you don't need to add separate indexes on stringid and userid to cover queries on each field, which would mean more memory and disk space occupied by indexes.
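As a sketch, this answer's suggestion could look like the following (the table and column names here are assumptions, not taken from the question):

```sql
-- Sketch: stringid alone is the primary key; userid gets its own secondary index.
CREATE TABLE todos (
    stringid  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    userid    BIGINT UNSIGNED NOT NULL,
    task      VARCHAR(100)    NOT NULL,  -- the random long string (up to 100 chars)
    PRIMARY KEY (stringid),
    KEY idx_userid (userid)
) ENGINE=InnoDB;

-- Deleting a specific task needs only the primary key:
DELETE FROM todos WHERE stringid = 12345;

-- Listing one user's tasks uses the secondary index:
SELECT stringid, task FROM todos WHERE userid = 42;
```

With this layout each field can be queried on its own, at the cost of one primary and one secondary index rather than a single composite one.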

As for whether MySQL is the right solution, that really is up to you to decide. I would say that MySQL should have no problem handling a table with 2 million rows and 2 indexes on two integer id fields, assuming you have allocated enough memory to keep those indexes in memory. There is also plenty of information out there about working with MySQL, so if you are just trying to learn, it is likely a good choice.

+2

This is not a matter of "large amounts" of data (MySQL handles large amounts of data just fine, and 2 million rows is not a "large amount" by any measure).

MySQL is a relational database. If your data can be normalized, i.e. distributed across several tables in a way that guarantees each datapoint is stored only once, then you should use MySQL (or MariaDB, or any other relational database).

If you have schema-less data, and speed is more important than consistency, you can/should use a NoSQL database. Personally, I don't see how a todo list would profit from NoSQL (at this scale it doesn't really matter, but I think most software frameworks currently have better support for relational databases than for NoSQL).

+3

Regardless of what you consider "a lot of data," modern database engines are designed for a lot. The question "Relational or NoSQL?" is not about which option can support more data. Different relational and NoSQL solutions handle large amounts of data in different ways, and some do it better than others.

MySQL can handle many millions of records; SQLite cannot (at least not as efficiently). Mongo (NoSQL) tries to keep its working set in memory (as well as on the file system), so I have seen it crash with fewer than 1 million records on servers with limited memory, although it offers sharding, which helps it scale more efficiently.

The bottom line: the number of records stored should not drive the choice between SQL and NoSQL; that decision should rest on how you will store and retrieve the data. Your data already looks normalized (e.g. UserID), and if you also need consistency when, for example, you delete a user (so that their TODO items are deleted too), I would suggest a SQL solution.
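The consistency-on-delete point above can be expressed directly in the schema with a foreign key. A hypothetical sketch (table and column names are assumptions):

```sql
-- Sketch: ON DELETE CASCADE removes a user's TODO items along with the user.
CREATE TABLE users (
    userid BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (userid)
) ENGINE=InnoDB;

CREATE TABLE todos (
    stringid BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    userid   BIGINT UNSIGNED NOT NULL,
    task     VARCHAR(100)    NOT NULL,
    PRIMARY KEY (stringid),
    KEY idx_userid (userid),
    CONSTRAINT fk_todos_user FOREIGN KEY (userid)
        REFERENCES users (userid) ON DELETE CASCADE
) ENGINE=InnoDB;

-- Deleting the user now removes their tasks in the same statement:
DELETE FROM users WHERE userid = 42;
```

Note this requires InnoDB; MyISAM tables do not enforce foreign keys.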

+2

I assume that all queries will refer to a specific userid. I also assume that stringid is a surrogate value used internally, separate from the actual task text (your random string).

Use an InnoDB table with a composite primary key on (userid, stringid), and you will get all the performance you need, because of how the clustered index works: all rows for a given user are stored physically together.
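A minimal sketch of this answer's design (names are assumptions, not from the question):

```sql
-- Sketch: the composite primary key is InnoDB's clustered index, so each
-- user's rows are stored together on disk, making per-user scans cheap.
CREATE TABLE todos (
    userid   BIGINT UNSIGNED NOT NULL,
    stringid BIGINT UNSIGNED NOT NULL,
    task     VARCHAR(100)    NOT NULL,
    PRIMARY KEY (userid, stringid)
) ENGINE=InnoDB;

-- Both per-user listing and single-task deletion are served by the
-- clustered index, with no secondary index needed:
SELECT stringid, task FROM todos WHERE userid = 42;
DELETE FROM todos WHERE userid = 42 AND stringid = 7;
```

The trade-off versus a single-column primary key is that deleting by stringid alone would then require a full scan or an extra index.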

+1
