Storage of data objects. Can a JOIN table do something that cannot be selected by a separate SELECT table?

Now that NOSQL or only objects storage systems such as MongoDB or memcached are really gathering steam in the world. I was wondering if there are any queries that cannot be executed on them, which can be executed using multiple joins of objects (in SQL, which is the JOIN "table" ). In other words, are there multiple table queries that cannot be processed by multiple single table queries per row?

Basically, is there a precedent where the join of several tables cannot be replicated by accessing the same table at the same time in object-based storage systems?

The following are examples of normal 3NF queries using has_man and has_many_through relationships. These are not the most difficult queries, but they should give you a starting point for this concept. Note that any value in {} means the value of the result of the last query.


The company has many users.

 SELECT user.*, company.name as company_name FROM user LEFT JOIN company ON company.id = user.company_id WHERE user.id = 4 

against

 SELECT * FROM user WHERE id = 4 SELECT * FROM company WHERE id = {user.comany_id} 

There are many students in the club through membership.

 SELECT student.* FROM student LEFT JOIN membership on membership.student_id = sudent.id WHERE membership.club_id = 5 

against

 SELECT * FROM membership WHERE club.id = 5 SELECT * FROM student WHERE id = {membership.student_id} 

The reason I'm curious is because I want to know if object-based systems (which rely on accessing objects with the same table at the same time) can do what RDBMS databases such as PostgreSQL or MySQL can do .

So far, the only thing that seems to be wrong is that more requests are required.

+7
sql database mysql mongodb memcached
source share
4 answers

1 - running multiple split queries leaves you useless consurrency - by the time you got something from table 1, it could have been deleted and it could still be in table 2 - now suppose 5 correlated tables.

2 - query execution with at least moderately complex logic over fields that are not a mythical identifier

3 - control of the amount of extracted data (you are unlikely to need more than 50% of the data needed to deserialize / create valid objects and even the worst whole trees of related objects)

4 - correlated queries (nested selects) that the SQL server will optimize as joins for additive complexity or better (| T1 | + | T2 | + | T3 | + | T4 |), while any ORM or nonSQL will have to be repeated internal queries and the occurrence of multiplicative complexity (| T1 || T2 || T3 | * | T4 |)

5 - data set sizes, scalability not only in data sizes, but also in concurrency processing under updates. Even ORMs that support transactions make them so long that the chances of deadlocks increase exponentially.

6 - blind updates (much more data touched for no reason) and their dependence and failure based on a blind tool (a mythical version that is really necessary, for example, in 1% of the relational data model, but ORM and alikes should have it everywhere)

7 - the absence of any standards and compatibility - this means that your system and data will always be at significantly higher risk and depend on software changes due to academic adventurism, and not any real business responsibility and expect to invest a lot of resources just when testing changes.

8 - data integrity - if some code just deleted half of today's order records from T1, because there was no T2 foreign key to stop it. A pre-normal thing related to individual requests.

9 - negative trend of maturity - continues to split instead of standardization - give it 20 years and maybe it will stabilize

Last but not least, it does not reduce any complexity (the same correlation between the data still exists), but it is very difficult to track and manage the complexity or have any realistic means or transparency when something goes wrong . And this adds complexity to 1-2 layers. If something goes wrong in your SQL tables, you have tools and queries to discover and even correct your data. What are you going to do when some ORM just tells you that it has an "invalid pointer" and an exception is thrown, since you don't want an "invalid object"?

I think enough :-)

+3
source share

Just because you can, it does not mean that you should.

Several SELECT statements; alternative cons:

  • The fewer trips to the database, the better. TCP overhead cannot be recouped, and it seems that Network Neutrality is officially dead, so we can expect you to see traffic from multi-select / nosql because you might have to pay for this bandwidth ...
  • due to the delay between the initial and subsequent operations, the risk of supporting data that does not reflect what was in the system when the first request was launched.
  • less scalable - the larger the data set, the more work the application does to solve business rules and associations that can scale much better in the database.
  • complexity in the application, which also makes the business less portable (IE: migrate from Java to .NET or vice versa), you look at creation from scratch when the business logic in the database minimizes this)
+4
source share

In fact, one of the biggest problems is that some of the NoSQL databases are not transactional across multiple queries. ORM, like Hibernate, will perform multiple β€œjoin” requests sometimes, but it has the advantage of being in the same transaction.

With NoSQL, you don't have that luxury. Thus, this can very easily lead to erroneous results:

 SELECT * FROM user WHERE id = 4 SELECT * FROM company WHERE id = {user.comany_id} 

If the company for user.company_id is deleted between two operator calls. This is a well known issue with these databases. Therefore, regardless of whether you are performing the JOIN correctly, the problem will not have transactions.

Otherwise, you can simulate everything while it can store bytes :)

+2
source share

You could use nosql as an old-fashioned "hierarchical" database!

In addition to OMGPonies answers, reporting is harder to do.

On scaling, it’s not. nosql is designed to scale if you use it correctly.

Another reason to do nosql is if you do all your work in objects, go on to matching or in sql and don't do the work with complex (i.e. manual for efficiency) UPDATE statements. for example, updating a connection or updating 'where ... in (...)'.

If the database is single-purpose (for example, for applications with a large volume), nosql is more likely to be in order.

Multipurpose - OLTP - Business Line - comes with SQL.

I could go on, but this is in my lunch break. Not that I ever went to work hours. I prefer to just eat during the lunch break.

+1
source share

All Articles