When to use a query or code

Question

I am asking about the specific case of Java + JPA/Hibernate + MySQL, but I think the question applies to many languages.

Sometimes I have to run a query against the database to get some objects, such as employees. Say you need specific employees (those whose first name is "John"). Would you rather issue a query that returns exactly that set of employees, or fetch all employees and then use the programming language to extract the ones you are interested in? Why (lightness, efficiency)? Which approach is (generally) more efficient?

Is one approach better than another depending on the size of the table?

Assume the same complexity and the same reusability in both cases.

+7

java-ee sql database

dgmora Dec 12 '12 at 15:41

7 answers

Always execute the query in the database. If you do not, you will have to copy more data to the client, and databases are written to filter data efficiently, so they are almost certainly more efficient at it than your code.

The only exception I can think of is when the filter condition is computationally complex and you can spread the calculation over more CPUs than the database has.

In the setups I have worked with, the database server had more processing power than the clients, so unless it is overloaded it will execute the query faster for the same amount of code.

You also need to write less code to run the query in the database using Hibernate's query language than to write code that processes the data on the client. Hibernate queries will also make use of any caching you have configured, without you having to write more code.
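A minimal sketch of pushing the filter into the database. It uses plain JDBC instead of Hibernate's query language so that it compiles with only the JDK; the `employee` table, its columns, the `Employee` record and the `EmployeeDao` class are all hypothetical. With JPA, the equivalent would be a JPQL query such as `select e from Employee e where e.firstName = :name`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type; table and column names below are assumptions.
record Employee(long id, String firstName, String lastName) {}

final class EmployeeDao {
    // The WHERE clause makes the database do the filtering, so only matching
    // rows cross the network. The "?" placeholder is bound with setString,
    // which also protects against SQL injection.
    static final String FIND_BY_FIRST_NAME =
            "SELECT id, first_name, last_name FROM employee WHERE first_name = ?";

    static List<Employee> findByFirstName(Connection conn, String firstName) throws SQLException {
        List<Employee> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_FIRST_NAME)) {
            ps.setString(1, firstName);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(new Employee(
                            rs.getLong("id"),
                            rs.getString("first_name"),
                            rs.getString("last_name")));
                }
            }
        }
        return result;
    }
}
```

Only the matching rows are transferred to the client. If `first_name` is indexed, MySQL can also avoid scanning the whole table.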

+10

Mark Dec 12 '12 at 15:45

In general, I would let the database do what databases are good at. Filtering data is something databases are really good at, so it's best to leave that to them.

However, there are some situations where you may want to just fetch everything and do the filtering in code. I would consider this if the number of rows is relatively small and you plan to cache them in your application. In that case, you would fetch all the rows once, cache them, and run subsequent filters against what you have in the cache.
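The fetch-once, filter-in-cache pattern could be sketched roughly as follows, assuming a small table. The `CachedEmployees` class is hypothetical, and the `Supplier` loader is a stand-in for whatever actually runs the "select all" query (JDBC, Hibernate, etc.).

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

record Employee(long id, String firstName, String lastName) {}

// Loads the whole (small) table on first use, then answers every filter from memory.
final class CachedEmployees {
    private final Supplier<List<Employee>> loader; // hypothetical "SELECT * FROM employee"
    private List<Employee> cache;                  // null until the first call

    CachedEmployees(Supplier<List<Employee>> loader) {
        this.loader = loader;
    }

    synchronized List<Employee> filter(Predicate<Employee> condition) {
        if (cache == null) {
            cache = List.copyOf(loader.get()); // the only round trip to the database
        }
        return cache.stream().filter(condition).toList();
    }
}
```

For example, `cached.filter(e -> e.firstName().equals("John"))` runs entirely in memory after the first call. This sketch never refreshes the cache, so it only fits data that rarely changes; real code needs an invalidation strategy.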

+4

Eric Petroelje Dec 12 '12 at 15:47

It depends on the situation, but in general I think it is better to use SQL to get the exact result set.

The problem with loading all objects and then searching them programmatically is that you have to load every row, which can take up a lot of memory. In addition, you then have to search through all of those objects. Why do this when you can use your RDBMS to get exactly the results you want? In other words, why load a large dataset that can use too much memory and then process it, when you can let your RDBMS do the job for you?

On the other hand, if you know that your dataset is not too large, you can load it into memory and then query it there. This has the advantage that you do not need to go to the DBMS, which may or may not involve a trip across your network, depending on your system architecture.

However, even then you can use various caching utilities to cache common query results, which removes the advantage of caching the data yourself.

+2

hvgotcodes Dec 12 '12 at 15:45

Remember that your approach must scale over time. What is a small dataset today can turn into a huge one later. We had a problem with a programmer who coded the application to query the entire table and then manipulate it in memory. The approach worked fine when there were only 100 rows with two child tables, but as the data grew over the years, performance problems became apparent. Adding even a date filter, so that the query returns only the last 365 days, can help your application scale better.
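A sketch of such a date-bounded query, again in plain JDBC. The `orders` table, its `created_on` column and the `RecentOrders` class are hypothetical; the point is only that the cutoff is computed in Java and bound as a parameter, so the database returns at most about a year of rows instead of the whole history.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class RecentOrders {
    // Hypothetical table/column names; the date filter bounds the result size.
    static final String SQL = "SELECT id FROM orders WHERE created_on >= ?";

    // Earliest date included in the window: 365 days before `today`.
    static LocalDate cutoff(LocalDate today) {
        return today.minusDays(365);
    }

    static List<Long> recentIds(Connection conn, LocalDate today) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            // Bind the cutoff as a DATE parameter rather than concatenating it into the SQL.
            ps.setDate(1, Date.valueOf(cutoff(today)));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
        }
        return ids;
    }
}
```

Passing `today` in, rather than calling `LocalDate.now()` inside, keeps the cutoff logic easy to test.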

+2

Sun Dec 12 '12 at 17:18

(If you are looking for a Hibernate-specific answer, check @Mark's answer.)

Given the Employee example, and assuming the number of employees can grow over time, it is best to query the database for exactly the data you need. However, for something like a Department, where rapid data growth is less likely, it is useful to query all of them and keep them in memory. This way you do not need to go to an external resource (the database) every time, which can be expensive.

So these are the general parameters to consider:

data scaling
business criticality
data volume
frequency of use

When the data will not grow often, is not business-critical, is small enough to manage in memory on the application server, and is used frequently, it makes sense to bring it all in and filter it programmatically as needed.

Otherwise, fetch only the specific data you need.

+1

humblelistener Dec 12 '12 at 18:16

Which is better: keeping a lot of food at home, or buying a little at a time? What if you travel a lot? What if you are just throwing a party? It depends, right? The same goes for performance optimization: the best approach depends on many variables. The art is to avoid painting yourself into a corner while developing your solution, and to optimize later, once you know your real bottlenecks. A good starting point is here: en.wikipedia.org/wiki/Performance_tuning. One piece of advice that is more or less universal: encapsulate data access.
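One way to read "encapsulate data access" is a small repository interface that callers depend on, so the strategy behind it (a SQL query with a WHERE clause, or a filter over cached rows) can be swapped later without touching them. The interface and class names here are hypothetical; only an in-memory implementation is shown.

```java
import java.util.List;

record Employee(long id, String firstName, String lastName) {}

// Callers see only this interface, not how the filtering is done.
interface EmployeeRepository {
    List<Employee> findByFirstName(String firstName);
}

// In-memory implementation; a JDBC- or JPA-backed class would implement
// the same interface by pushing the condition into the query instead.
final class InMemoryEmployeeRepository implements EmployeeRepository {
    private final List<Employee> employees;

    InMemoryEmployeeRepository(List<Employee> employees) {
        this.employees = List.copyOf(employees);
    }

    @Override
    public List<Employee> findByFirstName(String firstName) {
        return employees.stream()
                .filter(e -> e.firstName().equals(firstName))
                .toList();
    }
}
```

Once profiling shows where the real bottleneck is, only the implementation class needs to change.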

+1

full.stack.ex Dec 16 '12 at 16:49

dasblinkenlight · Accepted Answer · 2012-12-12T15:52:37+0000

There is a common trick in programming: paying with memory to speed things up. If you have a lot of employees and you will request a significant share of them, one after another (say, 75% will be requested at one time or another), then request them all, cache them (very important!), and do the lookups in memory. The next time you need an employee, skip the trip to the RDBMS, go directly to the cache, and do a fast lookup: a round trip to the database is very expensive compared to a hash lookup in memory.
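A minimal sketch of that memory-for-speed trade, assuming the list passed to the constructor came from one bulk query. `EmployeeCache` is a hypothetical name, and refreshing the cache after updates is deliberately left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

record Employee(long id, String firstName, String lastName) {}

// One bulk load up front; afterwards each lookup is a HashMap get
// instead of a round trip to the RDBMS.
final class EmployeeCache {
    private final Map<Long, Employee> byId = new HashMap<>();

    EmployeeCache(List<Employee> allEmployees) {
        for (Employee e : allEmployees) {
            byId.put(e.id(), e);
        }
    }

    Optional<Employee> find(long id) {
        return Optional.ofNullable(byId.get(id));
    }
}
```

The cost is holding every employee in memory, which is why this only pays off when a large share of them will actually be looked up.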

On the other hand, if you only access a small fraction of employees, you should request just the one employee you need: transferring data from an RDBMS to your program takes a lot of time, a lot of network bandwidth, a lot of memory on your side, and a lot of memory on the RDBMS side. Querying many rows only to throw away all but one never makes sense.
