Comprehensive processing in stored procedures vs. a .NET application

We are creating a new application in .NET 3.5 with a SQL Server database. The database is quite large, with about 60 tables that hold the loaded data. The .NET application transfers data into this database from data entry and from third-party systems.

After all the data is available in the database, the system must perform many calculations. The calculation logic is quite complicated. All the data needed for the calculations is in the database, and the output also needs to be stored in the database. Data collection happens every week, and the calculations should run every week to create the necessary reports.

Given this scenario, I thought of doing all these calculations in stored procedures. The problem is that we need database independence, and stored procedures cannot give us that. But if I do all of this in .NET, query by query, I don't think it can finish the job quickly enough.

For example, I need to query one table, which will return 2000 rows; then for each of those rows I need to query another table, which will return 300 results; then for each of those rows I need to query several tables (about 10) to get the required data, perform the calculation, and save the output in another table.

Now my question is: should I go ahead with the stored procedure solution and forget about database independence, since performance is important? I also think that development time will be much shorter if we use stored procedures. If any client wants this solution on an Oracle database (because they do not want to support another database), we would port the stored procedures to Oracle and maintain two versions for any future changes/improvements. Similarly, other clients may ask for other databases.


In the 2000 rows I mentioned above, each row is a product SKU. The 300 rows I mentioned are the different attributes we want to calculate, for example transportation cost and similar charges. The ten tables I mentioned hold information on currency conversion, unit conversion, network, region, company, sale price, daily quantity sold, etc. The resulting table stores all the information in the form of a star schema for analysis and reporting. The goal is to get fine-grained information about each product, so that we know which attribute of selling the product costs us money and where we can make an improvement.
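To make the shape of the output concrete, here is a minimal sketch of what such a star-schema result table might look like. All table and column names are illustrative only, not our actual schema:

```sql
-- Hypothetical fact table for the weekly calculation output (star schema).
-- Table and column names are illustrative, not the real schema.
CREATE TABLE dbo.FactProductAttributeCost
(
    ProductSkuKey    int            NOT NULL,  -- the product (SKU) being analysed
    AttributeKey     int            NOT NULL,  -- the attribute, e.g. transportation cost
    RegionKey        int            NOT NULL,
    CompanyKey       int            NOT NULL,
    CalculationWeek  datetime       NOT NULL,  -- week the batch calculation ran for
    CostAmount       decimal(18, 4) NOT NULL,  -- calculated cost in the reporting currency
    QuantitySold     decimal(18, 4) NOT NULL,
    CONSTRAINT PK_FactProductAttributeCost
        PRIMARY KEY (ProductSkuKey, AttributeKey, RegionKey, CompanyKey, CalculationWeek)
);
```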

+4
5 answers

I would not consider manipulating data anywhere except the database.

Most people try to work with database data using looping algorithms. If you need real speed, think of your data as a SET of rows: you can update thousands of rows in a single UPDATE. I have rewritten many cursor loops written by novice programmers as single UPDATE statements, and the execution time improved dramatically.

You say:

I need to query one table, which will return 2000 rows; then for each row I need to query another table, which will return 300 results; then for each of those rows I need to query several tables (about 10) to get the required data

From your question, it sounds like you are not using joins and are already thinking in loops. Even if you intend to loop, it is much better to write a query that joins in all the necessary data and then loop over that result. Keep in mind that UPDATE and INSERT statements can be driven by large, complex queries; with CASE expressions, views, and conditional joins (LEFT OUTER JOIN), you can solve almost any problem in a single UPDATE/INSERT.
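As a minimal sketch of that set-based approach, using hypothetical table and column names in place of the question's real schema, the whole nested loop collapses into one INSERT driven by joins:

```sql
-- One set-based INSERT replacing the nested loops: join the ~2000 SKUs to their
-- ~300 attributes and the lookup tables, compute the cost per row, and write the
-- result set in a single statement. All names are hypothetical.
DECLARE @CalculationWeek datetime;
SET @CalculationWeek = '20090601';

INSERT INTO dbo.FactProductAttributeCost
    (ProductSkuKey, AttributeKey, RegionKey, CompanyKey, CalculationWeek, CostAmount, QuantitySold)
SELECT
    p.ProductSkuKey,
    a.AttributeKey,
    r.RegionKey,
    c.CompanyKey,
    @CalculationWeek,
    -- CASE handles attributes priced per unit vs. as a flat charge; the LEFT OUTER
    -- JOIN keeps rows whose optional currency lookup is missing instead of dropping them.
    CASE
        WHEN a.PricingModel = 'PER_UNIT'
            THEN s.DailyQuantitySold * a.UnitCost * ISNULL(fx.RateToReportingCurrency, 1)
        ELSE a.FlatCost * ISNULL(fx.RateToReportingCurrency, 1)
    END AS CostAmount,
    s.DailyQuantitySold
FROM dbo.ProductSku              AS p
JOIN dbo.ProductAttribute        AS a  ON a.ProductSkuKey = p.ProductSkuKey
JOIN dbo.DailySales              AS s  ON s.ProductSkuKey = p.ProductSkuKey
JOIN dbo.Region                  AS r  ON r.RegionKey     = s.RegionKey
JOIN dbo.Company                 AS c  ON c.CompanyKey    = s.CompanyKey
LEFT OUTER JOIN dbo.CurrencyRate AS fx ON fx.CurrencyCode = a.CurrencyCode
                                      AND fx.RateDate     = s.SaleDate
WHERE s.SaleDate >= @CalculationWeek
  AND s.SaleDate <  DATEADD(day, 7, @CalculationWeek);
```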

+3

Well, without any specific details about what data is in these tables, a back-of-the-napkin calculation shows that you are talking about processing roughly 6 million rows of information in your example (2000 rows * 300 rows * 10 lookup tables at about 1 row each).

Are all of these rows distinct, or are the 10 tables lookup data with relatively low cardinality? In other words, could the program load the contents of the 10 lookup tables into memory and then simply process the 300-row result set in memory to perform the calculations?

In addition, I would be concerned about scalability: if you do this in a stored procedure, you are guaranteed a serial process limited by the speed of a single database server. If you have the option of running multiple copies of a client program, each processing a chunk of the 2000-row initial set, you can do some of the calculation in parallel, possibly speeding up the overall processing time, and it also stays scalable if your initial record set grows 10 times larger.
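One hedged sketch of how that chunking could work, with hypothetical names: each copy of the client program is given a worker index and selects only its slice of the initial SKU set, so several copies can run the same calculation in parallel without overlapping.

```sql
-- Hypothetical partitioning scheme: worker @WorkerIndex (0-based) of @WorkerCount
-- workers picks up only its share of the ~2000 SKUs. Each client copy runs the
-- same query with a different @WorkerIndex.
DECLARE @WorkerCount int;
DECLARE @WorkerIndex int;
SET @WorkerCount = 4;
SET @WorkerIndex = 0;

SELECT p.ProductSkuKey,
       p.SkuCode
FROM dbo.ProductSku AS p
WHERE p.ProductSkuKey % @WorkerCount = @WorkerIndex
ORDER BY p.ProductSkuKey;
```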

+3

Programming something like calculation code is usually simpler and more maintainable in C#. Also, as a general rule, keeping processing on SQL Server to a minimum is good practice, because the database is the hardest tier to scale.

Having said that, from your description it sounds like the stored procedure approach is the way to go. When the calculation code depends on large amounts of data, pulling that data out of the server to do the calculation becomes the expensive part. So unless you have reasonable ways of reducing the dependent data (for example, caching the lookup tables?), you will most likely find it more painful not to use stored procedures.

+1

Stored procedures every time, but, as KM said, keep iteration inside those stored procedures to a minimum, which means using joins in your SQL; relational databases are very good at joining.

Scalability of the database should be only a small problem, especially since it sounds like you would be doing these calculations in a batch process.

Database independence does not really exist, except for the most trivial CRUD applications, so if your initial requirement is to get this all working with SQL Server, then use the tools that the RDBMS provides (after all, your client has already invested a great deal in it). If (and it is a big if) a subsequent client really does not want to use SQL Server, then you will have to bite the bullet and code the stored procedures in a different flavour. But then, as you identified ("If I do all this in .NET, query by query, I don't think it can finish the job quickly enough"), you have deferred that cost until it is actually needed.

+1

I would look at doing this in SQL Server Integration Services (SSIS). I would put the calculations in SSIS but keep the queries as stored procedures. This gives you database independence (SSIS can process data from any database with an ODBC connection) as well as high performance. Only simple SELECT statements would live in the stored procedures, and those are the parts of the SQL standard most likely to be identical across several database products (provided you stick to standard query forms).
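As an illustration of keeping the stored procedures down to simple, portable SELECTs for SSIS to call as a data source, something like the following hypothetical procedure is what is meant (all names and columns are assumptions, not the actual schema):

```sql
-- Hypothetical data-source procedure for the SSIS package: a plain SELECT with a
-- standard join and date-range filter, avoiding vendor-specific features so the
-- same query is easy to reproduce on another RDBMS.
CREATE PROCEDURE dbo.GetWeeklySalesForCalculation
    @WeekStart datetime,
    @WeekEnd   datetime
AS
BEGIN
    SET NOCOUNT ON;

    SELECT s.ProductSkuKey,
           s.SaleDate,
           s.DailyQuantitySold,
           s.SalePrice,
           r.RegionCode
    FROM dbo.DailySales AS s
    JOIN dbo.Region     AS r ON r.RegionKey = s.RegionKey
    WHERE s.SaleDate >= @WeekStart
      AND s.SaleDate <  @WeekEnd;
END;
```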

0
