Performance difference
There are a number of things that can cause a performance difference between a "straight" Data Flow Task and an equivalent Execute SQL Task.
- Network latency. You are inserting into table A from table B on the same server and instance. In an Execute SQL Task, that work is performed entirely on the one machine. By contrast, I could run the package on server B, which would pull 1.25M rows from server A; that data would be streamed over the network to server B and then sent back to server A for the corresponding INSERT operation. If you have a poor network, wide rows (especially binary data), or simply a great distance between the servers (server A is in the USA, server B is in India), performance will suffer.
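As a rough sketch of the difference, here is what the two shapes look like in T-SQL. The table names and the linked-server name are placeholders, and the linked-server query only approximates what SSIS does when it pulls rows to another machine, but the data movement pattern is the same:

```sql
-- Same-server shape: an Execute SQL Task can do the whole move inside the
-- database engine, with no rows ever crossing the network.
INSERT INTO dbo.TableA (Col1, Col2)
SELECT Col1, Col2
FROM dbo.TableB;

-- Cross-server shape: every row is pulled across the network to the machine
-- doing the work (shown here via a hypothetical linked server [ServerB]),
-- then written back out. This is the round trip the package pays for.
INSERT INTO dbo.TableA (Col1, Col2)
SELECT Col1, Col2
FROM [ServerB].[MyDb].dbo.TableB;
```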
- Memory starvation. Even assuming the package runs on the same server as the source/target database, it can still be slow, because the data flow task is an in-memory engine. All the data moving from source to destination passes through memory buffers, and the more memory SSIS can get, the faster it goes. However, SSIS has to contend with the OS for memory allocation, as well as with SQL Server itself. Although SSIS is SQL Server Integration Services, it does not run in the same memory space as the SQL Server database engine. If your server has 10 GB of memory, 2 GB is used by the OS, and 8 GB is used by SQL Server, there is little left for SSIS. It cannot ask SQL Server to give up some of its memory, so the OS will have to page, and pushing a data flow pipeline through paged memory is slow.
- Shoddy destination settings. Depending on which version of SSIS you are using, the default access mode for the OLE DB Destination was "Table or View". That was a reasonable setting for preventing low-level locks from escalating to table locks, but it results in singleton, row-by-row inserts (your 1.25M rows sent as individual statements). Contrast that with the set-based INSERT INTO approach of the Execute SQL Task. Later versions of SSIS default to the "fast load" variant of the destination access mode, which behaves much more like the set-based equivalent and gives far better performance.
- OLE DB Command transformation. There is an OLE DB Destination, and some people confuse it with the OLE DB Command transformation. These are two very different components with different uses. The former is a destination and consumes all the data; it can go very fast. The latter is always RBAR: it performs a singleton operation for every row that flows through it.
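To make the contrast concrete, this is roughly what the two insert patterns look like on the wire. Table and parameter names are placeholders; the TABLOCK hint is shown because it is what the fast-load "Table lock" option corresponds to, not because the source prescribes it:

```sql
-- What "Table or View" access mode effectively does: one parameterized
-- INSERT per row, repeated 1.25M times.
INSERT INTO dbo.TableA (Col1, Col2) VALUES (@p1, @p2);

-- What the Execute SQL Task (and, approximately, "fast load") resemble:
-- a single set-based statement the engine can optimize as one operation.
INSERT INTO dbo.TableA WITH (TABLOCK) (Col1, Col2)
SELECT Col1, Col2
FROM dbo.TableB;
```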
- Debugging. Running a package in BIDS/SSDT carries overhead: the execution is wrapped in the DTS Debugging Host, which can cause a "not insignificant" slowdown. There is not much the debugger can do with an Execute SQL Task; it runs or it doesn't. With a data flow, though, there is a lot of memory it can inspect and monitor, which both reduces the memory available to the pipeline (see point 2) and slows things down through the various checks it performs. For a more accurate comparison, always run packages from the command line (dtexec.exe /file MyPackage.dtsx) or schedule them with SQL Server Agent.
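For timing comparisons, a minimal command-line run looks like this (the package path is a placeholder; dtexec is Windows-only and ships with SQL Server):

```shell
# Run the package outside the DTS Debugging Host for a realistic timing.
dtexec.exe /File "C:\SSIS\MyPackage.dtsx"

# Optionally limit console output to errors and warnings while measuring,
# so logging itself doesn't skew the numbers.
dtexec.exe /File "C:\SSIS\MyPackage.dtsx" /Reporting EW
```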
Package design
There is nothing inherently wrong with an SSIS package that is just Execute SQL Tasks. If the problem is easily solved by running queries, I would forgo SSIS entirely, write the appropriate stored procedure(s), schedule them with SQL Agent, and be done with it.
Maybe. What I still like about using SSIS even for "simple" cases like this is that it provides a consistent deliverable. That may not sound like much, but from a maintenance perspective it can be nice to know that everything that pumps data is contained in these source-controlled SSIS packages. I don't have to remember, or train a new person, that "oh, the AC jobs are 'simple', so those live in stored procedures called from a SQL Agent job; the DJ (or was it K?) jobs are even simpler, so those are just inline queries in the agent jobs that pull the data; and then we have packages for everything else, except for the Service Broker stuff, and some web services update the database too." The older I get, and the more places I am exposed to, the more value I find in a consistent, even if overkill, approach to solution delivery.
Performance isn't everything, but the SSIS team has set ETL benchmarks using SSIS, so it definitely has the ability to push some data in a hurry.
Since this answer keeps growing, I will simply leave it at this: the advantages of SSIS over a straight T-SQL data flow are what it provides natively, out of the box:
- logging
- error handling
- configuration
- parallelization
It's hard to beat those for the price.