The job of a LINQ provider is basically to "translate" LINQ expression trees (which are built behind the scenes of a query) into the native query language of the data source. In cases where the data is already in memory, you don't need a LINQ provider; LINQ to Objects works just fine. However, if you're using LINQ to talk to an external data store, such as a DBMS or a cloud service, a provider is absolutely necessary.
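To make "expression trees built behind the scenes" a bit more concrete, here is a minimal sketch. It assumes a MyObject class with an int Id property (the class body is my assumption; only the name appears in this answer). The ExpressionTreeDemo helper is hypothetical and only handles a predicate of the exact shape "x.Member == literal".

```csharp
using System;
using System.Linq.Expressions;

// Assumed shape of the entity; only the Id property is needed here.
public class MyObject
{
    public int Id { get; set; }
}

public static class ExpressionTreeDemo
{
    // Because the parameter type is Expression<Func<...>>, the compiler
    // passes the lambda as a data structure instead of compiled code.
    public static string Describe(Expression<Func<MyObject, bool>> predicate)
    {
        var body = (BinaryExpression)predicate.Body;   // x.Id == 1234
        var left = (MemberExpression)body.Left;        // x.Id
        var right = (ConstantExpression)body.Right;    // 1234 (a literal)
        return $"{body.NodeType}: {left.Member.Name} vs {right.Value}";
    }
}
```

Calling ExpressionTreeDemo.Describe(x => x.Id == 1234) returns "Equal: Id vs 1234". Note that if 1234 came from a captured local variable instead of a literal, the right-hand node would be a member access on a closure object and the cast above would fail; real providers handle that case.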
The basic premise of any query framework is that the data source's engine should do as much of the work as possible and return only the data that the client needs. This is because the data source is assumed to know best how to manage the data it stores, and also because transporting data over the network is relatively expensive in time and should therefore be minimized. Now, in reality, that second part means "return only the data the client asks for"; the server can't read your program's mind and know what it really needs; it can only give what it is asked for. This is where a smart LINQ provider blows the "naive" implementation out of the water. Using the IQueryable side of LINQ, which generates expression trees, a LINQ provider can transform the expression tree into, say, a SQL statement, which the DBMS will use to return only the records the client asked for in the LINQ statement. A naive implementation would request ALL the records using some broad SQL statement in order to hand the client an in-memory list of objects, and then all the filtering, grouping, sorting, etc. would be performed by the client.
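As a rough illustration of "transform the expression tree into a SQL statement", here is a toy translator. It is a sketch under heavy assumptions, not how any real provider is implemented: TinySqlTranslator, the Name property, and the supported operators are all made up for this example, it handles only int and string literals, and it inlines values into the SQL text where a real provider would emit parameters.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Assumed entity; Name is a hypothetical extra column for the example.
public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class TinySqlTranslator
{
    // Builds a SELECT whose WHERE clause mirrors the predicate's tree.
    public static string ToSelect<T>(string table, Expression<Func<T, bool>> predicate)
    {
        return $"SELECT * FROM {table} WHERE {Translate(predicate.Body)}";
    }

    private static string Translate(Expression node)
    {
        switch (node)
        {
            case BinaryExpression b:
                // Recurse into both sides and join them with the SQL operator.
                return $"({Translate(b.Left)} {Operator(b.NodeType)} {Translate(b.Right)})";
            case MemberExpression m when m.Expression is ParameterExpression:
                // x.Id becomes the column name Id.
                return m.Member.Name;
            case ConstantExpression k when k.Value is int n:
                return n.ToString(CultureInfo.InvariantCulture);
            case ConstantExpression k when k.Value is string s:
                // Toy escaping only; real providers use parameters instead.
                return "'" + s.Replace("'", "''") + "'";
            default:
                throw new NotSupportedException($"Node {node.NodeType} is not handled in this sketch.");
        }
    }

    private static string Operator(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.Equal: return "=";
            case ExpressionType.NotEqual: return "<>";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.LessThan: return "<";
            case ExpressionType.AndAlso: return "AND";
            case ExpressionType.OrElse: return "OR";
            default:
                throw new NotSupportedException($"Operator {type} is not handled in this sketch.");
        }
    }
}
```

For instance, TinySqlTranslator.ToSelect<MyObject>("MyObjectTable", x => x.Id == 1234) produces "SELECT * FROM MyObjectTable WHERE (Id = 1234)". The filter travels to the server as text instead of being applied to rows after they arrive.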
For example, let's say you used LINQ to retrieve a record from a database table by its primary key. A LINQ provider could translate dataSource.Query<MyObject>().Where(x=>x.Id == 1234).FirstOrDefault() into "SELECT TOP 1 * FROM MyObjectTable WHERE Id = 1234". That returns zero or one record. A "naive" implementation would probably send the server "SELECT * FROM MyObjectTable" and then use the IEnumerable side of LINQ (which works on in-memory collections) to do the filtering. Given that you expect 0 to 1 results from a table with 10 million records, which of these do you think will do the job faster (or even work at all, without running out of memory)?
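The difference in data shipped can be simulated without a real database. The following is only a sketch: FakeTable is a hypothetical in-memory stand-in that counts how many rows it "sends to the client", and the two methods mimic the two SQL statements above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public int Id { get; set; }
}

// Hypothetical stand-in for a remote table; it counts rows sent to the client.
public class FakeTable
{
    private readonly Dictionary<int, MyObject> _rows;

    public int RowsTransferred { get; private set; }

    public FakeTable(int count)
    {
        _rows = Enumerable.Range(1, count)
                          .ToDictionary(i => i, i => new MyObject { Id = i });
    }

    // Mimics "SELECT * FROM MyObjectTable": every row crosses the wire.
    public List<MyObject> SelectAll()
    {
        RowsTransferred += _rows.Count;
        return _rows.Values.ToList();
    }

    // Mimics "SELECT TOP 1 * FROM MyObjectTable WHERE Id = ...": 0 or 1 rows.
    public List<MyObject> SelectById(int id)
    {
        var result = new List<MyObject>();
        if (_rows.TryGetValue(id, out var row))
        {
            result.Add(row);
        }
        RowsTransferred += result.Count;
        return result;
    }
}

public static class TransferDemo
{
    // Returns the rows shipped by the naive approach and by the pushed-down query.
    public static (int Naive, int PushedDown) Compare(int tableSize, int id)
    {
        var naiveTable = new FakeTable(tableSize);
        // Fetch everything, then filter on the client with LINQ to Objects.
        naiveTable.SelectAll().Where(x => x.Id == id).FirstOrDefault();

        var smartTable = new FakeTable(tableSize);
        // The filter runs "on the server"; only the match comes back.
        smartTable.SelectById(id).FirstOrDefault();

        return (naiveTable.RowsTransferred, smartTable.RowsTransferred);
    }
}
```

With TransferDemo.Compare(100000, 1234), the naive path ships 100000 rows and the pushed-down path ships 1. Both find the same record; the only thing that changes is where the filtering happens.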
Keiths