What is the most efficient way to get only the final row of an SQL table using EF4?

I am looking to get the last row of a table by the table ID column. What I'm using now works:

var x = db.MyTable.OrderByDescending(d => d.ID).FirstOrDefault(); 

Is there a faster way to get the same result?

1 answer

I don't see why that would query the whole table.

Do you not have an index on the ID column?

Can you add the query analysis results to your question? Because this is not how it should behave.

As well as the analysis results, the SQL produced. I can't see how it would be anything other than select top 1 * from MyTable order by id desc , only with explicit column names and some aliasing. Nor, with an index on id , how it would be anything other than a brief scan of that index.

Edit: Here's the promised explanation.

Linq provides us with a set of common interfaces, and in the case of C# and VB.NET with keyword support, for various operations on sources that return zero or more items (for example, in-memory collections, database calls, parsed XML documents, etc.).

This allows us to express similar tasks regardless of the source. Your query involves one particular source, but we could write a more general form:

 public static YourType FinalItem(IQueryable<YourType> source)
 {
   return source.OrderByDescending(d => d.ID).FirstOrDefault();
 }

Now we can do:

 IEnumerable<YourType> l = SomeCallThatGivesUsAList();
 var x = FinalItem(db.MyTable);                            //same as your code.
 var y = FinalItem(l.AsQueryable());                       //item in list with highest id.
 var z = FinalItem(db.MyTable.Where(d => d.ID % 10 == 0)); //item with highest id that ends in zero.

But the really important part is that while we now have a single way of expressing the operation we want done, the implementation is hidden from us and can differ for each source.

The call to OrderByDescending produces an object that holds information about the source and about the lambda it will use in ordering.

The call to FirstOrDefault in turn holds information about that object, and uses it to obtain a result.

In the case of a list, the implementation works by producing the equivalent Enumerable-based code ( Queryable and Enumerable mirror each other, as do the interfaces they use, such as IOrderedQueryable and IOrderedEnumerable , and so on).
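To make that hidden-implementation split concrete: the Enumerable methods take compiled delegates (Func<...>) which they can only invoke, while the mirroring Queryable methods take expression trees (Expression<Func<...>>) which a provider can inspect and translate, for example into SQL. A minimal, self-contained sketch of the difference (this is illustrative, not EF's actual internals):

```csharp
using System;
using System.Linq.Expressions;

// Same lambda text, two very different things:
Func<int, int> del = x => x + 1;              // compiled code; opaque
Expression<Func<int, int>> expr = x => x + 1; // a data structure; inspectable

Console.WriteLine(del(5));            // 6: we can only run it
Console.WriteLine(expr.Body);         // (x + 1): a provider can read and translate this
Console.WriteLine(expr.Compile()(5)); // 6: or it can compile it back and run it
```

This is why the same-looking query executes delegate-based Enumerable code against an in-memory list, while against db.MyTable it can be turned into SQL.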

This is because, with a list that we don't know to be already sorted in the order we care about (or its reverse), there is no faster way than to look at every element. The best we can hope for is an O(n) operation, and we may get an O(n log n) operation, depending on whether the ordering implementation is optimized for the case of retrieving only one element*.

Or, to put it another way, the best we could hand-code against a source that is only enumerable would be only slightly more efficient than:

 public static YourType FinalItem(IEnumerable<YourType> source)
 {
   YourType highest = default(YourType);
   int highestID = int.MinValue;
   foreach(YourType item in source)
   {
     int curID = item.ID;
     if(highest == null || curID > highestID)
     {
       highest = item;
       highestID = curID;
     }
   }
   return highest;
 }

We could squeeze out some micro-optimizations by handling the enumerator directly, but only slightly, and the extra complication would just make for a worse code example.

Since we cannot do much better than that by hand, and since the linq code knows nothing about the source that we don't, that is the best we can hope it matches. It may be worse (again, depending on whether the special case of wanting only one item is catered for), but it will not beat it.

However, this is not the only approach linq will ever take. It will take a comparable approach with an in-memory enumerable source, but your source is something else.

db.MyTable represents the table. Enumerating through it gives us the results of an SQL query more or less equivalent to:

 SELECT * FROM MyTable 

However, db.MyTable.OrderByDescending(d => d.ID) is not the equivalent of loading that and then ordering the results in memory. Because queries are processed as a whole when they are executed, we actually get the results of an SQL query more or less like:

 SELECT * FROM MyTable ORDER BY id DESC 

Finally, the entire query db.MyTable.OrderByDescending(d => d.ID).FirstOrDefault() results in a query like:

 SELECT TOP 1 * FROM MyTable ORDER BY id DESC 

or

 SELECT * FROM MyTable ORDER BY id DESC LIMIT 1 

depending on which database server you are using. The results are then passed to code equivalent to the following ADO.NET:

 return dataReader.Read()
   ? new MyType
     {
       ID = dataReader.GetInt32(0),
       SomeInt = dataReader.GetInt32(1),   //property names depend on your model
       SomeString = dataReader.GetString(2)
     } //or similar
   : null;

You cannot get much better.

As for that SQL query: if there is an index on the id column (and since it looks like a primary key, there certainly should be), that index will be used to rapidly find the row in question, rather than examining each row.

In general, because different linq providers are used for different kinds of source, they can each do whatever is best for that kind of source. Of course, this being an imperfect world, we will undoubtedly find that some are better than others. What's more, they can even pick the best approach for different conditions. One example is that database-related providers can emit different SQL to take advantage of different versions of the database. Another is that the version of Count() that works with in-memory enumerations works something like this:

 public static int Count<T>(this IEnumerable<T> source)
 {
   var asCollT = source as ICollection<T>;
   if(asCollT != null)
     return asCollT.Count;
   var asColl = source as ICollection;
   if(asColl != null)
     return asColl.Count;
   int tally = 0;
   foreach(T item in source)
     ++tally;
   return tally;
 }

This is one of the simplest cases (and somewhat simplified in my example here; I'm showing the idea rather than the actual code), but it demonstrates the basic principle of code taking advantage of more efficient approaches when they are available (the O(1) Length of arrays, and the Count property of collections, which is sometimes O(1), and we've made nothing worse in the O(n) case), then falling back to a less efficient but still functional approach when they are not.

The result of all this is that Linq tends to give very good bang for buck in terms of performance.

Now, a decent coder should expect to be able to match or beat its approach in any given case most of the time†, and even when Linq comes up with the perfect approach there is some overhead to it.

However, across an entire project, using Linq means we can concisely create reasonably efficient code that relates to a relatively small number of well-defined entities (generally one per table as far as databases are concerned). In particular, the use of anonymous types and joins means we get very good queries. Consider:

 var result = from a in db.Table1
              join b in db.Table2 on a.relatedBs equals b.id
              select new {a.id, b.name};

Here we ignore the columns we don't need, and the SQL generated will do likewise. Consider what we would do if we were creating the objects a and b come from with hand-coded DAO classes:

  • Create a new class to represent this combination of a's id and b's name, along with the corresponding code to run the query we need and instantiate it.
  • Run a query to get all the information about each a and the matching b, and live with the waste.
  • Run a query to get only the information about each a and b that we need, and just leave the other fields at their default values.

Of these, option 2 would be wasteful, perhaps very wasteful. Option 3 would be a bit wasteful and very error-prone (what if we accidentally try to use a field elsewhere in the code that was never set correctly?). Only option 1 matches what the linq approach produces in efficiency, but it's one more case. Over a large project this can mean creating dozens, or even hundreds or thousands, of slightly different classes (and, unlike the compiler, we won't necessarily spot the cases where they're actually the same). So in practice, linq can give us some real help when it comes to efficiency.

Good policies for efficient linq:

  • Stay with the type of query you start with for as long as you can. Whenever you bring items into memory with ToList() , ToArray() , etc., consider whether you really need to. If you don't need to, or can't clearly state the advantage it gives you, don't do it.
  • If you do need to move to processing in memory, favour AsEnumerable() over ToList() and the other means, as it will only retrieve one item at a time.
  • Examine slow queries with SQLProfiler or similar. There are a few cases where the first policy above is wrong and moving to memory with AsEnumerable() is actually better (most involve uses of GroupBy that don't use aggregates on the non-grouped fields, and therefore don't actually have a single SQL query they correspond to).
  • If a complex query is hit many times, then CompiledQuery can help (less so with 4.5, since it has an automatic optimization that covers some of the cases this helps in), though it's generally better to leave it out at first and then bring it in only at hot spots shown to be performance problems.
  • You can get EF to run arbitrary SQL, but avoid it unless it brings a large gain, because too much such code loses the consistent readability that using the linq approach throughout gives (I have to say though, I think Linq2SQL beats EF at calling stored procedures, and even more at calling UDFs, but even then the point still applies: it's less clear, just looking at the code, how everything relates to everything else).
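The first two policies can be sketched as follows, using an in-memory AsQueryable() source as a stand-in for db.MyTable (the shape of the calls is the same against EF; the table and column here are illustrative):

```csharp
using System;
using System.Linq;

// Stand-in for db.MyTable: 200 rows with an ID column.
var myTable = Enumerable.Range(1, 200).Select(i => new { ID = i }).AsQueryable();

// Wasteful: ToList() materializes the whole table first; the Where and
// Take then run in memory over all 200 rows.
var bad = myTable.ToList().Where(d => d.ID > 100).Take(5).ToList();

// Better: against EF the Where and Take become part of the SQL, so only
// five rows ever come back from the server.
var good = myTable.Where(d => d.ID > 100).Take(5).ToList();

// When in-memory work is unavoidable, AsEnumerable() switches over while
// streaming one item at a time, rather than buffering like ToList() does.
var streamed = myTable
    .Where(d => d.ID > 100)  // still query-side
    .AsEnumerable()          // in-memory from here on
    .Select(d => d.ID * 2)
    .First();

Console.WriteLine(good.Count); // 5
Console.WriteLine(streamed);   // 202
```

Both versions produce the same five items; the difference is only in how much work the source has to do before the first result is available.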

* AFAIK, this particular optimization is not in fact applied, but we are talking about the best possible implementation at this point, so it doesn't matter whether it is, isn't, or exists only in some versions.

† I'll admit that Linq2SQL has often produced queries using APPLY that I wouldn't have thought of, since I'm used to thinking about how to write queries for versions of SQLServer prior to 2005, but code has no such human tendency to stick with old habits. It has pretty much taught me how to use APPLY.

