LEFT OUTER JOIN in LINQ to Objects

Question

Consider the following code.
City and CityPlace are related by CityCode.
I want to make a LEFT OUTER JOIN between CityPlace and City.

    City[] cities = new City[]{
        new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
        new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
        new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
        new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
    };

    CityPlace[] places = new CityPlace[]{
        new CityPlace{CityCode="0771",Place="Shankar Nagar"},
        new CityPlace{CityCode="0771",Place="Pandari"},
        new CityPlace{CityCode="0771",Place="Energy Park"},
        new CityPlace{CityCode="0751",Place="Baadaa"},
        new CityPlace{CityCode="0751",Place="Nai Sadak"},
        new CityPlace{CityCode="0751",Place="Jayendraganj"},
        new CityPlace{CityCode="0751",Place="Vinay Nagar"},
        new CityPlace{CityCode="0755",Place="Idgah Hills"},
        new CityPlace{CityCode="022",Place="Parel"},
        new CityPlace{CityCode="022",Place="Haaji Ali"},
        new CityPlace{CityCode="022",Place="Girgaon Beach"},
        new CityPlace{CityCode="0783",Place="Railway Station"}
    };

What I did is:

    var res = places.GroupJoin(
        cities,
        p1 => p1.CityCode,
        c1 => c1.CityCode,
        (p2, c2s) => new {
            Place = p2.Place,
            CityName = c2s.Count() == 0 ? "NO NAME" : c2s.First().CityName
        });

    foreach(var v in res)
        Console.WriteLine(v);

Is this the standard or is it a quick and dirty solution?

+6

c# linq-to-objects

Akshay j Feb 25 '10 at 7:09

3 answers

Your own answer works, but it is not very elegant. So yes, it's dirty. There is a standard way to do a left outer join that handles your example and also handles cases where there are duplicate cities. Your example cannot handle duplicate cities, because any duplicates are ignored when you select c2s.First().

The standard left join steps are:

1. Create a hierarchy from your data with GroupJoin.
2. Flatten the hierarchy with SelectMany.

Your GroupJoin flattens the hierarchy in one step, ignoring everything except the first matching city. That is what makes it dirty. If you tried to use this code in the reverse direction, taking the cities and left joining them to the places, you would get only one place per city! That is clearly bad. It's better to learn how to do a left join the right way, and then it will always work.
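To make the reverse-direction problem concrete, here is a minimal sketch. The class ReverseJoinDemo, its trimmed-down data set, and the "NO PLACE" placeholder are my own assumptions for illustration, not part of the question's code. It compares the First() shortcut with the GroupJoin plus SelectMany pattern when the cities are on the left.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class City { public string CityCode; public string CityName; }
class CityPlace { public string CityCode; public string Place; }

static class ReverseJoinDemo
{
    // Hypothetical, trimmed-down data: Bhopal has no places, Mumbai has two.
    static readonly City[] Cities =
    {
        new City { CityCode = "0755", CityName = "Bhopal" },
        new City { CityCode = "022", CityName = "Mumbai" },
    };

    static readonly CityPlace[] Places =
    {
        new CityPlace { CityCode = "022", Place = "Parel" },
        new CityPlace { CityCode = "022", Place = "Haaji Ali" },
    };

    // The question's shortcut, reversed: only the first matching place survives.
    public static List<string> FirstOnly() =>
        Cities.GroupJoin(Places, c => c.CityCode, p => p.CityCode,
                (c, ps) => c.CityName + ": " + (ps.Any() ? ps.First().Place : "NO PLACE"))
            .ToList();

    // GroupJoin + SelectMany: one row per matching place, or one placeholder row.
    public static List<string> LeftJoin() =>
        Cities.GroupJoin(Places, c => c.CityCode, p => p.CityCode,
                (c, ps) => new { City = c, Places = ps })
            .SelectMany(
                g => g.Places.DefaultIfEmpty(new CityPlace { Place = "NO PLACE" }),
                (g, p) => g.City.CityName + ": " + p.Place)
            .ToList();

    static void Main()
    {
        Console.WriteLine(string.Join("; ", FirstOnly()));
        Console.WriteLine(string.Join("; ", LeftJoin()));
    }
}
```

With this data the shortcut yields two rows and silently drops "Haaji Ali", while the second method yields three rows, one of them the placeholder row for Bhopal.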

The SelectMany in step 2 is actually optional if you want to keep the hierarchy and then use nested foreach loops to display it, but I'm going to assume that you want to display the data in a flat table format.
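For completeness, here is a rough sketch of the hierarchy-preserving variant, with a nested foreach loop instead of SelectMany. The class name HierarchyDemo, the Describe helper, and the sample arrays are assumptions made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HierarchyDemo
{
    // Returns one line per left letter, listing all of its matching right letters.
    public static List<string> Describe(string[] left, string[] right)
    {
        // Step 1 only: each left item is paired with its (possibly empty) matches.
        var groupJoin = left.GroupJoin(right, l => l, r => r,
            (l, matches) => new { Left = l, Matches = matches });

        var lines = new List<string>();
        foreach (var item in groupJoin)
        {
            var line = item.Left + " ->";
            // The inner loop walks the hierarchy instead of flattening it.
            foreach (var match in item.Matches)
                line += " " + match;
            lines.Add(line);
        }
        return lines;
    }

    static void Main()
    {
        var lines = Describe(new[] { "A", "B", "C" }, new[] { "A", "B", "A" });
        foreach (var line in lines)
            Console.WriteLine(line);
    }
}
```

Here "A" is listed with both of its matches, and "C" still appears, with nothing after the arrow, because GroupJoin keeps left items that have no match.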

If you just want to see the answer to your specific problem, scroll down to the "Cities and Places" heading below, but first a complete example using two simple string arrays.

Abstract with a full explanation

Here is a complete example using two arrays of letters instead of your code. I wanted to show a simpler example first. You can copy and paste this into LINQPad, set the language to "C# Statements", and run it yourself if you like. I highly recommend LINQPad as a tool for testing all kinds of code, not just LINQ. Alternatively, you can create a console application in Visual Studio.

Here is the code without too many comments. Below it is a heavily annotated version. You can skip to that one if you want to know exactly what each parameter means.

    var leftLetters = new string[]{ "A", "B", "C" };
    var rightLetters = new string[]{ "A", "B" };

    //Create a hierarchical collection that includes every left item paired with a collection of matching right items (which may be empty if there are no matching right items.)
    var groupJoin = leftLetters.GroupJoin(
        rightLetters,
        leftLetter => leftLetter,
        rightLetter => rightLetter,
        ( leftLetter, matchingRightLetters ) => new { leftLetter, matchingRightLetters }
    );

    //Flatten the groupJoin hierarchical collection with a SelectMany
    var selectMany = groupJoin.SelectMany(
        groupJoinItem => groupJoinItem.matchingRightLetters.DefaultIfEmpty( "MISSING" ),
        ( groupJoinItem, rightLetter ) => new
        {
            LeftLetter = groupJoinItem.leftLetter,
            RightLetter = rightLetter
        }
    );

    //You can think of the elements of selectMany as "rows" as if this had been a left outer join in SQL. But this analogy breaks down rapidly if you are selecting objects instead of scalar values.
    foreach( var row in selectMany )
    {
        Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
    }

Here's the output, which should be pretty obvious, since we all know what a left join should do.

    A, A
    B, B
    C, MISSING

The heavily annotated version:

    var leftLetters = new string[]{ "A", "B", "C" };
    var rightLetters = new string[]{ "A", "B" };

    //Create a hierarchical collection that includes every left item paired with a collection of matching right items (which may be empty if there are no matching right items.)
    var groupJoin = leftLetters.GroupJoin(
        rightLetters, //inner: the right hand collection in the join
        leftLetter => leftLetter, //outerKeySelector: There is no property to use as the join key, the letter *is* the key. So this lambda simply returns the parameter itself.
        rightLetter => rightLetter, //innerKeySelector: Same with the rightLetters
        ( leftLetter, matchingRightLetters ) => new { leftLetter, matchingRightLetters } //resultSelector: given an element from the left items, and its matching collection of right items, project them to some class. In this case we are using a new anonymous type.
    );

    //Flatten the groupJoin hierarchical collection with a SelectMany
    var selectMany = groupJoin.SelectMany(
        //collectionSelector: given a single element from our collection of group join items from above, provide a collection of its "right" items which we want to flatten out. In this case the right items are in a property of the groupJoinItem itself, but this does not need to be the case! We use DefaultIfEmpty to turn an empty collection into a new collection that has exactly one item instead: the string "MISSING".
        groupJoinItem => groupJoinItem.matchingRightLetters.DefaultIfEmpty( "MISSING" ),
        //resultSelector: SelectMany does the flattening for us and this lambda gets invoked once for *each right item* in a given left item's collection of right items.
        (
            groupJoinItem, //The first parameter is one of the original group join items, including its entire collection of right items, but we will ignore that collection in the body of this lambda and just grab the leftLetter property.
            rightLetter //The second parameter is *one* of the matching right items from the collection of right items we selected in the first lambda we passed into SelectMany.
        )
        => new
        {
            LeftLetter = groupJoinItem.leftLetter, //groupJoinItem is one of the original items from the GroupJoin above. We just want the left letter from it.
            RightLetter = rightLetter //This is one of the individual right letters, so just select it as-is.
        }
    );

    //You can think of the elements of selectMany as "rows" as if this had been a left outer join in SQL. But this analogy breaks down rapidly if you are selecting objects instead of scalar values.
    foreach( var row in selectMany )
    {
        Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
    }

Again, the output for reference:

    A, A
    B, B
    C, MISSING

The above use of LINQ is often called "method chaining". You take some collections and chain methods together to get what you want. (Most of the time you would not use variables to hold the individual expressions. You would just write GroupJoin(...).SelectMany(...), which is why it is called a "method chain".) It is very verbose and explicit, and takes a long time to write.

Instead, we can use what is called a "comprehension", "query comprehension", or "LINQ comprehension". Comprehension is an old term in computer science from the 1970s that, frankly, doesn't make much sense to most people. Instead, people call them "LINQ queries" or "LINQ expressions", but those terms technically apply to method chains as well, because both forms compile to the same method calls. (Expression trees, which IQueryable providers build from such queries, are beyond the scope of this tutorial.) A LINQ comprehension is a SQL-like syntax for writing LINQ, but it is not SQL! It has nothing to do with actual SQL. Here is the same code written as a query comprehension:

    var leftLetters = new string[]{ "A", "B", "C" };
    var rightLetters = new string[]{ "A", "B" };

    var query =
        from leftLetter in leftLetters
        join rightLetter in rightLetters
        on leftLetter equals rightLetter into matchingRightLetters
        from rightLetter in matchingRightLetters.DefaultIfEmpty( "MISSING" )
        select new
        {
            LeftLetter = leftLetter,
            RightLetter = rightLetter
        };

    foreach( var row in query )
    {
        Console.WriteLine( row.LeftLetter + ", " + row.RightLetter );
    }

This compiles to essentially the same code as the example above, except that the parameter named "groupJoinItem" in SelectMany will be named something like "temp0", because that parameter does not exist explicitly in the comprehension version of this code.

I think you can appreciate how much simpler this version of the code is. I always use this syntax when doing a left outer join. I never use GroupJoin with SelectMany. However, at first glance it makes little sense. A join followed by into produces a GroupJoin. You have to know that first, and why you need it. Then the second from indicates a SelectMany, which is not obvious. When you have two from keywords, you are effectively creating a cross join (Cartesian product), which is what SelectMany does. (Sort of.)

For example, this query:

    from leftLetter in leftLetters
    from rightLetter in rightLetters
    select new
    {
        LeftLetter = leftLetter,
        RightLetter = rightLetter
    }

will give:

    A, A
    A, B
    B, A
    B, B
    C, A
    C, B

This is a basic cross join.
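As an illustration of the claim that two from clauses correspond to SelectMany, here is a small sketch of the same cross join written as a method chain. The class CrossJoinDemo and its CrossJoin helper are hypothetical names I made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CrossJoinDemo
{
    // Method-chain form of "from l in left from r in right select ...":
    // for every left item, the collection selector returns the whole right array.
    public static List<string> CrossJoin(string[] left, string[] right) =>
        left.SelectMany(l => right, (l, r) => l + ", " + r).ToList();

    static void Main()
    {
        var rows = CrossJoin(new[] { "A", "B", "C" }, new[] { "A", "B" });
        foreach (var row in rows)
            Console.WriteLine(row);
    }
}
```

The collection selector ignores its argument and always returns the entire right array, which is why every left letter gets paired with every right letter. In the left join, the selector returns only that item's own matches instead.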

So, back to our original left join LINQ query: the first from of the query is the group join, and the second from expresses a cross join between each groupJoinItem and its own collection of matching right letters. It is something like this:

    from groupJoinItem in groupJoin
    from rightLetter in groupJoinItem.matchingRightLetters
    select new{...}

In fact, we could actually write it that way!

    var groupJoin =
        from leftLetter in leftLetters
        join rightLetter in rightLetters
        on leftLetter equals rightLetter into matchingRightLetters
        select new
        {
            LeftLetter = leftLetter,
            MatchingRightLetters = matchingRightLetters
        };

    var selectMany =
        from groupJoinItem in groupJoin
        from rightLetter in groupJoinItem.MatchingRightLetters.DefaultIfEmpty( "MISSING" )
        select new
        {
            LeftLetter = groupJoinItem.LeftLetter,
            RightLetter = rightLetter
        };

This selectMany expresses the following: "for every item in groupJoin, cross join it with its own MatchingRightLetters property, and concatenate all the results together." This gives exactly the same result as any of our left join code above.

This is probably too much explanation for such a simple question, but I don't like cargo cult programming (google it). You should know exactly what your code does and why, otherwise you will not be able to solve more complex problems.

Cities and places

So, here is the method chain version of your code. It is a whole program so that people can run it if they want (use the "C# Program" language in LINQPad, or create a console application with Visual Studio or the C# compiler).

    void Main()
    {
        City[] cities = new City[]{
            new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
            new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
            new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
            new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
        };

        CityPlace[] places = new CityPlace[]{
            new CityPlace{CityCode="0771",Place="Shankar Nagar"},
            new CityPlace{CityCode="0771",Place="Pandari"},
            new CityPlace{CityCode="0771",Place="Energy Park"},
            new CityPlace{CityCode="0751",Place="Baadaa"},
            new CityPlace{CityCode="0751",Place="Nai Sadak"},
            new CityPlace{CityCode="0751",Place="Jayendraganj"},
            new CityPlace{CityCode="0751",Place="Vinay Nagar"},
            new CityPlace{CityCode="0755",Place="Idgah Hills"},
            new CityPlace{CityCode="022",Place="Parel"},
            new CityPlace{CityCode="022",Place="Haaji Ali"},
            new CityPlace{CityCode="022",Place="Girgaon Beach"},
            new CityPlace{CityCode="0783",Place="Railway Station"}
        };

        var query = places.GroupJoin(
            cities,
            place => place.CityCode,
            city => city.CityCode,
            ( place, matchingCities ) => new { place, matchingCities }
        ).SelectMany(
            groupJoinItem => groupJoinItem.matchingCities.DefaultIfEmpty( new City{ CityName = "NO NAME" } ),
            ( groupJoinItem, city ) => new { Place = groupJoinItem.place, City = city }
        );

        foreach(var pair in query)
        {
            Console.WriteLine( pair.Place.Place + ": " + pair.City.CityName );
        }
    }

    class City
    {
        public string CityCode;
        public string CityName;
        public string CityPopulation;
    }

    class CityPlace
    {
        public string CityCode;
        public string Place;
    }

Here's the output:

    Shankar Nagar: Raipur
    Pandari: Raipur
    Energy Park: Raipur
    Baadaa: Gwalior
    Nai Sadak: Gwalior
    Jayendraganj: Gwalior
    Vinay Nagar: Gwalior
    Idgah Hills: Bhopal
    Parel: Mumbai
    Haaji Ali: Mumbai
    Girgaon Beach: Mumbai
    Railway Station: NO NAME

Note that DefaultIfEmpty now returns an instance of the actual City class, not just a string. This is because we are joining CityPlace objects to City objects, not to strings. You could instead use DefaultIfEmpty() without a parameter, and you would get a null City for "Railway Station", but then you would need to check for nulls in your foreach loop before accessing pair.City.CityName. It is a matter of personal preference.
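A minimal sketch of the parameterless alternative might look like the following. The class NullDefaultDemo, its Join helper, and the shortened data set are assumptions for illustration, and the null-conditional operator (?.) assumes C# 6 or later, which postdates the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class City { public string CityCode; public string CityName; }
class CityPlace { public string CityCode; public string Place; }

static class NullDefaultDemo
{
    // Left join places to cities; unmatched places get a null City.
    public static List<string> Join(CityPlace[] places, City[] cities) =>
        (from place in places
         join city in cities
         on place.CityCode equals city.CityCode into matchingCities
         from city in matchingCities.DefaultIfEmpty() // null when there is no match
         select place.Place + ": " + (city?.CityName ?? "NO NAME")).ToList();

    static void Main()
    {
        var cities = new[] { new City { CityCode = "022", CityName = "Mumbai" } };
        var places = new[]
        {
            new CityPlace { CityCode = "022", Place = "Parel" },
            new CityPlace { CityCode = "0783", Place = "Railway Station" },
        };
        foreach (var row in Join(places, cities))
            Console.WriteLine(row);
    }
}
```

Here the null check happens inside the select rather than in the foreach loop. Either place works, as long as city is checked before CityName is read.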

Here's the same program using a query comprehension:

    void Main()
    {
        City[] cities = new City[]{
            new City{CityCode="0771",CityName="Raipur",CityPopulation="BIG"},
            new City{CityCode="0751",CityName="Gwalior",CityPopulation="MEDIUM"},
            new City{CityCode="0755",CityName="Bhopal",CityPopulation="BIG"},
            new City{CityCode="022",CityName="Mumbai",CityPopulation="BIG"},
        };

        CityPlace[] places = new CityPlace[]{
            new CityPlace{CityCode="0771",Place="Shankar Nagar"},
            new CityPlace{CityCode="0771",Place="Pandari"},
            new CityPlace{CityCode="0771",Place="Energy Park"},
            new CityPlace{CityCode="0751",Place="Baadaa"},
            new CityPlace{CityCode="0751",Place="Nai Sadak"},
            new CityPlace{CityCode="0751",Place="Jayendraganj"},
            new CityPlace{CityCode="0751",Place="Vinay Nagar"},
            new CityPlace{CityCode="0755",Place="Idgah Hills"},
            new CityPlace{CityCode="022",Place="Parel"},
            new CityPlace{CityCode="022",Place="Haaji Ali"},
            new CityPlace{CityCode="022",Place="Girgaon Beach"},
            new CityPlace{CityCode="0783",Place="Railway Station"}
        };

        var query =
            from place in places
            join city in cities
            on place.CityCode equals city.CityCode into matchingCities
            from city in matchingCities.DefaultIfEmpty( new City{ CityName = "NO NAME" } )
            select new { Place = place, City = city };

        foreach(var pair in query)
        {
            Console.WriteLine( pair.Place.Place + ": " + pair.City.CityName );
        }
    }

    class City
    {
        public string CityCode;
        public string CityName;
        public string CityPopulation;
    }

    class CityPlace
    {
        public string CityCode;
        public string Place;
    }

As a long-time SQL user, I strongly prefer the query comprehension version. It is much easier for someone else to read the intent of the code, once you know what the individual pieces of the query do.

Happy programming!

+10

Glazed Jan 18 '14 at 0:28

Here is the LINQ query version:

    var noCity = new City {CityName = "NO NAME"};

    var anotherway =
        from p in places
        join c in cities
        on p.CityCode equals c.CityCode into merge
        from c in merge.DefaultIfEmpty(noCity)
        select new { p.Place, c.CityName };

I think using DefaultIfEmpty() makes it clearer.

In general, I find outer joins in LINQ quite confusing. This is one of the few places where I find SQL queries vastly superior.

+8

ScottS Feb 25 '10 at 8:08

Adele · Accepted Answer · 2010-02-25T10:55:05+0000

In your case you are not grouping the records, so don't use your solution. You can use the solution from ScottS or the query below.

    var res =
        from p in places
        select new
        {
            Place = p.Place,
            CityName = (from c in cities
                        where p.CityCode == c.CityCode
                        select c.CityName).DefaultIfEmpty("NO NAME").ElementAtOrDefault(0)
        };
