Please navigate to the bottom of the page for Table of Contents

Thursday, June 16, 2011

LINQ JOIN interview questions

The JOIN clause is one of the most complex and potentially confusing aspects of LINQ. A thorough understanding of the JOIN clause can help you glide easily through the interview process. Most of the interview questions that I see around JOIN using LINQ are both direct (and simple) JOIN queries and in a few cases in-direct applications of the JOIN clause. If the position requires extensive SQL or LINQ expertise, you probably would already be familiar with complex joins.

Well, so what does a JOIN clause do? The join clause is useful for associating elements from different source sequences that have no direct relationship in the object model. The only requirement is that the elements in each source share some value that can be compared for equality. For example, a food distributor might have a list of suppliers of a certain product, and a list of buyers. A join clause can be used, for example, to create a list of the suppliers and buyers of that product who are all in the same specified region. Let’s explore this notion with a few examples.

Question: For all the categories in the database, list the products in those categories. In addition, the output should just include the product and category names.

This is a simple JOIN between the products and the categories.


var innerGroupJoinQuery =
from category in categories // for all categories
// join with product table on product category ID
join prod in products on category.ID equals prod.CategoryID into prodGroup
// create a new object with just the product and category names
select new { CategoryName = category.Name, Products = prodGroup };


Note that in this code example, if an element from categories has no matching products, that category will not appear in the results. We will review how to fix that using LEFT OUTER JOIN below.


Question: For all the categories in the database, list the products in those categories. If a category does not have any products, still list the category.

In a left outer join, all the elements in the left source sequence are returned, even if no matching elements are in the right sequence. To perform a left outer join in LINQ, use the DefaultIfEmpty method in combination with a group join to specify a default right-side element to produce if a left-side element has no matches.


var leftOuterJoinQuery =
from category in categories
join prod in products on category.ID equals prod.CategoryID into prodGroup
from item in prodGroup.DefaultIfEmpty(new Product{Name = String.Empty, CategoryID = 0})
select new { CatName = category.Name, ProdName = item.Name };


Question: Given simple data structures of Person (Name) and Pet (Name, Owner), write a LINQ query that lists all the pets that each person owns.

This is a more complex example involving GROUP JOINS. The group join is useful for producing hierarchical data structures. It pairs each element from the first collection with a set of correlated elements from the second collection. GroupJoin has no direct equivalent in traditional relational database terms. However, this method does implement a superset of inner joins and left outer joins. Both of these operations can be written in terms of a grouped join.

So, our basic data structures can be represented as:


class Person
{
public string Name { get; set; }
}

class Pet
{
public string Name { get; set; }
public Person Owner { get; set; }
}

And some sample data might look like this


Person magnus = new Person { Name = "Hedlund, Magnus" };
Person terry = new Person { Name = "Adams, Terry" };
Person charlotte = new Person { Name = "Weiss, Charlotte" };

Pet barley = new Pet { Name = "Barley", Owner = terry };
Pet boots = new Pet { Name = "Boots", Owner = terry };
Pet whiskers = new Pet { Name = "Whiskers", Owner = charlotte };
Pet daisy = new Pet { Name = "Daisy", Owner = magnus };

List<Person> people = new List<Person> { magnus, terry, charlotte };
List<Pet> pets = new List<Pet> { barley, boots, whiskers, daisy };


Now we have two ways to implement the same LINQ query. Using the traditional syntax:


// Create a list where each element is an anonymous type
// that contains the person's first name and a collection of
// pets that are owned by them.
var query = from person in people
join pet in pets on person equals pet.Owner into gj
select new { OwnerName = person.Name, Pets = gj };


Using the Lambda syntax:


// Create a list where each element is an anonymous
// type that contains a person's name and
// a collection of names of the pets they own.
var query =
people.GroupJoin(pets,
person => person,
pet => pet.Owner,
(person, petCollection) =>
new
{
OwnerName = person.Name,
Pets = petCollection.Select(pet => pet.Name)
});


Now for debugging purposes, we can spew out the info as:


foreach (var v in query)
{
// Output the owner's name.
Console.WriteLine("{0}:", v.OwnerName);
// Output each of the owner's pet's names.
foreach (Pet pet in v.Pets)
Console.WriteLine(" {0}", pet.Name);
}


This should produce an output similar to the following.

/*
Hedlund, Magnus:
Daisy
Adams, Terry:
Barley
Boots
Weiss, Charlotte:
Whiskers
*/

7 comments:

  1. var query = from person in people
    join pet in pets on person equals pet.Owner into gj
    select new { OwnerName = person.Name, Pets = gj };

    in this block of code, why gj contains only list?? We join two table together, so where does person part go?? In my mind it should be something like Pets = gj.Pets ??

    ReplyDelete
  2. Hi Nikhil,

    What a brilliant post I have come across and believe me I have been searching out for this similar kind of post for past a week and hardly came across this.

    I google so many articles and blog to know what exactly Proxy Pattern is and its examples. every where I say example related to banking or Images.

    But now I am confused and just want to know "Can we say that Server-side pagination comes under Proxy Pattern".

    From Blogs and Articles I read, I come to know that proxy pattern create object having original object to interface its functionality to outer world. In other words, provide desire information from the real object as and when required.

    please correct me if I am wrong and help me to clear my confusion.

    But nice Article Mate! Great Information! Keep up the good work!

    Thanks a heaps,
    Kanna

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Given two tables with dates
    db1
    End
    24th March
    31st March
    22nd March
    db2
    Expire Date
    4/30/2018 11:59:00 PM
    4/30/2018 11:59:00 PM
    How to check "db1 end <= db2 expire date" with single query?

    ReplyDelete
  5. Given array of values {1, 2, 3, 4, 5}

    Output : {“Id”:1},{“Id”:2},{“Id”:3},{“Id”:4},{“Id”:5},{“Id”:6}
    With sigle query?

    ReplyDelete
  6. How to group specific column rows with specific column value i.e
    aa bb 1
    cc dd 1
    dd ee 2

    Output:
    List1-[aa, cc]
    List2-[dd]
    With minimal lines of query?

    ReplyDelete