Please navigate to the bottom of the page for Table of Contents

Thursday, June 16, 2011

LINQ JOIN interview questions

The JOIN clause is one of the most complex and potentially confusing aspects of LINQ. A thorough understanding of the JOIN clause can help you glide easily through the interview process. Most of the interview questions that I see around JOIN using LINQ are both direct (and simple) JOIN queries and in a few cases in-direct applications of the JOIN clause. If the position requires extensive SQL or LINQ expertise, you probably would already be familiar with complex joins.

Well, so what does a JOIN clause do? The join clause is useful for associating elements from different source sequences that have no direct relationship in the object model. The only requirement is that the elements in each source share some value that can be compared for equality. For example, a food distributor might have a list of suppliers of a certain product, and a list of buyers. A join clause can be used, for example, to create a list of the suppliers and buyers of that product who are all in the same specified region. Let’s explore this notion with a few examples.

Question: For all the categories in the database, list the products in those categories. In addition, the output should just include the product and category names.

This is a simple JOIN between the products and the categories.


var innerGroupJoinQuery =
from category in categories // for all categories
// join with product table on product category ID
join prod in products on category.ID equals prod.CategoryID into prodGroup
// create a new object with just the product and category names
select new { CategoryName = category.Name, Products = prodGroup };


Note that in this code example, if an element from categories has no matching products, that category will not appear in the results. We will review how to fix that using LEFT OUTER JOIN below.


Question: For all the categories in the database, list the products in those categories. If a category does not have any products, still list the category.

In a left outer join, all the elements in the left source sequence are returned, even if no matching elements are in the right sequence. To perform a left outer join in LINQ, use the DefaultIfEmpty method in combination with a group join to specify a default right-side element to produce if a left-side element has no matches.


var leftOuterJoinQuery =
from category in categories
join prod in products on category.ID equals prod.CategoryID into prodGroup
from item in prodGroup.DefaultIfEmpty(new Product{Name = String.Empty, CategoryID = 0})
select new { CatName = category.Name, ProdName = item.Name };


Question: Given simple data structures of Person (Name) and Pet (Name, Owner), write a LINQ query that lists all the pets that each person owns.

This is a more complex example involving GROUP JOINS. The group join is useful for producing hierarchical data structures. It pairs each element from the first collection with a set of correlated elements from the second collection. GroupJoin has no direct equivalent in traditional relational database terms. However, this method does implement a superset of inner joins and left outer joins. Both of these operations can be written in terms of a grouped join.

So, our basic data structures can be represented as:


class Person
{
public string Name { get; set; }
}

class Pet
{
public string Name { get; set; }
public Person Owner { get; set; }
}

And some sample data might look like this


Person magnus = new Person { Name = "Hedlund, Magnus" };
Person terry = new Person { Name = "Adams, Terry" };
Person charlotte = new Person { Name = "Weiss, Charlotte" };

Pet barley = new Pet { Name = "Barley", Owner = terry };
Pet boots = new Pet { Name = "Boots", Owner = terry };
Pet whiskers = new Pet { Name = "Whiskers", Owner = charlotte };
Pet daisy = new Pet { Name = "Daisy", Owner = magnus };

List<Person> people = new List<Person> { magnus, terry, charlotte };
List<Pet> pets = new List<Pet> { barley, boots, whiskers, daisy };


Now we have two ways to implement the same LINQ query. Using the traditional syntax:


// Create a list where each element is an anonymous type
// that contains the person's first name and a collection of
// pets that are owned by them.
var query = from person in people
join pet in pets on person equals pet.Owner into gj
select new { OwnerName = person.Name, Pets = gj };


Using the Lambda syntax:


// Create a list where each element is an anonymous
// type that contains a person's name and
// a collection of names of the pets they own.
var query =
people.GroupJoin(pets,
person => person,
pet => pet.Owner,
(person, petCollection) =>
new
{
OwnerName = person.Name,
Pets = petCollection.Select(pet => pet.Name)
});


Now for debugging purposes, we can spew out the info as:


foreach (var v in query)
{
// Output the owner's name.
Console.WriteLine("{0}:", v.OwnerName);
// Output each of the owner's pet's names.
foreach (Pet pet in v.Pets)
Console.WriteLine(" {0}", pet.Name);
}


This should produce an output similar to the following.

/*
Hedlund, Magnus:
Daisy
Adams, Terry:
Barley
Boots
Weiss, Charlotte:
Whiskers
*/

1 comment:

  1. var query = from person in people
    join pet in pets on person equals pet.Owner into gj
    select new { OwnerName = person.Name, Pets = gj };

    in this block of code, why gj contains only list?? We join two table together, so where does person part go?? In my mind it should be something like Pets = gj.Pets ??

    ReplyDelete