Please navigate to the bottom of the page for Table of Contents

Wednesday, March 20, 2013

SQL self join or sub-query interview question (employee-manager salary)

One of my favorite interview questions that tips even seasoned SQL guys (maybe because it's too simple) is around querying data that involves a self join.

What is a self join?
A self join is a join in which a table is joined with itself, specially when the table has a foreign  key which references its own primary key. 

Question: Given an Employee table which has 3 fields - Id (Primary key), Salary and Manager Id, where manager id is the id of the employee that manages the current employee, find all employees that make more than their manager in terms of salary. Bonus: Write the table creation script.


Let's start by creating our table:

CREATE TABLE Employee (
    [Id] [int] NOT NULL,
    [ManagerId] [int] NOT NULL,
    [Salary] [money] NOT NULL,
 CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED
    (
        [Id] ASC
    )
)


This will create our table with Id as primary key. Now let's modify this table to add a foreign key betweek ManagerId and Id.

ALTER TABLE [dbo].[Employee]  WITH CHECK ADD  CONSTRAINT [FK_Employee_Employee] FOREIGN KEY([ManagerId])
REFERENCES [dbo].[Employee] ([Id])
GO

ALTER TABLE [dbo].[Employee] CHECK CONSTRAINT [FK_Employee_Employee]
GO


Now we come to the meat of the question. How do you find all employees that make more than their manager. There are 2 ways to do this - one, use a self join; and two; use a sub-query.

Self-join solution:
select e.*, m.Salary as "Manager Salary"
from Employee e
join Employee m on e.ManagerId = m.Id
where e.Salary > m.Salary


Here you are "joining" Employee table to itself on the FK relation of ManagerId and then querying on salary.

Sub-query solution:
select *
from Employee e
where e.Salary >
    (select m.Salary from Employee m
    where e.ManagerId = m.Id)


This is most common solution I get from candidates. Which is fine, but knowing alternate approaches to finding a solution is always very helpful.

Happy coding till next time!

10 comments:

  1. In the table creation script, I wouldn't do it like this:
    [ManagerId] [int] NOT NULL
    When you get to the top manager, they don't have a manager themselves so you need to let the field be null for that one person at least.

    ReplyDelete
  2. Both options are fine. Since the top guy will never make the select clause (his salary cannot be more than his salary) either of the assumptions work. But yes, from an interview perspective, I like it if the candidate can think of such questions.

    ReplyDelete
  3. how can i find the managers of managers?

    ReplyDelete
  4. You will need to use WITH RECURSIVE

    ReplyDelete
  5. how to know one employee working more than one department in sql ?

    ReplyDelete
  6. This is one of the cult game now, a lot of people enjoy playing them . Also you can refer to the game :
    animal jam 2 | five nights at freddys 2 | hotmail login

    ReplyDelete
  7. wow...nice blog, very helpful information. Thanks for sharing.

    NO.1 API DEVELOPMENT SERVICES | MASSIL TECHNOLOGIES

    ReplyDelete
  8. Hello There,


    Hip Hip Hooray! I was always told that slightly slow in the head, a slow learner. Not anymore! It’s like you have my back. I can’t tell you how much I’ve learnt here and how easily! Thank you for blessing me with this effortlessly ingestible digestible content.
    I don't often ask questions here, but this one (especially being a regex thing) has me a bit stumped.

    I'm trying to piece together a regex that allows characters, digits and a couple of other characters.
    Essentially:
    ^[a-zA-Z0-9:,';\/]+$

    Also it can't start or end with a slash...so the above becomes:
    ^[^\/][a-zA-Z0-9:,';\/]+[^\/]$

    So this gives a match:
    abcD,xy/z
    and this doesn't
    /abcD,xy/z

    Anyways great write up, your efforts are much appreciated.

    Thanks & Regards,
    Morgan

    ReplyDelete
  9. Hello,


    I love all the posts, I really enjoyed.
    I would like more information about this, because it is very nice., Thanks for sharing.

    i reviewed few articles on improving group by performance by adding indexes for columns in group by clause for single table.

    Please advise how to improve performance in following scenario where i have to join multiple tables.

    Should i add the group by columns to the clustered index or create a new nonclustered index on all the group by columns for each table

    table?

    I am so grateful for your blog. Really looking
    forward to read more.


    Thanks and Regards




    ReplyDelete