Please navigate to the bottom of the page for Table of Contents

Tuesday, May 17, 2011

Lost in a Forest of Trees

Trees are another basic data structure within programming. Similar to a linked list, a tree is a collection of linked nodes but child nodes in a tree can have more than one child node. There are many types of trees but the same basic concept remains true for all trees; they are data structures that are used to store data in a hierarchical manor.

This doesn’t mean that the data necessarily has to be hierarchical data to be stored in a tree. Sometimes you’ll want to store linear data in a hierarchical form so that it’s much quicker to search and find items within a large collection.

There are different tree structures for different needs. In this post we’ll explore binary trees and see how they can be beneficial.

Binary Tree

A binary tree is one of the more simpler tree structures. It is a tree where each node can have a left and right node but no more than that. A simple binary tree would look something like this. This is the base for which all binary trees are built upon.

binary-tree

Binary Search Tree

A binary search tree is very similar to a binary tree except that a binary search tree has an extra rule and is used for a different purpose. The same rule applies in that each node can still only have a left and a right child but the value of left node must be less than the value of right node for any given parent node.

Given this simple extra rule, this makes this tree a great data structure for storing linear data in a hierarchical form. Starting at the root node you simply determine is this the value you wanted? If so, you found what you are looking for. If not, is the value less than the current value you are looking at? If so, move to the left node. If it’s greater that that value, move down the right side. Then you’d keep using this logic until you find the value you want. This is much more performant than searching through an array of linear data because in an array, you potentially have to touch every element to find what you are looking for. With a binary search tree, you follow one basic rule and you only have to hit the nodes that are in the logical path of your search.

The one down fall to this is that there is overhead of adding elements as you have to perform the same logic to insert an item at the appropriate leaf.

binary-search-tree

Self-balancing Binary Search Trees

A self-balancing binary search tree is also a binary search tree that keeps height of the nodes equal upon insertion and deletion. Typically a self-balancing binary search tree will be used when fast retrieval of data is the ultimate priority as there is a performance cost when inserting and deleting nodes as the tree must be rebalanced after each operation. Two of the more popular self-balancing binary search trees are red-black trees and ACL trees.

Now that we have a basic understanding of tree data structures and binary trees, in my next post I’ll go over how to implement a binary search tree. We’ll cover insert, delete and search operations which can make for some good interview questions.

3 comments:

  1. "This is much more performant than searching through an array of linear data because in an array, you potentially have to touch every element to find what you are looking for."

    Is this true?

    ReplyDelete
  2. Going with the definition the author provides, no, it's not. He states the definition of a binary search tree (BST) incorrectly, and this article is wrong.

    A BST is a binary tree (a tree in which each node has at most two children, a left node and a right node) with the constraints that a parent node's left child must always have a value smaller than the parent, and the right child must have a value larger than the parent, not just that the left child is smaller than the right child. http://www.technicallyidle.com/wp-content/uploads/2011/02/nodes-in-binary-search-tree.png

    Note that the binary search tree doesn't have to be balanced; one side can be much larger than the other, as long as ALL nodes to the "left" of any given node are smaller than it, and ALL nodes to the "right" are larger.

    With the added constraint of being a BALANCED binary search tree (there are a few ways of doing this), the asymptotic time complexity of searching for any given data value in the tree becomes O(log(n)) rather than the linear O(n).

    If anyone in an interview used this article as a reference for what a binary search tree is, I sure wouldn't hire them.

    ReplyDelete
  3. To provide an example, using the "binary search tree" image posted in the article (the one with 8 as the root node):

    Say you were looking for a 4. You start searching at the root node, 8. 8 isn't 4, so you compare the values. Since a 4 is smaller than an 8, you'd move to the left node, the 3. Again, 3 isn't 4. Since 4 is larger than 3, you'd move to the right child of 3, which doesn't exist, so the search is done: there is no 4 in the tree.

    See a problem?

    ReplyDelete