二叉查找树学习笔记

二叉查找树如其名是一种主要用于查找的每个非叶子节点最多有两个子节点（左右子节点）的树，其特点是：

 a
b c

任何a, b, c三个节点（父节点和两个子节点，左右子节点可能只有一个或者都没有）满足a < b < c。对于相等的值使用节点计数+1或者节点中的链表等不同的方式来处理。 a < b < c这个条件乍一看和二分搜索binary_search很像，实际查找元素时也是类似的，时间复杂度在O(logn)。一般的二叉查找树（不考虑链表退化和平衡的处理）包含查找，创建树/增加节点，删除节点，前/中/后序遍历等操作，下面分别介绍。

节点数据结构

二叉查找树不一定是完全或者近似完全二叉树，所以不能直接用数组来存储，也就是大部分情况都是基于链表的方式，下面就是节点的结构定义：

// public class BinarySearchTree>

class Node {
  T value;
  Node leftChild;
  Node rightChild;

  // assumption: left < value < right

  Node(T value) {
    this.value = value;
  }
}

T是Java的泛型定义，类定义上满足T是可以比较的。节点要关联左右两个节点。最后是一个构造函数，方便创建无子树节点。

二叉查找树类定义

public class BinarySearchTree<T extends Comparable<T>> implements Iterable<T> {

  private Node root;
  
  public boolean contains(T x) { ... }
  public void insert(T x) { ... }
  public void remove(T x) { ... }
  public Iterator<T> iterator() { ... }

}

类只有一个成员变量，即根节点，四个方法分别是查找（是否存在），插入元素，删除元素，遍历（这里是中序遍历，结果为元素升序排列）。

查找

首先从最简单的查找开始。和二分查找类似，步骤是

判断当前节点是否和目标元素一致，一致就认为存在
如果目标元素小于当前节点，从当前节点的左子树中查找（递归）
否则（目标元素大于当前节点），从当前节点中的右子树中查找（递归）

实际实现中，可以将左右子节点是否为空放在递归步骤的第一步中，这样就可以减少一些空（null）判断。代码如下：

public boolean contains(T x) {
  return contains(x, root);
}

private boolean contains(T x, Node node) {
  if (node == null) return false;
  int relation = x.compareTo(node.value);
  if (relation == 0 /* equal */) return true;
  if (relation < 0 /* less than */) return contains(x, node.leftChild);
  return contains(x, node.rightChild);
}

插入元素

插入元素用的步骤和查找类似，但是找到后的处理步骤不同。以下是我实现时参考的步骤：

如果目标元素和当前节点一致，抛出异常：重复节点
如果目标元素小于当前节点，并且当前节点左节点不存在，创建一个新的节点链接到当前元素；否则插入元素到左子树中
否则（目标元素大于当前节点）在当前节点右节点不存在时，创建一个新的节点链接到当前元素；否则插入元素到右子树中

这里给出两种方式，第一种需要多次判断空(null)，但是相对清晰。第二种减少判断空(null)，相对难理解一些，应该就是教科书上的那种。

public void insert(T x) {
  if (root == null)
    root = new Node(x);
  else
    insert(x, root);
}

private void insert(T x, Node node) {
  int relation = x.compareTo(node.value);
  if (relation == 0 /* equal */)
    throw new IllegalArgumentException("duplicated element [" + x + "]");
  if (relation < 0 /* less than */) {
    if (node.leftChild == null) {
      node.leftChild = new Node(x);
    } else {
      insert(x, node.leftChild);
    }
  } else {
    // greater than
    if (node.rightChild == null) {
      node.rightChild = new Node(x);
    } else {
      insert(x, node.rightChild);
    }
  }
}

第二种方式

public void insert2(T x) {
  root = insert2(x, root);
}

private Node insert2(T x, Node node) {
  if (node == null) return new Node(x);
  int relation = x.compareTo(node.value);
  if (relation == 0 /* equal */)
    throw new IllegalArgumentException("duplicated element [" + x + "]");
  if (relation < 0 /* less than */) {
    node.leftChild = insert2(x, node.leftChild);
  } else {
    // greater than
    node.rightChild = insert2(x, node.rightChild);
  }
  // no node created, just return node itself
  return node;
}

删除元素

应该是实现二叉查找树中最难的部分，考虑如下二叉查找树：

删除分为三种情况：

删除类似1这种没有子树的节点，直接删除即可
删除类似6这种只有左子树或者只有右子树的节点，把这种节点的左子树或者右子树链接到他的父亲节点，从约束上左子树的所有元素都小于待删除节点的父亲节点，右子树的所有元素都大于待删除接节点的父亲节点
删除类似8这种左右子树都存在的情况，方案有把6移动到8位置，或者把9移动到8的位置，更一般的描述是使用当前节点的中序前继或者后继节点替换当前节点的值，并删除中序前继或者后继节点（递归）

对于最后一种情况，结合二叉查找树中序遍历的结果升序序列来看，就是就是把邻接当前节点值的节点替换当前节点。比如[1, 3, 4]删除3，其实就用4或者1替换3，再删除4或者1，最后变成[1, 4]。
因为分析比较复杂，代码也稍微复杂一点，注释相对多一些。

// position of child, for remove
enum ChildPostion {
  ROOT, // current node is root
  LEFT, // current node is left child of parent node
  RIGHT; // current node is right child of parent node
}

public void remove(T x) {
  if (root == null) throw new IllegalStateException("empty tree");
  remove(x, root, null, ChildPostion.ROOT);
}

private void remove(T x, Node node, Node parent, ChildPostion position) {
  if (node == null) return; // no such element, do nothing
  int relation = x.compareTo(node.value);
  if (relation < 0 /* less than */) {
    remove(x, node.leftChild, node, ChildPostion.LEFT);
  } else if (relation > 0 /* greater than */) {
    remove(x, node.rightChild, node, ChildPostion.RIGHT);
  } else {
    // current one is the node contains the element to be removed
    remove(node, parent, position);
  }
}

private void remove(Node node, Node parent, ChildPostion position) {
  if (node.leftChild == null) {
    if (node.rightChild == null) {
      // case 1: no child, just remove node itself
      connectNode(parent, position, null);
    } else {
      // case 2: only right child presents
      // connect parent to right child
      connectNode(parent, position, node.rightChild);
    }
  } else {
    // left child presents
    if (node.rightChild == null) {
      // case 2: only left child presents
      // connect parent to left child
      connectNode(parent, position, node.leftChild);
    } else {
      // case 3: left and right child both present
      // find in-order successor
      // node contains minimum value in right child tree
      Node min = node.rightChild;
      Node minParent = node;
      while (min.leftChild != null) {
        minParent = min;
        min = min.leftChild;
      }
      // replace value
      node.value = min.value;
      // remove in-order successor
      ChildPostion minPosition =
          minParent == node ? ChildPostion.RIGHT : ChildPostion.LEFT;
      remove(min, minParent, minPosition);
    }
  }
}

private void connectNode(Node parent, ChildPostion position, Node newChild) {
  switch (position) {
    case ROOT:
      root = newChild;
      break;
    case LEFT:
      parent.leftChild = newChild;
      break;
    case RIGHT:
      parent.rightChild = newChild;
      break;
    default:
      throw new IllegalArgumentException("unexpected child position ["
          + position + "]");
  }
}

说明一下实际实现中需要注意的一些东西，因为结构上的限制，当前节点不能直接获得父亲节点，所以删除节点时需要传入父亲节点和子节点位置（注意第三个重载的remove方法）。最后一个connectNode是实际的替换/删除/链接方法。对于只有一个节点（根）的情况，直接设置为null。重点查看第三个remove方法。
左右子树是否存在会有2×2共4中情况。两个都没有，直接让父亲节点链接null，即设置此节点为空。只有右子树，链接父亲节点到右子树。只有左子树，链接父亲节点到左子树。两者都有，这里选择了取中序后继节点，即右子树中最小的值，最小值节点的位置根据约束应该在左边，除了右子树只有一个节点的情况，所以需要判断右子树最小节点是不就是右子树节点，这里直接判断父亲节点是否就是当前节点也是一样的。

中序遍历

乍一下，貌似很简单，假如你用递归实现的话，但是退化为栈的话，就需要一点考虑了。
个人的思路是先把根节点压入栈，枚举时从栈中弹出元素，判断是否已经展开，如果已经展开就能直接返回元素，否则持续展开到有展开的元素为止。具体代码如下：

public Iterator<T> iterator() {
  return inorderIterator();
}

public Iterator<T> inorderIterator() {
  return new InOrderIterator(root);
}

// in-order iterator
class InOrderIterator implements Iterator<T> {

  class NodeWithStatus {

    Node node;
    boolean expanded;

    NodeWithStatus(Node node, boolean expanded) {
      this.node = node;
      this.expanded = expanded;
    }

    T getValue() {
      return node.value;
    }

    boolean isEmpty() {
      return node == null;
    }

  }

  private LinkedList<NodeWithStatus> stack;

  public InOrderIterator(Node node) {
    stack = new LinkedList<NodeWithStatus>();
    pushItem(unexpandedNode(root));
  }

  private void pushItem(NodeWithStatus item) {
    if (!item.isEmpty()) {
      stack.push(item);
    }
  }

  private NodeWithStatus unexpandedNode(Node node) {
    return new NodeWithStatus(node, false);
  }

  private NodeWithStatus expandedNode(Node node) {
    return new NodeWithStatus(node, true);
  }


  public boolean hasNext() {
    return !stack.isEmpty();
  }

  public T next() {
    if (!hasNext()) throw new IllegalStateException("no next element");
    NodeWithStatus item;
    while (!(item = stack.pop()).expanded) {
      Node node = item.node;
      pushItem(unexpandedNode(node.rightChild));
      pushItem(expandedNode(node));
      pushItem(unexpandedNode(node.leftChild));
    }
    return item.getValue();
  }

  public void remove() {
    throw new UnsupportedOperationException("unsupported operation");
  }

}

这里在原有节点基础上增加了是否展开的属性。

最后因为整体代码比较长，给一下GIST地址