HashMap底层源码解析

2021/7/7 14:34:51

编程Tag： node 源码 hash 解析 value key null next hashmap

本文主要是介绍HashMap底层源码解析，对大家解决编程问题具有一定的参考价值，需要的程序猿们随着小编来一起学习吧！

HashMap作为java中使用频率非常高的集合之一，一直是面试的高频问题，接下来一块学习下HashMap的底层原理，以及jdk1.8版本都做了哪些优化。

一、jdk1.7和1.8中，HashMap的主要区别是什么？

1.底层架构有变化

jdk1.7中，HashMap 是以数组加链表的形式组成的

jdk1.8中，HashMap 是以数组加链表或红黑树的形式组成的，当链表长度大于8并且容量大于64时，链表会转换为红黑树。

数组中的元素在源码中是以node形式定义的，具体源码如下：

/**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

jdk1.8引入红黑树优化，主要是因为减少hash碰撞导致的链表过长问题，链表过长意味着在发生hash碰撞时，会影响HashMap的性能，而红黑树有快速增删改查的特点，这样可以解决链表过长导致的性能问题。

设置链表长度大于8才转为红黑树，是因为红黑树每次插入元素时都需要进行旋转保证平衡，为了在碰撞情况发生小的情况下保证性能，所以才会链表长度大于8才转为红黑树。

2.扩容方式有变化

扩容的时候，jdk1.7是每次重新拿key做hash运算，而jdk1.8中是通过高位运算来完成的，性能更高，可以重点关注下面的resize方法

3.链表元素插入方式有变化

jdk1.7中，链表插入元素时头插法，jdk1.8中采用的尾插法，这样也可以解决jdk1.7中的死循环问题，文章中有具体场景验证

二、HashMap源码解析

1.HashMap中有以下几个属性

    //初始化长度，默认是16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    //最大长度
    static final int MAXIMUM_CAPACITY = 1 << 30;

    //加载因子
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    //链表长度大于多少转成红黑树
    static final int TREEIFY_THRESHOLD = 8;

    //红黑树元素个数小于多少转成链表
    static final int UNTREEIFY_THRESHOLD = 6;

加载因子也叫扩容因子或负载因子，用来判断什么时候进行扩容的，如果创建集合时，不设置容量，那么默认容量是16，在16*0.75=12个元素时，HashMap会进行扩容

那加载因子为什么是 0.75 而不是 0.5 或者 1.0 呢？

这其实是出于容量和性能之间平衡的结果：

当加载因子设置比较大的时候，扩容发生的概率低，占用空间小，但是hash冲突的几率也会大大提升，链表或红黑树就有更多的元素，在操作链表或红黑树时，性能会更低
而当加载因子值比较小的时候，扩容发生的概率高，占用空间大，发生hash冲突的几率就会比较小，操作新能会更高

所以综合了以上情况就选择0.75 作为加载因子。

2.查询方法

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
             //判断第一个元素是否是要查询的元素，总是先判断第一个元素
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
             //开始判断下一个节点
            if ((e = first.next) != null) {
                 //如果当前是红黑树结构，那么使用getTreeNode方法获取元素
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                 //如果不是红黑树结构即是链表，循环遍历链表
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

3.新增

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        //哈希表为空，则初始化
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        //如果当前数组下标为空，初始化node
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            //如果key已经存在，直接覆盖
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            //如果是红黑树结构， putTreeVal方法插入数据
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                //是链表结构，循环插入数据
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        //将链表转为红黑树结构
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                     //如果key已经存在，直接覆盖
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        //是否需要扩容
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

4.扩容方法

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // 超过最大值就不再扩容了
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // 扩大容量为当前容量的两倍，但不能超过 MAXIMUM_CAPACITY
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // 当前数组没有数据，使用初始化的值
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // 如果初始化的值为 0，则使用默认的初始化容量
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // 如果新的容量等于 0
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr; 
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    // 开始扩容，将新的容量赋值给 table
    table = newTab;
    // 原数据不为空，将原数据复制到新 table 中
    if (oldTab != null) {
        // 根据容量循环数组，复制非空元素到新 table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // 如果链表只有一个，则进行直接赋值
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // 红黑树相关的操作
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // 链表复制，JDK 1.8 扩容优化部分
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // 原索引
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // 原索引 + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // 将原索引放到哈希桶中
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // 将原索引 + oldCap 放到哈希桶中
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

从以上源码可以看出，JDK 1.8 在扩容时并没有像 JDK 1.7 那样，重新计算每个元素的哈希值，而是通过高位运算（e.hash & oldCap）来确定元素是否需要移动

5.jdk1.7中的死循环问题

先看下jdk1.7中的源码，当HashMap发生扩容时，会创建一个更大的数组，通过transfer方法来移动元素

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

假如HashMap已经达到扩容临界点，并且同事有两个线程同时插入元素a、b，好巧不巧这两个元素都对应同一个hash，该hash桶中已有元素a，这两个线程都是执行resize并创建新的数组，有可能会触发死循环问题

主要造成问题原因的代码是Entry<K,V> next = e.next;

这篇关于HashMap底层源码解析的文章就介绍到这儿，希望我们推荐的文章对大家有所帮助，也希望大家多多支持为之网！

HashMap底层源码解析

相关编程文章