HashMap的底层原理

2023-02-05 01 Java java 0 评论阅读时长: 7(分)

1. 简介

HashMap是java后端面试必问内容，其中一个高频问题就是：HashMap的底层原理是怎样的？

本文介绍HashMap的原理，包括数据结构、存储机制、hashCode方法。

2. HashMap原理总结

HashMap其实就是一个大的数组，将key的hashCode作为数组的下标，将value作为数组的值，如果key的hashCode重复（即：数组下标重复）
，则将新的key和旧的key放到链表中。

如果链表长度大于等于8（且数组大小大于等于64），则将链表改为红黑树（提高查询的效率）

如果红黑树节点个数小于6，则将红黑树转化为链表。

3. 数据结构

数组和链表

数据结构中有数组和链表来实现对数据的存储，但这两者个有利弊。
哈希表

哈希表：综合数组和链表的特性，查找（寻址）容易，插入删除容易，占空间中等的数据结构。

哈希表有多种不同的实现方法，HashMap则使用的是拉链法，也叫链地址法。

例如：

哈希表的数组初始化长度为16，每个元素存储的是一个链表的头结点，这些元素按照hash(key)% len获得，也就是元素key的哈希值对数组长度区模得到；

12 % 16=12；28 % 16=12；108 % 16=12；140 % 16=12

所以： 12， 28， 108， 140都存储在数组下标为12的位置
红黑树

4. 存储机制

键值对的数据

每个键值对都是一个Node<K,V>对象，它实现了Map.Entry<K,V>接口。Node<K,V>有四个属性：hash、key、value、next(下一个结点)

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
    }
    // 其他代码
}

既然是线性数组，为什么能随机存取？这里HashMap用来一个小算法，大致实现如下：

// 存储时:
int hash = key.hashCode();
int index = hash % Entry[].length;
Entry[index] = value;
 
// 取值时:
int hash = key.hashCode();
int index = hash % Entry[].length;
return Entry[index];

5. put方法

哈希冲突：若两个key通过hash(key) % Entry[].length得到的index相同，怎么处理？HashMap用的是链表，Entry类里面有一个next属性，指向下一个Entry.

比如：

第一个键值对A进来，通过计算key的hash得到的index=0，记做：Entry[0] = A
又进来一个键值对B，通过计算key的hash得到的index=0， HashMap会这样做：B.next = A, Entry[0] = B
又进来C，index也等于0，那么C.next = B,Entry[0] = C；

这样我们发现index = 0的地方其实利用单链表的头插法存储了A，B，C三个键值对，他们通过next属性连接在一起，所以会发生覆盖的情况，数组中存储的总是最后插入的元素

//部分代码
transient Node<K,V>[] table;
transient int size;
static final int TREEIFY_THRESHOLD = 8;
int threshold;
 
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
 
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // 若链表的节点数大于等于8，则转化为红黑树
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // 如果key在链表中已存在，则替换为新value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

6. get方法

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    //先定位到数组元素，再遍历该元素处的链表
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

7. null key的存取

null key总是存放在Entry[]数组的第一个元素

private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}
 
private V getForNullKey() {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null; 
}

确定数组

index: hashcode % table.length 取模

HashMap存取时，都需要计算当前key对应Entry[]数组哪个元素，即计算数组下标，算法如下：
1
2
3
4
// Returns index for hash code h.
static int indexFor(int h, int length) {
return h & (length-1);
}
换位取并，作用上相当于取模mod获取取余%;这意味着数组下标相同，并不表示hashcode相同。

table初始大小

public HashMap(int initialCapacity, float loadFactor) {
    .....
    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;
    this.loadFactor = loadFactor;
    threshold = (int)(capacity * loadFactor);
    table = new Entry[capacity];
    init();
}

注意：table初始大小并不是构造函数中的initialCapacity, 而是 >= initialCapacity的2的n次幂

8. hashCode方法

String#hashCode

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
 
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

为什么乘 31 呢？

选31是因为它是一个奇素（质）数，这里有两层意思：奇数 && 素数

什么是奇数，偶数不行？

因为如果乘子是偶数，并且当乘法溢出的时候(数太大，int装不下)，相当于在做位移运算，有信息就损失了

比如说只给2bit空间，二进制的10，乘以2相当于左移1位，10(bin) << 1 = 00,1就损失了。
为什么是31

h * 31 == (h << 5) - h;现代虚拟机会自动做这样的优化，计算快。

再反观其他数：

h * 7 == (h << 3) - h; //太小，容易hash冲突

h * 15 == (h << 4) - h; //15不是素数

h*31 == (h<<5)-h; // 31既是素数又不大不小刚刚好

h*63 == (h<<6)-h; // 63不是素数

h*127 == (h<<7)-h; // 太大了，乘不到几下就溢出了

实例追踪：

"abc".hashCode()
hash为：0
value为：[a, b, c]

第一次循环：h = 0 + 97 = 97

第二次循环：h = 31 * 97 + 98 = 3105

第三次循环：h = 3105 * 31 + 99 = 9635

本文链接： http://xiaoyuge.work/hashmap/index.html

版权声明： 本博客所有文章除特别声明外，均采用 CC BY 4.0 CN协议许可协议。转载请注明出处！本博客如有侵权，请联系删除！

小余哥xiaoyouge

分享一些技术教程与实践，以及日常踩坑记录。