
HashMap Source Code Analysis

Hash Tables

A hash table is a data structure for finding an element quickly. In algorithm problems it is often used to cut the cost of looking up a specific element from O(n) down to O(1). It only fits point lookups, though; for range queries it is a poor choice.
The weakness of a hash table is hash collisions: multiple elements can land in the same bucket. The common fix is to hang a secondary structure off each bucket, such as an array, a linked list, or a tree.
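As a sketch of the chaining idea (a hypothetical toy class, not the JDK implementation), each bucket can simply hold a linked list, and colliding keys get appended to the chain:

```java
import java.util.LinkedList;

// Minimal separate-chaining hash table (toy sketch, not JDK code).
class ChainedHashTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int indexFor(K key) {
        // mask off the sign bit so the index is always non-negative
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    void put(K key, V value) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) { e.value = value; return; } // overwrite
        }
        buckets[indexFor(key)].add(new Entry<>(key, value)); // collision: append to chain
    }

    V get(K key) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key absent
    }
}
```

With a tiny bucket count every insertion is likely to collide, yet lookups still work because each chain is scanned linearly.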

HashMap in JDK 1.8

The main changes to HashMap in JDK 1.8 are:

  • The underlying structure is now a hash table built from an array plus linked lists/red-black trees, which guards against the performance (and DoS) risk of a bucket degenerating into one long linked list
  • During resizing, entries are now transferred in order (tail insertion instead of JDK 1.7's head insertion), which fixes the infinite-loop bug a concurrent resize could trigger in 1.7; HashMap is still not thread-safe, however

Key Concepts

Load factor: load factor

Quoting the Javadoc of the JDK's HashMap:

* <p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>. The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created. The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased. When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.
*
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
  • The load factor measures how full the hash table is allowed to get; once the number of entries exceeds capacity * load factor, resize() is triggered
  • The default load factor of 0.75 is, in the general case, a good trade-off between time and space costs

Capacity: capacity

capacity is the number of buckets in the hash table; DEFAULT_INITIAL_CAPACITY is 16. The capacity must always be a power of two: even if the value passed to the constructor is not one, it is rounded up to the next power of two.

Resize threshold: threshold

threshold is the maximum number of entries the current table may hold, i.e. capacity * load factor. Once the entry count exceeds it, the table is resized.
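A quick sketch of the arithmetic, using only the default constants named above: with capacity 16 and load factor 0.75, the table resizes when the 13th entry arrives, since 16 * 0.75 = 12.

```java
// Sketch: how threshold follows from capacity and load factor.
public class ThresholdDemo {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24 -- threshold doubles with capacity
    }
}
```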

Treeify threshold: TREEIFY_THRESHOLD

/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;

The bin size at which a bucket's linked list is upgraded to a red-black tree. It is not the only condition, though: MIN_TREEIFY_CAPACITY matters as well. Only when both are satisfied does the list actually become a tree.
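The two-condition rule can be sketched as a hypothetical helper (it mirrors the check treeifyBin() performs, but the helper itself is not JDK code):

```java
public class TreeifyRule {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Hypothetical helper mirroring treeifyBin(): if the table is still
    // below MIN_TREEIFY_CAPACITY, HashMap resizes instead of treeifying.
    static boolean shouldTreeify(int binCount, int tableCapacity) {
        return binCount >= TREEIFY_THRESHOLD && tableCapacity >= MIN_TREEIFY_CAPACITY;
    }

    public static void main(String[] args) {
        System.out.println(shouldTreeify(8, 16)); // false: table too small, resize instead
        System.out.println(shouldTreeify(8, 64)); // true: bin converts to a red-black tree
    }
}
```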

Untreeify threshold: UNTREEIFY_THRESHOLD

/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;

Minimum table capacity for treeification: MIN_TREEIFY_CAPACITY

/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
static final int MIN_TREEIFY_CAPACITY = 64;

The smallest table capacity at which treeification may start.

Key Procedures

resize

We saw above that capacity, load factor, and threshold all exist to drive the resize operation, so why resize at all?
For efficiency: keeping the table at most 0.75 full keeps buckets short, so the probability of long collision chains stays very low. (Note that the Poisson-distribution analysis in the HashMap comments is actually about expected bin sizes and the choice of TREEIFY_THRESHOLD = 8; the Javadoc describes 0.75 itself as a time/space trade-off.)

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // preserve order: tail insertion keeps relative order within a
                    // bucket; this avoids the infinite loop JDK 1.7 could hit during
                    // a concurrent resize, but HashMap is still not thread-safe
                    // "lo" list: entries that stay at index j
                    Node<K,V> loHead = null, loTail = null;
                    // "hi" list: entries that move to index j + oldCap
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
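The (e.hash & oldCap) == 0 test above can be illustrated in isolation; newIndex below is a hypothetical helper, not JDK code. For oldCap = 16 (binary 10000), that single bit decides whether an entry stays at index j (the "lo" list) or moves to j + oldCap (the "hi" list) in the doubled table:

```java
public class SplitDemo {
    // returns the entry's index in the new table of size oldCap * 2
    static int newIndex(int hash, int oldIndex, int oldCap) {
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // hash 5  = 0b00101: bit 4 clear, entry keeps its old index
        System.out.println(newIndex(5, 5, oldCap));  // 5
        // hash 21 = 0b10101: bit 4 set, entry moves up by oldCap
        System.out.println(newIndex(21, 5, oldCap)); // 21
        // same answer as recomputing against the doubled mask
        System.out.println(21 & (2 * oldCap - 1));   // 21
    }
}
```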

putVal

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // if the table is null or empty, create it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // if the target bucket is empty, place the new node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // otherwise the bucket already has at least one node
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // the bucket is a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // the bucket is a linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // bin is long enough to treeify; treeifyBin() converts it to
                    // a tree, or simply resizes if the table is still below
                    // MIN_TREEIFY_CAPACITY
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
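A small usage sketch of the behavior putVal implements: put() returns the previous value when the key already exists (the "existing mapping" branch), and putIfAbsent() exercises the onlyIfAbsent flag:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1));         // null: no previous mapping
        System.out.println(map.put("k", 2));         // 1: old value returned, then overwritten
        System.out.println(map.putIfAbsent("k", 9)); // 2: onlyIfAbsent leaves the value alone
        System.out.println(map.get("k"));            // 2
    }
}
```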

Why must the number of buckets be a power of two?

1. Even if the initial capacity passed to the constructor is not a power of two (2^n), the initialization logic forces it up to one.
See the tableSizeFor() method.
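The rounding-up trick can be sketched as follows (modeled on the JDK 8 tableSizeFor source): smear the highest set bit of cap - 1 into every lower bit, then add one, producing the smallest power of two >= cap.

```java
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1; // subtract first so an exact power of two maps to itself
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16; // n is now 0b0111...1 at and below its highest original bit
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```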

2. It lets an element locate its bucket quickly, and spreads elements evenly across the table, reducing the chance of collisions.
(1) Fast bucket lookup
Every hash table implementation faces the question: how do you map a hash value onto the buckets? In Java's case, how do you map an arbitrary int onto a bucket index?
The obvious answer is the modulo operator, int % n, but it has two drawbacks:

  • a negative number's remainder is negative, so the hash must first be made non-negative;
  • it performs slightly worse than HashMap's approach:

1.7: indexFor(int h, int length) {
    return h & (length - 1); // JDK 8 uses this same expression inline
}

(2) Why does this spread elements evenly and reduce collisions?
length - 1 equals 2^n - 1, whose binary form is all ones (1111...), so the bucket index computed by ANDing it with the hash depends entirely on the hash itself. As long as the hash function is well distributed, the table's elements will be too.
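A sketch comparing the two approaches (indexFor is just the one-liner quoted above, wrapped for testing): h & (length - 1) matches h % length for non-negative h when length is a power of two, and it also yields a valid index for negative hashes:

```java
public class IndexDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1); // length must be a power of two
    }

    public static void main(String[] args) {
        System.out.println(indexFor(37, 16));  // 5, same as 37 % 16
        System.out.println(indexFor(-37, 16)); // 11, still a valid bucket index
        System.out.println(-37 % 16);          // -5: plain % gives an invalid index
    }
}
```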

3. It simplifies resizing
After a resize, every element's bucket position must be recomputed. As long as the capacity is a power of two, each element either keeps its old index or moves to old index + old capacity.

HashMap in JDK 1.7

Key Concepts

Source Code Analysis

Drawbacks

Miscellaneous Notes

The key of a HashMap may be null, whereas ConcurrentHashMap rejects null keys (and null values).
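A quick demonstration of the difference: HashMap maps a null key to hash 0, while ConcurrentHashMap throws immediately.

```java
import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> hm = new HashMap<>();
        hm.put(null, 1); // fine: HashMap treats a null key as hash 0
        System.out.println(hm.get(null)); // 1

        ConcurrentHashMap<String, Integer> chm = new ConcurrentHashMap<>();
        try {
            chm.put(null, 1); // throws NullPointerException
        } catch (NullPointerException e) {
            System.out.println("ConcurrentHashMap rejects null keys");
        }
    }
}
```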


Reference: https://coolshell.cn/articles/9606.html