Last Updated: 2023-12-30 12:38:13 Saturday
-- TOC --
unordered_map容器,就是内部对象没有顺序的map容器,其查找性能要比map容器要快,虽然遍历顺序不稳定,但它更加常用。C++对其的实现,是基于Hash表。本文总结一些日常编程遇到的操作unordered_map容器的代码写法。
对于unordered_map对象,在使用[]
操作符按key访问value的时候,key可以不存在,此时对应此key的value会被创建,是一个空对象(value在访问时创建的)。使用at
接口时,如果key不存在,会throw。
测试代码:
#include <iostream>
#include <unordered_map>
using namespace std;
int main() {
unordered_map<int,int> ii {{1,2},{3,4},{5,6},{7,8}};
cout << ii.contains(2) << endl; // 0, not exist
cout << ii[2] << endl; // 0, create at access
cout << ii.contains(2) << endl; // 1, exist
try{
cout << ii.at(9) << endl; // throw std::out_of_range
}catch(out_of_range&){
cerr << __FILE__ << ":" << __LINE__ << ", out of range" << endl;
}
return 0;
}
Unlike with sequential containers, operator[] won’t cause undefined behavior if the key doesn’t exist; instead, it will (silently) default construct a Value and insert the corresponding key/value pair into the map, even if you only intended to perform a read.
从C++20开始,可以使用
contains
成员接口判断key是否存在。
这个特性支持了一种很nice的代码风格:
unordered_map<string,size_t> wneeds;
for(auto &w: words)
wneeds[w] += 1; // directly +1 to count
前一段的测试代码,说明了value对象在通过create by access机制创建的时候,需要有zero-parameter constructor,否则代码会异常。
测试代码:
#include <unordered_map>
using namespace std;
struct xyz{
int a;
xyz(int a): a{a} {}
};
int main() {
unordered_map<int,xyz> ixyz;
ixyz[1]; // call xyz's zero-parameter constructor
return 0;
}
用clang++编译,它给出的错误信息比较容易看明白:
$ clang++ test.cpp
In file included from test.cpp:1:
In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/unordered_map:46:
In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/hashtable.h:35:
In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/hashtable_policy.h:34:
/usr/bin/../lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/tuple:1824:2: error: no matching constructor for initialization of 'xyz'
second(std::forward<_Args2>(std::get<_Indexes2>(__tuple2))...)
...
error: no matching constructor for initialization of 'xyz'
使用[]
赋值的效果是覆盖原value,若key不存在,则是用默认值创建一个新元素,前文已经介绍完毕。而使用insert
或用emplace
接口插入entry的时候,如果key值已经在unordered_map容器中存在,则无效,操作不会成功,同时也没有异常!
测试代码:
#include <iostream>
#include <unordered_map>
using namespace std;
int main(void) {
unordered_map<int,int> ii;
ii[1] = 1; // init to 1
ii.insert({1,2}); // fail silently
cout << ii[1] << endl; // 1
ii.emplace(1,3); // fail silently
cout << ii[1] << endl; // 1
ii[1] = 9;
cout << ii[1] << endl; // update to 9
return 0;
}
如果使用自定义结构体类型作为unordered_map的key,此时需要自定义一个hash函数对象和此类型的operator==。注意:这两个接口的signature都有const属性,都是accessor。
#include <iostream>
#include <unordered_map>
using namespace std;
struct ball{
string name;
float price;
bool operator ==(const ball &b) const noexcept{
return name==b.name && price==b.price;
}
};
struct ball_hash{
size_t operator ()(const ball &b) const noexcept{
size_t h1 {hash<string>{}(b.name)};
size_t h2 {hash<float>{}(b.price)};
return h1^h2;
}
};
int main(){
unordered_map<ball,string,ball_hash> udict;
udict.insert({{"aa",1.11}, "bob"});
udict.insert({{"bb",2.22}, "tom"});
udict.insert({{"aa",3.33}, "cat"});
udict[{"cc",4.5}] = "fish"; // it's ok
for(auto &[k,v]: udict)
cout << k.name << " " << k.price << " " << v << endl;
return 0;
}
用指针做key是个非常不错的选择!这样就避免了所有这些麻烦事儿!
通过iterator或者range-based loop,都可以实现遍历unordered_map对象,后者更优雅。
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;
int main(void) {
unordered_map<int,string> umis;
umis[0] = "0000";
umis[1] = "1111";
umis[2] = "2222";
umis[3] = "3333";
// traverse by iterator
for(auto it{umis.begin()}; it!=umis.end(); ++it)
cout << it->first << ":" << it->second << endl;
// traverse in range-based loop
for(const auto &[k,v]: umis)
cout << k << ":" << v << endl;
return 0;
}
erase成员接口返回下一个iterator,注意:下一个iterator有可能等于end。示例:
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;
int main(void) {
unordered_map<int,string> umis;
umis[0] = "0000";
umis[1] = "1111";
umis[2] = "2222";
umis[3] = "3333";
for(auto it{umis.begin()}; it!=umis.end();){
if(it->first == 1){
it = umis.erase(it); // delete it, move it to next valid entry
continue;
}
cout << it->first << ":" << it->second << endl;
++it;
}
for(const auto &[k,v]: umis)
cout << k << ":" << v << endl;
return 0;
}
Bucket表示unordered_map中的一个位置,如果hash发生冲突,一个bucket内会存放多个元素(用链表或红黑树实现)。
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;
int main(void) {
unordered_map<string,int> ump;
for(int i=0; i<24; ++i)
ump[to_string(i)] = i;
cout << ump.bucket_count() << endl;
for(int i=0; i<ump.bucket_count(); ++i)
cout << i << ":" << ump.bucket_size(i) << endl;
cout << ump.max_bucket_count() << endl;
return 0;
}
输出:
29
0:0
1:0
2:0
3:1
4:0
5:2
6:0
7:2
8:1
9:0
10:2
11:1
12:1
13:0
14:0
15:0
16:1
17:1
18:1
19:0
20:1
21:0
22:2
23:4
24:2
25:0
26:2
27:0
28:0
164703072086692425
24个元素对应29个bucket,23号bucket内,挤了4个元素!好些个bucket都是空的,这就是hash的效果,冲突了。
这是unordered_map的负载因子。影响Hash表性能的重要因素是conflict,越多的conflict,性能就越差。load factor = size/bucket_count
,这个比值越小,发生conflict的概率较小,性能就越好,但内存占用会更多。
默认的load factor = 1.0
,可以设置,类型是float。从提高性能的角度看,调小这个值,性能越好,内存占用越多。
下面是测试代码:
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;
int main() {
unordered_map<int,int> ii;
ii.max_load_factor(0.618); // set
cout << 0 << ", load factor:" << ii.load_factor()\ // read
<< ", size:" << ii.size()\
<< ", bucket:" << ii.bucket_count() << endl;
for(int i{1}; i<64; ++i){
ii[i] = i;
cout << i << ", load factor:" << ii.load_factor()\
<< ", size:" << ii.size()\
<< ", bucket:" << ii.bucket_count() << endl;
}
return 0;
}
输出:
0, load factor:0, size:0, bucket:1
1, load factor:0.0526316, size:1, bucket:19
2, load factor:0.105263, size:2, bucket:19
3, load factor:0.157895, size:3, bucket:19
4, load factor:0.210526, size:4, bucket:19
5, load factor:0.263158, size:5, bucket:19
6, load factor:0.315789, size:6, bucket:19
7, load factor:0.368421, size:7, bucket:19
8, load factor:0.421053, size:8, bucket:19
9, load factor:0.473684, size:9, bucket:19
10, load factor:0.526316, size:10, bucket:19
11, load factor:0.578947, size:11, bucket:19
12, load factor:0.292683, size:12, bucket:41
13, load factor:0.317073, size:13, bucket:41
14, load factor:0.341463, size:14, bucket:41
15, load factor:0.365854, size:15, bucket:41
16, load factor:0.390244, size:16, bucket:41
17, load factor:0.414634, size:17, bucket:41
18, load factor:0.439024, size:18, bucket:41
19, load factor:0.463415, size:19, bucket:41
20, load factor:0.487805, size:20, bucket:41
21, load factor:0.512195, size:21, bucket:41
22, load factor:0.536585, size:22, bucket:41
23, load factor:0.560976, size:23, bucket:41
24, load factor:0.585366, size:24, bucket:41
25, load factor:0.609756, size:25, bucket:41
26, load factor:0.313253, size:26, bucket:83
27, load factor:0.325301, size:27, bucket:83
28, load factor:0.337349, size:28, bucket:83
29, load factor:0.349398, size:29, bucket:83
30, load factor:0.361446, size:30, bucket:83
31, load factor:0.373494, size:31, bucket:83
32, load factor:0.385542, size:32, bucket:83
33, load factor:0.39759, size:33, bucket:83
34, load factor:0.409639, size:34, bucket:83
35, load factor:0.421687, size:35, bucket:83
36, load factor:0.433735, size:36, bucket:83
37, load factor:0.445783, size:37, bucket:83
38, load factor:0.457831, size:38, bucket:83
39, load factor:0.46988, size:39, bucket:83
40, load factor:0.481928, size:40, bucket:83
41, load factor:0.493976, size:41, bucket:83
42, load factor:0.506024, size:42, bucket:83
43, load factor:0.518072, size:43, bucket:83
44, load factor:0.53012, size:44, bucket:83
45, load factor:0.542169, size:45, bucket:83
46, load factor:0.554217, size:46, bucket:83
47, load factor:0.566265, size:47, bucket:83
48, load factor:0.578313, size:48, bucket:83
49, load factor:0.590361, size:49, bucket:83
50, load factor:0.60241, size:50, bucket:83
51, load factor:0.614458, size:51, bucket:83
52, load factor:0.311377, size:52, bucket:167
53, load factor:0.317365, size:53, bucket:167
54, load factor:0.323353, size:54, bucket:167
55, load factor:0.329341, size:55, bucket:167
56, load factor:0.335329, size:56, bucket:167
57, load factor:0.341317, size:57, bucket:167
58, load factor:0.347305, size:58, bucket:167
59, load factor:0.353293, size:59, bucket:167
60, load factor:0.359281, size:60, bucket:167
61, load factor:0.365269, size:61, bucket:167
62, load factor:0.371257, size:62, bucket:167
63, load factor:0.377246, size:63, bucket:167
每当load factor突破0.618时,bucket都会自动增加,降低load factor。
对于unordered_map对象来说,每当load factor突破max_load_factor时,bucket都会自动增加到下一个档位。此时,会有一个看不见的rehash的过程,即将所有entry重新hash到新申请的更大的bucket空间中去。reserve和rehash成员,就是手动实现这一过程的接口。
reserve(count)
,Sets the number of buckets to the number needed to accommodate at least count elements without exceeding maximum load factor and rehashes the container. Effectively callsrehash(std::ceil(count / max_load_factor()))
.
rehash(n)
, changes the number of buckets to a value n that is not less than count and satisfiesn >= size() / max_load_factor()
, then rehashes the container.
rehash(0)
有shrink_to_fit的效果。
本文链接:https://cs.pynote.net/sf/c/cpp/202210311/
-- EOF --
-- MORE --