关于c++ vector<T> 和 vector<bool>

vector是vector的特化版，在gnu的stl_bvector文件中，有详细介绍。

  /**
   *  @brief  A specialization of vector for booleans which offers fixed time
   *  access to individual elements in any order.
   *
   *  @ingroup sequences
   *  @headerfile vector
   *  @since C++98
   *
   *  @tparam _Alloc  Allocator type.
   *
   *  Note that vector<bool> does not actually meet the requirements for being
   *  a container.  This is because the reference and pointer types are not
   *  really references and pointers to bool.  See DR96 for details.  @see
   *  vector for function documentation.
   *
   *  In some terminology a %vector can be described as a dynamic
   *  C-style array, it offers fast and efficient access to individual
   *  elements in any order and saves the user from worrying about
   *  memory and size allocation.  Subscripting ( @c [] ) access is
   *  also provided as with C-style arrays.
  */

大概意思就是说vector<bool>实际上并不满足作为容器的要求。这是因为引用和指针类型实际上并不是指向 bool 的引用和指针。也就是不支持如下操作：

vector<bool> arr{1,0,1,0,1,0};
bool *v_ptr = &arr[0]; // compile error

bool arr2[] = {1,0,1,0,1,0};
v_ptr = &arr2[0]; // compile success

产生如上编译错误的原因是因为在基本数据类型中，由于计算机读写的最小数据单位是字节，因此对于bool类型的数组，每个bool类型都存储在一个字节中，这就会导致空间的极大浪费，vector<bool>对bool类型的vector容器做了特殊化处理，将以往的bool类型的数组没用到的空间利用起来，这样就不再是一个字节存储一个bool类型的数据，而是一个字节存储8个bool类型的数据。

但这又会带来其他方面的缺陷，例如在性能上没有其他vector容器速度快，同时不能使用指针或者引用去指向其中的某个元素。

查看关于vector的源代码，也可以看出vector<bool>和vector<T>很大的差异，例如vector<bool>被列为一个单独的文件，同时继承于不同的基类，如下

// vector<T> from stl_vector.h
  template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
    class vector : protected _Vector_base<_Tp, _Alloc>

// vector<bool> from stl_bvector.h
  template<typename _Alloc>
    class vector<bool, _Alloc> : protected _Bvector_base<_Alloc>

不同的STL实现有不同的方式，代码也许不同，但是思想基本一致，常见是STL实现有如下三种版本：

GNU Standard C++ Library (libstdc++): GNU Compiler Collection (GCC) 使用的标准库。（当前文章中使用的版本）

Microsoft Standard Library (MSVC STL): Microsoft Visual Studio 使用的标准库。

LLVM Standard Library (libc++): LLVM 项目的标准库实现。

存在如下代码：

vector<bool> arr{1,0,1,0,1,0,1};
bool a = arr[0];
auto b = arr[0];

cout << "type a:" << typeid(a).name() << "\ntype b:" << typeid(b).name() << endl;

b = false;
cout << "arr[0]: " << arr[0] << endl;

对于a的初始化，它其实暗含了一个隐式的类型转换，

而对于b的类型，它的类型并不是bool，而是vector<bool>中的一个内部类（reference类），这也间接表明vector<bool>中的每一个元素实际上并不是一个真正的bool类型。感兴趣的可以网上细查，这里不细说。

同时修改b的值，也会导致arr中的值发生变化。

type a:b
type b:NSt3__115__bit_referenceINS_6vectorIbNS_9allocatorIbEEEELb1EEE // 该类型就是_Bit_reference，这里输出的是编译器生成的符号名称，没有转换为人类可读的方式
arr[0]: 0

如果销毁arr，b将会变成一个悬挂指针，这时在对b进行操作就属于未定义行为。

关于sizeof(vector<T>)

有如下代码：

cout<<"sizeof(vector<char>) = "<<sizeof(vector<char>)<<endl;
cout<<"sizeof(vector<int>) = "<<sizeof(vector<int>)<<endl;
cout<<"sizeof(vector<short>) = "<<sizeof(vector<short>)<<endl;
cout<<"sizeof(vector<double>) = "<<sizeof(vector<double>)<<endl;
cout<<"sizeof(vector<long>) = "<<sizeof(vector<long>)<<endl;
cout<<"sizeof(vector<float>) = "<<sizeof(vector<float>)<<endl;
cout<<"sizeof(vector<bool>) = "<<sizeof(vector<bool>)<<endl;

输出结果为：

sizeof(vector<char>) = 24
sizeof(vector<int>) = 24
sizeof(vector<short>) = 24
sizeof(vector<double>) = 24
sizeof(vector<long>) = 24
sizeof(vector<float>) = 24
sizeof(vector<bool>) = 40

sizeof是在编译器进行计算，而vector是可变长数组，确定一个vector数组，使用三个变量即可唯一确定，参考STL源码，如下是部分代码

  template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
    class vector : protected _Vector_base<_Tp, _Alloc>
    { ... }
  template<typename _Tp, typename _Alloc> struct _Vector_base
    {
      typedef typename __gnu_cxx::__alloc_traits<_Alloc>::template
    rebind<_Tp>::other _Tp_alloc_type;
      typedef typename __gnu_cxx::__alloc_traits<_Tp_alloc_type>::pointer
        pointer;

      struct _Vector_impl_data
      {
    pointer _M_start;               // 内存起始位置，begin
    pointer _M_finish;              // 当前已存储元素的下一个位置，end
    pointer _M_end_of_storage;// 容器边界，capicity

// sizeof(vector<T>) = _M_start(8 bypes) + _M_finish(8 bypes) + _M_end_of_storage(8 bypes) = 24

vector使用三个pointer指针，即可唯一确定一个vector容器，这也是sizeof(vector<T>)（T不是bool）不同类型，但得到的结果都一样的原因。

那为什么sizeof(vector<bool>)的大小是40呢？

由于vector对bool类型的数据的特例化，利用了一个字节内的所有bit，这导致对bool类型的处理更为复杂，我们先查看STL中对vector bool类型的定义，再来说明解释缘由

template<typename _Alloc>
    class vector<bool, _Alloc> : protected _Bvector_base<_Alloc>
    { ... }
struct _Bvector_base
    {
      typedef typename __gnu_cxx::__alloc_traits<_Alloc>::template
        rebind<_Bit_type>::other _Bit_alloc_type;
      typedef typename __gnu_cxx::__alloc_traits<_Bit_alloc_type>
    _Bit_alloc_traits;
      typedef typename _Bit_alloc_traits::pointer _Bit_pointer;

      struct _Bvector_impl_data
      {
    _Bit_iterator   _M_start;                       // 内存起始位置，begin
    _Bit_iterator   _M_finish;                  // 当前已存储元素的下一个位置，end
    _Bit_pointer    _M_end_of_storage;  // 容器边界，capicity
      }

  // 关于_Bit_iterator的大小的定义
  struct _Bit_iterator : public _Bit_iterator_base{ ... }
  struct _Bit_iterator_base
  : public std::iterator<std::random_access_iterator_tag, bool>
  {
    _Bit_type * _M_p;                           // 指向位数组的指针
    unsigned int _M_offset;                 // 在位数组中的偏移量
  }
  // sizeof(vector<bool>) = _M_start (12 bytes) + _M_finish (12 bytes) + _M_end_of_storage (8 bytes) = 32 bytes

然而，由于内存对齐的原因，使结构体的总大小变为 40 字节。

结构原理：

vector<bool> 不是简单地存储布尔值，而是将多个布尔值打包在一个更大的基本类型（例如unsigned int）中进行存储。读取 vector<bool> 中的某一位需要定位到相应的 _Bit_type 单元，并从中提取出特定的位。基本步骤如下：

首先使用_M_start定位到相应的单元
将_M_p设置为该单元的地址
然后使用偏移量_M_offset提取出特定的位

所以这也是为什么当使用bool类型的vector后，读写效率下降、不支持指针和引用的原因。个人认为，vector对bool的这种操作，虽然有效利用了空间，但是得不偿失。不过当必须使用bool类型的变长数组，一般推荐如下两种方式

使用vector<char>或者vector<int>代替
使用deque<bool>代替，deque没有对bool类型进行特例化，一个bool类型仍然使用8bit存储

一	二	三	四	五	六	日
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28

归档日历

最新文章

关于c++ vector<T> 和 vector<bool>

关于sizeof(vector<T>)

那为什么sizeof(vector<bool>)的大小是40呢？

发送评论编辑评论

关于c++ vector<T> 和 vector<bool>

关于sizeof(vector<T>)

那为什么sizeof(vector<bool>)的大小是40呢？

发送评论 编辑评论

推荐文章

发送评论编辑评论