bytes objects | CS Notes

bytes() is a built-in Python function that creates an immutable bytes object, which can be thought of as a byte string.

>>> bytes([1,2,3,4,5])
b'\x01\x02\x03\x04\x05'
>>> bytes((1,2,3,4,5))
b'\x01\x02\x03\x04\x05'
>>> bytes(b'12345')
b'12345'

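Each byte can only hold a value from 0 to 255, so passing an int outside that range (greater than 255, or negative) when creating a bytes object raises a ValueError: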

>>> bytes([123])
b'{'
>>> bytes([1234])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: bytes must be in range(0, 256)
>>> bytes(range(256))
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
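
The range check applies at both ends. The following is a minimal sketch (the helper name try_bytes is made up for illustration) showing that a negative value is rejected in the same way as a value above 255:

```python
def try_bytes(values):
    """Return bytes(values), or the error message if a value is out of range."""
    try:
        return bytes(values)
    except ValueError as exc:
        return str(exc)

ok = try_bytes([0, 255])      # both boundary values are accepted
too_big = try_bytes([256])    # one past the upper bound
negative = try_bytes([-1])    # below the lower bound

print(ok)
print(too_big)
print(negative)
```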

Converting a regular string to bytes

>>> bytes('12345', 'utf-8')
b'12345'
>>> bytes('www.pynote.net', 'utf-8')
b'www.pynote.net'
>>> bytes('麦新杰', 'utf-8')
b'\xe9\xba\xa6\xe6\x96\xb0\xe6\x9d\xb0'
>>> bytes('麦新杰', 'utf-8').decode()
'麦新杰'

Decoding a bytes object gives back a string object; encoding a string object yields bytes.
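
As a small sketch of that round trip, str.encode produces the same result as bytes(text, encoding), and decoding with a codec that cannot represent the data fails. The variable names here are illustrative only:

```python
text = '麦新杰'

# str.encode is equivalent to bytes(text, 'utf-8')
data = text.encode('utf-8')
same_as_constructor = (data == bytes(text, 'utf-8'))

# decoding with the matching codec restores the original string
round_trip = data.decode('utf-8')

# decoding with a codec that cannot represent these bytes raises an error
try:
    data.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(len(text), len(data), round_trip, ascii_failed)
```

Each of the three characters takes three bytes in UTF-8, so the string has length 3 while the bytes object has length 9.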

bytes(int)

When bytes() is called with a single int argument, it creates a bytes object of that length in which every byte is NULL (0x00):

>>> bytes(2)
b'\x00\x00'
>>> bytes(6)
b'\x00\x00\x00\x00\x00\x00'

Because the resulting bytes object cannot be modified, this form is probably only useful in some cases of concatenating byte strings.
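
One such concatenation case is zero-padding a value to a fixed width. This is only a sketch; pad_to is a hypothetical helper, not a built-in:

```python
def pad_to(data: bytes, size: int) -> bytes:
    """Right-pad data with 0x00 bytes up to size; longer input is returned unchanged."""
    if len(data) >= size:
        return data
    return data + bytes(size - len(data))

padded = pad_to(b'abc', 8)
unchanged = pad_to(b'abcdefghij', 8)

print(padded)
print(unchanged)
```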

Calling bytes() with no arguments

Calling bytes() with no arguments returns an empty bytes object, just like calling list(), set(), or dict() with no arguments.

String-like behavior of bytes objects

>>> a = bytes([1,2,3,4,5,6,7,8])
>>> len(a)
8
>>> a[2:]
b'\x03\x04\x05\x06\x07\x08'
>>> a[5:7]
b'\x06\x07'
>>> a[7]
8
>>> a.find(b'5')
-1
>>> a.find(b'\x05')
4

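Indexing a single element of a bytes object yields an integer from 0 to 255. Note again that the 1-8 in the list used to create the bytes object are not the ASCII digits 1-8 but the integer values 1-8 within the 0-255 range, which is why a.find(b'5') returns -1.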
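
The difference between indexing and slicing, and between integer values and ASCII digits, can be sketched as follows (variable names are illustrative):

```python
a = bytes([1, 2, 3, 4, 5, 6, 7, 8])

item = a[7]        # indexing returns an int
one_byte = a[7:8]  # slicing returns a bytes object of length 1
values = list(a)   # iterating also yields ints

# ASCII digits are stored as their character codes, not as 1..5
digits = b'12345'
digit_values = list(digits)

print(item, one_byte, values)
print(digit_values)
```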

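Processing bytes objects with the re module

Python's bytes-like objects include bytes and bytearray, and the re module can run regular-expression matches on them just as it does on strings.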

>>> import re
>>> a = b'abcde 12345'
>>> re.search(rb'\d+', a)
<re.Match object; span=(6, 11), match=b'12345'>
>>> re.search(rb'\w+', a)
<re.Match object; span=(0, 5), match=b'abcde'>

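The trick is to put a b in front of the regular expression, because the pattern must itself be a bytes object when the data being searched is bytes. Combined as rb, the prefix means raw bytes.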

>>> import re
>>> re.search(rb'\x00', bytes.fromhex('000102030405'))
<re.Match object; span=(0, 1), match=b'\x00'>
>>> re.search(rb'\x00{2}', bytes.fromhex('000102030405'))
>>> re.search(rb'\x00{2}', bytes.fromhex('00000102030405'))
<re.Match object; span=(0, 2), match=b'\x00\x00'>
>>> re.search(rb'\x00\x01', bytes.fromhex('00000102030405'))
<re.Match object; span=(1, 3), match=b'\x00\x01'>
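
To illustrate why the b prefix matters, here is a short sketch: a str pattern applied to bytes data raises a TypeError, while a bytes pattern works on both bytes and bytearray. The variable names are assumptions for this example:

```python
import re

data = b'abcde 12345'

# a bytes pattern matches bytes data
match = re.search(rb'\d+', data)

# a str pattern on bytes data is rejected
try:
    re.search(r'\d+', data)
    mismatch_raised = False
except TypeError:
    mismatch_raised = True

# the same bytes pattern style also works on a bytearray
ba_match = re.search(rb'\w+', bytearray(data))

print(match.group(), mismatch_raised, ba_match.span())
```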

Converting bytes to int

>>> int.from_bytes(b'\xF1\x02\x03\x04', 'big')
4043440900
>>> int.from_bytes(b'\xF1\x02\x03\x04', 'big', signed=True)
-251526396

The signed parameter indicates whether the bytes represent a signed or an unsigned int; it defaults to False.

>>> int.from_bytes(b'\xFF\xFF\xFF\xFF', 'big')
4294967295
>>> int.from_bytes(b'\xFF\xFF\xFF\xFF\x01\x02\x03\x04', 'big')
18446744069431493380

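The 'big' argument specifies the byte order. Big-endian (big), which is also network byte order, places the most significant byte at the lowest address; little-endian (little) places the most significant byte at the highest address, and x86 CPUs use little-endian. A bytes literal written with the b prefix simply lists its bytes from the lowest index to the highest, so reading it from left to right as a single number corresponds to the 'big' interpretation.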

>>> int.from_bytes(b'\xF1\x02\x03\x04', 'big')
4043440900
>>> int.from_bytes(b'\xF1\x02\x03\x04', 'little')
67306225
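
A brief sketch of how the two byte orders relate: interpreting a sequence as little-endian gives the same number as reversing it and interpreting it as big-endian. sys.byteorder reports the order used by the machine running the code; the variable names are illustrative:

```python
import sys

raw = b'\xF1\x02\x03\x04'

big = int.from_bytes(raw, 'big')
little = int.from_bytes(raw, 'little')

# little-endian is big-endian applied to the reversed byte sequence
reversed_big = int.from_bytes(raw[::-1], 'big')

# sys.byteorder is either 'little' or 'big', depending on the platform
native = int.from_bytes(raw, sys.byteorder)

print(hex(big), hex(little), sys.byteorder)
```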

Converting int to bytes

int objects have a to_bytes method that converts the integer into a bytes object.

>>> (123).to_bytes(4, 'big')
b'\x00\x00\x00{'
>>> (123).to_bytes(4, 'little')
b'{\x00\x00\x00'

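The first argument of to_bytes is the length of the resulting bytes object, and the second is the byte order. This method also accepts a signed parameter. If the value does not fit in the given length under the chosen signedness, an OverflowError is raised: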

>>> (168).to_bytes(1, 'big', signed=False)
b'\xa8'
>>> (168).to_bytes(2, 'big', signed=True)
b'\x00\xa8'
>>> (168).to_bytes(1, 'big', signed=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too big to convert
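
The failure above occurs because one signed byte covers only -128 to 127. As a hedged sketch, the following shows those boundaries, a negative value round-tripping with signed=True, and a negative value being rejected when signed is left at its default:

```python
# boundaries of a single signed byte
low = (-128).to_bytes(1, 'big', signed=True)
high = (127).to_bytes(1, 'big', signed=True)

# a negative value round-trips when signed=True is used on both sides
n = -251526396
raw = n.to_bytes(4, 'big', signed=True)
back = int.from_bytes(raw, 'big', signed=True)

# a negative value cannot be converted as unsigned
try:
    (-1).to_bytes(1, 'big')
    negative_rejected = False
except OverflowError:
    negative_rejected = True

print(low, high, raw, back, negative_rejected)
```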

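The first argument, length, can deliberately be set larger than necessary; the number of leading 0x00 bytes (in big-endian order) then shows how many bytes the int actually needs: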

>>> (123456789).to_bytes(10, 'big')
b'\x00\x00\x00\x00\x00\x00\x07[\xcd\x15'
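
Instead of counting leading zeros by eye, the required length can also be computed from int.bit_length(). This is a sketch for non-negative, unsigned values only; min_length is a hypothetical helper name:

```python
def min_length(n: int) -> int:
    """Smallest number of bytes that holds a non-negative int as unsigned (at least 1)."""
    return max(1, (n.bit_length() + 7) // 8)

n = 123456789
size = min_length(n)
compact = n.to_bytes(size, 'big')

# stripping the leading zero bytes of an oversized conversion gives the same result
stripped = n.to_bytes(10, 'big').lstrip(b'\x00')

print(size, compact, stripped)
```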

bytes.hex and fromhex

The hex method converts the contents of a bytes object into a hex string, while fromhex builds a bytes object from a hex string. These two methods are often very handy.

>>> bytes([0,1,2,3,4,5]).hex()
'000102030405'
>>> bytes.fromhex('000102030405')
b'\x00\x01\x02\x03\x04\x05'
>>> b'abcde'.hex()
'6162636465'
>>> a = bytes.fromhex('6162636465')
>>> a
b'abcde'
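
Two further details may be useful here, sketched under the assumption of Python 3.8 or later (where hex accepts a separator argument): hex can insert a separator between bytes, and fromhex skips spaces between byte pairs but rejects non-hex characters.

```python
data = bytes([0, 1, 2, 255])

plain = data.hex()
spaced = data.hex(' ')  # separator argument, assumed available (Python 3.8+)

# spaces between byte pairs are ignored when parsing
parsed = bytes.fromhex('00 01 02 ff')

# characters that are not hex digits are rejected
try:
    bytes.fromhex('zz')
    invalid_rejected = False
except ValueError:
    invalid_rejected = True

print(plain, spaced, parsed, invalid_rejected)
```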

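bytearray objects

Python has two types for handling low-level byte sequences: bytes and bytearray. The former is a byte string and is immutable; the latter is more like a byte list and is mutable.

In many respects, including its methods, bytearray is very similar to a byte string. One major difference is that an element of a bytearray can be changed directly by index, just like an element of a list; that is what being mutable means.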

>>> ba = bytearray([1,2,3,4,5,6,7,8])
>>> ba.hex()
'0102030405060708'
>>> ba[0]
1
>>> ba[0]=123
>>> ba.hex()
'7b02030405060708'
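
Beyond assignment by index, a bytearray supports other list-style in-place operations. The sketch below also shows that converting back to bytes gives an immutable copy, on which item assignment fails:

```python
ba = bytearray([1, 2, 3])

ba[0] = 123            # replace one element in place
ba.append(4)           # add a single byte value
ba.extend(b'\x05\x06')  # add several bytes
ba[1:3] = b'\xff'      # slice assignment may change the length

frozen = bytes(ba)     # immutable snapshot of the current contents

# bytes objects do not support item assignment
try:
    frozen[0] = 0
    assignment_rejected = False
except TypeError:
    assignment_rejected = True

print(ba.hex(), frozen, assignment_rejected)
```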