Typing of Python

Last Updated: 2023-11-20 14:10:41 Monday

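The strength of a dynamic language is its extreme flexibility, but serious, large projects need something else more: safety, reliability, readable and maintainable code, and the confidence to refactor. Typing is the means of adding static type annotations to a dynamically typed language like Python. With type information, Python code becomes safer and more reliable.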

Function Annotations

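Function annotations were introduced at the very start of Python 3, by PEP 3107. They have much in common with typing: both are stored in the __annotations__ attribute, and the syntax is essentially the same. The difference is that function annotations define only the form and place no requirement on the content. Anything is allowed, and interpreting the content is left entirely to third-party tools. For example: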

def compile(source: "something compilable",
            filename: "where the compilable thing comes from",
            mode: "is this a single statement or a suite?"):
    ...

All the annotations in this example are strings. A third-party tool can extract them and present them as help text for the function.
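As a minimal sketch of how such a tool might read these strings, the annotations can be looked up at runtime through __annotations__. The names compile_source and describe are hypothetical and exist only for this illustration:

```python
def compile_source(source: "something compilable",
                   filename: "where the compilable thing comes from",
                   mode: "is this a single statement or a suite?"):
    ...


def describe(func):
    """Hypothetical helper: build help text from a function's annotations."""
    return "\n".join(f"{name}: {text}"
                     for name, text in func.__annotations__.items())


help_text = describe(compile_source)
print(help_text)
```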

Type Hints & Variable Annotations

Type hints (PEP 484) use the same syntax as function annotations but restrict the content to type information. This is where the typing module comes from.

Variable annotations (PEP 526) define how to annotate the types of variables.

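Once functions and variables carry type information, third-party tools such as mypy can check the code statically before it runs and find potential bugs. A pain point of dynamic languages is that some type-mismatch bugs only surface when execution actually reaches the offending line. If the tests happen to miss that part of the code, the bug ships. Static type checking finds this class of bug: it catches bugs in code without running it! Type annotations are not enforced at runtime and add practically no runtime overhead.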
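A small sketch of both points (double is a hypothetical function used only for illustration): the interpreter ignores the annotation entirely, so the wrong call runs without complaint, while a checker such as mypy would be expected to flag it as an incompatible argument type before the code ever runs.

```python
def double(n: int) -> int:
    return n * 2


# A type checker would flag this call (str passed where int is expected),
# but the interpreter runs it anyway: multiplying a str by 2 repeats it.
result = double("ab")
print(result)
```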

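One opinion I have come across: in a serious Python project, typing is as important as unit tests! Code that is hard to type well, like code that is hard to unit-test well, may be a hint that it needs refactoring.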

Getting Started with mypy and typing

There are several static checkers for Python code; mypy is generally the best choice.

Install mypy:

$ pip3 install mypy

Official site: http://mypy-lang.org/

Essential introductory material, "Type hints cheat sheet (Python 3)": https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#cheat-sheet-py3

Most of what follows comes from this cheat sheet, with some of my own notes added.

Variable

# This is how you declare the type of a variable
age: int = 1

# You don't need to initialize a variable to annotate it
a: int  # Ok (no value at runtime until assigned)

# Doing so is useful in conditional branches
child: bool
if age < 18:
    child = True
else:
    child = False

# For most types, just use the name of the type.
# Note that mypy can usually infer the type of a variable from its value,
# so technically these annotations are redundant
x: int = 1
x: float = 1.0
x: bool = True
x: str = "test"
x: bytes = b"test"

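When the value assigned to a variable does not match its annotated type, mypy reports an error. So although mypy says it can infer the type of a variable, annotating a variable where it is first introduced is still a very good habit!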
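For illustration, this is roughly what such a mismatch looks like. The variable name is hypothetical and the quoted message is what mypy typically prints for this case; the interpreter itself does not object.

```python
count: int = 1
count = count + 1      # fine: still an int

# mypy reports something like:
#   error: Incompatible types in assignment
#   (expression has type "str", variable has type "int")
count = "two"
```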

# For collections on Python 3.9+, the type of the collection item is in brackets
x: list[int] = [1]
x: set[int] = {6, 7}

# For mappings, we need the types of both keys and values
x: dict[str, float] = {"field": 2.0}  # Python 3.9+

# For tuples of fixed size, we specify the types of all the elements
x: tuple[int, str, float] = (3, "yes", 7.5)  # Python 3.9+

# For tuples of variable size, we use one type and ellipsis
# Variable length, but every element must be an int
x: tuple[int, ...] = (1, 2, 3)  # Python 3.9+

# On Python 3.8 and earlier, the name of the collection type is
# capitalized, and the type is imported from the 'typing' module
from typing import List, Set, Dict, Tuple
x: List[int] = [1]
x: Set[int] = {6, 7}
x: Dict[str, float] = {"field": 2.0}
x: Tuple[int, str, float] = (3, "yes", 7.5)
x: Tuple[int, ...] = (1, 2, 3)

Container types use [] to declare the type of their elements, which distinguishes an annotation from the creation of an object.
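A short sketch of that distinction on Python 3.9+, using get_origin and get_args from the typing module to look inside the bracketed form (IntList and empty are names chosen for this example):

```python
from typing import get_args, get_origin

IntList = list[int]          # square brackets: a type, not a list object
empty: list[int] = list()    # parentheses: an actual (empty) list

print(get_origin(IntList))   # the container type
print(get_args(IntList))     # the element type(s)
```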

from typing import Union, Optional

# On Python 3.10+, use the | operator when something could be one of a few types
x: list[int|str] = [3, 5, "test", "fun"]  # Python 3.10+
# On earlier versions, use Union
x: list[Union[int, str]] = [3, 5, "test", "fun"]

# Use Optional[X] for a value that could be None
# Optional[X] is the same as X|None or Union[X, None]
x: Optional[str] = "something" if some_condition() else None
# Mypy understands a value can't be None in an if-statement
if x is not None:
    print(x.upper())
# If a value can never be None due to some invariants, use an assert
assert x is not None
print(x.upper())

# A plain variable that may hold several types:
y: int|str|None
y = 123
y = '123'
y = None

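In a dynamic language a variable is just a reference to an object. When it is rebound to a different object, its type changes as well, so a variable may have several types. New code should clearly use |. A function parameter without an annotation defaults to Any; for an unannotated variable in checked code, mypy infers the type from the first assignment.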

from typing import Any

# Variable-length tuple with elements of any type
t: tuple[Any, ...]

Any can often be left out: a bare generic such as tuple or list is treated as if its type parameters were Any.
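As a sketch, the two hypothetical functions below should be treated the same way by a checker; the first simply leaves Any implicit.

```python
from typing import Any


def first_bare(items: list):                  # bare list: elements are Any
    return items[0]


def first_explicit(items: list[Any]) -> Any:  # the same thing spelled out
    return items[0]
```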

Function

from typing import Callable, Iterator, Union, Optional

# This is how you annotate a function definition
def stringify(num: int) -> str:
    return str(num)

# And here's how you specify multiple arguments
def plus(num1: int, num2: int) -> int:
    return num1 + num2

# If a function does not return a value, use None as the return type
# Default value for an argument goes after the type annotation
def show(value: str, excitement: int = 10) -> None:
    print(value + "!" * excitement)

# Note that arguments without a type are dynamically typed (treated as Any)
# and that functions without any annotations are not checked
# If only the return value is annotated, x is still Any, so mypy still reports nothing here!
def untyped(x):
    x.anything() + 1 + "string"  # no errors

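Function parameters are annotated the same way as variables; the return type is written after the -> token. If a function has no annotations at all, mypy does not check its body. If only the return value is annotated, the parameters are still Any, so operations on them go unreported as well. Should the obvious error above really be left for runtime to discover?

With --strict or similar options, mypy only reports that the function lacks annotations ("Function is missing a type annotation"). That is not mypy checking the specific operations in the body.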
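A runnable sketch of the consequence (untyped and typed are hypothetical names): the call into the unannotated function is accepted silently and only fails when the line executes, whereas the same mistake against the annotated function would be reported up front.

```python
def untyped(x):
    return x + 1


def typed(x: int) -> int:
    return x + 1


# mypy stays silent about this call because untyped() has no annotations;
# the mistake only shows up when the line actually runs.
try:
    untyped("1")
except TypeError as exc:
    failure = str(exc)

# The same mistake against typed() is reported before running:
#   typed("1")
#   error: Argument 1 to "typed" has incompatible type "str"; expected "int"
```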

# This is how you annotate a callable (function) value
x: Callable[[int, float], float] = f
def register(callback: Callable[[str], int]) -> None: ...

# A generator function that yields ints is secretly just a function that
# returns an iterator of ints, so that's how we annotate it
def gen(n: int) -> Iterator[int]:
    i = 0
    while i < n:
        yield i
        i += 1

# You can of course split a function annotation over multiple lines
def send_email(address: Union[str, list[str]],
               sender: str,
               cc: Optional[list[str]],
               bcc: Optional[list[str]],
               subject: str = '',
               body: Optional[list[str]] = None
               ) -> bool:
    ...

# Mypy understands positional-only and keyword-only arguments
# Positional-only arguments can also be marked by using a name starting with
# two underscores
def quux(x: int, /, *, y: int) -> None:
    pass

quux(3, y=5)    # Ok
quux(3, 5)      # error: Too many positional arguments for "quux"
quux(x=3, y=5)  # error: Unexpected keyword argument "x" for "quux"

# This says each positional arg and each keyword arg is a "str"
def call(self, *args: str, **kwargs: str) -> str:
    reveal_type(args)    # Revealed type is "tuple[str, ...]"
    reveal_type(kwargs)  # Revealed type is "dict[str, str]"
    request = make_request(*args, **kwargs)
    return self.do_api_query(request)

# When a parameter is a dict:
def test(d: dict[int,str]) -> int: ...

Inside Callable[...], Any can no longer be omitted: where it is needed, it must be written out.
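A brief sketch of both forms, with hypothetical helper names: a Callable with a precise signature, and Callable[..., Any] for "any callable", where Any has to be spelled out.

```python
from typing import Any, Callable


def apply(func: Callable[[int, int], int], a: int, b: int) -> int:
    return func(a, b)


def call_anything(func: Callable[..., Any], *args: Any) -> Any:
    # "..." accepts any argument list; the return type is explicitly Any
    return func(*args)


def add(a: int, b: int) -> int:
    return a + b


print(apply(add, 2, 3), call_anything(len, "abc"))
```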

Class

class BankAccount:
    # The "__init__" method doesn't return anything, so it gets return
    # type "None" just like any other method that doesn't return anything
    def __init__(self, account_name: str, initial_balance: int = 0) -> None:
        # mypy will infer the correct types for these instance variables
        # based on the types of the parameters.
        self.account_name = account_name
        self.balance = initial_balance

    # For instance methods, omit type for "self"
    def deposit(self, amount: int) -> None:
        self.balance += amount

    def withdraw(self, amount: int) -> None:
        self.balance -= amount

# User-defined classes are valid as types in annotations
account: BankAccount = BankAccount("Alice", 400)
def transfer(src: BankAccount, dst: BankAccount, amount: int) -> None:
    src.withdraw(amount)
    dst.deposit(amount)

# Functions that accept BankAccount also accept any subclass of BankAccount!
class AuditedBankAccount(BankAccount):
    # You can optionally declare instance variables in the class body
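    # Without the annotation, a bare name here would be an error
    # (a NameError when the class body runs)
    # When an instance assigns to this name, a type mismatch seems to be
    # reported only with --strict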
    audit_log: list[str]
    # This is a class variable with a default value
    auditor_name: str = "The Spanish Inquisition"

    def __init__(self, account_name: str, initial_balance: int = 0) -> None:
        super().__init__(account_name, initial_balance)
        self.audit_log: list[str] = []

    def deposit(self, amount: int) -> None:
        self.audit_log.append(f"Deposited {amount}")
        self.balance += amount

    def withdraw(self, amount: int) -> None:
        self.audit_log.append(f"Withdrew {amount}")
        self.balance -= amount

audited = AuditedBankAccount("Bob", 300)
transfer(audited, account, 100)  # type checks!

from typing import ClassVar

# You can use the ClassVar annotation to declare a class variable
class Car:
    seats: ClassVar[int] = 4
    passengers: ClassVar[list[str]]

# If you want dynamic attributes on your class, have it
# override "__setattr__" or "__getattr__"
class A:
    # This will allow assignment to any A.x, if x is the same type as "value"
    # (use "value: Any" to allow arbitrary types)
    def __setattr__(self, name: str, value: int) -> None: ...

    # This will allow access to any A.x, if x is compatible with the return type
    def __getattr__(self, name: str) -> int: ...

a = A()
a.foo = 42  # Works
a.bar = 'Ex-parrot'  # Fails type checking
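To make the difference between class variables and instance variables concrete, here is a small runnable sketch. It is a variation on the Car class above, not the cheat sheet's own code, and the quoted mypy message is the one it usually prints for this case.

```python
from typing import ClassVar


class Car:
    seats: ClassVar[int] = 4            # shared by every instance
    passengers: list[str]               # declared here, created per instance

    def __init__(self) -> None:
        self.passengers = []


car1, car2 = Car(), Car()
car1.passengers.append("Alice")
Car.seats = 5                           # OK: assigned through the class
# car1.seats = 2
# mypy: Cannot assign to class variable "seats" via instance
```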

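Annotating @singledispatch

The base function of a singledispatch must accept, in its first parameter, every first-parameter type used by the registered implementations.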

from functools import singledispatch
from typing import Union

@singledispatch
def make_toc(lines: Union[list[str],str]) -> str:
    """Return the TOC contents."""
    return _make_toc(lines)[1]


@make_toc.register
def _(strlines: str) -> str:
    return _make_toc(strlines.split('\n'))[1]

This code comes from the toc4github project, which automatically generates a TOC for a GitHub README.md.
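Since _make_toc is not shown here, the following is a self-contained sketch of the same pattern with a hypothetical count_lines function: the base signature lists both accepted types, and the registered variant is selected by the annotation on its first parameter.

```python
from functools import singledispatch
from typing import Union


@singledispatch
def count_lines(lines: Union[list[str], str]) -> int:
    """Base implementation: handles the list[str] case."""
    return len(lines)


@count_lines.register
def _(text: str) -> int:
    # Dispatch is driven by the annotation of the first parameter (str)
    return count_lines(text.split("\n"))


print(count_lines(["a", "b"]), count_lines("a\nb\nc"))
```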

Annotating Generators

A generator can be annotated in two ways, with Generator or with Iterator. The former specifies YieldType, SendType, and ReturnType; the latter specifies only YieldType.

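import socket
from typing import Generator, Iterator

# cx, dx and SK_IO_CHUNK_LEN are not shown in this excerpt

class trafix():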
    """ traffic exchanging class """

    def send_sk_nonblock_gen(self, sk: socket.socket) \
                    -> Generator[int, tuple[bytes|None,int], None]:
        """ socket nonblocking send generator """
        data = b''
        while True:
            bmsg, sid = yield len(data)
            if bmsg is not None:
                if self.x: bmsg = cx(bmsg)
                data += (len(bmsg)+8).to_bytes(4,'little') \
                                + sid.to_bytes(4,'big') \
                                + bmsg
            try:
                while True:
                    if len(data) == 0:
                        break
                    if (i:=sk.send(data[:SK_IO_CHUNK_LEN])) == -1:
                        raise ConnectionError('send_sk_nonblock_gen send -1')
                    data = data[i:]
            except BlockingIOError:
                continue

    def recv_sk_nonblock_gen(self, sk: socket.socket) \
                    -> Iterator[tuple[int|None,bytes,bytes]]:
        """ socket nonblocking recv generator,
            yield sid,type,msg """
        data = b''
        while True:
            try:
                d = sk.recv(SK_IO_CHUNK_LEN)
                if len(d) == 0:
                    raise ConnectionError('recv_sk_nonblock_gen recv 0')
                data += d
                while (dlen:=len(data)) > 4:
                    mlen = int.from_bytes(data[:4], 'little')
                    if dlen >= mlen:
                        sid = int.from_bytes(data[4:8], 'big')
                        msg = dx(data[8:mlen]) if self.x else data[8:mlen]
                        yield sid, msg[:1], msg[1:]
                        data = data[mlen:]
                    else:
                        break
            except BlockingIOError:
                yield None, b'\x00', b''
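The excerpt above depends on project code that is not shown, so here is a much smaller sketch of the same two annotation styles. Both generator functions are hypothetical: running_total uses all three type parameters of Generator, and countdown only yields, so Iterator is enough.

```python
from typing import Generator, Iterator


def running_total() -> Generator[int, int, str]:
    """Yields int totals, receives ints via send(), returns a str."""
    total = 0
    while total < 10:
        received = yield total
        total += received
    return "done"


def countdown(n: int) -> Iterator[int]:
    """Only the yielded type matters here."""
    while n > 0:
        yield n
        n -= 1


gen = running_total()
first = next(gen)        # run to the first yield
second = gen.send(4)     # resume with a value, run to the next yield
```

The str return value travels in the StopIteration exception raised when the generator finishes, which is why ReturnType is a separate type parameter.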

reveal_type & reveal_locals

If you are unsure what type an object actually has, use the reveal_type and reveal_locals helpers provided by mypy to check.

$ cat tt.py
a = [1, 'abc']
reveal_type(a)

b = (1, None)
reveal_type(b)

$ mypy tt.py
tt.py:4: note: Revealed type is "builtins.list[builtins.object]"
tt.py:7: note: Revealed type is "Tuple[builtins.int, None]"
Success: no issues found in 1 source file

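reveal_type as used here is not a runtime function; only mypy understands it, so these calls must be removed from the source file in the end. (Python 3.11+ also offers typing.reveal_type, which has to be imported and does exist at runtime.)

Here is an example of reveal_locals: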
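One possible way to keep the file runnable while a reveal_type call is still in it is to guard the call with typing.TYPE_CHECKING, which type checkers treat as true and the interpreter sees as False. This is a sketch of that idea, not something the cheat sheet prescribes:

```python
from typing import TYPE_CHECKING

items = [1, 'abc']

if TYPE_CHECKING:
    # Analyzed by mypy, never executed by the interpreter
    reveal_type(items)

print(len(items))
```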

$ cat tt.py

def func():
    a: int = 1
    b: str = '123'
    c = [a,b]  # no type hint
    reveal_locals()

$ mypy tt.py
tt.py:7: note: Revealed local types are:
tt.py:7: note:     a: builtins.int
tt.py:7: note:     b: builtins.str
tt.py:7: note:     c: Any
Success: no issues found in 1 source file

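Stub files (.pyi)

Type annotations can live in separate stub files, as they do for the Python standard library and many popular third-party libraries. In .pyi, the i can be read as the first letter of interface.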

Mypy uses the typeshed repository of type stubs (type definitions for a module in the style of a header file) to provide type data for both the Python standard library and dozens of popular libraries like requests, six, and sqlalchemy. Importantly, mypy is designed for gradually adding types; if type data for an import isn’t available, it just treats that import as being consistent with anything. In other words, if an imported library has no type information, mypy simply lets it pass.

Reference: https://github.com/python/typeshed

Here is an excerpt from a .pyi file:

class Exif(MutableMapping[int, Any]):
    endian: Incomplete
    bigtiff: bool
    def load(self, data: bytes) -> None: ...
    def load_from_fp(self, fp, offset: Incomplete | None = None) -> None: ...
    def tobytes(self, offset: int = 8) -> bytes: ...
    def get_ifd(self, tag: int): ...
    def hide_offsets(self) -> None: ...
    def __len__(self) -> int: ...
    def __getitem__(self, tag: int) -> Any: ...
    def __contains__(self, tag: object) -> bool: ...
    def __setitem__(self, tag: int, value: Any) -> None: ...
    def __delitem__(self, tag: int) -> None: ...
    def __iter__(self) -> Iterator[int]: ...

--disallow-untyped-defs

For example, suppose you want to make sure all functions within your codebase are using static typing and make mypy report an error if you add a dynamically-typed function by mistake. You can make mypy do this by running mypy with the --disallow-untyped-defs flag.

--strict

Recommended. It includes --disallow-untyped-defs.

False Positives

With --strict, you can make mypy skip a function that has no annotations like this:

def func(*args, **kwargs): # type: ignore
    ...

Individual lines can be skipped as well, for example:

# server parameters
param = {'host': smtp,
         'port': port,
         'timeout': timeout}
# create server
if port in (25, 465, 587):
    if port == 465:
        server = smtplib.SMTP_SSL(**param)  # type: ignore

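mypy cannot handle this ** unpacking correctly, presumably because the dict values have mixed types and cannot be matched to the individual parameters.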
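An alternative to the ignore comment, sketched under assumptions: giving the dict a TypedDict type tells the checker the type of each key, which mypy should then be able to match against the parameters when the dict is unpacked. ServerParams and connect are hypothetical; connect merely stands in for a constructor such as smtplib.SMTP_SSL.

```python
from typing import TypedDict


class ServerParams(TypedDict):
    host: str
    port: int
    timeout: float


def connect(host: str, port: int, timeout: float) -> str:
    """Hypothetical stand-in for the real server constructor."""
    return f"{host}:{port} (timeout {timeout}s)"


param: ServerParams = {'host': 'mail.example.com',
                       'port': 465,
                       'timeout': 10.0}
server = connect(**param)
print(server)
```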

Type Alias

Give a long type annotation a short alias.

import socket
sk_t = socket.socket  # sk_t is a type now

Vector = list[float]
# or more explicitly
from typing import TypeAlias  # Python 3.10+
Vector: TypeAlias = list[float]
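Once defined, an alias is used exactly like the type it stands for. A short sketch (scale is a hypothetical function):

```python
Vector = list[float]


def scale(scalar: float, vector: Vector) -> Vector:
    # Vector and list[float] are interchangeable to the type checker
    return [scalar * num for num in vector]


new_vector = scale(2.0, [1.0, -4.2, 5.4])
print(new_vector)
```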

Permalink: https://cs.pynote.net/sf/python/202211041/
