pydantic_ai.usage

Usage dataclass

LLM usage associated with a request or run.

Responsibility for calculating usage is on the model; PydanticAI simply sums the usage information across requests.

You'll need to look up the documentation of the model you're using to convert usage to monetary costs.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
@dataclass
class Usage:
    """LLM usage associated with a request or run.

    Responsibility for calculating usage is on the model; PydanticAI simply sums the usage information across requests.

    You'll need to look up the documentation of the model you're using to convert usage to monetary costs.
    """

    requests: int = 0
    """Number of requests made to the LLM API."""
    request_tokens: int | None = None
    """Tokens used in processing requests."""
    response_tokens: int | None = None
    """Tokens used in generating responses."""
    total_tokens: int | None = None
    """Total tokens used in the whole run, should generally be equal to `request_tokens + response_tokens`."""
    details: dict[str, int] | None = None
    """Any extra details returned by the model."""

    def incr(self, incr_usage: Usage, *, requests: int = 0) -> None:
        """Increment the usage in place.

        Args:
            incr_usage: The usage to increment by.
            requests: The number of requests to increment by in addition to `incr_usage.requests`.
        """
        self.requests += requests
        for f in 'requests', 'request_tokens', 'response_tokens', 'total_tokens':
            self_value = getattr(self, f)
            other_value = getattr(incr_usage, f)
            if self_value is not None or other_value is not None:
                setattr(self, f, (self_value or 0) + (other_value or 0))

        if incr_usage.details:
            self.details = self.details or {}
            for key, value in incr_usage.details.items():
                self.details[key] = self.details.get(key, 0) + value

    def __add__(self, other: Usage) -> Usage:
        """Add two Usages together.

        This is provided so it's trivial to sum usage information from multiple requests and runs.
        """
        new_usage = copy(self)
        new_usage.incr(other)
        return new_usage

    def opentelemetry_attributes(self) -> dict[str, int]:
        """Get the token limits as OpenTelemetry attributes."""
        result = {
            'gen_ai.usage.input_tokens': self.request_tokens,
            'gen_ai.usage.output_tokens': self.response_tokens,
        }
        for key, value in (self.details or {}).items():
            result[f'gen_ai.usage.details.{key}'] = value
        return {k: v for k, v in result.items() if v}

requests class-attribute instance-attribute

requests: int = 0

Number of requests made to the LLM API.

request_tokens class-attribute instance-attribute

request_tokens: int | None = None

Tokens used in processing requests.

response_tokens class-attribute instance-attribute

response_tokens: int | None = None

Tokens used in generating responses.

total_tokens class-attribute instance-attribute

total_tokens: int | None = None

Total tokens used in the whole run; should generally be equal to request_tokens + response_tokens.

details class-attribute instance-attribute

details: dict[str, int] | None = None

Any extra details returned by the model.

incr

incr(incr_usage: Usage, *, requests: int = 0) -> None

Increment the usage in place.

Parameters:

- incr_usage (Usage): The usage to increment by. Required.
- requests (int): The number of requests to increment by in addition to incr_usage.requests. Default: 0.
Source code in pydantic_ai_slim/pydantic_ai/usage.py
def incr(self, incr_usage: Usage, *, requests: int = 0) -> None:
    """Increment the usage in place.

    Args:
        incr_usage: The usage to increment by.
        requests: The number of requests to increment by in addition to `incr_usage.requests`.
    """
    self.requests += requests
    for f in 'requests', 'request_tokens', 'response_tokens', 'total_tokens':
        self_value = getattr(self, f)
        other_value = getattr(incr_usage, f)
        if self_value is not None or other_value is not None:
            setattr(self, f, (self_value or 0) + (other_value or 0))

    if incr_usage.details:
        self.details = self.details or {}
        for key, value in incr_usage.details.items():
            self.details[key] = self.details.get(key, 0) + value
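
To illustrate the merging behavior (a counter stays None only when it is None on both sides, and details dicts are merged key by key), here is a minimal standalone sketch that copies the logic from the listing above, so it runs without pydantic_ai installed:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Usage:
    # Condensed copy of the fields documented above, for illustration only.
    requests: int = 0
    request_tokens: int | None = None
    response_tokens: int | None = None
    total_tokens: int | None = None
    details: dict[str, int] | None = None

    def incr(self, incr_usage: Usage, *, requests: int = 0) -> None:
        # Same logic as the listing: sum counters, treating None as 0,
        # but keep a field None when both sides are None.
        self.requests += requests
        for f in 'requests', 'request_tokens', 'response_tokens', 'total_tokens':
            self_value = getattr(self, f)
            other_value = getattr(incr_usage, f)
            if self_value is not None or other_value is not None:
                setattr(self, f, (self_value or 0) + (other_value or 0))
        if incr_usage.details:
            self.details = self.details or {}
            for key, value in incr_usage.details.items():
                self.details[key] = self.details.get(key, 0) + value


u = Usage(requests=1, request_tokens=10, response_tokens=5, total_tokens=15)
u.incr(
    Usage(requests=1, request_tokens=20, response_tokens=8, total_tokens=28,
          details={'cached_tokens': 4})
)
print(u.requests, u.total_tokens, u.details)  # 2 43 {'cached_tokens': 4}
```

The details key name 'cached_tokens' is just an illustrative example; the actual keys depend on what the model returns.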

__add__

__add__(other: Usage) -> Usage

Add two Usages together.

This is provided so it's trivial to sum usage information from multiple requests and runs.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def __add__(self, other: Usage) -> Usage:
    """Add two Usages together.

    This is provided so it's trivial to sum usage information from multiple requests and runs.
    """
    new_usage = copy(self)
    new_usage.incr(other)
    return new_usage
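
Because __add__ copies the left operand and then increments the copy, Usage values can be combined with + or accumulated with sum(..., Usage()). A standalone sketch (trimmed to two fields for brevity, not the real class):

```python
from __future__ import annotations

from copy import copy
from dataclasses import dataclass


@dataclass
class Usage:
    # Trimmed stand-in for the class above, for illustration only.
    requests: int = 0
    total_tokens: int | None = None

    def incr(self, incr_usage: Usage, *, requests: int = 0) -> None:
        self.requests += requests + incr_usage.requests
        if self.total_tokens is not None or incr_usage.total_tokens is not None:
            self.total_tokens = (self.total_tokens or 0) + (incr_usage.total_tokens or 0)

    def __add__(self, other: Usage) -> Usage:
        new_usage = copy(self)  # shallow copy, then increment the copy in place
        new_usage.incr(other)
        return new_usage


per_request = [Usage(requests=1, total_tokens=120), Usage(requests=1, total_tokens=80)]
total = sum(per_request, Usage())  # the Usage() start value makes sum() work
print(total)  # Usage(requests=2, total_tokens=200)
```

Note that copy() is shallow, so in the full class the details dict of the copy initially aliases the original's; incr then reassigns it before mutating, which keeps the original intact.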

opentelemetry_attributes

opentelemetry_attributes() -> dict[str, int]

Get the token counts as OpenTelemetry attributes.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def opentelemetry_attributes(self) -> dict[str, int]:
    """Get the token limits as OpenTelemetry attributes."""
    result = {
        'gen_ai.usage.input_tokens': self.request_tokens,
        'gen_ai.usage.output_tokens': self.response_tokens,
    }
    for key, value in (self.details or {}).items():
        result[f'gen_ai.usage.details.{key}'] = value
    return {k: v for k, v in result.items() if v}
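
Note that the final comprehension keeps only truthy values, so counters that are None or 0 are dropped from the attributes. A standalone sketch of the same mapping (the function name and details keys are illustrative, not part of the API):

```python
from __future__ import annotations


def usage_to_otel_attributes(
    request_tokens: int | None,
    response_tokens: int | None,
    details: dict[str, int] | None = None,
) -> dict[str, int]:
    # Same mapping as the method above: OpenTelemetry GenAI attribute names,
    # with falsy (None or 0) values filtered out at the end.
    result = {
        'gen_ai.usage.input_tokens': request_tokens,
        'gen_ai.usage.output_tokens': response_tokens,
    }
    for key, value in (details or {}).items():
        result[f'gen_ai.usage.details.{key}'] = value
    return {k: v for k, v in result.items() if v}


attrs = usage_to_otel_attributes(100, None, {'cached_tokens': 12, 'reasoning_tokens': 0})
print(attrs)  # {'gen_ai.usage.input_tokens': 100, 'gen_ai.usage.details.cached_tokens': 12}
```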

UsageLimits dataclass

Limits on model usage.

The request count is tracked by pydantic_ai, and the request limit is checked before each request to the model. Token counts are provided in responses from the model, and the token limits are checked after each response.

Each of the limits can be set to None to disable that limit.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
@dataclass
class UsageLimits:
    """Limits on model usage.

    The request count is tracked by pydantic_ai, and the request limit is checked before each request to the model.
    Token counts are provided in responses from the model, and the token limits are checked after each response.

    Each of the limits can be set to `None` to disable that limit.
    """

    request_limit: int | None = 50
    """The maximum number of requests allowed to the model."""
    request_tokens_limit: int | None = None
    """The maximum number of tokens allowed in requests to the model."""
    response_tokens_limit: int | None = None
    """The maximum number of tokens allowed in responses from the model."""
    total_tokens_limit: int | None = None
    """The maximum number of tokens allowed in requests and responses combined."""

    def has_token_limits(self) -> bool:
        """Returns `True` if this instance places any limits on token counts.

        If this returns `False`, the `check_tokens` method will never raise an error.

        This is useful because if we have token limits, we need to check them after receiving each streamed message.
        If there are no limits, we can skip that processing in the streaming response iterator.
        """
        return any(
            limit is not None
            for limit in (self.request_tokens_limit, self.response_tokens_limit, self.total_tokens_limit)
        )

    def check_before_request(self, usage: Usage) -> None:
        """Raises a `UsageLimitExceeded` exception if the next request would exceed the request_limit."""
        request_limit = self.request_limit
        if request_limit is not None and usage.requests >= request_limit:
            raise UsageLimitExceeded(f'The next request would exceed the request_limit of {request_limit}')

    def check_tokens(self, usage: Usage) -> None:
        """Raises a `UsageLimitExceeded` exception if the usage exceeds any of the token limits."""
        request_tokens = usage.request_tokens or 0
        if self.request_tokens_limit is not None and request_tokens > self.request_tokens_limit:
            raise UsageLimitExceeded(
                f'Exceeded the request_tokens_limit of {self.request_tokens_limit} ({request_tokens=})'
            )

        response_tokens = usage.response_tokens or 0
        if self.response_tokens_limit is not None and response_tokens > self.response_tokens_limit:
            raise UsageLimitExceeded(
                f'Exceeded the response_tokens_limit of {self.response_tokens_limit} ({response_tokens=})'
            )

        total_tokens = usage.total_tokens or 0
        if self.total_tokens_limit is not None and total_tokens > self.total_tokens_limit:
            raise UsageLimitExceeded(f'Exceeded the total_tokens_limit of {self.total_tokens_limit} ({total_tokens=})')

request_limit class-attribute instance-attribute

request_limit: int | None = 50

The maximum number of requests allowed to the model.

request_tokens_limit class-attribute instance-attribute

request_tokens_limit: int | None = None

The maximum number of tokens allowed in requests to the model.

response_tokens_limit class-attribute instance-attribute

response_tokens_limit: int | None = None

The maximum number of tokens allowed in responses from the model.

total_tokens_limit class-attribute instance-attribute

total_tokens_limit: int | None = None

The maximum number of tokens allowed in requests and responses combined.

has_token_limits

has_token_limits() -> bool

Returns True if this instance places any limits on token counts.

If this returns False, the check_tokens method will never raise an error.

This is useful because if we have token limits, we need to check them after receiving each streamed message. If there are no limits, we can skip that processing in the streaming response iterator.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def has_token_limits(self) -> bool:
    """Returns `True` if this instance places any limits on token counts.

    If this returns `False`, the `check_tokens` method will never raise an error.

    This is useful because if we have token limits, we need to check them after receiving each streamed message.
    If there are no limits, we can skip that processing in the streaming response iterator.
    """
    return any(
        limit is not None
        for limit in (self.request_tokens_limit, self.response_tokens_limit, self.total_tokens_limit)
    )

check_before_request

check_before_request(usage: Usage) -> None

Raises a UsageLimitExceeded exception if the next request would exceed the request_limit.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def check_before_request(self, usage: Usage) -> None:
    """Raises a `UsageLimitExceeded` exception if the next request would exceed the request_limit."""
    request_limit = self.request_limit
    if request_limit is not None and usage.requests >= request_limit:
        raise UsageLimitExceeded(f'The next request would exceed the request_limit of {request_limit}')

check_tokens

check_tokens(usage: Usage) -> None

Raises a UsageLimitExceeded exception if the usage exceeds any of the token limits.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def check_tokens(self, usage: Usage) -> None:
    """Raises a `UsageLimitExceeded` exception if the usage exceeds any of the token limits."""
    request_tokens = usage.request_tokens or 0
    if self.request_tokens_limit is not None and request_tokens > self.request_tokens_limit:
        raise UsageLimitExceeded(
            f'Exceeded the request_tokens_limit of {self.request_tokens_limit} ({request_tokens=})'
        )

    response_tokens = usage.response_tokens or 0
    if self.response_tokens_limit is not None and response_tokens > self.response_tokens_limit:
        raise UsageLimitExceeded(
            f'Exceeded the response_tokens_limit of {self.response_tokens_limit} ({response_tokens=})'
        )

    total_tokens = usage.total_tokens or 0
    if self.total_tokens_limit is not None and total_tokens > self.total_tokens_limit:
        raise UsageLimitExceeded(f'Exceeded the total_tokens_limit of {self.total_tokens_limit} ({total_tokens=})')
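
A detail worth noting in the two checks above: check_before_request compares with >= because it guards the next request (if request_limit requests have already been made, one more would exceed it), while the token checks compare with > because they run after a response, so usage exactly at the limit is still allowed. A standalone sketch of both checks (the helper functions and the local UsageLimitExceeded stand-in are illustrative, not the real API):

```python
from __future__ import annotations


class UsageLimitExceeded(Exception):
    """Stand-in for pydantic_ai's exception, for illustration only."""


def check_before_request(request_limit: int | None, requests_made: int) -> None:
    # >= : guards the *next* request, so hitting the limit already fails.
    if request_limit is not None and requests_made >= request_limit:
        raise UsageLimitExceeded(
            f'The next request would exceed the request_limit of {request_limit}'
        )


def check_total_tokens(total_tokens_limit: int | None, total_tokens: int | None) -> None:
    # > : runs after a response, so usage exactly at the limit is allowed.
    total = total_tokens or 0
    if total_tokens_limit is not None and total > total_tokens_limit:
        raise UsageLimitExceeded(
            f'Exceeded the total_tokens_limit of {total_tokens_limit} ({total_tokens=})'
        )


check_before_request(50, 3)      # fine: 3 < 50
check_total_tokens(1000, 1000)   # fine: exactly at the limit
try:
    check_before_request(3, 3)
except UsageLimitExceeded as e:
    print(e)  # The next request would exceed the request_limit of 3
```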