
pydantic_ai.usage

RequestUsage dataclass

Bases: UsageBase

LLM usage associated with a single request.

This is an implementation of genai_prices.types.AbstractUsage, so it can be used to calculate the price of the request using genai-prices.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
@dataclass(repr=False, kw_only=True)
class RequestUsage(UsageBase):
    """LLM usage associated with a single request.

    This is an implementation of `genai_prices.types.AbstractUsage` so it can be used to calculate the price of the
    request using [genai-prices](https://github.com/pydantic/genai-prices).
    """

    @property
    def requests(self):
        return 1

    def incr(self, incr_usage: RequestUsage) -> None:
        """Increment the usage in place.

        Args:
            incr_usage: The usage to increment by.
        """
        return _incr_usage_tokens(self, incr_usage)

    def __add__(self, other: RequestUsage) -> RequestUsage:
        """Add two RequestUsages together.

        This is provided so it's trivial to sum usage information from multiple parts of a response.

        **WARNING:** this CANNOT be used to sum multiple requests without breaking some pricing calculations.
        """
        new_usage = copy(self)
        new_usage.incr(other)
        return new_usage

incr

incr(incr_usage: RequestUsage) -> None

Increment the usage in place.

Parameters:

Name        Type          Description                 Default
incr_usage  RequestUsage  The usage to increment by.  required

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def incr(self, incr_usage: RequestUsage) -> None:
    """Increment the usage in place.

    Args:
        incr_usage: The usage to increment by.
    """
    return _incr_usage_tokens(self, incr_usage)
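
For instance, a minimal sketch of incrementing one RequestUsage with another; it assumes the token fields inherited from UsageBase (input_tokens, output_tokens, as listed for RunUsage below) can be passed to the constructor:

from pydantic_ai.usage import RequestUsage

usage = RequestUsage(input_tokens=100, output_tokens=20)
usage.incr(RequestUsage(output_tokens=15))
# usage.output_tokens is now 35; usage.input_tokens is unchanged at 100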

__add__

__add__(other: RequestUsage) -> RequestUsage

Add two RequestUsages together.

This is provided so it's trivial to sum usage information from multiple parts of a response.

WARNING: this CANNOT be used to sum multiple requests without breaking some pricing calculations.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def __add__(self, other: RequestUsage) -> RequestUsage:
    """Add two RequestUsages together.

    This is provided so it's trivial to sum usage information from multiple parts of a response.

    **WARNING:** this CANNOT be used to sum multiple requests without breaking some pricing calculations.
    """
    new_usage = copy(self)
    new_usage.incr(other)
    return new_usage
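
As a hedged sketch, summing usage from two parts of the same response might look like this (the field names are the UsageBase token fields listed under RunUsage below):

from pydantic_ai.usage import RequestUsage

part_a = RequestUsage(input_tokens=120, output_tokens=40)
part_b = RequestUsage(output_tokens=25)

combined = part_a + part_b
# combined.input_tokens == 120, combined.output_tokens == 65
# Only use this for parts of ONE response; summing separate requests this way
# can break pricing calculations, as warned above.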

RunUsage dataclass

Bases: UsageBase

LLM usage associated with an agent run.

Responsibility for calculating request usage is on the model; Pydantic AI simply sums the usage information across requests.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
@dataclass(repr=False, kw_only=True)
class RunUsage(UsageBase):
    """LLM usage associated with an agent run.

    Responsibility for calculating request usage is on the model; Pydantic AI simply sums the usage information across requests.
    """

    requests: int = 0
    """Number of requests made to the LLM API."""

    tool_calls: int = 0
    """Number of successful tool calls executed during the run."""

    input_tokens: int = 0
    """Total number of text input/prompt tokens."""

    cache_write_tokens: int = 0
    """Total number of tokens written to the cache."""

    cache_read_tokens: int = 0
    """Total number of tokens read from the cache."""

    input_audio_tokens: int = 0
    """Total number of audio input tokens."""

    cache_audio_read_tokens: int = 0
    """Total number of audio tokens read from the cache."""

    output_tokens: int = 0
    """Total number of text output/completion tokens."""

    details: dict[str, int] = dataclasses.field(default_factory=dict)
    """Any extra details returned by the model."""

    def incr(self, incr_usage: RunUsage | RequestUsage) -> None:
        """Increment the usage in place.

        Args:
            incr_usage: The usage to increment by.
        """
        if isinstance(incr_usage, RunUsage):
            self.requests += incr_usage.requests
            self.tool_calls += incr_usage.tool_calls
        return _incr_usage_tokens(self, incr_usage)

    def __add__(self, other: RunUsage | RequestUsage) -> RunUsage:
        """Add two RunUsages together.

        This is provided so it's trivial to sum usage information from multiple runs.
        """
        new_usage = copy(self)
        new_usage.incr(other)
        return new_usage

requests class-attribute instance-attribute

requests: int = 0

Number of requests made to the LLM API.

tool_calls class-attribute instance-attribute

tool_calls: int = 0

Number of successful tool calls executed during the run.

input_tokens class-attribute instance-attribute

input_tokens: int = 0

Total number of text input/prompt tokens.

cache_write_tokens class-attribute instance-attribute

cache_write_tokens: int = 0

Total number of tokens written to the cache.

cache_read_tokens class-attribute instance-attribute

cache_read_tokens: int = 0

Total number of tokens read from the cache.

input_audio_tokens class-attribute instance-attribute

input_audio_tokens: int = 0

Total number of audio input tokens.

cache_audio_read_tokens class-attribute instance-attribute

cache_audio_read_tokens: int = 0

Total number of audio tokens read from the cache.

output_tokens class-attribute instance-attribute

output_tokens: int = 0

Total number of text output/completion tokens.

details class-attribute instance-attribute

details: dict[str, int] = field(default_factory=dict)

Any extra details returned by the model.

incr

incr(incr_usage: RunUsage | RequestUsage) -> None

Increment the usage in place.

Parameters:

Name        Type                     Description                 Default
incr_usage  RunUsage | RequestUsage  The usage to increment by.  required

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def incr(self, incr_usage: RunUsage | RequestUsage) -> None:
    """Increment the usage in place.

    Args:
        incr_usage: The usage to increment by.
    """
    if isinstance(incr_usage, RunUsage):
        self.requests += incr_usage.requests
        self.tool_calls += incr_usage.tool_calls
    return _incr_usage_tokens(self, incr_usage)

__add__

__add__(other: RunUsage | RequestUsage) -> RunUsage

Add two RunUsages together.

This is provided so it's trivial to sum usage information from multiple runs.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def __add__(self, other: RunUsage | RequestUsage) -> RunUsage:
    """Add two RunUsages together.

    This is provided so it's trivial to sum usage information from multiple runs.
    """
    new_usage = copy(self)
    new_usage.incr(other)
    return new_usage
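
For illustration only, a sketch of summing usage across two agent runs; it assumes Agent.run_sync and the usage() method on run results as used elsewhere in the Pydantic AI docs, and the model name is just a placeholder:

from pydantic_ai import Agent
from pydantic_ai.usage import RunUsage

agent = Agent('openai:gpt-4o')  # placeholder model

result_1 = agent.run_sync('What is the capital of France?')
result_2 = agent.run_sync('And of Germany?')

# RunUsage covering both runs: requests, tool calls and token counts are summed
total: RunUsage = result_1.usage() + result_2.usage()
print(total.requests, total.input_tokens, total.output_tokens)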

Usage dataclass deprecated

Bases: RunUsage

Deprecated

Usage is deprecated, use RunUsage instead.

Deprecated alias for RunUsage.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
@dataclass(repr=False, kw_only=True)
@deprecated('`Usage` is deprecated, use `RunUsage` instead')
class Usage(RunUsage):
    """Deprecated alias for `RunUsage`."""

UsageLimits dataclass

Limits on model usage.

The request count is tracked by pydantic_ai, and the request limit is checked before each request to the model. Token counts are provided in responses from the model, and the token limits are checked after each response.

Each of the limits can be set to None to disable that limit.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
@dataclass(repr=False, kw_only=True)
class UsageLimits:
    """Limits on model usage.

    The request count is tracked by pydantic_ai, and the request limit is checked before each request to the model.
    Token counts are provided in responses from the model, and the token limits are checked after each response.

    Each of the limits can be set to `None` to disable that limit.
    """

    request_limit: int | None = 50
    """The maximum number of requests allowed to the model."""
    tool_calls_limit: int | None = None
    """The maximum number of successful tool calls allowed to be executed."""
    input_tokens_limit: int | None = None
    """The maximum number of input/prompt tokens allowed."""
    output_tokens_limit: int | None = None
    """The maximum number of output/response tokens allowed."""
    total_tokens_limit: int | None = None
    """The maximum number of tokens allowed in requests and responses combined."""
    count_tokens_before_request: bool = False
    """If True, perform a token counting pass before sending the request to the model,
    to enforce `request_tokens_limit` ahead of time. This may incur additional overhead
    (from calling the model's `count_tokens` API before making the actual request) and is disabled by default."""

    @property
    @deprecated('`request_tokens_limit` is deprecated, use `input_tokens_limit` instead')
    def request_tokens_limit(self) -> int | None:
        return self.input_tokens_limit

    @property
    @deprecated('`response_tokens_limit` is deprecated, use `output_tokens_limit` instead')
    def response_tokens_limit(self) -> int | None:
        return self.output_tokens_limit

    @overload
    def __init__(
        self,
        *,
        request_limit: int | None = 50,
        tool_calls_limit: int | None = None,
        input_tokens_limit: int | None = None,
        output_tokens_limit: int | None = None,
        total_tokens_limit: int | None = None,
        count_tokens_before_request: bool = False,
    ) -> None:
        self.request_limit = request_limit
        self.tool_calls_limit = tool_calls_limit
        self.input_tokens_limit = input_tokens_limit
        self.output_tokens_limit = output_tokens_limit
        self.total_tokens_limit = total_tokens_limit
        self.count_tokens_before_request = count_tokens_before_request

    @overload
    @deprecated(
        'Use `input_tokens_limit` instead of `request_tokens_limit` and `output_tokens_limit` and `total_tokens_limit`'
    )
    def __init__(
        self,
        *,
        request_limit: int | None = 50,
        tool_calls_limit: int | None = None,
        request_tokens_limit: int | None = None,
        response_tokens_limit: int | None = None,
        total_tokens_limit: int | None = None,
        count_tokens_before_request: bool = False,
    ) -> None:
        self.request_limit = request_limit
        self.tool_calls_limit = tool_calls_limit
        self.input_tokens_limit = request_tokens_limit
        self.output_tokens_limit = response_tokens_limit
        self.total_tokens_limit = total_tokens_limit
        self.count_tokens_before_request = count_tokens_before_request

    def __init__(
        self,
        *,
        request_limit: int | None = 50,
        tool_calls_limit: int | None = None,
        input_tokens_limit: int | None = None,
        output_tokens_limit: int | None = None,
        total_tokens_limit: int | None = None,
        count_tokens_before_request: bool = False,
        # deprecated:
        request_tokens_limit: int | None = None,
        response_tokens_limit: int | None = None,
    ):
        self.request_limit = request_limit
        self.tool_calls_limit = tool_calls_limit
        self.input_tokens_limit = input_tokens_limit or request_tokens_limit
        self.output_tokens_limit = output_tokens_limit or response_tokens_limit
        self.total_tokens_limit = total_tokens_limit
        self.count_tokens_before_request = count_tokens_before_request

    def has_token_limits(self) -> bool:
        """Returns `True` if this instance places any limits on token counts.

        If this returns `False`, the `check_tokens` method will never raise an error.

        This is useful because if we have token limits, we need to check them after receiving each streamed message.
        If there are no limits, we can skip that processing in the streaming response iterator.
        """
        return any(
            limit is not None for limit in (self.input_tokens_limit, self.output_tokens_limit, self.total_tokens_limit)
        )

    def check_before_request(self, usage: RunUsage) -> None:
        """Raises a `UsageLimitExceeded` exception if the next request would exceed any of the limits."""
        request_limit = self.request_limit
        if request_limit is not None and usage.requests >= request_limit:
            raise UsageLimitExceeded(f'The next request would exceed the request_limit of {request_limit}')

        input_tokens = usage.input_tokens
        if self.input_tokens_limit is not None and input_tokens > self.input_tokens_limit:
            raise UsageLimitExceeded(
                f'The next request would exceed the input_tokens_limit of {self.input_tokens_limit} ({input_tokens=})'
            )

        total_tokens = usage.total_tokens
        if self.total_tokens_limit is not None and total_tokens > self.total_tokens_limit:
            raise UsageLimitExceeded(  # pragma: lax no cover
                f'The next request would exceed the total_tokens_limit of {self.total_tokens_limit} ({total_tokens=})'
            )

    def check_tokens(self, usage: RunUsage) -> None:
        """Raises a `UsageLimitExceeded` exception if the usage exceeds any of the token limits."""
        input_tokens = usage.input_tokens
        if self.input_tokens_limit is not None and input_tokens > self.input_tokens_limit:
            raise UsageLimitExceeded(f'Exceeded the input_tokens_limit of {self.input_tokens_limit} ({input_tokens=})')

        output_tokens = usage.output_tokens
        if self.output_tokens_limit is not None and output_tokens > self.output_tokens_limit:
            raise UsageLimitExceeded(
                f'Exceeded the output_tokens_limit of {self.output_tokens_limit} ({output_tokens=})'
            )

        total_tokens = usage.total_tokens
        if self.total_tokens_limit is not None and total_tokens > self.total_tokens_limit:
            raise UsageLimitExceeded(f'Exceeded the total_tokens_limit of {self.total_tokens_limit} ({total_tokens=})')

    def check_before_tool_call(self, usage: RunUsage) -> None:
        """Raises a `UsageLimitExceeded` exception if the next tool call would exceed the tool call limit."""
        tool_calls_limit = self.tool_calls_limit
        if tool_calls_limit is not None and usage.tool_calls >= tool_calls_limit:
            raise UsageLimitExceeded(
                f'The next tool call would exceed the tool_calls_limit of {tool_calls_limit} (tool_calls={usage.tool_calls})'
            )

    __repr__ = _utils.dataclasses_no_defaults_repr

request_limit class-attribute instance-attribute

request_limit: int | None = request_limit

The maximum number of requests allowed to the model.

tool_calls_limit class-attribute instance-attribute

tool_calls_limit: int | None = tool_calls_limit

The maximum number of successful tool calls allowed to be executed.

input_tokens_limit class-attribute instance-attribute

input_tokens_limit: int | None = (
    input_tokens_limit or request_tokens_limit
)

The maximum number of input/prompt tokens allowed.

output_tokens_limit class-attribute instance-attribute

output_tokens_limit: int | None = (
    output_tokens_limit or response_tokens_limit
)

The maximum number of output/response tokens allowed.

total_tokens_limit class-attribute instance-attribute

total_tokens_limit: int | None = total_tokens_limit

The maximum number of tokens allowed in requests and responses combined.

count_tokens_before_request class-attribute instance-attribute

count_tokens_before_request: bool = (
    count_tokens_before_request
)

If True, perform a token-counting pass before sending the request to the model, to enforce request_tokens_limit ahead of time. This may incur additional overhead (from calling the model's count_tokens API before making the actual request) and is disabled by default.
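
Putting the limits to use, a hedged sketch of passing UsageLimits to an agent run; it assumes the usage_limits keyword on Agent.run_sync as described in the agent documentation, and the model name is a placeholder:

from pydantic_ai import Agent
from pydantic_ai.exceptions import UsageLimitExceeded
from pydantic_ai.usage import UsageLimits

agent = Agent('openai:gpt-4o')  # placeholder model

try:
    result = agent.run_sync(
        'Summarise the history of the Roman Empire in detail.',
        usage_limits=UsageLimits(request_limit=5, total_tokens_limit=2_000),
    )
except UsageLimitExceeded as exc:
    # raised if the run needs more than 5 requests or more than 2,000 total tokens
    print(f'Run aborted: {exc}')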

has_token_limits

has_token_limits() -> bool

Returns True if this instance places any limits on token counts.

If this returns False, the check_tokens method will never raise an error.

This is useful because if we have token limits, we need to check them after receiving each streamed message. If there are no limits, we can skip that processing in the streaming response iterator.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def has_token_limits(self) -> bool:
    """Returns `True` if this instance places any limits on token counts.

    If this returns `False`, the `check_tokens` method will never raise an error.

    This is useful because if we have token limits, we need to check them after receiving each streamed message.
    If there are no limits, we can skip that processing in the streaming response iterator.
    """
    return any(
        limit is not None for limit in (self.input_tokens_limit, self.output_tokens_limit, self.total_tokens_limit)
    )
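
A quick sketch of what this distinction looks like in practice:

from pydantic_ai.usage import UsageLimits

UsageLimits(request_limit=10).has_token_limits()         # False: only the request count is limited
UsageLimits(output_tokens_limit=500).has_token_limits()  # True: token checks are needed while streaming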

check_before_request

check_before_request(usage: RunUsage) -> None

Raises a UsageLimitExceeded exception if the next request would exceed any of the limits.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def check_before_request(self, usage: RunUsage) -> None:
    """Raises a `UsageLimitExceeded` exception if the next request would exceed any of the limits."""
    request_limit = self.request_limit
    if request_limit is not None and usage.requests >= request_limit:
        raise UsageLimitExceeded(f'The next request would exceed the request_limit of {request_limit}')

    input_tokens = usage.input_tokens
    if self.input_tokens_limit is not None and input_tokens > self.input_tokens_limit:
        raise UsageLimitExceeded(
            f'The next request would exceed the input_tokens_limit of {self.input_tokens_limit} ({input_tokens=})'
        )

    total_tokens = usage.total_tokens
    if self.total_tokens_limit is not None and total_tokens > self.total_tokens_limit:
        raise UsageLimitExceeded(  # pragma: lax no cover
            f'The next request would exceed the total_tokens_limit of {self.total_tokens_limit} ({total_tokens=})'
        )
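
As a sketch, calling this manually with a RunUsage that has already consumed the request budget:

from pydantic_ai.exceptions import UsageLimitExceeded
from pydantic_ai.usage import RunUsage, UsageLimits

limits = UsageLimits(request_limit=2)
usage = RunUsage(requests=2)

try:
    limits.check_before_request(usage)
except UsageLimitExceeded as exc:
    print(exc)  # the next request would be the third, exceeding the limit of 2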

check_tokens

check_tokens(usage: RunUsage) -> None

Raises a UsageLimitExceeded exception if the usage exceeds any of the token limits.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def check_tokens(self, usage: RunUsage) -> None:
    """Raises a `UsageLimitExceeded` exception if the usage exceeds any of the token limits."""
    input_tokens = usage.input_tokens
    if self.input_tokens_limit is not None and input_tokens > self.input_tokens_limit:
        raise UsageLimitExceeded(f'Exceeded the input_tokens_limit of {self.input_tokens_limit} ({input_tokens=})')

    output_tokens = usage.output_tokens
    if self.output_tokens_limit is not None and output_tokens > self.output_tokens_limit:
        raise UsageLimitExceeded(
            f'Exceeded the output_tokens_limit of {self.output_tokens_limit} ({output_tokens=})'
        )

    total_tokens = usage.total_tokens
    if self.total_tokens_limit is not None and total_tokens > self.total_tokens_limit:
        raise UsageLimitExceeded(f'Exceeded the total_tokens_limit of {self.total_tokens_limit} ({total_tokens=})')
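
A minimal sketch of the token check, assuming total_tokens sums the input and output token counts:

from pydantic_ai.exceptions import UsageLimitExceeded
from pydantic_ai.usage import RunUsage, UsageLimits

limits = UsageLimits(total_tokens_limit=1_000)
usage = RunUsage(requests=1, input_tokens=800, output_tokens=400)

try:
    limits.check_tokens(usage)
except UsageLimitExceeded as exc:
    print(exc)  # 800 + 400 tokens exceed the total_tokens_limit of 1,000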

check_before_tool_call

check_before_tool_call(usage: RunUsage) -> None

Raises a UsageLimitExceeded exception if the next tool call would exceed the tool call limit.

Source code in pydantic_ai_slim/pydantic_ai/usage.py
def check_before_tool_call(self, usage: RunUsage) -> None:
    """Raises a `UsageLimitExceeded` exception if the next tool call would exceed the tool call limit."""
    tool_calls_limit = self.tool_calls_limit
    if tool_calls_limit is not None and usage.tool_calls >= tool_calls_limit:
        raise UsageLimitExceeded(
            f'The next tool call would exceed the tool_calls_limit of {tool_calls_limit} (tool_calls={usage.tool_calls})'
        )
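
And a matching sketch for the tool-call budget:

from pydantic_ai.exceptions import UsageLimitExceeded
from pydantic_ai.usage import RunUsage, UsageLimits

limits = UsageLimits(tool_calls_limit=3)
usage = RunUsage(tool_calls=3)

try:
    limits.check_before_tool_call(usage)
except UsageLimitExceeded as exc:
    print(exc)  # the next tool call would be the fourth, exceeding the limit of 3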