
pydantic_evals.generation

Utilities for generating example datasets for pydantic_evals.

This module provides functions for generating sample datasets for tests and examples, using an LLM to create realistic test data with the correct structure.

generate_dataset async

generate_dataset(
    *,
    dataset_type: type[
        Dataset[InputsT, OutputT, MetadataT]
    ],
    path: Path | str | None = None,
    custom_evaluator_types: Sequence[
        type[Evaluator[InputsT, OutputT, MetadataT]]
    ] = (),
    model: Model | KnownModelName = "openai:gpt-4o",
    n_examples: int = 3,
    extra_instructions: str | None = None
) -> Dataset[InputsT, OutputT, MetadataT]

Use an LLM to generate a dataset of test cases, each consisting of inputs, expected output, and metadata.

This function creates a properly structured dataset with the specified input, output, and metadata types. It uses an LLM to attempt to generate realistic test cases that conform to the types' schemas.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `Path \| str \| None` | Optional path to save the generated dataset. If provided, the dataset will be saved to this location. | `None` |
| `dataset_type` | `type[Dataset[InputsT, OutputT, MetadataT]]` | The type of dataset to generate, with the desired input, output, and metadata types. | *required* |
| `custom_evaluator_types` | `Sequence[type[Evaluator[InputsT, OutputT, MetadataT]]]` | Optional sequence of custom evaluator classes to include in the schema. | `()` |
| `model` | `Model \| KnownModelName` | The Pydantic AI model to use for generation. Defaults to `'openai:gpt-4o'`. | `'openai:gpt-4o'` |
| `n_examples` | `int` | Number of examples to generate. Defaults to 3. | `3` |
| `extra_instructions` | `str \| None` | Optional additional instructions to provide to the LLM. | `None` |

Returns

| Type | Description |
| --- | --- |
| `Dataset[InputsT, OutputT, MetadataT]` | A properly structured `Dataset` object with generated test cases. |

Raises

| Type | Description |
| --- | --- |
| `ValidationError` | If the LLM's response cannot be parsed as a valid dataset. |
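
A minimal usage sketch. The `QuestionInputs`, `AnswerOutput`, and `QaMetadata` models below are hypothetical placeholders defined for illustration, not part of pydantic_evals:

```python
import asyncio

from pydantic import BaseModel

from pydantic_evals import Dataset
from pydantic_evals.generation import generate_dataset


class QuestionInputs(BaseModel):
    """Hypothetical input model for a question-answering task."""

    question: str


class AnswerOutput(BaseModel):
    """Hypothetical expected-output model."""

    answer: str


class QaMetadata(BaseModel):
    """Hypothetical per-case metadata."""

    difficulty: str


async def main():
    # Ask the LLM for three cases conforming to the three schemas above.
    dataset = await generate_dataset(
        dataset_type=Dataset[QuestionInputs, AnswerOutput, QaMetadata],
        n_examples=3,
        extra_instructions='Generate questions about world capitals.',
    )
    for case in dataset.cases:
        print(case.inputs.question, '->', case.expected_output)


asyncio.run(main())
```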

Source code in pydantic_evals/pydantic_evals/generation.py
async def generate_dataset(
    *,
    dataset_type: type[Dataset[InputsT, OutputT, MetadataT]],
    path: Path | str | None = None,
    custom_evaluator_types: Sequence[type[Evaluator[InputsT, OutputT, MetadataT]]] = (),
    model: models.Model | models.KnownModelName = 'openai:gpt-4o',
    n_examples: int = 3,
    extra_instructions: str | None = None,
) -> Dataset[InputsT, OutputT, MetadataT]:
    """Use an LLM to generate a dataset of test cases, each consisting of input, expected output, and metadata.

    This function creates a properly structured dataset with the specified input, output, and metadata types.
    It uses an LLM to attempt to generate realistic test cases that conform to the types' schemas.

    Args:
        path: Optional path to save the generated dataset. If provided, the dataset will be saved to this location.
        dataset_type: The type of dataset to generate, with the desired input, output, and metadata types.
        custom_evaluator_types: Optional sequence of custom evaluator classes to include in the schema.
        model: The Pydantic AI model to use for generation. Defaults to 'gpt-4o'.
        n_examples: Number of examples to generate. Defaults to 3.
        extra_instructions: Optional additional instructions to provide to the LLM.

    Returns:
        A properly structured Dataset object with generated test cases.

    Raises:
        ValidationError: If the LLM's response cannot be parsed as a valid dataset.
    """
    output_schema = dataset_type.model_json_schema_with_evaluators(custom_evaluator_types)

    # TODO(DavidM): Update this once we add better response_format and/or ResultTool support to Pydantic AI
    agent = Agent(
        model,
        system_prompt=(
            f'Generate an object that is in compliance with this JSON schema:\n{output_schema}\n\n'
            f'Include {n_examples} example cases.'
            ' You must not include any characters in your response before the opening { of the JSON object, or after the closing }.'
        ),
        output_type=str,
        retries=1,
    )

    result = await agent.run(extra_instructions or 'Please generate the object.')
    try:
        result = dataset_type.from_text(result.output, fmt='json', custom_evaluator_types=custom_evaluator_types)
    except ValidationError as e:  # pragma: no cover
        print(f'Raw response from model:\n{result.output}')
        raise e
    if path is not None:
        result.to_file(path, custom_evaluator_types=custom_evaluator_types)  # pragma: no cover
    return result
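
Passing custom evaluator classes via `custom_evaluator_types` includes them in the JSON schema shown to the LLM, and the same classes are passed again when parsing and serializing the result. A sketch reusing the models from the example above; `MatchesExpected` and the output filename are hypothetical:

```python
from dataclasses import dataclass

from pydantic_evals import Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext
from pydantic_evals.generation import generate_dataset


@dataclass
class MatchesExpected(Evaluator):
    """Hypothetical evaluator: passes when the output equals the expected output."""

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return ctx.output == ctx.expected_output


async def generate_and_save() -> Dataset:
    # Because `path` is provided, the dataset is also written to disk.
    return await generate_dataset(
        dataset_type=Dataset[QuestionInputs, AnswerOutput, QaMetadata],
        custom_evaluator_types=[MatchesExpected],
        path='qa_dataset.yaml',  # hypothetical path, chosen for illustration
    )
```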