
Structured Output Benchmark

Code to run the Structured Output Benchmark introduced in our article:

LLM Structured Output Benchmarks are Riddled with Mistakes

This benchmark contains four high-quality datasets that are formatted so you can easily evaluate Structured Outputs from different LLMs.

These benchmarks were created after we discovered that public Structured Output datasets contain substantial annotation errors, inconsistencies, and ambiguities in their ground truth. To enable more reliable assessment of models, we provide four rigorously cleaned and validated benchmarks, along with scripts to format tasks, generate LLM responses, and evaluate their correctness.
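As a rough illustration of the generate-then-evaluate loop (the repository's own scripts in each dataset's code folder are the reference implementation), here is a minimal sketch using the OpenAI Python SDK's structured-outputs support. The `ClaimFields` schema, its field names, and the exact-match check are hypothetical, not the repo's actual evaluation logic:

```python
# pip install openai pydantic
from pydantic import BaseModel
from openai import OpenAI


class ClaimFields(BaseModel):
    # Hypothetical schema -- the real field names come from each dataset.
    claimant_name: str
    claim_amount: float


client = OpenAI()


def extract(document: str) -> ClaimFields:
    # Ask the model for output conforming to the Pydantic schema.
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Extract the claim fields:\n{document}"}],
        response_format=ClaimFields,
    )
    return completion.choices[0].message.parsed


def is_correct(pred: ClaimFields, truth: dict) -> bool:
    # Exact-match comparison against the ground-truth annotation.
    return pred.model_dump() == truth
```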

Datasets

The datasets are hosted on HuggingFace.

| Dataset | Description | Dataset Link | Code Folder |
| --- | --- | --- | --- |
| Data Table Analysis | Analyze CSV tables and extract structured metadata. | https://huggingface.co/datasets/Cleanlab/data-table-analysis | data_table_analysis/ |
| Financial Entities Extraction | Extract financial and contextual entities from business and financial text. | https://huggingface.co/datasets/Cleanlab/fire-financial-ner-extraction | financial_entities/ |
| Insurance Claims Extraction | Extract structured fields from insurance claim documents. | https://huggingface.co/datasets/Cleanlab/insurance-claims-extraction | insurance_claims/ |
| PII Extraction | Extract and classify different types of personally identifiable information (PII) from text. | https://huggingface.co/datasets/Cleanlab/pii-extraction | pii_extraction/ |
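Each dataset can be pulled directly from the Hub with the `datasets` library. The split name below is an assumption; consult each dataset card for the actual splits and column schema:

```python
# pip install datasets
from datasets import load_dataset

# Load one of the four benchmark datasets from the HuggingFace Hub.
# The split name ("train") is an assumption -- check the dataset card.
ds = load_dataset("Cleanlab/pii-extraction", split="train")

print(ds)     # summary: number of rows and column names
print(ds[0])  # first example with its cleaned ground-truth annotations
```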

About

A Structured Output Benchmark whose 'ground-truth' is actually right
