Open
Labels
🐞 Bug (Something isn't working), 🩺 Needs Triage (Needs attention of maintainers)
Description
crawl4ai version
0.7.8
Expected Behavior
The library should be resilient to common LLM response formats and parse JSON successfully even when it is wrapped in a markdown code block. Parsing should handle these cases gracefully, without requiring users to modify their code.
Current Behavior
JsonCssExtractionStrategy.generate_schema() crashes with a JSON parsing error when an LLM (particularly Claude Sonnet) returns valid JSON wrapped in a markdown code block:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The method attempts to parse the LLM response directly, without handling common wrappers such as:
- ```json\n{...}\n```
- ```\n{...}\n```
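The resilient parsing described in this report could be approximated by stripping an optional markdown fence before calling json.loads. The following is only a minimal sketch: parse_llm_json and _FENCE_RE are hypothetical names, not part of crawl4ai, and it assumes the whole response is either bare JSON or a single fenced block.

```python
import json
import re

# Hypothetical helper, not part of crawl4ai. Matches a response that consists
# of exactly one markdown fence, with an optional language tag such as "json".
_FENCE_RE = re.compile(
    r"^```[A-Za-z0-9_-]*[ \t]*\r?\n(.*?)\r?\n?```\s*$",
    re.DOTALL,
)


def parse_llm_json(text: str):
    """Parse JSON from an LLM response, tolerating a surrounding code fence."""
    cleaned = text.strip()
    match = _FENCE_RE.match(cleaned)
    if match:
        # Keep only the fence body; bare JSON falls through unchanged.
        cleaned = match.group(1).strip()
    return json.loads(cleaned)
```

Passing a fenced string such as '```json\n{"a": 1}\n```' straight to json.loads raises the JSONDecodeError quoted in this report, because the first character is a backtick. The helper above returns {"a": 1} for the same input and still raises on text that is not JSON at all.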
Is this reproducible?
Yes
Code snippets
import asyncio
from crawl4ai import (
    CrawlerRunConfig,
    AsyncWebCrawler,
    BrowserConfig,
    JsonCssExtractionStrategy,
    LLMConfig,
)
from pprint import pprint


async def generate_schema():
    browser_config = BrowserConfig(
        headless=False,
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://www.fastenal.com/product/details/925576347",
            config=CrawlerRunConfig(
                remove_overlay_elements=True,
                remove_forms=True,
                wait_for="div#pdp-details",
                css_selector="div#pdp-details",
            ),
        )
        schema = JsonCssExtractionStrategy.generate_schema(
            result.fit_html,
            llm_config=LLMConfig(
                provider="anthropic/claude-sonnet-4-5",
                api_token="env:ANTHROPIC_API_KEY",
            ),
            target_json_example={
                "part_name": """7" Dia x NH Arbor 60+ Grit Coarse Ceramic Purple Fiber Disc""",
                "fastenal_part_no": "925576347",
                "manufacturer_part_no": "7100349674",
                "unspsc": "31191506",
                "manufacturer": "3M",
                "brand": "CUBITRON",
                "attachment_type": "GL",
                "diameter": '7"',
                "arbor_size": "NH",
                "abrasive_material": "Ceramic",
                "grade": "Coarse",
                "grit": "60+",
                "backing_material": "Fiber",
                "coat_type": "Open",
                "color": "Purple",
                "type": "Fiber Disc",
                "operating_speed": "8600 rpm",
                "product_weight": ".1138",
                "uom": "each",
                "country_of_origin": "United States",
                "origin_note": "Origin is subject to change",
            },
        )
        return schema


asyncio.run(generate_schema())