Recipe parser

Norish imports structured recipes through a parser service. By default, non-video URL imports go through the Python parser API first, and the page is scraped with a headless Chrome instance.

How it works

Structured imports use the recipe-scrapers library (wrapped by apps/parser-api). Page scraping uses the headless Chrome instance at CHROME_WS_ENDPOINT, which by default is the chrome-headless service in the Quick start compose.

Settings

| Variable | Description | Default |
| --- | --- | --- |
| CHROME_WS_ENDPOINT | Playwright CDP WebSocket endpoint for scraping | ws://chrome-headless:3000 |
| PARSER_API_TIMEOUT_MS | Parser API timeout in milliseconds | 15000 |
| LEGACY_RECIPE_PARSER_ROLLBACK | Re-enable the deprecated legacy parser | false |
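
To make the defaults above concrete, here is a minimal Python sketch of how a service could read these three variables and fall back to the documented defaults. It is not Norish's actual loading code: the `ParserSettings` class and `load_parser_settings` function are hypothetical names, and the rule that only the string `true` (case-insensitive) enables the rollback flag is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ParserSettings:
    """Hypothetical container for the three documented settings."""
    chrome_ws_endpoint: str
    parser_api_timeout_ms: int
    legacy_rollback: bool


def load_parser_settings(env: Mapping[str, str] = os.environ) -> ParserSettings:
    """Read the settings from `env`, using the documented defaults when unset."""
    timeout_ms = int(env.get("PARSER_API_TIMEOUT_MS", "15000"))
    if timeout_ms <= 0:
        raise ValueError("PARSER_API_TIMEOUT_MS must be a positive integer")

    # Assumption: only the literal "true" (any case) turns the flag on.
    rollback_raw = env.get("LEGACY_RECIPE_PARSER_ROLLBACK", "false")

    return ParserSettings(
        chrome_ws_endpoint=env.get("CHROME_WS_ENDPOINT", "ws://chrome-headless:3000"),
        parser_api_timeout_ms=timeout_ms,
        legacy_rollback=rollback_raw.strip().lower() == "true",
    )


defaults = load_parser_settings({})
overridden = load_parser_settings(
    {"PARSER_API_TIMEOUT_MS": "30000", "LEGACY_RECIPE_PARSER_ROLLBACK": "true"}
)
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the sketch easy to test; with an empty mapping every field takes its documented default.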

Content detection

Advanced overrides for how recipe content is detected. Most instances never need these.

| Variable | Description | Default |
| --- | --- | --- |
| UNITS_JSON | Override the units dictionary | (empty) |
| CONTENT_INDICATORS | Override recipe-content indicator configuration | (empty) |
| CONTENT_INGREDIENTS | Override ingredient-content configuration | (empty) |

Rollback to the legacy parser

If the parser API is unhealthy, you can temporarily switch structured imports back to the deprecated JSON-LD and microdata parser:

  1. Set LEGACY_RECIPE_PARSER_ROLLBACK=true and restart Norish. Structured imports now use the legacy parser.
  2. Once the parser API is healthy again, set LEGACY_RECIPE_PARSER_ROLLBACK=false and restart.
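
The restart in both steps suggests the flag is read when Norish starts rather than on every import. The Python sketch below illustrates that behaviour under that assumption; `ImportRouter` and the parser labels `"legacy"` and `"parser-api"` are hypothetical and do not come from the Norish codebase.

```python
from typing import Mapping


class ImportRouter:
    """Hypothetical router that picks a parser for structured imports."""

    def __init__(self, env: Mapping[str, str]) -> None:
        # Assumption: the flag is read once at startup, which is why
        # changing it only takes effect after a restart.
        raw = env.get("LEGACY_RECIPE_PARSER_ROLLBACK", "false")
        self._use_legacy = raw.strip().lower() == "true"

    def parser_for_structured_import(self) -> str:
        """Return the label of the parser that handles structured imports."""
        return "legacy" if self._use_legacy else "parser-api"


env = {"LEGACY_RECIPE_PARSER_ROLLBACK": "false"}
router = ImportRouter(env)

# Step 1, first half: flip the flag without restarting.
env["LEGACY_RECIPE_PARSER_ROLLBACK"] = "true"
before_restart = router.parser_for_structured_import()

# Step 1, second half: a "restart" builds a new router from the updated env.
after_restart = ImportRouter(env).parser_for_structured_import()
```

In this sketch the running router keeps using the parser API until it is rebuilt, mirroring the instruction to restart Norish after changing the variable.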