How Recipeas chooses recipes
Recipeas crawls over 2.5 million recipes from chefs and food bloggers around the web. Not all of them are equally good. This page explains, in plain language and with live numbers, how we decide which to surface in browse and which to keep behind search.
Two questions every recipe has to answer
Before a recipe even enters the catalog, our crawler asks: Does it have a real photo? Does it have at least 3 ingredients and 2 instruction steps? Does the title look like a recipe and not a blog header? About 35% of what we scrape clears that bar; the rest is discarded immediately.
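As a rough illustration, the gate can be sketched like this. The `ScrapedRecipe` shape, the `passes_gate` name, and the blog-header pattern are hypothetical, and "real photo" is simplified here to "has a photo URL"; the crawler's actual checks are not spelled out on this page.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical record shape; the real crawler schema is not described here.
@dataclass
class ScrapedRecipe:
    title: str
    photo_url: Optional[str]
    ingredients: List[str] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)

# Assumed heuristic: titles that are nothing but a generic site section name.
BLOG_HEADER = re.compile(
    r"(home|about( me| us)?|blog|archives?|contact( us)?|recent posts)",
    re.IGNORECASE,
)

def passes_gate(recipe: ScrapedRecipe) -> bool:
    """Return True if the recipe clears the three checks described above."""
    has_photo = bool(recipe.photo_url)  # simplification of "real photo"
    enough_ingredients = len(recipe.ingredients) >= 3
    enough_steps = len(recipe.instructions) >= 2
    title = recipe.title.strip()
    title_ok = bool(title) and BLOG_HEADER.fullmatch(title) is None
    return has_photo and enough_ingredients and enough_steps and title_ok
```

A recipe that fails any one check is rejected; there is no partial credit at this stage.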
The accepted recipes then get a discovery score from 0 to 100, which decides whether they show up in the browse feed. Recipes with low scores stay searchable — you can always find them by name or ingredient — they just don't lead the browse feed.
How the discovery score works
It's a small, transparent formula. We're not trying to be clever; we're trying to be honest about what makes a recipe worth recommending.
A few specific notes about that formula:
- The American bonus applies to a curated list of household names: Allrecipes, Food Network, Serious Eats, Bon Appétit, America's Test Kitchen, food.com, Smitten Kitchen, and others. It's a curated lift, not a punishment for everyone else.
- Foreign-language sites are not penalized for being foreign. A clean Russian recipe with a great photo and canonical ingredients scores the same as a clean American one minus the boost. What does penalize them is corrupted text (mojibake), missing photos, and ingredient names the parser couldn't recognize — all of which hurt recipes in any language.
- Search ignores all of this. If you can name what you want, you can find it.
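The notes above can be sketched in code. Only the 0 to 100 range, the curated-host bonus, and the three penalties (mojibake, missing photo, unrecognized ingredients) come from this page; the base score, every weight, the host subset, and the `RecipeSignals` fields are illustrative assumptions, not the production formula.

```python
from dataclasses import dataclass

# A subset of the curated hosts, using hostnames that appear on this page.
BOOSTED_HOSTS = {
    "www.allrecipes.com",
    "www.food.com",
    "www.americastestkitchen.com",
}

# All numbers below are placeholders chosen for illustration only.
BASE_SCORE = 60
HOST_BONUS = 20
MOJIBAKE_PENALTY = 40
MISSING_PHOTO_PENALTY = 30
UNRECOGNIZED_INGREDIENT_PENALTY = 25  # scaled by the unrecognized share

@dataclass
class RecipeSignals:
    host: str
    has_mojibake: bool
    has_photo: bool
    unrecognized_ingredient_share: float  # 0.0 (all parsed) to 1.0 (none parsed)

def discovery_score(s: RecipeSignals) -> float:
    """Combine the signals into a score clamped to the 0 to 100 range."""
    score = float(BASE_SCORE)
    if s.host in BOOSTED_HOSTS:
        score += HOST_BONUS
    if s.has_mojibake:
        score -= MOJIBAKE_PENALTY
    if not s.has_photo:
        score -= MISSING_PHOTO_PENALTY
    score -= UNRECOGNIZED_INGREDIENT_PENALTY * s.unrecognized_ingredient_share
    return max(0.0, min(100.0, score))
```

Language is deliberately not an input: in this sketch a clean recipe from any host differs from a clean boosted-host recipe only by `HOST_BONUS`.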
Is it actually working? The data.
The whole point of the score is to put the bad-looking recipes in the hidden bucket and the good-looking ones in the shown bucket. The fastest test is to look at known quality signals — corrupted titles, ALL-CAPS shouting, broken image URLs — and compare the rates.
| Signal | Hidden from browse | Shown in browse |
|---|---|---|
| Corrupted (mojibake) titles | 14.8% | 0.0% |
| ALL-CAPS titles | 8.4% | 1.3% |
| Suspicious image URLs | 0.7% | 0.0% |
If the score weren't doing anything useful, those percentages would be similar between the two columns. Today the gap is roughly 1476× on mojibake (the shown-bucket rate is nonzero but rounds to 0.0%), so we're correctly funneling broken text away from the feed.
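For readers who want to reproduce this kind of comparison, here is a minimal sketch of how per-bucket rates and their ratio could be computed from a list of titles. The mojibake pattern is an assumed heuristic (a UTF-8 lead byte followed by continuation bytes, as they appear after being decoded as Latin-1), and the function names are hypothetical; the detectors behind the numbers above are not described on this page.

```python
import re
from typing import Dict, List, Optional

# Assumed heuristic: UTF-8 multi-byte sequences as they look when read as Latin-1.
MOJIBAKE = re.compile(
    r"[\u00C2-\u00DF][\u0080-\u00BF]"        # two-byte sequences
    r"|[\u00E0-\u00EF][\u0080-\u00BF]{2}"    # three-byte sequences
)

def signal_rates(titles: List[str]) -> Dict[str, float]:
    """Share of titles that trip each signal, as fractions between 0 and 1."""
    n = len(titles)
    if n == 0:
        return {"mojibake": 0.0, "all_caps": 0.0}
    return {
        "mojibake": sum(bool(MOJIBAKE.search(t)) for t in titles) / n,
        # str.isupper() is True only if there is at least one cased character
        # and every cased character is uppercase.
        "all_caps": sum(t.isupper() for t in titles) / n,
    }

def rate_ratio(hidden_rate: float, shown_rate: float) -> Optional[float]:
    """Hidden-to-shown ratio, or None when the shown rate is exactly zero."""
    if shown_rate == 0:
        return None
    return hidden_rate / shown_rate
```

The ratio has to be computed from unrounded rates: a shown-bucket rate that displays as 0.0% can still be a small positive number, which is what makes a finite gap possible.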
Which hosts lead each bucket
Shown in browse:

| Host | Recipes |
|---|---|
| www.allrecipes.com | 2,432 |
| www.food.com | 2,229 |
| www.americastestkitchen.com | 667 |
| www.povarenok.ru | 493 |
| www.justapinch.com | 461 |
| jamiegeller.com | 442 |
| www.greatbritishchefs.com | 441 |
| sunset.com | 427 |
| www.bbcgoodfood.com | 415 |
| www.delicious.com.au | 415 |

Hidden from browse:

| Host | Recipes |
|---|---|
| www.povarenok.ru | 12,484 |
| www.cuisinelolo.fr | 1,863 |
| eatsmarter.de | 1,174 |
| www.culinar.ro | 865 |
| migusto.migros.ch | 728 |
| varecha.pravda.sk | 700 |
| pt.petitchef.com | 622 |
| culinariefy.com | 573 |
| www.kochbar.de | 567 |
| www.kotikokki.net | 525 |
Languages in the catalog
We accept recipes from chefs and food bloggers around the world. The app auto-translates titles and ingredient lines into English by default; a one-tap toggle in the recipe view flips back to the original.
| Language | Accepted recipes |
|---|---|
| en | 1,342,867 |
| (unknown) | 411,128 |
| ru | 118,381 |
| es | 274 |
| de | 133 |
| id | 103 |
| pt | 73 |
| zh-CN | 72 |
| it | 54 |
| hi-Latn | 45 |
What we're still working on
- About 14% of accepted titles still carry mojibake corruption (UTF-8 read as Latin-1). We have a verified one-pass repair (99% accuracy on a 5,000-row sample) and are rolling it out across titles, descriptions, ingredient lines, and instructions.
- The photo size signal in the score currently comes from a small HEAD-request pass; the full-corpus pass is in progress and will sharpen the photo penalty as it completes.
- The canonical-ingredient catalog covers about 100 base ingredients; we keep adding the most-frequent gaps so more recipes parse cleanly on first ingest.
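To make the first item concrete: text that was UTF-8 but got read as Latin-1 can often be recovered by reversing the mistake. The sketch below shows that general technique; the function name is hypothetical, and it is not necessarily the repair we run in production.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up, leaving clean text untouched."""
    try:
        # Re-encode to the bytes the text originally had, then decode correctly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Encode fails for text outside Latin-1 (for example, clean Cyrillic);
        # decode fails for clean accented Latin text. Either way, the input
        # was not this kind of mojibake, so return it unchanged.
        return text
```

Pure ASCII round-trips unchanged, so the function is safe to apply to every title in a single pass.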