-
That's a great question, and I don't have a good answer to it.

The automated approach sounds overly ambitious. I've done quite a few manual conversions and found that recipes posted online aren't always well formatted or structured. They often reference ingredients indirectly, like "peel all the vegetables" or "add spices". Old-fashioned scrapers/parsers won't work 100% of the time. Hence, my only hope is to use machine learning for that. It's not my field and I don't really understand its capabilities (if someone can share their vision for using machine learning here, that would be great!). On that note, as more people share their recipes on GitHub, at some point we'll have thousands of recipes for training: we can generate "regular" recipes, where ingredients and method are separated, from Cooklang files.

The manual approach is tedious, yeah. It's almost no effort if you want to convert a recipe or two, but doing the same for 700 is a different story. I tried a little experiment and used Mechanical Turk for the job. It worked, but the results weren't satisfactory. Perhaps with more time invested we could make it better, but it has the disadvantage that one needs to pay for each recipe.

I'd add a semi-automated (or semi-manual) approach to your list as well. We can read a recipe in schema format and make a best-effort conversion into Cooklang format. After that one needs to make manual corrections, but hopefully not many.
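To make the semi-automated idea concrete, here is a minimal sketch, assuming the ingredients have already been split into name, quantity and unit. `toCooklangDraft` and `escapeRegExp` are hypothetical helpers made up for illustration, not part of any Cooklang tooling. The sketch marks the first mention of each ingredient in the steps as `@name{quantity%unit}` and lists anything it could not place as a `--` comment for manual correction.

```javascript
// Hypothetical helper: escape a name so it can be used inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical best-effort converter (an assumption, not an official tool).
// ingredients: [{ name, quantity, unit }], steps: array of plain-text steps.
function toCooklangDraft(ingredients, steps) {
  const unmatched = [];
  const todo = [];
  let draft = steps.slice();
  // Try longer names first so "orange zest" is placed before "zest".
  const ordered = ingredients.slice().sort((a, b) => b.name.length - a.name.length);
  for (const ing of ordered) {
    // The lookbehind skips text that already sits inside an "@name" marker.
    const pattern = new RegExp("(?<!@[^{}]*)\\b" + escapeRegExp(ing.name) + "\\b", "i");
    const amount = [ing.quantity, ing.unit]
      .filter((v) => v !== undefined && v !== null && v !== "")
      .join("%");
    const marker = `@${ing.name}{${amount}}`;
    let placed = false;
    draft = draft.map((step) => {
      if (placed || !pattern.test(step)) return step;
      placed = true;
      return step.replace(pattern, () => marker);
    });
    if (!placed) {
      unmatched.push(ing.name);
      todo.push(`-- TODO: place ${marker}`);
    }
  }
  // Cooklang separates steps with a blank line.
  return { text: draft.concat(todo).join("\n\n"), unmatched };
}

const result = toCooklangDraft(
  [
    { name: "butter", quantity: 1, unit: "tbsp" },
    { name: "Italian parsley", quantity: 1, unit: "tbsp" },
  ],
  ["Add butter to pan.", "Add parsley."]
);
console.log(result.text);
```

Here "Italian parsley" ends up in the TODO list because the step only says "parsley", which is exactly the kind of case that still needs a manual pass.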
-
As I mentioned, a huge number of online recipe sites have schema.org formatted recipes. There are even Python repos like this or this to scrape this data. Once you have the data, it's a question of parsing it into another format, if that's feasible. Have a look at the second link for a list of sites (probably incomplete) that have their recipes formatted in the schema.org format.

As I'm sure you're already aware, the problem with this path is that your structure is radically different from the schema.org structure (and from that of many other recipe formats). I'd underline the inline ingredients as the biggest difference. Because of this, moving between the formats becomes quite a headache, even with some degree of automation. You can parse the steps, but you won't always have an exact match with the ingredient column. Some ingredients will not be in the steps, some will be labelled slightly differently, etc. I think this is no small problem.

I'd assume that the majority of folks who might adopt this will already have a digital recipe collection of some sort, so I think the ability to migrate their notes to and from Cooklang is nigh on essential if you want some uptake of the format. This doesn't have to be possible right now, just at some point in the future. I think you have 2 options here:
I don't presume to make any big demands on your project, but for me personally, and I'd think for quite a few others, the ability to migrate should be very high on the list of priorities. I don't know how far along you are with all of this, but I'd have thought that the latter option would be less of a headache in the long run. I guess it's a matter of how important the migration point is to you.
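For the "from Cooklang" direction, a rough sketch of what an export could look like is below. It assumes only the basic ingredient syntax (`@name{quantity%unit}` and a bare single-word `@name`); cookware (`#`) and timer (`~`) markers are left untouched, and `fromCooklang` is a made-up name for illustration, not an existing API. The output keys `recipeIngredient` and `recipeInstructions` follow the schema.org Recipe type.

```javascript
// Matches "@multi word name{qty%unit}" or a bare single-word "@name".
const INGREDIENT = /@([^@#~{}\n]+?)\{([^}]*)\}|@([\p{L}\p{N}_-]+)/gu;

// Hypothetical exporter: Cooklang text -> schema.org-style Recipe object.
function fromCooklang(source) {
  const recipeIngredient = [];
  const recipeInstructions = [];
  // Cooklang treats each blank-line-separated paragraph as one step.
  for (const paragraph of source.split(/\n\s*\n/)) {
    const step = paragraph
      .split("\n")
      // Drop comment lines ("--") and metadata lines (">>").
      .filter((line) => !/^\s*(--|>>)/.test(line))
      .join(" ")
      .trim();
    if (!step) continue;
    const text = step.replace(INGREDIENT, (match, longName, amount, shortName) => {
      const name = (longName ?? shortName).trim();
      const [quantity = "", unit = ""] = (amount ?? "").split("%");
      // Rebuild a plain ingredient line such as "1 tbsp butter".
      recipeIngredient.push([quantity, unit, name].filter(Boolean).join(" "));
      return name;
    });
    recipeInstructions.push({ "@type": "HowToStep", text });
  }
  return {
    "@context": "https://schema.org",
    "@type": "Recipe",
    recipeIngredient,
    recipeInstructions,
  };
}

const exported = fromCooklang(
  "Melt @butter{1%tbsp} in a pan.\n\nSeason with @salt and @ground pepper{}."
);
console.log(JSON.stringify(exported, null, 2));
```

This direction is the easy one, since the inline markers carry everything the separate ingredient list needs. Going the other way is where the matching problem described above shows up.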
-
FYI, I made a project to move my notes from the Paprika recipe manager to Markdown format. I've posted some basics and screenshots on the Obsidian forum. Obsidian is a Markdown notebook. It's not FOSS, but it works with .md files rather than keeping them in a database. I've put the project onto GitHub here. The most advanced template uses YAML frontmatter to display the recipes in this format. Incidentally, someone else created a YAML-formatted recipe specification which is very similar to mine. This is mine, however:

---
name: "Absurdly Addictive Asparagus"
source: "Food52.com"
ingredients:
- [4.0, 'oz', 'pancetta', 'cut into 3/8 inch to 1/4 inch dice', '4 ounces pancetta, cut into 3/8 inch to 1/4 inch dice']
- [1.0, 'tbsp', 'butter', '', '1 tablespoon butter']
- [1.0, 'lb', 'asparagus', 'woody ends trimmed and sliced into 2 inch pieces on the bias', '1 pound asparagus, woody ends trimmed and sliced into 2 inch pieces on the bias']
- [1.25, 'cup', 'leek', '(white and pale green parts only) thinly sliced crosswise', '1 1/4 cups leek, thinly sliced crosswise (white and pale green parts only)']
- [2.0, '', 'cloves garlic', 'minced', '2 cloves garlic, minced']
- [1.0, '', 'lemon', 'zest of', '1 lemon']
- [1.0, 'tsp', 'orange zest', '', '1 teaspoon orange zest']
- [2.0, 'tbsp', 'toasted pine nuts', '', '2 tablespoons toasted pine nuts']
- [1.0, 'tbsp', 'Italian parsley', 'chopped - more to taste', '1 tablespoon Italian parsley, chopped - more to taste']
- [0, '', 'Salt', '', 'Salt']
- [0, '', 'freshly ground pepper', '', 'freshly ground pepper']
difficulty:
photo_thumbnail: _resources/A0A8C371-7E85-4EFC-AD07-89C9126432FD.jpg
image_url: https://images.food52.com/2eSoCv08CvlAoBS25X9BFkwiAIk=/1200x1200/71c708ab-489a-405f-8432-3ac949fa7551--2019-0723_absurdly-addictive-asparagus_3x2_rocky-luten_018.jpg
total_time:
notes: |
nutritional_info: []
description: |
rating: 0
prep_time:
created: 2021-02-08 14:08:41
directions: |
  In a large non-stick pan, sauté pancetta, stirring frequently, over medium heat, until crisp and lightly golden.
  Add 1 tablespoon of butter to pan. Add asparagus pieces and leek and sauté until asparagus is tender crisp, about 3-4 minutes.
  Add garlic, lemon and orange zest, toasted pine nuts and parsley and sauté for about 1 minute, until fragrant. Season to taste with freshly ground pepper and salt and serve immediately.
categories: ['Food52', 'Vegetarian']
source_url: https://food52.com/recipes/4023-absurdly-addictive-asparagus
cook_time:
servings: [4]
scale: 2
tags:
- recipes/authors/food52
- recipes/vegetarian
photos:
author: |
  Food52
---

Obsidian has a nice Dataview plugin that lets you render the YAML frontmatter as nicely formatted text. As the Dataview output is created programmatically, I can use the scale field to increase the portion size. Scaled x2:
-
Here is a sample of the Dataview JS:

dv.header(2, "Ingredients: ");
const ingredients = dv.current().ingredients;
// Multiplier taken from the "scale" field in the frontmatter.
const scale = dv.current().scale;
for (let i = 0; i < ingredients.length; i++) {
    // Each entry is [amount, unit, name, comment, raw text].
    const amount = ingredients[i][0];
    const unit = ingredients[i][1];
    const name = ingredients[i][2];
    const comment = ingredients[i][3];
    const raw = ingredients[i][4];
    if (amount && name) {
        const scaled = (amount * scale).toFixed(1);
        if (unit) {
            dv.paragraph("- " + scaled + " " + unit + " " + name + " " + comment);
        } else {
            dv.paragraph("- " + scaled + " " + name + " " + comment);
        }
    } else {
        // No usable amount (e.g. "Salt"): show the raw ingredient line.
        dv.paragraph("- " + raw);
    }
}

I don't know how good your JS is, but it basically multiplies the amount by the scale figure (if the amount exists).
-
Honestly, this was 3-4 days' worth of work, but hopefully it has saved somebody else the job! The biggest headache was parsing a line of plain-text ingredients into separate entities. I ended up using a Python library for this, but I still had to go through about 1/3 of my recipes to clean up the ingredients. There's a bit in the Readme.md about this.
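To give an idea of why that parsing step is fiddly, here is a deliberately naive sketch that splits a raw ingredient line into the same `[amount, unit, name, comment, raw]` shape used in the frontmatter above. It is not the Python library I used; `parseIngredientLine`, `toNumber` and the small `UNITS` table are assumptions made up for this example, and anything outside the simple "amount unit name, comment" pattern still needs cleaning up by hand.

```javascript
// Assumed unit table for illustration only; a real one would be much longer.
const UNITS = new Map([
  ["ounce", "oz"], ["ounces", "oz"], ["oz", "oz"],
  ["tablespoon", "tbsp"], ["tablespoons", "tbsp"], ["tbsp", "tbsp"],
  ["teaspoon", "tsp"], ["teaspoons", "tsp"], ["tsp", "tsp"],
  ["pound", "lb"], ["pounds", "lb"], ["lb", "lb"],
  ["cup", "cup"], ["cups", "cup"],
]);

// Convert "2", "1.5", "3/8" or "1 1/4" to a number.
function toNumber(text) {
  return text.trim().split(/\s+/).reduce((sum, part) => {
    const [num, den] = part.split("/");
    return sum + (den ? Number(num) / Number(den) : Number(num));
  }, 0);
}

// Returns [amount, unit, name, comment, raw].
function parseIngredientLine(raw) {
  // Leading amount: integer or decimal, optionally followed by a fraction,
  // or a fraction on its own.
  const match = raw.match(/^\s*(\d+(?:\.\d+)?(?:\s+\d+\/\d+)?|\d+\/\d+)\s+(.*)$/);
  if (!match) return [0, "", raw.trim(), "", raw];
  const amount = toNumber(match[1]);
  let rest = match[2];
  // Treat the first word as a unit only if it is in the table.
  const [firstWord, ...others] = rest.split(/\s+/);
  const unit = UNITS.get(firstWord.toLowerCase()) ?? "";
  if (unit) rest = others.join(" ");
  // Everything after the first comma is kept as a free-text comment.
  const comma = rest.indexOf(",");
  const name = comma === -1 ? rest : rest.slice(0, comma);
  const comment = comma === -1 ? "" : rest.slice(comma + 1).trim();
  return [amount, unit, name.trim(), comment, raw];
}

console.log(parseIngredientLine("1 1/4 cups leek, thinly sliced crosswise"));
console.log(parseIngredientLine("Salt"));
```

A line like "1 lemon" parses without error here, but the fact that only the zest is wanted is lost, which is the kind of entry that ended up in the manual clean-up pile.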
-
@dubadub, hopefully this gives you some food for thought. And if you want me to create a |
-
Just heard about this project, it's a really nice idea. I'd love to have an open standard for recipes, so I'd not have to worry about my data any more.
After a very brief look, my first thought is: much as I'd love to, how am I going to convert my existing recipes into Cooklang? There are nearly 700 of them. My second thought is: how will you get recipes that exist on the internet parsed into Cooklang?
My understanding is that a lot of them use an open standard from Schema.org. My current DB is in an app called Paprika. Both keep the ingredients separate from the steps. I get the idea you have: you want to keep it as simple as possible. This means, though, that we either:
a) Automated: have to parse the existing steps for our ingredients, and add the quantities there.
b) Manually: have to process the ingredients in the recipe steps by hand every time we find a decent recipe.
Manually transcribing my recipe archive into cooklang isn't going to happen, and I'd worry that the automated method might be a little fraught, though not inconceivable.
One of the magical things about Paprika, and various other recipe managers, is that the parsing is all automated. I get the need for simplicity, but when SO much of the data that's out there is significantly structurally different, the simplicity may actually get in the way of the functionality.
Not really an issue I guess, just wondering whether it was something you'd considered.