recipe recommendation using ingredient networks · we then generated an ingredient list sorted by...
Post on 06-Aug-2020
12 Views
Preview:
TRANSCRIPT
Recipe recommendation using ingredient networks
Chun-Yuen TengSchool of InformationUniversity of MichiganAnn Arbor, MI, USA
chunyuen@umich.edu
Yu-Ru LinIQSS, Harvard University
CCS, Northeastern UniversityBoston, MA
yuruliny@gmail.com
Lada A. AdamicSchool of InformationUniversity of MichiganAnn Arbor, MI, USA
ladamic@umich.edu
ABSTRACTThe recording and sharing of cooking recipes, a human ac-tivity dating back thousands of years, naturally became anearly and prominent social use of the web. The resultingonline recipe collections are repositories of ingredient com-binations and cooking methods whose large-scale and vari-ety yield interesting insights about both the fundamentals ofcooking and user preferences. At the level of an individualingredient we measure whether it tends to be essential or canbe dropped or added, and whether its quantity can be modi-fied. We also construct two types of networks to capture therelationships between ingredients. The complement networkcaptures which ingredients tend to co-occur frequently, andis composed of two large communities: one savory, the othersweet. The substitute network, derived from user-generatedsuggestions for modifications, can be decomposed into manycommunities of functionally equivalent ingredients, and cap-tures users’ preference for healthier variants of a recipe. Ourexperiments reveal that recipe ratings can be well predictedwith features derived from combinations of ingredient net-works and nutrition information.
Categories and Subject DescriptorsH.2.8 [Database Management]: Database applications—Data mining
General TermsMeasurement; Experimentation
Keywordsingredient networks, recipe recommendation
1. INTRODUCTIONThe web enables individuals to collaboratively share knowl-
edge and recipe websites are one of the earliest examples ofcollaborative knowledge sharing on the web. Allrecipes.com,
the subject of our present study, was founded in 1997, yearsahead of other collaborative websites such as the Wikipedia.Recipe sites thrive because individuals are eager to sharetheir recipes, from family recipes that had been passed downfor generations, to new concoctions that they created thatafternoon, having been motivated in part by the ability toshare the result online. Once shared, the recipes are imple-mented and evaluated by other users, who supply ratingsand comments.
The desire to look up recipes online may at first appearodd given that tombs of printed recipes can be found inalmost every kitchen. The Joy of Cooking [12] alone con-tains 4,500 recipes spread over 1,000 pages. There is, how-ever, substantial additional value in online recipes, beyondtheir accessibility. While the Joy of Cooking contains asingle recipe for Swedish meatballs, Allrecipes.com hosts“Swedish Meatballs I”, “II”, and “III”, submitted by differentusers, along with 4 other variants, including “The Amaz-ing Swedish Meatball”. Each variant has been reviewed,from 329 reviews for “Swedish Meatballs I” to 5 reviewsfor “Swedish Meatballs III”. The reviews not only providea crowd-sourced ranking of the different recipes, but alsomany suggestions on how to modify them, e.g. using groundturkey instead of beef, skipping the “cream of wheat” be-cause it is rarely on hand, etc.
The wealth of information captured by online collabora-tive recipe sharing sites is revealing not only of the fun-damentals of cooking, but also of user preferences. The co-occurrence of ingredients in tens of thousands of recipes pro-vides information about which ingredients go well together,and when a pairing is unusual. Users’ reviews provide cluesas to the flexibility of a recipe, and the ingredients withinit. Can the amount of cinnamon be doubled? Can the nut-meg be omitted? If one is lacking a certain ingredient, can asubstitute be found among supplies at hand without a tripto the grocery store? Unlike cookbooks, which will containvetted but perhaps not the best variants for some individu-als’ tastes, ratings assigned to user-submitted recipes allowfor the evaluation of what works and what does not.
In this paper, we seek to distill the collective knowledgeand preference about cooking through mining a popularrecipe-sharing website. To extract such information, we firstparse the unstructured text of the recipes and the accom-panying user reviews. We construct two types of networksthat reflect different relationships between ingredients, inorder to capture users’ knowledge about how to combine in-gredients. The complement network captures which ingre-dients tend to co-occur frequently, and is composed of two
arX
iv:1
111.
3919
v3 [
cs.S
I] 2
1 M
ay 2
012
large communities: one savory, the other sweet. The sub-stitute network, derived from user-generated suggestions formodifications, can be decomposed into many communities offunctionally equivalent ingredients, and captures users’ pref-erence for healthier variants of a recipe. Our experimentsreveal that recipe ratings can be well predicted by featuresderived from combinations of ingredient networks and nu-trition information (with accuracy .792), while most of theprediction power comes from the ingredient networks (84%).
The rest of the paper is organized as follows. Section 2 re-views the related work. Section 3 describes the dataset. Sec-tion 4 discusses the extraction of the ingredient and comple-ment networks and their characteristics. Section 5 presentsthe extraction of recipe modification information, as well asthe construction and characteristics of the ingredient substi-tute network. Section 6 presents our experiments on reciperecommendation and Section 7 concludes.
2. RELATED WORKRecipe recommendation has been the subject of much
prior work. Typically the goal has been to suggest recipesto users based on their past recipe ratings [15][3] or brows-ing/cooking history [16]. The algorithms then find simi-lar recipes based on overlapping ingredients, either treat-ing each ingredient equally [4] or by identifying key ingre-dients [19]. Instead of modeling recipes using ingredients,Wang et al. [17] represent the recipes as graphs which arebuilt on ingredients and cooking directions, and they demon-strate that graph representations can be used to easily ag-gregate Chinese dishes by the flow of cooking steps and thesequence of added ingredients. However, their approach onlymodels the occurrence of ingredients or cooking methods,and doesn’t take into account the relationships between in-gredients. In contrast, in this paper we incorporate the like-lihood of ingredients to co-occur, as well as the potential ofone ingredient to act as a substitute for another.
Another branch of research has focused on recommend-ing recipes based on desired nutritional intake or promotinghealthy food choices. Geleijnse et al. [7] designed a proto-type of a personalized recipe advice system, which suggestsrecipes to users based on their past food selections and nutri-tion intake. In addition to nutrition information, Kamiethet al. [9] built a personalized recipe recommendation systembased on availability of ingredients and personal nutritionalneeds. Shidochi et al. [14] proposed an algorithm to extractreplaceable ingredients from recipes in order to satisfy users’various demands, such as calorie constraints and food avail-ability. Their method identifies substitutable ingredients bymatching the cooking actions that correspond to ingredientnames. However, their assumption that substitutable ingre-dients are subject to the same processing methods is less di-rect and specific than extracting substitutions directly fromuser-contributed suggestions.
Ahn et al. [1] and Kinouchi et al [10] examined networksinvolving ingredients derived from recipes, with the formermodeling ingredients by their flavor bonds, and the latterexamining the relationship between ingredients and recipes.In contrast, we derive direct ingredient-ingredient networksof both compliments and substitutes. We also step beyondcharacterizing these networks to demonstrating that theycan be used to predict which recipes will be successful.
3. DATASETAllrecipes.com is one of the most popular recipe-sharing
websites, where novice and expert cooks alike can uploadand rate cooking recipes. It hosts 16 customized interna-tional sites for users to share their recipes in their nativelanguages, of which we study only the main, English, ver-sion. Recipes uploaded to the site contain specific instruc-tions on how to prepare a dish: the list of ingredients, prepa-ration steps, preparation and cook time, the number of serv-ings produced, nutrition information, serving directions, andphotos of the prepared dish. The uploaded recipes are en-riched with user ratings and reviews, which comment onthe quality of the recipe, and suggest changes and improve-ments. In addition to rating and commenting on recipes,users are able to save them as favorites or recommend themto others through a forum.
We downloaded 46,337 recipes including all informationlisted from allrecipes.com, including several classifications,such as a region (e.g. the midwest region of US or Eu-rope), the course or meal the dish is appropriate for (e.g.:appetizers or breakfast), and any holidays the dish may beassociated with. In order to understand users’ recipe prefer-ences, we crawled 1,976,920 reviews which include reviewers’ratings, review text, and the number of users who voted thereview as useful.
3.1 Data preprocessingThe first step in processing the recipes is identifying the
ingredients and cooking methods from the freeform text ofthe recipe. Usually, although not always, each ingredientis listed on a separate line. To extract the ingredients, wetried two approaches. In the first, we found the maximalmatch between a pre-curated list of ingredients and the textof the line. However, this missed too many ingredients,while misidentifying others. In the second approach, weused regular expression matching to remove non-ingredientterms from the line and identified the remainder as the in-gredient. We removed quantifiers, such as e.g. “1 lb” or “2cups”, words referring to consistency or temperature, e.g.chopped or cold, along with a few other heuristics, such asremoving content in parentheses. For example “1 (28 ounce)can baked beans (such as Bush’s Original R©)” is identifiedas “baked beans”. By limiting the list of potential termsto remove from an ingredient entry, we erred on the sideof not conflating potentially identical or highly similar in-gredients, e.g. “cheddar cheese”, used in 2450 recipes, wasconsidered different from “sharp cheddar cheese”, occurringin 394 recipes.
We then generated an ingredient list sorted by frequencyof ingredient occurrence and selected the top 1000 commoningredient names as our finalized ingredient list. Each of thetop 1000 ingredients occurred in 23 or more recipes, withplain salt making an appearance in 47.3% of recipes. Theseingredients also accounted for 94.9% of ingredient entries inthe recipe dataset. The remaining ingredients were missedeither because of high specificity (e.g. yolk-free egg noodle),referencing brand names (e.g. Planters almonds), rarity (e.g.serviceberry), misspellings, or not being a food (e.g. “nylonnetting”).
The remaining processing task was to identify cookingprocesses from the directions. We first identified all heatingmethods using a listing in the Wikipedia entry on cooking[18]. For example, baking, boiling, and steaming are all ways
bake boil fry grill roast simmer marinate
midwestmountainnortheastwest coastsouth
method
% in
rec
ipes
010
2030
40
Figure 1: The percentage of recipes by region thatapply a specific heating method.
of heating the food. We then identified mechanical ways ofprocessing the food such as chopping and grinding, and otherchemical techniques such as marinating and brining.
3.2 Regional preferencesChoosing one cooking method over another appears to be
a question of regional taste. 5.8% of recipes were classifiedinto one of five US regions: Mountain, Midwest, Northeast,South, and West Coast (including Alaska and Hawaii). Fig-ure 1 shows significantly (χ2 test p-value < 0.001) varyingpreferences in the different US regions among 6 of the mostpopular cooking methods. Boiling and simmering, both in-volving heating food in hot liquids, are more common in theSouth and Midwest. Marinating and grilling are relativelymore popular in the West and Mountain regions, but in theWest more grilling recipes involve seafood (18/42 = 42%)relative to other regions combined (7/106 = 6%). Fryingis popular in the South and Northeast. Baking is a univer-sally popular and versatile technique, which is often used forboth sweet and savory dishes, and is slightly more popularin the Northeast and Midwest. Examination of individualrecipes reflecting these frequencies shows that these differ-ences in preference can be tied to differences in demograph-ics, immigrant culture and availability of local ingredients,e.g. seafood.
4. INGREDIENT COMPLEMENT NETWORKCan we learn how to combine ingredients from the data?
Here we employ the occurrences of ingredients across recipesto distill users’ knowledge about combining ingredients.
We constructed an ingredient complement network basedon pointwise mutual information (PMI) defined on pairs ofingredients (a, b):
PMI(a,b) = logp(a, b)
p(a)p(b),
where
p(a, b) =# of recipes containing a and b
# of recipes,
p(a) =# of recipes containing a
# of recipes,
p(b) =# of recipes containing b
# of recipes.
The PMI gives the probability that two ingredients occurtogether against the probability that they occur separately.Complementary ingredients tend to occur together far moreoften than would be expected by chance.
Figure 2 shows a visualization of ingredient complemen-tarity. Two distinct subcommunities of recipes are imme-diately apparent: one corresponding to savory dishes, theother to sweet ones. Some central ingredients, e.g. egg andsalt, actually are pushed to the periphery of the network.They are so ubiquitous, that although they have many edges,they are all weak, since they don’t show particular comple-mentarity with any single group of ingredients.
We further probed the structure of the complementaritynetwork by applying a network clustering algorithm [13].The algorithm confirmed the existence of two main clusterscontaining the vast majority of the ingredients. An interest-ing satellite cluster is that of mixed drink ingredients, whichis evident as a constellation of small nodes located near thetop of the sweet cluster in Figure 2. The cluster includesthe following ingredients: lime, rum, ice, orange, pineapplejuice, vodka, cranberry juice, lemonade, tequila, etc.
For each recipe we recorded the minimum, average, andmaximum pairwise pointwise mutual information betweeningredients. The intuition is that complementary ingredi-ents would yield higher ratings, while ingredients that don’tgo together would lower the average rating. We found thatwhile the average and minimum pointwise mutual informa-tion between ingredients is uncorrelated with ratings, themaximum is very slightly positively correlated with the av-erage rating for the recipe (ρ = 0.09, p-value < 10−10). Thissuggests that having at least two complementary ingredientsvery slightly boosts a recipe’s prospects, but having clashingor unrelated ingredients does not seem to do harm.
5. RECIPE MODIFICATIONSCo-occurrence of ingredients aggregated over individual
recipes reveals the structure of cooking, but tells us littleabout how flexible the ingredient proportions are, or whethersome ingredients could easily be left out or substituted. Anexperienced cook may know that apple sauce is a low-fat al-ternative to oil, or may know that nutmeg is often optional,but a novice cook may implement recipes literally, afraidthat deviating from the instructions may produce poor re-sults. While a traditional hardcopy cookbook would providefew such hints, they are plentiful in the reviews submittedby users who implemented the recipes, e.g. “This is a greatrecipe, but using fresh tomatoes only adds a few minutes tothe prep time and makes it taste so much better”, or anothercomment about the same salsa recipe“This is by far the bestrecipe we have ever come across. We did however change itjust a little bit by adding extra onion.”
As the examples illustrate, modifications are reported evenwhen the user likes the recipe. In fact, we found that 60.1%of recipe reviews contain words signaling modification, suchas “add”, “omit”, “instead”, “extra” and 14 others. Further-more, it is the reviews that include changes that have a sta-tistically higher average rating (4.49 vs. 4.39, t-test p-value< 10−10), and lower rating variance (0.82 vs. 1.05, Bartletttest p-value < 10−10), as is evident in the distribution ofratings, shown in Fig. 3. This suggests that flexibility inrecipes is not necessarily a bad thing, and that reviewerswho don’t mention modifications are more likely to think ofthe recipe as perfect, or to dislike it entirely.
cherry gelatin
graham cracker
low fat cottage cheese
pork shoulder roast
heavy whipping cream tofu
bok choy
butter cracker
baking soda
pimento pepper
milk powder
chorizo sausage
lady�nger
steak sauce
crimushroom
radishe
shiitake mushroom
pesto
brownie mix
pumpkin pie spice
rye �our
cardamom
sa�ron thread
linguine
corn
fat free sour cream
basmati rice
bittersweet chocolate
bay
corn chip
cracker
french green bean
poppy seed
vegetable oil
grape tomato
pizza crust dough
low sodium beef broth
club soda
lard
soy saucepanko bread
couscou
crab meat
mango
unpastry shell
catalina dressing
pasta shell
italian salad dressing
mexican corn
decorating gel
italian bread
napa cabbage
onion powder
white wine vinegar
cocktail rye bread
basil sauce
crouton
brown gravy mix
barbeque sauce
apple cider vinegar
hoagie roll
milk chocolate candy kisse
�ounder
salt black pepper
maraschino cherry juice
chow mein noodle
tiger prawn
banana pepper
cranberry
vermicelli pasta
root beer
strawberry jam
lemon gelatin mix
creamed corn
pretzel
pie shell
sun�ower kernel
rump roast
romaine
vegetable stock
lemon pepper seasoning
guacamole
louisiana hot sauce
cabbage
yellow onion
super�ne sugar
orange peel
raspberry
cumin seedcandied mixed fruit peel
cream of coconut
bow tie pasta
creme fraiche
currant
pork chop
turkey gravy
fat free half and half
chicken ramen noodle
wooden skewer
whipping cream
mace
seasoning salt
mozzarella cheesepasta sauce
lean pork
broccoli �oweret
tomatillo
lemonade
tomato paste
caesar dressing
basil pesto
melon liqueur
coconut milk
whole wheat pastry �our
muenster cheese
lump crab meat
angel food cake
ring
cheese tortellini
spiral pasta
vanilla pudding
cauli�oweret
smoked sausage
hot dog
pita bread
cocoa powder
garbanzo bean
tart apple
wheat bran
hot pepper sauce
chili
refried bean
salmon steak
white cheddar cheese
low fat mayonnaise
grapefruit
dijon mustard
tomato juice
yellow squash
baking apple
cream of tartar
vodka
rye bread
white chip
�at iron steak
linguine pasta
fennel
whole wheat bread
baking mix
alfredo pasta sauce
margarine
confectioners' sugarfruit gelatin mix
pork
balsamic vinegar
pork loin chop
jicama
pre pizza crust
triple sec
teriyaki sauce
cola carbonated beverage
polish sausage
cracked black pepper
poblano chile pepper
individually wrapped caramel
roast beef
bread stu�ng mix
eggnog
pear
caramel
beet
worcestershire sauce
chicken stock
horseradish
semisweet chocolate chip
basil
red grape
plum
cinnamon sugar
fajita seasoning
rice noodle
powdered milk
star anise pod
short grain rice
ramen noodle
vegetable
coconut oil
whiskey
lime gelatin mix
peanut oil
ham
ginger root
lima bean
pimento stu�ed green olive
hoisin sauce
round steak
stu�ng
part skim ricotta cheese
broiler fryer chicken up
milk chocolate chip
turbinado sugar
vegetable shortening
tarragon vinegar
golden delicious apple
turkey
rigatoni pasta
stu�ng mix
milk
juiced
burgundy wine
red kidney bean
dill
candied pineapple
german chocolate cake mix
arborio rice
sugar free vanilla pudding mix
pine nut
green apple
cucumber oreganopearl onion
stu�ed green olive
whipped topping mix
broccoli
pinto bean
pasta
beef short rib
gelatin
garlic powder
rutabaga
chicken liver
pepperjack cheese
herb
lemon gras
sweet potato
pineapple ring
parsley �ake
pie �lling
spice cake mix
butterscotch chip
greek yogurt
vanilla ice cream
seafood seasoning
parsnip
applesauce
chinese �ve spice powder
salt pepper
beef broth
cherry tomato
sage
vanilla
vital wheat gluten
artichoke heart
mixed berry
bacon dripping
self rising �our
nilla wafer
navy bean
bacon
egg yolk
wonton wrapper
chocolate pudding mix
salsa
coconut
tomato based chili sauce
marsala wine
mussel
manicotti shell
anise extract
mustard seed
nutmeg
cayenne pepper
black bean pepperokra
asparagu
mustard powder
�rmly brown sugar
balsamic vinaigrette dressing
chicken breastoyster
ditalini pasta
old bay seasoning tm
brown rice
process american cheese
chocolate
miso paste
pineapple
iceberg lettuce
pearl barley
oat
greek seasoning
biscuit
clove
browning sauce
chicken bouillon powder
green pea
bread dough
cream cheesepeanut butter chip
silken tofu
pineapple chip
sea scallop
ricotta cheese
papaya
red cabbage
egg substitute
zesty italian dressing
devil's food cake mix
bagel
sour mix
lamb
irish stout beer
sea salt
romaine lettuce
kalamata olive
salt
monosodium glutamate
rice wine
white potato
rum extract
grape jelly
crescent roll dough
beer
phyllo dough
fettuccine pasta
chili seasoning mix
biscuit mix
candy coated chocolate
green cabbage
ranch bean
cream of celery soup
apple pie �lling
caper
nectarine
white mushroom
banana
orange gelatin mix
1% buttermilk
apple jelly
dinner roll
sugar pumpkin
salad green
shrimp
cheese ravioli
chicken wing
sour cream
saltine
cornmeal
mixed vegetable
beef tenderloin
sherry
rotini pasta
mexican cheese blend
kosher salt black pepper
mayonnaise
lobster
white onion
chocolate cookie
white bread
french baguette
bread
vanilla frosting
anise seed
ranch dressing mix
wild rice
hot
canadian bacon
corn�akes cereal
wax bean
cantaloupe
non fat yogurt
lite whipped topping
spaghetti squash
egg roll wrapper
solid pack pumpkin
recipe pastry
asafoetida powder
co�ee powder
italian sauce
amaretto liqueur
shortening
turmeric
semolina �our
pomegranate juice
corned beef
skewer
shallotspanish onion
tapioca
provolone cheese
chile sauce
vanilla bean
chile pepperangel hair pasta
pumpkin
tilapia
brie cheese
cottage cheese
banana liqueur
lemon
smoked salmon
ginger paste
brown mustard
peanut butter
escarole
sour milk
olive oil
country pork rib
pastry shell
adobo seasoning
candy coated milk chocolate
curryghee
alfredo sauce
yellow cake mix
granny smith apple
beef chuck
chocolate hazelnut spread
maple syrup
squid
gingersnap cooky
raspberry gelatin
molasse
lemon cake mix
�sh stock
cook
grenadine syrup
pu� pastry
rum
grapefruit juice
tahini
black pepperbutternut squash
key lime juice
sirloin steak
macaroni
butter shortening
brown lentil
chicken broth
chili bean
pickling spice
yellow food coloring
great northern bean
mixed nut
green chile
salmon
english mu�n
co�ee liqueur
non fat milk powder
buttermilk
distilled white vinegar
golden syrup
powdered fruit pectin
green chily
grape
raspberry gelatin mix
low fat sour cream
topping
pineapple juice
red lettuce
orange zest
ketchup
chunk chicken
steak seasoning
sandwich roll
crystallized ginger
kosher salt
roma tomato
red bean
red candied cherry
sesame seed
beef stock
cashew
popped popcorn
apricot nectar
any fruit jam
processed cheese food
red pepper
coleslaw mix
white cake mix
cherry pie �lling
canola oil
whole wheat �our
honey
long grain
marinara sauce
yellow summer squash
to�ee baking bit
whole milk
trout
onion separated
low fat cream cheese
corn oil
oat bran
cream of potato soup
allspice berry
mandarin orange
cumin
saltine cracker
swiss chard
fenugreek seed
�sh sauce
eggplant
baby corn
cider vinegar
orange sherbet
debearded
beef bouillon
kernel corn
vanilla vodka
chicken leg quarter
mintfeta cheese
lime juice
raspberry jam
cooking oil
white corn
herb stu�ng mix
lemon lime soda
pork sausage
ziti pasta
orange marmalade
yogurt
bean
ginger garlic paste
crescent dinner roll
scallop
walnut oil
smoked ham
red food coloring
triple sec liqueur
fat free evaporated milk
walnutbaking chocolate
blueberry
caramel ice cream topping
bacon grease
fat free italian dressing
steak
�g
miracle whip ‚Ñ
potato starch
luncheon meat
brandy based orange liqueur
smoked paprika
pu� pastry shell
raspberry preserve
apple butter
tomato sauce
white rice
beef stew meat
taco seasoning mix
date
whipped topping
marshmallow
co�ee
butterscotch schnapp
red wine vinegar
orange
chicken thigh
mild italian sausage
blueberry pie �lling
yeast
lime peel
rice �our
chocolate cake mix
barbecue sauce
monterey jack cheese
halibut
beef round steak
seed
sour cherry
pork sparerib
orange roughy
barley nugget cereal
leek
maraschino cherry
chickpea
fettuccini pasta
orange juice
blue cheese dressing
yam
garam masala
black eyed pea
penne pasta
serrano chile pepper
�ourchive
marjoram
herb stu�ng
beef sirloin
beef
maple extract
bamboo shoot
lemon extract
meat tenderizer
kielbasa sausage
low sodium chicken broth
asparagus
cod
italian seasoning
lime gelatin
vegetable bouillon
andouille sausage
collard green
blackberry
beef gravy
green grape
tamari
fruit
malt vinegar
strawberry gelatin
lemon gelatin
green olive
poultry seasoning
prune
beef consomme
chili powder
dressing
fennel seed
gruyere cheese
jellied cranberry sauce
chipotle pepper
vanilla extract
apricot
linguini pasta
cranberry sauce
port wine
process cheese
cornish game hen
cilantro
green chile pepper
wheat
bread machine yeast
tube pasta
biscuit baking mix
cream corn
spinach
low fat whipped topping
irish cream liqueur
candy
zucchini
mild cheddar cheese
orange gelatin cornstarch
cheese
snow pea
low fat margarine
green candied cherry
vermouth
brandy
white grape juice
corn bread mix
broccoli �oret
vidalia onion
cocktail sauce
pickled jalapeno pepper
beaten egg
hamburger bun
black walnut
dill pickle juice
dill pickle relish
habanero pepper
white chocolate chip
veal
powdered non dairy creamer
lasagna noodle
gingerapricot jam
imitation crab meat
chicken soup base
white bean
tarragon
onion soup mix
thousand island dressing
red lentil
pancake mix
wheat germ
fat free mayonnaise
yukon gold potato
long grain rice
carrot
cauli�ower �oret
vegetable cooking spray
craw�sh tail
peppermint extract
brussels sprout
onion salt buttermilk biscuit
white kidney bean
mango chutney
black olive
meatless spaghetti sauce
curry powder
coriander
red snapper
biscuit dough
sausage
cheddar cheese soup
lettuce
pork loin roast
lemon pepper
red curry paste
egg noodle
hot sauce
raspberry vinegar
butter cooking spray
peach schnapp
eggspicy pork sausage
mixed fruit
cat�sh
venison
yellow pepper
carbonated water
pumpkin seed
new potato
lemon juice
chocolate pudding
watermelon
chicken breast half
gorgonzola cheese
buttery round cracker
apple pie spice
process cheese sauce
jasmine rice
lemon pudding mix
cooking sherry
strawberry preserve
french bread
toothpick
sauce
corn tortilla chip
garlic paste
salt free seasoning blend
elbow macaronipickle
cream of chicken soup
cardamom pod
persimmon pulp
chicken
liquid smoke
cocoa
pound cake
bell pepper
food coloring
coconut extract
chocolate chip
berry cranberry sauce
red bell pepper
seashell pasta
american cheese
oatmeal
sourdough bread
cornbread
mixed salad green
arugula
oil
parmesan cheeseclam juice
brick cream cheesecereal
italian parsley
milk chocolate
rice wine vinegar
hot dog bun
pistachio pudding mix
curd cottage cheese
garlic salt
chocolate cookie crust
orange extract
cream of mushroom soup
sa�ron
mushroom
tortilla chip
white hominy
green beans snapped
dill pickle
french onion soup
skim milk
tequila
�ax seed
low fat cheddar cheese
red wine
nut
apple cider
candied cherry
cheddar cheese
gingerroot
chocolate frosting
low fat yogurt
peppercorn
pepperoni
artichoke
baby pea
crisp rice cereal
potato chip
coconut cream
angel food cake mix
onion �ake
salad shrimp
taco seasoning
champagne
peach
low fat
yellow cornmeal
pork roast
baby spinach
portobello mushroom cap
blue cheese
strawberry gelatin mix
pink lemonade
chestnut
strawberry
oyster sauce
sugar snap pea
ka�r lime
anchovy
stu�ed olive
herb bread stu�ng mix
half and half
serrano pepper
coconut rum
red apple
cherry
�ank steak
round
peppermint candy
butter bean
almond
white vinegarcelery seed
corn syrupfat free cream cheese
cannellini bean
clam
mustard
scallion
potato �ake
parsley
fat free yogurt
pita bread round
red pepper �ake
onion
bourbon whiskey
creme de menthe liqueur
golden raisin
pancetta bacon
apple juice
egg white
fontina cheese
kale
asiago cheese
spiced rum
farfalle pasta
lobster tail
mirin
leg of lamb
tomato
zested
sauerkraut
unpie crust
bourbon
lean beef
tuna steak
wild rice mix
raisin
chocolate syrup
juice
cajun seasoning
cauli�owerwaterlemon yogurt
tapioca �our
vanilla yogurt
pimiento
hazelnut liqueur
thyme
part skim mozzarella cheese
mandarin orange segment
cinnamon
corn tortilla
crispy rice cereal
colby monterey jack cheese
apricot preserve
chipotle chile powder
swiss cheese
white wine
baking powdergraham cracker crust
vanilla wafer
lime
sugar based curing mixture
cream cheese spread
celeryolive
simple syrup
asian sesame oil
bacon bit
sharp cheddar cheese
rice vinegar
sea salt black pepper
curry paste
beef chuck roast
butter extract
pork loin
ginger ale
chicken leg
adobo sauce
lime zest
ham hock
watercres
pastry
seasoning
lentil
mascarpone cheese
baker's semisweet chocolate
acorn squash
chunk chicken breast
pepperoni sausage
brown sugar
fusilli pasta
kaiser roll
red delicious apple
honey mustard
unbleached �our
vinegar
spicy brown mustard
chuck roast
candied citron
vegetable combination
beef �ank steak
red chile pepper
avocado
quinoa
cake �our
whole wheat tortilla
dill seed
turnip
vegetable broth
sugarsugar cookie mix neufchatel cheese
coriander seed
apple
vegetable soup mix
chocolate sandwich cooky
colby cheese
sourdough starter
green bean
pecansoftened butter matzo meal
hash brown potato
vanilla pudding mix
pickle relish
noodlered potato
white chocolate
pistachio nut
green food coloring
lemon zest
chutney
splenda
buttermilk baking mix
caraway seedmaple �avoring
taco sauce
chili oil
kiwi
lean turkey
garlic
golden mushroom soup
grit
chili sauce
rosemary
green salsa
corkscrew shaped pasta
marshmallow creme
enchilada sauce
baby carrot
savory
cinnamon red candy
corn mu�n mix
black peppercorn
green bell pepperwater chestnut
french dressing
almond extract
rose water
paprika
english cucumber
nutritional yeast
unpie shell
ears corn
cream of shrimp soup
plum tomato
bratwurst
green lettuce
lemon lime carbonated beverage
ice
creole seasoning
grape juice
italian sausage
pizza crust
orzo pasta
white rum
crescent roll
italian cheese blend
rhubarb
chicken bouillon
prosciutto
cream
red onion
marinated artichoke heart
jalapeno chile pepper
tater tot
pork tenderloin
spaghetti
gin
semisweet chocolate
pie crust
cooking spray
spaghetti sauce
bread �our
butterscotch pudding mix
romano cheese
bulgur
hungarian paprika
white balsamic vinegar
picante sauce
meatball
tuna
chili without bean
bean sprout
baking cocoa
chile paste
butter
yellow mustard
haddock
sun�ower seed
processed american cheese
russet potato
allspice
giblet
button mushroom
peanut
kidney bean
portobello mushroom
ranch dressing
almond paste
hazelnut
beef brisket
sake
fruit cocktail
beef sirloin steak
pimento
honeydew melon
low fat milk
salami
german chocolate
pizza sauce
green tomato
orange liqueur
celery salt
chocolate mix
cranberry juice
white pepper
barley
soy milk
sweet
poblano pepper
macadamia nut
goat cheese
tomato soup
tea bag
mixed spice
low fat peanut butter
turkey breast
lemon peel
tomato vegetable juice cocktail
jalapeno pepper
low sodium soy sauce
processed cheese
limeade
arti�cial sweetener
sesame oil
heavy cream
fat free chicken broth
pork shoulder
evaporated milk
corn�ake
bay scallop
chocolate waferwhite sugar
rapid rise yeast potato
�our tortilla
chicken drum
chocolate ice cream
pepper jack cheese
baking potato
italian dressing mix
Figure 2: Ingredient complement network. Two ingredients share an edge if they occur together more thanwould be expected by chance and if their pointwise mutual information exceeds a threshold.
1 2 3 4 5
rating
prop
ortio
n of
revi
ews
with
giv
en ra
ting
0.0
0.1
0.2
0.3
0.4
0.5
0.6
no modificationwith modification
Figure 3: The likelihood that a review suggests amodification to the recipe depends on the star ratingthe review is assigning to the recipe.
In the following, we describe the recipe modifications ex-tracted from user reviews, including adjustment, deletionand addition. We then present how we constructed an in-gredient substitute network based on the extracted informa-tion.
5.1 AdjustmentsSome modifications involve increasing or decreasing the
amount of an ingredient in the recipe. In this and the fol-lowing analyses, we split the review on punctuation suchas commas and periods. We used simple heuristics to de-tect when a review suggested a modification: adding/usingmore/less of an ingredient counted as an increase/decrease.Doubling or increasing counted as an increase, while reduc-ing, cutting, or decreasing counted as a decrease. While it islikely that there are other expressions signaling the adjust-ment of ingredient quantities, using this set of terms allowed
us to compare the relative rate of modification, as well asthe frequency of increase vs. decrease between ingredients.The ingredients themselves were extracted by performing amaximal character match within a window following an ad-justment term.
Figure 4 shows the ratios of the number of reviews sug-gesting modifications, either increases or decreases, to thenumber of recipes that contain the ingredient. Two patternsare immediately apparent. Ingredients that may be per-ceived as being unhealthy, such as fats and sugars, are, withthe exception of vegetable oil and margarine, more likelyto be modified, and to be decreased. On the other hand,flavor enhancers such as soy sauce, lemon juice, cinnamon,Worcestershire sauce, and toppings such as cheeses, baconand mushrooms, are also likely to be modified; however, theytend to be added in greater, rather than lesser quantities.Combined, the patterns suggest that good-tasting but “un-healthy” ingredients can be reduced, if desired, while spices,extracts, and toppings can be increased to taste.
5.2 Deletions and additionsRecipes are also frequently modified such that ingredients
are omitted entirely. We looked for words indicating thatthe reviewer did not have an ingredient (and hence did notuse it), e.g. “had no” and “didn’t have”. We further used“omit/left out/left off/bother with” as indication that thereviewer had omitted the ingredients, potentially for otherreasons. Because reviewers often used simplified terms, e.g.“vanilla” instead of “vanilla extract”, we compared words inproximity to the action words by constructing 4-character-grams and calculating the cosine similarity between the n-grams in the review and the list of ingredients for the recipe.
To identify additions, we simply looked for the word“add”,but omitted possible substitutions. For example, we woulduse “added cucumber”, but not “added cucumber instead ofgreen pepper”, the latter of which we analyze in the follow-ing section. We then compared the addition to the list ofingredients in the recipes, and considered the addition validonly if the ingredient does not already belong in the recipe.
0.01 0.02 0.05 0.10 0.20 0.50 1.00
0.01
0.02
0.05
0.10
0.20
0.50
1.00
(# reviews adjusting down)/(# recipes)
(# re
view
s ad
just
ing
up)/(
# re
cipe
s)
salt
butter
egg
flour
white sugarwateronion
garlic
milk
vanilla extract
pepper olive oil
vegetable oil
brown sugar
black peppersugar
cinnamon
tomato
margarine
baking powderbaking soda
lemon juice
parsley
cs’. sugar
parmesan
celery
cream cheese
green bell pepper
carrot
walnut
cheddar
sour cream
garlic powderchicken breast
nutmegbasil
pecan
mushroom
mayonnaise
chicken broth
potato
soy sauce
oregano
cornstarch
shortening
honeychocolate chipbacon
worcestershire s.
Figure 4: Suggested modifications of quantity forthe 50 most common ingredients, derived fromrecipe reviews. The line denotes equal numbers ofsuggested quantity increases and decreases.
Table 1 shows the correlation between ingredient modifi-cations. As might be expected, the more frequently an in-gredient occurs in a recipe, the more times its quantity hasthe opportunity to be modified, as is evident in the strongcorrelation between the the number of recipes the ingredientoccurs in and both increases and decreases recommended inreviews. However, the more common an ingredient, the morestable it appears to be. Recipe frequency is negatively cor-related with deletions/recipe (ρ = −0.22), additions/recipe(ρ = −0.25), and increases/recipe (ρ = −0.26). For exam-ple, salt is so essential, appearing in over 21,000 recipes, thatwe detected only 18 reviews where it was explicitly dropped.In contrast, Worcheshire sauce, appearing in 1,542 recipes,is dropped explicitly in 148 reviews.
As might also be expected, additions are positively corre-lated with increases, and deletions with decreases. However,additions and deletions are very weakly negatively corre-lated, indicating that an ingredient that is added frequentlyis not necessarily omitted more frequently as well.
Table 1: Correlations between ingredient modifica-tions
addition deletion increase decrease# recipes 0.41 0.22 0.61 0.68addition -0.15 0.79 0.11deletion 0.09 0.58increase 0.39
5.3 Ingredient substitute networkReplacement relationships show whether one ingredient
is preferable to another. The preference could be basedon taste, availability, or price. Some ingredient substitu-tion tables can be found online1, but are neither extensivenor contain information about relative frequencies of each
1e.g., http://allrecipes.com/HowTo/common-ingredient-substitutions/detail.aspx
Figure 5: Ingredient substitute network. Nodes aresized according to the number of times they havebeen recommended as a substitute for another in-gredient, and colored according to their indegree.
substitution. Thus, we found an alternative source for ex-tracting replacement relationships – users’ comments, e.g.“I replaced the butter in the frosting by sour cream, just tosoothe my conscience about all the fatty calories”.
To extract such knowledge, we first parsed the reviewsas follows: we considered several phrases to signal replace-ment relationships: “replace a with b”, “substitute b for a”,“b instead of a”, etc, and matched a and b to our list ofingredients.
We constructed an ingredient substitute network to cap-ture users’ knowledge about ingredient replacement. Thisweighted, directed network consists of ingredients as nodes.We thresholded and eliminated any suggested substitutionsthat occurred fewer than 5 times. We then determined theweight of each edge by p(b|a), the proportion of substitu-tions of ingredient a that suggest ingredient b. For example,68% of substitutions for white sugar were to splenda, anartificial sweetener, and hence the assigned weight for thesugar → splenda edge is 0.68.
The resulting substitution network, shown in Figure 5,exhibits strong clustering. We examined this structure byapplying the map generator tool by Rosvall et al. [13], whichuses a random walk approach to identify clusters in weighted,directed networks. The resulting clusters, and their relation-ships to one another, are shown in Fig. 6. The derived clus-ters could be used when following a relatively new recipewhich may not receive many reviews, and therefore manysuggestions for ingredient substitutions. If one does not haveall ingredients at hand, one could examine the content ofone’s fridge and pantry and match it with other ingredientsfound in the same cluster as the ingredient called for bythe recipe. Table 2 lists the contents of a few such sampleingredient clusters, and Fig. 7 shows two example clustersextracted from the substitute network.
Table 2: Clusters of ingredients that can be substi-tuted for one another. A maximum of 5 additionalingredients for each cluster are listed, ordered byPageRank.
main other ingredients
chicken turkey, beef, sausage, chicken breast, baconolive oil butter, apple sauce, oil, banana, margarine
sweet yam, potato, pumpkin, butternut squash,potato parsnipbaking baking soda, cream of tartarpowderalmond pecan, walnut, cashew, peanut, sunflower s.
apple peach, pineapple, pear, mango, pie fillingegg egg white, egg substitute, egg yolk
tilapia cod, catfish, flounder, halibut, orange roughyspinach mushroom, broccoli, kale, carrot, zucchiniitalian basil, cilantro, oregano, parsley, dill
seasoningcabbage coleslaw mix, sauerkraut, bok choy
napa cabbage
Finally, we examine whether the substitution network en-codes preferences for one ingredient over another, as evi-denced by the relative ratings of similar recipes, one whichcontains an original ingredient, and another which imple-ments a substitution. To test this hypothesis, we constructa “preference network”, where one ingredient is preferred toanother in terms of received ratings, and is constructed bycreating an edge (a, b) between a pair of ingredients, where aand b are listed in two recipes X and Y respectively, if reciperatings RX > RY . For example, if recipe X includes beef,ketchup and cheese, and recipe Y contains beef and pick-les, then this recipe pair contributes to two edges: one frompickles to ketchup, and the other from pickles to cheese. Theaggregate edge weights are defined based on PMI. BecausePMI is a symmetric quantity (PMI(a; b) = PMI(b; a)), weintroduce a directed PMI measure to cope with the direc-tionality of the preference network:
PMI(a→ b) = logp(a→ b)
p(a)p(b),
where
p(a→ b) =# of recipe pairs from a to b
# of recipe pairs,
and p(a), p(b) are defined as in the previous section.We find high correlation between this preference network
and the substitution network (ρ = 0.72, p < 0.001). This ob-servation suggests that the substitute network encodes users’ingredient preference, which we use in the recipe predictiontask described in the next section.
6. RECIPE RECOMMENDATIONWe use the above insights to uncover novel recommen-
dation algorithms suitable for recipe recommendations. Weuse ingredients and the relationships encoded between themin ingredient networks as our main feature sets to predictrecipe ratings, and compare them against features encod-ing nutrition information, as well as other baseline featuressuch as cooking methods, and preparation and cook time.
chicken,..
tilapia,..
italian seasoning,..seasoning,..
onion,..
garlic,..chicken broth,..
milk,..
sour cream,..
honey,..
olive oil,..
spinach,..
bread,..apple,..
sweet potato,..
cinnamon,..
black bean,..
flour,..
tomato,..
sauce,..
lemon juice,..
pepper,..brown rice,..
white wine,..
strawberry,..
spaghetti sauce,..
almond extract,..vanilla,..
cheese,..
almond,..
chocolate chip,..
baking powder,..
cream of mushroom soup,..
egg,..
cranberry,..pie crust,..
cabbage,..
celery,..
champagne,..
coconut milk,..
corn chip,..
sea scallop,..
apple juice,..
hoagie roll,..
iceberg lettuce,..
cottage cheese,..
golden syrup,.. black olive,..
pickle,..
red potato,..
quinoa,..
graham cracker,..
lemon cake mix,..
imitation crab meat,..
peach schnapp,..
hot,..
vegetable shortening,..pumpkin seed,..
lemonade,..
curry powder,..
dijon mustard,..
sugar snap pea,..
smoked paprika,..
Figure 6: Ingredient substitution clusters. Nodesrepresent clusters and edges indicate the presence ofrecommended substitutions that span clusters. Eachcluster represents a set of related ingredients whichare frequently substituted for one another.
milkheavy whipping cream
whole milk
skim milk
whipping cream
heavy cream
buttermilk
soy milk
half and half
evaporated milk
cream
cinnamon
ginger
pumpkin pie spicecardamom
nutmeg allspice
ginger root
clove
mace
(a) milk substitutes (b) cinammon substitutes
Figure 7: Relationships between ingredients locatedwithin two of the clusters from Fig. 6.
Then we apply a discriminative machine learning method,stochastic gradient boosting trees [6], to predict recipe rat-ings.
In the experiments, we seek to answer the following threequestions. (1) Can we predict users’ preference for a newrecipe given the information present in the recipe? (2) Whatare the key aspects that determine users’ preference? (3)Does the structure of ingredient networks help in recipe rec-ommendation, and how?
6.1 Recipe Pair PredictionThe goal of our prediction task is: given a pair of similar
recipes, determine which one has higher average rating thanthe other. This task is designed particularly to help userswith a specific dish or meal in mind, and who are trying todecide between several recipe options for that dish.
Recipe pair data. The data for this prediction taskconsists of pairs of similar recipes. The reason for select-ing similar recipes, with high ingredient overlap, is thatwhile apples may be quite comparable to oranges in thecontext of recipes, especially if one is evaluating salads ordesserts, lasagna may not be comparable to a mixed drink.To derive pairs of related recipes, we computed similarity
with a cosine similarity between the ingredient lists for thetwo recipes, weighted by the inverse document frequency,log(# of recipes/# of recipes containing the ingredient).We considered only those pairs of recipes whose cosine sim-ilarity exceeded 0.2. The weighting is intended to identifyhigher similarity among recipes sharing more distinguishingingredients, such as Brussels sprouts, as opposed to recipessharing very common ones, such as butter.
A further challenge to obtaining reliable relative rankingsof recipes is variance introduced by having different userschoose to rate different recipes. In addition, some usersmight not have a sufficient number of reviews under theirbelt to have calibrated their own rating scheme. To con-trol for variation introduced by users, we examined recipepairs where the same users are rating both recipes and arecollectively expressing a preference for one recipe over an-other. Specifically, we generated 62,031 recipe pairs (a, b)where ratingi(a) > ratingi(b), for at least 10 users i, andover 50% of users who rated both recipe a and recipe b. Fur-thermore, each user i should be an active enough reviewerto have rated at least 8 other recipes.
Features. In the prediction dataset, each observationconsists of a set of predictor variables or features that rep-resent information about two recipes, and the response vari-able is a binary indicator of which gets the higher rating onaverage. To study the key aspects of recipe information, weconstructed different set of features, including:
• Baseline: This includes cooking methods, such as chop-ping, marinating, or grilling, and cooking effort de-scriptors, such as preparation time in minutes, as wellas the number of servings produced, etc. These fea-tures are considered as primary information about arecipe and will be included in all other feature setsdescribed below.
• Full ingredients: We selected up to 1000 popular ingre-dients to build a “full ingredient list”. In this featureset, each observed recipe pair contains a vector withentries indicating whether an ingredient from the fulllist is present in either recipe in the pair.
• Nutrition: This feature set does not include any in-gredients but only nutrition information such the totalcaloric content, as well as quantities of fats, carbohy-drates, etc.
• Ingredient networks: In this set, we replaced the fullingredient list by structural information extracted fromdifferent ingredient networks, as described in Sections 4and 5.3. Co-occurrence is treated separately as a rawcount, and a complementarity, captured by the PMI.
• Combined set: Finally, a combined feature set is con-structed to test the performance of a combination offeatures, including baseline, nutrition and ingredientnetworks.
To build the ingredient network feature set, we extractedthe following two types of structural information from theco-occurrence and substitution networks, as well as the com-plement network derived from the co-occurrence informa-tion:Network positions are calculated to represent how a recipe’s
ingredients occupy positions within the networks. Such po-sition measures are likely to inform if a recipe contains any“popular” or “unusual” ingredients. To calculate the posi-tion measures, we first calculated various network centrality
baseline
full ingredients
nutrition
ing. networks
combined
Accuracy
0.60
0.65
0.70
0.75
0.80
Figure 8: Prediction performance. The nutritioninformation and ingredient networks are more effec-tive features than full ingredients. The ingredientnetwork features lead to impressive performance,close to the best performance.
measures, including degree centrality, betweenness central-ity, etc., from the ingredient networks. A centrality measurecan be represented as a vector ~g where each entry indicatesthe centrality of an ingredient. The network position of arecipe, with its full ingredient list represented as a binary
vector ~f , can be summarized by ~gT · ~f , i.e., an aggregatedcentrality measure based on the centrality of its ingredients.
Network communities provide information about whichingredient is more likely to co-occur with a group of otheringredients in the network. A recipe consisting of ingredientsthat are frequently used with, complemented by or substi-tuted by certain groups may be predictive of the ratingsthe recipe will receive. To obtain the network communityinformation, we applied latent semantic analysis (LSA) onrecipes. We first factorized each ingredient network, rep-resented by matrix W , using singular value decomposition(SVD). In the matrix W , each entry Wij indicates whetheringredient i co-occurrs, complements or substitues ingredi-ent j.
Suppose Wk = UkΣkVTk is a rank-k approximation of W ,
we can then transform each recipe’s full ingredient list using
the low-dimensional representation, Σ−1k V T
k~f , as community
information within a network. These low-dimensional vec-tors, together with the vectors of network positions, consti-tute the ingredient network features.
Learning method. We applied discriminative machinelearning methods such as support vector machines (SVM) [2]and stochastic gradient boosting trees [5] to our predictionproblem. Here we report and discuss the detailed resultsbased on the gradient boosting tree model. Like SVM, thegradient boosting tree model seeks a parameterized classi-fier, but unlike SVM that considers all the features at onetime, the boosting tree model considers a set of featuresat a time and iteratively combines them according to theirempirical errors. In practice, it not only has competitiveperformance comparable to SVM, but can serve as a featureranking procedure [11].
In this work, we fitted a stochastic gradient boosting treemodel with 8 terminal nodes under an exponential loss func-tion. The dataset is roughly balanced in terms of whichrecipe is the higher-rated one within a pair. We randomly
feature
impo
rtan
ce
0.0
0.2
0.4
0.6
0.8
1.0
20 40 60 80 100
group
nutrition (6.5%)
cook effort (5.0%)
ing. networks (84%)
cook methods (3.9%)
Figure 9: Relative importance of features in thecombined set. The individual items from nutri-tion information are very indicative in differentiat-ing highly rated recipes, while most of the predictionpower comes from ingredient networks.
feature
impo
rtan
ce
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
20 40 60 80 100
network
substitution (39.8%)
co−occurrence (30.9%)
complement (29.2%)
Figure 10: Relative importance of features repre-senting the network structure. The substitution net-work has the strongest contribution (39.8%) to thetotal importance of network features, and it also hasmore influential features in the top 100 list, whichsuggests that the substitution network is comple-mentary to other features.
divided the dataset into a training set (2/3) and a testingset (1/3). The prediction performance is evaluated based onaccuracy, and the feature performance is evaluated in termsof relative importance [8]. For each single decision tree, oneof the input variables, xj , is used to partition the region as-sociated with that node into two subregions in order to fitto the response values. The squared relative importance ofvariable xj is the sum of such squared improvements overall internal nodes for which it was chosen as the splittingvariable, as:
imp(j) =∑k
i2kI(splits on xj)
where i2k is the empirical improvement by the k-th nodesplitting on xj at that point.
6.2 ResultsThe overall prediction performance is shown in Fig. 8.
Surprisingly, even with a full list of ingredients, the pre-diction accuracy is only improved from .712 (baseline) to
feature
impo
rtan
ce
0.0
0.2
0.4
0.6
0.8
1.0
2 4 6 8 10 12
nutrition
carbs (20.9%)
cholesterol (17.7%)
calories (19.7%)
sodium (16.8%)
fiber (12.3%)
fat (12.4%)
Figure 11: Relative importance of features fromnutrition information. The carbs item is the mostinfluential feature in predicting higher-rated recipes.
.746. In contrast, the nutrition information and ingredientnetworks are more effective (with accuracy .753 and .786, re-spectively). Both of them have much lower dimensions (fromtens to several hundreds), compared with the full ingredientsthat are represented by more than 2000 dimensions (1000ingredients per recipe in the pair). The ingredient networkfeatures lead to impressive performance, close to the bestperformance given by the combined set (.792), indicatingthe power of network structures in recipe recommendation.
Figure 9 shows the influence of different features in thecombined feature set. Up to 100 features with the highestrelative importance are shown. The importance of a featuregroup is summarized by how much the total importance iscontributed by all features in the set. For example, thebaseline consisting of cooking effort and cooking methodscontribute 8.9% to the overall performance. The individualitems from nutrition information are very indicative in differ-entiating highly-rated recipes, while most of the predictionpower comes from ingredient networks (84%).
Figure 10 shows the top 100 features from the three net-works. In terms of the total importance of ingredient net-work features, the substitution network has slightly strongercontribution (39.8%) than the other two networks, and italso has more influential features in the top 100 list. Thissuggests that the structural information extracted from thesubstitution network is not only important but also comple-mentary to information from other aspects.
Looking into the nutrition information (Fig. 11), we foundthat carbohydrates are the most influential feature in pre-dicting higher-rated recipes. Since carbohydrates comprisearound 50% or more of total calories, the high importanceof this feature interestingly suggests that a recipe’s ratingcan be influenced by users’ concerns about nutrition anddiet. Another interesting observation is that, while individ-ual nutrition items are powerful predictors, a higher predic-tion accuracy can be reached by using ingredient networksalone, as shown in Fig. 8. This implies the informationabout nutrition may have been encoded in the ingredientnetwork structure, e.g. substitutions of less healthful ingre-dients with “healthier” alternatives.
Constructing the ingredient network feature involves re-ducing high-dimensional network information through SVD,as described in the previous section. The dimensionality canbe determined by cross-validation. As shown in Fig. 12, fea-tures with a very large dimension tend to overfit the training
Dimensions
Acc
urac
y
0.76
0.77
0.78
0.79
0.80
●
●●
●
●●
●
10 20 30 40 50 60 70
network
● combined
substitution
complement
co−occurrence
Figure 12: Prediction performance over reduceddimensionality. The best performance is given byreduced dimension k = 50 when combining all threenetworks. In addition, using the information aboutthe complement network alone is more effective inprediction than using other two networks.
4
8
43
19
6
chicken breastporkitalian sausagesausagechickenturkeycoconut extractwalnutlim
e juicelem
on extractchocolate puddingalm
ond extractcream
of chicken soupbeefalm
ondkalevanillavanilla extractevaporated m
ilksour creambutterm
ilkchicken brothhalf and halfm
ilkbrow
n sugarbutterhoneyapplesauceolive oilsplenda
−0.5Value
Color Key
4 8 43 19 6
chicken breastporkitalian sausagesausagechickenturkeycoconut extractwalnutlime juicelemon extractchocolate puddingalmond extractcream of chicken soupbeefalmondkalevanillavanilla extractevaporated milksour creambuttermilkchicken brothhalf and halfmilkbrown sugarbutterhoneyapplesauceolive oilsplenda
−0.5 0.5Value
Color Key
ingredient
svd dimension
1
2
3
4
5
Figure 13: Influential substitution communities.The matrix shows the most influential feature di-mensions extracted from the substitution network.For each dimension, the six representative ingredi-ents with the highest intensity values are shown,with colors indicating their intensity. These featuressuggest that the communities of ingredient substi-tutes, such as the sweet and oil in the first dimen-sion, are particularly informative in prediction.
data. Hence we chose k = 50 for the reduced dimension ofall three networks. The figure also shows that using theinformation about the complement network alone is moreeffective in prediction than using either the co-occurrenceand substitute networks, even in the case of low dimen-sions. Consistently, as shown in terms of relative importance(Fig. 10), the substitution network alone is not the most ef-fective, but it provides more complementary information inthe combined feature set.
In Figure 13 we show the most representative ingredientsin the decomposed matrix derived from the substitution net-work. We display the top five influential dimensions, eval-uated based on the relative importance, from the SVD re-sultant matrix Vk, and in each of these dimensions we ex-tracted six representative ingredients based on their inten-sities in the dimension (the squared entry values). Theserepresentative ingredients suggest that the communities ofingredient substitutes, such as the sweet and oil substitutesin the first dimension or the milk substitutes in the seconddimesion (which is similar to the cluster shown in Fig. 6),are particularly informative in predicting recipe ratings.
To summarize our observations, we find we are able toeffectively predict users’ preference for a recipe, but the pre-diction is not through using a full list of ingredients. Instead,by using the structural information extracted from the re-lationships among ingredients, we can better uncover users’preference about recipes.
7. CONCLUSIONRecipes are little more than instructions for combining
and processing sets of ingredients. Individual cookbooks,even the most expansive ones, contain single recipes for eachdish. The web, however, permits collaborative recipe gen-eration and modification, with tens of thousands of recipescontributed in individual websites. We have shown how thisdata can be used to glean insights about regional preferencesand modifiability of individual ingredients, and also how itcan be used to construct two kinds of networks, one of in-gredient complements, the other of ingredient substitutes.These networks encode which ingredients go well together,and which can be substituted to obtain superior results, andpermit one to predict, given a pair of related recipes, whichone will be more highly rated by users.
In future work, we plan to extend ingredient networks toincorporate the cooking methods as well. It would also beof interest to generate region-specific and diet-specific rat-ings, depending on the users’ background and preferences.A whole host of user-interface features could be added forusers who are interacting with recipes, whether the recipeis newly submitted, and hence unrated, or whether they arebrowsing a cookbook. In addition to automatically predict-ing a rating for the recipe, one could flag ingredients thatcan be omitted, ones whose quantity could be tweaked, aswell as suggested additions and substitutions.
8. ACKNOWLEDGMENTSThis work was supported by MURI award FA9550-08-1-
0265 from the Air Force Office of Scientific Research. Themethodology used in this paper was developed with sup-port from funding from the Army Research Office, Multi-University Research Initiative on Measuring, Understand-ing, and Responding to Covert Social Networks: Passive andActive Tomography. The authors gratefully acknowledge D.Lazer for support.
9. REFERENCES[1] Ahn, Y., Ahnert, S., Bagrow, J., and Barabasi, A.
Flavor network and the principles of food pairing.Bulletin of the American Physical Society 56 (2011).
[2] Cortes, C., and Vapnik, V. Support-vector networks.Machine learning 20, 3 (1995), 273–297.
[3] Forbes, P., and Zhu, M. Content-boosted matrixfactorization for recommender systems: Experimentswith recipe recommendation. Proceedings ofRecommender Systems (2011).
[4] Freyne, J., and Berkovsky, S. Intelligent foodplanning: personalized recipe recommendation. In IUI,ACM (2010), 321–324.
[5] Friedman, J. Stochastic gradient boosting.Computational Statistics & Data Analysis 38, 4(2002), 367–378.
[6] Friedman, J., Hastie, T., and Tibshirani, R. Additivelogistic regression: a statistical view of boosting.Annals of Statistics 28 (1998), 2000.
[7] Geleijnse, G., Nachtigall, P., van Kaam, P., andWijgergangs, L. A personalized recipe advice systemto promote healthful choices. In IUI, ACM (2011),437–438.
[8] Hastie, T., Tibshirani, R., Friedman, J., and Franklin,J. The elements of statistical learning: data mining,inference and prediction. The MathematicalIntelligencer 27, 2 (2005).
[9] Kamieth, F., Braun, A., and Schlehuber, C. Adaptiveimplicit interaction for healthy nutrition and foodintake supervision. Human-Computer Interaction.Towards Mobile and Intelligent InteractionEnvironments (2011), 205–212.
[10] Kinouchi, O., Diez-Garcia, R., Holanda, A.,Zambianchi, P., and Roque, A. The non-equilibriumnature of culinary evolution. New Journal of Physics10 (2008), 073020.
[11] Lu, Y., Peng, F., Li, X., and Ahmed, N. Couplingfeature selection and machine learning methods fornavigational query identification. In CIKM, ACM(2006), 682–689.
[12] Rombauer, I., Becker, M., Becker, E., and Maestro, L.Joy of cooking. Scribner Book Company, 1997.
[13] Rosvall, M., and Bergstrom, C. Maps of random walkson complex networks reveal community structure.PNAS 105, 4 (2008), 1118.
[14] Shidochi, Y., Takahashi, T., Ide, I., and Murase, H.Finding replaceable materials in cooking recipe textsconsidering characteristic cooking actions. In Proc. ofthe ACM multimedia 2009 workshop on Multimediafor cooking and eating activities, ACM (2009), 9–14.
[15] Svensson, M., Hook, K., and Coster, R. Designing andevaluating kalas: A social navigation system for foodrecipes. ACM Transactions on Computer-HumanInteraction (TOCHI) 12, 3 (2005), 374–400.
[16] Ueda, M., Takahata, M., and Nakajima, S. User’s foodpreference extraction for personalized cooking reciperecommendation. Proc. of the Second Workshop onSemantic Personalized Information Management:Retrieval and Recommendation (2011).
[17] Wang, L., Li, Q., Li, N., Dong, G., and Yang, Y.Substructure similarity measurement in chineserecipes. In WWW, ACM (2008), 979–988.
[18] Wikipedia. Outline of food preparation, 2011. [Online;accessed 22-Oct-2011].
[19] Zhang, Q., Hu, R., Mac Namee, B., and Delany, S.Back to the future: Knowledge light case base cookery.In Proc. of The 9th European Conference onCase-Based Reasoning Workshop (2008), 15.
top related