Association Rule Mining
Discovering interesting relationships among items in large collections of transactions.
Classic framing: market basket analysis
But also:
The question:
“If a recipe uses garlic and olive oil, what other ingredients does it tend to include?”
An association rule has the form:
\[\{garlic, olive\_oil\} \Rightarrow \{parmesan\}\]
Each transaction is a set of items. We represent the data as a binary matrix:
Key insight
Association rules work on presence/absence data — no quantities, no order. Just “was it in the basket or not?”
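As a rough illustration of the binary representation, the sketch below turns a handful of made-up recipes into a presence/absence matrix using base R only. The recipe names and contents are hypothetical, not taken from the course data.

```r
# Toy transactions (hypothetical recipes, invented for illustration)
recipes <- list(
  r1 = c("garlic", "olive_oil", "parmesan"),
  r2 = c("garlic", "olive_oil", "tomato"),
  r3 = c("butter", "flour"),
  r4 = c("garlic", "olive_oil", "parmesan", "tomato")
)

# The item universe: every distinct item seen in any transaction
items <- sort(unique(unlist(recipes)))

# One row per transaction, one column per item; TRUE means "item is present"
incidence <- t(sapply(recipes, function(tr) items %in% tr))
colnames(incidence) <- items

incidence            # the binary matrix
colSums(incidence)   # item frequencies (how many transactions contain each item)
```

Quantities and order are discarded on purpose: a recipe with three cloves of garlic and one with ten both get a single `TRUE` in the `garlic` column.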
Let’s build one interactively…
Toggle pizza toppings on and off to build a transaction database. Watch the binary matrix and item frequencies update in real time.
The three key metrics for evaluating association rules:
Support: how frequently the itemset appears:
\[\text{supp}(A \Rightarrow B) = \text{supp}(A \cup B) = \frac{\text{number of transactions containing every item in } A \cup B}{N}\]
“How common is this combination?”
Confidence: how reliable the rule is:
\[\text{conf}(A \Rightarrow B) = \frac{\text{supp}(A \cup B)}{\text{supp}(A)}\]
“Given A, how often does B also appear?”
This is \(P(B | A)\) — conditional probability.
Lift: how much more likely B is given A than expected by chance:
\[\text{lift}(A \Rightarrow B) = \frac{\text{supp}(A \cup B)}{\text{supp}(A) \cdot \text{supp}(B)}\]
Select items for the antecedent (A) and consequent (B) from the pizza data above, and see support, confidence, and lift computed in real time with a Venn diagram.
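The three formulas can be checked by hand on a small example. The sketch below uses eight invented pizza orders (not the widget's data) and two hypothetical helper functions, `support()` and `rule_metrics()`, written in base R.

```r
# Toy pizza orders (hypothetical; not the widget's data)
orders <- list(
  c("pepperoni", "mushrooms"),
  c("pepperoni", "mushrooms", "olives"),
  c("pepperoni"),
  c("pepperoni", "olives"),
  c("mushrooms", "olives"),
  c("pepperoni", "mushrooms"),
  c("pepperoni"),
  c("pepperoni", "onions")
)

# Fraction of transactions that contain every item in `itemset`
support <- function(itemset, transactions) {
  mean(vapply(transactions, function(tr) all(itemset %in% tr), logical(1)))
}

# Support, confidence and lift for the rule A => B
rule_metrics <- function(A, B, transactions) {
  supp_ab <- support(union(A, B), transactions)
  supp_a  <- support(A, transactions)
  supp_b  <- support(B, transactions)
  c(
    support    = supp_ab,
    confidence = supp_ab / supp_a,
    lift       = supp_ab / (supp_a * supp_b)
  )
}

m <- rule_metrics("mushrooms", "olives", orders)
round(m, 3)
```

In this toy data mushrooms and olives appear together in 2 of 8 orders, mushrooms appear in 4 and olives in 3, so the rule has support 0.25, confidence 0.5 and lift of about 1.33.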
High confidence ≠ interesting rule
A rule can have high confidence simply because the consequent is extremely common — not because of any real association.
Example from our pizza data — pepperoni appears in ~6–7 out of 8 transactions:
| Rule | Confidence | Lift | Verdict |
|---|---|---|---|
| {Mushrooms} → {Pepperoni} | ≈ 0.80 | ≈ 1.0 | ❌ Useless — pepperoni is on nearly every pizza anyway |
| {Mushrooms} → {Olives} | ≈ 0.50 | ≈ 2.5 | ✓ Interesting — olives appear much more with mushrooms |
Lift corrects for popularity. It asks: “Is B more likely given A than it would be by chance?”
Lift = 1.0 means A tells us nothing new about B. Lift > 1 means B is more likely when A is present (a positive association); lift < 1 means B is less likely when A is present.
Try it in Widget 2 — select Mushrooms → Pepperoni, then swap to Mushrooms → Olives.
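The lift column in the table can be cross-checked from the confidence column, because dividing the lift formula through by \(\text{supp}(A)\) gives \(\text{lift}(A \Rightarrow B) = \text{conf}(A \Rightarrow B) / \text{supp}(B)\). The check below assumes \(\text{supp}(\{Pepperoni\}) \approx 0.80\) (in line with "~6–7 out of 8") and \(\text{supp}(\{Olives\}) \approx 0.20\); the second value is implied by the table rather than stated in it.

\[\text{lift}(\{Mushrooms\} \Rightarrow \{Pepperoni\}) \approx \frac{0.80}{0.80} = 1.0 \qquad \text{lift}(\{Mushrooms\} \Rightarrow \{Olives\}) \approx \frac{0.50}{0.20} = 2.5\]

The pepperoni rule has the higher confidence, yet knowing that mushrooms are present does not change the chance of pepperoni at all.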
The Apriori algorithm efficiently discovers association rules by exploiting a key property:
The Apriori principle
If an itemset is infrequent, all its supersets must also be infrequent.
Equivalently: every subset of a frequent itemset must also be frequent.
Algorithm outline:
Let’s watch this happen step by step…
Watch the Apriori algorithm work on the pizza transaction data. Adjust the minimum support threshold and step through each level of candidate generation and pruning.
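A level-wise search of this kind can be sketched in a few lines of base R. This is a teaching sketch under simplifying assumptions (tiny data, no attention to speed), not the implementation used by arules; the function name `apriori_sketch()` and the toy transactions are hypothetical.

```r
# Toy transactions (hypothetical)
transactions <- list(
  c("cheese", "pepperoni", "mushrooms"),
  c("cheese", "pepperoni"),
  c("cheese", "mushrooms", "olives"),
  c("cheese", "pepperoni", "olives"),
  c("cheese", "pepperoni", "mushrooms")
)

support <- function(itemset, trans) {
  mean(vapply(trans, function(tr) all(itemset %in% tr), logical(1)))
}

apriori_sketch <- function(trans, min_supp) {
  items <- sort(unique(unlist(trans)))
  # Level 1: frequent single items
  level <- Filter(function(s) support(s, trans) >= min_supp, as.list(items))
  frequent <- list()
  k <- 1
  while (length(level) > 0) {
    frequent <- c(frequent, level)
    k <- k + 1
    freq_keys <- vapply(level, paste, character(1), collapse = "|")
    # Candidate generation: join pairs of frequent (k-1)-itemsets into k-itemsets
    candidates <- list()
    for (i in seq_along(level)) {
      for (j in seq_along(level)) {
        if (i < j) {
          cand <- sort(union(level[[i]], level[[j]]))
          if (length(cand) == k) candidates[[paste(cand, collapse = "|")]] <- cand
        }
      }
    }
    # Pruning (the Apriori principle): drop a candidate if any (k-1)-subset is infrequent
    candidates <- Filter(function(cand) {
      subs <- combn(cand, k - 1, simplify = FALSE)
      all(vapply(subs, function(s) paste(s, collapse = "|") %in% freq_keys, logical(1)))
    }, candidates)
    # Support counting: keep only candidates that meet the threshold
    level <- unname(Filter(function(cand) support(cand, trans) >= min_supp, candidates))
  }
  frequent
}

frequent_sets <- apriori_sketch(transactions, min_supp = 0.5)
vapply(frequent_sets, paste, character(1), collapse = ", ")
```

With a threshold of 0.5, the pair mushrooms and pepperoni falls short (2 of 5 orders), so the three-item candidate cheese, mushrooms, pepperoni is pruned without ever being counted against the data.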
Two popular algorithms — same goal, different data representation:
Apriori: arules::apriori()
Eclat: arules::eclat()
Both produce the same frequent itemsets. Apriori is easier to teach; Eclat is often faster in practice. R’s arules package supports both.
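The difference in data representation can be sketched in base R. Apriori is usually described as scanning transactions (a horizontal layout), while Eclat stores, for each item, the ids of the transactions that contain it (a vertical layout) and intersects those id lists. The helper names below are hypothetical and the data is invented.

```r
# Toy transactions (hypothetical)
transactions <- list(
  t1 = c("cheese", "pepperoni", "mushrooms"),
  t2 = c("cheese", "pepperoni"),
  t3 = c("cheese", "mushrooms", "olives"),
  t4 = c("cheese", "pepperoni", "olives")
)

# Horizontal layout: scan every transaction and test for the itemset
supp_horizontal <- function(itemset, trans) {
  mean(vapply(trans, function(tr) all(itemset %in% tr), logical(1)))
}

# Vertical layout: one vector of transaction ids (a tid-list) per item
items <- sort(unique(unlist(transactions)))
tid_lists <- lapply(setNames(items, items), function(it) {
  which(vapply(transactions, function(tr) it %in% tr, logical(1)))
})

# Support = size of the intersection of the items' tid-lists, divided by N
supp_vertical <- function(itemset, tids, n) {
  length(Reduce(intersect, tids[itemset])) / n
}

supp_horizontal(c("cheese", "pepperoni"), transactions)
supp_vertical(c("cheese", "pepperoni"), tid_lists, length(transactions))
```

Both calls return the same support; only the way the count is obtained differs.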
Let’s apply association rules to real data — 2,218 recipes from Allrecipes, categorised by cuisine (TidyTuesday 2025-09-16).
Each recipe is a transaction; each ingredient is an item.
#> Rows: 2,218
#> Columns: 17
#> $ name <chr> "Saganaki (Flaming Greek Cheese)", "Coney Island Knishe…
#> $ country <chr> "Greek", "Jewish", "Australian and New Zealander", "Chi…
#> $ url <chr> "https://www.allrecipes.com/recipe/263750/flaming-greek…
#> $ author <chr> "John Mitzewich", "John Mitzewich", "CHIPPENDALE", "Hei…
#> $ date_published <date> 2024-02-07, 2024-11-26, 2022-07-14, 2025-01-31, 2025-0…
#> $ ingredients <chr> "1 (4 ounce) package kasseri cheese, 1 tablespoon water…
#> $ calories <dbl> 391, 301, 64, 106, 449, 958, 378, 90, 157, 322, 4, NA, …
#> $ fat <dbl> 25, 17, 3, 9, 23, 24, 10, 5, 6, 16, 0, NA, 21, 2, 66, 8…
#> $ carbs <dbl> 15, 31, 9, 7, 58, 144, 59, 10, 25, 39, 1, NA, 16, 63, 7…
#> $ protein <dbl> 16, 7, 1, 1, 7, 46, 14, 1, 2, 7, 0, NA, 28, 6, 54, 17, …
#> $ avg_rating <dbl> 4.8, 4.6, 4.3, 5.0, 3.8, 4.4, 4.3, NA, 4.6, 5.0, 4.7, 4…
#> $ total_ratings <dbl> 25, 10, 126, 1, 13, 40, 3, NA, 65, 2, 182, 2, 19, 16, 9…
#> $ reviews <dbl> 22, 9, 104, 1, 11, 32, 3, NA, 55, 2, 138, 2, 15, 16, 84…
#> $ prep_time <dbl> 10, 30, 20, 10, 30, 30, 30, 40, 0, 5, 5, 5, 10, 10, 20,…
#> $ cook_time <dbl> 5, 75, 15, 0, 15, 165, 75, 30, 0, 5, 0, 25, 10, 50, 16,…
#> $ total_time <dbl> 15, 180, 180, 10, 45, 675, 585, 155, 0, 10, 5, 30, 50, …
#> $ servings <dbl> 2, 16, 12, 6, 15, 6, 6, 84, 24, 1, 21, 8, 4, 10, 4, 8, …
The ingredients column is a comma-separated string. We need to extract base ingredient names.
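To see what extracting a base ingredient name involves, here is a much reduced version of the idea applied to four invented strings. It assumes the stringr package is installed; the helper `clean_ingredient()` and its short unit list are illustrative only and cover far fewer cases than the full pipeline.

```r
library(stringr)

# Hypothetical raw ingredient strings, in the style of the ingredients column
raw <- c(
  "2 cups all-purpose flour",
  "1 (4 ounce) package kasseri cheese",
  "3 cloves garlic",
  "1/2 teaspoon salt"
)

clean_ingredient <- function(x) {
  x |>
    str_to_lower() |>
    str_remove_all("\\(.*?\\)") |>       # drop parentheticals such as "(4 ounce)"
    str_remove("^[0-9/\\. ]+") |>        # drop the leading quantity
    str_remove("^(cups?|teaspoons?|tablespoons?|ounces?|packages?|cloves?)\\s+") |>
    str_squish()                         # tidy leftover whitespace
}

cleaned <- clean_ingredient(raw)
cleaned
```

The order of the steps matters: the parenthetical is removed first so that the quantity and unit patterns can anchor at the start of the string.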
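library(tidyverse) # dplyr, tidyr and stringr are used below
library(arules) # provides the "transactions" class
# cuisines_raw is the recipe table glimpsed above (loading step not shown)
# Words/phrases that are preparation descriptors, not ingredients.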
# These appear as standalone tokens after comma-splitting
# (e.g. "garlic, minced" → "garlic" + "minced").
not_ingredients <- c(
# preparation methods
"chopped", "minced", "diced", "sliced", "grated", "shredded", "crushed",
"julienned", "cubed", "halved", "quartered", "peeled", "seeded", "stemmed",
"trimmed", "cleaned", "rinsed", "drained", "thawed", "softened", "melted",
"beaten", "divided", "separated", "sifted", "crumbled", "toasted",
"mashed", "pureed", "blanched", "boiled", "roasted", "juiced",
"thinly sliced", "finely chopped", "coarsely chopped",
"finely", "coarsely", "thinly",
# usage / quantity phrases
"to taste", "as needed", "or to taste", "or as needed",
"or more", "or more to taste", "or more as needed",
"salt to taste", "salt and pepper to taste",
"salt and ground black pepper to taste",
"salt and ground black pepper",
"salt and pepper", "if desired", "for garnish",
# too generic / ubiquitous to be informative
"salt", "pepper", "kosher salt", "ground black pepper",
"freshly ground black pepper",
# other non-ingredients
"skinless", "cooking spray", "room temperature", "needed", "taste"
)
# Unit regex: strips leading quantity + measurement word
unit_rx <- regex(
paste0("^(cups?|tablespoons?|tbsp|teaspoons?|tsp|pounds?|lbs?|ounces?|oz|",
"cans?|packages?|pkgs?|envelopes?|pinch(es)?|cloves?|slices?|",
"stalks?|heads?|bunche?s?|pieces?|strips?|sprigs?)\\s+(of\\s+)?"),
ignore_case = TRUE
)
recipe_items <- cuisines_raw |>
select(name, country, ingredients) |>
mutate(recipe_id = row_number()) |>
separate_longer_delim(ingredients, delim = ", ") |>
mutate(
ingredient = ingredients |>
str_to_lower() |>
str_trim() |>
str_remove_all("\\(.*?\\)") |> # remove parentheticals
str_remove("^[0-9½¼¾⅓⅔⅛⅜⅝⅞/\\. ]+") |> # remove quantities inc. Unicode fractions
str_remove(unit_rx) |> # remove measurement unit
str_remove(regex("^(large|medium|small|extra-?large)\\s+", ignore_case = TRUE)) |>
str_remove(regex("\\s+(to taste|as needed|if desired|or more to taste|or more)$",
ignore_case = TRUE)) |>
# After splitting on ", " this only catches descriptors joined by a comma with no following space
str_remove(regex(",\\s*(divided|optional|drained|thawed|softened|melted|beaten)$",
ignore_case = TRUE)) |>
str_trim() |>
# Normalise variants → canonical form
str_replace_all("^unsalted butter$", "butter") |>
str_replace_all("^garlic powder$", "garlic") |>
str_replace_all("^minced garlic$", "garlic") |>
# Normalise plurals → singular
str_replace_all("^eggs$", "egg") |>
str_replace_all("^onions$", "onion") |>
str_replace_all("^potatoes$", "potato") |>
str_replace_all("^tomatoes$", "tomato") |>
str_replace_all("^carrots$", "carrot") |>
str_replace_all("^lemons$", "lemon") |>
str_replace_all("^limes$", "lime") |>
str_replace_all("^mushrooms$", "mushroom") |>
str_replace_all("^bananas$", "banana") |>
str_replace_all("^green onions$", "green onion") |>
str_replace_all("^cloves$", "clove")
) |>
filter(
nchar(ingredient) > 2,
!ingredient %in% not_ingredients
)
# Keep the most common ingredients for interpretable rules
top_ingredients <- recipe_items |>
count(ingredient, sort = TRUE) |>
slice_head(n = 75) |>
pull(ingredient)
recipe_items_filtered <- recipe_items |>
filter(ingredient %in% top_ingredients)
# Create transaction format for arules
recipe_trans <- recipe_items_filtered |>
select(recipe_id, ingredient) |>
distinct() |>
mutate(present = TRUE) |>
pivot_wider(
names_from = ingredient,
values_from = present,
values_fill = FALSE
) |>
select(-recipe_id) |>
as.matrix() |>
as("transactions")
recipe_trans
#> transactions in sparse format with
#> 2158 transactions (rows) and
#> 75 items (columns)
#> set of 43 rules
#> lhs rhs support
#> [1] {baking powder} => {all-purpose flour} 0.06302132
#> [2] {egg, butter, white sugar} => {all-purpose flour} 0.05746061
#> [3] {egg, butter} => {all-purpose flour} 0.07970343
#> [4] {butter, white sugar} => {all-purpose flour} 0.07970343
#> [5] {egg, white sugar} => {all-purpose flour} 0.08989805
#> [6] {all-purpose flour, butter, white sugar} => {egg} 0.05746061
#> [7] {baking powder} => {egg} 0.05468026
#> [8] {all-purpose flour, white sugar} => {egg} 0.08989805
#> [9] {all-purpose flour, egg, white sugar} => {butter} 0.05746061
#> [10] {milk} => {egg} 0.07043559
#> confidence coverage lift count
#> [1] 0.7953216 0.07924004 3.287939 136
#> [2] 0.7750000 0.07414272 3.203927 124
#> [3] 0.7136929 0.11167748 2.950478 172
#> [4] 0.6935484 0.11492122 2.867198 172
#> [5] 0.6783217 0.13253012 2.804249 194
#> [6] 0.7209302 0.07970343 2.778156 124
#> [7] 0.6900585 0.07924004 2.659190 118
#> [8] 0.6759582 0.13299351 2.604853 194
#> [9] 0.6391753 0.08989805 2.587880 124
#> [10] 0.6637555 0.10611677 2.557829 152
#> lhs rhs support confidence coverage lift
#> [1] {olive oil, garlic} => {onion} 0.05143652 0.4723404 0.1088971 1.983095
#> count
#> [1] 111
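The two listings above have the shape of arules output printed with `inspect()`, but the mining call itself is not shown in this section. The sketch below shows that kind of workflow on toy data, assuming the arules package is installed. The baskets and the thresholds (`supp = 0.3`, `conf = 0.6`) are placeholders, not the settings behind the 43 recipe rules.

```r
library(arules)

# Toy baskets (hypothetical), coerced to the arules "transactions" class
baskets <- list(
  c("all-purpose flour", "egg", "butter", "white sugar"),
  c("all-purpose flour", "egg", "white sugar"),
  c("all-purpose flour", "butter", "white sugar"),
  c("olive oil", "garlic", "onion"),
  c("olive oil", "garlic", "onion", "tomato"),
  c("olive oil", "garlic")
)
trans <- as(baskets, "transactions")

# Mine rules; minlen = 2 excludes rules with an empty left-hand side
rules <- apriori(
  trans,
  parameter = list(supp = 0.3, conf = 0.6, minlen = 2),
  control = list(verbose = FALSE)
)

# Show the strongest rules by lift
inspect(head(sort(rules, by = "lift"), 5))

# Look up rules whose left-hand side contains both garlic and olive oil
garlic_oil <- subset(rules, subset = lhs %ain% c("garlic", "olive oil"))
inspect(garlic_oil)
```

Sorting by lift rather than confidence keeps rules with very common consequents from crowding out the rest of the list.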
Explore all mined recipe rules interactively. Brush the scatterplot, adjust thresholds, or search for specific ingredients.
Association rules share ideas with many techniques you already know:
Conditional Probability (STAT courses)
\[\text{confidence}(A \Rightarrow B) = P(B \mid A)\]
Confidence is conditional probability, estimated from data.
Bayesian Reasoning
\[\text{lift}(A \Rightarrow B) = \frac{P(B \mid A)}{P(B)}\]
Lift measures how much A changes our belief about B — like a likelihood ratio.
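The two ways of writing lift are the same quantity. Reading supports as estimated probabilities, with \(\text{supp}(A \cup B)\) estimating the probability that a transaction contains both A and B (written \(P(A, B)\) here):

\[\text{lift}(A \Rightarrow B) = \frac{\text{supp}(A \cup B)}{\text{supp}(A) \cdot \text{supp}(B)} = \frac{\text{conf}(A \Rightarrow B)}{\text{supp}(B)} = \frac{P(B \mid A)}{P(B)} = \frac{P(A, B)}{P(A)\,P(B)}\]

The last form is symmetric in A and B, so \(\text{lift}(A \Rightarrow B) = \text{lift}(B \Rightarrow A)\), whereas confidence generally differs between the two directions. It also shows that lift equals 1 exactly when A and B are independent in the data.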
Clustering (Week 10)
Decision Trees (Week 9)
“Customers who bought X also bought Y”
This is literally association rule mining applied to recommendation systems.
| Market Basket | Recommendation |
|---|---|
| Transaction = shopping cart | Transaction = user purchase history |
| Item = product | Item = product |
| Rule: {A} → {B} | Recommendation: “You might also like B” |
| High lift = genuine association | High lift = personalised recommendation |
Association rules find rules from all transactions together.
Modern recommender systems (Netflix, Spotify) use more sophisticated collaborative filtering (matrix factorisation, deep learning), but the core intuition is the same:
Find patterns of co-occurrence across many users/transactions.
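One simple way to turn mined rules into recommendations is to look for rules whose antecedent is fully contained in the current cart and suggest their consequents, strongest lift first. The sketch below uses base R; the rules table, its lift values and the `recommend()` helper are all invented for illustration.

```r
# Hypothetical mined rules (values invented for illustration)
rules <- data.frame(
  lhs  = c("garlic,olive oil", "flour,egg", "garlic", "butter"),
  rhs  = c("onion", "white sugar", "onion", "flour"),
  lift = c(2.0, 2.8, 1.6, 1.9),
  stringsAsFactors = FALSE
)

recommend <- function(cart, rules, top_n = 3) {
  lhs_items <- strsplit(rules$lhs, ",", fixed = TRUE)
  # A rule applies when its whole antecedent is in the cart
  # and its consequent is not already there
  applies <- vapply(lhs_items, function(a) all(a %in% cart), logical(1)) &
    !(rules$rhs %in% cart)
  hits <- rules[applies, , drop = FALSE]
  hits <- hits[order(-hits$lift), , drop = FALSE]
  head(unique(hits$rhs), top_n)
}

suggestions <- recommend(c("garlic", "olive oil", "butter"), rules)
suggestions
```

For this cart the applicable rules suggest onion first (lift 2.0) and flour second (lift 1.9).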
Click any cell to expand the explanation. See how association rules compare with other methods you’ve learned.
✅ Transactional data (baskets, carts, records)
✅ Binary/categorical variables
✅ Exploratory discovery — “what goes with what?”
✅ Large number of items with sparse transactions
✅ Business applications: cross-selling, layout optimisation, recommendations
❌ Small datasets (< 100 transactions) — rules will be unreliable
❌ Continuous data without discretisation
❌ When you need causal claims (rules show co-occurrence, not causation)
❌ When you have a specific target variable (use classification instead)
❌ Very dense data (every item in every transaction)
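Continuous variables can still be used once they are binned into categories. A minimal base R sketch with `cut()`, using the first few calorie values from the recipe data shown earlier; the cut points at 150 and 400 are arbitrary assumptions, not recommended thresholds.

```r
# First few calorie values from the recipe data
calories <- c(391, 301, 64, 106, 449, 958)

# Bin the continuous variable into labelled categories
calorie_band <- cut(
  calories,
  breaks = c(0, 150, 400, Inf),
  labels = c("calories=low", "calories=medium", "calories=high")
)

# Each band becomes a presence/absence item, one column per band
band_items <- sapply(levels(calorie_band), function(lv) calorie_band == lv)
band_items
```

Every recipe falls into exactly one band, so these columns can sit alongside ingredient columns in the binary matrix.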
| Parameter | Too low | Too high |
|---|---|---|
| min support | Millions of rules, most rare & unreliable | Only obvious, uninteresting patterns |
| min confidence | Many weak associations | Miss rules where consequent is rare |
Start with support ≥ 0.01, confidence ≥ 0.5, then adjust based on domain knowledge and the number of rules produced.
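A quick way to choose a support threshold is to count how many rules each candidate value produces. The sketch below assumes the arules package and its bundled `Groceries` data; the grid of support values is arbitrary and confidence is held at 0.5.

```r
library(arules)
data("Groceries")

# Candidate minimum-support values to compare (an arbitrary grid)
supports <- c(0.001, 0.005, 0.01, 0.05)

# Number of rules mined at each support threshold, confidence fixed at 0.5
n_rules <- vapply(supports, function(s) {
  rules <- apriori(
    Groceries,
    parameter = list(supp = s, conf = 0.5, minlen = 2),
    control = list(verbose = FALSE)
  )
  length(rules)
}, integer(1))

data.frame(min_support = supports, n_rules = n_rules)
```

The rule count can only fall as the support threshold rises, so the table shows where the set shrinks from unmanageable to something that can be read by eye.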
In the lab you’ll practice with:
Groceries dataset (9,835 transactions)
bank-rules.csv dataset (banking service cross-selling)
inspectDT() for interactive exploration
Chapter 8 of the course notes covers the same concepts with different examples and more mathematical detail.