Commits
72 commits
c785217
created bsaed function for frequent itemsets and association rules
Wander03 Nov 11, 2024
4fed387
testing new functions
Wander03 Nov 13, 2024
f70d6f6
clustering for freq itemsets function
Wander03 Nov 17, 2024
7fb5ede
progress!
Wander03 Nov 18, 2024
f2a51f5
fix conditions
Wander03 Jan 4, 2025
f8787f4
fix conditions
Wander03 Jan 4, 2025
ea1c695
Update text
Wander03 Jan 8, 2025
2d30942
change method to mining_method
Wander03 Jan 8, 2025
9459749
create vignette for freq itemsets
Wander03 Jan 8, 2025
144036c
bug fixing
Wander03 Jan 13, 2025
1c19e9d
fixed name
Wander03 Jan 14, 2025
6ae24be
premptive changes
Wander03 Jan 14, 2025
94dcf46
bug fixing freq itemsets
Wander03 Jan 14, 2025
f5c8dd1
code formatting
Wander03 Jan 15, 2025
d01188f
bug fixes
Wander03 Jan 16, 2025
1074d74
updating cluster functions
Wander03 Jan 17, 2025
fdeac74
save average supports for each cluster (to be used in predict)
Wander03 Jan 27, 2025
02dbce1
predict not saving output
Wander03 Feb 5, 2025
ecf7486
some change
Wander03 Feb 6, 2025
cff078c
fixed predcit! Proba is now put in N/A spots
Wander03 Feb 8, 2025
4d45209
remove avg support tracker (unused)
Wander03 Feb 27, 2025
1f8f633
change best cluster to prioritize size then support
Wander03 Feb 27, 2025
690c385
vignette testing
Wander03 Mar 5, 2025
ae9917b
predict output formated & cutoff implemented
Wander03 Mar 5, 2025
a6826c3
create holder for extract_predictions function (placeholder name)
Wander03 Mar 6, 2025
cf3e82b
hard code cutoff
Wander03 Mar 6, 2025
c6acf01
change predict formating
Wander03 Mar 13, 2025
d81fc52
something>
Wander03 Mar 13, 2025
cfb5d74
extract_predictions complete! (still needs a better name)
Wander03 Mar 19, 2025
5fc3153
move detail text
Wander03 Mar 24, 2025
a3abf08
add tuning for min_support
Wander03 Mar 24, 2025
f2805b2
tune update
Wander03 Apr 3, 2025
a304ebd
improved example code
Wander03 Apr 9, 2025
0fbcdb7
fix params help doc
Wander03 Apr 9, 2025
d23f353
create test files (TODO: create tests)
Wander03 Apr 10, 2025
dcea789
adding raw to predict
Wander03 Apr 11, 2025
b22bcee
split regualar predict and raw predict.
Wander03 Apr 12, 2025
be2f3a9
predict fixed! Output is the same :D
Wander03 Apr 14, 2025
3ab1362
change default to eclat
Wander03 Apr 23, 2025
a5c8edd
fixing replacing wrong part from earlier commit
Wander03 Apr 23, 2025
022c984
updating with correct default
Wander03 Apr 23, 2025
a61eb1e
augment code written, add correct header text and move unecessary code
Wander03 May 2, 2025
ffaa380
standardize column names
Wander03 May 4, 2025
8d18057
testing
Wander03 May 4, 2025
f99b0aa
testing2
Wander03 May 4, 2025
0f9e947
hide predict dataframe from arules::inspect()
Wander03 May 7, 2025
e12ae62
remove `` from predict output item names
Wander03 May 7, 2025
3bc1e43
added note
Wander03 May 8, 2025
ace0ce4
re-roder doesnt matter for fit
Wander03 May 12, 2025
2ed1a98
hide freq itemset output from auto displaying when extracting cluster…
Wander03 May 14, 2025
18566d2
rename col name in predict
Wander03 May 14, 2025
4315559
vignettes update with new info from thesis
Wander03 May 25, 2025
083cb4d
added header descriptions about functions
Wander03 May 25, 2025
1923aea
added convergence limit and warning message
Wander03 May 25, 2025
3bfdba4
update freq_itemsets extract_fit_summary and ? information
Wander03 May 25, 2025
39a92eb
vignette update
Wander03 May 29, 2025
d68410d
create test cases for freq_itemsets
Wander03 Jun 2, 2025
19c58ae
move min_support tuning to dials
Wander03 Jun 19, 2025
b59df0b
Merged upstream/main into main
Wander03 Jun 19, 2025
fe2537d
re-ran test cases
Wander03 Jun 19, 2025
96ad9f2
remove assoc_rules
Wander03 Jun 19, 2025
6e5a28f
Add the following
Wander03 Jul 3, 2025
1ae32d8
rename `extract_predictions` to `extract_itemset_predictions`
Wander03 Jul 3, 2025
f808e10
rename `extract_predictions` to `extract_itemset_predictions`
Wander03 Jul 3, 2025
0629618
Add exported functions to _pkgdown.yml
Wander03 Jul 3, 2025
e22c3c6
convert all rlang::abort() calls to use {cli}
Wander03 Jul 3, 2025
842f8d7
edit toy_df and toy_pred to use " instead of ' and TRUE/FALSE instead…
Wander03 Jul 3, 2025
305b078
add example to `augment_itemset_predict`
Wander03 Jul 3, 2025
5860141
add skip_if_not_installed("arules") to all tests that use freq_itemse…
Wander03 Jul 3, 2025
32705bf
use base R rather than stringr
Wander03 Jul 3, 2025
7b2751c
use the reduce() from compat-purrr.R
Wander03 Jul 3, 2025
fbe29b3
stats::setNames
Wander03 Jul 3, 2025
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -39,6 +39,7 @@ Imports:
utils,
vctrs (>= 0.5.0)
Suggests:
arules,
cluster,
ClusterR,
clustMixType (>= 0.3-5),
11 changes: 11 additions & 0 deletions NAMESPACE
@@ -3,12 +3,14 @@
S3method(as_tibble,cluster_metric_set)
S3method(augment,cluster_fit)
S3method(check_args,default)
S3method(check_args,freq_itemsets)
S3method(check_args,hier_clust)
S3method(check_args,k_means)
S3method(extract_cluster_assignment,KMeansCluster)
S3method(extract_cluster_assignment,cluster_fit)
S3method(extract_cluster_assignment,cluster_spec)
S3method(extract_cluster_assignment,hclust)
S3method(extract_cluster_assignment,itemsets)
S3method(extract_cluster_assignment,kmeans)
S3method(extract_cluster_assignment,kmodes)
S3method(extract_cluster_assignment,kproto)
@@ -18,6 +20,7 @@ S3method(extract_fit_summary,KMeansCluster)
S3method(extract_fit_summary,cluster_fit)
S3method(extract_fit_summary,cluster_spec)
S3method(extract_fit_summary,hclust)
S3method(extract_fit_summary,itemsets)
S3method(extract_fit_summary,kmeans)
S3method(extract_fit_summary,kmodes)
S3method(extract_fit_summary,kproto)
@@ -37,6 +40,7 @@ S3method(print,cluster_fit)
S3method(print,cluster_metric_set)
S3method(print,cluster_spec)
S3method(print,control_cluster)
S3method(print,freq_itemsets)
S3method(print,hier_clust)
S3method(print,k_means)
S3method(required_pkgs,cluster_fit)
@@ -58,23 +62,28 @@ S3method(sse_within_total,cluster_spec)
S3method(sse_within_total,workflow)
S3method(tidy,cluster_fit)
S3method(translate_tidyclust,default)
S3method(translate_tidyclust,freq_itemsets)
S3method(translate_tidyclust,hier_clust)
S3method(translate_tidyclust,k_means)
S3method(tunable,cluster_spec)
S3method(tunable,freq_itemsets)
S3method(tunable,k_means)
S3method(tune_args,cluster_spec)
S3method(tune_cluster,cluster_spec)
S3method(tune_cluster,default)
S3method(tune_cluster,workflow)
S3method(update,freq_itemsets)
S3method(update,hier_clust)
S3method(update,k_means)
export("%>%")
export(.freq_itemsets_fit_arules)
export(.hier_clust_fit_stats)
export(.k_means_fit_ClusterR)
export(.k_means_fit_clustMixType)
export(.k_means_fit_klaR)
export(.k_means_fit_stats)
export(augment)
export(augment_itemset_predict)
export(cluster_metric_set)
export(control_cluster)
export(cut_height)
@@ -83,6 +92,7 @@ export(extract_cluster_assignment)
export(extract_fit_engine)
export(extract_fit_parsnip)
export(extract_fit_summary)
export(extract_itemset_predictions)
export(extract_parameter_set_dials)
export(extract_preprocessor)
export(extract_spec_parsnip)
@@ -92,6 +102,7 @@ export(fit)
export(fit.cluster_spec)
export(fit_xy)
export(fit_xy.cluster_spec)
export(freq_itemsets)
export(get_tidyclust_colors)
export(glance)
export(hier_clust)
6 changes: 6 additions & 0 deletions R/aaa.R
@@ -10,6 +10,7 @@ utils::globalVariables(
".iter_model",
".iter_preprocessor",
".msg_model",
".pred_item",
".submodels",
"call_info",
"cluster",
@@ -23,6 +24,7 @@ utils::globalVariables(
"exposed",
"func",
"id",
"item",
"iteration",
"lab",
"name",
@@ -32,10 +34,14 @@ utils::globalVariables(
"orig_label",
"original",
"predictor_indicators",
"preds",
"remove_intercept",
"row_id",
"seed",
"setNames",
"sil_width",
"splits",
"truth_value",
"tunable",
"type",
"value",
170 changes: 170 additions & 0 deletions R/augment_itemset_predict.R
@@ -0,0 +1,170 @@
#' Augment Itemset Predictions with Truth Values
#'
#' This function processes the output of a `predict()` call for frequent itemset models
#' and joins it with the corresponding ground truth data. It's designed to prepare
#' the prediction and truth values in a format suitable for calculating evaluation metrics
#' using packages like `yardstick`.
#'
#' @param pred_output A data frame that is the output of `predict()` from a `freq_itemsets` model.
#' It is expected to have a column named `.pred_cluster`, where each cell contains
#' a data frame with prediction details (including `.pred_item`, `.obs_item`, and `item`).
#' @param truth_output A data frame representing the ground truth. It should have a similar
#' structure to the input data used for prediction, where columns represent items
#' and rows represent transactions.
#'
#' @details
#' The function first extracts and combines all individual item prediction data frames
#' nested within the `pred_output`. It then filters for items where a prediction was made
#' (i.e., `!is.na(.pred_item)`) and standardizes item names by removing backticks
#' and any `TRUE`/`FALSE` text.
#' The `truth_output` is pivoted to a long format to match the structure of the predictions.
#' Finally, an inner join is performed to ensure that only predicted items are included in
#' the final result, aligning predictions with their corresponding true values.
#'
#' @return A data frame with the following columns:
#' \itemize{
#' \item `item`: The name of the item.
#' \item `row_id`: An identifier for the transaction (row) from which the prediction came.
#' \item `preds`: The predicted value for the item (either raw probability or binary prediction).
#' \item `truth`: The true value for the item from `truth_output`.
#' }
#' This output is suitable for direct use with `yardstick` metric functions.
#'
#' @examples
#' toy_df <- data.frame(
#' "beer" = c(FALSE, TRUE, TRUE, TRUE, FALSE),
#' "milk" = c(TRUE, FALSE, TRUE, TRUE, TRUE),
#' "bread" = c(TRUE, TRUE, FALSE, TRUE, TRUE),
#' "diapers" = c(TRUE, TRUE, TRUE, TRUE, TRUE),
#' "eggs" = c(FALSE, TRUE, FALSE, FALSE, FALSE)
#' )
#'
#' new_data <- data.frame(
#' "beer" = NA,
#' "milk" = TRUE,
#' "bread" = TRUE,
#' "diapers" = TRUE,
#' "eggs" = FALSE
#' )
#'
#' truth_df <- data.frame(
#' "beer" = FALSE,
#' "milk" = TRUE,
#' "bread" = TRUE,
#' "diapers" = TRUE,
#' "eggs" = FALSE
#' )
#'
#' fi_spec <- freq_itemsets(
#' min_support = 0.05,
#' mining_method = "eclat"
#' ) |>
#' set_engine("arules") |>
#' set_mode("partition")
#'
#' fi_fit <- fi_spec |>
#' fit(~ .,
#' data = toy_df
#' )
#'
#' aug_pred <- fi_fit |>
#' predict(new_data, type = "raw") |>
#' augment_itemset_predict(truth_output = truth_df)
#'
#' aug_pred
#'
#' # Example use of formatted output
#' aug_pred |>
#' yardstick::rmse(truth, preds)
#'
#' @export

augment_itemset_predict <- function(pred_output, truth_output) {
Comment on lines +79 to +81
Member

All exported functions need examples.

I would also like to see the example to help determine the use of it

  # Extract all predictions (bind all .pred_cluster dataframes)
  preds_df <- dplyr::bind_rows(pred_output$.pred_cluster, .id = "row_id") %>%
    dplyr::filter(!is.na(.pred_item)) %>% # Keep only rows with predictions
    dplyr::mutate(
      item = gsub("`|TRUE|FALSE", "", item) # Remove backticks, TRUE, and FALSE from item names
    ) %>%
    dplyr::select(row_id, item, preds = .pred_item) # Standardize column names

  # Pivot truth data to long format (to match predictions)
  truth_long <- truth_output %>%
    tibble::rownames_to_column("row_id") %>%
    tidyr::pivot_longer(
      cols = -row_id,
      names_to = "item",
      values_to = "truth_value"
    ) %>%
    dplyr::mutate(truth_value = as.numeric(truth_value))

  # Join predictions with truth (inner join to keep only predicted items)
  result <- preds_df %>%
    dplyr::inner_join(truth_long, by = c("row_id", "item"))

  # Return simplified output (preds vs truth)
  dplyr::select(result, item, row_id, preds, truth = truth_value)
}

#' Generate Dataframe with Random NAs and Corresponding Truth
#'
#' @description
#' This helper function creates a new data frame by randomly introducing `NA` values
#' into an input data frame. It also returns the original data frame as a "truth"
#' reference, which can be useful for simulating scenarios with missing data
#' for prediction tasks.
#'
#' @param df The input data frame to which `NA` values will be introduced.
#' It is typically a transactional dataset where columns are items and rows are transactions.
#' @param na_prob The probability (between 0 and 1) that any given cell in the
#' input data frame will be replaced with `NA`.
#'
#' @return A list containing two data frames:
#' \itemize{
#' \item `na_data`: The data frame with `NA` values randomly introduced.
#' \item `truth`: The original input data frame, serving as the ground truth.
#' }
#' @examples
#' # Create a sample data frame
#' sample_df <- data.frame(
#' itemA = c(1, 0, 1),
#' itemB = c(0, 1, 1),
#' itemC = c(1, 1, 0)
#' )
#'
#' # Generate NA data and truth with 30% NA probability
#' set.seed(123)
#' na_data_list <- random_na_with_truth(sample_df, na_prob = 0.3)
#'
#' # View the NA data
#' print(na_data_list$na_data)
#'
#' # View the truth data
#' print(na_data_list$truth)
#'
#' @details
#' This function is not exported because it was only used for testing and for
#' the vignette examples; it may be formally introduced in the future.
random_na_with_truth <- function(df, na_prob = 0.3) {
  # Create a copy of the original dataframe to store truth values
  truth_df <- df

  # Create a mask of NAs (TRUE = becomes NA)
  na_mask <- matrix(
    sample(
      c(TRUE, FALSE),
      size = nrow(df) * ncol(df),
      replace = TRUE,
      prob = c(na_prob, 1 - na_prob)
    ),
    nrow = nrow(df)
  )

  # Apply the mask to create NA values
  na_df <- df
  na_df[na_mask] <- NA

  # Return both the NA-filled dataframe and the truth
  list(
    na_data = na_df,
    truth = truth_df
  )
}
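
For readers following the review, the reshape-and-join step that `augment_itemset_predict()` documents in its details section can be sketched in base R without a fitted model. This is only an illustration under stated assumptions, not the PR's implementation: `pred_list`, `truth_df`, and every value in them are made up here to stand in for the nested `.pred_cluster` data frames and the truth data, and base R calls replace the dplyr/tidyr ones used in the diff.

```r
# Hypothetical stand-in for pred_output$.pred_cluster: one data frame per
# transaction (row). All values are invented for illustration.
pred_list <- list(
  data.frame(item = c("`beer`", "milk"), .pred_item = c(0.6, NA)),
  data.frame(item = c("eggs", "bread"), .pred_item = c(0.2, NA))
)

# Stack the per-row data frames and tag each with its position as row_id
# (the role played by dplyr::bind_rows(.id = "row_id") in the diff).
preds_df <- do.call(rbind, lapply(seq_along(pred_list), function(i) {
  cbind(row_id = as.character(i), pred_list[[i]])
}))

# Keep only items that received a prediction, then strip backticks.
preds_df <- preds_df[!is.na(preds_df$.pred_item), ]
preds_df$item <- gsub("`", "", preds_df$item)
names(preds_df)[names(preds_df) == ".pred_item"] <- "preds"

# Hypothetical truth data: columns are items, rows are transactions.
truth_df <- data.frame(
  beer = c(TRUE, FALSE),
  milk = c(TRUE, TRUE),
  bread = c(TRUE, TRUE),
  eggs = c(FALSE, FALSE)
)

# Wide-to-long reshape: one row per (row_id, item) pair.
# unlist() walks the data frame column by column, matching the rep() calls.
truth_long <- data.frame(
  row_id = rep(as.character(seq_len(nrow(truth_df))), times = ncol(truth_df)),
  item = rep(names(truth_df), each = nrow(truth_df)),
  truth = as.numeric(unlist(truth_df, use.names = FALSE))
)

# Inner join (merge's default), so only predicted items survive.
result <- merge(preds_df, truth_long, by = c("row_id", "item"))
result <- result[, c("item", "row_id", "preds", "truth")]
result
```

Only the two items with a non-missing prediction remain after the join, each paired with its numeric truth value, which is the long `preds`/`truth` shape the roxygen block describes as suitable for `yardstick` metrics.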