Title: | Inferring Group Bayesian Networks using Hierarchical Feature Clustering |
---|---|
Description: | Group Bayesian Networks: This package implements the inference of group Bayesian networks based on hierarchical feature clustering, and the adaptive refinement of the grouping with respect to an outcome of interest, as described in Becker et al. (2021) <doi: 10.1371/journal.pcbi.1008735>. |
Authors: | Ann-Kristin Becker [aut, cre], Lars Kaderali [aut, ths] |
Maintainer: | Ann-Kristin Becker <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.0 |
Built: | 2025-02-22 03:40:11 UTC |
Source: | https://github.com/cran/GroupBN |
Calculates the cross entropy (log loss) of a vector of predicted probabilities against a vector of observations, weighted by class proportions.
cross.en(pred, obs, sdpred=NULL, weighted=T)
pred |
a numeric vector, the predicted probabilities of the reference class |
obs |
the vector of observations, a categorical variable with 2-4 levels |
sdpred |
either NULL or a vector containing the standard deviations of every estimate |
weighted |
a boolean; if FALSE, the unweighted log loss is calculated. By default, the weighted cross entropy is calculated. |
If sdpred contains the standard deviations of the estimated probabilities, a lower bound of the log loss is returned.
a numeric value: the cross entropy (log loss), for comparing classifiers. Smaller values are better.
Ann-Kristin Becker
#observations
obs<-as.factor(c("A","A","B"))
#correct prediction
pred1<-c(1,1,0)
#wrong prediction
pred2<-c(0,0,1)
cross.en(pred=pred1, obs=obs) #small
cross.en(pred=pred2, obs=obs) #large
#prediction of only majority class
pred3<-c(1,1,1)
#prediction of only minority class
pred4<-c(0,0,0)
#both equal (as weighted)
cross.en(pred=pred3, obs=obs, weighted=TRUE)
cross.en(pred=pred4, obs=obs, weighted=TRUE)
#unweighted, majority class is favored
cross.en(pred=pred3, obs=obs, weighted=FALSE)
cross.en(pred=pred4, obs=obs, weighted=FALSE)
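To make the class weighting concrete, here is a minimal base R sketch of a class-balanced binary log loss. It is not GroupBN's implementation: the name `weighted_logloss`, the clipping constant `eps`, the restriction to two classes, and the choice to average the per-class mean losses (so each class counts equally) are all assumptions, picked because they reproduce the behaviour shown in the example above.

```r
# Hypothetical helper, not part of GroupBN: binary log loss in which
# each observed class contributes equally (class-balanced weighting).
weighted_logloss <- function(pred, obs, weighted = TRUE, eps = 1e-15) {
  obs <- droplevels(as.factor(obs))
  # pred is the probability of the reference (first) level
  is_ref <- obs == levels(obs)[1]
  # probability the model assigned to the class actually observed
  p_obs <- ifelse(is_ref, pred, 1 - pred)
  loss <- -log(pmax(p_obs, eps))  # clip to avoid -log(0) = Inf
  if (!weighted) return(mean(loss))
  # mean loss within each class, then the mean over classes
  mean(tapply(loss, obs, mean))
}

obs <- factor(c("A", "A", "B"))
l1 <- weighted_logloss(c(1, 1, 0), obs)  # correct prediction
l2 <- weighted_logloss(c(0, 0, 1), obs)  # wrong prediction
w3 <- weighted_logloss(c(1, 1, 1), obs)  # only majority class
w4 <- weighted_logloss(c(0, 0, 0), obs)  # only minority class
u3 <- weighted_logloss(c(1, 1, 1), obs, weighted = FALSE)
u4 <- weighted_logloss(c(0, 0, 0), obs, weighted = FALSE)
```

With balancing, predicting only the majority or only the minority class costs the same; without it, the majority-only prediction scores better because two of the three observations are "A".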
Density-approximative discretization. Significant peaks in the density are determined and used as starting points for k-means-based discretization. If only one peak is present, the quartiles of the distribution are used for binning.
discretize.dens(data, graph=F, title="Density-approxmative Discretization", rename.level=F, return.all=T, cluster=F, seed=NULL)
data |
a vector containing the data to be discretized |
graph |
a boolean value, if TRUE, the density and the determined binning are plotted |
title |
a title for the plot |
rename.level |
a boolean value, if TRUE, factor levels are replaced by integers 1:n |
return.all |
a boolean value, if FALSE, only the discretized data are returned. |
cluster |
a boolean value indicating whether data is a cluster variable, which may already be discrete |
seed |
a random seed number |
discretized |
the discretized data |
levels |
the factor levels |
optima |
the x and y coordinates of the determined peaks |
Ann-Kristin Becker
testdata = c(rnorm(100,-3,1), rnorm(100,3,1))
d<-discretize.dens(testdata, graph=TRUE)
summary(d$discretized)
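The peak-then-k-means idea described above can be sketched in base R as follows. This is an illustration, not the package's code: the name `discretize_by_peaks` and the rule that a peak is "significant" when it reaches at least `min.rel` times the height of the highest peak are assumptions, since the package's significance criterion is not stated here.

```r
# Hypothetical sketch of density-guided discretization (not GroupBN code).
# Assumption: a peak counts as "significant" if it reaches at least
# 'min.rel' times the height of the highest peak.
discretize_by_peaks <- function(x, min.rel = 0.1) {
  d <- density(x)
  y <- d$y
  # indices of local maxima (density rises, then falls)
  peaks <- which(diff(sign(diff(y))) == -2) + 1
  peaks <- peaks[y[peaks] >= min.rel * max(y[peaks])]
  if (length(peaks) < 2) {
    # a single peak: bin by the quartiles of the data
    breaks <- unique(quantile(x, probs = seq(0, 1, 0.25)))
    return(cut(x, breaks = breaks, include.lowest = TRUE))
  }
  # k-means on the data, initialised at the peak locations
  km <- kmeans(x, centers = d$x[peaks])
  # relabel clusters 1..k in increasing order of their centres
  factor(match(km$cluster, order(km$centers[, 1])))
}

set.seed(1)
x <- c(rnorm(100, -3, 1), rnorm(100, 3, 1))
f <- discretize_by_peaks(x)
table(f)
```

Seeding k-means with the density peaks makes the number of bins data-driven and the result deterministic for a given density estimate.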
Creates a groupbn object: determines an initial clustering based on a hierarchy in which the target variable and any 'separated' variables are kept apart from the groups, learns a Bayesian network from the grouped data, and saves the discretization and PCA parameters.
groupbn(hierarchy, k, target, separate=NULL, separate.as.roots=FALSE, X.quanti=NULL, X.quali=NULL, struct.alg="hc", boot=TRUE, discretize=TRUE, arc.thresh=NULL, debug=FALSE, R=100, seed=NULL)
hierarchy |
a cluster object from ClustOfVar. |
k |
a positive integer number, the number of initial clusters. |
target |
a string, the name of the target variable. |
separate |
a vector of strings, names of variables that should be kept separate from the groups, such as age or sex |
separate.as.roots |
a boolean; if TRUE separated variables are used as roots in the network. Can be ignored if separate is empty. |
X.quanti |
a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns). |
X.quali |
a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns). |
struct.alg |
the structure learning algorithm, as named in bnlearn (default "hc", hill climbing) |
arc.thresh |
threshold for bootstrap arcs |
discretize |
a boolean; if TRUE, the network variables are discretized before network learning |
boot |
a boolean; if TRUE, a bootstrap-based network averaging approach is used |
debug |
a boolean, if TRUE, debugging messages are printed |
R |
number of bootstrap replicates for model averaging, default is 100 |
seed |
a random seed number |
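When boot is TRUE, networks learned on R bootstrap resamples are averaged and arcs are filtered with arc.thresh. The base R sketch below shows that general idea on plain arc lists. It is an assumption about the procedure, not GroupBN's code; the helper name `average_arcs` and the toy arc lists are hypothetical, and for real bn objects bnlearn provides `boot.strength()` and `averaged.network()` for this purpose.

```r
# Hypothetical sketch of bootstrap model averaging over arc lists.
# Each element of 'arc.list' is a two-column (from, to) matrix, standing in
# for a network learned on one bootstrap resample.
average_arcs <- function(arc.list, arc.thresh = 0.5) {
  keys <- unlist(lapply(arc.list, function(a)
    unique(paste(a[, 1], a[, 2], sep = " -> "))))
  # fraction of bootstrap networks that contain each arc
  strength <- table(keys) / length(arc.list)
  list(strength = strength,
       arcs = names(strength)[strength >= arc.thresh])
}

nets <- list(rbind(c("cl1", "Soil")),
             rbind(c("cl1", "Soil"), c("cl2", "cl1")),
             rbind(c("cl1", "Soil")))
avg <- average_arcs(nets, arc.thresh = 0.5)
avg$arcs
```

Raising the threshold keeps only arcs that recur in most resamples, trading sensitivity for a sparser, more stable network.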
an object of class groupbn
bn |
a Bayesian Network structure of bn class from bnlearn. |
fit |
a Bayesian Network with fitted parameters of bn.fit class from bnlearn. |
X.quanti |
a data.frame containing only the quantitative variables. |
X.quali |
a data.frame containing only the qualitative variables. |
grouping |
a vector of positive integers, giving the cluster assignment. |
k |
the number of clusters. |
group.data |
a data.frame containing the cluster representatives. |
target |
a string, the name of the target variable. |
separate |
a vector of strings, names of variables that should be separated from the groups. |
pca.param |
the PCAmix parameters used to determine the cluster representatives. |
disc.param |
the cutpoints used to discretize the cluster representatives. |
score |
prediction scores for the target variable, computed using the fitted network. |
Ann-Kristin Becker
Becker A-K, Dörr M, Felix SB, Frost F, Grabe HJ, Lerch MM, et al. (2021) From heterogeneous healthcare data to disease-specific biomarker networks: A hierarchical Bayesian network approach. PLoS Comput Biol 17(2): e1008735. https://doi.org/10.1371/journal.pcbi.1008735
#load example data
data(wine)
wine.test<-wine[wine$Soil%in%c("Reference", "Env1"),1:29]
wine.test$Soil<-factor(wine.test$Soil)
levels(wine.test$Soil)<-c("0", "1")
#cluster data
hierarchy<-hclustvar(X.quanti=wine.test[,3:29], X.quali=wine.test[,1:2])
#Learn group network among 5 clusters with "Soil" as target variable
wine.groupbn<-groupbn(hierarchy, k=5, target="Soil", separate=NULL,
  X.quanti=wine.test[,3:29], X.quali=wine.test[,1:2], seed=321)
#Plot network
plot(wine.groupbn)
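The grouping step of groupbn can be pictured with the base R sketch below: cluster the non-target variables, cut the tree into k groups, and summarise each group by its first principal component. This is a simplified stand-in, not the package's method: GroupBN works with ClustOfVar and PCAmix (which also handle qualitative variables), whereas here `hclust` on 1 - |correlation| and `prcomp` are substituted for quantitative data only, and the helper name `group_representatives` is hypothetical.

```r
# Hypothetical sketch (base R only): cluster the non-target variables,
# cut the tree into k groups and summarise each group by its first
# principal component. hclust/prcomp stand in for ClustOfVar/PCAmix.
group_representatives <- function(X, target, k) {
  feats <- setdiff(colnames(X), target)        # target stays separate
  d <- as.dist(1 - abs(cor(X[, feats])))
  grouping <- cutree(hclust(d), k = k)
  reps <- sapply(seq_len(k), function(g) {
    vars <- names(grouping)[grouping == g]
    prcomp(X[, vars, drop = FALSE], scale. = TRUE)$x[, 1]
  })
  colnames(reps) <- paste0("cl", seq_len(k))
  list(grouping = grouping, group.data = data.frame(reps, X[target]))
}

set.seed(42)
n <- 100
z1 <- rnorm(n); z2 <- rnorm(n)
X <- data.frame(a1 = z1 + rnorm(n, sd = 0.1), a2 = z1 + rnorm(n, sd = 0.1),
                b1 = z2 + rnorm(n, sd = 0.1), b2 = z2 + rnorm(n, sd = 0.1),
                y  = z1 + rnorm(n))
res <- group_representatives(X, target = "y", k = 2)
res$grouping
```

The resulting group.data (one column per group plus the target) is the kind of table on which a network would then be learned.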
Based on a groupbn object, a cluster can be selected manually; it is split and the refined model is learned.
groupbn_refine_manually(res, hierarchy, refine, arc.thresh=NULL, R=100, debug=FALSE, seed=NULL)
res |
an object of class groupbn |
hierarchy |
a cluster object from ClustOfVar |
refine |
the name of the group to be refined, e.g. "cl2" |
arc.thresh |
threshold for bootstrap arcs |
R |
number of bootstrap replicates for model averaging, default is 100 |
debug |
a boolean, if TRUE, debugging messages are printed |
seed |
a random seed number |
returns an object of class groupbn
Ann-Kristin Becker
#load example data
data(wine)
wine.test<-wine[wine$Soil%in%c("Reference", "Env1"),1:29]
wine.test$Soil<-factor(wine.test$Soil)
levels(wine.test$Soil)<-c("0", "1")
#cluster data
hierarchy<-hclustvar(X.quanti=wine.test[,3:29], X.quali=wine.test[,1:2])
#Learn group network among 5 clusters with "Soil" as target variable
wine.groupbn<-groupbn(hierarchy, k=5, target="Soil", separate=NULL,
  X.quanti=wine.test[,3:29], X.quali=wine.test[,1:2], seed=321)
#Refine cluster 2
wine.groupbn.refined<-groupbn_refine_manually(wine.groupbn, hierarchy, refine = "cl2", seed=321)
#Plot refined network
plot(wine.groupbn.refined)
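Splitting a selected group "along the hierarchy" means replacing it by its two child clusters in the dendrogram. The base R sketch below illustrates that on an `hclust` tree. It assumes a plain `hclust` object rather than a ClustOfVar hierarchy, and the helper `split_group` is hypothetical: it raises the number of clusters one at a time until the chosen group is the one that breaks apart, which works because each additional cluster in `cutree` undoes exactly one merge.

```r
# Hypothetical helper: split group 'g' of a k-group cut of an hclust tree
# into its two child clusters in the dendrogram, leaving other groups as is.
split_group <- function(h, k, g) {
  base <- cutree(h, k = k)
  members <- names(base)[base == g]
  if (length(members) < 2) stop("group ", g, " has only one member")
  # increase the number of clusters until group g is the one that splits
  for (kk in (k + 1):length(h$order)) {
    finer <- cutree(h, k = kk)[members]
    if (length(unique(finer)) > 1) break
  }
  refined <- base
  # members in the second child cluster receive the new label k + 1
  refined[members[finer != finer[1]]] <- k + 1
  refined
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 20,
            dimnames = list(NULL, paste0("v", 1:10)))
h <- hclust(dist(t(X)))             # cluster the 10 variables
before <- cutree(h, k = 3)
g <- as.integer(names(which.max(table(before))))  # largest group
after <- split_group(h, k = 3, g = g)
```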
Adaptive refinement of a group Bayesian network using hierarchical clustering
groupbn_refinement(res, hierarchy, refinement.part="mb", restart=0, perturb=1, max.step=10, max.min=Inf, R=100, return.all=FALSE, arc.thresh=NULL, debug=FALSE, seed=NULL)
res |
an object of class groupbn |
hierarchy |
a cluster object from ClustOfVar |
refinement.part |
"mb", "mb2", "arc.confid" or "all"; selects whether the refinement steps are done only within the Markov blanket of the target variable (mb), within the second-order Markov blanket (mb2), in all clusters with an arc confidence to the target > 0 (arc.confid), or within all clusters (all). Default: "mb" |
restart |
a non-negative integer, the number of restarts (default 0) |
perturb |
a positive integer, the number of perturbations (splits) in each restart |
max.step |
a positive integer, the maximal number of refinement steps, default is 10 |
max.min |
a positive number, the maximal run time in minutes; default is unlimited (Inf) |
R |
number of bootstrap replicates for model averaging, default is 100 |
return.all |
a boolean, if TRUE, the output is a whole list of group models, if FALSE, the output is only the best-scoring model. |
arc.thresh |
threshold for bootstrap arcs |
debug |
a boolean, if TRUE, debugging messages are printed |
seed |
a random seed number |
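With refinement.part = "mb", only groups in the Markov blanket of the target are candidates for splitting. The Markov blanket of a node in a DAG consists of its parents, its children, and the other parents of its children. The base R sketch below computes it from a two-column arc matrix; the helper `markov_blanket` and the toy arcs are hypothetical, and for bnlearn networks the function `mb()` provides the same set.

```r
# Hypothetical helper: Markov blanket of 'node' in a DAG given as a
# two-column (from, to) arc matrix: parents, children and the children's
# other parents.
markov_blanket <- function(arcs, node) {
  parents  <- arcs[arcs[, 2] == node, 1]
  children <- arcs[arcs[, 1] == node, 2]
  spouses  <- arcs[arcs[, 2] %in% children, 1]
  setdiff(unique(c(parents, children, spouses)), node)
}

arcs <- rbind(c("cl1", "Soil"), c("Soil", "cl2"),
              c("cl3", "cl2"), c("cl4", "cl3"))
blanket <- markov_blanket(arcs, "Soil")
blanket
```

Here cl3 enters the blanket only as a co-parent of cl2, while cl4 is excluded because it influences Soil solely through cl3.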
Based on a variable grouping, data are aggregated and a Bayesian network is learned. The target variable is kept separate during this procedure, so that the resulting network model can be used for risk prediction and classification. Starting from a coarse group network, groups are iteratively refined into smaller groups. The heuristic refinement proceeds downwards along the dendrogram and stops when it no longer improves the predictive performance of the model. The refinement is implemented as a hill-climbing procedure.
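The greedy loop described above can be sketched generically: at each step every candidate split is tried, the best-scoring one is kept if it improves the score, and the search stops otherwise or after max.step steps. The base R sketch below is not GroupBN's implementation; `refine_hill_climb`, the callback arguments and the toy score are hypothetical, and lower scores are assumed to be better (as for cross entropy).

```r
# Hypothetical sketch of a greedy (hill-climbing) refinement loop.
# 'candidates_fn', 'split_fn' and 'score_fn' are placeholders; in GroupBN
# the score would come from the refitted group network.
refine_hill_climb <- function(state, candidates_fn, split_fn, score_fn,
                              max.step = 10) {
  best <- score_fn(state)
  for (step in seq_len(max.step)) {
    cands <- candidates_fn(state)
    if (length(cands) == 0) break
    trials <- lapply(cands, function(g) split_fn(state, g))
    scores <- vapply(trials, score_fn, numeric(1))
    if (min(scores) >= best) break     # no improvement: stop refining
    state <- trials[[which.min(scores)]]
    best <- min(scores)
  }
  list(state = state, score = best)
}

# Toy problem: the state is a vector of group sizes, splitting halves a
# group, and the made-up score is lowest when there are exactly 4 groups.
halve <- function(s, i) {
  append(s[-i], c(s[i] %/% 2, s[i] - s[i] %/% 2), after = i - 1)
}
res <- refine_hill_climb(c(8, 4),
                         candidates_fn = function(s) which(s >= 2),
                         split_fn = halve,
                         score_fn = function(s) abs(length(s) - 4))
res$state
```

Setting max.step = 1 stops after a single accepted split, mirroring the one-step refinement in the example below.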
returns an object of class groupbn
Ann-Kristin Becker
Becker A-K, Dörr M, Felix SB, Frost F, Grabe HJ, Lerch MM, et al. (2021) From heterogeneous healthcare data to disease-specific biomarker networks: A hierarchical Bayesian network approach. PLoS Comput Biol 17(2): e1008735. https://doi.org/10.1371/journal.pcbi.1008735
#load example data
data(wine)
wine.test<-wine[wine$Soil%in%c("Reference", "Env1"),1:29]
wine.test$Soil<-factor(wine.test$Soil)
levels(wine.test$Soil)<-c("0", "1")
#cluster data
hierarchy<-hclustvar(X.quanti=wine.test[,3:29], X.quali=wine.test[,1:2])
#Learn group network among 5 clusters with "Soil" as target variable
wine.groupbn<-groupbn(hierarchy, k=5, target="Soil", separate=NULL,
  X.quanti=wine.test[,3:29], X.quali=wine.test[,1:2], seed=321)
#Do one refinement step
#Set max.step higher to optimize completely
wine.groupbn.refined<-groupbn_refinement(wine.groupbn, hierarchy, refinement.part="mb", max.step = 1, seed=321)
#Plot refined network
plot(wine.groupbn.refined)
Create an output table with clusters and included variables with similarity scores
groupbn.output.table(res, with.scores=TRUE)
res |
a groupbn object |
with.scores |
if TRUE, similarity scores of every cluster member to the cluster center are added to the table |
a table with one column per group; if with.scores is TRUE, the similarity score of each variable to its cluster center is included
Ann-Kristin Becker
data("wine.groupbn.refined")
df<-groupbn.output.table(wine.groupbn.refined)
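A similarity score between a variable and its cluster center can be illustrated as the squared correlation between the variable and the group's first principal component, which is how ClustOfVar measures this for quantitative variables. The base R sketch below uses that definition as an assumption; GroupBN's exact measure (especially for qualitative variables) may differ, and the helper `member_similarity` is hypothetical.

```r
# Hypothetical sketch: similarity of each variable to its group centre,
# taken as the squared correlation with the group's first principal
# component (assumption; the package's exact measure may differ).
member_similarity <- function(X, grouping) {
  groups <- split(names(grouping), grouping)
  tabs <- Map(function(vars, g) {
    Xg <- scale(X[, vars, drop = FALSE])   # standardise the members
    pc1 <- prcomp(Xg)$x[, 1]               # group centre (first PC)
    data.frame(group = g, variable = vars,
               similarity = as.vector(cor(Xg, pc1)^2))
  }, groups, names(groups))
  do.call(rbind, unname(tabs))
}

set.seed(7)
n <- 100
z <- rnorm(n)
X <- data.frame(a1 = z + rnorm(n, sd = 0.1), a2 = z + rnorm(n, sd = 0.1),
                b1 = rnorm(n))
tab <- member_similarity(X, c(a1 = 1, a2 = 1, b1 = 2))
tab
```

Scores near 1 mean the group representative captures the variable well; a singleton group always scores 1.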
Create an interactive HTML network object with visNetwork (displaying similarity scores and the number of variables in each group)
groupbn.vis.html.plot(res, df=NULL, save.file=TRUE, save.name=NULL, hierarchical=FALSE, nodecolor.all="#E0F3F8", nodecolor.special="cornflowerblue", main=NULL)
res |
a groupbn object |
df |
the output of groupbn.output.table, if already calculated; otherwise the same table is calculated internally |
save.file |
a boolean; if TRUE, an HTML file is produced |
save.name |
a file name for saving the HTML object; the date is additionally included in the file name |
hierarchical |
a boolean; if TRUE, the network is plotted with a hierarchical layout |
nodecolor.all |
a color for "normal" nodes |
nodecolor.special |
a color for the target variable and all separated nodes, if any. |
main |
optionally a title for the plot |
Plots an interactive network using the visNetwork package.
an html widget of class visNetwork
Ann-Kristin Becker
data("wine.groupbn.refined")
groupbn.vis.html.plot(wine.groupbn.refined, hierarchical=TRUE, save.file=FALSE)
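visNetwork draws a network from a node table (with an `id` column and optional `label`, `value`, `title` and `color` columns) and an edge table (with `from` and `to` columns). The base R sketch below builds such tables from an arc list and group sizes, colouring the target and separated nodes differently, in the spirit of nodecolor.all and nodecolor.special. The helper `vis_tables` and its inputs are hypothetical; it does not reproduce the package's internal code.

```r
# Hypothetical sketch: node and edge tables of the shape
# visNetwork::visNetwork(nodes, edges) expects.
vis_tables <- function(arcs, group.size, special,
                       nodecolor.all = "#E0F3F8",
                       nodecolor.special = "cornflowerblue") {
  ids <- names(group.size)
  nodes <- data.frame(
    id = ids, label = ids,
    value = as.numeric(group.size),                          # node size
    title = paste0(ids, ": ", group.size, " variable(s)"),   # hover text
    color = ifelse(ids %in% special, nodecolor.special, nodecolor.all)
  )
  edges <- data.frame(from = arcs[, 1], to = arcs[, 2], arrows = "to")
  list(nodes = nodes, edges = edges)
}

arcs <- rbind(c("cl1", "Soil"), c("cl2", "cl1"))
sizes <- c(cl1 = 5, cl2 = 3, Soil = 1)
tabs <- vis_tables(arcs, sizes, special = "Soil")
# With visNetwork installed, this would render the network:
# visNetwork::visNetwork(tabs$nodes, tabs$edges)
```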
Checks whether an object is of class groupbn
is.groupbn(x)
x |
an object of class groupbn |
A boolean; TRUE if x is of class groupbn, FALSE otherwise.
Ann-Kristin Becker
data("wine.groupbn.refined")
is.groupbn(wine.groupbn.refined)
Generic plot function for class groupbn
## S3 method for class 'groupbn'
plot(x, ...)
x |
an object of class groupbn |
... |
further arguments |
Plots the structure of the group Bayesian network.
No return value, called for plotting
Ann-Kristin Becker
data("wine.groupbn.refined")
plot(wine.groupbn.refined)
Predict the target variable from a group Bayesian network
## S3 method for class 'groupbn'
predict(object, X.quanti, X.quali, rename.level=FALSE, return.data=FALSE, new.fit=FALSE, debug=FALSE, ...)
object |
An object of class groupbn generated by the functions groupbn or groupbn_refinement |
X.quanti |
the quantitative variables of the data to be predicted |
X.quali |
the qualitative variables of the data to be predicted |
rename.level |
a boolean; if TRUE, all levels of categorical variables are renamed by integers. Default is FALSE. |
return.data |
a boolean; if TRUE, a list with predictions and group.data is returned instead of only predictions. Default is FALSE. |
new.fit |
a boolean; if TRUE, the parameters are newly fit using the test data. |
debug |
a boolean, if TRUE, debugging messages are printed |
... |
further arguments |
Returns a data frame with a column of predictions and a column of the target data. If the target is discrete, class probabilities are returned; otherwise, continuous scores are returned. If return.data is TRUE, the transformed group data are returned as well.
Ann-Kristin Becker
#load example data
data(wine)
wine.test<-wine[wine$Soil%in%c("Reference", "Env1"),1:29]
wine.test$Soil<-factor(wine.test$Soil)
levels(wine.test$Soil)<-c("0", "1")
data(wine.groupbn.refined)
predict(wine.groupbn.refined, X.quanti=wine.test[,3:29], X.quali=wine.test[,1:2])
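For a binary target, predicted class probabilities like those returned above can be turned into scores such as precision, recall, F1 and AUC-ROC, which also appear in the printed summary of a groupbn object. The base R sketch below is illustrative only: the helper `binary_scores`, the 0.5 threshold and the positive label "1" are assumptions, and the package may compute its scores differently. AUC-ROC is obtained from the Mann-Whitney rank statistic.

```r
# Hypothetical sketch: threshold-based scores and rank-based AUC-ROC.
# 'prob' = predicted probability of the positive class, 'obs' = labels.
binary_scores <- function(prob, obs, positive = "1", threshold = 0.5) {
  y <- obs == positive
  yhat <- prob >= threshold
  tp <- sum(yhat & y); fp <- sum(yhat & !y); fn <- sum(!yhat & y)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  f1 <- 2 * precision * recall / (precision + recall)
  # AUC-ROC via the Mann-Whitney statistic (ties count one half)
  r <- rank(prob)
  n1 <- sum(y); n0 <- sum(!y)
  auc <- (sum(r[y]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  c(precision = precision, recall = recall, F1 = f1, AUC = auc)
}

prob <- c(0.9, 0.8, 0.3, 0.6, 0.2, 0.1)
obs <- c("1", "1", "1", "0", "0", "0")
s <- binary_scores(prob, obs)
s
```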
This is a method for the function print for objects of the class groupbn.
## S3 method for class 'groupbn'
print(x, ...)
x |
An object of class groupbn generated by the functions groupbn or groupbn_refinement |
... |
further arguments |
No return value, prints a description of the object
Ann-Kristin Becker
data("wine.groupbn.refined")
print(wine.groupbn.refined)
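The groupbn helpers above follow R's S3 conventions: an object is a list tagged with a class attribute, is.groupbn() tests that class, and print() dispatches to a class-specific method. The toy sketch below shows the pattern under a deliberately different, hypothetical class name (`toy_groupbn`) so that it cannot mask the package's own methods; it is not GroupBN's code.

```r
# Hypothetical sketch of the S3 pattern (toy class, not GroupBN's code):
# a list tagged with a class attribute, a predicate and a print method.
new_toy_groupbn <- function(target, grouping) {
  structure(list(target = target, grouping = grouping),
            class = "toy_groupbn")
}
is_toy_groupbn <- function(x) inherits(x, "toy_groupbn")
print.toy_groupbn <- function(x, ...) {
  cat("toy group Bayesian network\n")
  cat("name of target variable:", x$target, "\n")
  cat("number of groups:", length(unique(x$grouping)), "\n")
  invisible(x)  # print methods conventionally return x invisibly
}

obj <- new_toy_groupbn("Soil", c(v1 = 1, v2 = 1, v3 = 2))
obj  # auto-printing dispatches to print.toy_groupbn
```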
A refined group Bayesian network with 8 groups learned from dataset 'wine'.
data("wine.groupbn.refined")
group Bayesian network (class 'groupbn')
name of target variable: Soil
number of groups: 8
achieved scoring: F1: 0.92; Precision: 1; Recall: 0.86; AUC-PR: 1; AUC-ROC: 1; cross-entr.: 1.43; BIC (netw.): -77.21

name | description |
---|---|
$bn | Bayesian network structure |
$fit | fitted Bayesian network (multinomial) |
$arc.confid | arc confidence |
$X.quali | qualitative variables in a data.frame |
$X.quanti | quantitative variables in a data.frame |
$grouping | group memberships |
$k | number of groups of initial grouping |
$group.data | group representatives used for network inference |
$target | name of target variable |
$separate | name of any other separated variables |
$pca.param | pca parameters of each group |
$disc.param | discretization intervals of each group |
$score | cross entropy and additional scoring information |
data(wine.groupbn.refined)