| Title: | A Package for Graphical Modelling in R |
|---|---|
| Description: | The 'gRbase' package provides graphical modelling features used by e.g. the packages 'gRain', 'gRim' and 'gRc'. 'gRbase' implements graph algorithms including (i) maximum cardinality search (for marked and unmarked graphs). (ii) moralization, (iii) triangulation, (iv) creation of junction tree. 'gRbase' facilitates array operations, 'gRbase' implements functions for testing for conditional independence. 'gRbase' illustrates how hierarchical log-linear models may be implemented and describes concept of graphical meta data. The facilities of the package are documented in the book by Højsgaard, Edwards and Lauritzen (2012, <doi:10.1007/978-1-4614-2299-0>) and in the paper by Dethlefsen and Højsgaard, (2005, <doi:10.18637/jss.v014.i17>). Please see 'citation("gRbase")' for citation details. |
| Authors: | Søren Højsgaard [aut, cre] |
| Maintainer: | Søren Højsgaard <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 2.0.3.9001 |
| Built: | 2026-05-14 08:22:41 UTC |
| Source: | https://github.com/hojsgaard/grbase |
Create all possible pairs of two character vectors.
all_pairs(x, y = character(0), sort = FALSE, result = "matrix") names2pairs(x, y = NULL, sort = TRUE, result = "list")all_pairs(x, y = character(0), sort = FALSE, result = "matrix") names2pairs(x, y = NULL, sort = TRUE, result = "list")
x, y
|
Character vectors. |
sort |
Logical. |
result |
A list or a matrix. |
NOTICE: If y is not NULL then x and y must be disjoint (no checks are made); otherwise pairs of identical elements wil also be obtained.
Søren Højsgaard, [email protected]
x <- letters[1:4] y <- letters[5:7] all_pairs(x) all_pairs(x, result="matrix") all_pairs(x, y) all_pairs(x, y, result="matrix")x <- letters[1:4] y <- letters[5:7] all_pairs(x) all_pairs(x, result="matrix") all_pairs(x, y) all_pairs(x, y, result="matrix")
Create all subsets of a vector
all_subsets(x) all_subsets0(x)all_subsets(x) all_subsets0(x)
x |
Vector |
Søren Højsgaard, [email protected]
Array operations; created to facilitate the gRain package in 2007. Now largely replaceable by other (often faster) functions implemented in Rcpp.
tablePerm(tab, perm, resize = TRUE, keep.class = FALSE) tableMult(tab1, tab2) tableDiv(tab1, tab2) tableOp(tab1, tab2, op = "*") tableOp2(tab1, tab2, op = `*`, restore = FALSE) tableOp0(tab1, tab2, op = `*`) tableSlice(tab, margin, level, impose) tableSlicePrim(tab, mar.idx, lev.idx) tableMargin(tab, margin, keep.class = FALSE) tableGetSliceIndex(tab, margin, level, complement = FALSE) tableSetSliceValue(tab, margin, level, complement = FALSE, value = 0)tablePerm(tab, perm, resize = TRUE, keep.class = FALSE) tableMult(tab1, tab2) tableDiv(tab1, tab2) tableOp(tab1, tab2, op = "*") tableOp2(tab1, tab2, op = `*`, restore = FALSE) tableOp0(tab1, tab2, op = `*`) tableSlice(tab, margin, level, impose) tableSlicePrim(tab, mar.idx, lev.idx) tableMargin(tab, margin, keep.class = FALSE) tableGetSliceIndex(tab, margin, level, complement = FALSE) tableSetSliceValue(tab, margin, level, complement = FALSE, value = 0)
tab, tab1, tab2
|
Arrays with named dimnames. |
perm |
A permutation; either indices or names. |
resize |
A flag indicating whether the vector should be resized as well as having its elements reordered (default TRUE). |
keep.class |
Obsolete argument. |
op |
The operation; choices are |
restore |
Not so clear anymore. |
margin |
Index or name of margin. |
level |
Corresponding level of margin. |
impose |
Value to be imposed. |
mar.idx |
Index of margin |
lev.idx |
Index of level |
complement |
Should values be set for the complement? |
value |
Which value should be set |
A multidimensional table of numbers is represented by a multidimensional array, so we can use the terms 'table' and 'array' interchangeably. In this context, 'table' refers specifically to numerical data structured in multiple dimensions, similar to how arrays are used in programming. An alternative representation of a multidimensional table would be as a dataframe.
tableOp0 is brute force implementation based on
dataframes. It is very slow, but useful for error checking.
Addition, subtraction etc. of arrays
a1 %a+% a2 a1 %a-% a2 a1 %a*% a2 a1 %a/% a2 a1 %a/0% a2 tab1 %a_% marg tab1 %a==% tab2 tab1 %a^% extra tab1 %aperm% perm tab1 %aalign% tab2 tab1 %aslice% slice tab1 %aslice*% slice tab1 %amarg% marga1 %a+% a2 a1 %a-% a2 a1 %a*% a2 a1 %a/% a2 a1 %a/0% a2 tab1 %a_% marg tab1 %a==% tab2 tab1 %a^% extra tab1 %aperm% perm tab1 %aalign% tab2 tab1 %aslice% slice tab1 %aslice*% slice tab1 %amarg% marg
tab1, tab2
|
Multidimensional arrays with named dimnames (we call them 'named arrays'). |
marg |
A vector of indices or dimnames or a right hand sided formula giving the desired marginal. |
extra |
List defining the extra dimensions. |
perm |
A vector of indices or dimnames or a right hand sided formula giving the desired permutiation. |
slice |
A list of the form name=value. |
a, a1, a2
|
Arrays (with named dimnames) |
Søren Højsgaard, [email protected]
hec <- HairEyeColor a1 <- tab_marg(hec, c("Hair", "Eye")) a2 <- tab_marg(hec, c("Hair", "Sex")) a3 <- tab_marg(hec, c("Eye", "Sex")) ## Binary operations a1 %a+% a2 a1 %a-% a2 a1 %a*% a2 a1 %a/% a2hec <- HairEyeColor a1 <- tab_marg(hec, c("Hair", "Eye")) a2 <- tab_marg(hec, c("Hair", "Sex")) a3 <- tab_marg(hec, c("Eye", "Sex")) ## Binary operations a1 %a+% a2 a1 %a-% a2 a1 %a*% a2 a1 %a/% a2
Marginalize and condition in a multidimensional array which is assumed to represent a discrete multivariate distribution.
tabDist(tab, marg = NULL, cond = NULL, normalize = TRUE)tabDist(tab, marg = NULL, cond = NULL, normalize = TRUE)
tab |
Multidimensional array with dimnames. |
marg |
A specification of the desired margin; a character vector, a numeric vector or a right hand sided formula. |
cond |
A specification of what is conditioned on. Can take two forms: Form one is a a character vector, a numeric vector or a right hand sided formula. Form two is as a simple slice of the array, which is a list of the form var1=value1, var2=value2 etc. |
normalize |
Should the result be normalized to sum to 1. |
A multidimensional array.
Søren Højsgaard, [email protected]
hec <- HairEyeColor is_named_array( hec ) ## We need dimnames, and names on the dimnames ## Marginalize: tabDist(hec, marg= ~Hair:Eye) tabDist(hec, marg= c("Hair", "Eye")) tabDist(hec, marg= 1:2) tabDist(hec, marg= ~Hair + Eye, normalize=FALSE) ## Condition tabDist(hec, cond= ~Sex + Hair) tabDist(hec, cond= ~Sex:Hair) tabDist(hec, cond= c("Sex", "Hair")) tabDist(hec, cond= c(3,1)) tabDist(hec, cond= list(Hair="Black")) tabDist(hec, cond= list(Hair=1)) ## Not run: ## This will fail tabDist(hec, cond= list(Hair=c("Black", "Brown"))) tabDist(hec, cond= list(Hair=1:2)) ## End(Not run) ## But this will do the trick a <- tab_slice(hec, slice=list(Hair=c("Black", "Brown"))) tabDist(a, cond=~Hair) ## Combined tabDist(hec, marg=~Hair+Eye, cond=~Sex) tabDist(hec, marg=~Hair+Eye, cond="Sex") tabDist(hec, marg=~Hair+Eye, cond=list(Sex="Male")) tabDist(hec, marg=~Hair+Eye, cond=list(Sex="Male"), normalize=FALSE) tabDist(hec, cond=list(Sex="Male")) tabDist(hec, cond=list(Sex="Male"), normalize=FALSE)hec <- HairEyeColor is_named_array( hec ) ## We need dimnames, and names on the dimnames ## Marginalize: tabDist(hec, marg= ~Hair:Eye) tabDist(hec, marg= c("Hair", "Eye")) tabDist(hec, marg= 1:2) tabDist(hec, marg= ~Hair + Eye, normalize=FALSE) ## Condition tabDist(hec, cond= ~Sex + Hair) tabDist(hec, cond= ~Sex:Hair) tabDist(hec, cond= c("Sex", "Hair")) tabDist(hec, cond= c(3,1)) tabDist(hec, cond= list(Hair="Black")) tabDist(hec, cond= list(Hair=1)) ## Not run: ## This will fail tabDist(hec, cond= list(Hair=c("Black", "Brown"))) tabDist(hec, cond= list(Hair=1:2)) ## End(Not run) ## But this will do the trick a <- tab_slice(hec, slice=list(Hair=c("Black", "Brown"))) tabDist(a, cond=~Hair) ## Combined tabDist(hec, marg=~Hair+Eye, cond=~Sex) tabDist(hec, marg=~Hair+Eye, cond="Sex") tabDist(hec, marg=~Hair+Eye, cond=list(Sex="Male")) tabDist(hec, marg=~Hair+Eye, cond=list(Sex="Male"), normalize=FALSE) tabDist(hec, cond=list(Sex="Male")) tabDist(hec, cond=list(Sex="Male"), normalize=FALSE)
Alternative ways of creating tables (arrays)
tab_new(names, levels, values, normalize = "none", smooth = 0) tabNew(names, levels, values, normalize = "none", smooth = 0)tab_new(names, levels, values, normalize = "none", smooth = 0) tabNew(names, levels, values, normalize = "none", smooth = 0)
names |
Names of variables defining table; either a character vector or a right hand sided formula. |
levels |
|
values |
values to go into the array. |
normalize |
Either "none", "first" or "all". Should result be normalized, see 'Details' below. |
smooth |
Should values be smoothed, see 'Details' below. |
A multidimensional table of numbers is represented by a multidimensional array, so we can use the terms 'table' and 'array' interchangeably. In this context, 'table' refers specifically to numerical data structured in multiple dimensions, similar to how arrays are used in programming. An alternative representation of a multidimensional table would be as a dataframe.
If normalize="first" then for each configuration of all
other variables than the first, the probabilities are
normalized to sum to one. Thus f(a, b, c) becomes a
conditional probability table of the form p(a | b, c).
If normalize="all" then the sum over all entries of
f(a,b,c) is one.
If smooth is positive then smooth is added to
values BEFORE normalization takes place.
An array.
Søren Højsgaard, [email protected]
universe <- list(gender=c('male', 'female'), answer=c('yes', 'no'), rain=c('yes', 'no')) t1 <- tab_new(c("gender", "answer"), levels=universe, values=1:4) t1 t2 <- tab_new(~gender:answer, levels=universe, values=1:4) t2 t3 <- tab_new(~gender:answer, c(2, 2), values=1:4) t3universe <- list(gender=c('male', 'female'), answer=c('yes', 'no'), rain=c('yes', 'no')) t1 <- tab_new(c("gender", "answer"), levels=universe, values=1:4) t1 t2 <- tab_new(~gender:answer, levels=universe, values=1:4) t2 t3 <- tab_new(~gender:answer, c(2, 2), values=1:4) t3
Functions for extracting slices of arrays
tab_slice( tab, slice = NULL, margin = names(slice), drop = TRUE, as.array = FALSE ) tab_slice_prim(tab, slice, drop = TRUE) tab_slice_mult(tab, slice, val = 1, comp = 0) tab_slice_to_entries(tab, slice, complement = FALSE) tabSliceMult(tab, slice, val = 1, comp = 0) tabSlice( tab, slice = NULL, margin = names(slice), drop = TRUE, as.array = FALSE )tab_slice( tab, slice = NULL, margin = names(slice), drop = TRUE, as.array = FALSE ) tab_slice_prim(tab, slice, drop = TRUE) tab_slice_mult(tab, slice, val = 1, comp = 0) tab_slice_to_entries(tab, slice, complement = FALSE) tabSliceMult(tab, slice, val = 1, comp = 0) tabSlice( tab, slice = NULL, margin = names(slice), drop = TRUE, as.array = FALSE )
tab |
An array with named dimnames. |
slice |
A list defining the slice. |
margin |
Names of variables in slice. |
drop |
If TRUE then dimensions with only one level will be dropped from the output. |
as.array |
If the resulting array is one-dimensional the result will by default be a vector with no dim attribute unless as.array is TRUE. |
val |
The values that entries in the slice will be multiplied with. |
comp |
The values that entries NOT in the slice will be multiplied with. |
complement |
If TRUE the complement of the entries are returned. |
Søren Højsgaard, [email protected]
x = HairEyeColor s = list(Hair=c("Black", "Brown"), Eye=c("Brown", "Blue")) s1 = tab_slice(x, slice=s); s1 tab_slice_to_entries(x, slice=s) tab_slice_to_entries(x, slice=s, complement=TRUE) ## tab_slice_mult s2 = tab_slice_mult(x, slice=s); s2 sp = list(c(1,2), c(1,2), TRUE) tab_slice_prim(x, slice=sp) tab_slice(x, slice=s)x = HairEyeColor s = list(Hair=c("Black", "Brown"), Eye=c("Brown", "Blue")) s1 = tab_slice(x, slice=s); s1 tab_slice_to_entries(x, slice=s) tab_slice_to_entries(x, slice=s, complement=TRUE) ## tab_slice_mult s2 = tab_slice_mult(x, slice=s); s2 sp = list(c(1,2), c(1,2), TRUE) tab_slice_prim(x, slice=sp) tab_slice(x, slice=s)
Check if object is array (that it is a vector with a dim attribute) and that the object has dimnames and that dimnames are named.
is_named_array(obj) is.named.array(obj) is_named_array_(obj) is_number_vector_(obj) is_dimnames_(obj) dimnames_match(tab1, tab2)is_named_array(obj) is.named.array(obj) is_named_array_(obj) is_number_vector_(obj) is_dimnames_(obj) dimnames_match(tab1, tab2)
obj |
Some R object. |
tab1, tab2
|
Arrays with named dimnames. |
A multidimensional table of numbers is represented by a multidimensional array, so we can use the terms 'table' and 'array' interchangeably. In this context, 'table' refers specifically to numerical data structured in multiple dimensions, similar to how arrays are used in programming. An alternative representation of a multidimensional table would be as a dataframe.
Søren Højsgaard, [email protected]
is_named_array( HairEyeColor ) is_named_array( matrix(1:4, nrow=2) ) is_named_array_( HairEyeColor ) is_named_array_( matrix(1:4, nrow=2) ) is_number_vector_(1:4) is_number_vector_(list(1:4)) tab1 = tab_new(c("a", "b"), levels=c(2, 3)) tab2 = tab_new(c("c", "a"), levels=c(2, 2)) tab1 tab2 ## dimension a has levels tab1,tab2 in both tab1 and tab2. # Hence we have a match. dimnames_match(tab1, tab2) tab1 = tab_new(c("a", "b"), levels=c(2, 3)) tab2 = tab_new(c("c", "a"), levels=c(2, 3)) tab1 tab2 ## dimension a has levels (a1,a2) in tab1 and levels (a1,a2,a3) in tab2. # Hence we do not have a match. dimnames_match(tab1, tab2) tab2 = tab_new(c("c", "a"), levels=list(c=c("c1", "c2"), a=c("a2", "a1"))) tab2 ## dimension a has levels (a1,a2) in tab1 and levels (a2,a1) in tab2. # Hence we do not have a match. dimnames_match(tab1, tab2)is_named_array( HairEyeColor ) is_named_array( matrix(1:4, nrow=2) ) is_named_array_( HairEyeColor ) is_named_array_( matrix(1:4, nrow=2) ) is_number_vector_(1:4) is_number_vector_(list(1:4)) tab1 = tab_new(c("a", "b"), levels=c(2, 3)) tab2 = tab_new(c("c", "a"), levels=c(2, 2)) tab1 tab2 ## dimension a has levels tab1,tab2 in both tab1 and tab2. # Hence we have a match. dimnames_match(tab1, tab2) tab1 = tab_new(c("a", "b"), levels=c(2, 3)) tab2 = tab_new(c("c", "a"), levels=c(2, 3)) tab1 tab2 ## dimension a has levels (a1,a2) in tab1 and levels (a1,a2,a3) in tab2. # Hence we do not have a match. dimnames_match(tab1, tab2) tab2 = tab_new(c("c", "a"), levels=list(c=c("c1", "c2"), a=c("a2", "a1"))) tab2 ## dimension a has levels (a1,a2) in tab1 and levels (a2,a1) in tab2. # Hence we do not have a match. dimnames_match(tab1, tab2)
Low level table cell operations.
cell2entry(cell, dim) entry2cell(entry, dim) next_cell(cell, dim) next_cell2(cell, dim) next_cell_slice(cell, dim, slice_marg) slice2entry(slice_cell, slice_marg, dim) cell2entry_perm(cell, dim, perm) perm_cell_entries(perm, dim) fact_grid(dim, slice_cell = NULL, slice_marg = NULL)cell2entry(cell, dim) entry2cell(entry, dim) next_cell(cell, dim) next_cell2(cell, dim) next_cell_slice(cell, dim, slice_marg) slice2entry(slice_cell, slice_marg, dim) cell2entry_perm(cell, dim, perm) perm_cell_entries(perm, dim) fact_grid(dim, slice_cell = NULL, slice_marg = NULL)
cell |
Vector giving the cell, e.g. c(1, 1, 2) in 3-way table. |
dim |
Vector giving array dimension, eg c(2, 2, 2). |
entry |
An entry in an array (a number indexing a vector). |
slice_marg |
Vector giving the margin of a table, eg. c(2, 3) |
slice_cell |
Vector giving the corresponding cell of marginal table, e.g. c(1, 2) |
perm |
Vector giving permutaion of array, eg. c(1, 3, 2). |
A multidimensional table of numbers is represented by a multidimensional array, so we can use the terms 'table' and 'array' interchangeably. In this context, 'table' refers specifically to numerical data structured in multiple dimensions, similar to how arrays are used in programming. An alternative representation of a multidimensional table would be as a dataframe.
di <- c(2, 2, 3) cell2entry(c(1, 1, 1), dim=di) cell2entry(c(2, 2, 3), dim=di) entry2cell(1, dim=di) entry2cell(12, dim=di) next_cell(c(1, 1, 1), dim=di) next_cell(c(2, 1, 1), dim=di) ## The first two entries are kept fixed next_cell_slice(c(2, 1, 1), dim=di, slice_marg=c(1, 2)) next_cell_slice(c(2, 1, 2), dim=di, slice_marg=c(1, 2)) ## Cell (2, 2, 1) corresponds to entry 4 cell2entry(c(2, 2, 1), dim=di) ## Same as cell2entry_perm(c(2, 2, 1), dim=di, perm=c(1, 2, 3)) ## If the table dimensions are permuted as (3, 1, 2) ## the entry becomes cell2entry_perm(c(2, 2, 1), dim=di, perm=c(3, 1, 2))di <- c(2, 2, 3) cell2entry(c(1, 1, 1), dim=di) cell2entry(c(2, 2, 3), dim=di) entry2cell(1, dim=di) entry2cell(12, dim=di) next_cell(c(1, 1, 1), dim=di) next_cell(c(2, 1, 1), dim=di) ## The first two entries are kept fixed next_cell_slice(c(2, 1, 1), dim=di, slice_marg=c(1, 2)) next_cell_slice(c(2, 1, 2), dim=di, slice_marg=c(1, 2)) ## Cell (2, 2, 1) corresponds to entry 4 cell2entry(c(2, 2, 1), dim=di) ## Same as cell2entry_perm(c(2, 2, 1), dim=di, perm=c(1, 2, 3)) ## If the table dimensions are permuted as (3, 1, 2) ## the entry becomes cell2entry_perm(c(2, 2, 1), dim=di, perm=c(3, 1, 2))
Corresponding R functions without the trailing underscore exist.
cell2entry_(cell, dim) make_plevels_(dim) entry2cell_(entry, dim) next_cell_(cell, dim) next_cell2_(cell, dim) next_cell_slice_(cell, dim, slice_marg) slice2entry_(slice_cell, slice_marg, dim) cell2entry_perm_(cell, dim, perm) perm_cell_entries_(perm, dim)cell2entry_(cell, dim) make_plevels_(dim) entry2cell_(entry, dim) next_cell_(cell, dim) next_cell2_(cell, dim) next_cell_slice_(cell, dim, slice_marg) slice2entry_(slice_cell, slice_marg, dim) cell2entry_perm_(cell, dim, perm) perm_cell_entries_(perm, dim)
cell |
Vector giving the cell, e.g. c(1, 1, 2) in 3-way table. |
dim |
Vector giving array dimension, eg c(2, 2, 2). |
entry |
An entry in an array (a number indexing a vector). |
slice_marg |
Vector giving the margin of a table, eg. c(2, 3) |
slice_cell |
Vector giving the corresponding cell of marginal table, e.g. c(1, 2) |
perm |
Vector giving permutaion of array, eg. c(1, 3, 2). |
Interface functions and minor extensions to cpp functions.
tab_add(tab1, tab2) tab_align(tab1, tab2) tab_div(tab1, tab2) tab_div0(tab1, tab2) tab_op(tab1, tab2, op = "*") tab_equal(tab1, tab2, eps = 1e-12) tabMult(tab1, tab2) tab_mult(tab1, tab2) tab_subt(tab1, tab2) tab_list_mult(lst) tab_list_add(lst) tab_expand(tab, aux, type = 0L) tab_perm(tab, perm) tab_marg(tab, marg = NULL) tabMarg(tab, marg = NULL) tab_sum(tab, ...) tab_prod(tab, ...) tab_normalize(tab, type = "none") tabDiv(tab1, tab2) tabDiv0(tab1, tab2) tabPerm(tab, perm) tabProd(tab, ...) tabNormalize(tab, type = "none") tabListMult(lst)tab_add(tab1, tab2) tab_align(tab1, tab2) tab_div(tab1, tab2) tab_div0(tab1, tab2) tab_op(tab1, tab2, op = "*") tab_equal(tab1, tab2, eps = 1e-12) tabMult(tab1, tab2) tab_mult(tab1, tab2) tab_subt(tab1, tab2) tab_list_mult(lst) tab_list_add(lst) tab_expand(tab, aux, type = 0L) tab_perm(tab, perm) tab_marg(tab, marg = NULL) tabMarg(tab, marg = NULL) tab_sum(tab, ...) tab_prod(tab, ...) tab_normalize(tab, type = "none") tabDiv(tab1, tab2) tabDiv0(tab1, tab2) tabPerm(tab, perm) tabProd(tab, ...) tabNormalize(tab, type = "none") tabListMult(lst)
op |
The algebraic operation to be carried out. |
eps |
Criterion for checking equality of two arrays. |
lst |
List of arrays. |
tab, tab1, tab2, ...
|
Arrays with named dimnames (we call them 'named arrays'). |
aux |
Either a list with names and dimnames or a named array from which such a list can be extracted. |
type |
If 0 then entries are duplicated. If 3 then averages are computed. If 2 then 0 slices are inserted. |
perm, marg
|
A vector of indices or dimnames or a right hand sided formula giving the desired permutation/margin. |
Table operations implemented in c++. Corresponding R functions without the trailing underscore exist.
tab_perm_(tab, perm) tab_expand_(tab, aux, type = 0L) tab_align_(tab1, tab2) tab_marg_(tab, marg) tab_op_(tab1, tab2, op = "*") tab_add_(tab1, tab2) tab_subt_(tab1, tab2) tab_mult_(tab1, tab2) tab_div_(tab1, tab2) tab_div0_(tab1, tab2) tab_equal_(tab1, tab2, eps = 1e-12) tab_list_mult_(lst) tab_list_add_(lst)tab_perm_(tab, perm) tab_expand_(tab, aux, type = 0L) tab_align_(tab1, tab2) tab_marg_(tab, marg) tab_op_(tab1, tab2, op = "*") tab_add_(tab1, tab2) tab_subt_(tab1, tab2) tab_mult_(tab1, tab2) tab_div_(tab1, tab2) tab_div0_(tab1, tab2) tab_equal_(tab1, tab2, eps = 1e-12) tab_list_mult_(lst) tab_list_add_(lst)
tab, tab1, tab2
|
Tables (arrays) |
perm, marg
|
A vector of indices or dimnames or a right hand sided formula giving the desired permutation/margin. |
aux |
Either a list with names and dimnames or a named array from which such a list can be extracted. |
type |
If 0 then entries are duplicated. If 3 then averages are computed. If 2 then 0 slices are inserted. |
op |
The operation to be carried out; "+", "-", "*", "/". |
eps |
Criterion for checking equality of two arrays. |
lst |
List of arrays. |
Simulate data (slice of) an array: Simulate n observations from the array x conditional on the variables in margin (a vector of indices) takes values given by margin.value
## S3 method for class 'table' simulate(object, nsim = 1, seed = NULL, margin, value.margin, ...) ## S3 method for class 'xtabs' simulate(object, nsim = 1, seed = NULL, margin, value.margin, ...) ## S3 method for class 'array' simulate(object, nsim = 1, seed = NULL, margin, value.margin, ...) simulateArray(x, nsim = 1, margin, value.margin, seed = NULL)## S3 method for class 'table' simulate(object, nsim = 1, seed = NULL, margin, value.margin, ...) ## S3 method for class 'xtabs' simulate(object, nsim = 1, seed = NULL, margin, value.margin, ...) ## S3 method for class 'array' simulate(object, nsim = 1, seed = NULL, margin, value.margin, ...) simulateArray(x, nsim = 1, margin, value.margin, seed = NULL)
nsim |
Number of cases to simulate. |
seed |
Seed to be used for random number generation. |
margin, value.margin
|
Specification of slice of array to simulate from. |
... |
Additional arguments, currently not used. |
x, object
|
An array. |
A multidimensional table of numbers is represented by a multidimensional array, so we can use the terms 'table' and 'array' interchangeably. In this context, 'table' refers specifically to numerical data structured in multiple dimensions, similar to how arrays are used in programming. An alternative representation of a multidimensional table would be as a dataframe.
A matrix.
The current implementation is fragile in the sense that it is
not checked that the input argument x is an array.
Søren Højsgaard, [email protected]
## 2x2 array x <- tab_new(c("a", "b"), levels=c(2, 2), values=1:4) ## Simulate from entire array s <- simulate_array(x, 1000) xtabs(~., as.data.frame(s)) ## Simulate from slice defined by that dimension 1 is fixed at level 2 s <-simulate_array(x, 1000, margin=1, value.margin=2) xtabs(~., as.data.frame(s)) ## 2 x 2 x 2 array x <- tab_new(c("a", "b", "c"), levels=c(2, 2, 2), values=1:8) ## Simulate from entire array s <-simulate_array(x, 36000) xtabs(~., as.data.frame(s)) ## Simulate from slice defined by that dimension 3 is fixed at level 1 s <-simulate_array(x, 10000, 3, 1) xtabs(~., as.data.frame(s))## 2x2 array x <- tab_new(c("a", "b"), levels=c(2, 2), values=1:4) ## Simulate from entire array s <- simulate_array(x, 1000) xtabs(~., as.data.frame(s)) ## Simulate from slice defined by that dimension 1 is fixed at level 2 s <-simulate_array(x, 1000, margin=1, value.margin=2) xtabs(~., as.data.frame(s)) ## 2 x 2 x 2 array x <- tab_new(c("a", "b", "c"), levels=c(2, 2, 2), values=1:8) ## Simulate from entire array s <-simulate_array(x, 36000) xtabs(~., as.data.frame(s)) ## Simulate from slice defined by that dimension 3 is fixed at level 1 s <-simulate_array(x, 10000, 3, 1) xtabs(~., as.data.frame(s))
compareModels is a generic functions which
invoke particular methods which depend on the class of the
first argument
compareModels(object, object2, ...)compareModels(object, object2, ...)
object, object2
|
Model objects |
... |
Additional arguments |
The value returned depends on the class of the first argument.
Søren Højsgaard, [email protected]
cov2pcor calculates the partial correlation
matrix from an (empirical) covariance matrix while
conc2pcor calculates the partial correlation matrix from
a concentration matrix (inverse covariance matrix).
cov2pcor(V) conc2pcor(K)cov2pcor(V) conc2pcor(K)
V |
Covariance matrix |
K |
Concentration matrix |
A matrix with the same dimension as V.
Søren Højsgaard, [email protected]
data(math) S <- cov.wt(math)$cov cov2pcor(S)data(math) S <- cov.wt(math)$cov cov2pcor(S)
A cross classified table with observational data from a Danish heart clinic. The response variable is CAD (coronary artery disease, some times called heart attack).
data(cad1)data(cad1)
A data frame with 236 observations on the following 14 variables.
SexSex; a factor with levels Female Male
AngPecAngina pectoris (chest pain attacks); a
factor with levels Atypical None Typical
AMIAcute myocardic infarct; a factor with
levels Definite NotCertain
QWaveA reading from an electrocardiogram; a
factor with levels No Yes; Yes means pathological and is a sign of previous myocardial infarction.
QWavecodea factor with levels Nonusable
Usable. An assesment of whether QWave is reliable.
STcodea factor with levels
Nonusable Usable. An assesment of whether STchange is reliable.
STchangeA reading from an electrocardiogram; a factor
with levels No Yes. An STchange indicates a blockage of the coronary artery.
SuffHeartFSufficient heart frequency; a factor with levels No, Yes
Hypertrophia factor with levels No, Yes. Hypertrophy refers to an
increased size of the heart muscle due to exercise.
Hyperchola factor with levels No Yes. Hypercholesterolemia, also called high cholesterol,
is the presence of high levels of cholesterol in the blood.
SmokerIs the patient a smoker; a factor with levels No, Yes.
InheritHereditary predispositions for CAD; a factor with levels No, Yes.
HeartfailPrevious heart failures; a factor with levels No Yes
CADCoronary Artery Disease; a factor with levels
No Yes
. CAD refers to a reduction of blood flow to the heart muscle (commonly known as a heart attack). The diagnosis made from biopsies.
Notice that data are collected at a heart clinic, so data do not represent the population, but are conditional on patients having ended up at the clinic.
cad1: Complete dataset, 236 cases.
cad2: Incomplete dataset, 67 cases. Information on (some of) the variables 'Hyperchol', 'Smoker' and 'Inherit' is missing.
Hansen, J. F. (1980). The clinical diagnoisis of ichaeme heart disease du to coronary artery disease. Danish Medical Bulletin
Højsgaard, Søren and Thiesson, Bo (1995). BIFROST - Block recursive models Induced From Relevant knowledge, Observations and Statistical Techniques. Computational Statistics and Data Analysis, vol. 19, p. 155-175
data(cad1) ## maybe str(cad1) ; plot(cad1) ...data(cad1) ## maybe str(cad1) ; plot(cad1) ...
Utilities for data handling
valueLabels(x) ## Default S3 method: valueLabels(x) varNames(x) ## Default S3 method: varNames(x) nLevels(x) ## Default S3 method: nLevels(x)valueLabels(x) ## Default S3 method: valueLabels(x) varNames(x) ## Default S3 method: varNames(x) nLevels(x) ## Default S3 method: nLevels(x)
x |
Data, typically a dataframe. |
This dataset comes from a study of symptoms of crown dieback, cankers and symptoms caused by other pathogens and pests in ash trees (Fraxinus excelsior). In all 454 trees were observed in two plots. There are 8 categorical variables, 6 of which are binary and two are trichotomous with values representing increasing severity of symptoms, and one continuous variable, tree diameter at breast height (DBH).
data(ashtrees)data(ashtrees)
A data frame with 454 observations on the following 9 variables.
plota factor with levels 2 6
diebacka factor with levels 0 1 2
dead50a factor with levels 0 0.5 1
bushya factor with levels 0 1
cankera factor with levels BRNCH MAIN NONE
wilta factor with levels 0 1
rosesa factor with levels 0 1
discoloura factor with levels 0 1
dbha numeric vector
Skovgaard JP, Thomsen IM, Skovgaard IM and Martinussen T (2009). Associations among symptoms of dieback in even-aged stands of ash (Fraxinus excelsior L.). Forest Pathology.
data(ashtrees) head(ashtrees)data(ashtrees) head(ashtrees)
Estimates of the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men.
data(BodyFat) data(BodyFat)data(BodyFat) data(BodyFat)
A data frame with 252 observations on the following 15 variables.
DensityDensity determined from underwater weighing, a numeric vector
BodyFatPercent body fat from Siri's (1956) equation, a numeric vector
Agein years, a numeric vector
Weightin lbs, a numeric vector
Heightin inches, a numeric vector
Neckcircumference in cm, a numeric vector
Chestcircumference in cm, a numeric vector
Abdomencircumference in cm, a numeric vector
Hipcircumference in cm, a numeric vector
Thighcircumference in cm, a numeric vector
Kneecircumference in cm, a numeric vector
Anklecircumference in cm, a numeric vector
Bicepscircumference in cm, a numeric vector
Forearmcircumference in cm, a numeric vector
Wristcircumference in cm, a numeric vector
For more information see https://lib.stat.cmu.edu/datasets/bodyfat
Bailey, Covert (1994). Smart Exercise: Burning Fat, Getting Fit, Houghton-Mifflin Co., Boston, pp. 179-186.
Behnke, A.R. and Wilmore, J.H. (1974). Evaluation and Regulation of Body Build and Composition, Prentice-Hall, Englewood Cliffs, N.J.
Siri, W.E. (1956), "Gross composition of the body", in Advances in Biological and Medical Physics, vol. IV, edited by J.H. Lawrence and C.A. Tobias, Academic Press, Inc., New York.
Katch, Frank and McArdle, William (1977). Nutrition, Weight Control, and Exercise, Houghton Mifflin Co., Boston.
Wilmore, Jack (1976). Athletic Training and Physical Fitness: Physiological Principles of the Conditioning Process, Allyn and Bacon, Inc., Boston.
data(BodyFat) head(BodyFat)data(BodyFat) head(BodyFat)
Perturbations of the p53 pathway are associated with more aggressive and therapeutically refractory tumours. We preprocessed the data using Robust Multichip Analysis (RMA). Dataset has been truncated to the 1000 most informative genes (as selected by Wilcoxon test statistics) to simplify computation. The genes have been standardised to have zero mean and unit variance (i.e. z-scored).
data(breastcancer)data(breastcancer)
A data frame with 250 observations on 1001 variables. The
first 1000 columns are numerical variables; the last column
(named code) is a factor with levels case and
control.
The factor code defines whether there was a mutation in the p53
sequence (code=case) or not (code=control).
Dr. Chris Holmes, c.holmes at stats dot. ox . ac .uk
Miller et al (2005, PubMed ID:16141321)
data(breastcancer) ## maybe str(breastcancer) ; plot(breastcancer) ...data(breastcancer) ## maybe str(breastcancer) ; plot(breastcancer) ...
Measurement of lean meat percentage of 344 pig carcasses together with auxillary information collected at three Danish slaughter houses
data(carcass)data(carcass)
carcassall: A data frame with 344 observations on the following 17 variables.
weightWeight of carcass
lengthcLength of carcass from back toe to head (when the carcass hangs in the back legs)
lengthfLength of carcass from back toe to front leg (that is, to the shoulder)
lengthpLength of carcass from back toe to the pelvic bone
Fat02, Fat03, Fat11, Fat12, Fat13, Fat14, Fat16Thickness of fat layer at different locations on the back of the carcass (FatXX refers to thickness at (or rather next to) rib no. XX. Notice that 02 is closest to the head
Meat11, Meat12, Meat13Thickness of meat layer at different locations on the back of the carcass, see description above
LeanMeatLean meat percentage determined by dissection
slhouseSlaughter house; a factor with levels a b c
sexSex of the pig; a factor with a b
c. Notice that it is no an error to have three levels; the
third level refers to castrates
carcass: Contains only the variables Fat11, Fat12, Fat13, Meat11, Meat12, Meat13, LeanMeat
Busk, H., Olsen, E. V., Brøndum, J. (1999) Determination of lean meat in pig carcasses with the Autofom classification system, Meat Science, 52, 307-314
data(carcass) head(carcass)data(carcass) head(carcass)
Simulated data from the Chest Clinic example (also known as the Asia example) from Lauritzen and Spiegelhalter, 1988 (see reference below).
data(chestSim500)data(chestSim500)
A data frame with 500 observations on the following 8 variables.
asiaRecent visit to Asia?; a factor with levels yes no
tubHas tuberculosis?; a factor with levels yes no
smokeIs a smoker?; a factor with levels yes no
lungHas lung cancer?; a factor with levels yes no
broncHas bronchitis?; a factor with levels yes no
eitherEither lung cancer or tuberculosis?; a factor with levels yes no
xrayPositive x-ray? a factor with levels yes no
dyspDyspnoea (shortness of breath)?; a factor with levels yes no
Notice that the chest clinic example is a contrieved example; it does not originate from an empirical study.
Lauritzen and Spiegelhalter (1988) Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems (with Discussion). J. Roy. Stat. Soc. 50, p. 157-224.
data(chestSim500) ## maybe str(chestSim500) ; plot(chestSim500) ...data(chestSim500) ## maybe str(chestSim500) ; plot(chestSim500) ...
The dietox data frame has 861 rows and 7 columns.
data(dietox)data(dietox)
This data frame contains the following columns: Weight, Feed, Time, Pig, Evit, Cu, Litter.
Lauridsen, C., Højsgaard, S., Sørensen, M.T. C. (1999) Influence of Dietary Rapeseed Oli, Vitamin E, and Copper on Performance and Antioxidant and Oxidative Status of Pigs. J. Anim. Sci.77:906-916
data(dietox)data(dietox)
A contingency table relating surgical operation, centre and severity of gastric dumping, a syndrome associated with gastric surgery.
data(dumping)data(dumping)
A 3x4x4 table of counts cross-classified by Symptom (none/slight/moderate), Operation (Vd/Va/Vh/Gr) and Centre (1:4).
Gastric dumping syndrome is a condition where ingested foods bypass the stomach too rapidly and enter the small intestine largely undigested. It is an undesirable side-effect of gastric surgery. The table summarizes the results of a study comparing four different surgical operations on patients with duodenal ulcer, carried out in four centres, as described in Grizzle et al (1969). The four operations were: vagotomy and drainage, vagotomy and antrectomy (removal of 25\ (removal of 50\ 75\
Grizzle JE, Starmer CF, Koch GG (1969) Analysis of categorical data by linear models. Biometrics 25(3):489-504.
data(dumping) plot(dumping)data(dumping) plot(dumping)
In a study of lizard behaviour, characteristics of 409 lizards were recorded, namely species (S), perch diameter (D) and perch height (H). Perch means preferred place to settle down (a branch on a tree). The focus of interest is in how the propensities of the lizards to choose perch height and diameter are related, and whether and how these depend on species.
data(lizard)data(lizard)
A 3–dimensional array with factors diam: "<=4" ">4" height: ">4.75" "<=4.75" species: "anoli" "dist"
Schoener TW (1968) The anolis lizards of bimini: Resource partitioning in a complex fauna. Ecology 49:704-726
data(lizard) # Datasets lizardRAW and lizardDF are generated with the following code #lizardAGG <- as.data.frame(lizard) #f <- lizardAGG$Freq #idx <- unlist(mapply(function(i, n) rep(i, n), 1:8, f)) #set.seed(0805) #idx <- sample(idx) #lizardRAW <- as.data.frame(lizardAGG[idx, 1:3]) #rownames(lizardRAW) <- 1:NROW(lizardRAW)data(lizard) # Datasets lizardRAW and lizardDF are generated with the following code #lizardAGG <- as.data.frame(lizard) #f <- lizardAGG$Freq #idx <- unlist(mapply(function(i, n) rep(i, n), 1:8, f)) #set.seed(0805) #idx <- sample(idx) #lizardRAW <- as.data.frame(lizardAGG[idx, 1:3]) #rownames(lizardRAW) <- 1:NROW(lizardRAW)
The mathmark data frame has 88 rows and 5 columns.
data(mathmark)data(mathmark)
This data frame contains the following columns: mechanics, vectors, algebra, analysis, statistics.
Søren Højsgaard, [email protected]
David Edwards, An Introduction to Graphical Modelling, Second Edition, Springer Verlag, 2000
data(mathmark)data(mathmark)
The data stem from a cross between two isolates of the barley powdery mildew fungus. For each offspring 6 binary characteristics, each corresponding to a single locus, were recorded. The object of the analysis is to determine the order of the loci along the chromosome.
data(mildew)data(mildew)
A 6 dimensional array where each variable has levels "1" and "2".
The variables are: la10, locc, mp58, c365, p53a and a365.
Christiansen, S.K., Giese, H (1991) Genetic analysis of obligate barley powdery mildew fungus based on RFLP and virulence loci. Theor. Appl. Genet. 79:705-712
data(mildew) ## maybe str(mildew) ; plot(mildew) ...data(mildew) ## maybe str(mildew) ; plot(mildew) ...
Data from an experiment on composition of sow milk. Milk composition is measured on four occasions during lactation on a number of sows. The treatments are different types of fat added to the sows feed.
data(milkcomp)data(milkcomp)
A data frame with 214 observations on the following 7 variables.
sowa numeric vector
lactimea numeric vector
treata factor with levels a b c d e f g
fata numeric vector
proteina numeric vector
dm(dry matter) a numeric vector
lactosea numeric vector
a is the control, i.e. no fat has been added.
fat + protein + lactose almost add up to dm (dry
matter)
Charlotte Lauridsen and Viggo Danielsen (2004): Lactational dietary fat levels and sources influence milk composition and performance of sows and their progeny Livestock Production Science 91 (2004) 95-105
data(milkcomp) ## maybe str(milk) ; plot(milk) ...data(milkcomp) ## maybe str(milk) ; plot(milk) ...
The data come from a study of the effects of five dietary regimens with different fatty acid compositions on liver lipids and hepatic gene expression in 40 mice.
data(Nutrimouse)data(Nutrimouse)
A data frame with 40 observations on 143 variables of which two are factors and 141 are numeric.
genotypea factor with levels wt
ppar
dieta factor with levels coc
fish lin ref sun
The data come from a study of the effects of five dietary regimens with different fatty acid compositions on liver lipids and hepatic gene expression in wild-type and PPAR-alpha-deficient mice (Martin et al., 2007).
There were 5 replicates per genotype and diet combination.
There are two design variables: (i) genotype, a factor with two levels: wild-type (wt) and PPAR-alpha-deficient (ppar), and (ii) diet, a factor with five levels. The oils used for experimental diet preparation were: corn and colza oils (50/50) for a reference diet (ref); hydrogenated coconut oil for a saturated fatty acid diet (coc); sunflower oil for an Omega6 fatty acid-rich diet (sun); linseed oil for an Omega3-rich diet (lin); and corn/colza/enriched (43/43/14) fish oils (fish).
There are 141 response variables: (i) the log-expression levels of 120 genes measured in liver cells, and (ii) the concentrations (in percentages) of 21 hepatic fatty acids measured by gas chromatography.
The data were provided by Pascal Martin from the Toxicology and Pharmacology Laboratory, National Institute for Agronomic Research, France.
Martin, P. G. P., Guillou, H., Lasserre, F., D'jean, S., Lan, A., Pascussi, J.-M., San Cristobal, M., Legrand, P., Besse, P. and Pineau, T. (2007). Novel aspects of PPARa-mediated regulation of lipid and xenobiotic metabolism revealed through a multrigenomic study. Hepatology 54, 767-777.
data(Nutrimouse)data(Nutrimouse)
The peronality dataframe has 240 rows and 32 columns
data(personality)data(personality)
This dataframe has recordings on the following 32 variables: distant, talkatv, carelss, hardwrk, anxious, agreebl, tense, kind, opposng, relaxed, disorgn, outgoin, approvn, shy, discipl, harsh, persevr, friendl, worryin, respnsi, contrar, sociabl, lazy, coopera, quiet, organiz, criticl, lax, laidbck, withdrw, givinup, easygon
Søren Højsgaard, [email protected]
Origin unclear
data(personality) str(personality)data(personality) str(personality)
An artificial dataset. 24 rats (12 female, 12 male) have been randomized to use one of three drugs (products for loosing weight). The weightloss for each rat is noted after one and two weeks.
data(rats)data(rats)
A dataframe with 4 variables. Sex: "M" (male), "F" (female). Drug: "D1", "D2", "D3" (three types). W1 weightloss, week one. W2 weightloss, week 2.
Morrison, D.F. (1976). Multivariate Statistical Methods. McGraw-Hill, USA.
Edwards, D. (1995). Introduction to Graphical Modelling, Springer-Verlag. New York.
Data collected at the beginning of a 15 year follow-up study of probable risk factors for coronary thrombosis. Data are from all men employed in a car factory.
data(reinis)data(reinis)
A table with 6 discrete variables. A: smoking, B: strenous mental work, D: strenuous physical work, E: systolic blood pressure, F: ratio of lipoproteins, G: Family anamnesis of coronary heart disease.
Edwards and Havranek (1985): A fast procedure for model search in multidimensional contingency tables. Biometrika, 72: 339-351.
Reinis et al (1981): Prognostic significance of the risk profile in the prevention of coronary heart disease. Bratis. lek. Listy. 76: 137-150.
Using chemical analysis determine the origin of wines
data(wine)data(wine)
A data frame with 178 observations on the following 14 variables.
Culta factor with levels v1 v2
v3: 3 different graph varieties
AlchAlcohol
MlcaMalic acid
AshAsh
AloaAlcalinity of ash
MgnsMagnesium
TtlpTotal phenols
FlvnFlavanoids
NnfpNonflavanoid phenols
PrntProanthocyanins
ClriColor intensity
HueHue
OodwOD280/OD315 of diluted wines
PrlnProline
Data comes from the UCI Machine Learning Repository. The grape variety
Cult is the class identifier.
Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository https://archive.ics.uci.edu/ml/. Irvine, CA: University of California, School of Information and Computer Science.
See references at https://archive.ics.uci.edu/ml/datasets/Wine/
data(wine) ## maybe str(wine) ; plot(wine) ...data(wine) ## maybe str(wine) ; plot(wine) ...
Downstream aliases for other graphical modelling packages. Will be deprecated in due course.
A DAG can be represented as a triangular matrix of regression coefficients.
dag2edge_matrix(object, out = 1) edge_matrix2dag(edge_matrix)dag2edge_matrix(object, out = 1) edge_matrix2dag(edge_matrix)
object |
A graph, either an igraph object or an adjacency matrix. |
out |
Format of the output, can be 1, 2, 3 or 4. |
edge_matrix |
Lower triangular matrix representing a dag |
g <- dag(~x2|x1 + x3|x1:x2 + x4|x3) dag2edge_matrix(g, out=1) dag2edge_matrix(g, out=2) dag2edge_matrix(g, out=3) dag2edge_matrix(g, out=4) d2 <- dag(~c|a:b+d:c) dag2edge_matrix(d2)g <- dag(~x2|x1 + x3|x1:x2 + x4|x3) dag2edge_matrix(g, out=1) dag2edge_matrix(g, out=2) dag2edge_matrix(g, out=3) dag2edge_matrix(g, out=4) d2 <- dag(~c|a:b+d:c) dag2edge_matrix(d2)
Generate all combinations of the elements of x taken m at a time. If x is a positive integer, returns all combinations of the elements of seq(x) taken m at a time.
fastcombn(x, m, FUN = NULL, simplify = TRUE, ...) combn_prim(x, m, simplify = TRUE)fastcombn(x, m, FUN = NULL, simplify = TRUE, ...) combn_prim(x, m, simplify = TRUE)
x |
vector source for combinations, or integer n for x <- seq(n). |
m |
number of elements to choose. |
FUN |
function to be applied to each combination; default ‘NULL’ means the identity, i.e., to return the combination (vector of length ‘m’). |
simplify |
logical indicating if the result should be simplified to a matrix; if FALSE, the function returns a list. |
... |
Further arguments passed on to |
Factors x are accepted.
combn_prim is a simplified (but faster) version of the combn
function. Does nok take the FUN argument.
fastcombn is intended to be a faster version of the combn
function.
A matrix or a list.
Søren Højsgaard
x <- letters[1:5]; m <- 3 fastcombn(x, m) combn(x, m) combn_prim(x, m) x <- letters[1:4]; m <- 3 fastcombn(x, m, simplify=FALSE) combn(x, m, simplify=FALSE) combn_prim(x, m, simplify=FALSE) x <- 1:10; m <- 3 fastcombn(x, m, min) combn(x, m, min) x <- factor(letters[1:8]); m <- 5 if (require(microbenchmark)){ microbenchmark( combn(x, m, simplify=FALSE), combn_prim(x, m, simplify=FALSE), fastcombn(x, m, simplify=FALSE), times=50 ) }x <- letters[1:5]; m <- 3 fastcombn(x, m) combn(x, m) combn_prim(x, m) x <- letters[1:4]; m <- 3 fastcombn(x, m, simplify=FALSE) combn(x, m, simplify=FALSE) combn_prim(x, m, simplify=FALSE) x <- 1:10; m <- 3 fastcombn(x, m, min) combn(x, m, min) x <- factor(letters[1:8]); m <- 5 if (require(microbenchmark)){ microbenchmark( combn(x, m, simplify=FALSE), combn_prim(x, m, simplify=FALSE), fastcombn(x, m, simplify=FALSE), times=50 ) }
Functions that must be retained to make code from gmwr-book work
as.adjMAT(object)as.adjMAT(object)
object |
An object to be coerced. |
Methods for changing graph representations
coerceGraph(object, class) graph_as(object, outtype, intype = NULL)coerceGraph(object, class) graph_as(object, outtype, intype = NULL)
object |
A graph object |
class |
The desired output class |
outtype |
The desired output outtype |
intype |
The desired output outtype (only relevant if object is a list) |
coerceGraph is used in the book "Graphical models with R". A more generic approach is as().
g1 <- ug(~a:b+b:c) as(g1, "igraph") as(g1, "matrix") as(g1, "Matrix") as(g1, "dgCMatrix") ## graph_as(g1, "ugList") ## Fails ## getCliques(g1) ## Works l1 <- list(c("a" ,"b"), c("b", "c"))g1 <- ug(~a:b+b:c) as(g1, "igraph") as(g1, "matrix") as(g1, "Matrix") as(g1, "dgCMatrix") ## graph_as(g1, "ugList") ## Fails ## getCliques(g1) ## Works l1 <- list(c("a" ,"b"), c("b", "c"))
Coercion of graphs represented as lists to various graph formats.
g_ugl2ig_(zz, vn = NULL) g_ugl2dm_(zz, vn = NULL) g_ugl2sm_(zz, vn = NULL) g_ugl2XX_(zz, outtype, vn = NULL) g_dagl2ig_(zz, vn = NULL) g_dagl2dm_(zz, vn = NULL) g_dagl2sm_(zz, vn = NULL) g_dagl2XX_(zz, outtype, vn = NULL) g_adl2ig_(zz) g_adl2dm_(zz) g_adl2sm_(zz) g_adl2XX_(zz, outtype) g_M2adl_(amat) g_M2ugl_(amat) g_M2dagl_(amat) g_ugl2M_(glist, vn = NULL, result = "matrix") g_dagl2M_(glist, vn = NULL, result = "matrix") g_adl2M_(alist, result = "matrix")g_ugl2ig_(zz, vn = NULL) g_ugl2dm_(zz, vn = NULL) g_ugl2sm_(zz, vn = NULL) g_ugl2XX_(zz, outtype, vn = NULL) g_dagl2ig_(zz, vn = NULL) g_dagl2dm_(zz, vn = NULL) g_dagl2sm_(zz, vn = NULL) g_dagl2XX_(zz, outtype, vn = NULL) g_adl2ig_(zz) g_adl2dm_(zz) g_adl2sm_(zz) g_adl2XX_(zz, outtype) g_M2adl_(amat) g_M2ugl_(amat) g_M2dagl_(amat) g_ugl2M_(glist, vn = NULL, result = "matrix") g_dagl2M_(glist, vn = NULL, result = "matrix") g_adl2M_(alist, result = "matrix")
zz |
An object representing a graph. |
vn |
The names of the vertices in the graphs. These will be the row and column names of the matrix. |
outtype |
What should a list be coerced to. |
amat |
Adjacency matrix (dense or sparse dgCMatrix). |
glist |
A list of generators where a generator is a character vector. If interpreted as generators of an undirected graph, a generator is a complete set of vertices in the graph. If interpreted as generators of a dag, a generator (v1,...,vn) means that there will be arrows from v2,...,vn to v1. |
result |
A graph object. |
alist |
An adjacency list. |
## Sparse and dense adjacency matrices converted to adjacency list g1 <- ug(~a:b + b:c + c:d, result="matrix") g2 <- ug(~a:b + b:c + c:d, result="dgCMatrix") g_M2adl_( g1 ) ## Sparse and dense adjacency matrices converted to cliques g_M2ugl_( g1 ) ## Sparse and dense adjacency matrices converted to cliques g_M2dagl_( g1 ) ## g_M2adl_( g2 ) ## FIXME FAILS for sparse matrix ## g_M2ugl_( g2 ) ## FIXME Is there an issue here?? ## g_M2dagList( g2 ) ## Fails for sparse matrix## Sparse and dense adjacency matrices converted to adjacency list g1 <- ug(~a:b + b:c + c:d, result="matrix") g2 <- ug(~a:b + b:c + c:d, result="dgCMatrix") g_M2adl_( g1 ) ## Sparse and dense adjacency matrices converted to cliques g_M2ugl_( g1 ) ## Sparse and dense adjacency matrices converted to cliques g_M2dagl_( g1 ) ## g_M2adl_( g2 ) ## FIXME FAILS for sparse matrix ## g_M2ugl_( g2 ) ## FIXME Is there an issue here?? ## g_M2dagList( g2 ) ## Fails for sparse matrix
Generic function for plotting graphs using the 'igraph' package.
iplot(x, ...) ## S3 method for class 'igraph' iplot(x, ...)iplot(x, ...) ## S3 method for class 'igraph' iplot(x, ...)
x |
A graph object to be plotted. |
... |
Additional arguments |
Søren Højsgaard, [email protected]
UG <- ug(~a:b+b:c:d) iplot(UG)UG <- ug(~a:b+b:c:d) iplot(UG)
Check if a graph is 1) a directed acyclic graph (DAG), 2) a directed graph (DG), 3) an undirected graph (UG), 4) a triangulated (chordal) undirected graph (TUG).
is_dag(object) is_dagMAT(object) is_ug(object) is_ugMAT(object) is_tug(object) is_tugMAT(object) is_dg(object) is_dgMAT(object) is_adjMAT(object) is.adjMAT(object)is_dag(object) is_dagMAT(object) is_ug(object) is_ugMAT(object) is_tug(object) is_tugMAT(object) is_dg(object) is_dgMAT(object) is_adjMAT(object) is.adjMAT(object)
object |
A graph represented as
an |
A non-zero value at entry (i,j) in an adjacency matrix A for a graph means that there is an edge from i to j. If also (j,i) is non-zero there is also an edge from j to i. In this case we may think of a bidirected edge between i and j or we may think of the edge as being undirected. We do not distinguish between undirected and bidirected edges in the gRbase package.
The function is_ug() checks if the adjacency matrix is
symmetric.
The function is_tug() checks if the graph is undirected and
triangulated (also called chordal) by checking if the adjacency matrix is
symmetric and the vertices can be given a perfect ordering using maximum
cardinality seach.
The function is_dg() checks if a graph is directed, i.e., that there
are no undirected edges. This is done by computing the elementwise product
of A and the transpose of A; if there are no non–zero entries in this
product then the graph is directed.
The function is_dag() will return TRUE if all edges are
directed and if there are no cycles in the graph. (This is checked by
checking if the vertices in the graph can be given a topological ordering
which is based on identifying an undirected edge with a bidrected edge).
There is a special case, namely if the graph has no edges at all (such that the adjacency matrix consists only of zeros). Such a graph is both undirected, triangulated, directed and directed acyclic.
The functions
is.TUG/is.DAG/is.DG/is.UG/is.adjMAT
are synonymous with
is_tug/is_dag/is_dg/is_ug/is_adjMAT.
The is.X group of functions will be deprecated.
Søren Højsgaard, [email protected]
## DAGs dag_ <- dag(~ a:b:c + c:d:e) ## Undirected graphs ug_ <- ug(~a:b:c + c:d:e) ## Is graph a DAG? is_dag(dag_) is_dag(ug_) ## Is graph an undirected graph is_ug(dag_) is_ug(ug_) ## Is graph a triangulated (i.e. chordal) undirected graph is_tug(dag_) is_tug(ug_) ## Example where the graph is not triangulated ug2_ <- ug(~ a:b + b:c + c:d + d:a) is_tug(ug2_)## DAGs dag_ <- dag(~ a:b:c + c:d:e) ## Undirected graphs ug_ <- ug(~a:b:c + c:d:e) ## Is graph a DAG? is_dag(dag_) is_dag(ug_) ## Is graph an undirected graph is_ug(dag_) is_ug(ug_) ## Is graph a triangulated (i.e. chordal) undirected graph is_tug(dag_) is_tug(ug_) ## Example where the graph is not triangulated ug2_ <- ug(~ a:b + b:c + c:d + d:a) is_tug(ug2_)
Unified approach to query a graph about its properties (based partly on functionality from gRbase and functionality imported from RBGL).
querygraph(object, op, set = NULL, set2 = NULL, set3 = NULL) qgraph(object, op, set = NULL, set2 = NULL, set3 = NULL) ancestors(set, object) subGraph(set, object) is.triangulated(object) connComp(object) ancestralSet(set, object) ancestralGraph(set, object) parents(set, object) children(set, object) separates(set, set2, set3, object) closure(set, object) adj(object, set) is.simplicial(set, object) simplicialNodes(object) is.complete(object, set = NULL) is.decomposition(set, set2, set3, object) nodes_(object) nodes(object, ...) ## S4 method for signature 'igraph' nodes(object, ...) edges(object) edges_(object) addEdge(v1, v2, object) removeEdge(v1, v2, object)querygraph(object, op, set = NULL, set2 = NULL, set3 = NULL) qgraph(object, op, set = NULL, set2 = NULL, set3 = NULL) ancestors(set, object) subGraph(set, object) is.triangulated(object) connComp(object) ancestralSet(set, object) ancestralGraph(set, object) parents(set, object) children(set, object) separates(set, set2, set3, object) closure(set, object) adj(object, set) is.simplicial(set, object) simplicialNodes(object) is.complete(object, set = NULL) is.decomposition(set, set2, set3, object) nodes_(object) nodes(object, ...) ## S4 method for signature 'igraph' nodes(object, ...) edges(object) edges_(object) addEdge(v1, v2, object) removeEdge(v1, v2, object)
object |
A graph. |
op |
The operation or query. |
set, set2, set3
|
Sets of nodes in graph. |
... |
additional arguments |
v1, v2
|
Vertex names |
ug0 <- ug(~a:b + b:c:d + e) separates("a", "d", c("b", "c"), ug0) separates("a", "d", "c", ug0) is.simplicial("b", ug0) simplicialNodes(ug0) simplicialNodes(ug0)ug0 <- ug(~a:b + b:c:d + e) separates("a", "d", c("b", "c"), ug0) separates("a", "d", "c", ug0) is.simplicial("b", ug0) simplicialNodes(ug0) simplicialNodes(ug0)
A topological ordering of a directed graph is a linear ordering of its vertices such that, for every edge (u->v), u comes before v in the ordering. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering. Can hence be used for checking if a graph is a DAG.
topo_sort(object, index = FALSE) topo_sortMAT(amat, index = FALSE) topoSort(object, index = FALSE) topoSortMAT(amat, index = FALSE)topo_sort(object, index = FALSE) topo_sortMAT(amat, index = FALSE) topoSort(object, index = FALSE) topoSortMAT(amat, index = FALSE)
object |
An graph represented either as an |
index |
If FALSE, an ordering is returned if it exists and
|
amat |
Adjacency matrix. |
If FALSE, an ordering is returned if it exists and
character(0) otherwise. If TRUE, the index of the
variables in an adjacency matrix is returned and -1
otherwise.
The functions topo_sort / topoSort are synonymous with topo_sortMAT /
topoSortMAT. One of the groups may be deprecated in the future.
The workhorse is the topo_sortMAT function which takes
an adjacency matrix as input.
Søren Højsgaard, [email protected]
dagMAT <- dag(~a:b:c + c:d:e, result="matrix") dagMATS <- as(dagMAT, "dgCMatrix") topo_sort(dagMAT) topo_sort(dagMATS)dagMAT <- dag(~a:b:c + c:d:e, result="matrix") dagMATS <- as(dagMAT, "dgCMatrix") topo_sort(dagMAT) topo_sort(dagMATS)
Get list of vertices and their parents for graph.
vchi(object, getv = TRUE, forceCheck = TRUE) vchiMAT(object, getv = TRUE, forceCheck = TRUE) vpar(object, getv = TRUE, forceCheck = TRUE) vparMAT(object, getv = TRUE, forceCheck = TRUE)vchi(object, getv = TRUE, forceCheck = TRUE) vchiMAT(object, getv = TRUE, forceCheck = TRUE) vpar(object, getv = TRUE, forceCheck = TRUE) vparMAT(object, getv = TRUE, forceCheck = TRUE)
object |
An object representing a graph. Valid objects are an adjacency matrix or an igraph. |
getv |
The result is by default a list of vectors of the form
|
forceCheck |
Logical indicating if it should be checked that the object is a DAG. |
A list of vectors where each vector will have the form
(v, pa1, pa2, ... paN) where pa1, pa2, ... paN
are the parents of v.
## DAGs dag_mat <- dag(~a:b:c + c:d:e, result="matrix") dag_ig <- dag(~a:b:c + c:d:e) vpar(dag_mat) vpar(dag_ig) vpar(dag_mat, getv=FALSE) vpar(dag_ig, getv=FALSE) ## Undirected graphs ug_mat <- ug(~a:b:c + c:d:e, result="matrix") ug_ig <- ug(~a:b:c + c:d:e) ## Not run: ## This will fail because the adjacency matrix is symmetric and the ## graph has undirected edges vpar(ug_mat) vpar(ug_ig) ## End(Not run) ## When forceCheck is FALSE, it will not be detected that the #g raphs are undirected. vpar(ug_mat, forceCheck=FALSE) vpar(ug_ig, forceCheck=FALSE)## DAGs dag_mat <- dag(~a:b:c + c:d:e, result="matrix") dag_ig <- dag(~a:b:c + c:d:e) vpar(dag_mat) vpar(dag_ig) vpar(dag_mat, getv=FALSE) vpar(dag_ig, getv=FALSE) ## Undirected graphs ug_mat <- ug(~a:b:c + c:d:e, result="matrix") ug_ig <- ug(~a:b:c + c:d:e) ## Not run: ## This will fail because the adjacency matrix is symmetric and the ## graph has undirected edges vpar(ug_mat) vpar(ug_ig) ## End(Not run) ## When forceCheck is FALSE, it will not be detected that the #g raphs are undirected. vpar(ug_mat, forceCheck=FALSE) vpar(ug_ig, forceCheck=FALSE)
Return a list of (maximal) cliques of an undirected graph.
get_cliques(object) max_cliqueMAT(amat) getCliques(object) maxCliqueMAT(amat) maxClique(object)get_cliques(object) max_cliqueMAT(amat) getCliques(object) maxCliqueMAT(amat) maxClique(object)
object |
An undirected graph represented either as an |
amat |
An adjacency matrix. |
In graph theory, a clique is often a complete subset of a graph. A maximal clique is a clique which can not be enlarged. In statistics (and that is the convention we follow here) a clique is usually understood to be a maximal clique.
Finding the cliques of a general graph is an NP complete problem. Finding the cliques of triangualted graph is linear in the number of cliques.
The workhorse is the max_cliqueMAT function which calls the
maxClique function in the RBGL package.
A list.
For backward compatibility with downstream packages we have the following synonymous functions:
getCliques = get_cliques
maxCliqueMAT = max_cliqueMAT
Søren Højsgaard, [email protected]
ug, dag, mcs,
mcsMAT, rip, ripMAT,
moralize, moralizeMAT
uG0 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a) get_cliques(uG0) uG1 <- as(uG0, "igraph") get_cliques(uG1) uG2 <- as(uG0, "matrix") get_cliques(uG2) uG3 <- as(uG1, "dgCMatrix") get_cliques(uG3)uG0 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a) get_cliques(uG0) uG1 <- as(uG0, "igraph") get_cliques(uG1) uG2 <- as(uG0, "matrix") get_cliques(uG2) uG3 <- as(uG1, "dgCMatrix") get_cliques(uG3)
API for coercing graph representations.
g_dm2sm_(object) g_dm2ig_(object) g_sm2dm_(object) g_sm2ig_(object) g_ig2dm_(object) g_ig2sm_(object) g_xm2ig_(object) g_xm2dm_(object) g_xm2sm_(object) g_xm2xm_(object, result = "matrix")g_dm2sm_(object) g_dm2ig_(object) g_sm2dm_(object) g_sm2ig_(object) g_ig2dm_(object) g_ig2sm_(object) g_xm2ig_(object) g_xm2dm_(object) g_xm2sm_(object) g_xm2xm_(object, result = "matrix")
object |
An object representing a graph |
result |
Either 'matrix' (dense) or 'dgCMatrix' (sparse, can be abbreviated to 'Matrix'). |
No checking is made. In the function the following names are used:
"ig": "igraph";
"sm": "dgCMatrix" (sparse matrix);
"dm": "matrix" (dense matrix)
Søren Højsgaard, [email protected]
These functions are wrappers for creation of graphs as primarily igraph objects but also as adjacency matrices
ug(..., result = "igraph") ugi(...) ugList(x, result = "igraph") dag(..., result = "igraph", forceCheck = FALSE) dagi(..., forceCheck = FALSE) dagList(x, result = "igraph", forceCheck = FALSE)ug(..., result = "igraph") ugi(...) ugList(x, result = "igraph") dag(..., result = "igraph", forceCheck = FALSE) dagi(..., forceCheck = FALSE) dagList(x, result = "igraph", forceCheck = FALSE)
... |
A generating class for a graph, see examples below |
result |
The format of the graph. The possible choices are
"igraph" (for an |
x |
A list or individual components from which a graph can be created. |
forceCheck |
Logical determining if it should be checked if
the graph is acyclical. Yes, one can specify graphs with cycles
using the |
Functions ug(), and dag() can return a
an igraph object, a sparse or a dense
adjacency matrix.
Søren Højsgaard, [email protected]
## The following specifications of undirected graphs are equivalent: uG1 <- ug(~ a:b:c + c:d) uG2 <- ug(c("a", "b", "c"), c("c", "d")) uG3 <- ug(c("a", "b"), c("a", "c"), c("b", "c"), c("c", "d")) ## The following specifications of directed acyclig graphs are equivalent: daG1 <- dag(~ a:b:c + b:c + c:d) daG2 <- dag(c("a", "b", "c"), c("b", "c"), c("c", "d")) ## dag() allows to specify directed graphs with cycles: daG4 <- dag(~ a:b + b:c + c:a) # A directed graph but with cycles ## A check for acyclicity can be done with ## daG5 <- dag(~ a:b + b:c + c:a, forceCheck=TRUE) ## A check for acyclicity is provided by topoSort topo_sort( daG2 ) topo_sort( daG4 ) ## Different representations uG7 <- ug(~a:b:c + c:d, result="igraph") # igraph uG8 <- ug(~a:b:c + c:d, result="matrix") # dense matrix uG9 <- ug(~a:b:c + c:d, result="dgCMatrix") # sparse matrix## The following specifications of undirected graphs are equivalent: uG1 <- ug(~ a:b:c + c:d) uG2 <- ug(c("a", "b", "c"), c("c", "d")) uG3 <- ug(c("a", "b"), c("a", "c"), c("b", "c"), c("c", "d")) ## The following specifications of directed acyclig graphs are equivalent: daG1 <- dag(~ a:b:c + b:c + c:d) daG2 <- dag(c("a", "b", "c"), c("b", "c"), c("c", "d")) ## dag() allows to specify directed graphs with cycles: daG4 <- dag(~ a:b + b:c + c:a) # A directed graph but with cycles ## A check for acyclicity can be done with ## daG5 <- dag(~ a:b + b:c + c:a, forceCheck=TRUE) ## A check for acyclicity is provided by topoSort topo_sort( daG2 ) topo_sort( daG4 ) ## Different representations uG7 <- ug(~a:b:c + c:d, result="igraph") # igraph uG8 <- ug(~a:b:c + c:d, result="matrix") # dense matrix uG9 <- ug(~a:b:c + c:d, result="dgCMatrix") # sparse matrix
Returns the edges of a graph (or edges not in a graph)
where the graph can be either an igraph
object or an adjacency matrix.
edgeList(object, matrix = FALSE) edgeListMAT(adjmat, matrix = FALSE) nonEdgeList(object, matrix = FALSE) nonEdgeListMAT(adjmat, matrix = FALSE)edgeList(object, matrix = FALSE) edgeListMAT(adjmat, matrix = FALSE) nonEdgeList(object, matrix = FALSE) nonEdgeListMAT(adjmat, matrix = FALSE)
object |
An |
matrix |
If TRUE the result is a matrix; otherwise the result is a list. |
adjmat |
An adjacency matrix. |
## A graph with edges g <- ug(~a:b + b:c + c:d) gm <- as(g, "matrix") edgeList(g) edgeList(gm) edgeListMAT(gm) edgeList(g, matrix=TRUE) edgeList(gm, matrix=TRUE) edgeListMAT(gm, matrix=TRUE) nonEdgeList(g) nonEdgeList(gm) nonEdgeListMAT(gm) ## A graph without edges g <- ug(~a + b + c) gm <- as(g, "matrix") edgeList(g) edgeList(gm) edgeListMAT(gm) edgeList(g, matrix=TRUE) edgeList(gm, matrix=TRUE) edgeListMAT(gm, matrix=TRUE) nonEdgeList(g) nonEdgeList(gm) nonEdgeListMAT(gm)## A graph with edges g <- ug(~a:b + b:c + c:d) gm <- as(g, "matrix") edgeList(g) edgeList(gm) edgeListMAT(gm) edgeList(g, matrix=TRUE) edgeList(gm, matrix=TRUE) edgeListMAT(gm, matrix=TRUE) nonEdgeList(g) nonEdgeList(gm) nonEdgeListMAT(gm) ## A graph without edges g <- ug(~a + b + c) gm <- as(g, "matrix") edgeList(g) edgeList(gm) edgeListMAT(gm) edgeList(g, matrix=TRUE) edgeList(gm, matrix=TRUE) edgeListMAT(gm, matrix=TRUE) nonEdgeList(g) nonEdgeList(gm) nonEdgeListMAT(gm)
A set of generators define an undirected graph, here called a dependence graph. Given a set of generators it is checked 1) if the dependence dependence graph is in 1-1-correspondance with the genrators (such that the corresponding model is graphical) and 2) if the dependence graph is chordal (triangulated) (such that the corresponding model is decomposable).
isGraphical(x) isDecomposable(x)isGraphical(x) isDecomposable(x)
x |
A generating class given as right hand sided formula or a
list; see |
A set of sets of variables, say A_1, A_2, ... A_K is called a generating class for a graph with vertices V and edges E. If two variables a,b are in the same generator, say A_j, then a and b are vertices in the graph and there is an undirected edge between a and b.
The graph induced by \code{g1 = ~a:b + a:c + b:c + c:d} has
edges \code{ab, ac, bc, cd}. The
cliques of this graph are \code{abc, cd}. Hence there is not a
1-1-correspondance between the graph and the generators.
On the other hand, \code{g2 <- ~a:b:c + c:d} induces the same
graph in this case there is a 1-1-correspondance.
The graph induced by \code{g3 <- ~a:b + b:c + c:d + d:a} is in
1-1-correspondance with its dependence graph, but the graph is
not chordal.
TRUE or FALSE
Søren Højsgaard, [email protected]
g1 <- ~a:b + a:c + b:c + c:d g2 <- ~a:b:c + c:d g3 <- ~a:b + b:c + c:d + d:a isGraphical( g1 ) # FALSE isGraphical( g2 ) # TRUE isGraphical( g3 ) # TRUE isDecomposable( g1 ) # FALSE isDecomposable( g2 ) # TRUE isDecomposable( g3 ) # TRUE ## A generating class can be given as a list: f <- list(c("a","b"), c("b","c"), c("a","c")) isGraphical( f ) isDecomposable( f )g1 <- ~a:b + a:c + b:c + c:d g2 <- ~a:b:c + c:d g3 <- ~a:b + b:c + c:d + d:a isGraphical( g1 ) # FALSE isGraphical( g2 ) # TRUE isGraphical( g3 ) # TRUE isDecomposable( g1 ) # FALSE isDecomposable( g2 ) # TRUE isDecomposable( g3 ) # TRUE ## A generating class can be given as a list: f <- list(c("a","b"), c("b","c"), c("a","c")) isGraphical( f ) isDecomposable( f )
Returns (if it exists) a perfect ordering of the vertices in an undirected graph.
mcs(object, root = NULL, index = FALSE) ## Default S3 method: mcs(object, root = NULL, index = FALSE) mcsMAT(amat, vn = colnames(amat), root = NULL, index = FALSE) mcs_marked(object, discrete = NULL, index = FALSE) ## Default S3 method: mcs_marked(object, discrete = NULL, index = FALSE) mcs_markedMAT(amat, vn = colnames(amat), discrete = NULL, index = FALSE)mcs(object, root = NULL, index = FALSE) ## Default S3 method: mcs(object, root = NULL, index = FALSE) mcsMAT(amat, vn = colnames(amat), root = NULL, index = FALSE) mcs_marked(object, discrete = NULL, index = FALSE) ## Default S3 method: mcs_marked(object, discrete = NULL, index = FALSE) mcs_markedMAT(amat, vn = colnames(amat), discrete = NULL, index = FALSE)
object |
An undirected graph represented either as an |
root |
A vector of variables. The first variable in the perfect ordering will be the first variable on 'root'. The ordering of the variables given in 'root' will be followed as far as possible. |
index |
If TRUE, then a permutation is returned |
amat |
Adjacency matrix |
vn |
Nodes in the graph given by adjacency matrix |
discrete |
A vector indicating which of the nodes are discrete. See 'details' for more information. |
An undirected graph is decomposable iff there exists a
perfect ordering of the vertices. The maximum cardinality
search algorithm returns a perfect ordering of the vertices if
it exists and hence this algorithm provides a check for
decomposability. The mcs() functions finds such an
ordering if it exists.
The notion of strong decomposability is used in connection with
e.g. mixed interaction models where some vertices represent
discrete variables and some represent continuous
variables. Such graphs are said to be marked. The
\code{mcsmarked()} function will return a perfect ordering iff
the graph is strongly decomposable. As graphs do not know about
whether vertices represent discrete or continuous variables,
this information is supplied in the \code{discrete} argument.
A vector with a linear ordering (obtained by maximum cardinality search) of the variables or character(0) if such an ordering can not be created.
The workhorse is the mcsMAT function.
Søren Højsgaard, [email protected]
moralize, junction_tree,
rip, ug, dag
uG <- ug(~ me:ve + me:al + ve:al + al:an + al:st + an:st) mcs(uG) mcsMAT(as(uG, "matrix")) ## Same as uG <- ug(~ me:ve + me:al + ve:al + al:an + al:st + an:st, result="matrix") mcsMAT(uG) ## Marked graphs uG1 <- ug(~ a:b + b:c + c:d) uG2 <- ug(~ a:b + a:d + c:d) ## Not strongly decomposable: mcs_marked(uG1, discrete=c("a","d")) ## Strongly decomposable: mcs_marked(uG2, discrete=c("a","d"))uG <- ug(~ me:ve + me:al + ve:al + al:an + al:st + an:st) mcs(uG) mcsMAT(as(uG, "matrix")) ## Same as uG <- ug(~ me:ve + me:al + ve:al + al:an + al:st + an:st, result="matrix") mcsMAT(uG) ## Marked graphs uG1 <- ug(~ a:b + b:c + c:d) uG2 <- ug(~ a:b + a:d + c:d) ## Not strongly decomposable: mcs_marked(uG1, discrete=c("a","d")) ## Strongly decomposable: mcs_marked(uG2, discrete=c("a","d"))
An undirected graph uG is triangulated (or chordal) if it has no cycles of length >= 4 without a chord which is equivalent to that the vertices can be given a perfect ordering. Any undirected graph can be triangulated by adding edges to the graph, so called fill-ins which gives the graph TuG. A triangulation TuG is minimal if no fill-ins can be removed without breaking the property that TuG is triangulated.
minimal_triang( object, tobject = triangulate(object), result = NULL, details = 0 ) minimal_triangMAT(amat, tamat = triangulateMAT(amat), details = 0)minimal_triang( object, tobject = triangulate(object), result = NULL, details = 0 ) minimal_triangMAT(amat, tamat = triangulateMAT(amat), details = 0)
object |
An undirected graph represented either as a |
tobject |
Any triangulation of |
result |
The type (representation) of the result. Possible values are
|
details |
The amount of details to be printed. |
amat |
The undirected graph which is to be triangulated; a symmetric adjacency matrix. |
tamat |
Any triangulation of |
For a given triangulation tobject it may be so that some of the fill-ins are superflous in the sense that they can be removed from tobject without breaking the property that tobject is triangulated. The graph obtained by doing so is a minimal triangulation.
Notice: A related concept is the minimum triangulation, which is the the graph with the smallest number of fill-ins. The minimum triangulation is unique. Finding the minimum triangulation is NP-hard.
minimal_triang() returns an igraph object while
minimal_triangMAT() returns an adjacency matrix.
Clive Bowsher [email protected] with modifications by Søren Højsgaard, [email protected]
Kristian G. Olesen and Anders L. Madsen (2002): Maximal Prime Subgraph Decomposition of Bayesian Networks. IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS, PART B: CYBERNETICS, VOL. 32, NO. 1, FEBRUARY 2002
## An igraph object g1 <- ug(~a:b + b:c + c:d + d:e + e:f + a:f + b:e, result="igraph") x <- minimal_triang(g1) tt <- ug(~a:b:e:f + b:e:c:d, result="igraph") x <- minimal_triang(g1, tobject=tt) ## g2 is a triangulation of g1 but it is not minimal g2 <- ug(~a:b:e:f + b:c:d:e, result="igraph") x <- minimal_triang(g1, tobject=g2) ## An adjacency matrix g1m <- ug(~a:b + b:c + c:d + d:e + e:f + a:f + b:e, result="matrix") x <- minimal_triangMAT(g1m)## An igraph object g1 <- ug(~a:b + b:c + c:d + d:e + e:f + a:f + b:e, result="igraph") x <- minimal_triang(g1) tt <- ug(~a:b:e:f + b:e:c:d, result="igraph") x <- minimal_triang(g1, tobject=tt) ## g2 is a triangulation of g1 but it is not minimal g2 <- ug(~a:b:e:f + b:c:d:e, result="igraph") x <- minimal_triang(g1, tobject=g2) ## An adjacency matrix g1m <- ug(~a:b + b:c + c:d + d:e + e:f + a:f + b:e, result="matrix") x <- minimal_triangMAT(g1m)
Moralize a directed acyclic graph which means marrying parents and dropping directions.
moralize(object, ...) ## Default S3 method: moralize(object, result = NULL, ...)moralize(object, ...) ## Default S3 method: moralize(object, result = NULL, ...)
object |
A directed acyclic graph represented either as an |
... |
Additional arguments, currently not used |
result |
The representation of the moralized graph. When NULL the representation will be the same as the input object. |
A moralized graph represented either as an igraph, a
dense matrix or a sparse dgCMatrix.
The workhorse is the moralizeMAT function.
Søren Højsgaard, [email protected]
mcs, junction_tree, rip,
ug, dag
daG <- dag(~me+ve,~me+al,~ve+al,~al+an,~al+st,~an+st) moralize(daG) daG <- dag(~me+ve,~me+al,~ve+al,~al+an,~al+st,~an+st, result="matrix") moralizeMAT(daG) if (require(igraph)){ M <- matrix(c(1,2,3,3), nrow=2) G <- graph.edgelist(M) G V(G)$name moralize(G) }daG <- dag(~me+ve,~me+al,~ve+al,~al+an,~al+st,~an+st) moralize(daG) daG <- dag(~me+ve,~me+al,~ve+al,~al+an,~al+st,~an+st, result="matrix") moralizeMAT(daG) if (require(igraph)){ M <- matrix(c(1,2,3,3), nrow=2) G <- graph.edgelist(M) G V(G)$name moralize(G) }
Finding a junction tree representation of the MPD (maximal prime subgraph decomposition) of an undirected graph The maximal prime subgraph decomposition of a graph is the smallest subgraphs into which the graph can be decomposed.
mpd(object, tobject = minimal_triang(object), details = 0) ## Default S3 method: mpd(object, tobject = triangulate(object), details = 0) mpdMAT(amat, tamat = minimal_triangMAT(amat), details = 0)mpd(object, tobject = minimal_triang(object), details = 0) ## Default S3 method: mpd(object, tobject = triangulate(object), details = 0) mpdMAT(amat, tamat = minimal_triangMAT(amat), details = 0)
object |
An undirected graph; an igraph or an adjacency matrix. |
tobject |
Any minimal triangulation of object; an igraph or an adjacency matrix. |
details |
The amount of details to be printed. |
amat |
An undirected graph; a symmetric adjacency matrix |
tamat |
Any minimal triangulation of object; a symmetric adjacency matrix |
A list with components "nodes", "cliques", "separators", "parents", "children", "nLevels". The component "cliques" defines the subgraphs.
Clive Bowsher [email protected] with modifications by Søren Højsgaard, [email protected]
Kristian G. Olesen and Anders L. Madsen (2002): Maximal Prime Subgraph Decomposition of Bayesian Networks. IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS, PART B: CYBERNETICS, VOL. 32, NO. 1, FEBRUARY 2002
mcs, mcsMAT,
minimal_triang, minimal_triangMAT,
rip, ripMAT, triangulate,
triangulateMAT
## Maximal prime subgraph decomposition g1 <- ug(~ a:b + b:c + c:d + d:e + e:f + a:f + b:e) if (interactive()) plot(g1) x <- mpd(g1) ## Maximal prime subgraph decomposition - an adjacency matrix g1m <- ug(~ a:b + b:c + c:d + d:e + e:f + a:f + b:e, result="matrix") x <- mpdMAT(g1m)## Maximal prime subgraph decomposition g1 <- ug(~ a:b + b:c + c:d + d:e + e:f + a:f + b:e) if (interactive()) plot(g1) x <- mpd(g1) ## Maximal prime subgraph decomposition - an adjacency matrix g1m <- ug(~ a:b + b:c + c:d + d:e + e:f + a:f + b:e, result="matrix") x <- mpdMAT(g1m)
Generate a random directed acyclic graph (DAG)
random_dag(V, maxpar = 3, wgt = 0.1)random_dag(V, maxpar = 3, wgt = 0.1)
V |
The set of vertices. |
maxpar |
The maximum number of parents each node can have |
wgt |
A parameter controlling how likely it is for a node to have a certain number of parents; see 'Details'. |
If the maximum number of parents for a node is, say 3 and wgt=0.1, then the probability of the node ending up with 0,1,2,3 parents is proportional to 0.1^0, 0.1^1, 0.1^2, 0.1^3.
An igraph object.
Søren Højsgaard, [email protected]
dg <- random_dag(1:1000, maxpar=5, wgt=.9) table(sapply(vpar(dg),length)) dg <- random_dag(1:1000, maxpar=5, wgt=.5) table(sapply(vpar(dg),length)) dg <- random_dag(1:1000, maxpar=5, wgt=.1) table(sapply(vpar(dg),length))dg <- random_dag(1:1000, maxpar=5, wgt=.9) table(sapply(vpar(dg),length)) dg <- random_dag(1:1000, maxpar=5, wgt=.5) table(sapply(vpar(dg),length)) dg <- random_dag(1:1000, maxpar=5, wgt=.1) table(sapply(vpar(dg),length))
A RIP (running intersection property) ordering of the cliques is also called a perfect ordering. If the graph is not chordal, then no such ordering exists.
rip(object, ...) ## Default S3 method: rip(object, root = NULL, nLevels = NULL, ...) ripMAT(amat, root = NULL, nLevels = rep(2, ncol(amat))) junction_tree(object, ...) ## Default S3 method: junction_tree(object, nLevels = NULL, ...) junction_treeMAT(amat, nLevels = rep(2, ncol(amat)), ...) jTree(object, ...)rip(object, ...) ## Default S3 method: rip(object, root = NULL, nLevels = NULL, ...) ripMAT(amat, root = NULL, nLevels = rep(2, ncol(amat))) junction_tree(object, ...) ## Default S3 method: junction_tree(object, nLevels = NULL, ...) junction_treeMAT(amat, nLevels = rep(2, ncol(amat)), ...) jTree(object, ...)
object |
An undirected graph represented either as an |
... |
Additional arguments; currently not used |
root |
A vector of variables. The first variable in the perfect ordering will be the first variable on 'root'. The ordering of the variables given in 'root' will be followed as far as possible. |
nLevels |
Typically, the number of levels of the variables (nodes) when these are discrete. Used in determining the triangulation using a "minimum clique weight heuristic". See section 'details'. |
amat |
Adjacency matrix |
The RIP ordering of the cliques of a decomposable
(i.e. chordal) graph is obtained by first ordering the
variables linearly with maximum cardinality search (by
mcs). The root argument is transfered to mcs as a
way of controlling which clique will be the first in the RIP
ordering. The junction_tree() (and
junction_tree()) (for "junction tree") is just a wrapper
for a call of triangulate() followed by a call of
rip().
rip returns a list (an object of class
ripOrder. A print method exists for such objects.)
For backward compatibility with downstream packages we have the following synonymous functions:
jTree = junction_tree (Used in rags2ridges)
junctionTree = junction_tree
The workhorse is the ripMAT() function. The
nLevels argument to the rip functions has no
meaning.
Søren Højsgaard, [email protected]
mcs, triangulate,
moralize, ug, dag
uG <- ug(~me:ve + me:al + ve:al + al:an + al:st + an:st) mcs(uG) rip(uG) junction_tree(uG) ## Adjacency matrix uG <- ug(~me:ve:al + al:an:st, result="matrix") mcs(uG) rip(uG) junction_tree(uG) ## Sparse adjacency matrix uG <- ug(c("me", "ve", "al"), c("al", "an", "st"), result="dgCMatrix") mcs(uG) rip(uG) junction_tree(uG) ## Non--decomposable graph uG <- ug(~1:2 + 2:3 + 3:4 + 4:5 + 5:1) mcs(uG) rip(uG) junction_tree(uG)uG <- ug(~me:ve + me:al + ve:al + al:an + al:st + an:st) mcs(uG) rip(uG) junction_tree(uG) ## Adjacency matrix uG <- ug(~me:ve:al + al:an:st, result="matrix") mcs(uG) rip(uG) junction_tree(uG) ## Sparse adjacency matrix uG <- ug(c("me", "ve", "al"), c("al", "an", "st"), result="dgCMatrix") mcs(uG) rip(uG) junction_tree(uG) ## Non--decomposable graph uG <- ug(~1:2 + 2:3 + 3:4 + 4:5 + 5:1) mcs(uG) rip(uG) junction_tree(uG)
This function will triangulate an undirected graph by adding fill-ins.
triangulate(object, ...) ## Default S3 method: triangulate(object, nLevels = NULL, result = NULL, check = TRUE, ...) triang_mcwh(object, ...) triang_elo(object, ...) triang(object, ...) ## Default S3 method: triang(object, control = list(), ...) ## Default S3 method: triang_mcwh(object, nLevels = NULL, result = NULL, check = TRUE, ...) ## Default S3 method: triang_elo(object, order = NULL, result = NULL, check = TRUE, ...) triangulateMAT(amat, nLevels = rep(2, ncol(amat)), ...) triang_mcwhMAT_(amat, nLevels = rep(2, ncol(amat)), ...) triang_eloMAT_(amat, order) triang_eloMAT(amat, order = NULL)triangulate(object, ...) ## Default S3 method: triangulate(object, nLevels = NULL, result = NULL, check = TRUE, ...) triang_mcwh(object, ...) triang_elo(object, ...) triang(object, ...) ## Default S3 method: triang(object, control = list(), ...) ## Default S3 method: triang_mcwh(object, nLevels = NULL, result = NULL, check = TRUE, ...) ## Default S3 method: triang_elo(object, order = NULL, result = NULL, check = TRUE, ...) triangulateMAT(amat, nLevels = rep(2, ncol(amat)), ...) triang_mcwhMAT_(amat, nLevels = rep(2, ncol(amat)), ...) triang_eloMAT_(amat, order) triang_eloMAT(amat, order = NULL)
object |
An undirected graph represented either as
an |
... |
Additional arguments, currently not used. |
nLevels |
The number of levels of the variables (nodes) when these are discrete. Used in determining the triangulation using a "minimum clique weight heuristic". See section 'details'. |
result |
The type (representation) of the result. Possible values are
|
check |
If |
control |
A list controlling the triangulation; see 'examples'. |
order |
Elimation order; a character vector or numeric vector. |
amat |
Adjacency matrix; a (dense) |
There are two type of functions: triang and triangulate
The workhorse is the triangulateMAT function.
The triangulation is made so as the total state space is kept low
by applying a minimum clique weight heuristic: When a fill-in is
necessary, the algorithm will search for an edge to add such that
the complete set to be formed will have as small a state-space as
possible. It is in this connection that the nLevels values
are used.
Default (when nLevels=NULL) is to take nLevels=2 for all
nodes. If nLevels is the same for all nodes then the heuristic aims
at keeping the clique sizes small.
A triangulated graph represented either as a
(dense) matrix or a (sparse) dgCMatrix.
Care should be taken when specifying nLevels for other
representations than adjacency matrices: Since the triangulateMAT
function is the workhorse, any other representation is transformed to an
adjacency matrix and the order of values in nLevels most come in
the order of the nodes in the adjacency matrix representation.
Currently there is no check for that the graph is undirected.
Søren Højsgaard, [email protected]
ug, dag, mcs,
mcsMAT, rip, ripMAT,
moralize, moralizeMAT
uG1 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a) uG2 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a, result="matrix") uG3 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a, result="dgCMatrix") ## Default triangulation: minimum clique weight heuristic # (default is that each node is given the same weight): tuG1 <- triang(uG1) ## Same as triang_mcwh(uG1) ## Alternative: Triangulation from a desired elimination order # (default is that the order is order of the nodes in the graph): triang(uG1, control=list(method="elo")) ## Same as: triang_elo(uG1) ## More control: Define the number of levels for each node: tuG1 <- triang(uG1, control=list(method="mcwh", nLevels=c(2, 3, 2, 6, 4, 9))) tuG1 <- triang_mcwh(uG1, nLevels=c(2, 3, 2, 6, 4, 9)) tuG1 <- triang(uG1, control=list(method="elo", order=c("a", "e", "f"))) tuG1 <- triang_elo(uG1, order=c("a", "e", "f")) uG1 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a) tuG1 <- triangulate(uG1) ## adjacency matrix uG2 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a, result="matrix") tuG2 <- triangulate(uG2) ## adjacency matrix (sparse) uG2 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a, result="dgCMatrix") tuG2 <- triangulate(uG2)uG1 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a) uG2 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a, result="matrix") uG3 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a, result="dgCMatrix") ## Default triangulation: minimum clique weight heuristic # (default is that each node is given the same weight): tuG1 <- triang(uG1) ## Same as triang_mcwh(uG1) ## Alternative: Triangulation from a desired elimination order # (default is that the order is order of the nodes in the graph): triang(uG1, control=list(method="elo")) ## Same as: triang_elo(uG1) ## More control: Define the number of levels for each node: tuG1 <- triang(uG1, control=list(method="mcwh", nLevels=c(2, 3, 2, 6, 4, 9))) tuG1 <- triang_mcwh(uG1, nLevels=c(2, 3, 2, 6, 4, 9)) tuG1 <- triang(uG1, control=list(method="elo", order=c("a", "e", "f"))) tuG1 <- triang_elo(uG1, order=c("a", "e", "f")) uG1 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a) tuG1 <- triangulate(uG1) ## adjacency matrix uG2 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a, result="matrix") tuG2 <- triangulate(uG2) ## adjacency matrix (sparse) uG2 <- ug(~a:b + b:c + c:d + d:e + e:f + f:a, result="dgCMatrix") tuG2 <- triangulate(uG2)
This package provides a basis for graphical modelling in R and in particular for other graphical modelling packages, most notably gRim, gRain and gRc.
gRbase provides the following:
Implementation of various graph algorithms, including maximum
cardinality search, maximal prime subgraph decomposition,
triangulation. See the vignette graphs.
Implementation of various "high level" array operations, including
multiplication/division, marginalization, slicing, permutation. See
the vignette ArrayOps.
Implementation of various "low level" array operations. See
the vignette ArrayOpsPrim.
A collection of datasets
A general framework for setting up data and model structures and provide examples for fitting hierarchical log linear models for contingency tables and graphical Gaussian models for the multivariate normal distribution. (Notice: This last part is not maintained / developed further.)
Soren Hojsgaard, Department of Mathematical Sciences, Aalborg University, Denmark
Contributions from Claus Dethlefsen, Clive Bowsher, David Edwards.
Thanks to the other members of the gR initiative, in particular to David Edwards for providing functions for formula-manipulation.
Hojsgaard, S., Edwards, D., Lauritzen, S. (2012) Graphical models with R. Springer. ISBN: 978-1-4614-2298-3
Lauritzen, S. L. (2002). gRaphical Models in R. R News, 3(2)39.
compile and propagate are generic
functions which invoke particular methods which depend on the
class of the first argument
fit(object, ...) compile(object, ...) propagate(object, ...) stepwise(object, ...)fit(object, ...) compile(object, ...) propagate(object, ...) stepwise(object, ...)
object |
An object |
... |
Additional arguments which depends on the class of the object |
The value returned depends on the class of the first argument.
Søren Højsgaard, [email protected]
Højsgaard, Søren; Edwards, David; Lauritzen, Steffen (2012): Graphical Models with R, Springer
Various utility functions for gRbase. Includes 'faster versions' of certain standard R functions.
rhsFormula2list(form) rhsf2list(form) rhsf2vec(form) listify_dots(dots) list2rhsFormula(form) list2rhsf(form) rowmat2list(X) colmat2list(X) matrix2list(X, byrow = TRUE) which.arr.index(X) which_matrix_index(X) rowSumsPrim(X) colSumsPrim(X) colwiseProd(v, X) lapplyV2I(setlist, item) lapplyI2V(setlist, item)rhsFormula2list(form) rhsf2list(form) rhsf2vec(form) listify_dots(dots) list2rhsFormula(form) list2rhsf(form) rowmat2list(X) colmat2list(X) matrix2list(X, byrow = TRUE) which.arr.index(X) which_matrix_index(X) rowSumsPrim(X) colSumsPrim(X) colwiseProd(v, X) lapplyV2I(setlist, item) lapplyI2V(setlist, item)
form |
Formula specification (a right-hand sided formula, a numeric/character vector or a list of vectors). |
dots |
dot-arguments to be turned into a list |
X |
A matrix. |
byrow |
Should the split be by row or by column. |
v |
A vector. |
setlist |
A list of atomic vectors |
item |
An atomic vector |
which.arr.ind: Returns matrix n x 2 matrix with
indices of non-zero entries in matrix X. Notice
which_matrix_index__ is cpp implementation.
colwiseProd: multiplies a vector v and a matrix X
columnwise (as opposed to rowwise which is achieved by
v * X). Hence colwiseProd does the same as
t(v * t(X)) - but it does so faster for numeric values.
lapplyV2I: same as but much faster than lapply(setlist, function(elt) match(elt, item))
lapplyI2V: same as but faster than lapply(setlist, function(elt) item[elt])
Søren Højsgaard, [email protected]
## colwiseProd X <- matrix(1:16, nrow=4) v <- 1:4 t(v * t(X)) colwiseProd(v, X) ## Not run: system.time(for (ii in 1:100000) t(v * t(X))) system.time(for (ii in 1:100000) colwiseProd(v, X)) ## End(Not run) setlist <- list(c(1,2,3), c(2,3,4), c(2,4,5)) item <- c(2,3) lapplyV2I(setlist, item) lapply(setlist, function(gg) match(gg, item)) lapplyI2V(setlist, item) lapply(setlist, function(x) item[x]) if (require(microbenchmark)){ microbenchmark( lapplyV2I(setlist, item), lapply(setlist, function(elt) match(elt, item))) microbenchmark::microbenchmark( lapplyI2V(setlist, item), lapply(setlist, function(elt) item[elt])) }## colwiseProd X <- matrix(1:16, nrow=4) v <- 1:4 t(v * t(X)) colwiseProd(v, X) ## Not run: system.time(for (ii in 1:100000) t(v * t(X))) system.time(for (ii in 1:100000) colwiseProd(v, X)) ## End(Not run) setlist <- list(c(1,2,3), c(2,3,4), c(2,4,5)) item <- c(2,3) lapplyV2I(setlist, item) lapply(setlist, function(gg) match(gg, item)) lapplyI2V(setlist, item) lapply(setlist, function(x) item[x]) if (require(microbenchmark)){ microbenchmark( lapplyV2I(setlist, item), lapply(setlist, function(elt) match(elt, item))) microbenchmark::microbenchmark( lapplyI2V(setlist, item), lapply(setlist, function(elt) item[elt])) }
These functions are not intended to be called directly.
This is the Prostate Tumor Gene Expression dataset used in Chung and Keles (2010).
data(prostate)data(prostate)
A list with two components:
Gene expression data. A matrix with 102 rows and 6033 columns.
Class index. A vector with 102 elements.
The prostate dataset consists of 52 prostate tumor and 50 normal samples.
Normal and tumor classes are coded in 0 and 1, respectively, in y vector.
Matrix x is gene expression data and
arrays were normalized, log transformed, and standardized
to zero mean and unit variance across genes as described
in Dettling (2004) and Dettling and Beuhlmann (2002).
See Chung and Keles (2010) for more details.
Singh D, Febbo P, Ross K, Jackson D, Manola J, Ladd C, Tamayo P, Renshaw A, DAmico A, Richie J, Lander E, Loda M, Kantoff P, Golub T, and Sellers W (2002), "Gene expression correlates of clinical prostate cancer behavior", Cancer Cell, Vol. 1, pp. 203–209.
Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.
Dettling M (2004), "BagBoosting for tumor classification with gene expression data", Bioinformatics, Vol. 20, pp. 3583–3593.
Dettling M and Beuhlmann P (2002), "Supervised clustering of genes", Genome Biology, Vol. 3, pp. research0069.1–0069.15.
data(prostate) prostate$x[1:5,1:5] prostate$ydata(prostate) prostate$x[1:5,1:5] prostate$y
Set operations for gRbase and related packages.
maximal_sets(setlist, index = FALSE) minimal_sets(setlist, index = FALSE) remove_redundant(setlist, maximal = TRUE, index = FALSE) is_inset(x, setlist, index = FALSE) filter_maximal_vectors(setlist, index = FALSE) get_subset(x, setlist, all = FALSE) get_superset(x, setlist, all = FALSE) is_subsetof(set, set2) is.subsetof(x, set) subsetof(x, set)maximal_sets(setlist, index = FALSE) minimal_sets(setlist, index = FALSE) remove_redundant(setlist, maximal = TRUE, index = FALSE) is_inset(x, setlist, index = FALSE) filter_maximal_vectors(setlist, index = FALSE) get_subset(x, setlist, all = FALSE) get_superset(x, setlist, all = FALSE) is_subsetof(set, set2) is.subsetof(x, set) subsetof(x, set)
setlist |
List of vectors (representing a set of subsets) |
index |
Logical; should indices (in setlist) be returned or a set of subsets. |
maximal |
Logical; see section 'Details' for a description. |
x, set, set2
|
Vector representing a set. |
all |
Logical; see section 'Details' for a description. |
'setlist' is a list of vectors representing a set of subsets; i.e. V1,...VQ where Vk is a subset of some base set V.
'all' If true, get_superset will return index of all
vectors containing the element; otherwise only the first index is
returned.
is_inset: Checks if the set
x is in one of the Vk's.
remove_redundant: Returns those Vk which are not contained
in other subsets; i.e. gives the maximal sets. If maximal is FALSE
then returns the minimal sets; i.e. Vk is returned if Vk is
contained in one of the other sets Vl and there are no set Vn
contained in Vk.
Notice that the comparisons are made by turning the elements into characters and then comparing these. Hence 1 is identical to "1".
Søren Højsgaard, [email protected]
set <- list(c(1, 2), c(1, 2, 3), c(2, 3, 6), c(2, 4), c(5, 6), 5) el1 <- c(2, 1) el2 <- c(2, 3) el3 <- c(4, 3) el4 <- c(2, 1, 3) maximal_sets(set) minimal_sets(set) remove_redundant(set) remove_redundant(set, maximal=FALSE) is_inset(el1, set) is_inset(el2, set) is_inset(el3, set) get_subset(el1, set) get_subset(el1, set) get_subset(el2, set) get_subset(el3, set) get_superset(el1, set) get_superset(el1, set, all=TRUE) get_superset(el2, set) get_superset(el3, set) is_subsetof(el1, el1) is_subsetof(el1, el2) is_subsetof(el1, el4)set <- list(c(1, 2), c(1, 2, 3), c(2, 3, 6), c(2, 4), c(5, 6), 5) el1 <- c(2, 1) el2 <- c(2, 3) el3 <- c(4, 3) el4 <- c(2, 1, 3) maximal_sets(set) minimal_sets(set) remove_redundant(set) remove_redundant(set, maximal=FALSE) is_inset(el1, set) is_inset(el2, set) is_inset(el3, set) get_subset(el1, set) get_subset(el1, set) get_subset(el2, set) get_subset(el3, set) get_superset(el1, set) get_superset(el1, set, all=TRUE) get_superset(el2, set) get_superset(el3, set) is_subsetof(el1, el1) is_subsetof(el1, el2) is_subsetof(el1, el4)
An undirected graph G can be converted to a dag if G is chordal.
ug2dag(object)ug2dag(object)
object |
An igraph object. |