cm_distance(dataframe, pvals = c(TRUE, FALSE), replications = 1000, parallel = TRUE, extended.output = TRUE, time.var = TRUE, code.var = "code", causal = FALSE, start.var = "start", end.var = "end", cores = detectCores()/2)
dataframe: A data frame from one of the long-format conversion functions
(cm_range2long; cm_df2long; cm_time2long).
pvals: If the first element is TRUE, pvalues will be calculated for the
combined (main) output for all repeated measures from simulated resampling
of the data. If the second element is TRUE, pvalues will be calculated for
the individual (extended) repeated measures output from simulated resampling
of the data. Default is to calculate pvalues for the main output but not for
the extended output. This process involves repeated resampling of the data
and is time consuming. It may take from a few minutes to days to calculate
the pvalues, depending on the number of all codes used, the number of
different codes, and the number of replications.
replications: The number of replications used in resampling when pvals is
TRUE. It is recommended that this value be no lower than 1000. Failure to
use enough replications may result in unreliable pvalues.
parallel: If TRUE, runs cm_distance on multiple cores (if available). This
will generally be effective with most data sets, given there are repeated
measures, because of the large number of simulations. Default uses 1/2 of
the available cores.
extended.output: If TRUE, the information on individual repeated measures is
calculated in addition to the aggregated repeated measures results for the
main output.
causal: If TRUE, measures the distance between x and y given that x must
precede y. That is, only those y_i that begin after the x_i has begun will
be considered, as it is assumed that x precedes y. If FALSE, x is not
assumed to precede y. The closest y_i (either its beginning or end) is
calculated to x_i (either its beginning or end).
cores: The number of cores to use if parallel = TRUE. Default is to use half
of the available cores.

An object of the class "cm_distance". This is a list with the following
components:
pvals: A logical indication of whether pvalues were calculated
replications: Integer value of number of replications used
extended.output: An optional list of individual repeated measures
information
main.output: A list of aggregated repeated measures information
adj.alpha: An adjusted alpha level (based on \alpha = .05) for the estimated
p-values using the upper end of the confidence interval around the p-values
Within the extended.output and main.output lists are the following items:
mean: A distance matrix of average distances between codes
sd: A matrix of standard deviations of distances between codes
n: A matrix of counts of distances between codes
stan.mean: A matrix of standardized values of distances between codes. The
closer a value is to zero, the more closely two codes relate.
pvalue: An optional matrix of simulated pvalues associated with the mean
distances
Generate distance measures to ascertain a mean distance measure between codes.
Note that row names are the first code and column names are the
second comparison code. The values for Code A compared to Code B will not be
the same as Code B compared to Code A. This is because, unlike a true
distance measure, cm_distance's matrix is asymmetrical. cm_distance
computes the distance by taking each span (start and end) for Code A and
comparing it to the nearest start or end for Code B.
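The asymmetry and the effect of the causal argument can be illustrated with a minimal base R sketch. The helper nearest_dist() and the toy spans are hypothetical and are not the package's implementation. The sketch assumes each code is a data frame with start and end columns, and it compares span endpoints only, ignoring overlap.

```r
# Hypothetical helper (not part of the package): mean distance from each
# span of code x to the nearest start or end of any span of code y.
nearest_dist <- function(x, y, causal = FALSE) {
  d <- vapply(seq_len(nrow(x)), function(i) {
    cand <- y
    if (causal) {
      # Keep only the y spans that begin after x_i has begun
      cand <- y[y$start > x$start[i], , drop = FALSE]
    }
    if (nrow(cand) == 0) return(NA_real_)  # no eligible y span for this x_i
    x_pts <- c(x$start[i], x$end[i])
    y_pts <- c(cand$start, cand$end)
    # Smallest gap between any endpoint of x_i and any endpoint of y
    min(abs(outer(x_pts, y_pts, "-")))
  }, numeric(1))
  mean(d, na.rm = TRUE)
}

# Toy spans (assumed data)
A <- data.frame(start = c(0, 20), end = c(2, 25))
B <- data.frame(start = 3, end = 10)

nearest_dist(A, B)                 # 5.5
nearest_dist(B, A)                 # 1
nearest_dist(B, A, causal = TRUE)  # 10
```

Swapping the two codes changes the result because each span of the first code looks for its own nearest neighbour in the second, which is why rows and columns of the matrix are not interchangeable.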
p-values are estimated and thus subject to error. More replications decrease the error. Use:
p \pm 1.96 \sqrt{\alpha (1 - \alpha) / n}
to adjust the confidence in the estimated p-values based on the number of replications, n.
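As a rough illustration of how the interval shrinks with more replications, the adjustment can be sketched in base R. The helper pval_bounds() is hypothetical and not part of the package. It assumes n is the number of replications and alpha = .05, and it clamps the bounds to [0, 1] as an added convenience.

```r
# Hypothetical helper: approximate bounds for a simulated p-value using
# p +/- 1.96 * sqrt(alpha * (1 - alpha) / n), with n assumed to be the
# number of replications.
pval_bounds <- function(p, n, alpha = 0.05) {
  half_width <- 1.96 * sqrt(alpha * (1 - alpha) / n)
  # Clamping to [0, 1] is an addition of this sketch
  c(lower = max(0, p - half_width), upper = min(1, p + half_width))
}

low_reps  <- pval_bounds(p = 0.04, n = 100)
high_reps <- pval_bounds(p = 0.04, n = 1000)

low_reps
high_reps
```

With 100 replications the upper bound is well above .05, while with 1000 replications it is much closer to the estimate, which is consistent with the recommendation to use at least 1000 replications.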
## Not run:
# foo <- list(
#     AA = qcv(terms="02:03, 05"),
#     BB = qcv(terms="1:2, 3:10"),
#     CC = qcv(terms="1:9, 100:150")
# )
#
# foo2 <- list(
#     AA = qcv(terms="40"),
#     BB = qcv(terms="50:90"),
#     CC = qcv(terms="60:90, 100:120, 150"),
#     DD = qcv(terms="")
# )
#
# (dat <- cm_2long(foo, foo2, v.name = "time"))
# plot(dat)
# (out <- cm_distance(dat, replications=100))
# names(out)
# names(out$main.output)
# out$main.output
# out$extended.output
# print(out, new.order = c(3, 2, 1))
# print(out, new.order = 3:2)
# #========================================
# x <- list(
#     transcript_time_span = qcv(00:00 - 1:12:00),
#     A = qcv(terms = "2.40:3.00, 6.32:7.00, 9.00,
#         10.00:11.00, 59.56"),
#     B = qcv(terms = "3.01:3.02, 5.01, 19.00, 1.12.00:1.19.01"),
#     C = qcv(terms = "2.40:3.00, 5.01, 6.32:7.00, 9.00, 17.01")
# )
# (dat <- cm_2long(x))
# plot(dat)
# (a <- cm_distance(dat, causal=TRUE, replications=100))
#
# ## Plotting as a network graph
# datA <- list(
#     A = qcv(terms="02:03, 05"),
#     B = qcv(terms="1:2, 3:10, 45, 60, 200:206, 250, 289:299, 330"),
#     C = qcv(terms="1:9, 47, 62, 100:150, 202, 260, 292:299, 332"),
#     D = qcv(terms="10:20, 30, 38:44, 138:145"),
#     E = qcv(terms="10:15, 32, 36:43, 132:140"),
#     F = qcv(terms="1:2, 3:9, 10:15, 32, 36:43, 45, 60, 132:140, 250, 289:299"),
#     G = qcv(terms="1:2, 3:9, 10:15, 32, 36:43, 45, 60, 132:140, 250, 289:299"),
#     H = qcv(terms="20, 40, 60, 150, 190, 222, 255, 277"),
#     I = qcv(terms="20, 40, 60, 150, 190, 222, 255, 277")
# )
#
# datB <- list(
#     A = qcv(terms="40"),
#     B = qcv(terms="50:90, 110, 148, 177, 200:206, 250, 289:299"),
#     C = qcv(terms="60:90, 100:120, 150, 201, 244, 292"),
#     D = qcv(terms="10:20, 30, 38:44, 138:145"),
#     E = qcv(terms="10:15, 32, 36:43, 132:140"),
#     F = qcv(terms="10:15, 32, 36:43, 132:140, 148, 177, 200:206, 250, 289:299"),
#     G = qcv(terms="10:15, 32, 36:43, 132:140, 148, 177, 200:206, 250, 289:299"),
#     I = qcv(terms="20, 40, 60, 150, 190, 222, 255, 277")
# )
#
# (datC <- cm_2long(datA, datB, v.name = "time"))
# plot(datC)
# (out2 <- cm_distance(datC, replications=1250))
#
# plot(out2)
# plot(out2, label.cex=2, label.dist=TRUE, digits=5)
## End(Not run)
See also: print.cm_distance