word_proximity(text.var, terms, grouping.var = NULL, parallel = TRUE,
  cores = parallel::detectCores()/2)

## weight method for word_proximity
weight(x, type = "scale", ...)
grouping.var - NULL (the default) generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
parallel - If TRUE, attempts to run the function on multiple cores. Note that this may not give a speed boost if you have only one core or if the data set is small, because the cluster takes time to create.
cores - The number of cores to use if parallel = TRUE. Default is half the number of available cores.
type - One of "scale_log", "scale", "rev_scale", "rev_scale_log", "log", "sqrt", "scale_sqrt", "rev_sqrt", "rev_scale_sqrt". The sections of the weight type name (i.e., A_B_C, where A, B, and C are sections) determine what action will occur: log will use log, sqrt will use sqrt, and scale will standardize the values. rev will multiply by -1 to give the inverse sign. This enables a comparison similar to correlations rather than distance.

Returns a list of matrices of proximity measures in the unit of average sentences between words (defaults to scaled).
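The way a weight type name is read section by section can be sketched in base R. This is only an illustration: apply_weight_sketch is a hypothetical helper, not the package's weight method, and both the order of operations (transform, then scale, then reverse) and standardizing over the whole matrix are assumptions that the text above does not confirm.

```r
# Hypothetical helper, NOT the package's implementation: applies the sections
# of a weight type name (e.g. "rev_scale_log") to a numeric matrix.
# Assumed order: log/sqrt transform, then standardize, then reverse the sign.
apply_weight_sketch <- function(m, type = "scale") {
  sections <- strsplit(type, "_", fixed = TRUE)[[1]]
  if ("log" %in% sections) m <- log(m)    # note: log(0) is -Inf
  if ("sqrt" %in% sections) m <- sqrt(m)
  if ("scale" %in% sections) {
    # Assumption: standardize over all cells of the matrix at once
    m <- (m - mean(m, na.rm = TRUE)) / sd(m, na.rm = TRUE)
  }
  if ("rev" %in% sections) m <- -1 * m
  m
}

# Made-up proximity values for illustration only
m <- matrix(c(1, 4, 9, 16), nrow = 2,
            dimnames = list(c("a", "b"), c("a", "b")))

apply_weight_sketch(m, "sqrt")
apply_weight_sketch(m, "rev_scale_log")
```

With a rev type, larger values indicate words that sit closer together, which is what makes the result read more like a correlation than a distance.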
word_proximity - Generate proximity measures to ascertain a mean distance measure between word uses.
weight - weight method for word_proximity.
Note that row names are the first word and column names are the second comparison word. The values for Word A compared to Word B will not be the same as Word B compared to Word A. This is because, unlike a true distance measure, word_proximity's matrix is asymmetrical. word_proximity computes the distance by taking each sentence position for Word A and comparing it to the nearest sentence location for Word B.
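A minimal base R sketch of the nearest-sentence comparison described above shows why the result is asymmetrical. The helper mean_nearest_distance and the sentence positions are hypothetical, made up for illustration; taking the mean follows the "mean distance measure" wording of this page, but this is not the package's internal code.

```r
# Hypothetical helper, NOT the package's implementation: mean distance, in
# sentences, from each occurrence of word A to the nearest occurrence of word B.
mean_nearest_distance <- function(pos_a, pos_b) {
  mean(vapply(pos_a, function(p) min(abs(p - pos_b)), numeric(1)))
}

# Made-up sentence positions for illustration only
pos_a <- c(1, 2, 10)  # sentences containing word A
pos_b <- c(2, 3)      # sentences containing word B

mean_nearest_distance(pos_a, pos_b)  # A compared to B: (1 + 0 + 7) / 3
mean_nearest_distance(pos_b, pos_a)  # B compared to A: (0 + 1) / 2
```

The occurrence of word A in sentence 10 is far from every occurrence of word B, so it raises the A to B value, while no occurrence of word B is far from word A.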
The terms argument is character sensitive. Spacing is an important way to grab specific words and requires careful thought. Using "read" will find the words "bread", "read", "reading", and "ready". If you want to search for just the word "read", you'd supply a vector of c(" read ", " reads", " reading", " reader").
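The effect of spacing can be illustrated with plain substring matching in base R. This sketch assumes the matching behaves like fixed-string search on text padded with spaces; it uses grepl rather than any function from the package.

```r
# Illustration with base R only; an assumption about the matching behavior,
# not the package's internal code.
words <- c("bread", "read", "reading", "ready")
padded <- paste0(" ", words, " ")  # pad each word with a space on both sides

grepl("read", padded, fixed = TRUE)    # TRUE  TRUE  TRUE  TRUE
grepl(" read ", padded, fixed = TRUE)  # FALSE TRUE  FALSE FALSE
grepl(" read", padded, fixed = TRUE)   # FALSE TRUE  TRUE  TRUE
```

A leading space alone excludes "bread" but still matches "reading" and "ready", which is why the trailing space matters when only the standalone word is wanted.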
## Not run:
# wrds <- word_list(pres_debates2012$dialogue,
#     stopwords = c("it's", "that's", Top200Words))
# wrds2 <- tolower(sort(wrds$rfswl[[1]][, 1]))
#
# (x <- with(pres_debates2012, word_proximity(dialogue, wrds2)))
# plot(x)
# plot(weight(x))
# plot(weight(x, "rev_scale_log"))
#
# (x2 <- with(pres_debates2012, word_proximity(dialogue, wrds2, person)))
#
# ## The spaces around `terms` are important
# (x3 <- with(DATA, word_proximity(state, spaste(qcv(the, i)))))
# (x4 <- with(DATA, word_proximity(state, qcv(the, i))))
## End(Not run)