word_stats(text.var, grouping.var = NULL, tot = NULL, parallel = FALSE, rm.incomplete = FALSE, digit.remove = FALSE, apostrophe.remove = FALSE, digits = 3, ...)
text.var: The text variable or a "word_stats" object (i.e., the output of a word_stats function).
grouping.var: The grouping variables. The default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
parallel: If TRUE, attempts to run the function on multiple cores. This may not yield a speed boost if you have one core or if the data set is small, because the cluster takes time to create (parallel is slower until approximately 10,000 rows). To reduce run time, pass a "word_stats" object to the word_stats function.
rm.incomplete: If TRUE, incomplete statements are removed from calculations in the output.
digit.remove: If TRUE, digits are removed before the output is calculated.
apostrophe.remove: If TRUE, apostrophes are removed before the output is calculated.
...: Other arguments passed to end_inc.
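The guidance on the parallel argument (little or no gain with a single core or with fewer than roughly 10,000 rows) can be turned into a small pre-check. The following is a minimal sketch in base R; the helper name use_parallel and its min_rows default are hypothetical and are not part of qdap.

```r
# Hypothetical helper (not part of qdap): decide whether parallel = TRUE
# is likely to pay off, using the rough 10,000-row guideline above.
use_parallel <- function(n_rows,
                         n_cores = parallel::detectCores(),
                         min_rows = 10000L) {
  # detectCores() can return NA on some platforms; treat that as "one core"
  !is.na(n_cores) && n_cores > 1L && n_rows >= min_rows
}

use_parallel(50000, n_cores = 4)  # large data, several cores
use_parallel(500, n_cores = 4)    # too few rows to offset cluster start-up
use_parallel(50000, n_cores = 1)  # a single core cannot benefit
```

The result could then be supplied as the parallel argument, for example parallel = use_parallel(nrow(dat)), where dat is whatever data frame holds the transcript.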
Returns a list of descriptive word statistics:
ts: A data frame of descriptive word statistics by row
gts: A data frame of word/sentence statistics per grouping variable
mpun: An account of sentences with an improper/missing end mark
word.elem: A data frame with word element columns from gts
sent.elem: A data frame with sentence element columns from gts
omit: Counter of omitted sentences for internal use (only included if some rows contained missing values)
percent: The value of percent used for plotting purposes
zero.replace: The value of zero.replace used for plotting purposes
digits: Integer value of the number of digits to display; mostly internal use
Apply descriptive word statistics to a transcript.
Note that a sentence is classified with only one end mark. An imperative sentence is classified only as imperative (not as a state, quest, or exclm as well). If a sentence is both imperative and incomplete, it is counted as incomplete rather than imperative.
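The one-label-per-sentence rule can be illustrated with a short base R sketch. It is not qdap's implementation: the helper classify_sentence is hypothetical, and the end-mark conventions it uses ("|" for incomplete, "*" immediately before the end mark for imperative) are assumptions made for the illustration.

```r
# Hypothetical illustration, not qdap's code. Assumed end marks:
#   "."  statement   "?"  question   "!"  exclamation
#   "|"  incomplete  "*" before the end mark  imperative
classify_sentence <- function(x) {
  x <- trimws(x)
  if (is.na(x) || !nzchar(x)) return(NA_character_)
  # Incomplete is checked first, so an imperative that is also
  # incomplete ("*|") is counted as incomplete only.
  if (grepl("\\|$", x)) return("incom")
  # Imperative is checked before the plain marks, so "*." is not
  # also counted as a statement.
  if (grepl("\\*[.?!]$", x)) return("imper")
  if (grepl("\\?$", x)) return("quest")
  if (grepl("!$", x)) return("exclm")
  if (grepl("\\.$", x)) return("state")
  "missing"
}

sents <- c("I ran.", "Who ran?", "Run!", "Run now*.",
           "I was going to|", "Run if you*|", "no end mark")
labels <- vapply(sents, classify_sentence, character(1), USE.NAMES = FALSE)
table(labels)
```

Because the checks run in a fixed order and return on the first match, each sentence receives exactly one label.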
It is assumed the user has run sentSplit on their data, otherwise some counts may not be accurate.
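One rough way to spot text that has not been split into sentences is to count end-mark runs per row. This is a base R sketch only: count_end_marks is a hypothetical helper, the set of end marks (".", "?", "!", "|") is an assumption, and abbreviations such as "Dr." will be over-counted.

```r
# Hypothetical check, not part of qdap: count runs of end marks per row.
# A count above 1 suggests a row holds several sentences; a count of 0
# suggests a missing end mark.
count_end_marks <- function(x) {
  hits <- gregexpr("[.?!|]+", x)
  # gregexpr() returns -1 when nothing matches, so count positions > 0
  vapply(hits, function(m) sum(m > 0), integer(1))
}

rows <- c("I went home.", "Stop! Who goes there?", "no mark")
n_marks <- count_end_marks(rows)
rows[n_marks != 1]  # rows worth inspecting before computing statistics
```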
## Not run:
word_stats(mraja1spl$dialogue, mraja1spl$person)

(desc_wrds <- with(mraja1spl, word_stats(dialogue, person, tot = tot)))

## Recycle for speed boost
with(mraja1spl, word_stats(desc_wrds, person, tot = tot))

scores(desc_wrds)
counts(desc_wrds)
htruncdf(counts(desc_wrds), 15, 6)
plot(scores(desc_wrds))
plot(counts(desc_wrds))

names(desc_wrds)
htruncdf(desc_wrds$ts, 15, 5)
htruncdf(desc_wrds$gts, 15, 6)
desc_wrds$mpun
desc_wrds$word.elem
desc_wrds$sent.elem
plot(desc_wrds)
plot(desc_wrds, label=TRUE, lab.digits = 1)

## Correlation Visualization
qheat(cor(scores(desc_wrds)[, -1]), diag.na = TRUE, by.column = NULL,
    low = "yellow", high = "red", grid = FALSE)

## Parallel (possible speed boost)
with(mraja1spl, word_stats(dialogue, list(sex, died, fam.aff)))
with(mraja1spl, word_stats(dialogue, list(sex, died, fam.aff),
    parallel = TRUE))

## Recycle for speed boost
word_stats(desc_wrds, mraja1spl$sex)
## End(Not run)
See also: end_inc