sentSplit(dataframe, text.var, rm.var = NULL, endmarks = c("?", ".", "!", "|"),
  incomplete.sub = TRUE, rm.bracket = TRUE, stem.col = FALSE,
  text.place = "right", verbose = is.global(2), ...)

sentCombine(text.var, grouping.var = NULL, as.list = FALSE)

TOT(tot)

sent_detect(text.var, endmarks = c("?", ".", "!", "|"), incomplete.sub = TRUE,
  rm.bracket = TRUE, ...)
incomplete.sub - If TRUE, detects incomplete sentences and replaces with "|".
rm.bracket - If TRUE, removes brackets from the text.
stem.col - If TRUE, stems the text as a new column.
text.place - One of "original", "right" or "left".
verbose - If TRUE, select diagnostics from check_text are reported.
grouping.var - If NULL, generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
as.list - If TRUE, returns the output as a list. If FALSE, the output is returned as a dataframe.
tot - The tot column from sentSplit output.
... - Passed to stem2df.

sentSplit - returns a dataframe with turn of talk broken apart into sentences. Optionally a stemmed version of the text variable may be returned as well.
sentCombine - returns a list of vectors with the continuous sentences by grouping.var pasted together.
TOT - returns a numeric vector of the turns of talk without sentence sub indexing (e.g. 3.2 becomes 3).
sent_detect - returns a character vector of sentences split on endmark.
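
The pasting behavior described for sentCombine can be illustrated with a minimal base R sketch. This is not qdap's implementation: the helper name combine_runs is hypothetical, the data are made up, and the sketch returns a named character vector rather than the list described above. It only shows the idea of pasting together consecutive sentences that share a grouping value.

```r
# Hypothetical helper (an assumption, not part of qdap): paste together
# consecutive sentences that share the same grouping value.
combine_runs <- function(text, group) {
  runs <- rle(as.character(group))                       # consecutive runs of one group
  run_id <- rep(seq_along(runs$lengths), runs$lengths)   # run number for each sentence
  combined <- tapply(text, run_id, paste, collapse = " ")
  out <- as.vector(combined)
  names(out) <- runs$values                              # label each run with its group
  out
}

# Made-up example data
sentences <- c("Hello there.", "How are you?", "Fine.", "Good to hear!")
speaker   <- c("sam", "sam", "greg", "sam")

combined <- combine_runs(sentences, speaker)
print(combined)
```

Note that the second "sam" run stays separate from the first because another speaker intervenes, which is one reading of "continuous sentences by grouping.var".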
sentSplit - Splits turns of talk into individual sentences (provided proper punctuation is used). This procedure is usually done as part of the data read-in and cleaning process.
sentCombine - Combines sentences by the same grouping variable together.
TOT - Converts the tot column from sentSplit to a turn of talk index (no sub sentence). Generally, for internal use.
sent_detect - Detects and splits sentences on endmark boundaries.
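
To make the splitting and indexing ideas concrete, here is a rough base R sketch. It is an illustration under assumptions, not qdap's code: split_sentences is a hypothetical helper, the "turn.sentence" index format is inferred from the "3.2 becomes 3" example, and the real functions also handle brackets, incomplete sentences, and diagnostics.

```r
# Hypothetical helper (an assumption, not qdap's sent_detect): split each
# string after an endmark that is followed by whitespace.
split_sentences <- function(x, endmarks = c("?", ".", "!", "|")) {
  # Character class such as [?.!|]; these characters are literal inside [ ]
  cls <- paste0("[", paste(endmarks, collapse = ""), "]")
  pattern <- paste0("(?<=", cls, ")\\s+")   # lookbehind keeps the endmark
  strsplit(x, pattern, perl = TRUE)
}

# Made-up turns of talk
turns <- c("Hi there. How are you?", "Fine!")
sents <- split_sentences(turns)

# Build a "turn.sentence" index (assumed format), e.g. "1.2" is the
# second sentence of the first turn
n   <- lengths(sents)
tot <- paste(rep(seq_along(turns), n), sequence(n), sep = ".")

# Recover the turn of talk by dropping the sentence sub-index
turn <- as.numeric(sub("\\..*$", "", tot))

print(data.frame(tot = tot, turn = turn, sentence = unlist(sents)))
```

The index is handled as a string before conversion so that a sub-index such as "1.10" is not confused with "1.1".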
sentSplit requires the dialogue (text) column to be cleaned in a particular way. The data should contain qdap punctuation marks (c("?", ".", "!", "|")) at the end of each sentence. Additionally, extraneous punctuation, such as the periods in abbreviations, should be removed (see replace_abbreviation).

Trailing sentences such as "I thought I..." will be treated as incomplete and marked with "|" to denote an incomplete/trailing sentence.
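
The marking of trailing sentences can be sketched in base R as follows. This is a simplified illustration, not qdap's incomplete_replace: mark_incomplete is a hypothetical name, and the sketch only treats a run of two or more periods at the end of a string as a trailing sentence.

```r
# Hypothetical helper (an assumption, not part of qdap): replace a trailing
# run of two or more periods with the incomplete-sentence marker "|".
mark_incomplete <- function(x) {
  sub("\\.{2,}[[:space:]]*$", "|", x)
}

# Made-up example: only the first element trails off
marked <- mark_incomplete(c("I thought I...", "I see."))
print(marked)
```

A single closing period is left alone, so complete sentences keep their original endmark.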

It is recommended that the user run check_text on the text column of the sentSplit output.
## Not run:
## `sentSplit` EXAMPLE:
(out <- sentSplit(DATA, "state"))
out %&% check_text() ## check output text
sentSplit(DATA, "state", stem.col = TRUE)
sentSplit(DATA, "state", text.place = "left")
sentSplit(DATA, "state", text.place = "original")
sentSplit(raj, "dialogue")[1:20, ]

## plotting
plot(out)
plot(out, grouping.var = "person")

out2 <- sentSplit(DATA2, "state", rm.var = c("class", "day"))
plot(out2)
plot(out2, grouping.var = "person")
plot(out2, grouping.var = "person", rm.var = "day")
plot(out2, grouping.var = "person", rm.var = c("day", "class"))

## `sentCombine` EXAMPLE:
dat <- sentSplit(DATA, "state")
sentCombine(dat$state, dat$person)
truncdf(sentCombine(dat$state, dat$sex), 50)

## `TOT` EXAMPLE:
dat <- sentSplit(DATA, "state")
TOT(dat$tot)

## `sent_detect`
sent_detect(DATA$state)
## End(Not run)
bracketX, incomplete_replace, stem2df, TOT