sentSplit. qdap 2.2.0

Usage

sentSplit(dataframe, text.var, rm.var = NULL, endmarks = c("?", ".", "!", "|"), incomplete.sub = TRUE, rm.bracket = TRUE, stem.col = FALSE, text.place = "right", verbose = is.global(2), ...)
sentCombine(text.var, grouping.var = NULL, as.list = FALSE)
TOT(tot)
sent_detect(text.var, endmarks = c("?", ".", "!", "|"), incomplete.sub = TRUE, rm.bracket = TRUE, ...)

Arguments

dataframe: A dataframe that contains the person and text variables.
text.var: The text variable.
rm.var: An optional character vector of length 1 or 2 naming the variables that are repeated measures (this restarts the "tot" column).
endmarks: A character vector of endmarks to split turns of talk into sentences.
incomplete.sub: logical. If TRUE, detects incomplete sentences and marks them with "|".
rm.bracket: logical. If TRUE, removes brackets from the text.
stem.col: logical. If TRUE, stems the text as a new column.
text.place: A character string giving placement location of the text column. This must be one of the strings "original", "right" or "left".
verbose: logical. If TRUE, select diagnostics from check_text are reported.
grouping.var: The grouping variables. Default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
as.list: logical. If TRUE returns the output as a list. If FALSE the output is returned as a dataframe.
tot: A tot column from a sentSplit output.
...: Additional options passed to stem2df.

Sentence Splitting

Value

sentSplit - returns a dataframe with each turn of talk broken apart into sentences. Optionally, a stemmed version of the text variable may be returned as well.

sentCombine - returns a list of vectors with the continuous sentences by grouping.var pasted together.

TOT - returns a numeric vector of the turns of talk without sentence sub-indexing (e.g. 3.2 becomes 3).

sent_detect - returns a character vector of sentences split on endmark.
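
The relationship between a "tot" value and its turn of talk index can be illustrated with a short base R sketch. This is not qdap's implementation of TOT; `turn_index` is a hypothetical helper, and the sketch assumes tot values are written as "turn.sentence" strings such as "3.2".

```r
# Hypothetical helper, not qdap's TOT(): drop the sentence sub-index.
# Assumes each tot value looks like "turn.sentence", e.g. "3.2".
turn_index <- function(tot) {
  # Remove everything from the first "." onward, then convert to numeric
  as.numeric(sub("\\..*$", "", as.character(tot)))
}

# Two sentences in turn 1, one in turn 2, two in turn 3
turn_index(c("1.1", "1.2", "2.1", "3.1", "3.2"))
```

Values without a "." pass through unchanged apart from the numeric conversion.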

Description

sentSplit - Splits turns of talk into individual sentences (provided proper punctuation is used). This procedure is usually done as part of the data read in and cleaning process.

sentCombine - Combines sentences by the same grouping variable together.

TOT - Converts the tot column from sentSplit to a turn of talk index (no sentence sub-index). Generally for internal use.

sent_detect - Detect and split sentences on endmark boundaries.
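
As a rough illustration of what splitting on endmark boundaries means, the following base R sketch breaks a string after each default endmark. It is only an approximation under simple assumptions (each endmark is followed by whitespace, and no abbreviations remain in the text); it is not how qdap implements sent_detect, and `naive_sent_split` is a hypothetical name.

```r
# Illustrative only: a base R approximation, not qdap's sent_detect().
# `naive_sent_split` is a hypothetical helper name.
naive_sent_split <- function(x) {
  # Mark each boundary: an endmark (? . ! |) followed by whitespace
  marked <- gsub("([?.!|])\\s+", "\\1\n", x)
  # Split on the inserted newline and trim stray whitespace
  trimws(unlist(strsplit(marked, "\n", fixed = TRUE)))
}

naive_sent_split("I am fine. Are you? I thought I|")
```

Because the sketch splits on every period followed by a space, text such as "Dr. Smith" would be split incorrectly, which is why abbreviations need to be handled before splitting.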

Warning

sentSplit requires the dialogue (text) column to be cleaned in a particular way. The data should contain qdap punctuation marks (c("?", ".", "!", "|")) at the end of each sentence. Additionally, extraneous punctuation, such as the periods in abbreviations, should be removed (see replace_abbreviation). Trailing sentences such as "I thought I..." will be treated as incomplete and marked with "|" to denote an incomplete/trailing sentence.

Suggestion

It is recommended that the user run check_text on the text column of sentSplit's output.

Examples

## Not run:
# ## `sentSplit` EXAMPLE:
# (out <- sentSplit(DATA, "state"))
# out %&% check_text()  ## check output text
# sentSplit(DATA, "state", stem.col = TRUE)
# sentSplit(DATA, "state", text.place = "left")
# sentSplit(DATA, "state", text.place = "original")
# sentSplit(raj, "dialogue")[1:20, ]
# 
# ## plotting
# plot(out)
# plot(out, grouping.var = "person")
# 
# out2 <- sentSplit(DATA2, "state", rm.var = c("class", "day"))
# plot(out2)
# plot(out2, grouping.var = "person")
# plot(out2, grouping.var = "person", rm.var = "day")
# plot(out2, grouping.var = "person", rm.var = c("day", "class"))
# 
# ## `sentCombine` EXAMPLE:
# dat <- sentSplit(DATA, "state")
# sentCombine(dat$state, dat$person)
# truncdf(sentCombine(dat$state, dat$sex), 50)
# 
# ## `TOT` EXAMPLE:
# dat <- sentSplit(DATA, "state")
# TOT(dat$tot)
# 
# ## `sent_detect`
# sent_detect(DATA$state)
## End(Not run)

Author

Dason Kurkiewicz and Tyler Rinker.
