rm_nchar_words. qdapRegex 0.5.0

Usage

rm_nchar_words(text.var, n, trim = !extract, clean = TRUE, pattern = "@rm_nchar_words", replacement = "", extract = FALSE, dictionary = getOption("regex.library"), ...)

Arguments

text.var: The text variable.
n: The number of letters counted in the word. A range may be supplied as a character string, e.g. "1,2" (see Examples).
trim: logical. If TRUE removes leading and trailing white spaces.
clean: logical. If TRUE extra white spaces and escaped characters will be removed.
pattern: A character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector (see Details for additional information). The default, "@rm_nchar_words", uses the rm_nchar_words regex from the regular expression dictionary supplied via the dictionary argument.
replacement: Replacement for matched pattern.
extract: logical. If TRUE the n letter words are extracted into a list of vectors.
dictionary: A dictionary of canned regular expressions to search within if pattern begins with "@rm_".
...: Other arguments passed to gsub.

Value

Returns a character string with n letter words removed. If extract = TRUE, the n letter words are returned as a list of vectors instead.

Description

Remove/replace/extract words that are n letters in length (apostrophes not counted).

Details

The default regular expression used by rm_nchar_words counts letters, not characters. This means that apostrophes are not included in the count. This behavior can be altered (to include apostrophes in the character count) by using a secondary regular expression from the regex_usa data (or other dictionary) via pattern = "@rm_nchar_words2". See Examples for example usage.
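The distinction between letter count and character count can be sketched in base R. This is only an illustration of the two counting rules; it does not reproduce the regular expressions stored in the dictionary, and the variable names (words, n_letters, n_chars) are made up for this sketch.

```r
# Illustrative only: base R, no qdapRegex required.
# The words are taken from the example sentence in Examples.
words <- c("This", "Jon's", "dogs'", "'bout", "there", "Mike's", "re'y")

# Letter count: apostrophes are dropped before counting,
# mirroring the rule described for the default pattern.
n_letters <- nchar(gsub("'", "", words, fixed = TRUE))

# Character count: apostrophes are included,
# mirroring the rule described for "@rm_nchar_words2".
n_chars <- nchar(words)

# Side-by-side comparison of the two counts
data.frame(word = words, letters = n_letters, chars = n_chars)

# Words that count as 4 letters under the letter rule
words[n_letters == 4]

# Words that count as 5 characters under the character rule
words[n_chars == 5]
```

Under the letter rule "Jon's" counts as 4, while under the character rule it counts as 5, which is why it appears in both extract examples in Examples with different values of n.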

References

The n letter/character word regular expression was taken from: http://stackoverflow.com/a/25243885/1000343

Examples

x <- "This is Jon's dogs' 'bout there in a word Mike's re'y."
rm_nchar_words(x, 4)

[1] "is there in a Mike's re'y."

rm_nchar_words(x, 4, extract=TRUE)

[[1]]
[1] "This"  "Jon's" "dogs'" "'bout" "word" 



## Count characters (apostrophes and letters)
rm_nchar_words(x, 5, extract=TRUE, pattern = "@rm_nchar_words2")

[[1]]
[1] "Jon's" "dogs'" "'bout" "there"



## nchar range
rm_nchar_words(x, "1,2")

[1] "This Jon's dogs' 'bout there word Mike's re'y."


## Not run: 
# ## Larger example
# library(qdap)
# rm_nchar_words(hamlet[["dialogue"]], 5, extract=TRUE)
## End(Not run)

See also

gsub, stri_extract_all_regex

Other rm_ functions: as_numeric, as_numeric2, rm_number; as_time, as_time2, rm_time, rm_transcript_time; rm_abbreviation; rm_angle, rm_bracket, rm_bracket_multiple, rm_curly, rm_round, rm_square; rm_between, rm_between_multiple; rm_caps_phrase; rm_caps; rm_citation_tex; rm_citation; rm_city_state_zip; rm_city_state; rm_date; rm_default; rm_dollar; rm_email; rm_emoticon; rm_endmark; rm_hash; rm_non_ascii; rm_non_words; rm_percent; rm_phone; rm_postal_code; rm_repeated_characters; rm_repeated_phrases; rm_repeated_words; rm_tag; rm_title_name; rm_twitter_url, rm_url; rm_white, rm_white_bracket, rm_white_colon, rm_white_comma, rm_white_endmark, rm_white_lead, rm_white_lead_trail, rm_white_multiple, rm_white_punctuation, rm_white_trail; rm_zip

Author

Stack Overflow's CharlieB and Tyler Rinker.

Remove/Replace/Extract N Letter Words