rm_nchar_words(text.var, n, trim = !extract, clean = TRUE, pattern = "@rm_nchar_words", replacement = "", extract = FALSE, dictionary = getOption("regex.library"), ...)
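trim: If TRUE, removes leading and trailing white spaces.
clean: If TRUE, extra white spaces and escaped characters will be removed.
pattern: The pattern (a regular expression, or a character string when fixed = TRUE) to be matched in the given character vector (see Details for additional information). The default, "@rm_nchar_words", uses the rm_nchar_words regex from the regular expression dictionary given by the dictionary argument.
replacement: Replacement for the matched pattern.
extract: If TRUE, the n letter words are extracted into a list of vectors.
dictionary: The dictionary of regular expressions that is searched when pattern begins with "@rm_".
...: Other arguments passed to gsub.

Remove/replace/extract words that are n letters in length (apostrophes not counted).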
The default regular expression used by rm_nchar_words counts letter length, not characters. This means that apostrophes are not included in the character count. This behavior can be altered (to include apostrophes in the character count) by using a secondary regular expression from the regex_usa data (or other dictionary) via pattern = "@rm_nchar_words2". See Examples for example usage.
The n letter/character word regular expression was taken from: http://stackoverflow.com/a/25243885/1000343
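The difference between counting letters and counting characters can be sketched in base R. This is only an illustration of the counting rule described above, not the regular expression that rm_nchar_words uses; the variable names (words, bare, n_letters, n_chars) are made up for this sketch, and it assumes words are separated by single spaces.

```r
# Illustration only: base R, no qdapRegex required.
x <- "This is Jon's dogs' 'bout there in a word Mike's re'y."

# Split on single spaces (an assumption of this sketch).
words <- strsplit(x, " ", fixed = TRUE)[[1]]

# Keep only letters and apostrophes, so the final "." is not counted.
bare <- gsub("[^[:alpha:]']", "", words)

# Letter count: apostrophes are ignored (the default behaviour described above).
n_letters <- nchar(gsub("'", "", bare, fixed = TRUE))

# Character count: apostrophes are included (the "@rm_nchar_words2" behaviour).
n_chars <- nchar(bare)

bare[n_letters == 4]
# [1] "This"  "Jon's" "dogs'" "'bout" "word"

bare[n_chars == 5]
# [1] "Jon's" "dogs'" "'bout" "there"
```

Under the letter rule "Jon's" is a 4 letter word; under the character rule it is a 5 character word, which is why the two patterns return different results for the same n.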
x <- "This is Jon's dogs' 'bout there in a word Mike's re'y."
rm_nchar_words(x, 4)
[1] "is there in a Mike's re'y."
rm_nchar_words(x, 4, extract=TRUE)
[[1]]
[1] "This"  "Jon's" "dogs'" "'bout" "word"

## Count characters (apostrophes and letters)
rm_nchar_words(x, 5, extract=TRUE, pattern = "@rm_nchar_words2")
[[1]]
[1] "Jon's" "dogs'" "'bout" "there"

## nchar range
rm_nchar_words(x, "1,2")
[1] "This Jon's dogs' 'bout there word Mike's re'y."

## Not run:
# ## Larger example
# library(qdap)
# rm_nchar_words(hamlet[["dialogue"]], 5, extract=TRUE)
## End(Not run)
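The removal behaviour shown in the examples, including the "1,2" range, can be approximated without the package. The helper below, drop_by_letters, is hypothetical and is not part of qdapRegex; it assumes single-space word separation and counts letters only, so it is a rough sketch of the idea rather than a replacement for the function.

```r
# Hypothetical helper (not part of qdapRegex): drop words whose letter
# count falls between lo and hi, inclusive.
drop_by_letters <- function(x, lo, hi = lo) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  # Count letters only; apostrophes and punctuation are ignored.
  n_letters <- nchar(gsub("[^[:alpha:]]", "", words))
  paste(words[n_letters < lo | n_letters > hi], collapse = " ")
}

x <- "This is Jon's dogs' 'bout there in a word Mike's re'y."

drop_by_letters(x, 4)
# [1] "is there in a Mike's re'y."

drop_by_letters(x, 1, 2)
# [1] "This Jon's dogs' 'bout there word Mike's re'y."
```

Unlike rm_nchar_words, this sketch does no whitespace cleaning or trimming beyond rejoining the kept words with single spaces.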
gsub, stri_extract_all_regex
Other rm_ functions: as_numeric, as_numeric2, rm_number; as_time, as_time2, rm_time, rm_transcript_time; rm_abbreviation; rm_angle, rm_bracket, rm_bracket_multiple, rm_curly, rm_round, rm_square; rm_between, rm_between_multiple; rm_caps_phrase; rm_caps; rm_citation_tex; rm_citation; rm_city_state_zip; rm_city_state; rm_date; rm_default; rm_dollar; rm_email; rm_emoticon; rm_endmark; rm_hash; rm_non_ascii; rm_non_words; rm_percent; rm_phone; rm_postal_code; rm_repeated_characters; rm_repeated_phrases; rm_repeated_words; rm_tag; rm_title_name; rm_twitter_url, rm_url; rm_white, rm_white_bracket, rm_white_colon, rm_white_comma, rm_white_endmark, rm_white_lead, rm_white_lead_trail, rm_white_multiple, rm_white_punctuation, rm_white_trail; rm_zip