qdap

qdap (Quantitative Discourse Analysis Package) is an R package designed to assist in quantitative discourse analysis. It serves as a bridge between qualitative transcripts of dialogue and statistical analysis and visualization.

Java Issues

If the architecture of your Java installation does not match that of R (for example, 64-bit R with 32-bit Java), you will need to download the version of Java that matches the version of R you are using. For more, see Tal Galili's blog post on rJava issues.

Development Version

Download the development version of qdap here

Help topics

Project Template

A function to generate a project template of folders, scripts and documents.

new_project
Project Template

Import/Export Data

Functions for importing data and exporting output.

condense
Condense Dataframe Columns
dir_map
Map Transcript Files from a Directory to a Script
mcsv_r mcsv_w
Read/Write Multiple csv Files at a Time
read.transcript
Read Transcripts Into R
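
mcsv_r and mcsv_w read and write several csv files in one call. The following is a minimal Python sketch of that round trip, not qdap's R code. It assumes each table is a non-empty list of row dictionaries. The helper names write_many_csv and read_many_csv are hypothetical.

```python
import csv
import os
import tempfile  # only used by the usage example at the bottom


def write_many_csv(tables, directory):
    """Write each named table (a non-empty list of row dicts) to <name>.csv."""
    paths = []
    for name, rows in tables.items():
        path = os.path.join(directory, f"{name}.csv")
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths


def read_many_csv(paths):
    """Read csv files back into {file stem: list of row dicts}; values return as str."""
    tables = {}
    for path in paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        with open(path, newline="", encoding="utf-8") as fh:
            tables[stem] = list(csv.DictReader(fh))
    return tables


# Usage: round-trip two small (made-up) tables through a temporary directory
demo = read_many_csv(write_many_csv(
    {"turns": [{"person": "sam", "words": 5}], "codes": [{"code": "A", "start": 1}]},
    tempfile.mkdtemp(),
))
```

Numbers come back as strings because the csv format carries no type information.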

Cleaning/Parsing

Functions to clean and parse text data.

bracketX
Bracket Parsing
bracketXtract
Bracket Parsing
genX
Bracket Parsing
genXtract
Bracket Parsing
beg2char
Grab Begin/End of String to Character
char2end
Grab Begin/End of String to Character
capitalizer
Capitalize Select Words
check_spelling which_misspelled check_spelling_interactive
Check Spelling
correct
Check Spelling
check_text
Check Text For Potential Problems
clean
Remove Escaped Characters
comma_spacer
Ensure Space After Comma
incomplete_replace
Denote Incomplete End Marks With "|"
incomp
Denote Incomplete End Marks With "|"
multigsub mgsub
Multiple gsub
sub_holder
Multiple gsub
name2sex
Names to Gender
potential_NA
Search for Potential Missing Values
qprep
Quick Preparation of Text
replace_abbreviation
Replace Abbreviations
replace_contraction
Replace Contractions
replace_number
Replace Numbers With Text Representation
replace_ordinal
Replace Mixed Ordinal Numbers With Text Representation
replace_symbol
Replace Symbols With Word Equivalents
rm_row
Remove Rows That Contain Markers
rm_empty_row
Remove Rows That Contain Markers
rm_stopwords rm_stop %sw%
Remove Stop Words
scrubber
Clean Imported Text
space_fill
Replace Spaces
spaste
Add Leading/Trailing Spaces
stemmer stem_words stem2df
Stem Text
Trim
Remove Leading/Trailing White Space
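
Two common cleaning steps in this group are removing bracketed asides (bracketX) and applying several substitutions at once (mgsub). The following is a minimal Python sketch of both ideas, not qdap's R implementation. It assumes brackets are not nested. It also assumes substitutions are literal and applied sequentially, longest pattern first. The helper names are hypothetical, and qdap's own functions expose options that are not reproduced here.

```python
import re

# Non-nested patterns for each bracket type handled by this sketch
BRACKET_PATTERNS = {
    "round": r"\([^()]*\)",
    "square": r"\[[^\[\]]*\]",
    "curly": r"\{[^{}]*\}",
    "angle": r"<[^<>]*>",
}


def remove_brackets(text, kinds=("round", "square", "curly", "angle")):
    """Drop bracketed spans, then tidy the whitespace they leave behind."""
    for kind in kinds:
        text = re.sub(BRACKET_PATTERNS[kind], "", text)
    text = re.sub(r"\s+", " ", text)              # collapse runs of whitespace
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)  # no space before punctuation
    return text.strip()


def multi_sub(patterns, replacements, text):
    """Apply literal substitutions in sequence, longest pattern first.

    Because substitutions run in sequence, a later pattern can match text
    produced by an earlier replacement.
    """
    pairs = sorted(zip(patterns, replacements), key=lambda p: len(p[0]), reverse=True)
    for pattern, replacement in pairs:
        text = text.replace(pattern, replacement)
    return text
```

For example, `remove_brackets("I like [bread] and (butter).")` yields `"I like and."`.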

Viewing

Functions to aid data viewing.

htruncdf
Dataframe Viewing
ltruncdf
Dataframe Viewing
lview
Dataframe Viewing
qview
Dataframe Viewing
truncdf
Dataframe Viewing
left_just right_just
Text Justification
Search boolean_search %bs%
Search Columns of a Data Frame
strWrap
Wrap Character Strings to Format Paragraphs
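
The viewing helpers truncate or wrap text so that wide transcript data fits on screen. The following is a minimal Python sketch of both ideas, not qdap's R code. It assumes rows are plain lists of cells, and the helper names are hypothetical. Python's standard textwrap module stands in for strWrap.

```python
import textwrap


def truncate_rows(rows, width=10):
    """Truncate every cell (converted to str) to at most `width` characters."""
    return [[str(cell)[:width] for cell in row] for row in rows]


def wrap_paragraph(text, width=40):
    """Wrap a long string into lines no wider than `width` characters."""
    return textwrap.fill(text, width=width)
```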

Chaining

Functions to chain together qdap data and functions.

qdap_df Text Text<-
Create qdap Specific Data Structure
%&%
qdap Chaining
%>%
qdap Chaining

Generic qdap Methods

Functions to aid in the selection of data elements.

counts
Generic Counts Method
preprocessed
Generic Preprocessed Method
proportions
Generic Proportions Method
scores
Generic Scores Method
visual
Generic visual Method

Reshaping

Functions to reshape data.

adjacency_matrix adjmat
Takes a Matrix and Generates an Adjacency Matrix
colSplit
Separate a Column Pasted by paste2
colsplit2df lcolsplit2df
Wrapper for colSplit that Returns Dataframe(s)
colcomb2class
Combine Columns to Class
gantt
Gantt Durations
plot_gantt_base
Gantt Durations
gantt_rep
Generate Unit Spans for Repeated Measures
key_merge
Merge Demographic Information with Person/Text Transcript
paste2 colpaste2df
Paste an Unspecified Number Of Text Columns
prop
Convert Raw Numeric Matrix or Data Frame to Proportions
qcombine
Combine Columns
sentSplit sent_detect
Sentence Splitting
TOT
Sentence Splitting
sentCombine
Sentence Splitting
speakerSplit
Break and Stretch if Multiple Persons per Cell
trans_context
Print Context Around Indices
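
adjacency_matrix turns a count matrix into an adjacency matrix. One common construction, shown here as a minimal Python sketch, assumes rows are terms and columns are groups (for example, speakers). It converts counts to presence (0/1) and computes the cross-product B^T B. Each cell (i, j) then counts the terms that groups i and j share, and the diagonal counts the distinct terms each group used. qdap's own details may differ, and the helper name is hypothetical.

```python
def adjacency(counts):
    """Group-by-group matrix of shared terms from a term-by-group count matrix."""
    n_groups = len(counts[0])
    # Boolean presence: did the group use the term at all?
    present = [[1 if c > 0 else 0 for c in row] for row in counts]
    adj = [[0] * n_groups for _ in range(n_groups)]
    for row in present:
        for i in range(n_groups):
            for j in range(n_groups):
                adj[i][j] += row[i] * row[j]
    return adj
```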

Word Extraction/Comparison

Functions for working with dialogue at the word level.

all_words
Searches Text Column for Words
bag_o_words unbag breaker word_split
Bag of Words
chunker
Break Text Into Ordered Word Chunks
common
Find Common Words Between Groups
exclude exclude.DocumentTermMatrix %ex% exclude.TermDocumentMatrix exclude.default exclude.list exclude.wfm
Exclude Elements From a Vector
freq_terms
Find Frequent Terms
ngrams
Generate ngrams
strip
Strip Text
synonyms syn synonyms_frame syn_frame
Search For Synonyms
word_associate
Find Associated Words
word_diff_list
Differences In Word Use Between Groups
word_list
Raw Word Lists/Frequency Counts
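
Word-level functions such as ngrams operate on consecutive runs of words. The following is a minimal Python sketch of word n-grams and their frequencies, not qdap's R implementation. It assumes a word is a lowercase run of letters and apostrophes. The helper names are hypothetical.

```python
import re
from collections import Counter


def word_ngrams(text, n=2):
    """Return word n-grams (as tuples) in order of appearance."""
    words = re.findall(r"[a-z']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def top_ngrams(text, n=2, k=3):
    """The k most frequent n-grams with their counts."""
    return Counter(word_ngrams(text, n)).most_common(k)
```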

Coding Tools

The cm_ functions are code matrix functions. They are used to code and reshape transcripts, data frames, and time spans for further analysis and visualization.

summary.cmspans
Summarize a cmspans object
cm_range.temp
Range Code Sheet
cm_df.transcript
Transcript With Word Number
cm_time.temp
Time Span Code Sheet
cm_df.temp
Break Transcript Dialogue into Blank Code Matrix
cm_2long
A Generic to Long Function
cm_df2long
Transform Codes to Start-End Durations
cm_range2long
Transform Codes to Start-End Durations
cm_time2long
Transform Codes to Start-End Times
cm_code.blank
Blank Code Transformation
cm_code.combine
Combine Codes
cm_code.exclude
Exclude Codes
cm_code.overlap
Find Co-occurrence Between Codes
cm_code.transform
Transform Codes
cm_distance
Distance Matrix Between Codes
cm_dummy2long
Convert cm_combine.dummy Back to Long
cm_long2dummy
Stretch and Dummy Code cm_xxx2long
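
Several cm_ functions convert a code sheet into long start-end spans. The following is a minimal Python sketch of that reshaping, not qdap's R code. It assumes a hypothetical code-sheet format, similar in spirit to a range code sheet, in which each code maps to a string of word-index ranges such as "1:3, 7". A single number is treated as a span that starts and ends at that index.

```python
def ranges_to_long(code_sheet):
    """Convert {code: "1:3, 7"} range strings into (code, start, end) rows."""
    rows = []
    for code, spec in code_sheet.items():
        for part in spec.split(","):
            part = part.strip()
            if not part:
                continue  # tolerate blank entries such as trailing commas
            if ":" in part:
                start, end = (int(x) for x in part.split(":"))
            else:
                start = end = int(part)
            rows.append((code, start, end))
    # Sort by start position so the spans read in transcript order
    return sorted(rows, key=lambda r: (r[1], r[0]))
```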

tm Package Integration

Functions for working between the tm and qdap packages.

as.tdm
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
apply_as_df
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
apply_as_tm
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.Corpus
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.Corpus.DocumentTermMatrix
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.Corpus.TermDocumentMatrix
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.Corpus.default
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.Corpus.sent_split
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.Corpus.wfm
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.DocumentTermMatrix
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.TermDocumentMatrix
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.data.frame.Corpus
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.dtm
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.dtm.Corpus
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.dtm.character
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.dtm.default
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.dtm.wfm
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.tdm.Corpus
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.tdm.character
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.tdm.default
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
as.tdm.wfm
tm Package Compatibility Tools: Apply to or Convert to/from Term Document
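
A term-document matrix has one row per term and one column per document. A document-term matrix is its transpose. The following is a minimal Python sketch of both structures, not the tm or qdap implementation. It assumes simple lowercase word tokens, and the helper names are hypothetical.

```python
import re
from collections import Counter


def term_document_matrix(docs):
    """Return (sorted terms, matrix) with rows = terms and columns = documents."""
    counts = [Counter(re.findall(r"[a-z']+", d.lower())) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


def transpose(matrix):
    """A document-term matrix is the transpose of a term-document matrix."""
    return [list(col) for col in zip(*matrix)]
```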

Counts/Descriptives

Functions for word counts and descriptive statistics.

dist_tab
SPSS-Style Frequency Tables
multiscale
Nested Standardization
object_pronoun_type
Count Object Pronouns Per Grouping Variable
outlier_detect
Detect Outliers in Text
outlier_labeler
Locate Outliers in Numeric String
pos
Parts of Speech Tagging
pos_by
Parts of Speech Tagging
pos_tags
Parts of Speech Tagging
pronoun_type
Count Object/Subject Pronouns Per Grouping Variable
question_type
Count of Question Type
subject_pronoun_type
Count Subject Pronouns Per Grouping Variable
syllable_sum
Syllabication
combo_syllable_sum
Syllabication
polysyllable_sum
Syllabication
syllable_count
Syllabication
termco termco_d
Search For and Count Terms
term_match
Search For and Count Terms
termco2mat
Search For and Count Terms
termco_c
Combine Columns from a termco Object
wfm wfdf as.wfm weight.wfdf weight.wfm wfm_combine wfm_expanded wfm.wfdf wfm.character wfm.factor as.wfm.Corpus as.wfm.DocumentTermMatrix as.wfm.TermDocumentMatrix as.wfm.data.frame as.wfm.default as.wfm.matrix as.wfm.wfdf wfm.Corpus
Word Frequency Matrix
word_count wc
Word Counts
character_count
Word Counts
character_table char_table
Word Counts
word_stats
Descriptive Word Statistics
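
termco searches for and counts terms by a grouping variable. The following is a minimal Python sketch of counting search terms per group, not qdap's R implementation. It assumes case-insensitive substring matching, so "read" also matches "reading". qdap's own matching conventions are not reproduced, and the helper name is hypothetical.

```python
import re
from collections import defaultdict


def count_terms(texts, groups, terms):
    """Count occurrences of each search term per group (case-insensitive substrings)."""
    table = defaultdict(lambda: dict.fromkeys(terms, 0))
    for text, group in zip(texts, groups):
        lowered = text.lower()
        for term in terms:
            table[group][term] += len(re.findall(re.escape(term.lower()), lowered))
    return dict(table)
```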

Measures

Word measures and scoring.

automated_readability_index coleman_liau flesch_kincaid fry linsear_write SMOG
Readability Measures
Dissimilarity
Dissimilarity Statistics
diversity
Diversity Statistics
formality
Formality Score
kullback_leibler
Kullback-Leibler Statistic
polarity
Polarity Score (Sentiment Analysis)
word_cor
Find Correlated Words
word_proximity weight.word_proximity
Proximity Matrix Between Words
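
The diversity and Kullback-Leibler entries rest on standard formulas. The following is a minimal Python sketch, not qdap's R implementation:

- Shannon diversity is H = -sum(p_i * ln p_i) over word-type proportions.
- The Kullback-Leibler divergence is D(P || Q) = sum(p_i * ln(p_i / q_i)) over the terms of P.

Natural logarithms are an assumption here, and qdap may report additional indices. The helper names are hypothetical.

```python
import math
from collections import Counter


def shannon(words):
    """Shannon diversity H = -sum(p_i * ln p_i) over word-type proportions."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def kullback_leibler(p_counts, q_counts):
    """D(P || Q) from two {term: count} maps; infinite if Q lacks a term P uses."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    d = 0.0
    for term, pc in p_counts.items():
        if pc == 0:
            continue  # terms absent from P contribute nothing
        p = pc / p_total
        q = q_counts.get(term, 0) / q_total
        if q == 0:
            return math.inf
        d += p * math.log(p / q)
    return d
```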

qdap Tools

Tools to assist in transcript/discourse analysis.

blank2NA
Replace Blanks in a Data Frame
build_qdap_vignette
Replace Temporary Introduction to qdap Vignette
duplicates
Find Duplicated Words in a Text String
qcv
Quick Character Vector
replacer
Replace Cells in a Matrix or Data Frame
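
Two of these tools are small string utilities. duplicates finds repeated words, and qcv builds a character vector from a string. The following is a minimal Python sketch of both ideas, not qdap's R code. It assumes lowercase word tokens and a space separator. The helper names are hypothetical.

```python
import re
from collections import Counter


def duplicated_words(text, threshold=1):
    """Words occurring more than `threshold` times, in first-seen order."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    result = []
    for w in words:
        if counts[w] > threshold and w not in result:
            result.append(w)
    return result


def quick_char_vector(terms, split=" "):
    """Split a string into a list of terms, dropping blank pieces."""
    return [t for t in terms.split(split) if t]
```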

Identification

Identify sentence elements/types.

end_inc
Test for Incomplete Sentences
end_mark end_mark_by
Sentence End Marks
imperative
Intuitively Remark Sentences as Imperative
NAer
Replace Missing Values (NA)
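
Sentence identification starts from the final end mark. qdap uses "|" to denote incomplete sentences, as in the incomplete_replace entry above. The following is a minimal Python sketch of both steps, not qdap's R implementation. It assumes that a trailing run of two or more dots marks an incomplete sentence. The helper names are hypothetical.

```python
import re

END_MARKS = {"?": "question", ".": "statement", "!": "exclamation", "|": "incomplete"}


def mark_incomplete(sentence):
    """Replace a trailing run of dots (e.g. "...") with "|" (simplified rule)."""
    return re.sub(r"(\.\s*){2,}$", "|", sentence.rstrip())


def classify_end_mark(sentence):
    """Classify a sentence by its final character."""
    stripped = sentence.rstrip()
    if not stripped:
        return "blank"
    return END_MARKS.get(stripped[-1], "no end mark")
```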

Visualization

Plotting functions.

dispersion_plot
Lexical Dispersion Plot
gradient_cloud
Gradient Word Cloud
gantt_plot
Gantt Plot
gantt_wrap
Gantt Plot
phrase_net
Phrase Nets
plot.character_table
Plots a character_table Object
plot.cmspans
Plots a cmspans object
plot.diversity
Plots a diversity object
plot.formality
Plots a formality Object
plot.gantt
Plots a gantt object
plot.kullback_leibler
Plots a kullback_leibler object
plot.polarity
Plots a polarity Object
plot.pos_by
Plots a pos_by Object
plot.question_type
Plots a question_type Object
plot.rmgantt
Plots a rmgantt object
plot.sent_split
Plots a sent_split Object
plot.sum_cmspans
Plot Summary Stats for a Summary of a cmspans Object
plot.sums_gantt
Plots a sums_gantt object
plot.termco
Plots a termco object
plot.wfdf
Plots a wfdf object
plot.wfm
Plots a wfm object
plot.word_stats
Plots a word_stats object
qheat
Quick Heatmap
qheat.character_table
Quick Heatmap
qheat.default
Quick Heatmap
qheat.diversity
Quick Heatmap
qheat.pos_by
Quick Heatmap
qheat.question_type
Quick Heatmap
qheat.termco
Quick Heatmap
qheat.word_stats
Quick Heatmap
rank_freq_mplot
Rank Frequency Plot
rank_freq_plot
Rank Frequency Plot
tot_plot
Visualize Word Length by Turn of Talk
trans_cloud
Word Clouds by Grouping Variable
trans_venn
Venn Diagram by Grouping Variable
word_network_plot
Word Network Plot
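
A rank-frequency plot shows each word's frequency against its frequency rank, commonly on log-log axes (Zipf's law). The following is a minimal Python sketch of the data behind such a plot, not qdap's R code; plotting is left out. Breaking ties alphabetically is a choice made by this sketch, and the helper name is hypothetical.

```python
import re
from collections import Counter


def rank_frequency(text):
    """(rank, word, frequency) rows, most frequent first; ties broken alphabetically."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]
```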

Network Plots

Network Plots for qdap Objects.

discourse_map
Discourse Mapping
Network
Generic Network Method
Network.formality
Network Formality
Network.polarity
Network Polarity
qtheme
Add themes to a Network object.
theme_badkitchen
Add themes to a Network object.
theme_cafe
Add themes to a Network object.
theme_duskheat
Add themes to a Network object.
theme_grayscale
Add themes to a Network object.
theme_greyscale
Add themes to a Network object.
theme_hipster
Add themes to a Network object.
theme_nightheat
Add themes to a Network object.
theme_norah
Add themes to a Network object.
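
Discourse maps are typically built from turn-taking, with a directed edge from one speaker to the next. The following is a minimal Python sketch of counting those edges, not qdap's R implementation. Ignoring consecutive turns by the same speaker is a choice of this sketch, not documented qdap behavior. The helper name is hypothetical.

```python
from collections import Counter


def speaker_transitions(speakers):
    """Count directed (from, to) pairs between consecutive turns by different speakers."""
    pairs = Counter()
    for current, nxt in zip(speakers, speakers[1:]):
        if current != nxt:
            pairs[(current, nxt)] += 1
    return pairs
```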

Cumulative Plots

Cumulative Plots for qdap Objects.

cumulative
Cumulative Scores
cumulative.animated_formality
Cumulative Scores
cumulative.animated_polarity
Cumulative Scores
cumulative.combo_syllable_sum
Cumulative Scores
cumulative.end_mark
Cumulative Scores
cumulative.formality
Cumulative Scores
cumulative.polarity
Cumulative Scores
cumulative.pos
Cumulative Scores
cumulative.pos_by
Cumulative Scores
cumulative.syllable_freq
Cumulative Scores
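
Cumulative scores show how a statistic settles over the course of a transcript. The following is a minimal Python sketch, not qdap's R code. It assumes "cumulative score" means the running mean of per-sentence scores (for example, polarity values); qdap's cumulative methods may differ by object class. The helper name is hypothetical.

```python
def cumulative_mean(scores):
    """Running average: element k is the mean of scores[0..k]."""
    out, total = [], 0.0
    for k, s in enumerate(scores, start=1):
        total += s
        out.append(total / k)
    return out
```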

Animation

Animate qdap Objects.

Animate
Generic Animate Method
Animate.discourse_map
Discourse Map
Animate.formality
Animate Formality
Animate.gantt
Gantt Durations
Animate.gantt_plot
Gantt Plot
Animate.polarity
Animate Polarity
vertex_apply edge_apply
Apply Parameter to List of igraph Vertices/Edges

Print Functions

print.all_words
Prints an all_words Object
print.adjacency_matrix
Prints an adjacency_matrix Object
print.boolean_qdap
Prints a boolean_qdap object
print.character_table
Prints a character_table object
print.cm_distance
Prints a cm_distance Object
print.colsplit2df
Prints a colsplit2df Object.
print.Dissimilarity
Prints a Dissimilarity object
print.diversity
Prints a diversity object
print.formality
Prints a formality Object
print.kullback_leibler
Prints a kullback_leibler Object.
print.ngrams
Prints an ngrams object
print.polarity
Prints a polarity Object
print.pos
Prints a pos Object.
print.pos_by
Prints a pos_by Object.
print.qdap_context
Prints a qdap_context object
print.question_type
Prints a question_type object
print.sent_split
Prints a sent_split object
print.sum_cmspans
Prints a sum_cmspans object
print.sums_gantt
Prints a sums_gantt object
print.termco
Prints a termco object.
print.wfm
Prints a wfm Object
print.word_associate
Prints a word_associate object
print.word_list
Prints a word_list Object
print.word_stats
Prints a word_stats object

Data

Data sets included in qdap and used in examples.

DATA
Fictitious Classroom Dialogue
DATA2
Fictitious Repeated Measures Classroom Dialogue
DATA.SPLIT
Fictitious Split Sentence Classroom Dialogue
pres_debates2012
2012 U.S. Presidential Debates
pres_debate_raw2012
First 2012 U.S. Presidential Debate
mraja1
Romeo and Juliet: Act 1 Dialogue Merged with Demographics
mraja1spl
Romeo and Juliet: Act 1 Dialogue Merged with Demographics and Split
raj.act.1
Romeo and Juliet: Act 1
raj.act.2
Romeo and Juliet: Act 2
raj.act.3
Romeo and Juliet: Act 3
raj.act.4
Romeo and Juliet: Act 4
raj.act.5
Romeo and Juliet: Act 5
raj.demographics
Romeo and Juliet Demographics
raj
Romeo and Juliet (Unchanged & Complete)
rajPOS
Romeo and Juliet Split in Parts of Speech
rajSPLIT
Romeo and Juliet (Complete & Split)
raw.time.span
Minimal Raw Time Span Data Set
sample.time.span
Minimal Time Span Data Set

Vignettes

Dependencies

Depends: ggplot2, qdapDictionaries, qdapRegex, qdapTools, RColorBrewer
Imports: chron, dplyr, gdata, gender, grid, gridExtra, igraph, NLP, openNLP, parallel, plotrix, RCurl, reports, reshape2, scales, stringdist, tm, tools, venneuler, wordcloud, xlsx, XML
Suggests: koRpus, knitr, lda, proxy, stringi, SnowballC, testthat

Author

Tyler W. Rinker

Contributors

Dason Kurkiewicz
Bryan Goodrich