Dedoose Articles

Article: Best Practices in Excerpting and Coding and Capitalizing on Dedoose Features


(Note that this post contains content largely extracted from an article by the authors currently under review for publication. Please contact Dedoose Support if you wish to cite any content contained herein.)

Summary—Context is King…Be a Chunker!

Context is King!

Qualitative data allow us to learn about the rich, nuanced, complex, and contextualized ways in which our research participants experience their lives…the ‘how’ and ‘why’ of life, beyond the ‘what.’ So, simply, context is king. The first step of the excerpting process involves deciding where an excerpt begins and ends. There are two general styles of excerpting that we’ll call ‘splitting’ and ‘chunking.’ Splitters tend to create smaller excerpts tagged with small numbers of codes. Chunkers tend to create larger excerpts and apply multiple codes. The professional academic researchers behind Dedoose strongly encourage chunking when creating excerpts: you’ll be more likely to preserve good context, and you’ll have set up the project to take full advantage of the Dedoose analytic features.

For Splitters, imagine doing a search and retrieval for commonly coded excerpts after using a splitting style. The results will be many short excerpts, completely out of context. Remember, when you are creating excerpts you are viewing or listening to the entire media file, so the context is there and the broader meanings are clear while you are engaged in the process. Unfortunately, when splitters later review excerpts out of context, they often find themselves needing to return to the source to be reminded of the broader meanings…bummer, this can be a real time-sink. This is the primary reason we recommend a chunking style.

For Chunkers, two big benefits:

  • 1—excerpts contain sufficient context to understand why you applied particular codes
  • 2—many Dedoose analytic features, like the code co-occurrence matrix, become far more valuable when you carry out your analysis.

If you keep in mind that we collect and analyze qualitative data because of their richness, and that smaller numbers of words carry far less meaning—particularly out of context—there is every reason in the world to join the Chunker community and embrace the valuable rich, deep context in your data…for more, read on!

Excerpting, Coding, and Teamwork (and Inter-Rater Reliability)


At the outset, it is important to keep in mind that excerpting and tagging activity involves two distinct decisions. The first decision is to determine where an excerpt starts and ends. In a flowing interview transcript, this determination can be challenging, as it is often not clear where a complete thought has been reported (Ryan & Bernard, 2003). Not surprisingly, when dealing with such data, different members of a team will commonly have different interpretations of where a complete thought is represented and, as such, where an excerpt should be defined. The second decision is to select the appropriate codes for the given passage based on recognition of the ‘meaning’ contained in the passage and based on the available codes (or evolving code systems—see also, Code Systems are Serious Business). This second decision can be challenging in its own respect depending on the level of nuance represented by the overall code system, the level of clarity with which team members have defined code application decisions, and the degree to which the code application criteria have been communicated across team members via discussion and documentation (Dey, 1993; Jehn & Doucet, 1996; Jehn & Doucet, 1997). We would argue that, while the first decision, excerpt location, is a matter of style—splitting versus chunking—the second decision points directly to reliability and validity. The inter-rater reliability and validity questions ask: did independent coders decide to apply the same codes based on the content they were viewing, regardless of whether they were dealing with more or fewer excerpts?

Coding Blind to the Work of Collaborators


Following from these distinct decision points in creating and coding excerpts, coding blind in Dedoose is a valuable strategy for discovering the different ways individuals make decisions about where to create and tag excerpts in qualitative content. Using Dedoose filtering capabilities, users can view a document or other media file without the ability to see the work contributed by others. The primary goal here is to get a sense of how team members independently make decisions about excerpt location. Here’s how you do it:

  1. Each user logs into Dedoose and, before accessing a media file, they filter out the work of others via the Data Set Workspace functions
  2. After filtering, when viewing a media file they only see the work they contributed and can carry on with their own excerpting and tagging activity without any distraction/contamination from the work of others
  3. Each user does their work in this manner on the same media files
  4. The full team can later view the media files with all work showing and can clearly identify any variation in excerpting and coding decisions.

With the information gained from the blind coding exercise, the team is then prepared to discuss excerpting and coding decision criteria and, most importantly, come to consensus on the excerpting style that all team members should adopt moving forward. It is important to keep in mind that while the ‘style’ with which excerpting is carried out is not as critical to inter-rater reliability and validity as coding decisions, a relatively consistent style across team members can help support any conclusions that may be based on quantification of any excerpting/coding activity.

Using Document Cloning for ‘Apples to Apples’ Comparison


Following the establishment of a consistent excerpting style, and an initial pass at using the emerging code system, taking advantage of the Dedoose document cloning feature would be a logical next step. The primary goal of the document cloning functionality is to more closely assess code application decisions without concern for the excerpting decision—in a sense, more of an apples-to-apples comparison of coding. The document cloning feature essentially creates an identical copy of media files with all excerpts in place, facilitating coding in parallel. To use this approach:

  1. One ‘trusted’ team member is assigned responsibility for creating, but not coding, excerpts in a media file
  2. The media file is then cloned and each cloned copy re-titled for each team member (for example, doc 1 person 1, doc 1 person 2, …)
  3. Each team member then accesses their copy of any cloned media files and assigns codes to the existing excerpts
  4. From these copies there are a variety of ways to compare and contrast the coding decisions made by each team member and to continue, in more depth, the conversation about refining code application criteria toward a comfortable and shared understanding.

The Dedoose Training Center and Testing for Inter-Rater Reliability

Coding blind and independent coding using the cloning feature can continue in an iterative manner until such time when the team feels confident in the overall structure of the coding system and their ability to independently apply codes in a consistent manner. At this point, the team may then wish to more formally assess inter-rater reliability via the Dedoose Training Center (Dedoose 7.5.16, 2017). To make use of the training tests:

  1. One or more team members create and code excerpts within project media files to comprehensively represent variation in the sample data and the meanings to be identified and tagged with codes in the code system
  2. Training Center tests are created by selecting a set of codes on which a test should focus and selecting representative excerpts from the master project (as created in step 1)
  3. Other members of the team, including those involved in the initial test excerpt creation, then take the tests in which they are presented with the set of codes assigned to the test and the excerpt content
  4. Test takers see the excerpt content and are responsible for selecting the appropriate codes for each excerpt in the test
  5. Results include Cohen’s Kappa coefficient for each code, a pooled Kappa for the full set of codes, and detailed information showing each excerpt’s content, the codes applied in the master project, and the codes selected by the test taker.
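For a sense of what these results mean, Cohen’s Kappa measures agreement between two coders after correcting for the agreement expected by chance alone. Here is a minimal sketch of the per-code calculation for two raters making yes/no code-application decisions on a shared set of excerpts. This is purely illustrative—Dedoose computes these statistics for you, and this is not its internal implementation:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters' binary code-application decisions.

    rater_a, rater_b: lists of 0/1 flags, one per excerpt, indicating
    whether each rater applied a given code to that excerpt.
    """
    n = len(rater_a)
    # Observed agreement: proportion of excerpts where the raters match
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability both say 'yes' plus both say 'no'
    p_yes_a = sum(rater_a) / n
    p_yes_b = sum(rater_b) / n
    expected = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    # Kappa scales observed agreement by how much better than chance it is
    return (observed - expected) / (1 - expected)

# Hypothetical example: 10 excerpts, the test taker matches the
# master coding on 9 of them
master = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
tester = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
print(round(cohens_kappa(master, tester), 2))  # prints 0.8
```

Note how the 90% raw agreement shrinks to a Kappa of 0.8 once chance agreement is removed—this is why Kappa is preferred over simple percent agreement for judging whether coders share a real understanding of the code system.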

Where test results show acceptable levels of inter-rater reliability, the team then vocalizes their delight, shares ‘high fives’ around the room, and takes the rest of the day off for cocktails and canapes at the PI’s expense. However, when test results are less than acceptable, the team then proceeds to review the available information and continue the very important discussion about how the code system and code application criteria are defined, as they continue working toward the shared understanding critical to both team confidence and the confidence they will be prepared to instill in the consumers of their work.


Dedoose, 7.5.16 (2017). Inter-rater reliability. SocioCultural Research Consultants, LLC.

Dey, I. (1993). Qualitative data analysis: A user-friendly guide for social scientists. London, UK: Routledge Kegan Paul.

Jehn, K. A. & Doucet, L. (1996). Developing categories from interview data: Text analysis and multidimensional scaling. Part 1. Cultural Anthropology Methods Journal 8(2), 15–16.

Jehn, K. A. & Doucet, L. (1997). Developing categories for interview data: Consequences of different coding and analysis strategies in understanding text. Part 2. Cultural Anthropology Methods Journal, 9(1), 1–7.

Ryan, G. W. & Bernard, H. R. (2003). Techniques to identify themes. Field Methods, 15(1), 85–109.