Quantitative narrative analysis software options compared: PC-ACE and CAQDAS (ATLAS.ti, MAXqda, and NVivo)

This paper shows how to carry out quantitative narrative analysis (QNA) with different text analysis software (PC-ACE, Program for Computer-Assisted Coding of Events, and various CAQDAS programs, Computer-Assisted Qualitative Data Analysis Software: ATLAS.ti, MAXQDA, and NVivo). QNA is a methodological approach to narrative texts that exploits invariant properties of narrative (namely, a “story grammar”, based on actors, actions, and their attributes) to make a statistical analysis of words possible. In comparing PC-ACE and CAQDAS, the paper leads the reader through the steps involved in setting up a grammar, in data entry, and in data query. A careful comparison of limits and possibilities of the two types of software will allow the reader interested in QNA to make an informed choice between a full implementation of QNA in a specialized but unknown software (PC-ACE) and a limited implementation in any of the widely used and popular CAQDAS programs.

1 Quantitative narrative analysis (QNA)

Quantitative narrative analysis (QNA) is a social science research method for the analysis of narrative texts (Franzosi 2010). Like content analysis, the method traditionally used in the social sciences to extract quantitative information from texts, QNA is quantitative. It aims at turning words into numbers (Franzosi 2004). It does so via a coding scheme (a "story grammar"), made up of coding categories (the objects of the grammar), to which coders assign specific parts of a text. By computing word frequencies of specific coding categories, words are then turned into numbers, to be analyzed with the help of statistical tools. In content analysis, the coding categories reflect the substantive/theoretical interests of the investigator (Franzosi 2008). As a result, these categories change from study to study, albeit with some accumulation of categories within specific areas of investigation (e.g., content analysis of health issues rather than content analysis of media advertisements or of news). Contrary to content analysis, however, QNA's coding categories and coding schemes are based on invariant, structural properties of narrative, namely the sequential organization of narrative (in story and plot) and its simple surface linguistic structure of subject-verb-object (SVO) (where subjects are typically social actors and verbs actions).

Story grammars
Linguistically, a story is typically expressed through the simple micro-level SVO structure. The SVO structure is a general form of the language, and not just of narrative (indeed, "the canonical form of the language"); but in narrative, both Subject and Object are typically social actors or organizations (the Object can also be a "thing", i.e., a physical object) and the Verb is a verb of doing or saying (i.e., a social action). In narrative, the SVO structure has also been referred to as a "story grammar."1 A story grammar can be as simple as the basic three elements of the "semantic triplet" SVO (the subject, the action, and the object of the action) or very complex, with the addition of a number of modifiers for each element of the SVO (such as type, number, organization, name and last name of the Subject and Object, and time, space, reason, outcome, instrument of the Verb). Thus, a story grammar broadly corresponds to the 5 Ws of journalism (Who, What, When, Where, Why), with the potential addition of several more elements (on the 5 Ws, see Franzosi in press).
Relationships between the various objects (or coding categories) of a story grammar can be expressed formally with the help of "rewrite rules." Through a rewrite rule, we can express the simple SVO structure (or semantic triplet) in terms of its basic components (subject, verb, and object), where the symbol → refers to a rewrite rule (or production), whereby an element to the left of the rule can be rewritten in terms of the elements to its right.2 Each element of the triplet can then be further rewritten, down to its "terminal" symbols (those found in the language itself; in this instance, in the account of a lynching in Hinesville, Georgia). The user might also need to set up alternative objects (e.g., both semantic triplet and alternative semantic triplet), particularly when the narrative information is extracted from different documents. This allows one to code the different ways of telling the same story, depending upon the storyteller's point of view. Franzosi (see for all, 2004, pp. 59-61; 2010, pp. 31-32) has shown how a grammar of this kind can provide coding schemes with more desirable properties than traditional content analysis coding schemes:
1. they are based on invariant structural properties of narrative rather than on the investigator's ad-hoc substantive or theoretical interests;
2. they allow for the setting up of both hierarchical and relational links between the objects of the grammar (or coding categories of the coding scheme, in the language of content analysis);
3. they provide coded output that preserves much of the input narrative information;
4. the narrative flavor of coded output delivers more reliable data, since coded output must make sense to any "competent user of the language" (semantic coherence).

1 Discourse on story grammars has a long tradition, starting from Propp's seminal work on Russian folktales; for a review, see Franzosi (2004, pp. 43-51; 2010, pp. 11-23; in press).
2 The angular brackets < > denote elements that can be further rewritten, while "terminal elements," i.e., the words or linguistic expressions found in the text, have no < >. Curly brackets { } denote elements that can occur more than one time, while square brackets [ ] denote optional elements. Thus, in the clause "victim screams" there is only one participant (the agent), while the clause "mob kills negro" has two participants (the agent, mob, and the goal or patient, negro). As a result, the grammar requires only the first participant; the second is optional.
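The logic of rewrite rules can be illustrated with a minimal Python sketch. The rule set below is a simplified, hypothetical grammar written for illustration only (the names `GRAMMAR`, `Child`, `is_terminal`, and `missing_required` are our assumptions and do not reflect PC-ACE's internal representation). It encodes only what the text states: a semantic triplet requires a subject and a verb, the object is optional, and objects without a rule of their own are filled with words from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    """One element on the right-hand side of a rewrite rule."""
    name: str
    optional: bool = False  # corresponds to square brackets [ ]


# Hypothetical, simplified rules: left-hand side -> right-hand side.
GRAMMAR = {
    "semantic triplet": [Child("subject"), Child("verb"), Child("object", optional=True)],
    "subject": [Child("actor")],
    "verb": [Child("verbal phrase")],
    "object": [Child("actor")],
}


def is_terminal(name):
    """Objects with no rule of their own are filled with words from the text."""
    return name not in GRAMMAR


def missing_required(obj, coded):
    """List the required children of `obj` that have no coded value."""
    return [c.name for c in GRAMMAR[obj] if not c.optional and not coded.get(c.name)]


one_participant = {"subject": "victim", "verb": "screams"}
two_participants = {
    "subject": "band of armed men",
    "verb": "overpowered",
    "object": "deputy Sheriff",
}
no_verb = {"subject": "band of armed men"}

print(missing_required("semantic triplet", one_participant))  # []
print(missing_required("semantic triplet", no_verb))          # ['verb']
```

A clause with a single participant passes the check, whereas a coded triplet without a verb is flagged as incomplete.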

QNA with lynching stories
To understand how a story grammar can help structure narrative information, consider the following newspaper article published in the Atlanta Constitution on February 9, 1888.
Savannah, Feb 8. A few weeks ago a house and warehouse were destroyed by the fire in Hinesville, and all the circumstances pointed to its being the work of an incendiary. The people have been greatly wrought up in consequence. Intelligence received here tonight states that a Negro was arrested there yesterday on the charge of burning the houses aforesaid. He is said to have confessed the deed, and implicated several in the crime. After a preliminary investigation he was committed to jail in Hinesville. Last night a band of armed men overpowered the deputy Sheriff, who had the prisoner in charge, and carrying him off to the woods, shot him to death. Great excitement prevails in that section.
Excerpt 1 (Atlanta Constitution article, February 9, 1888)

The article of Excerpt 1 is a good example of narrative text, made up of clauses characterized by factual events, with a series of actors engaged in different actions. Overall, the text provides minimal description and evaluation.4 The article gives us the events ("fire", "arrest", "lynching"), the events' time ("a few weeks ago", "yesterday", "last night") and place ("Hinesville", "jail", "woods"), the actors involved ("Negro", "deputy Sheriff", "band of armed men"), the actors' actions ("destroyed", "burned", "overpowered", "shot", "arrested", "charge", "committed to jail", "confessed", "implicated", "carrying"), and the targets of their actions, as actors ("Negro") or physical objects ("house" and "warehouse").
In a second article published the next day (February 10, 1888), also by the Atlanta Constitution, we are given additional and different details on the same events:

Savannah, Ga, February 9. Very little information can be obtained from Liberty county about the lynching of the Negro incendiary Tuesday night. About a fortnight ago The Constitution published an account of a fire at Johnson's station, on the Savannah, Florida and Western railway. Mr. Chapman lost a store and the railway company's warehouse was burned, along with several other buildings. It was suspected that the fire was started by an incendiary, and on Tuesday a Negro was arrested on suspicion. He was given a preliminary hearing and confessed that he was one of a party of five who broke into Mr. Chapman's store. After stealing all they could carry off, the burglars sprinkled kerosene about the building and set fire to it. The magistrate committed the Negro to jail. While the deputy Sheriff was on his way to the Hinesville jail, he was surprised by a crowd of fifteen men, who took the prisoner away from him. The Negro was carried into the woods, and it is supposed he was hung or burned. The officer never saw him more. It is expected that the other incendiaries will share the same fate.
Excerpt 2 (Atlanta Constitution article, February 10, 1888)

If we combine the factual information from the two articles, we obtain a full narration of the event story (information from the later article in italics):

A few weeks ago a house and warehouse were destroyed by the fire in Hinesville at Johnson's station, on the Savannah, Florida and Western railway, and all the circumstances pointed to its being the work of an incendiary. Mr. Chapman lost a store and the railway company's warehouse was burned, along with several other buildings. The people have been greatly wrought up in consequence. Intelligence received here tonight states that a Negro was arrested there yesterday on Tuesday on the charge of burning the houses aforesaid. He is said to have confessed the deed, and implicated several in the crime, that he was one of a party of five who broke into Mr. Chapman's store. After stealing all they could carry off, the burglars sprinkled kerosene about the building and set fire to it. After a preliminary investigation he was committed to jail in Hinesville. Last night on Tuesday while the deputy Sheriff was on his way to the Hinesville jail a band of fifteen armed men overpowered the deputy Sheriff, who had the prisoner in charge, and carrying him off to the woods, shot him to death. It is supposed he was hung or burned. Great excitement prevails in that section.
Excerpt 3 (Atlanta Constitution combined articles, February 9 and 10, 1888)

This "combined"5 lynching story would look as follows within the coding categories of a complex story grammar, where Participant-S, Process, and Participant-O reflect Halliday's language for SVO (Halliday 1994) (in black, the name of the coding categories; in italic font, the information taken from the newspaper articles; in bold font, the alternative coding of a semantic triplet; semantic triplets have been ordered chronologically):

(Macroevent) (Victim: unnamed Negro)

As the coded output shows, all narrative information found in the original input text also appears in the output, assigned to appropriate categories in the grammar.6
1.3 QNA's questions

Coded output also shows another thing: the centrality of social actors and social action. QNA is all about actors and actions. After all, in narrative, the S and V components of the SVO structure are social actors and social actions (verbs of doing and saying). Such an actor-centered methodological approach leads to research questions about social reality that are fundamentally different from those of the more traditional variable-centered approaches (see Abell 2004; Franzosi 2004, pp. 240-242). Consider the literature on lynching. Variable-oriented explanations have focused on the effect on the number of lynching events of such variables as the size of the black population in a county, the black crime rate, and the deflated price of cotton (for an excellent example, see Beck and Tolnay 1990). QNA questions focus on the characters of a story (see Franzosi 2010, p. 74). Who are the various characters in a story (the Negro, the Sheriff, the mob, the outraged women)? Which traits do they possess (male, female, young, old, good, evil)? What role do they play in the story? What do they do? Do any/some of the characters benefit (or suffer) from the actions of the others? When did the actions narrated in the story happen? Where? Were certain types of action committed by specific actors, at specific times or locations?

1.4 A thousand lynching stories: "No software, no QNA"

The lynching reported by the Atlanta Constitution on February 9 and 10, 1888 (Excerpts 1 and 2) is one of nearly 400 lynchings that occurred in Georgia between 1875 and 1930. These cases of lynching were reported in 1,332 articles from 212 different newspapers.7 How, then, can one carry out QNA on such a large volume of documents? In the absence of fully automated approaches to the parsing of narrative, the only answer is: with computer-assisted QNA. The complexity and detail of the coding scheme and the nature of the "data" (basically, words rather than numbers, which are what we are typically accustomed to think of as data) may otherwise relegate the use of story grammars as coding schemes to small-scale examples. The application of QNA to large socio-historical research requires the implementation of QNA in a computer environment. Indeed, "No software, no QNA", as Franzosi writes (2010, p. 67).
In the next sections, we will discuss two different software approaches to QNA: PC-ACE (Program for Computer-Assisted Coding of Events) and CAQDAS programs (Computer-Assisted Qualitative Data Analysis Software), in particular, ATLAS.ti, NVivo, and MAXQDA.8,9 PC-ACE (available for free download at www.pc-ace.com) is a software program specifically designed to carry out large-scale QNA. It is based on a Relational Database Management System (RDBMS) design, with information stored in separate tables, each table containing the values of a specific coding category (or object of the grammar, e.g., triplets, actors). Information stored in separate tables is linked via "relations" (i.e., overlapping fields) and queried using Structured Query Language (SQL). SQL relies on a handful of commands: select (i.e., extract a specific field), from (a specific table), where (filter on specific values). The SQL count function is used to compute the number of occurrences of a specific value in an object (coding category), either taken by itself or in relation to other objects (e.g., actors and their actions) (Franzosi 2010, p. 82).
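The handful of SQL commands just described can be tried out in a minimal sketch based on Python's built-in `sqlite3` module. The table name `actor` and its columns are hypothetical and chosen for illustration; they are not PC-ACE's actual tables or fields.

```python
import sqlite3

# In-memory toy database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (triplet_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO actor VALUES (?, ?)",
    [(1, "band of armed men"), (2, "band of armed men"), (3, "deputy Sheriff")],
)

# select (a field) from (a table) where (a filter on specific values)
rows = conn.execute(
    "SELECT triplet_id FROM actor WHERE name = ? ORDER BY triplet_id",
    ("band of armed men",),
).fetchall()

# count: number of occurrences of each value in the coding category
counts = dict(
    conn.execute("SELECT name, COUNT(*) FROM actor GROUP BY name").fetchall()
)

print(rows)    # [(1,), (2,)]
print(counts)  # {'band of armed men': 2, 'deputy Sheriff': 1}
```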
Contrary to PC-ACE, CAQDAS programs were not originally designed for QNA, but as tools for qualitative analysis: they aid in the analysis of qualitative research questions that explore theoretical concepts and provide support for theoretical development. CAQDAS programs do, however, have extensive query capacities that help uncover complex patterns or relationships within the data. CAQDAS query tools can even provide frequency counts, but the query tools are designed to highlight thematic and conceptual patterns across a number of different documents (e.g., in-depth interviews, focus groups, life histories).
It may seem invidious to propose a comparison between PC-ACE and CAQDAS programs, given the different research questions and design strategies of the two types of software.It is not.The point is to show whether (and how) it is possible to carry out QNA in CAQDAS programs.After all, given the popularity of these software programs, they are likely to be an investigator's first port of call.
Below, for both PC-ACE and CAQDAS programs, we focus on the three main tasks of QNA: 1. setting up a story grammar (using PC-ACE grammar objects/coding categories or CAQDAS codes); 2. data entry; and 3. data querying.

Quantitative narrative analysis in PC-ACE (Program for Computer-Assisted Coding of Events)

Setting up the grammar
In order to perform QNA using PC-ACE, the first step is to set up the story grammar to be used as a coding scheme. In PC-ACE, users can set up complex story grammars made up of both relational objects (the Subject, the Verb, the Object, and each of their modifiers) and hierarchical objects (those that assemble the basic SVO narrative units into larger units, i.e., the Macroevent, the event, and the semantic triplet itself). Objects can further be simplex or complex, where simplex objects are rewritten as terminal symbols (i.e., as the words found in the dictionary of the language) and complex objects in terms of other simplex and/or complex objects. Hence, in the grammar presented above, actor and verbal phrase are simplex objects; subject, characteristics, verb, circumstances, and object are complex objects (for the complete grammar used in the Georgia lynching project, see Appendix II). While all the objects of a story grammar are relational (i.e., they are all related to one another via a set of rewrite rules), PC-ACE complex objects can also be set up as hierarchical nodes of the grammar. By using hierarchical objects, the actions of a story (the basic semantic triplets) can be aggregated into larger units (events), and events can be further aggregated hierarchically into even larger units (Macroevent). To set up a PC-ACE story grammar, the user will need to generate all its complex and simplex objects, from the top hierarchical object (e.g., Macroevent) down to the lower objects (e.g., any of the actor characteristics or circumstances of verbs). PC-ACE provides a series of "setup forms" for the user to enter all the relevant information.
Figure 1 shows an example of a PC-ACE setup form for the object "semantic triplet".10 The right-hand sub-form titled "Reference/alias Children of Underlying Complex …" lists all the "children"11 objects of a semantic triplet, with complex objects identified by a +, optional objects by square brackets [ ], and multiple objects by curly brackets { }. From the setup form in Fig. 1, the user can edit or delete any of the displayed objects, move up to their parent and down to their children, add new objects, simplex or complex, and view all instances of coded data for a selected object. When all the elements of the story grammar have been set up, data entry can start.

PC-ACE data entry
The text documents to be analyzed in PC-ACE are external to the program. They may be in printed form, microfilms, or computer files (in which case links to these files can be stored within PC-ACE and used to open and analyze the documents from within PC-ACE). Even when the documents are in the form of computer files imported into PC-ACE, the data cannot be coded directly from the document itself by selecting portions of the text and assigning them to specific objects (coding categories). Rather, relevant information is entered manually into text boxes or combo boxes (see Fig. 2), and document links only provide a convenient way to open and view a specific document.12 Each time a new piece of textual data is entered, PC-ACE adds it to its internal dictionary.13 For stories narrated in multiple documents (e.g., the Hinesville lynching story told by two different newspaper articles), information can be taken and coded from any document that makes up the story in PC-ACE. This means that PC-ACE does not require any prior rewriting of multiple-document stories into a single document. New information is coded as encountered in each new document and automatically cross-referenced to that document.

PC-ACE data query
With the information coded into the appropriate objects of the grammar of data collection (coding categories in the content analysis terminology) and stored in PC-ACE's underlying relational database tables, we can run queries to investigate the lynching narrative, asking questions about that story. To make the process of data retrieval (or querying) easier, PC-ACE comes with a Query Manager based on a Graphical User Interface (GUI).14 The GUI takes a query that is expressed in terms of the objects of a user-defined story grammar and displayed graphically on the screen, and translates it into an SQL query on the PC-ACE tables and fields where the data are stored.15 For instance, to answer the general question "what did a certain actor do?" (e.g., "the Negro"), we need to relate actors and actions under the same semantic triplet stored in the database and filter the actors on the value(s) we are interested in (i.e., Negro).
The query yields the following list of available records showing the Negro's actions in the lynching narrative of Excerpt 3 (Fig. 3).
This example shows that, with PC-ACE, we can easily answer the question: "WHO does WHAT?" (e.g., the mob).Similarly, we could answer questions such as: "WHEN and WHERE did these actions occur?" "WHO was the target of an agent's actions?"
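The kind of SQL query that sits behind a "WHO does WHAT?" question can be sketched as follows, using Python's built-in `sqlite3` module. The two tables (`subject`, `process`), their columns, and the helper `actions_of` are hypothetical stand-ins for illustration, not the actual PC-ACE schema or the SQL generated by its Query Manager; the coded values are taken from the Hinesville story. The query relates actors and actions through the triplet they share and filters on the actor of interest.

```python
import sqlite3

# Toy relational layout: one row per semantic triplet in each table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subject (triplet_id INTEGER, actor TEXT);
    CREATE TABLE process (triplet_id INTEGER, verbal_phrase TEXT);
    """
)
conn.executemany(
    "INSERT INTO subject VALUES (?, ?)",
    [(1, "Negro"), (2, "Negro"), (3, "band of armed men"), (4, "band of armed men")],
)
conn.executemany(
    "INSERT INTO process VALUES (?, ?)",
    [(1, "confessed"), (2, "implicated"), (3, "overpowered"), (4, "shot")],
)


def actions_of(actor):
    """Return the actions coded under the same triplet as the given actor."""
    rows = conn.execute(
        """
        SELECT p.verbal_phrase
        FROM subject AS s
        JOIN process AS p ON p.triplet_id = s.triplet_id
        WHERE s.actor = ?
        ORDER BY s.triplet_id
        """,
        (actor,),
    )
    return [verbal_phrase for (verbal_phrase,) in rows]


print(actions_of("Negro"))              # ['confessed', 'implicated']
print(actions_of("band of armed men"))  # ['overpowered', 'shot']
```

Questions about WHEN, WHERE, or the target of an action would follow the same pattern, joining further tables on the shared triplet identifier.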

Working with aggregate codes: PC-ACE
We used PC-ACE to carry out QNA on the 1,300 documents of the Georgia lynching project (1875-1930). The project yielded over 6,000 semantic triplets. Most dictionary values for the various simplex objects appear in this database with a frequency of 1, for a total of more than 200 distinct values for actors and 1,000 for actions when we look at the entire lynching database (a normal finding in this type of research; Franzosi 2004, pp. 83, 356). In order to facilitate statistical analysis, these distinct values should be aggregated into broader categories (Franzosi 2010, pp. 103-104). Table 1 provides an example of aggregation applied to the actions found in the database.
However, how do we aggregate the original coding output into broader semantic clusters? PC-ACE offers two solutions to the problem of data aggregation: one during coding, with the coders carrying out data aggregation, and one after coding has been completed. In the first case (aggregation during data entry), the grammar would be set up to include a required simplex for the aggregate code of an object (e.g., aggregate actor or aggregate action). This would force a PC-ACE user to code both disaggregated and aggregated values for each dictionary entry during data entry. In the second case, PC-ACE has an Update Manager that can be used after the completion of data collection to build new aggregate codes from a list of individual values via SQL UPDATE statements. With the help of the Update Manager, the researcher can set filters on any original code (see Fig. 4) and define the new value to be entered in the corresponding aggregate category (see Fig. 5). Hence, the names of the collective actors "daughters", "girls", "mothers", and "sisters" in Fig. 4 can be reaggregated as "women" in a new aggregate category called aggregate actor, as shown in Fig. 5.16 The PC-ACE user can thus conduct queries using both the original codes (e.g., name of collective actor) and their values (e.g., mothers, sisters, fathers, husbands) and the aggregate codes (e.g., aggregate actor) and their values (e.g., women, men).
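Aggregation after coding via SQL UPDATE statements can be sketched with Python's built-in `sqlite3` module. The table `collective_actor`, its columns, and the mapping `AGGREGATES` are illustrative assumptions, not the statements actually issued by the PC-ACE Update Manager. The point of the sketch is that the aggregate value is written to a separate field, so the original values remain available for queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collective_actor ("
    "id INTEGER PRIMARY KEY, name TEXT, aggregate_actor TEXT)"
)
conn.executemany(
    "INSERT INTO collective_actor (name) VALUES (?)",
    [("daughters",), ("girls",), ("mothers",), ("sisters",), ("fathers",), ("husbands",)],
)

# Hypothetical aggregation scheme: aggregate value -> original dictionary values.
AGGREGATES = {
    "women": ["daughters", "girls", "mothers", "sisters"],
    "men": ["fathers", "husbands"],
}

for aggregate, originals in AGGREGATES.items():
    placeholders = ", ".join("?" for _ in originals)
    conn.execute(
        f"UPDATE collective_actor SET aggregate_actor = ? WHERE name IN ({placeholders})",
        [aggregate, *originals],
    )

by_aggregate = dict(
    conn.execute(
        "SELECT aggregate_actor, COUNT(*) FROM collective_actor GROUP BY aggregate_actor"
    )
)
original_names = [
    name for (name,) in conn.execute("SELECT name FROM collective_actor ORDER BY id")
]

print(by_aggregate)    # {'men': 2, 'women': 4}
print(original_names)  # the disaggregated values are still there
```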

Quantitative narrative analysis in CAQDAS applications
Having shown how to carry out QNA in PC-ACE, we now turn to illustrate how to perform these tasks in CAQDAS programs. Given the differences in software design, the order of implementation of the tasks changes slightly in CAQDAS compared to PC-ACE (grammar setup, data entry, and data query in PC-ACE; source document compilation, setup of codes, data coding, and data query in CAQDAS). This overview will not only show how to implement QNA using CAQDAS programs, but will also reveal the challenges (and ultimately, limitations) of doing QNA in CAQDAS. Because CAQDAS programs work in similar ways, we will use ATLAS.ti as a baseline, and discuss two other popular programs, MAXQDA and NVivo, only when significant differences apply.

Fig. 6 Assigning articles to hermeneutic units in ATLAS.ti

Cross references: grammar objects and documents
The first step towards carrying out data coding in CAQDAS programs is to import and link together the source documents.17,18 In ATLAS.ti, this is done through a "hermeneutic unit" (HU). An HU is used to link source documents together as one database so that the same code list is applied to all imported texts. In our case, the user would assign the two articles of our example to the same HU19 (see Fig. 6).
CAQDAS users can only code information one document at a time.20 Each article within the HU is a separate document. Unlike in PC-ACE, this makes it difficult to link textual elements across different articles. Consider our two-article story of the Hinesville lynching brought together into a single HU. In the first article, we read that "a band of armed men overpowered the deputy Sheriff." In the second article we are told that the band ("a crowd") was composed of fifteen men ("a crowd of fifteen men"). In CAQDAS programs, there is no way of linking this additional information to the actor coded under the first article. As a consequence, the fact that fifteen men (rather than a band with no numbers, as told in the first article) were involved would be lost.
To avoid losing this information, users would need to combine their source documents into a single document prior to coding. ATLAS.ti, NVivo, and MAXQDA allow the user to reword, manipulate, or modify source documents from within the programs. Alternatively, text revisions can be made outside the programs and the modified documents can then be imported into the database. Our lynching story, for instance, can be rewritten, combining the two available source documents of Excerpts 1 and 2 into a single story (Excerpt 3) (Appendix I). Needless to say, depending upon the number of articles in a project (1,300 in the Georgia lynching project) and the number of multiple-article events, this process of text editing could be very time consuming.

Setting up the grammar in CAQDAS (codes)
CAQDAS programs use codes to assign data to categories of analysis. Typically, these codes consist of key themes, concepts, processes or contexts based on theories from the researcher's academic tradition and the purpose of the analysis (Lewins and Silver 2007, p. 7). To conduct QNA in CAQDAS programs, the researcher needs to create codes that reproduce the objects of a story grammar, rather than key themes: the Participant-S, Process, and Participant-O (i.e., the SVO template) as well as all their modifiers (e.g., space, time).

Fig. 7 ATLAS.ti Code Manager
Figure 7 shows the list of codes set up in ATLAS.ti for the lynching story. Codes are arranged alphabetically21 as a seriatim list with no relational ties among them. As the entries in Fig. 7 show, each code name includes: 1. the names of every parent object in the "path" of the grammar all the way up to, but excluding, the first hierarchical object "semantic triplet" (e.g., the city within space, within circumstances, within simple process, within process, in the code "Process: Simple process: Circumstances: Space: City"); and 2. the actual value a code assumes (e.g., "Hinesville" for a city). Notice that, since values have to be included in code names, codes cannot be generated prior to coding, unlike in PC-ACE. This means that CAQDAS users will build their grammar while coding.
These code-naming criteria, requiring both paths and values, can lead to a very long code list. Consider the Georgia lynching database built in PC-ACE. The grammar consists of 79 simplex objects (i.e., objects that contain actual information taken from the source documents) and 67 complex objects (i.e., container objects that mark the path of a simplex object in the grammar) (see Appendix II). In CAQDAS programs the 79 simplex objects of the grammar can be easily reproduced as 79 codes. However, each CAQDAS code corresponding to a simplex object will also contain the name of that simplex's parent complex objects (e.g., the simplex object Name of individual actor can be reproduced in CAQDAS as "Participant-S: Actor: Individual: Name of individual actor"). The problem is that in the grammar of Appendix II several objects, both simplex and complex, appear as children of different parent complex objects. Thus, an Individual actor appears as the child of both Participant-S and Participant-O. The City appears as the residence of an Individual actor or a Collective actor (and each for either Participant-S or Participant-O), and as the location where an action occurs (one of the circumstances of the process). In the Georgia lynching grammar of Appendix II, 21 complex objects appear as children of more than one complex object.22 Each of these 21 complex objects will require as many code names as the number of pathways they belong to.23 In addition, in CAQDAS programs, the simplex children at the end of the path of each of these 21 complex objects would also need to include the name of the different values the code for the simplex takes.24 For instance, in the lynching database, there are 1,266 distinct instances of Verbal phrase, yielding 1,266 distinct codes (e.g., "Process: Simple process: Verbal phrase: arrest", "Process: Simple process: Verbal phrase: broke into", "Process: Simple process: Verbal phrase: burn"). Hence, in CAQDAS programs the code list will number in the thousands.
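The way paths and values multiply the number of codes can be made concrete with a short Python sketch. The helper `code_name` is hypothetical, and so are the two Participant paths and the second city value below; only the "Process: Simple process: Circumstances: Space: City" path and the value "Hinesville" come from the example discussed above. Every pathway of a shared object, combined with every value it takes, yields a separate flat code name.

```python
from itertools import product


def code_name(path, value):
    """Join the grammar path and the coded value into one flat code name."""
    return ": ".join([*path, value])


# Paths under which the same simplex object "City" may appear.
# The two Participant paths are assumptions made for illustration.
city_paths = [
    ("Participant-S", "Actor", "Individual", "Residence", "City"),
    ("Participant-O", "Actor", "Individual", "Residence", "City"),
    ("Process", "Simple process", "Circumstances", "Space", "City"),
]
city_values = ["Hinesville", "Savannah"]

# One code per (path, value) combination.
city_codes = [code_name(path, value) for path, value in product(city_paths, city_values)]

print(len(city_codes))  # 6
print(city_codes[4])    # Process: Simple process: Circumstances: Space: City: Hinesville
```

With three pathways and two values the toy list already holds six codes; the same multiplication applied to hundreds of values explains why the code list grows so quickly.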
Code families provide a way to reduce the number of codes in ATLAS.ti.25,26 Code families are umbrella categories that group together individual codes (those that encompass the actual value a simplex takes). Thus, the codes "Process: Simple process: Verbal phrase: arrest" and "Process: Simple process: Verbal phrase: burn" can be grouped together into the code family "Process: Simple process: Verbal phrase". Figure 8 shows an example of ATLAS.ti's "code families".
Once a code family is created, the applicable codes need to be "assigned", i.e., they have to be attributed to the code family (see Fig. 9).
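The grouping logic behind code families can be mimicked in a few lines of Python. This is only a sketch of the idea, not of how ATLAS.ti stores or assigns families: the helper `family_of` is hypothetical and simply assumes that a family is named after the path of a code, i.e., the code name without its trailing value (and that values themselves contain no ": " separator).

```python
from collections import defaultdict


def family_of(code):
    """Return the code name without its trailing value (the path)."""
    path, _sep, _value = code.rpartition(": ")
    return path


codes = [
    "Process: Simple process: Verbal phrase: arrest",
    "Process: Simple process: Verbal phrase: burn",
    "Process: Simple process: Circumstances: Space: City: Hinesville",
]

# "Assign" each code to the family that shares its path.
families = defaultdict(list)
for code in codes:
    families[family_of(code)].append(code)

for name, members in families.items():
    print(name, "->", len(members), "code(s)")
```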
Since the name of a code family does not appear in the code list (as one can see from Fig. 7), the use of code families reduces the number of codes. In our case, with 79 simplex objects in the grammar, each resulting in a code with embedded path, if code families are applied for each simplex, this would result in a reduction of 898 codes (the number of simplex

22 The objects are: Actor, Address, Age, Content, Event, Family relationship, Implicit object, Individual, Instrument, Macroevent, Number, Organization, Outcome, Ownership, Reason, Relation to other location, Residence, Semantic Triplet, Simple process, Space, and Time. For instance, Actor occurs under 6 complex objects (i.e., Participant-S, Family relationship, Group composition, Subset (among which), Participant-O, and Implicit object). Now, these are the 21 shared children complex with the highest hierarchical position in the grammar. Obviously, their own children also appear in the rewrite rules of their own parents. For instance, Actor occurs under 6 complex objects and so do all its children (i.e., individual, collective actor, organization) and the children of children.
23 "Macroevent", "Event" and "Semantic triplet" are not rewritten when they appear as a child of a complex object.
25 MAXQDA and NVivo both have an equivalent to the ATLAS.ti Code Family. In MAXQDA, "sets" are used to arrange the "codes" and "subcodes". In NVivo, relationship nodes can be used to link codes and subcodes, enabling the formation of distinct groups of codes.
26 Code families also facilitate CAQDAS query operations, as we will see.

Working with aggregate codes in CAQDAS
To avoid working with such an unwieldy number of codes (even when code families are used), the only reasonable solution in a CAQDAS approach to QNA is to work with aggregate codes (and perhaps even with a simplified grammar). Such verbs as "kill", "wound", "burn", "riddle with bullets", "torture", "hang", "beat up", "lynch", "rape", … could all be coded as "violence". This way, the 1,266 distinct verbs can be reduced to a more manageable set of some 50 or 60 aggregated categories. The same is true for any other code (e.g., actors such as "police", "Sheriff", "deputy Sheriff", "officer", "marshal" etc. could all be coded as "law enforcement").
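The effect of working with aggregate codes can be illustrated with a small Python sketch. The mapping `VERB_AGGREGATES` covers only some of the verbs listed above, and the short list of coded verbs is invented for illustration. The sketch compares frequency counts on the original values with counts on the aggregate category.

```python
from collections import Counter

# Partial, illustrative mapping from original verbs to an aggregate code.
VERB_AGGREGATES = {
    "kill": "violence",
    "wound": "violence",
    "burn": "violence",
    "torture": "violence",
    "hang": "violence",
    "lynch": "violence",
}

# Invented sample of coded verbs; unmapped verbs keep their original value.
coded_verbs = ["kill", "wound", "kill", "hang", "arrest"]

detailed = Counter(coded_verbs)
aggregated = Counter(VERB_AGGREGATES.get(verb, verb) for verb in coded_verbs)

print(detailed)    # Counter({'kill': 2, 'wound': 1, 'hang': 1, 'arrest': 1})
print(aggregated)  # Counter({'violence': 4, 'arrest': 1})
```

If only the aggregate value is stored at coding time, the detailed counts (for instance, "kill" versus "wound") can no longer be computed from the coded data.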
There are drawbacks to coding using aggregate codes. First, when coders perform both coding and aggregating at once, they dangerously come to play "surrogate scientist" (particularly if the aggregate codes are abstract theoretical categories, where not all values can be clearly pre-specified; see Markoff et al. 1975, p. 37). Second, some of the detailed original information is lost, and it can no longer be retrieved through a query (e.g., the frequency of verbs such as "kill" vs. "wound" once coded as "violence"). Third, the researcher would need to have the aggregate codes in mind at the start of the coding process. This, in turn, may require extensive pre-reading of the documents to be coded or a strong theory that provides the basis for aggregation regardless of dictionary values. Furthermore, findings may emerge during analysis that require unforeseen data queries. When these queries involve codes not incorporated into the original coding scheme, a CAQDAS researcher must choose between either creating new codes and re-reading and re-coding all documents or not performing the additional queries.27,28

27 CAQDAS researchers do not necessarily have to work with aggregate codes. They can work with disaggregated codes and group these together into aggregated codes. ATLAS.ti, NVivo, and MAXQDA all have functions allowing a user to manipulate existing codes through merging, renaming, duplicating etc. Distinctive to ATLAS.ti is the "super codes" function. This combines codes that have been linked together during querying. Unfortunately, these codes appear as yet more code names in the code list, ultimately defeating the purpose of working with fewer codes.
28 The number of codes could be further reduced by setting up each object of the story grammar with its own distinct code name, without path and value embedded in its name. Users would then assign several different codes to the same element in the text. For those cases of grammars where the same object has different paths (to repeat, Actor found under both Participant-S and Participant-O, or City found under both Actor and Process), this approach reduces the number of codes required to implement QNA in CAQDAS. For instance, under the first method, the 213 codes for all Participant-S children objects have to be duplicated, with the leading object Participant-O instead of Participant-S, when setting up the codes. Considering the additional codes for Case and Abstract and Physical objects, Participant-O pathways alone require 239 codes. In the alternative method, the same codes are used for both Participant-S and Participant-O children objects with a drop in the number

Setting up codes for hierarchical objects (Macroevent and Event)
In narrative, the micro-level relational structure (the semantic triplet, or basic SVO, with S related to V and V related to O) can be aggregated into Macroevent-level hierarchical structures (e.g., several triplets into an event, several events into a Macroevent). A story grammar, as we have seen, allows a user to specify both types of structures: relational and hierarchical. We have shown how CAQDAS users can set up a story grammar from the semantic triplet down. But such hierarchical objects as event and Macroevent in the lynching grammar have simplex children objects of their own (namely, victim and type of event).
In ATLAS.ti, one can set up these objects via the Primary Document Family Manager in the same way as for code families. 29 This tool allows one to generate a document family (i.e., a batch of documents with no internal hierarchical organization) and "assign" the relevant documents to the family. In so doing, texts can be grouped together by type of information (e.g., type of event at the event level). 30 Essentially, a CAQDAS project is arranged into a hierarchy, as Fig. 10 shows. 31 The highest level in the CAQDAS hierarchy consists of the documents. Beneath these come all the other branches of the hierarchy. The primary document branch is used for coding the simplex objects at the highest hierarchical levels of the story grammar (i.e., victim and type of event).
There are benefits to applying different objects of the story grammar to different sections of the CAQDAS hierarchy, namely to document families and codes. First, by coding the highest hierarchical levels of the story grammar at the document level, in CAQDAS the codes for type of event and victim do not appear in the code list. This reduces the list of available codes, albeit only marginally; the size of the reduction depends upon the number of modifiers attached to the highest level of aggregation, in our case the "Macroevent". Second, and most importantly, it reduces the time required for coding. Once the document families have been created, it is a simple matter to move documents into a category. Entire documents, and consequently all the codes applied within those documents, are placed into a category depending upon the information they contain (the types of events they describe and the victim in those events). The alternative would be to apply another set of codes to every coded segment of text within the documents. This would take time and add to the complexity of coding.
Footnote 28 (continued): of codes from 901 to 150. And these totals are only baseline figures that do not take into account the values these codes take on and that require hard coding. In spite of the apparent appeal of this alternative approach, there are several drawbacks. First, the approach will increase coding time and decrease data reliability. In our first-proposed method, only one code (e.g., "Process: Simple process: Verbal Phrase: Stole") is used to code the relevant elements in the text. As the code list is arranged alphabetically in CAQDAS programs, finding and applying this code is easy. In the alternative approach, the multiple codes that need to be assigned to the same textual element will not necessarily be found together in an alphabetically sorted code list (consider "Process", "Simple process", "Verbal Phrase" and "Stole"). Coding will require scrolling through a long list of codes (some 400 of them, without counting either the 1,266 distinct verbs or the 60 aggregate ones). The longer the list, the more time-consuming and error-prone coding becomes. Second, querying also becomes more problematic. Consider the query for "WHAT did the "Negro" do?" Under the first method, the query is:
("Process: Simple process: Verbal Phrase" WITHIN ("Semantic Triplet" ENCLOSES "Participant-S: Actor: Individual: Name of individual actor: Negro"))
This simple query becomes more complex under the alternative coding method, where the query is:
((("Process" & "Verbal Phrase") & "Simple process") WITHIN ("Semantic Triplet" ENCLOSES (((("Participant-S" & "Actor") & "Individual") & "Name of individual actor") & "Negro")))
In the first method, 3 codes and 2 operators are involved in obtaining the answer (codes: "Process: Verbal Phrase", "Semantic Triplet", "Participant-S: Actor: Negro"; operators: WITHIN, ENCLOSES). In the second, 9 codes and 8 operators are involved (codes: "Process", "Verbal Phrase", "Simple process", "Semantic Triplet", "Participant-S", "Actor", "Individual", "Name of individual actor", "Negro"; operators: WITHIN, ENCLOSES, and six instances of &, which stands for AND). Query time grows exponentially with query complexity (as measured by the number of codes and operators in the query).
29 Similarly, in MAXQDA, documents can be organized by text groups or text sets. Attributes can then be applied to texts within these. In NVivo, sources need to be coded as cases. Then the case can be given attribute values.
30 This tool mirrors the Code Family Manager, as described previously, but it groups documents rather than codes.
31 CAQDAS programs use the terms "hierarchy" and "hierarchical object" slightly differently from PC-ACE, and from the way we have used the terms up to here. In CAQDAS programs, any complex object is a "hierarchical object." In PC-ACE, only a handful of complex objects are "hierarchical objects" (namely, macroevent, event, semantic triplet). Technically, using PC-ACE's RDBMS terminology, a complex object is characterized by one-to-few relationships; hierarchical objects by one-to-many relationships. In practice, there is little difference between the two types of objects and the distinction only serves design purposes (e.g., in PC-ACE, one-to-many objects are displayed in a different form).
Unlike ATLAS.ti, MAXQDA and NVivo allow the coding system to be displayed in a hierarchical view. Figure 11 shows that MAXQDA and NVivo display codes in the same way as PC-ACE, with indented objects indicating a child of the parent object above (a code can contain up to 10 levels).
Although MAXQDA and NVivo use functional coding scheme hierarchies that can act both as an organizational tool and retrieval engine (Lewins and Silver 2007, p. 93), their hierarchical function does not preserve the relationship between the triplet elements, i.e., the relationships within the SVO template, with S related to V and V to O. Instead, the hierarchy permits arranging codes in a manner that allows for a more practical approach to coding. For example, the pathway "Actor: Individual: Name of individual actor: Negro" is used in both "Participant-S" and "Participant-O". In these programs, the user would place a "Negro" code in the hierarchies of both "Participant-S" and "Participant-O", and the portions of text coded as "Negro" for the "Participant-S" pathway would remain separate from the portions coded as "Negro" under "Participant-O". 32
Fig. 11 Hierarchical organization of categories in MAXQDA

CAQDAS data entry
Using the first Atlanta Constitution article (Excerpt 1), Fig. 12 shows an ATLAS.ti screenshot for QNA coding. Codes appear in the right window and the text to be coded on the left. CAQDAS coding is done by selecting a portion of the text in the left window and then assigning a code to the segment (e.g., the selected text to the code Semantic Triplet). 33 For QNA, each text segment from the newspaper article needs to be coded in a way that identifies its position in the story grammar. Consider semantic triplet 3: "A few weeks ago a house and warehouse were destroyed by the fire in Hinesville, and all the circumstances pointed to its being the work of an incendiary". "Incendiary" requires the code "Participant-S: Actor: Individual: Name of individual actor: incendiary", but this code then has to be assigned to the code family "Participant-S: Actor: Individual: Name of individual actor"; 34 "Hinesville" requires a code named "Process: Simple process: Circumstances: Space: City: Hinesville" in the code family "Process: Simple process: Circumstances: Space: City".

CAQDAS data query
With the documents fully coded, data analysis can begin by using CAQDAS data queries. CAQDAS query commands are based on Boolean, semantic, and proximity operators. These commands can be used to retrieve data to answer QNA questions: WHO did WHAT? Pro or against WHOM? WHEN did they do it? WHERE? Consider the question: "WHAT did the "Negro" do?" The answer involves constructing a query that retrieves the code family "Process: Simple process: Verbal phrase" WITHIN semantic triplets ENCLOSING the code "Participant-S: Actor: Individual: Name of individual actor: Negro". Figure 13 shows the list of quotes extracted by the query: "confessed", "implicated", "stealing", "broke into" and "set fire". 35 ATLAS.ti can retrieve information about the basic elements of a narrative (e.g., what an actor does) if (and only if) the values of a code (e.g., "Negro") have been hard-coded in the code name. This is why the value a code takes must be embedded in the code name ("Negro" coded as "Participant-S: Actor: Individual: Name of individual actor: Negro") and why this code must be placed in a code family ("Participant-S: Actor: Individual: Name of individual actor"), so that such questions as "WHO confessed?" can be answered. CAQDAS query tools dictate the code naming criteria. Similarly, if the user needs to retrieve data about "WHO was acting" for any specific type of action, the user must create a set of codes where the verbal phrase is hard-coded within the code name (e.g., "Process: Simple process: Verbal phrase: confessed"). Again, "confessed" would have to be coded as "Process: Simple process: Verbal phrase: confessed" to answer the question "WHO confessed?", and the code would have to be placed in the "Process: Simple process: Verbal phrase" code family so that it can be extracted by the query above for "WHAT did the "Negro" do?"
Fig. 13 ATLAS.ti query tool
Fig. 14 Using a document family as a filter in ATLAS.ti
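The logic of a WITHIN/ENCLOSES query over hard-coded code names can be sketched in Python. This is a simplified model, not ATLAS.ti's implementation: the `Quotation` class, the character offsets, the helper names `encloses` and `what_did`, and the two toy triplets are assumptions made for illustration, and containment of character spans stands in for the program's proximity operators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quotation:
    start: int  # character offset where the coded segment begins
    end: int    # character offset where it ends (exclusive)
    code: str   # full code name, with the value hard-coded at the end
    text: str   # the selected text


def encloses(outer, inner):
    """True if the outer segment fully contains the inner segment."""
    return outer.start <= inner.start and inner.end <= outer.end


def what_did(actor_code, verb_family, quotations):
    """Verbal phrases WITHIN semantic triplets that ENCLOSE the actor code."""
    triplets = [q for q in quotations if q.code == "Semantic Triplet"]
    hits = [t for t in triplets
            if any(q.code == actor_code and encloses(t, q) for q in quotations)]
    return [q.text for q in quotations
            if q.code.startswith(verb_family + ":")
            and any(encloses(t, q) for t in hits)]


ACTOR = "Participant-S: Actor: Individual: Name of individual actor: Negro"
VERBS = "Process: Simple process: Verbal phrase"

# Toy quotations with made-up offsets (two triplets in one document).
quotations = [
    Quotation(0, 40, "Semantic Triplet", "..."),
    Quotation(0, 5, ACTOR, "Negro"),
    Quotation(10, 19, VERBS + ": confessed", "confessed"),
    Quotation(50, 90, "Semantic Triplet", "..."),
    Quotation(50, 57, "Participant-S: Actor: Individual: Name of individual actor: sheriff", "sheriff"),
    Quotation(60, 68, VERBS + ": arrested", "arrested"),
]

result = what_did(ACTOR, VERBS, quotations)
print(result)
```

Note that the actor can be selected only because its value is part of the code name; the sketch has no way to filter on the selected text itself, which mirrors the constraint described above.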
Document families can be used in conjunction with the query tool to set filters that restrict query results. For instance, the filter set up for the query shown in Fig. 14 will only return results for the documents included in that document family.
By coding the modifiers of the hierarchical objects of the story grammar (i.e., Macroevents and events) at the document level, CAQDAS users can use these elements of the story grammar to filter for subsets of data in the database. They can query, for instance, the subset of texts reporting a certain type of event (e.g., hanging, shooting). Had this information not been coded at the document level, each and every query would require an extra set of query commands. 36 This would decrease both query efficiency (i.e., taking longer to run a query) and data reliability (i.e., increasing the likelihood of errors).
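A document-level filter of this kind amounts to restricting query hits to the documents that belong to a family. The sketch below is a hypothetical model: the family names, document identifiers, and the helper `filter_by_family` are invented for illustration and are not taken from any CAQDAS program.

```python
# Hypothetical document families: family name -> set of document ids.
DOCUMENT_FAMILIES = {
    "Type of event: hanging": {"doc1", "doc3"},
    "Type of event: shooting": {"doc2"},
}

# Toy query hits as (document id, quotation text) pairs.
query_hits = [("doc1", "confessed"), ("doc2", "fled"), ("doc3", "implicated")]


def filter_by_family(hits, family):
    """Keep only the hits that come from documents assigned to the family."""
    documents = DOCUMENT_FAMILIES[family]
    return [hit for hit in hits if hit[0] in documents]


hanging_hits = filter_by_family(query_hits, "Type of event: hanging")
print(hanging_hits)
```

Because the event type is attached once per document, the query itself does not need additional operators to select it.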

CAQDAS data query: limitations
Querying in CAQDAS has its limitations: elements not explicitly present in a text cannot easily be coded and, therefore, cannot be queried. After all, if something is not there it cannot be selected and assigned to a code. Examples are the missing syntactic subject of passive clauses (e.g., "The Negro was carried into the woods.") or the missing syntactic object of clauses with intransitive verbs (e.g., the intransitive verb "strike" of a labor dispute has no syntactic object, but it does have an implicit semantic object, "employer"). Another example is the more general problem known as anaphora resolution. In the sentence "He is said to have confessed the deed, and implicated several in the crime," the "he" refers to the lynched Negro, who, however, is nowhere to be found in the sentence (see Fig. 15).
Fig. 15 Implicit elements
The user could select the "he" and assign it to the code "Participant-S: Actor: Individual: Name of individual actor: Negro" as the subject of the semantic triplet. However, when querying the data for the actors present in the story, the value "he" (rather than "Negro") would be returned. Depending upon how many instances of "he" occur in the set of documents analyzed, the information provided by the query results could be meaningless: he, who? The Negro? The deputy Sheriff? Mr. Chapman? Again, text manipulation may be required (either before or after importing the documents to be coded) to make coding (and querying) more transparent. In this case, the user could edit the original text, substituting the word "Negro" for the word "he".

PC-ACE and CAQDAS compared: a summary
So, what are the differences between PC-ACE and CAQDAS? Does CAQDAS, a more familiar option than PC-ACE for social scientists, offer a viable solution to computer-assisted QNA? To answer these questions, let us bring together what we have learned about the main tasks involved in QNA: grammar setup, document cross-referencing, data entry, and data query.

Setting up the grammar
PC-ACE provides all the tools to generate complex story grammars with relational and hierarchical objects. CAQDAS programs, while allowing the user to generate hierarchies or families of codes, are not designed to reproduce the multiple relations among the objects of complex story grammars.

Data entry
In PC-ACE, the source documents are external to the software (e.g., on microfilm) and coding is based on manual data entry of information into text boxes or combo boxes. CAQDAS programs work with documents imported into the programs, and coding is done by assigning codes to selected portions of a document. In CAQDAS, if documents are not available in digital form (a Word, PDF, or ASCII document), the texts will have to be converted to digital files (for instance, through scanning, which can be a lengthy process). Problems of anaphora resolution may also require extensive editing of input documents in CAQDAS programs (which can also be a time-consuming process).

Document cross-referencing
Each item of information coded in PC-ACE is cross-referenced to its original source document. In data entry, users code information from different documents to construct a single, unified story (albeit, perhaps, with alternative story lines). In data query, information is (or can be) retrieved regardless of source documents. CAQDAS programs allow users to work on different source files, but they do not provide simple tools to merge textual elements from different documents. CAQDAS programs provide an option for linking different items of information within a single document and across documents (Lewins and Silver 2007, p. 63). These links, however, are hyperlinks that allow the user to jump from one segment of text to the linked text. While this is useful to remind the researcher of key intersections between documents, there is no way to retrieve this information in query form as in PC-ACE. 37 To overcome CAQDAS cross-referencing limitations, users can combine multiple documents into a single document or edit individual documents, but this is a labor-intensive process.

Data query
PC-ACE and CAQDAS programs work very differently. While PC-ACE is an RDBMS program that relies on SQL to extract information from coded data, CAQDAS programs provide their own set of query commands: Boolean, semantic, and proximity operators. Boolean operators only allow combinations of keywords and are the most common operators used in any information retrieval system. Semantic operators (e.g., up, down, and siblings in ATLAS.ti) can be used when codes are linked via so-called transitive relations such as "is part of" or "is a", which define hierarchical relations. They extract information from codes having, for instance, "parent-children" relations or "sibling" relations. Proximity operators (e.g., within, encloses, overlapped by, overlaps, follows, precedes, and co-occur in ATLAS.ti) describe spatial relations between coded textual elements.
Unfortunately, none of these CAQDAS tools allows the user to do what the SQL where command does in PC-ACE: set filters on the instances of specific coding categories (e.g., "Negro" as the specific value instance of the category <actor>). This means that a CAQDAS user interested in what a specific actor (e.g., "Negro") does needs to "hard-code" the value ("Negro") into the code name (e.g., Participant-S: Actor: Individual: Name of individual actor: Negro) in order to be able to retrieve that information in a query. This approach is feasible with a limited number of actors and actions (more generally, of objects in the grammar). In large projects, however, there may be thousands of distinct names of actors, actions, and all other objects, forcing CAQDAS users to work with literally thousands of codes. To overcome this limitation, doing QNA in CAQDAS programs would require 1. using a limited grammar, with a small number of simplex objects and a handful of complex objects; 2. working with codes that, for each simplex object, aggregate the distinct values into a handful of categories (e.g., for verbs, violence and communication, instead of "wound", "punch", "kick", "stone", "kill" or "say", "write", "tell", "publish"). Having coders perform both coding and aggregating at once may lead to serious reliability problems (Markoff et al. 1975, p. 37). Furthermore, this hard-coding of aggregate values also presumes that the researcher has a complete list of aggregate codes and of the individual values that make them up.
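The contrast with a value filter in SQL can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the single `triplet` table, its column names, and the rows are hypothetical toy data and do not reproduce PC-ACE's actual database design, which this section does not describe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-table layout: one row per semantic triplet.
conn.execute(
    "CREATE TABLE triplet ("
    "id INTEGER PRIMARY KEY, subject TEXT, verb TEXT, object TEXT, city TEXT)"
)
# Toy rows for illustration only (not data from the study).
rows = [
    ("Negro", "confessed", "deed", "Hinesville"),
    ("Negro", "implicated", "several", "Hinesville"),
    ("sheriff", "arrested", "Negro", "Hinesville"),
    ("mob", "burned", "Negro", "City B"),
]
conn.executemany(
    "INSERT INTO triplet (subject, verb, object, city) VALUES (?, ?, ?, ?)", rows
)


def actions_of(actor):
    """WHAT did this actor do? The value is a WHERE filter, not part of a code name."""
    cursor = conn.execute(
        "SELECT verb FROM triplet WHERE subject = ? ORDER BY id", (actor,)
    )
    return [row[0] for row in cursor]


print(actions_of("Negro"))
print(actions_of("sheriff"))
```

Here the category (`subject`) is defined once and any value can be filtered at query time, so adding a new actor name requires a new row rather than a new code.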
Another drawback of CAQDAS query systems concerns the computation of frequency counts. CAQDAS queries only extract quotations and frequency counts of codes. Like CAQDAS, PC-ACE does not have any statistical capabilities; but the SQL count statement in PC-ACE allows the researcher to compute frequency distributions for objects in relation to other objects (e.g., the frequency distribution of cities where mobs burned Negroes as opposed to hanging them). Both types of software allow the user to export data to statistical packages for further manipulation, but PC-ACE can also automatically compute network matrices that can be imported into network programs (e.g., UCINET, Pajek).
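Both ideas, a relational frequency count and an actor-by-actor network matrix, can be sketched with `sqlite3` and `collections.Counter`. As before, the `triplet` table, its columns, and the rows are hypothetical toy data rather than PC-ACE's schema or the study's results, and the matrix is a plain nested list, not a UCINET or Pajek file format.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per semantic triplet.
conn.execute("CREATE TABLE triplet (subject TEXT, verb TEXT, object TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO triplet VALUES (?, ?, ?, ?)",
    [  # toy rows for illustration only
        ("mob", "burned", "victim", "City A"),
        ("mob", "hanged", "victim", "City A"),
        ("mob", "hanged", "victim", "City B"),
        ("mob", "hanged", "victim", "City B"),
        ("sheriff", "arrested", "victim", "City B"),
    ],
)

# Frequency distribution of one object (verb) in relation to another (city).
freq = conn.execute(
    "SELECT city, verb, COUNT(*) FROM triplet "
    "WHERE subject = 'mob' GROUP BY city, verb ORDER BY city, verb"
).fetchall()
print(freq)

# Actor-by-actor matrix: cell [i][j] counts triplets with actor i as S and actor j as O.
pairs = Counter(conn.execute("SELECT subject, object FROM triplet"))
actors = sorted({actor for pair in pairs for actor in pair})
matrix = [[pairs[(s, o)] for o in actors] for s in actors]
print(actors)
print(matrix)
```

The matrix rows and columns follow the sorted list `actors`, so the same ordering must be used when labeling the matrix in another program.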

Conclusions
This paper has discussed a computer implementation of QNA using PC-ACE and CAQDAS programs, in particular ATLAS.ti, MAXQDA, and NVivo. QNA is a powerful research tool for the quantitative analysis of narrative texts. Given the complexity of the coding schemes (story grammars) upon which QNA relies, software is necessary to perform the analysis ("No software, no QNA"; Franzosi 2010, p. 67).
PC-ACE was specifically designed to carry out QNA. CAQDAS programs were designed to help researchers in organizing and analyzing their material qualitatively (e.g., transcripts from focus groups or in-depth interviews). PC-ACE was born as a tool for quantitative analysis; CAQDAS programs as qualitative tools. Given these differences in basic epistemology and design strategy, a comparison between the two types of software may seem unfair and artfully set up. Yet, the popularity of CAQDAS programs may well make them the first port of call for any researcher interested in QNA; hence, our comparison. Certainly, and not surprisingly, PC-ACE provides a more comprehensive approach to QNA. PC-ACE's power lies in its ability to preserve relationships between objects of the grammar and query these relationships to answer questions such as "who does what, pro/against whom, where and when." Both PC-ACE and CAQDAS allow users to export tables' content and query results to other software packages for specialized analyses. Contrary to CAQDAS, however, PC-ACE automatically computes the network matrix that can be cut and pasted into network analysis software. Indeed, the application of QNA in PC-ACE has helped to shed light on such socio-historical puzzles as the rise of Italian fascism (1919-1922), on the basis of over 50,000 newspaper articles from three different newspapers yielding some 200,000 semantic triplets (Franzosi 2010, pp. 107-142; Franzosi 2011), and the lynching of African Americans in Georgia (1875-1930), on the basis of over 1,300 newspaper articles for nearly 7,000 triplets (Franzosi in press; Franzosi et al. 2011). 38
CAQDAS programs have limitations that affect 1. the complexity of the story grammar that can be used; 2. the scale of the project that can be carried out; and 3. the range of questions that can be asked. It is the query tools of CAQDAS programs that set the main obstacle to a full QNA implementation. Contrary to PC-ACE, where SQL-based queries (Structured Query Language) allow users to query the data across a number of coding categories and for any value of any category (e.g., who did what in the city of Hinesville), CAQDAS query tools were designed to extract units of text assigned to various codes across different documents (users can also extract frequencies of these codes). Users cannot set filters for specific values on the objects involved in the query (e.g., the actions performed by the specific actor "Negro"). To perform queries involving specific values of a code (again, the value "Negro" of the code "Name of individual actor"), these values must be hard-coded into the name of the code (e.g., Semantic triplet: Participant-S: Actor: Individual: Name of individual actor: Negro). Even for small projects, based on a limited number of documents, these limitations could lead to a bewildering number of codes, in the hundreds if not in the thousands, way too many for any sensible approach to coding. 39 As a result, researchers wishing to use CAQDAS programs for QNA may need to focus on a handful of specific objects in relation to other objects (e.g., where an action occurs) and on a handful of specific values of objects (e.g., "Negro," "police", "woman" as the value of an actor). 40 They may also need to work with a simplified grammar based on highly aggregated codes.
For those researchers familiar with any of the CAQDAS programs, reluctant to invest in learning specialized software such as PC-ACE, and willing to accept the limitations in a project's scope and questions, a limited implementation of QNA may be all they need. And this paper has shown how to do just that: implement a limited QNA in CAQDAS programs.
38 If 200,000 semantic triplets are a daunting number for any dissertation, 7,000 is well within reach. Doyle (Doyle) coded 8,483 semantic triplets from a sample of best-selling children's books; Vicari (2008) coded 5,462 from a sample of social fora; De Fazio (2011) coded 6,035 semantic triplets from a published chronology; and Junker's (2012) work on Chinese activists' publications and Internet postings is based on some 1,500 triplets. All of these dissertations brought out novel patterns in the data on the basis of a few thousand triplets.
39 At the 1st International Seminar on Computer-Aided Qualitative Research, held in Amsterdam, 10-11 June 2008, Stewart Shuman, director of the Qualitative Data Analysis Program at the University of Massachusetts, told the audience of his ATLAS.ti Masterclass that he would not use more than 10 codes and that if a client wanted more than 10 codes he would go through the text twice with different codes rather than increase the code list. Suzanne Friese, of ATLAS.ti, states: "Some researchers develop 40 codes, others a few hundred or even a few thousand codes. … The software … just offers functions to create new codes, to delete, to rename or to merge them. … Developing a lot of codes is clearly an adverse effect of using software. No one would ever come close to 1,000 or more codes when using the old-style paper & pencil technique. But also when using software, too many codes lead to a dead end. There might be exceptions, but in most cases this hinders further analysis" (Friese 2011, p. 12).
40 Relations between coding categories (or codes) are central not only in QNA story grammars, but also in other types of approaches to text (e.g., conceptual relationships; on these issues, see Carley 1993; Franzosi 2010, p. 51). These types of complex coding schemes require both the generation of relevant coding categories and the mapping of hierarchical and relational ties between those categories. Since CAQDAS coding schemes can only be generated as lists of codes, users cannot develop complex nested codes in these programs. As a result, CAQDAS programs are unsuitable not only for QNA but also for other forms of textual analysis, such as map analysis, where the relations among different coding categories are of central interest.

Fig. 1 PC-ACE setup form showing the children objects of the complex object semantic triplet

Fig. 4 PC-ACE Update Manager: setting up a filter on the code name of collective actor
Consider this. An individual actor can appear under both Participant-S and Participant-O, leading to the two codes Participant-S: Actor: Individual: Name of individual actor and Participant-O: Actor: Individual: Name of individual actor. If we have "Negro" and "sheriff" as both Participant-S and Participant-O throughout a text, this would lead to four different codes: Participant-S: Actor: Individual: Name of individual actor: Negro, Participant-S: Actor: Individual: Name of individual actor: sheriff, Participant-O: Actor: Individual: Name of individual actor: Negro, and Participant-O: Actor: Individual: Name of individual actor: sheriff.
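How quickly path-embedded code names multiply can be sketched in Python. This is an illustration only: the variable names and the helper `n_embedded_codes` are hypothetical, and the count for 1,000 values is simple arithmetic, not a figure from the study.

```python
from itertools import product

roles = ["Participant-S", "Participant-O"]
path = "Actor: Individual: Name of individual actor"
values = ["Negro", "sheriff"]

# One code per (role, value) pair, with path and value embedded in the name.
embedded_codes = [f"{role}: {path}: {value}" for role, value in product(roles, values)]
for code in embedded_codes:
    print(code)


def n_embedded_codes(n_roles, n_values):
    """Number of path-embedded codes needed for one object under several roles."""
    return n_roles * n_values


print(n_embedded_codes(len(roles), len(values)))  # the four codes listed above
print(n_embedded_codes(2, 1000))                  # two roles, 1,000 distinct names
```

The count grows with the product of roles and values, which is why every additional distinct name adds one code per pathway in which the object appears.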

Fig. 8 Creating a code family in ATLAS.ti

Fig. 9 Assigning codes to a code family in ATLAS.ti

Fig. 10 View of the object hierarchy in the ATLAS.ti Object Explorer

Table 1
The what revisited: