Argument diagramming and critical thinking in introductory philosophy

In a multi-study naturalistic quasi-experiment involving 269 students in a semester-long introductory philosophy course, we investigated the effect of teaching argument diagramming (AD) on students' scores on argument analysis tasks. An argument diagram is a visual representation of the content and structure of an argument. In each study, all of the students completed pre- and post-tests containing argument analysis tasks. During the semester, the treatment group was taught AD, while the control group was not. Methodological problems with the first study were addressed in the second. The results were that among the different pre-test achievement levels, the scores of low-achieving students who were taught AD increased significantly more than the scores of low-achieving students who were not taught AD, while the scores of the high-achieving students did not differ significantly between the treatment and control groups. The results for intermediate-achieving students were mixed. The implication of these studies is that learning AD significantly improves low- and intermediate-achieving students' ability to analyze arguments.


Introduction
The past few decades have seen a tremendous amount of educational effort and research directed at the improvement of students' general critical thinking (CT) skills. The search for pedagogical methods that efficiently develop these skills is part of a growing national concern that our high school students are under-prepared for the rigors of college and that our college students are being well trained for particular industries, but inadequately prepared to be participating members of our democratic society (Perkins, Allen, & Hafner, 1983; Kuhn, 1991; Means & Voss, 1996).
In response to these concerns, many universities have made the development of CT skills part of their mission statements, along with adding at least one CT course to their graduation requirements. Unfortunately, many studies have shown that very few college courses actually improve these skills (Annis & Annis, 1979; Pascarella, 1989; Resnick, 1987; Stenning, Cox, & Oberlander, 1995). In our introductory philosophy course, we aim at real improvement and have developed a CT-focused curriculum to supplement the content-focused curriculum traditionally used in this course.

Although there is no generally accepted, comprehensive list of skills that constitutes 'CT skills', there seems to be fair agreement on many of the types of skills to which educators are referring when they speak about teaching CT. Specifically, most agree that one aspect involves the ability to reconstruct, understand and evaluate an argument - cognitive tasks we may describe as 'argument analysis' (Ennis, 1987; Fisher & Scriven, 1997; Kuhn, 1991).
The first step in argument analysis is reading a text for the argument, as opposed to, for example, reading for the plot (as in a novel) or the facts (as in a textbook). Mandler (1984) provides an overview of research supporting the claim that adults and children as young as three years old possess 'story schemata' that guide understanding when reading or listening to a story. Thus, learning the skill of reading for the argument requires students to develop a new schema, or set of schemata, with which they can interpret the text appropriately.
Schema theory, first introduced by Bartlett (1932, 1958) and further developed by Evans (1967), Rumelhart and Ortony (1977) and Mandler (1984), explains cognition as information processing mediated by schemata. A schema is a packet of knowledge containing both data and information about the interconnections among the data. Rumelhart (1980) refers to schemata as the representations of concepts stored in memory, and Sweller (1994) describes schemata as representations of either concepts or problem-solution procedures.
To facilitate the acquisition of new schemata, Sweller (1994) recommends reducing the extraneous cognitive load during the learning process. One common way of reducing extraneous cognitive load is using graphic organizers (GOs), such as diagrams, to supplement regular reading and instruction. Previous research has shown that students' use of GOs generally produces improvements on a wide range of cognitive tasks - including those generally labelled CT tasks - that are significantly greater than the improvements gained by students engaged in reading and regular instruction alone (Horton et al., 1993; Moore & Readence, 1984).
In this paper we investigate whether teaching our students argument analysis with a particular kind of GO in our introductory philosophy aids the development of new schemata better than traditional ways that argument analysis is taught. For our students, argument analysis consists in identifying which statements are the main conclusion, the sub-conclusions and the premises, identifying the structure of the argument by determining how the premises work together to support the conclusion and evaluating the argument by determining whether the premises actually do support the conclusion and whether they are true.
Our supplementary curriculum focuses on reading text for an argument and constructing argument diagrams (ADs). An AD is a visual representation of the content and structure of an argument (for an overview of the development of argument diagramming, see Reed, Walton and Macagno [2007]). For illustration, consider the following argument:

I think everyone would agree that life is worth protecting and that the environment sustains all of us. It stands to reason, then, that we need to protect the environment. One particular threat to the environment is the emission of greenhouse gasses. This is because greenhouse gasses trap the energy of the sun, causing the warming of the planet, and the warming of the planet could have catastrophic effects on the environment. So, we just can't avoid the conclusion that we need to reduce greenhouse gas emissions.
For the AD, the claims are put into boxes, the inferential connections are represented by arrows and all the excess verbiage is removed (see Figure 1). Previous research indicates that learning argument diagramming does aid the development of students' CT skills in a curriculum devoted to informal logic (van Gelder, Bissett, & Cumming, 2004). To test the efficacy of our merely supplementary curriculum, we designed a quasi-experimental study in which some of the students in introductory philosophy received AD instruction along with the regular content instruction for the course, while the other students in the course received only the regular content instruction. The students completed pre-tests at the beginning of the semester and post-tests at the end, each consisting of a series of argument analysis tasks. We repeated this study, with modifications, the following semester.
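Since Figure 1 cannot be reproduced here, a diagram of this kind can be sketched as a small data structure. The claim labels, paraphrased claim texts and the dictionary-plus-arrows representation below are illustrative assumptions, not the notation actually used in the course:

```python
# A minimal sketch of an argument diagram: claims in "boxes" (a dict)
# and inferential connections as arrows (pairs). Claim texts are
# paraphrased from the example argument above.

claims = {
    "P1": "Life is worth protecting.",
    "P2": "The environment sustains all of us.",
    "C1": "We need to protect the environment.",
    "P3": "Greenhouse gasses trap the energy of the sun, warming the planet.",
    "P4": "Warming the planet could have catastrophic effects.",
    "C2": "We need to reduce greenhouse gas emissions.",
}

# Each arrow runs from a premise (or sub-conclusion) to the claim it supports.
arrows = [("P1", "C1"), ("P2", "C1"), ("P3", "C2"), ("P4", "C2"), ("C1", "C2")]

def main_conclusion(claims, arrows):
    """The main conclusion is the one claim that supports nothing else."""
    supporters = {src for src, _ in arrows}
    roots = [c for c in claims if c not in supporters]
    assert len(roots) == 1, "a well-formed diagram has one main conclusion"
    return roots[0]

print(main_conclusion(claims, arrows))  # C2
```

A representation like this makes the structure explicit: C1 is both supported (by P1 and P2) and supporting (of C2), which is exactly the sub-conclusion role the diagram is meant to surface.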
To summarize: in this context, schema theory would suggest that students with no existing appropriate schemata for reading text will gain significant understanding by acquiring a new schema and tuning it with regular practice. We conducted two studies aimed at testing this prediction. Our hypothesis is that students who began the semester with poor argument analysis abilities will benefit from AD instruction and so gain more from pre-test to post-test than students who do not receive AD instruction.

Participants and design
In the Spring of 2004, 139 students (46 women, 93 men) in four distinct sections of an introductory philosophy course took a pre-test at the beginning of the semester and a structurally identical post-test at the end. These tests each consisted of several short arguments to be analyzed. Each distinct section had a different lecturer and teaching assistant and the students chose their section at the start of the term. The students attended class with the lecturer twice a week and a recitation class with the teaching assistant once a week.
The students in Section 1 of the course (35 students -13 women and 22 men) were explicitly taught the AD curriculum. In contrast, students in Sections 2, 3 and 4 (104 students -33 women and 71 men) were not taught the AD curriculum, but, rather, were taught (only implicitly) to use more traditional kinds of representations (e.g. lists of statements).

Materials
To determine whether students developed an 'argument schema' over the course of a semester, we developed a pre-test to be taken at the beginning of the semester and a companion post-test to be taken at the end. Each question on the pre-test posed a series of sub-questions about a particular argument, and for each argument on the pre-test there was a structurally identical argument on the post-test. In Questions 1 and 2 the student was only asked to state the conclusion of the argument. Questions 3-6 each had five parts: (1) state the conclusion of the argument, (2) state the premises (reasons) of the argument, (3) indicate (via multiple choice) how the premises are related, (4) provide a visual, graphical, schematic or outlined representation of the argument and (5) decide whether the argument is good or bad and explain this decision.

Test coding
The students who either only took the pre-test or only took the post-test were not included in the study, but their tests were used for coder-calibration, prior to each session of coding. Included pre-tests and post-tests were coded by two coders, who each independently coded all pairs of tests.
Questions 1 and 2, and each part of Questions 3-6 except for part (4), were coded 1 for a correct answer and 0 for an incorrect answer. Thus, there were 18 question-parts that were coded either 1 or 0. During the coder-calibration session, we determined that there were some standard representations that students used, so for part (4) of each question, answers were coded according to the type of representation used: Correct argument diagram, Incorrect or incomplete argument diagram, List, Translation into logical symbols (like a proof), Venn diagram, Concept map, Schematic (like P1 + P2 / C), Other or Blank.
The Percentage Agreement (PA) between the coders was .85 for both tests, indicating good inter-coder reliability. However, one coder was systematically a 'tougher grader' than the other, and we wanted to allow for a more nuanced scoring of each question than either coder alone could give, so, for each test, the codes from the two coders on each question were averaged.
Since we were interested in whether the use of AD aided the student in answering each part of each question correctly, the code a student received for part (4) of each multi-part question was set aside, while the sum of the codes received on each of the other 18 question-parts determined the raw score a student received on the test. This raw score was converted to a percentage (the 'score').
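The coding arithmetic described above can be sketched as follows; the code lists are invented for illustration:

```python
# Sketch of the coding procedure: percentage agreement between two
# coders, per-part averaging of their codes and conversion of the
# 18 averaged codes into a proportion score.

def percentage_agreement(a, b):
    """Fraction of question-parts on which the two coders agree."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Two coders' binary codes for one student's 18 scored question-parts.
coder1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
coder2 = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

pa = percentage_agreement(coder1, coder2)                 # 16/18 agreement here
averaged = [(x + y) / 2 for x, y in zip(coder1, coder2)]  # 0.5 where they disagree
score = sum(averaged) / len(averaged)                     # raw score as a proportion
```

Averaging the two coders' codes yields 0.5 on each disagreement, which is how a single question-part can contribute a fractional amount to the raw score.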
The primary variables of interest were the pre-test and post-test scores, whether the student was taught AD and the kind of visual representation the student provided for each multi-part question on the post-test for each semester. In addition, the following data were recorded for each student in each semester: which section the student was enrolled in, the student's final grade in the course, the student's year in school, the student's home college, the student's sex and whether the student had taken the concurrent honors course associated with the introductory course.

Student characteristics
To determine whether the students in the study differed in any statistically significant characteristic other than being taught AD, we tested how well we could predict students' gains from pre-test to post-test based on the variables we had collected. We performed a regression for Gain using Pre-test, Section, Gender, Honors, Grade, Year and College as regressors. The results indicate that none of the variables besides Pre-test and Section was a factor in a student's post-test score. Thus, we are confident that the students in the treatment group did not differ in any important respect from the students in the control group.

Grouping students by pre-test score
The results of the pre-tests and post-tests were analyzed using a classification of the students' academic level as reflected in their pre-test results. We divided the students into three roughly equal groups based on the pre-test scores: Low Academic Level, Intermediate Academic Level and High Academic Level. The dividing lines between the groups were decided based on the mean test score: the intermediate level included scores one half of a standard deviation on either side of the mean, the low level and high level consisted of the scores below and above the intermediate level, respectively. In this study, Low was score ≤ 0.5, Intermediate was 0.5 < score < 0.67 and High was score ≥ 0.67.
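The banding rule just described can be sketched as follows. The cutoffs fall out of the sample's own mean and standard deviation, so the invented scores below will not reproduce the study's 0.5 and 0.67 boundaries:

```python
# Sketch of the achievement-level split: the intermediate band spans
# half a standard deviation on either side of the mean pre-test score;
# low and high are everything below and above that band.
from statistics import mean, stdev

def academic_level(score, all_scores):
    m, s = mean(all_scores), stdev(all_scores)
    lo, hi = m - 0.5 * s, m + 0.5 * s
    if score <= lo:
        return "Low"
    if score >= hi:
        return "High"
    return "Intermediate"

scores = [0.3, 0.5, 0.5, 0.7]  # invented pre-test proportions
```

For these invented scores the mean is 0.5 and the band runs from roughly 0.42 to 0.58, so 0.3 falls in Low, 0.5 in Intermediate and 0.7 in High.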

Comparison of students by treatment and by argument diagram use
We first determined whether, overall, the average gain of the students who were taught AD was statistically significantly higher than that of the students who were not. The scores are given in Table 1.
Overall, the post-test scores of the students who were taught AD were higher than those of the students who were not, t(89) = 6.05, p < .001. However, this can be explained by the fact that, overall, the pre-test scores of the students who were taught AD were higher than those of the students who were not, t(61) = 2.50, p = .015. This explanation is further supported by the fact that the gains from pre-test to post-test of the students who were taught AD were not significantly different from the gains of the students who were not taught AD, t(53) = 1.35, p > .05.

The hypothesis was that the differences in improvement on argument analysis tasks between students who do and do not learn AD would be greatest for students who are low-achieving. We tested this by determining whether, for each academic level, the mean gain from pre-test to post-test of the students who were taught AD was statistically significantly higher than the mean gain of the students who were not. The results of this analysis indicate that the hypothesis is confirmed. The scores are given in Table 2.

A two-way ANOVA was conducted for Pre-test with factors 'Taught' and 'Academic Level'. The main effect of Taught was not statistically significant (as was expected, since the students had not actually been taught anything at the time of the pre-test), F(1,133) = 0.7, p > .05, while the main effect of Academic Level was statistically significant (as expected, since Academic Level was based on the pre-test scores), F(2,133) = 151.2, p < .001. The interaction between the factors was not statistically significant, F(2,133) = 0.2, p > .05.
An ANCOVA was conducted for Post-test with the same factors and using Pre-test as a covariate. The main effect of Taught was statistically significant, F(1,133) = 21.1, p < .001. The main effect of Academic Level was not significant, F(2,133) = 2.1, p > .05, but the effect of the interaction was, F(2,133) = 2.9, p < .05. In addition, an ANOVA was conducted for Gain with the same factors. The main effect of Taught was statistically significant, F(1,133) = 14.6, p < .001, as was Academic Level, F(2,133) = 40.8, p < .001, but not the interaction, F(2,133) = 2.4, p > .05.
In order to explain these results, Tukey post-hoc comparisons of the treatment group and the control group among academic levels were performed. These results indicate that the pre-test scores of the students who were taught AD were not significantly different from the pre-test scores of the students who were not, among any of the academic levels (p > .05 in all cases). However, the post-test scores of the students who were taught AD were significantly higher than the post-test scores of the students who were not, among both the low academic level students, t(7) = 3.10, p = .02, and the intermediate students, t(32) = 3.93, p < .01, as were the gains for the low academic level students, t(7) = 2.48, p = .14, and the intermediate students, t(32) = 3.51, p < .01. In contrast, among the high academic level students, neither the post-test scores nor the gains of the students who were taught AD differed from those of the students who were not taught AD (p > .05 in both cases).

More importantly, though, we are interested in the effect sizes for the treatment group versus the control group. Cohen's d (the difference between the mean post-test and mean pre-test scores divided by the standard deviation of the pre-test) was calculated for each academic level. The results are given in Table 3.
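The effect-size measure as defined here (post-test mean minus pre-test mean, divided by the pre-test standard deviation) is the authors' variant of Cohen's d, and can be sketched with invented scores:

```python
# Sketch of the effect-size calculation described above: the gain in
# mean score from pre-test to post-test, standardized by the
# standard deviation of the pre-test scores. Score lists are invented.
from statistics import mean, stdev

def effect_size(pre, post):
    return (mean(post) - mean(pre)) / stdev(pre)

pre = [0.40, 0.45, 0.50, 0.55, 0.60]   # invented pre-test proportions
post = [0.60, 0.65, 0.70, 0.75, 0.80]  # invented post-test proportions
```

With these numbers the mean rises by 0.20 against a pre-test standard deviation of about 0.079, giving d of roughly 2.5; the same formula applied per academic level produces the values reported in Table 3.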

Discussion
The main conclusion that can be drawn from the results of the pilot study is that by dividing the participants into groups based on their pre-test scores, we can see clearly whom the AD instruction is helping the most. These results indicate that students who begin the introductory philosophy course with poor or moderate argument analysis skills benefit from being taught AD. That is, among the low and intermediate academic levels, students in the treatment group gained significantly more from pre-test to post-test than students in the control group. The difference in the effect size of the instruction over the course of the semester is extremely large - for the low and intermediate levels, 6.9 and 6.9, respectively, for the students who were taught AD, but only 2.8 and 3.5, respectively, for the students who were not. This is in contrast to the result that, among the high academic level, the gains of the students in the treatment group and the students in the control group were the same. This indicates that the hypothesis is confirmed: learning AD aids low- and moderately-achieving students in the improvement of performance on argument analysis tasks over a semester-long course in introductory philosophy.
While these results are very encouraging, this pilot study suffered from several methodological flaws, all of which were addressed in Study 2. First, the number of students in the treatment group was significantly lower than the number of students in the control group. Moreover, the students in the treatment group all received instruction from the same lecturer, while the students in the control group received instruction from three different lecturers. In addition, while all the students completed the pre-tests on the same day, different students completed the post-test on different days, depending on the student's class. Finally, nearly all of the students in both the treatment and control groups answered Questions 1 and 2 correctly on both the pre-test and the post-test. Also, part (3) of Questions 3-6 asked the students to choose one characteristic to describe the structure of the argument, when in fact several of the arguments had more than one of the characteristics listed. This led to the determination that the coders should give a code of 1 to part (3) of each question if the answer correctly identified one of the structural characteristics of the argument.

Study 2

Methods

Participants and design
In the Fall of 2004, 130 students (36 women, 94 men) in five distinct classes of the introductory philosophy course were studied. Each distinct class had a different lecturer and teaching assistant and the students chose their class at the start of the term. Three of the five instructors were also instructors in the first study: one of these instructors taught AD in both studies, one taught AD in the second but not the first and one did not teach AD in either study. Sixty-eight students (21 women and 47 men) in three different classes were explicitly taught AD (the treatment group), while 62 students (17 women and 45 men) in two different classes were not taught AD (the control group). This distribution addressed two of the flaws of the previous study: (1) the treatment group and the control group had roughly even numbers and (2) both the treatment group and the control group were taught by multiple instructors.

Materials
We used modified versions of the pre-test and post-test from the previous study. The tests in this study did not have any questions in which the student was only asked to state the conclusion of the argument, thus addressing another of the flaws in the previous study. Instead, each test consisted of five questions in which the student had to analyze a short argument and each question again had five parts: (1) state the conclusion of the argument, (2) state the premises (reasons) of the argument, (3) indicate (via multiple choice) how the premises are related, (4) provide a visual, graphical, schematic or outlined representation of the argument and (5) decide whether the argument is good or bad and explain this decision. For part (3), the students were asked to indicate all the characteristics of the structure of the argument that apply, thus correcting a flaw in the tests of the previous study.

Test coding
The procedure for test coding was identical to that in the first study, except that two different coders were used. In this study, PA = .88 for the pre-test and PA = .89 for the post-test. Thus, the inter-coder reliability was better than in the first study. Again, for each test the codes on each question were averaged. The score for each test was determined in the same way as in the first study and the same additional data were recorded for each student.

Student characteristics
Again, we tested how well we could predict students' gains from pre-test to post-test based on the variables we had collected. We performed a regression for Gain using Pre-test, Section, Sex, Honors, Grade, Year and College. These results were the same as those in the first study: as expected, a student's pre-test score and section were statistically significant predictors of the student's post-test score, while none of the other variables we collected was a factor. Thus, we are confident that, as in the first study, the students who were taught AD were not different in any important aspect from the students who were not.

Grouping students by pre-test score
Again, we divided the students into three groups based on the pre-test scores: Low Academic Level, Intermediate Academic Level and High Academic Level. While this set of tests was more difficult (because of the change of the question in part [3]), the divisions between the groups were determined in the same way as in the first study. In this study, Low was score ≤ 0.35, Intermediate was 0.35 < score < 0.55 and High was score ≥ 0.55.

Comparison of students by treatment and by argument diagram use
We first determined that, unlike in the previous study, overall, the average gain of the students who were taught AD was not statistically significantly higher than that of the students who were not. The scores are given in Table 4.
The post-test scores of the students who were taught AD were statistically significantly higher than the post-test scores of the students who were not taught AD, t(124) = 2.67, p = .01. Unlike in the previous study, however, this cannot be explained by differences in pre-test scores, as the pre-test scores of the students in the treatment group were not significantly higher than those of the control group, t(121) = 1.16, p > .05. Furthermore, the overall gains from pre-test to post-test of the students who were taught AD were not significantly different from the gains from pre-test to post-test of the students who were not taught AD, t(126) = 1.22, p > .05. This puzzle leads to the hypothesis that the differences in improvement on argument analysis tasks between students who do and do not learn AD will be greatest for students who are low-achieving. We tested this by determining whether, for each academic level, the mean gain from pre-test to post-test of the students who were taught AD was statistically significantly higher than that of the students who were not. The results of this analysis indicate that this hypothesis is confirmed. The scores are given in Table 5.

Table 4. Number of participants, mean scores (95% confidence) (SD) for the pre-test, post-test and gain for students who were either taught or not taught argument diagramming in the Fall of 2004.

A two-way ANOVA was conducted for Pre-test with factors 'Taught' and 'Academic Level'. The main effect of Taught was not statistically significant (as expected, since the students had not yet been taught anything at the time of the pre-test). The interaction effect was also statistically significant, F(2,124) = 3.9, p = .02. An ANCOVA was conducted for Post-test with factors 'Taught' and 'Academic Level', with Pre-test as a covariate. While the main effect of Academic Level was not significant, F(2,123) = 2.26, p > .05, the main effect of Taught was statistically significant, F(1,123) = 5.92, p = .01, as was the interaction, F(2,123) = 4.51, p = .01.
In addition, an ANOVA was conducted for Gain with the same factors. The main effect of Taught was statistically significant, F(1,124) = 3.05, p = .05, as was the main effect of Academic Level, F(2,133) = 24.78, p < .001 and the interaction, F(2,124) = 4.88, p = .009.
In order to explain these results, Tukey post-hoc comparisons of the treatment group to the control group among academic levels were performed. These results indicate that among each of the low, intermediate and high academic level students, the pre-test scores of the students who were taught AD did not differ from the pre-test scores of the students who were not taught AD (p > .05 for all). However, among the low academic level students, the post-test scores of the students who were taught AD were statistically significantly higher than the post-test scores of the students who were not taught AD, t(38) = 3.42, p = .01, as were the gains, t(38) = 3.57, p = .007. In contrast, the post-hoc comparisons indicate that, among both the intermediate and high academic level students, neither the post-test scores nor the gains of the students who were taught AD differed from those of the students who were not taught AD, p > .05.
Again, we are interested in the effect sizes for the treatment group versus the control group and Cohen's d was calculated for each academic level. The results are given in Table 6.

Discussion
The results of the main study corroborate the conclusion of the pilot study: that low-achieving students do, in fact, benefit from being taught argument diagramming. First, there was no statistical difference in the pre-test scores between the treatment group and the control group. And while the students in the treatment group had statistically significantly higher post-test scores than the students in the control group, the gains from pre-test to post-test of the treatment group were not statistically different from the gains of those in the control group. The solution to this puzzle is to divide the participants into groups based on their pre-test scores. These results indicate that students who begin the introductory philosophy course with poor argument analysis skills benefited the most from being taught AD. That is, among the low academic level, students in the treatment group gained statistically significantly more from pre-test to post-test than students in the control group. This is in contrast to the results that, among the intermediate and high academic levels, the gains of the students in the treatment group and the students in the control group were not statistically significantly different. For both the low academic level and the high academic level this is the same result as that of the pilot study. For the intermediate academic level, however, the results from Study 1 are different from the results of Study 2.
This difference can be seen specifically in the comparison of the effect sizes of the treatment in both studies. In both studies, the difference between the effect sizes (as measured by Cohen's d) is large for the low-achieving students (2.1 in Study 1, 1.6 in Study 2) and small for the high-achieving students (0.36, 0.13). The studies differ, however, for the intermediate-achieving students (3.7 for Study 1 and -0.5 for Study 2).

General discussion Findings
For students taking introductory philosophy, our studies have one main result, shown in two ways. First, both the pilot and main studies showed that students with the poorest argument analysis skills at the beginning of the semester benefit the most from instruction on argument diagramming. The low-achieving students who were taught argument diagramming improved their abilities on these tasks significantly more than the low-achieving students who were taught using more traditional methods, while the high-achieving students who were taught argument diagramming improved about the same as the high-achieving students who were not taught argument diagramming. The results were unclear for students who were intermediate achievers. In the pilot study, the intermediate-achieving students who were taught argument diagramming improved their abilities on argument analysis tasks significantly more than the intermediate-achieving students who were not taught argument diagramming. In the main study, however, there was no difference in improvement among the intermediate-achieving students between those who were and were not taught argument diagramming. However, given the evidence from other studies that argument diagramming does significantly aid CT skills development (van Gelder, Bissett, & Cumming, 2004), it seems reasonable to conclude that teaching argument diagramming to students with average argument analysis skills does, in fact, aid the improvement of these skills over the semester.
Second, the sizes of the effect of the introductory philosophy course on students' argument analysis skills are quite high at each of the achievement levels. This is partly due to the fact that the pre-tests and post-tests were specifically designed to test the skills we were trying to teach our students. But the above results were borne out in the effect sizes. In both studies the size of the effect was far greater for low-achieving students who were taught argument diagramming than for those who were not. Also, in both studies, the effect size was roughly equal for high-achieving students who were and were not taught argument diagramming. In contrast, in the first study the effect size was quite different among intermediate-achieving students for those who were taught argument diagramming and those who were not, while in the second study there was no difference.
Schema theory not only offers a motivation for using argument diagramming in an introductory philosophy course, but also explains the results of these two studies. Instruction on argument diagramming to aid argument analysis provides a new schema and a new cognitive process to students for reading text. Students may begin the introductory philosophy course with only one or two general schemata for reading text -the story schema and perhaps the fact schema -especially if they have never, or rarely, been taught how to read for an argument. This explains why some students perform poorly on argument analysis tasks at the beginning of the semester.
The process of learning how to understand written arguments and construct ADs is then equivalent to the process of acquiring a new reading schema - the argument schema. This new structural schema is a picture of the text in which the claims are enclosed in boxes and the inferential connections between the claims are the arrows between the boxes. In fact, as Kienpointner (1987) and van Eemeren and Kruiger (1987) argue, this general argument schema can itself be restructured into several different argument schemata that cover a wide range of different kinds of arguments - for example, modus ponens, disjunctive syllogism, argument by analogy and argument from authority. As the students with poor or average argument analysis skills become more familiar with the new schema and tune it with extensive practice, their argument analysis abilities - especially under time constraints - greatly improve.
Similarly, the likely reason that students who have well-developed argument analysis skills at the beginning of the course do not benefit as much from argument diagram instruction and practice is that they come to the course with reasonably well-developed schemata for reading arguments already in place. These students spend the semester tuning their schemata and so improve during the semester, but the tuning happens equally well no matter what kind of argument analysis instruction they receive.
We conclude that taking our introductory philosophy course helps students develop certain critical thinking skills -those we call 'argument analysis skills'. We also conclude that learning how to construct ADs significantly improves the ability of a student with poor or average argument analysis skills to analyze, comprehend and evaluate arguments.

Educational importance
The primary educational importance of these studies is two-fold. First, the results indicate that it is possible to significantly improve students' argument analysis skills over the course of just one semester. This finding is important since many studies have shown that college students in general improve their CT skills (of which argument analysis is a part) only one standard deviation at most during all four years of college (Pascarella & Terenzini, 2005). This discouraging statistic, however, may be much improved by more students enrolling in courses that explicitly focus on the development of CT skills.
Second, these results indicate that a relatively small addition to the curriculum of an introductory philosophy course can have dramatic benefits for the students who begin the semester with the poorest argument analysis skills. The initial instruction in understanding arguments and creating argument diagrams can be given in one or two class-periods and regular, weekly homework assignments can be added to reading, summary and/or reflection assignments. Teaching argument diagramming does not require a radical reworking of an instructor's syllabus, course readings or assignments. This is a great benefit to instructors who may be reluctant to change a curriculum that has been working reasonably well.

Limitations and future work
The pre-tests and post-tests completed by the students in these studies do accurately assess the skills we want our students to develop over the course of the semester in the introductory philosophy course. Nonetheless, one significant limitation of these two studies is that these tests are not standard critical thinking skills tests, and thus comparison to other studies may be difficult. In future studies, we plan to use recognized critical thinking assessment tools (such as the Ennis-Weir Critical Thinking Skills Test or the California Critical Thinking Skills Test) for our pre-test and post-test pairs.
In addition, there is a potential problem with the use of two tests that are not known to be equivalent for the pre-test and post-test. It could be that any improvement measured is due to the post-test being easier than the pre-test. In future studies, equal numbers of both tests should be randomly assigned in each section as a pre-test, with each student then taking the other test as a post-test.
Our future work also aims to address other areas in which teaching argument diagramming might usefully aid students. For example, while it is clear that the ability to construct argument diagrams significantly improves a student's critical thinking skills along the dimensions tested, it would be interesting to consider whether there are other skills that may usefully be labelled 'critical thinking' that this ability may help to improve.
In addition, the arguments we used in testing our students were necessarily short and simple when compared to arguments encountered in primary source