Analysis of category level performance on the Praxis® earth and space science: Content knowledge test: Implications for professional learning

Abstract A large body of work has shown that science teacher knowledge is one of the most fundamental components of effective teaching and learning. Our study analyzes the Praxis® Earth and Space Science Content Knowledge Test (ESS CKT) from May 2006 to June 2016. We present one of the largest datasets, comprising 11,273 ESS teacher candidates, in order to provide information about their demonstrated ESS CK. Because the benefits associated with teacher retention outweigh the cost of hiring new teachers, our results can be used to design targeted professional learning (PL) experiences for pre- and inservice teachers. Findings from this study are particularly useful for planning topic-specific PL for pre- and inservice ESS teachers by answering the following research questions: (1) How have examinees performed as a whole in each category on the Praxis® ESS CKT? (2) Which personal and/or professional characteristics are most associated with examinee performance in each category and how does this inform professional learning? Examinee performance at the category level was analyzed through a five-part process: 1. Confirmatory Factor Analysis; 2. Percent correct; 3. Regression; 4. ANOVA; 5. Scaled points lost. Our findings revealed that examinees demonstrated strongest performance in the topics assessing Earth’s Atmosphere & Hydrosphere, Earth Materials & Surface Processes, and Tectonics & Internal Earth Processes, and identified History of the Earth and its Life-Forms as a topic in need of support. Across categories, we found differences in achievement associated with undergraduate major, gender, and ethnicity. Test-takers with geoscience majors consistently lost fewer points than their out-of-field counterparts, men outperformed women, and White test-takers lost fewer scaled points than Black and Hispanic candidates.
Our recommendations include reviewing our results for alignment with state standards in order to develop comprehensive CK development that will be used as an anchor for focused support on those topics where test-takers tend to demonstrate lowest proficiency.


Introduction
Access to high quality STEM education continues to challenge US education systems. It has been widely accepted that teachers are central to improving K-12 education systems (Darling-Hammond & Youngs, 2002; Harris & Sass, 2011; Toh et al., 2007). American students tend to underperform in STEM fields when compared with students in other countries. This underperformance has been attributed, in part, to a lack of access to STEM teachers (Huntoon & Baltensperger, 2012; Kuenzi, 2008). High quality science education engages students with rigorous science and engineering standards that encourage students to explore the world around them (National Research Council, 2012). In 2015, the Every Student Succeeds Act replaced No Child Left Behind and allowed each state to determine its own definition of quality with a focus on equity in access to effective educators through policies that improve instruction (Saultz et al., 2017).
Standardized tests such as the ESS Praxis ® CKT are designed with the intent to ensure a measure of quality by providing an objective view of what entry level teachers know about their discipline (Goldhaber & Hansen, 2010). Each state has autonomy over teacher licensing and preparation. Potential teacher pipelines are diverse and include college and university students, those holding unused teacher certifications, and classroom teachers changing schools or disciplines, but the main entry point is traditional teacher preparation programs. Despite having multiple pathways into the profession, there remains a shortage of science teachers (Aragon, 2016). This science teacher shortage crisis results from increasing student enrollment, high teacher turnover, and a lack of qualified new teachers (van Rooij et al., 2019). Teacher retention has a positive association with gains in student achievement as well as student attendance and decreases in classroom disruptions (Podolsky et al., 2016). The ESS community has faced unique challenges associated with teacher recruitment and retention. It is misperceived as a "lesser science" and less critical to society (Lewis, 2017) and is associated with higher rates of out-of-field teaching when compared with science counterparts in other disciplines (Huntoon & Baltensperger, 2012;Lewis, 2008;2017).
Our previous study on the ESS Praxis ® CKT as a whole without investigating its sub-categories (Ndembera et al., 2021) revealed performance and participation gaps associated with undergraduate major, ethnicity, and gender. This left us with further questions about whether we would find similar patterns regarding differences in performance at the category level. Here we build on that foundation through CK analysis at the category level of the ESS Praxis ® CKT in order to identify topics and populations for targeted pre-and inservice PL efforts that support entry level ESS teachers. This paper seeks to answer the following research questions (1) How have examinees performed as a whole in each category on the Praxis ® ESS CKT? (2) Which personal and/or professional characteristics are most associated with examinee performance in each category and how does this inform professional learning?

Background & research context
In order to obtain teacher licensure, candidates commonly complete college courses in education and/or courses related to the subject they plan to teach and pass a standardized licensure test (Goldhaber & Anthony, 2003). Teacher licensure tests ensure a standardized baseline of content knowledge (CK) for entry level teachers before they enter the classroom. Without the ability to directly observe teachers in their classrooms, the ESS CKT's screening function ensures a baseline of CK and prevents those who would be considered under-qualified from becoming teachers (Goldhaber & Hansen, 2010; Mehrens & Phillips, 1989). ESS teachers holding a bachelor's degree in geosciences would have likely completed a combination of core geology courses including electives and related STEM courses (Drummond & Markin, 2008).
Typical content courses for ESS candidates include astronomy, physical & historical geology, structural geology, mineralogy, paleontology, meteorology, and oceanography along with supporting STEM courses such as biology, chemistry, physics, and calculus (Drummond & Markin, 2008;Lewis, 2008). Nearly all public school teachers in the country hold an undergraduate degree. When compared with the non-STEM majors, fewer science and math teachers hold degrees in the field in their subject area and even fewer hold advanced degrees in the subject areas in which they teach.
The Praxis ® exams are designed to assess academic and subject-specific CK (Educational Testing Service: Praxis, 2020a). The ESS CKT is the most ubiquitous ESS teacher licensure examination across the United States and is used in 31 states and Washington DC. Over the decade studied, the 11,273 examinees present one of the largest datasets of ESS teacher candidates in order to provide information about their demonstrated knowledge at the topic level. It should be noted that states with large teacher populations such as California, New York, Texas, and Florida have their own licensure examinations (Gitomer & Qi, 2010) and are not included in this study.
The ESS CKT is a 125-question selected-response assessment developed in alignment with the National Science Education Standards and National Science Teacher Association standards. Assessment questions are designed to evaluate conceptual understanding, critical thinking and problem solving in science. Content categories include (1) Basic Principles and Processes, (2) Tectonics and Internal Earth Processes, (3) Earth Materials and Surface Processes, (4) History of the Earth and its Life-Forms, (5) Earth's Atmosphere and Hydrosphere, and (6) Astronomy (Educational Testing Service: The Praxis Study Companion, 2020b). Summarized Test Specifications detailing topics assessed within each category are depicted in Supplemental Table A.
Although the categories and generalized proportions represented on the assessment (Table 1, Supplemental Table A) are consistent, the questions vary between test forms. This form of quality assurance minimizes the influence of prior experience with the test form for examinees who take the assessment multiple times. States are invested in meeting prospective teachers' and preparation programs' needs.
Candidates have access to test preparation materials including free Study Companions with learning outcomes for each category, sample questions with answers and explanations, and study tips and strategies (Educational Testing Service, 2018). Each state develops their own standard-setting procedures in order to determine the passing score for the licensure examination (Gitomer & Qi, 2010).
Educational Testing Service (ETS) takes measures to ensure standardized treatment of examinees by constructing assessments that support fairness, reliability, and validity of reported scores (Educational Testing Service, 2014). Test development relies on input from a range of stakeholders in the field. Educators and faculty from teacher preparation programs are present throughout the process and offer input to ensure that tests are valid for the intended purpose of assessing knowledge and skills necessary for an entry level teacher (Educational Testing Service, 2018). Test item development includes preliminary item analysis, a statistical analysis intended to flag items with the purpose of ensuring there is no more than one correct answer (Educational Testing Service, 2018). Where sample size is great enough, Differential Item Functioning (DIF) analysis is performed to identify any differences in subgroup performance potentially associated with race or gender. Any items with high levels of DIF are dropped and are not used in future assessments (Educational Testing Service, 2018). Those who argue against using teacher licensure exams assert that there is little to no statistical difference in student outcomes between teachers who meet a passing cut score and those who do not (Goldhaber, 2007;Hanushek et al., 2005). There are also concerns with the test acting as a gatekeeper for preservice teachers of color (Petchauer, 2015). We've chosen to analyze the Praxis ® ESS CKT because it is currently the most commonly used benchmark for assessing CK of those entering the field. We assert that CK is central to instructional quality and student performance, thus teachers with stronger CK are better equipped to implement science curricula and are more likely to identify and respond to students' instructional needs. 
We aim to leverage Praxis ® ESS CKT assessment data by offering insight into what entry-level ESS teachers know at the topic level and direct a focus for PL activities designed to improve ESS teacher classroom practice. Continuous engagement in PL activities refers to ongoing learning experiences of teachers intended to improve instructional quality and student learning (Akiba & Liang, 2016;Luft & Hewson, 2014;Opfer & Pedder, 2011). These include formal activities through inservice programs and preservice coursework or informal activities such as collaboration with colleagues and independent study (Akiba & Liang, 2016;Harris & Sass, 2011;Luft & Hewson, 2014). In order to best prepare ESS teachers to provide high quality instruction, it is imperative to understand how ESS teachers develop in their practice. Shulman (1987) proposed three categories of teacher content knowledge: content knowledge (CK), pedagogical content knowledge (PCK) and curricular knowledge. PCK encompasses the instructional strategies teachers draw upon and is critical as they engage students in learning. It also includes practices that enable teachers to assess, identify, and respond to student misconceptions (Kind, 2009). In order to strengthen the development of PCK, science teachers need a solid foundation of science concepts to accompany general pedagogy (Long et al., 2019;Schneider & Plasman, 2011).

Conceptual framework
The National Research Council (2012) calls upon the science education community to provide high-quality opportunities for students to engage in science through provision of three dimensional learning that includes rigorous standards, science & engineering practices, and crosscutting concepts. In order to support the call to action, science PL must consider the discipline and CK focus with the understanding that this impacts growth in PCK (Luft & Hewson, 2014). Figure 1 depicts a model for ESS PCK, teacher professional knowledge and skill, adapted from Julie (2015). ESS teacher professional knowledge and skill are influenced by topic specific knowledge, CK, as measured by the Praxis ® ESS CKT, teacher identity, and PL experiences. Overall growth in PCK and student outcomes relies on topic specific professional knowledge and an understanding of student developmental level (Julie, 2015) as they engage with science CK. This study focuses on category level CK as measured by the Praxis ® ESS CKT with recommendations for topic-specific PL experiences. Teacher PL is an ongoing process and should be considered of equal importance to preservice learning experiences (Angrist & Lavy, 2001;Harris & Sass, 2011, Webster-Wright, 2009). CK and associated PCK can be improved through inservice PL opportunities (Kanter & Konstantopoulos, 2010) as teachers become more sophisticated in their decisions around how best to teach ESS.
At the secondary level, ESS is far too commonly taught by teachers without geoscience degrees (Huntoon & Baltensperger, 2012;Lewis, 2008). High numbers of out of field teaching suggests a need for improved recruitment and retention efforts with a focus on geoscience and instructional quality. Here it is important to note that owing to differences in state licensure requirements a candidate may be considered qualified to teach ESS in their state while lacking depth in foundational content knowledge base. This is concerning because student engagement with the practice of science relies on strong content-area preparation (Huntoon & Baltensperger, 2012).
Understanding the diversity in teacher experiences will help professional developers (Moore, 2008), researchers, policymakers and educators as they explore factors influencing instructional quality. Professional and personal characteristics intersect to influence the positional identity of science teachers (Mensah, 2016). There is a need for recruitment and retention of STEM teachers who better reflect the communities in which they teach (Podolsky et al., 2016). The positional identity of teachers influences their PCK (Mensah, 2016; Moore, 2008) and allows students to interact with science content through their unique experiences and identities (Mensah & Jackson, 2018). Multicultural social reconstructionist (MCSR) education reforms address those personal and professional characteristics associated with assessment performance. Incorporating MCSR approaches into teacher preparation programs allows candidates to address associations between their roles as teachers and social class, race, and gender identities directly through their coursework (Boylan & Woolsey, 2015; Martin & Van Gunten, 2002).
Figure 1. Model for ESS PCK, teacher professional knowledge and skill, adapted from Julie (2015). Earth and space science teacher knowledge and skill is influenced by topic specific knowledge. Teacher personal and professional characteristics serve to amplify and/or filter their goals for teaching. Student outcomes drive professional development that strengthens teacher PCK.
Ultimately, for transformation in instructional practice to influence student outcomes, PL must be ongoing and offer adequate support for teachers (Marshall et al., 2017). Teacher CK plays a central role in certification processes & programs; student learning; and ongoing PL opportunities for pre-and inservice teachers. This starts with prior teaching experiences and science coursework in conjunction with teacher preparation programs where professional identities and social relations are formed (Martin & Van Gunten, 2002). Effective PL improves student achievement and emphasizes the impact of teacher knowledge on their beliefs toward teaching (Desimone, 2009). The lens through which CK is delivered is associated with teacher identity where personal and professional characteristics serve to amplify and/or filter their goals for PCK (Julie, 2015).
While the screening function of teacher licensure examinations aims to set a benchmark standard for entry level teacher CK, current licensure policies have a disparate impact on teaching eligibility. When planning and evaluating PL for pre-and inservice teachers, Guskey (2014) suggests that it is best to begin with the end in mind. Analysis of achievement data such as results from large scale Praxis ® ESS CKT assessments will help to identify content areas where ESS teachers would benefit from content-specific PL and provide information for developers as they plan these learning opportunities (Guskey, 2014).
Teachers participate in a range of PL experiences that include both formal, structured seminars and informal peer discussion with colleagues (Desimone, 2009). There are a multitude of PL opportunities available to science teachers including mentoring, coaching, conferences, research experiences and localized in-school and virtual programs (McConnell et al., 2013;Rodriguez, 2010;Whitworth & Chiu, 2015). Our previous work (Ndembera et al., 2021) provided an overview of the test as a whole with recommendations for improving inclusivity within ESS fields. Few studies offer the granular topic-level CK for entry level ESS teachers across diverse disciplinary backgrounds. With the goal of improving ESS student learning, data presented here serves to anchor and individualize PL around critical features such as course content and target teacher populations that will enhance science knowledge and skills necessary for strengthening classroom practice (Easton, 2008;Webster-Wright, 2009).

Study population & setting
The study sample includes analysis of the total population of examinees who took the Praxis ® ESS CKT. This includes all examinees from May 2006 -June 2016 with reported ages between 18-75 (N = 11,273). Supplemental Table B presents a detailed list of personal and professional characteristics for the testing population over the decade studied. Examinees self-report demographic information in response to survey questions asking about personal and professional characteristics embedded within the exam. Test-takers report their biological sex by selecting from male/female options. Non-binary options were not available for respondents in the study population. Understanding that these terms reference biological sex rather than gender, we use terms such as man and woman when not discussing results directly. In order to maintain consistency with reporting on the Praxis ® ESS CKT we will use male/female options when discussing our results.
Examinees can take the assessment more than once. In order to avoid duplication, the dataset was filtered to retain each examinee's maximum (highest) test score. Test-takers are not required to answer all demographic questions. To handle missing data in our models, pairwise deletion was selected. Had we used listwise deletion, all data for a test-taker who left any one response blank would have been removed, shrinking our dataset and potentially introducing bias. With pairwise deletion, which is permissible when conducting regressions, we maximized the use of our data by analyzing correlations wherever information was available. Each step in the regression is calculated separately using only cases that have data available for that step. In other words, if a test-taker chose to provide data on ethnicity but not gender, we looked for correlations between ethnicity and category score for that test-taker but did not include them in the analysis of correlation between gender and category score.
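The filtering and missing-data handling described above can be illustrated with a minimal sketch. The records, field order, and function names below are hypothetical and are not drawn from the ETS dataset or from the SAS procedures used in the study; the sketch only contrasts which cases survive under pairwise versus listwise deletion.

```python
# Hypothetical records: (examinee_id, ethnicity, gender, category_score).
# None marks a demographic question the test-taker left blank.
records = [
    ("A", "White", "F", 12),
    ("A", "White", "F", 15),   # retake with a higher score
    ("B", "Black", None, 10),  # gender not reported
    ("C", None, "M", 14),      # ethnicity not reported
]

def keep_highest_attempt(rows):
    """Keep one row per examinee: the attempt with the highest score."""
    best = {}
    for row in rows:
        examinee_id, score = row[0], row[3]
        if examinee_id not in best or score > best[examinee_id][3]:
            best[examinee_id] = row
    return list(best.values())

def pairwise_cases(rows, column):
    """Rows usable for an analysis of a single variable (pairwise deletion)."""
    return [row for row in rows if row[column] is not None]

def listwise_cases(rows):
    """Rows with no missing field at all (listwise deletion)."""
    return [row for row in rows if all(value is not None for value in row)]

unique_rows = keep_highest_attempt(records)
ethnicity_rows = pairwise_cases(unique_rows, 1)  # examinees A and B
gender_rows = pairwise_cases(unique_rows, 2)     # examinees A and C
complete_rows = listwise_cases(unique_rows)      # examinee A only
```

In this toy example, pairwise deletion keeps two cases for each single-variable analysis, whereas listwise deletion would keep only one case overall.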
Women made up the majority of the testing population (54.3%). Asian American, Pacific Island American, Alaskan Native, Native American, multiple races were grouped and formed the "Other" category for ethnicity accounting for 5.2% of the testing population. Black and Hispanic test-takers make up 2.9% and 1.7% respectively, resulting in an overwhelmingly White testing population (85.7%). These participation rates do not match the national population (US Census Bureau, 2018). Geology & ESS undergraduate majors comprised 18.9% of the population studied. STEM other included majors such as general science, mathematics, and engineering, comprising 8.0% of the testing population. Biology, chemistry, and physics made up 16.4%, 2.8% and 1.7% respectively. Non-STEM other subjects included humanities majors such as history, foreign languages, drama, fine arts. Education majors represented the largest category for reported graduate majors (26.0%) with Geology & ESS making up only 8.6% of the testing population. 25.8% of the population studied had completed a master's degree while 62.8% had earned a bachelor's degree or less. Those planning to enroll in a teacher education program made up the largest portion of the population studied (39.3%); 16.2% had one -three years of teaching experience; and 26.1% had more than three years of teaching experience.

Research design: Methodology
Assessments such as licensure tests require individuals to pass the test before entering the classroom. Analysis at the category level also allows them to act as signals, indicating where to allocate resources for pre- and inservice PL (Goldhaber & Hansen, 2010). The following models identify topics assessed on the Praxis® ESS CKT in need of support through professional development. Examinee performance at the category level was analyzed through a five-part process: 1. Confirmatory Factor Analysis; 2. Percent correct; 3. Regression; 4. ANOVA; 5. Scaled points lost.

Confirmatory factor analysis (CFA)
A CFA was conducted through which we examined the assessment through a single factor (whole examination) and as a six-factor (category level) solution using SAS software, Version 9.4. This measurement tool allowed the authors to establish validity for further analysis of the assessment at the category level. Each test item is aligned with one of the six categories on the assessment. For example, a question asking test-takers about S-P arrival intervals would be aligned with the topic of Tectonics & Internal Earth Processes. Through the CFA, structural models are produced in order to specify how well factors are related to one another (Brown & Moore, 2012). Appendix Table C presents the number of cases and relevant statistical parameters for each test form administered over the decade studied.

Estimation of categorical percent correct
To gain insight into mastery of ESS subject matter, we estimated the percent correct for each category. The dataset provided by ETS includes information about the highest number of points accumulated by an examinee in each category but does not include the actual number of questions per category. The highest number of items reported was used to represent the total number of questions. In order to estimate the categorical percentage score for each test-taker, the following equation was used:

$$\text{Percent Correct} = \frac{\text{number of correctly answered questions}}{\text{highest number of items reported correct}} \times 100$$

For example, to calculate percent correct on the topic of Tectonics & Internal Earth Processes, an individual who answered 9 of the 18 reported test items would have answered 50% correctly. We repeated this across all test-takers to determine the average percent correct for the category.
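The percent-correct estimate can be sketched in a few lines. The function name and the sample points are hypothetical; the only element taken from the text is the use of the highest number of items answered correctly by any examinee as a stand-in for the number of questions in the category.

```python
def estimate_percent_correct(points_per_examinee):
    """Return each examinee's percent correct and the category average.

    The number of items in the category is not known, so it is
    approximated by the highest number of correct items reported
    for any examinee.
    """
    items_in_category = max(points_per_examinee)
    percents = [100 * points / items_in_category for points in points_per_examinee]
    category_average = sum(percents) / len(percents)
    return percents, category_average

# Hypothetical points earned by four examinees in one category.
tectonics_points = [9, 18, 12, 15]
percents, category_average = estimate_percent_correct(tectonics_points)
```

Here the first examinee, with 9 of 18 items correct, receives 50%, matching the worked example in the text.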

Regression model selection
In order to determine which groups of teachers would be most likely to be in need of CK support, a stepwise linear regression was performed on the whole data set using SAS software, Version 9.4. A 10-fold cross validation procedure was used in order to offer the best predictive model for our data set. This procedure splits the data into 10 equal-sized parts, each of which is held out in turn for validation while the model is fit to the remaining nine (SAS Institute Inc., 2017). This allowed us to estimate associations between self-reported test-taker characteristics and category performance on the Praxis® ESS CKT. Self-reported examinee characteristics were classified as personal (i.e., gender, age, ethnicity) or professional (i.e., undergraduate major, graduate major, years in teaching, undergraduate GPA). Table 2 depicts the demographic variables identified by the regression model.
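The splitting step of a 10-fold cross validation can be sketched as follows. This is only an illustration of how cases are partitioned; it does not reproduce the SAS stepwise selection itself, and the function name, seed, and case count are assumptions made for the example.

```python
import random

def k_fold_indices(n_cases, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross validation.

    Cases are shuffled once and dealt into k folds; each fold serves as
    the validation set exactly once while the other k - 1 folds are used
    for fitting.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[start::k] for start in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        training = [
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        ]
        yield training, validation

# Hypothetical data set of 50 cases split into 10 folds of 5.
splits = list(k_fold_indices(n_cases=50, k=10))
```

A candidate model would be fit on each training set and scored on the matching validation set, with the prediction errors pooled across the 10 folds.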

ANOVA model selection
In order to determine which demographic variables were most strongly associated with variance (η²) in test-taker performance at the category level, we extended the model. Variables from the regression model were analyzed through Analysis of Variance (ANOVA) calculations using SAS software, Version 9.4. The three variables explaining the greatest η² for each category (Table 2) were further analyzed to determine an estimation of scaled points lost.
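As a point of reference for η², the sketch below computes it for a single grouping variable as the between-group sum of squares divided by the total sum of squares. This is a simplified one-way case; the study's models include several variables at once, and the scores and group labels here are hypothetical.

```python
def eta_squared(scores_by_group):
    """One-way eta squared: between-group sum of squares / total sum of squares."""
    all_scores = [score for group in scores_by_group.values() for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in scores_by_group.values()
    )
    return ss_between / ss_total

# Hypothetical category scores grouped by undergraduate major.
scores_by_major = {
    "ESS": [16, 18, 17],
    "Education": [12, 13, 11],
    "Physics": [15, 16, 14],
}
eta_sq = eta_squared(scores_by_major)
```

A value near 0 means group membership explains little of the variation in scores, while a value near 1 means it explains most of it.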

Estimation of scaled points lost per category
Performance at the category level was analyzed in order to determine examinees' relative performance and provide information on whether there were disciplinary content areas in need of support. Scaled points lost were calculated using an equation in which m was equal to the slope between scaled score and total questions correct on the exam (Shah et al., 2018).
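The slope m described above can be estimated by ordinary least squares, as in the sketch below. The step that converts missed items into scaled points is an assumption introduced only for illustration, since the equation itself is not reproduced in this text; all names and data are hypothetical.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

def scaled_points_lost(slope, items_in_category, items_correct):
    """ASSUMPTION: points lost = slope * items missed in the category.

    This form is a guess used for illustration; it is not confirmed
    by the source text.
    """
    return slope * (items_in_category - items_correct)

# Hypothetical whole-test results: total questions correct and scaled score.
total_correct = [60, 80, 100, 120]
scaled_scores = [130, 150, 170, 190]

m = least_squares_slope(total_correct, scaled_scores)
lost = scaled_points_lost(m, items_in_category=18, items_correct=9)
```

Under this assumed form, each missed item costs m scaled points, so categories with more missed items account for a larger share of the points lost.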

Confirmatory factor analysis
The CFA was conducted as part of research question 1 in order to determine whether further investigation at the category level (six factor) was necessary or whether analysis of the examination as a whole (single factor) was sufficient.
The large sample size yielded strong results (Supplemental Table C) for both single factor and six factor solutions for each test form administered during the decade studied. The χ² and root mean square error of approximation (RMSEA) fit measures reported in Supplemental Table C for each test form are lower for the six-factor solution, indicating a slightly better fit than the single-factor solution and thus warranting further investigation at the category level.
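For readers less familiar with RMSEA, the sketch below shows one common way it is computed from a model's χ², degrees of freedom, and sample size. The fit values are hypothetical and are not taken from Supplemental Table C; some software divides by N rather than N − 1, so this should be read as an illustration rather than the exact SAS computation.

```python
import math

def rmsea(chi_square, degrees_of_freedom, n_cases):
    """Root mean square error of approximation (N - 1 convention).

    Values closer to zero indicate better fit; the statistic is set
    to zero when chi-square does not exceed the degrees of freedom.
    """
    excess = max(chi_square - degrees_of_freedom, 0.0)
    return math.sqrt(excess / (degrees_of_freedom * (n_cases - 1)))

# Hypothetical fit statistics for two competing solutions on one test form.
single_factor = rmsea(chi_square=300.0, degrees_of_freedom=100, n_cases=1001)
six_factor = rmsea(chi_square=250.0, degrees_of_freedom=85, n_cases=1001)
```

In this made-up comparison the six-factor value is slightly smaller than the single-factor value, which is the pattern the text describes.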

ANOVA model: Category performance
The ANOVA model was developed as part of research question 2. Several statistically significant relationships were revealed through correlational analysis of the stepwise linear regression. Identified variables were used to create an aggregate model to determine total η². Table 2 presents the examinee characteristics most strongly associated with category performance on the ESS CKT. For all six categories on the assessment, the F values and their associated p values (p < .0001; Table 2) confirm that the demographic variables represented within the model account for a significant portion of the variability in category level score. Reported η² values provide information about the proportion of variance in the category score accounted for in the sample. Large effect sizes presented in Table 2 (Total η² > 0.14) indicate strong relationships between reported demographic variables and test-taker performance at the category level. Comparison of means reveals differences in achievement most consistently associated with undergraduate major, gender, and ethnicity across the six categories of the assessment. The top three characteristics for each category were further analyzed to make comparisons in scaled points lost. These data are presented in Figures 2-7. Graphical representation was selected as a means of demonstrating differences between subgroups that impact overall scores. As expected, undergraduate major was most frequently associated with category level performance. It ranked in the top three characteristics in all six categories of the Praxis® ESS CKT. In four of the six categories ESS-related majors lost the fewest scaled points, outperforming those examinees holding degrees in education or non-STEM fields. Males consistently outperformed female test-takers and White candidates lost fewer scaled points than Black and Hispanic candidates.

Basic principles & processes
The three characteristics most strongly associated with performance in the category assessing basic principles and processes (Figure 2) were gender, undergraduate major, and education level. They explained 4.9%, 4.4%, and 2.3% of the total variance (Table 2), respectively. In comparing category performance when filtering for examinees' education level, those who held doctorate and master's degrees lost an average of 3.0 and 3.9 scaled points respectively while those who hadn't completed bachelor's degrees lost 4.5 scaled points on average.

Tectonics & internal earth processes
Figure 3 presents undergraduate major, gender, and ethnicity as the three characteristics most strongly associated with category performance for tectonics and internal Earth processes, explaining respectively 6.2%, 2.7%, and 2.0% (Table 2) of the variance in this category. ESS majors lost the fewest scaled points (4.9) and outperformed those holding other degrees, followed by physics (5.3), and STEM other (6.7). Education majors lost the most scaled points (8.1). Female test-takers trailed males by 1.1 scaled points. When comparing scaled points lost based on ethnicity we found that White test-takers lost an average of 6.7 scaled points, Hispanic test-takers lost an average of 7.3 scaled points, and Black test-takers lost an average of 10.2 scaled points.

Earth materials & surface processes
Undergraduate major, ethnicity, and graduate major explained 12.9% of the variance (Table 2) in scaled score for this category, with undergraduate major accounting for 8.9%. When comparing scaled points lost (Figure 4), ESS majors lost the fewest scaled points (6.4). Non-STEM other and education majors demonstrated the poorest performance, each losing an average of 10 scaled points. White test-takers lost an average of 9 scaled points while Black and Hispanic test-takers lost 13.2 and 9.6 scaled points respectively. In comparing graduate majors, we found trends similar to those for undergraduate majors, whereby examinees holding degrees in ESS fields lost the fewest average scaled points (6.2).

History of the earth & its life-forms
In the category assessing History of the Earth and its Life-Forms (Figure 5), undergraduate major, gender, and ethnicity explained 7.6% of the overall variance (Table 2), with undergraduate major accounting for 4.8%. ESS (5.0) and physics (5.5) majors lost the fewest scaled points. Education and non-STEM other majors performed similarly, losing an average of 7.0 scaled points. Males and females also

Earth's atmosphere & hydrosphere
Undergraduate major, ethnicity, and gender (Figure 6) collectively explained 14.1% of the variance in performance (Table 2) on Earth's Atmosphere and Hydrosphere items. Similar to other categories, undergraduate major explained the largest percentage (6.8%) of the variance. ESS (5.3) and physics (5.7) majors outperformed their counterparts holding education (8.0) and non-STEM other (7.2) majors. Male test-takers outperformed females by 1.4 scaled points. White, Hispanic, and Other test-takers performed similarly. They lost the fewest scaled points in this category and outperformed their Black counterparts by an average of 3.9 scaled points.

Astronomy
In Astronomy, gender, undergraduate major, and teaching status (Figure 7) were revealed to be the top three characteristics and explained 10.7% of the variance in category performance (5.0%, 2.9%, and 2.7% respectively; Table 2). In this category male and female test-takers performed similarly, with males outperforming females by 1.3 scaled points. Undergraduate majors also performed similarly. Physics majors lost the fewest scaled points (2.4) as compared with education (5.8), non-STEM other (5.2), and biology (5.0). ESS majors lost an average of 4.2 scaled points. When comparing category performance based on teaching status, it was revealed that test-takers with three or more years of teaching experience lost the fewest scaled points (4.4) when compared with those who had not enrolled in a teacher preparation program (5.0) or had recently graduated (5.3).

Discussion
Although the Praxis® ESS CKT is used as a summative assessment for entry-level teachers, it can also be used as a formative assessment benchmark for developers as they plan learning opportunities that improve ESS teacher practice. Examining ESS teacher CK is important because, for transfer of content to occur, teachers need discipline-specific pedagogy supported by a foundation of science CK (Kanter & Konstantopoulos, 2010). Our findings present differences in performance among groups across the categories assessed through the Praxis® ESS CKT, most commonly associated with both professional and personal demographic characteristics of test-takers. Professional characteristics such as undergraduate major, graduate major, education level, and teaching status were associated with performance on the Praxis® ESS CKT. Beyond the results associated with the reported demographic variables presented here, further research on test-takers, including details about academic preparation, qualifications, and reasons for taking the assessment, is warranted (Gitomer & Qi, 2010). This would offer additional information about the remaining variance in test-taker performance at the category level presented in Table 2.
In order to teach ESS most effectively, teachers must have an understanding of CK and PCK. Across three of the six categories (Table 1), the estimated percent correct was below 70%. History of the Earth and its Life-Forms had the lowest performance at the category level. This is consistent with literature that discusses student misconceptions about sequencing geologic events within scales of space and time (Dodick & Orion, 2003; Kusnick, 2002). Examinees demonstrated the strongest performance in Earth's Atmosphere & Hydrosphere, Earth Materials & Surface Processes, and Tectonics & Internal Earth Processes (Table 1). We found aggregate percentage scores for all three to be above 70%. This is likely because those topics are most commonly included in undergraduate physical geology courses. Analysis of category performance offers a granular view to strengthen ESS content development.
Geology and ESS majors consistently lost the fewest scaled points across four of the six categories. Interestingly, physics majors lost the fewest scaled points in Basic Principles & Processes (Figure 2) and Astronomy (Figure 7). Traditional geoscience survey courses at the undergraduate level include physical geology, historical geology, and Earth science. However, unlike other science disciplines taught in schools, introductory ESS curricula lack consistent standardization from program to program in what is taught to preservice Earth science teachers (Drummond & Markin, 2008; Lewis, 2008). This may account for discrepancies in performance at the category level (Table 1).
Classroom experience and academic proficiency of teachers are positively correlated with student achievement. Teacher learning, when thought of as continuous, results in increased understanding of science and science instruction as more time is spent within the classroom (Goldhaber & Anthony, 2003; Schneider & Plasman, 2011). Unsurprisingly, test-takers holding doctorate degrees and those with three or more years of teaching experience lost the fewest scaled points. More information about those test-takers with experience in the classroom is needed to make determinations about why they outperformed those with less experience.
Personal characteristics such as gender and ethnicity were associated with assessment performance. Consistent with findings from our previous study (Ndembera et al., 2021), men and White test-takers lost the fewest scaled points. Men and women were typically within one scaled point of one another across categories (Figures 2-7). In our previous study (Ndembera et al., 2021), we found that although Black examinees earned lower average scaled scores, the difference in scores between ethnic groups narrowed over the decade studied.

Limitations
This study encompasses one of the largest analyses of the Praxis® ESS CKT; however, it does not include states such as New York that have large bodies of ESS teachers but do not require the ESS CKT for licensure. Because of this, it may not be a representative sample of the US ESS teacher population as a whole. As such, it should be noted that our analysis is specific to the testing population rather than a generalization about the CK of a demographic group as a whole. Furthermore, the reported category level scores do not offer information about whether the candidate passed the exam and entered the teaching profession. We acknowledge that while the findings presented in this study do offer a unique perspective on CK, we do not account for variables that impact early career teachers such as school environment, mentorship, or PCK. The data presented are limited to 2006-2016; test-taker populations and category level performance may have changed since then.

Implications for practice
Earth science literacy is important as society prepares to respond to current challenges associated with human impacts on Earth systems (Wysession et al., 2012). Despite this, ESS is frequently taught by teachers without a strong foundational background (Huntoon & Baltensperger, 2012; Lewis, 2008). Our category level analysis of the Praxis® ESS CKT identifies subject areas to target for pre- and inservice PL activities (Basic Principles & Processes, History of the Earth & its Life-Forms, and Astronomy). Intensive PL opportunities sustained over a longer period of time and focused on specific curriculum content have been found to positively impact student learning (Darling-Hammond et al., 2009). Programs that provide ongoing support for participants, include professionals in the field, and provide opportunities for leadership development improve development of ESS CK by introducing content while modeling instructional best practices (Ellins et al., 2013). It is recommended that PL opportunities for ESS teachers move beyond increasing and updating science CK and that developers incorporate a broader view, inclusive of sociocultural or social constructivist perspectives in learning models (Howe & Stubbs, 2003). In our between-group comparisons, White examinees consistently lost the fewest scaled points when compared with their Black counterparts at the category level. This invites further investigation of prior access to opportunities to engage with geoscience curriculum in order to gain a deeper understanding of potential inequalities in access or stereotype threat (Quinn, 2020).

Professional learning
In striving to improve the CK of pre- and inservice ESS teachers, it is critical to meet them where their needs are. We encourage school building and district leaders to leverage our findings presented in Table 1 as they evaluate existing curricula in order to plan PL around topics identified as in need of improvement, such as History of the Earth & its Life-Forms and Astronomy. We recommend allocating time within school schedules to create job-embedded opportunities to provide PL and coaching. These learning opportunities should be relevant and accessible (Jolley et al., 2022) and include leveraging experienced science teachers through development of leadership opportunities that include PL focused on ESS CK as well as science leadership development. Engaging localized communities of practice during the school day strengthens instruction without teachers having to leave their own schools and classrooms (Darling-Hammond et al., 2017; Howe & Stubbs, 2003; Hubers et al., 2022). Within the communities of practice, educators are encouraged to create and share resources that support development of instructional resources for topics such as History of the Earth & its Life-Forms and Astronomy, where examinees demonstrated lower proficiency on the assessment. We further recommend that teachers participate in online discussion groups that focus on ESS pedagogy. These platforms offer the flexibility of asynchronous participation while providing a sense of community and shared resources (Riding, 2001). Participation in ESS-focused communities of practice is driven by individual learning, supportive colleagues, and group accomplishments. The collegial sense of growth and accomplishment results in a reciprocal benefits loop through which ESS teachers collectively grow in their practice (Kastens & Manduca, 2017).

Recruitment
In conjunction with our recommendations about content specific PL, we emphasize a greater need to diversify representation among the ESS teacher population in general. Recruitment efforts at the undergraduate level are crucial to promoting teaching as a career option for science majors. Giving teachers the opportunity to identify the intersection between their overall PL goals and content areas in need of improvement identified in this study will help target groups such as those teaching without a geoscience background to reconceptualize their work and strengthen development of science concepts. In order to connect identities with science, it is critical to engage students at an early age. Racial inequities related to science education impact teaching practices and must be addressed in science teacher preparation programs. Partnerships between teacher preparation programs and local school districts that include microteaching opportunities for candidates of color help to broaden instructional experiences while diversifying classroom settings (Mensah & Jackson, 2018).

Retention
We found that teachers with three or more years of teaching experience lost fewer scaled points than those planning to enter or newly entering the field. Partnering students with strong mentors in the classroom positively influences development of preservice teachers as they develop self-efficacy through practical classroom experiences (Pfitzner-Eden, 2016; van Rooij et al., 2019). We call upon the geoscience community to collaborate with teacher preparation programs within their institutions in order to foster and strengthen development of opportunities for geoscience undergraduate students to experience teaching as a career option. Early interventions such as academic support during introductory science courses help to reduce disparities in outcomes and support retention in the STEM fields. These include out-of-course interventions such as mentoring and tutoring as well as instructional shifts within the courses themselves (Mensah & Jackson, 2018; Theobald et al., 2020). Active learning and inquiry-based instruction have also been found to benefit students and improve outcomes in STEM courses (Theobald et al., 2020).
Teachers are central among factors that influence student learning (Yang et al., 2020). Because of this, recruitment efforts must be integrated into the educational process at the precollege level and include local school districts, science teacher preparation programs, and college science programs (Luft et al., 2011). Ultimately teachers must take ownership over their learning once they enter the classroom as they are the primary agents of educational change. Having access to their assessment data itemized by category will better help them to make informed decisions about where to invest their time and resources to further their development of ESS CK and thus PCK.