Assessment is a key element of science education teaching and research. Understanding sources of performance bias in science assessment is a major challenge for science education reforms. Prior research has documented several limitations of instrument types in measuring students' scientific knowledge (Liu et al., 2011; Messick, 1995; Popham, 2010). Furthermore, a large body of work has been devoted to reducing assessment biases that distort inferences about students' science understanding, particularly in multiple-choice [MC] instruments. Despite this work, much remains to be determined about constructed-response [CR] assessments in biology and their use for evaluating students' conceptual understanding of scientific practices (such as explanation). Understanding differences in science achievement provides important insights into whether science curricula and/or assessments are valid representations of student abilities.
Using the integrative framework put forth by the National Research Council (2012), this dissertation aimed to explore whether assessment biases occur for assessment practices intended to measure students' conceptual understanding and proficiency in scientific practices. Using a large corpus of undergraduate biology students' explanations, three studies were conducted to examine whether known biases of MC instruments were also apparent in a CR instrument designed to assess students' explanatory practice and understanding of evolutionary change (ACORNS: Assessment of COntextual Reasoning about Natural Selection).
The first study investigated the challenge of interpreting and scoring lexically ambiguous language in CR answers. The incorporation of 'multivalent' terms into scientific discourse practices often results in statements or explanations that are difficult to interpret and can produce faulty inferences about student knowledge. The results of this study indicate that many undergraduate biology majors frequently incorporate multivalent concepts into explanations of change, resulting in explanatory practices that were scientifically non-normative. However, use of follow-up question approaches was found to resolve this source of bias and thereby increase the validity of inferences about student understanding.
The second study focused on issues of item and instrument structure, specifically item feature effects and item position effects, which have been shown to influence measures of student performance across assessment tasks. Results indicated that, along the instrument item sequence, items with similar surface features produced greater sequencing effects than sequences of items with dissimilar surface features. This bias could be addressed by use of a counterbalanced design (i.e., Latin Square) at the population level of analysis. Explanation scores were also highly correlated with student verbosity, despite verbosity being an intrinsically trivial aspect of explanation quality. Attempting to standardize student response length was one proposed solution to the verbosity bias.
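The counterbalanced design mentioned above can be sketched in a few lines. This is only an illustration of a cyclic Latin square, not the procedure used in the dissertation: the item labels, respondent IDs, and function names are hypothetical, and the round-robin assignment of respondents to orders is an assumption.

```python
def latin_square_orders(items):
    """Build a cyclic Latin square: row r is the item list rotated left by r.

    Each item appears exactly once in every serial position across the rows,
    so position effects average out at the population level.
    """
    n = len(items)
    return [[items[(r + c) % n] for c in range(n)] for r in range(n)]


def assign_orders(respondent_ids, items):
    """Assign each respondent one row of the square, cycling through rows.

    Round-robin assignment is an assumption made for this sketch.
    """
    orders = latin_square_orders(items)
    return {rid: orders[i % len(orders)] for i, rid in enumerate(respondent_ids)}


if __name__ == "__main__":
    # Hypothetical item labels and respondent IDs, for illustration only.
    items = ["item_A", "item_B", "item_C", "item_D"]
    for rid, order in assign_orders(["s1", "s2", "s3", "s4"], items).items():
        print(rid, order)
```

A cyclic square of this kind balances serial position only. It does not balance which item immediately precedes which, so if carryover between similar items is the main concern, a different counterbalancing scheme would be needed.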
The third study explored gender differences in students' performance on constructed-response explanation tasks using impact (i.e., mean raw scores) and differential item functioning (i.e., item difficulties) patterns. While prior research in science education has suggested that females tend to perform better on constructed-response items, the results of this study revealed no overall differences in gender achievement. However, evaluation of specific item feature patterns suggested that female respondents have a slight advantage on unfamiliar explanation tasks. That is, male students tended to incorporate fewer scientifically normative concepts (i.e., key concepts) than females for unfamiliar taxa. Conversely, females tended to incorporate more scientifically non-normative ideas (i.e., naive ideas) than males for familiar taxa. Together these results indicate that gender achievement differences for this CR instrument may be a result of differences in how males and females interpret and respond to combinations of item features.
Overall, the results presented in the subsequent chapters suggest that as science education shifts toward the evaluation of fused scientific knowledge and practice (e.g., explanation), it is essential that educators and researchers investigate potential sources of bias inherent to specific assessment practices. This dissertation revealed significant sources of CR assessment bias, and provided solutions to address these problems.
|Advisor:||Nehm, Ross H.|
|Committee:||Ding, Lin, Haury, David|
|School:||The Ohio State University|
|Department:||EDU Teaching and Learning|
|School Location:||United States -- Ohio|
|Source:||DAI-A 77/03(E), Dissertation Abstracts International|
|Keywords:||Assessment, Evolution education, Explanation, Lexical ambiguity, Order effects, Performance expectations, Science education|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved