Using Pictures to Improve the Speaking Ability of the First Grade Students in SMKN 1 Kediri in Describing Things and People

Wednesday, 05 October 2011
Abstract

Yerri Kardiana. 2011. Using Pictures to Improve the Speaking Ability of the First Grade Students in SMKN 1 Kediri in Describing Things and People. Thesis. Graduate Program in English Language Education. State University of Malang. Advisors: (I) Dr. Enny Irawati, M. Pd. (II) Drs. Fachrrurazy, M. A., Ph. D.



Key words: using pictures, describing things and people, improve, speaking ability.

This research employed a classroom action research design. Considering the fact that the students' speaking ability was still poor, the researcher planned to introduce a new technique to improve it. In addition, the technique should be interesting so that the students are motivated to engage in communicative speaking activities independently, based on the task. In this case, using pictures is a suitable technique for overcoming the existing problem. A preliminary study to identify the speaking problems was carried out. The classroom action research design was used to investigate how the students' speaking ability could be improved by using pictures in describing things and people. The research problem is: "How can the speaking ability of the first grade students of SMKN 1 Kediri be improved by using pictures in describing things and people?" The procedure of the research consisted of four main steps, i.e. planning, implementing, observing, and reflecting. The research was conducted in two cycles, and each cycle comprised three meetings. The instruments for collecting the data were an observation checklist, field notes, and a speaking assessment.

The findings of the research indicated that using pictures in describing things and people was successful in improving the students' speaking ability. The improvement could be seen from the increase in the students' speaking assessment scores from the pre-test to cycle one and cycle two. In cycle one, 60.71% of the students got a poor score, 21.43% got a very poor score, 40.62% got an average score, and 3.12% got a good score. In cycle two, 50% of the students got an average score, 28.12% got a good score, and 6.25% got a very good score. In addition, the findings showed that using pictures in describing things and people was effective in improving the students' involvement in the teaching and learning process. Implementing pictures in describing things and people for teaching speaking involved the following steps: (1) showing a set of pictures to the students, (2) describing the things and people based on the pictures, (3) asking the students questions to check their understanding of the story, (4) describing the pictures once more to make clear what the pictures are about while writing the description of things and people on the blackboard to serve as a model description for the students, (5) dividing the students into five groups, (6) handing out six sets of pictures to each group, (7) giving the students guided vocabulary related to the pictures, (8) practicing the pronunciation of the vocabulary, (9) asking the students to make brief notes for their descriptions based on the pictures, (10) asking the students to describe the things and people based on the pictures, (11) correcting the students' pronunciation and grammar usage, and (12) asking the students to describe people using real pictures (famous people).
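To make the reported figures easier to follow, here is a minimal sketch, in Python, of how score-band percentages of this kind are computed from raw counts. The band labels, the class size of 32 students, and the split between poor and very poor scores are illustrative assumptions (the cycle-two figures such as 3.12% and 6.25% are consistent with a denominator of 32); none of the counts below are taken from the thesis itself.

from collections import Counter

# Illustrative cycle-two tallies, assuming a class of 32 students.
# Only the average / good / very good counts echo the reported percentages;
# the poor and very poor counts are assumed.
scores = (
    ["very poor"] * 1 + ["poor"] * 4 + ["average"] * 16 +
    ["good"] * 9 + ["very good"] * 2
)

def band_percentages(score_bands):
    """Return the percentage of students falling in each score band."""
    counts = Counter(score_bands)
    total = len(score_bands)
    return {band: round(100 * n / total, 2) for band, n in counts.items()}

if __name__ == "__main__":
    for band, pct in band_percentages(scores).items():
        print(f"{band}: {pct}%")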

Based on the findings, it can be concluded that using pictures in describing things and people can improve both the students' speaking ability and their involvement in the teaching and learning process. Therefore, it is suggested that English teachers implement it as an alternative technique in their English classes, particularly in speaking classes. For the principal, it is suggested to provide facilities for improving the English teachers' teaching quality, for example by making a policy of cooperating with experts to hold in-service training on teaching methods for English teachers. For other researchers, it is suggested to conduct further research by implementing pictures in describing things and people at other school levels and for other language skills such as listening and writing.

Speaking Assessment Grade Sheet

This assessment sheet is the one commonly used by Abu Dhabi IELTS.

CONSISTENCY IN CLASSROOM ASSESSMENT

Tuesday, 04 October 2011

The Ontario Ministry of Education views the consistent application of classroom assessment practices as being of critical importance to fostering student success in Ontario schools. This manual, “Consistency in Classroom Assessment – Support Materials for Educators”, has been developed with the purpose of providing suggestions to maintain and further improve consistency in the classroom assessment of students.
The support materials reflected in this manual are intended for the use of a broad cross-section of Ontario educators. Furthermore, these support materials have been developed using sound and relevant research and are informed by feedback from educators and students gathered in a series of focus groups held across the province.
This manual contains:
• a summary of relevant research on the topic of consistency in student assessment
• suggestions for teachers on improving consistency in classroom assessment
• suggestions for principals on improving assessment consistency among teachers
• suggestions for supervisory officers on improving consistency among schools
• suggestions to improve consistency of assessment in the final year of secondary school
The manual is not a review of specific practices and suggestions that are found in companion documents such as "Policy to Practice". Instead, this manual is designed to be an "active document" which will be used frequently to provide teachers, principals and supervisory officers with practical, relevant suggestions in promoting student success by the use and application of consistent and fair assessment practices.
Finally, this manual is intended to engage educators in a meaningful and healthy conversation on the importance of consistent classroom assessment and how as an Ontario educational community, we can further enrich the classroom experience for our students.

An Introduction to Classroom Assessment Techniques

by Diane M. Enerson, Kathryn M. Plank, and R. Neill Johnson


Background knowledge probes can be used at the beginning of a course, at the start of a new unit or lesson, or prior to introducing an important new topic. Once collected and analyzed, the data can be extremely useful when planning subsequent sessions or units of the course. Although many classroom assessment activities can be done for credit, it is usually best to make these probes an ungraded activity.
Discovering that your students' background and preparation are at odds with your expectations can throw even the best-planned lesson or syllabus off-track. However, knowing is certainly better than not knowing. At the very least, such data help you guide students to the appropriate resources for any supplementary assistance they may need.

Developing Classroom Performance Assessments and Scoring Rubrics - Part II. ERIC Digest

by Moskal, Barbara M

A difficulty that is faced in the use of performance assessments is determining how the students' responses will be scored. Scoring rubrics provide one mechanism for scoring student responses to a variety of different types of performance assessments. This two-part Digest draws from the current literature and the author's experience to identify suggestions for developing performance assessments and their accompanying scoring rubrics.

This Digest addresses 1) Developing Scoring Rubrics, 2) Administering Performance Assessments and 3) Scoring, Interpreting and Using Results. Another Digest addresses Writing Goals and Objectives, and Developing Performance Assessments. These categories guide the reader through the four phases of the classroom
assessment process: planning, gathering, interpreting and using (Moskal, 2000a). The
current article assumes that the reader has a basic knowledge of both performance
assessments and scoring rubrics.

DEVELOPING SCORING RUBRICS

Scoring rubrics are one method that may be used to evaluate students' responses to performance assessments. Two types of scoring rubrics are frequently discussed in the literature: analytic and holistic. Analytic scoring rubrics divide a performance into separate facets, and each facet is evaluated using a separate scale. Holistic scoring rubrics use a single scale to evaluate the larger process. In holistic scoring rubrics, all of the facets that make up the task are evaluated in combination. The recommendations that follow are appropriate to both analytic and holistic scoring rubrics.
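To make the distinction concrete, the following is a minimal sketch, in Python, of the two rubric types. The criterion names, scale ranges, and level descriptors are illustrative assumptions and are not taken from the digest.

# Minimal sketch: analytic vs. holistic scoring rubrics.
# Criterion names, scales, and descriptors are illustrative assumptions.

# Analytic rubric: each facet of the performance gets its own scale (here 0-4).
ANALYTIC_CRITERIA = ["content", "organization", "grammar", "delivery"]

def analytic_score(ratings):
    """Return per-facet scores plus their total for an analytic rubric."""
    facet_scores = {criterion: ratings[criterion] for criterion in ANALYTIC_CRITERIA}
    facet_scores["total"] = sum(facet_scores.values())
    return facet_scores

# Holistic rubric: a single scale describes the performance as a whole.
HOLISTIC_LEVELS = {
    4: "Task fully accomplished; ideas clear and well supported.",
    3: "Task accomplished with minor lapses in clarity or support.",
    2: "Task partially accomplished; noticeable gaps in clarity.",
    1: "Task minimally accomplished; response largely unclear.",
}

def holistic_score(level):
    """Return the single level descriptor judged closest to the whole performance."""
    return f"{level}: {HOLISTIC_LEVELS[level]}"

if __name__ == "__main__":
    print(analytic_score({"content": 3, "organization": 4, "grammar": 2, "delivery": 3}))
    print(holistic_score(3))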

Recommendations for developing scoring rubrics:

1. The criteria set forth within a scoring rubric should be clearly aligned with the requirements of the task and the stated goals and objectives. As was discussed earlier, a list can be compiled that describes how the elements of the task map into the goals and objectives. This list can be extended to include how the criteria of the scoring rubric map into the stated goals and objectives. Additional guidance on developing scoring rubrics, both analytic and holistic, is immediately available through this journal: Mertler (2001) and Moskal (2000b) have both described the differences between analytic and holistic scoring rubrics and how to develop each type of rubric. Books have also been written or compiled (e.g., Arter & McTighe, 2001; Boston, 2002) that provide detailed examinations of the rubric development process and the different types of scoring rubrics.

ADMINISTERING PERFORMANCE ASSESSMENTS

Once a performance assessment and its accompanying scoring rubric are developed, it is time to administer the assessment to students. The recommendations that follow are specifically developed to guide the administration process.

Recommendations for administering performance assessments:

1. Both written and oral explanations of tasks should be clear and concise and
presented in language that the students understand. If the task is presented in written
form, then the reading level of the students should be given careful consideration.
Students should be given the opportunity to ask clarification questions before
completing the task.

2. Appropriate tools need to be available to support the completion of the assessment activity. Depending on the activity, students may need access to library resources, computer programs, laboratories, calculators, or other tools. Before the task is administered, the teacher should determine what tools will be needed and ensure that these tools are available during the task administration.

3. Scoring rubrics should be discussed with the students before they complete the
assessment activity. This allows the students to adjust their efforts in a manner that
maximizes their performance. Teachers are often concerned that by giving the students the criteria in advance, all of the students will perform at the top level. In practice, this rarely (if ever) occurs.

The first two recommendations provided above are appropriate well beyond the use of performance assessments and scoring rubrics. These recommendations are consistent with the Standards of the American Educational Research Association, American Psychological Association & National Council on Measurement in Education (1999) with respect to assessment and evaluation. The final recommendation is consistent with prior articles that concern the development of scoring rubrics (Brualdi, 1998; Moskal & Leydens, 2000).

SCORING, INTERPRETING AND USING RESULTS

As was discussed earlier, a scoring rubric may be used to score student responses to performance assessments. This section provides recommendations for scoring, interpreting and using the results of performance assessments.

Recommendations for scoring, interpreting and using results of performance
assessments:

1. Two independent raters should be able to acquire consistent scores using the
categories described in the scoring rubric. If the categories of the scoring rubric are
written clearly and concisely, then two raters should be able to score the same set of
papers and acquire similar results.

2. A given rater should be able to acquire consistent scores across time using the
scoring rubric. Knowledge of who a student is or the mood of a rater on a given day may impact the scoring process. Raters should frequently refer to the scoring rubric to ensure that they are not informally changing the criteria over time.

3. A set of anchor papers should be used to assist raters in the scoring process. Anchor papers are student papers that have been selected as examples of performances at the different levels of the scoring rubric. These papers provide a comparison set for raters as they score the student responses. Raters should frequently refer to these papers to ensure the consistency of scoring over time.

4. A set of anchor papers with students' names removed can be used to illustrate to
both students and parents the different levels of the scoring rubric. Ambiguities within the rubric can often be clarified through the use of examples. Anchor papers with students' names removed can be used to clarify to both students and parents the expectations set forth through the scoring rubric.

5. The connection between the score or grade and the scoring rubric should be immediately apparent. If an analytic rubric is used, then the report should contain the scores for each analytic level. If a summary score or grade is provided, then an explanation should be included as to how the summary score or grade was determined (a brief sketch of such a report follows this list). Both students and parents should be able to understand how the final grade or score is linked to the scoring criteria.

6. The results of the performance assessment should be used to improve instruction
and the assessment process. What did the teacher learn from the student responses?

How can this be used to improve future classroom instruction? What did the teacher
learn about the performance assessment or the scoring rubric? How can these
instruments be improved for future instruction? The information that is acquired through classroom assessment should be actively used to improve future instruction and assessment.
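As a rough illustration of recommendation 5, the sketch below (in Python) prints each analytic facet score together with the summary grade and shows how that grade was derived. The facet names, maximum points, and grade cut-offs are assumptions made only for this example.

# Minimal sketch of recommendation 5: make the link between analytic rubric
# scores and the summary grade explicit in the report.
# Facet names, maximum points, and grade cut-offs are illustrative assumptions.

FACET_MAX = {"content": 4, "organization": 4, "grammar": 4, "delivery": 4}
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

def grade_report(facet_scores):
    """Build a report showing each facet score and how the summary grade was computed."""
    earned = sum(facet_scores.values())
    possible = sum(FACET_MAX.values())
    percent = 100 * earned / possible
    grade = next(letter for cutoff, letter in GRADE_CUTOFFS if percent >= cutoff)
    lines = [f"{facet}: {score}/{FACET_MAX[facet]}" for facet, score in facet_scores.items()]
    lines.append(f"total: {earned}/{possible} = {percent:.0f}% -> grade {grade}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(grade_report({"content": 3, "organization": 4, "grammar": 2, "delivery": 3}))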

The first three recommendations concern the important concept of "rater reliability" or the consistency between scores. Moskal and Leydens (2000) examine the concept of rater reliability in an article that was previously published in this journal. A more comprehensive source that addresses both validity and reliability of scoring rubrics is a book by Arter and McTighe (2001), Scoring Rubrics in the Classroom: Using Performance Criteria for Assessing and Improving Student Performance. The American Educational Research Association, American Psychological Association and National Council on Measurement in Education (1999) also address these issues in their Standards document. For information concerning methods for converting rubric scores to grades, see "Converting Rubric Scores to Letter Grades" (Northwest Regional Educational Laboratory, 2001).
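As a minimal sketch of the rater-reliability idea behind the first three recommendations, the following Python example compares two raters who scored the same set of papers with the same rubric and reports their exact and within-one-level agreement. The scores are invented for illustration; a formal study would normally also report an agreement index such as Cohen's kappa.

# Minimal sketch: checking consistency between two raters who scored the same
# papers with the same scoring rubric. The scores are invented for illustration.

rater_a = [4, 3, 2, 4, 1, 3, 3, 2, 4, 2]
rater_b = [4, 3, 3, 4, 1, 2, 3, 2, 4, 3]

def agreement(a, b, tolerance=0):
    """Share of papers on which the two raters differ by at most `tolerance` levels."""
    assert len(a) == len(b), "both raters must score the same papers"
    hits = sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance)
    return hits / len(a)

if __name__ == "__main__":
    print(f"exact agreement:      {agreement(rater_a, rater_b):.0%}")
    print(f"within-one agreement: {agreement(rater_a, rater_b, tolerance=1):.0%}")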

CONCLUSIONS

The purpose of this article is to provide a set of recommendations for the development of performance assessments and scoring rubrics. These recommendations can be used to guide a teacher through the four phases of classroom assessment: planning, gathering, interpreting and using. Extensive literature is available on each phase of the assessment process and this article addresses only a small sample of that work. The reader is encouraged to use the previously cited work as a starting place to better understand the use of performance assessments and scoring rubrics in the classroom.

ACKNOWLEDGMENTS

This article was originally developed as part of a National Science Foundation (NSF) grant (EEC 0230702), Engineering Our World. The opinions and ideas expressed in this article are those of the author and not of the NSF.

ERIC Identifier: ED481715
Publication Date: 2003-06-00
Author: Moskal, Barbara M
Source: ERIC Clearinghouse on Assessment and Evaluation


REFERENCES

Boston, C. (Ed.). (2002). Understanding Scoring Rubrics. University of Maryland, MD: ERIC Clearinghouse on Assessment and Evaluation.

Brualdi, A. (1998). "Implementing performance assessment in the classroom." Practical Assessment, Research & Evaluation, 6(2) [On-line]. Available:
http://ericae.net/pare/getvn.asp?v=6&n=2.

Mertler, C. A. (2001). "Designing scoring rubrics for your classroom." Practical
Assessment, Research & Evaluation, 7(25). Available online:
http://ericae.net/pare/getvn.asp?v=7&n=25.

Moskal, B. (2000a) "An Assessment Model for the Mathematics Classroom."
Mathematics Teaching in the Middle School, 6 (3), 192-194.

Moskal, B. (2000b). "Scoring Rubrics: What, When and How?" Practical Assessment, Research & Evaluation, 7(3) [On-line]. Available:
http://ericae.net/pare/getvn.asp?v=7&n=3.

Northwest Regional Educational Laboratory (2002). "Converting Rubric Scores to Letter Grades." In C. Boston (Ed.), Understanding Scoring Rubrics (pp. 34-40). University of Maryland, MD: ERIC Clearinghouse on Assessment and Evaluation.

Perlman, C. (2002). "An Introduction to Performance Assessment Scoring Rubrics." In C. Boston (Ed.), Understanding Scoring Rubrics (pp. 5-13). University of Maryland, MD: ERIC Clearinghouse on Assessment and Evaluation.

Rogers, G. & Sando, J. (1996). Stepping Ahead: An Assessment Plan Development Guide. Terre Haute, Indiana: Rose-Hulman Institute of Technology.

Wiggins, G. (1990). "The case for authentic assessment." Practical Assessment,
Research & Evaluation, 2(2). Available online:
http://ericae.net/pare/getvn.asp?v=2&n=2.

Wiggins, G. (1993). Assessing Student Performances. San Francisco: Jossey-Bass
Publishers.

Developing Classroom Performance Assessments and Scoring Rubrics - Part I. ERIC Digest.

by Moskal, Barbara M

A difficulty that is faced in the use of performance assessments is determining how the students' responses will be scored. Scoring rubrics provide one mechanism for scoring student responses to a variety of different types of performance assessments. This two-part Digest draws from the current literature and the author's experience to identify suggestions for developing performance assessments and their accompanying scoring rubrics.

The suggestions are divided into five categories:

1) Writing Goals and Objectives,
2) Developing Performance Assessments,
3) Developing Scoring Rubrics,
4) Administering Performance Assessments and
5) Scoring, Interpreting and Using Results.

"This Digest addresses the first two categories. Another Digest addresses the last
three."

These categories guide the reader through the four phases of the classroom
assessment process: planning, gathering, interpreting and using (Moskal, 2000a). The
list of suggestions provided throughout this paper is specific to formal assessment activities as opposed to informal assessment activities (Stiggins, 1994). Formal assessment activities refer to activities in which the students are aware that they are being evaluated; informal assessment activities refer to activities in which the students are not aware that they are being evaluated (Stiggins, 1994). Although some of these suggestions are appropriate for informal assessments, the primary focus of this paper is upon formal assessment activities.

The current article assumes that the reader has a basic knowledge of both performance assessments and scoring rubrics. If these assumptions are incorrect, the reader may wish to review prior articles on performance assessments and scoring rubrics before reading this article. Brualdi's article (1998), "Implementing performance assessment in the classroom", provides an introduction to performance assessments and how they may be used in the classroom. Moskal (2000b) discusses the basics of scoring rubric development in her article, "Scoring Rubrics: What, When and How?" In the article "Designing scoring rubrics for your classroom," Mertler (2001) outlines how to develop and implement scoring rubrics in the classroom.

WRITING GOALS AND OBJECTIVES

Before a performance assessment or a scoring rubric is written or selected, the teacher should clearly identify the purpose of the activity. As is the case with any assessment, a clear statement of goals and objectives should be written to guide the development of both the performance assessment and the scoring rubric. "Goals" are broad statements of expected student outcomes and "objectives" divide the goals into observable behaviors (Rogers & Sando, 1996). Questions such as, "What do I hope to learn about my students' knowledge or skills?," "What content, skills and knowledge should the activity be designed to assess?," and "What evidence do I need to evaluate the appropriate skills and knowledge?", can help in the identification of specific goals and objectives.

Recommendations for writing goals and objectives:

1. The statement of goals and accompanying objectives should provide a clear focus for both instruction and assessment. Another manner in which to phrase this
recommendation is that the stated goals and objectives for the performance
assessment should be clearly aligned with the goals and objectives of instruction.
Ideally, a statement of goals and objectives is developed prior to the instructional
activity and is used to guide both instruction and assessment.

2. Both goals and objectives should reflect knowledge and information that is worthwhile for students to learn. Both the instruction and the assessment of student learning are intentional acts and should be guided through planning. Goals and objectives provide a framework for the development of this plan. Given the critical relationship between goals and objectives and instruction and assessment, goals and objectives should reflect important learning outcomes.

3. The relationship between a given goal and the objectives that describe that goal
should be apparent. Objectives lay the framework upon which a given goal is evaluated.

Therefore, there should be a clear link between the statement of the goal and the
objectives that define that goal.

4. All of the important aspects of the given goal should be reflected through the
objectives. Once again, goals and objectives provide a framework for evaluating the
attainment of a given goal. Therefore, the accompanying set of objectives should reflect the important aspects of the goal.

5. Objectives should describe measurable student outcomes. Since objectives provide the framework for evaluation, they need to be phrased in a manner that specifies the student behavior that will demonstrate the attainment of the larger goal.

6. Goals and objectives should be used to guide the selection of an appropriate
assessment activity. When the goals and objectives are focused upon the recall of
factual knowledge, a multiple choice or short response assessment may be more
appropriate and efficient than a performance assessment. When the goals and
objectives are focused upon complex learning outcomes, such as reasoning,
communication, teamwork, etc., a performance assessment is likely to be appropriate (Perlman, 2002).

Writing goals and objectives at first appears to be a simple task. After all, this process primarily requires clearly defining the desired student outcomes. However, many teachers initially have difficulty creating goals and objectives that can be used to guide instruction and that can be measured. An excellent resource that specifically focuses upon the "how to" of writing measurable objectives is a book by Gronlund (2000). Other authors have also addressed these issues in subsections of larger works (e.g., Airasian, 2000; 2001; Oosterhoff, 1999).

DEVELOPING PERFORMANCE ASSESSMENTS

As the term suggests, performance assessments require a demonstration of students'
skills or knowledge (Airasian, 2000; 2001; Brualdi, 1998; Perlman, 2002).

Performance assessments can take on many different forms, which include written and oral demonstrations and activities that can be completed by either a group or an individual.

A factor that distinguishes performance assessments from other extended response
activities is that they require students to demonstrate the application of knowledge to a particular context (Brualdi, 1998; Wiggins, 1993). Through observation or analysis of a student's response, the teacher can determine what the student knows, what the student does not know and what misconceptions the student holds with respect to the purpose of the assessment.

Recommendations for developing performance assessments:

1. The selected performance should reflect a valued activity. According to Wiggins
(1990), "The best tests always teach students and teachers alike the kind of work that most matters; they are enabling and forward-looking, not just reflective of prior teaching." He suggests the use of tasks that resemble the type of activities that are known to take place in the workforce (e.g., project reports and presentations, writing legal briefs, collecting, analyzing and using data to make and justify decisions). In other words, performance assessments allow students the opportunity to display their skills and knowledge in response to "real" situations (Airasian, 2000; 2001; Wiggins, 1993).

2. The completion of performance assessments should provide a valuable learning
experience. Performance assessments require more time to administer than do other
forms of assessment. The investment of this classroom time should result in a higher
payoff. This payoff should include both an increase in the teacher's understanding of
what students know and can do and an increase in the students' knowledge of the
intended content and constructs.

3. The statement of goals and objectives should be clearly aligned with the measurable outcomes of the performance activity. Once the task has been selected, a list can be made of how the elements of the task map into the desired goals and objectives. If it is not apparent as to how the students' performance will be mapped into the desired goals and objectives, then adjustments may need to be made to the task or a new task may need to be selected.

4. The task should not examine extraneous or unintended variables. Examine the task and think about whether there are elements of the task that do not map directly into the goals and objectives. Is knowledge required in the completion of the task that is
inconsistent with the purpose? Will lack of this knowledge interfere with or prevent the
students from completing the task for reasons that are not consistent with the task's
purpose? If such factors exist, changes may need to be made to the task or a new task may need to be selected.

5. Performance assessments should be fair and free from bias. The phrasing of the task should be carefully constructed in a manner that eliminates gender and ethnic
stereotypes. Additionally, the task should not give an unfair advantage to a particular
subset of students. For example, a task that is heavily weighted with baseball statistics may give an unfair advantage to the students that are baseball enthusiasts.

The recommendations provided above have been drawn from the broader literature concerning the construction of performance assessments. The interested reader can acquire further details concerning the development process by consulting other articles that are available through this journal (i.e., Brualdi, 1998; Roeber, 1996; Wiggins, 1990) or books (e.g., Wiggins, 1993; 1998) that address this subject.


ERIC Identifier: ED481714
Publication Date: 2003-06-00
Author: Moskal, Barbara M
Source: ERIC Clearinghouse on Assessment and Evaluation


REFERENCES

Boston, C. (Ed.). (2002). Understanding Scoring Rubrics. University of Maryland, MD: ERIC Clearinghouse on Assessment and Evaluation.

Brualdi, A. (1998). "Implementing performance assessment in the classroom." Practical Assessment, Research & Evaluation, 6(2) [On-line]. Available:
http://ericae.net/pare/getvn.asp?v=6&n=2.

Mertler, C. A. (2001). "Designing scoring rubrics for your classroom." Practical
Assessment, Research & Evaluation, 7(25). Available online:
http://ericae.net/pare/getvn.asp?v=7&n=25.

Moskal, B. (2000a) "An Assessment Model for the Mathematics Classroom."
Mathematics Teaching in the Middle School, 6 (3), 192-194.

Moskal, B. (2000b). "Scoring Rubrics: What, When and How?" Practical Assessment, Research & Evaluation, 7(3) [On-line]. Available:
http://ericae.net/pare/getvn.asp?v=7&n=3.

Northwest Regional Educational Laboratory (2002). "Converting Rubric Scores to Letter Grades." In C. Boston (Ed.), Understanding Scoring Rubrics (pp. 34-40). University of Maryland, MD: ERIC Clearinghouse on Assessment and Evaluation.

Perlman, C. (2002). "An Introduction to Performance Assessment Scoring Rubrics." In C. Boston (Ed.), Understanding Scoring Rubrics (pp. 5-13). University of Maryland, MD: ERIC Clearinghouse on Assessment and Evaluation.

Rogers, G. & Sando, J. (1996). Stepping Ahead: An Assessment Plan Development Guide. Terre Haute, Indiana: Rose-Hulman Institute of Technology.

Rudner, L.M. & Schafer, W.D. (Eds.). (2002). What Teachers Need to Know about Assessment. Washington, DC: National Education Association.

Stiggins, R. (1994). Student-Centered Classroom Assessment. New York: Macmillan Publishing Company.

Wiggins, G. (1990). "The case for authentic assessment." Practical Assessment,
Research & Evaluation, 2(2). Available online:
http://ericae.net/pare/getvn.asp?v=2&n=2.

Wiggins, G. (1993). Assessing Student Performances. San Francisco: Jossey-Bass
Publishers.

Cognitive Science and Assessment. ERIC Digest.

by Boston, Carol

Cognitive science is devoted to the study of how people think and learn and how, when, and whether they use what they know to solve problems (Greeno, Collins, & Resnick, 1997; National Research Council, 2001). The cognitive perspective in education encompasses how learners develop and structure their knowledge in specific subject areas and how assessment tasks might be designed to enable students to demonstrate the knowledge and cognitive processes necessary to be judged proficient in these subject areas. This Digest provides educators with an overview of some important facets of cognitive science research and suggests implications for classroom assessment.

HOW DO EXPERTS AND NOVICES DIFFER IN THEIR APPROACH TO PROBLEMS?

Education researchers study the thinking of experts in various subject areas to gain an understanding of what concepts and procedures are most important to teach and how they are interrelated. The concept is that educators can and should be moving students along a continuum toward real-world subject mastery based on a deep understanding of how subject knowledge is organized (Bereiter & Scardamalia, 1986).

When faced with a problem, learners tend to search their memories for a schema, or learned technique for organizing and interpreting information in a certain subject, in
order to solve it (Rumelhart, 1980). Over time, individuals build mental models to guide their problem solving efficiently so they do not depend on trial-and-error approaches and can instead create analogies and make inferences to support new learning (Glaser & Baxter, 1999).

When compared with novice learners, experts in a subject are notable for how
well-organized their knowledge is, which in turn enables them to see patterns quickly, recall information, and study novel problems in light of concepts and principles they know already (Glaser & Chi, 1988). In other words, their schemas are well-connected and they are able to retrieve chunks of information relevant to a task at hand. Experts also have strong problem-solving skills. They know what they know and what they don't know, and plan and monitor the implementation of various mental strategies (Hatano, 1990).

COGNITIVE SCIENCE IN THE CLASSROOM

Ideally, developmental models of learning could be created that note the typical
progression and milestones as a learner advances from novice to competent to expert and describe the types of experiences that lead to change. For example, students generally have naive or intuitive understandings of the sciences, based in part on misconceptions that are corrected as they are exposed to new learning (e.g., Gabel, 1994; Feldman & Minstrell, 2000). And while there are individual differences among learners, when large samples are studied, patterns tend to emerge, particularly related to erroneous beliefs and incorrect procedures. For example, there appear to be a certain limited number of "subtraction bugs" that account for almost all of the ways young children make mistakes when learning to subtract two- or three-digit numbers, and these are constant even across languages (Brown and Burton, 1978).
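The "subtraction bugs" finding can be made concrete with a short sketch. The Python code below contrasts correct subtraction with one classic bug, "smaller-from-larger" (subtracting the smaller digit from the larger in every column instead of borrowing), and checks whether a student's answer matches the buggy procedure. The worked problems are invented; the bug itself is one of those documented by Brown and Burton.

# Minimal sketch of diagnosing one classic "subtraction bug": the
# smaller-from-larger bug, in which a child subtracts the smaller digit from
# the larger in every column instead of borrowing. The problems are invented.

def buggy_subtract(a, b):
    """Column subtraction with the smaller-from-larger bug (assumes a >= b)."""
    da, db = str(a), str(b).rjust(len(str(a)), "0")
    return int("".join(str(abs(int(x) - int(y))) for x, y in zip(da, db)))

def diagnose(a, b, student_answer):
    """Classify a student's answer as correct, the known bug, or some other error."""
    if student_answer == a - b:
        return "correct"
    if student_answer == buggy_subtract(a, b):
        return "smaller-from-larger bug"
    return "other error"

if __name__ == "__main__":
    # 243 - 57 = 186, but the buggy procedure gives |2-0|, |4-5|, |3-7| -> 214.
    for answer in (186, 214, 200):
        print(answer, "->", diagnose(243, 57, answer))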

Allowing for variations among learners, it is possible to discover the most common
pathways toward acquiring knowledge and use this information diagnostically. For
example, Case, Griffin, and colleagues have developed an assessment tool based on their empirical research regarding how children from ages 4 to 10 change in their
conception of numbers through growth and practice. While 4-year-olds can count
groups of objects, they have to guess if they face a theoretical question such as, "Which is more--four or five?" Between 4 and 6, most children develop a "mental number line" that helps them envision the answer to such a question, even when actual objects aren't present. Between 6 and 8, children gradually come to envision other number lines for counting by 2s, 5s, 10s, and 100s. By 10, many children have a better understanding of the base-10 number system, which enables them to reach a more sophisticated understanding of concepts such as regrouping and estimation (Case, 1996; Griffin and Case, 1997). Teachers can use assessments based on this research to determine their next steps in arithmetic instruction.

More research has been done about domain structure in some disciplines than in
others. Mathematics, physics, beginning reading, and U.S. history are among the areas that have been studied (see, for example, Niemi, 1996, and Wineburg, 1996).
Subject-area standards such as the National Council of Teachers of Mathematics
Standards generally reflect current thinking on cognitive processes and are a good
place for teachers to begin their explorations of this topic. The National Research
Council's How People Learn: Brain, Mind, Experience, and School
(http://stills.nap.edu/html/howpeople1/) provides another helpful introduction.

HOW DO LEARNERS STORE AND ACCESS KNOWLEDGE?

Memory may be divided into two types: short-term, or working memory, which
determines how much mental processing can go on at any one time, and long-term
memory, where people organize their content knowledge. Short-term memory, or working memory, is connected with fluid intelligence, or the ability to solve new and
unusual problems, while long-term memory is connected to crystallized intelligence, or the bringing of past experience to bear on current problems (Anderson, Greeno, Reder, and Simon, 2000). When students are learning a new skill, they must rely heavily on their working memory to represent the task and may need to talk themselves through a task. As the skill moves into long-term memory, it becomes fluent, and eventually, automatic (Anderson, 1982).

To support the learning process, students can be taught meta-cognitive skills, or
techniques to reflect on and assess their own thinking. To improve reading
comprehension, for example, young children can be taught to monitor their
understanding of passages by asking questions, summarizing, clarifying any
uncertainties, and predicting next events (Palincsar & Brown, 1984).

HOW CAN ASSESSMENT DESIGNERS USE FINDINGS FROM COGNITIVE SCIENCE?

The design of any assessment should begin with a statement of purpose for the
assessment and a definition of the particular subject area or content domain. How do people demonstrate knowledge and become competent in this domain? What important aspects of learning do we want to draw inferences from when measuring student achievement in a given subject area? What situations and tasks can we observe to make the appropriate inferences?

Cognitive science calls for test developers to:

* Work from a deep knowledge of the central concepts and principles of a given subject area, and the most important related information.

* Identify or develop those tasks that allow students to demonstrate their
understanding and skills in these areas, as opposed to rote memorization.

* Make sure tasks or questions are sufficiently complex to get at how students have organized their knowledge and how and when they use it.

* Emphasize the contents of long-term memory rather than short-term, or working, memory by not burdening test-takers with requirements to track a large number of response options or major quantities of extraneous information while answering a question.

* Emphasize relevant constructs--for example, a mathematics assessment should not over-emphasize reading and writing, unless communicating about mathematics is the skill to be measured.

* Not limit choice of item format. Both multiple-choice and
performance-based assessments have the potential to be effective or ineffective.
Carefully constructed multiple-choice questions can tap complex cognitive processes, not just lower level skills, as traditionally believed. And performance assessments, though generally praised for capturing higher level skills, may inadvertently focus on lower level skills (Baxter & Glaser, 1998; Hamilton, Nussbaum, and Snow, 1997; Linn, Baker, & Dunbar, 1991).

* Regard task difficulty in terms of underlying knowledge of cognitive processes required, rather than statistical information such as how many respondents answered correctly.

At the classroom assessment level, cognitive science findings encourage teachers to:

* Teach learners how and when to apply various approaches and procedures.

* Teach meta-cognitive skills within content areas so learners become capable of directing their thinking and reflecting on their progress.

* Observe students as they solve problems.

* Have students think aloud as they work or describe the reasoning that leads them to a particular solution.

* Analyze student errors on assignments or tests to determine which students got a question or problem wrong and why it appeared difficult for them. Knowing the source of difficulty can lead to more targeted, effective remediation.

Teachers should also be aware that acquiring important knowledge and skills at an
in-depth level takes a significant amount of time, practice, and feedback.


ERIC Identifier: ED481716
Publication Date: 2003
Author: Boston, Carol
Source: ERIC Clearinghouse on Assessment and Evaluation


REFERENCES


Anderson, J. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369-406.

Anderson, J., Greeno, J., Reder, L., and Simon, H.A. (2000). Perspectives on learning, thinking, and activity. Educational Researcher, 29 (4): 11-13.

Baxter, G. and Glaser, R. (1998). Investigating the cognitive complexity of science
assessments. Educational Measurement: Issues and Practices, 17 (3): 37-45.

Bereiter, C. & Scardamalia, M.(1986). Educational relevance in the study of expertise. Interchange, 17 (2): 10-19.

Brown, J.S. and Burton, R.R. (1978). Diagnostic models for procedural bugs in basic
mathematical skills. Cognitive Science, 2, 155-192.

Case, R. (1996). Introduction - Reconceptualizing the development of children's
conceptual structures and their development in middle childhood. Monographs of the Society for Research in Child Development, 61 (1-2): 1-26.

Feldman, A., & Minstrell, J. (2000). Action research as a research methodology for the study of the teaching and learning of science. In E. Kelly & R. Lesh (Eds.), Handbook of Research Design in Mathematics and Science Education. Mahwah, NJ: Erlbaum.

Gabel, D., ed. (1994). Handbook of Research on Science Teaching and Learning. New York: Macmillan.

Glaser, R. and Baxter, G. (1999). Assessing active knowledge. Paper presented at the 1999 CRESST Conference, Benchmarks for Accountability: Are We There Yet? UCLA, Los Angeles.

Glaser, R. and Chi, M. (1988). Overview in M. Chi, R. Glaser, & M. Farr (Eds.), The Nature of Expertise (pp. xv-xxvii). Hillsdale, NJ: Erlbaum.

Greeno, J.G., Collins, A.M., & Resnick, L.B. (1997). Cognition and learning. In D.

Berliner & R. Calfee (Eds.), Handbook of Educational Psychology (pp. 15-47). New York: Simon & Schuster Macmillan.

Griffin, S., and Case, R. (1997). Re-thinking the primary school math curriculum: An approach based on cognitive science. Issues in Education, 3, 1-65.

Hamilton, L., Nussbaum, E., & Snow, R. (1997). Interview procedures for validating science assessments. Applied Measurement in Education, 10, 181-200.

Hatano, G. (1990). The nature of everyday science: A brief introduction. British Journal of Developmental Psychology, 8, 245-250.

Linn, R., Baker, E., & Dunbar, S. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20 (8):15-21.

National Research Council (2001). Knowing What Students Know: The Science and Design of Educational Assessment. Washington, DC: National Academy Press.

Niemi, D. (1996). Assessing conceptual understanding in mathematics:
Representations, problem solutions, justifications, and explanations. Journal of
Educational Research, 89, 351-363.

Palincsar, A. and Brown, A. (1984). Reciprocal teaching of comprehension-fostering and comprehension-monitoring activities. Cognition and Instruction, 1, 117-175.

Rumelhart, D. E. (1980). Schemata: The building blocks of cognition. In R. Spiro, B. Bruce, & W. Brewer (Eds.), Theoretical Issues in Reading Comprehension (pp. 33-58). Hillsdale, NJ: Erlbaum.

Wineburg, S. S. (1996). The psychology of learning and teaching history. In D. Berliner & R. Calfee (Eds.), Handbook of Educational Psychology (pp. 423-437). New York: Simon & Schuster Macmillan.

Creating Meaningful Performance Assessments. ERIC Digest E531.

by Elliott, Stephen N


Performance assessment is a viable alternative to norm-referenced tests. Teachers can use performance assessment to obtain a much richer and more complete picture of what students know and are able to do.

DEFINING PERFORMANCE ASSESSMENT

Defined by the U.S. Congress, Office of Technology Assessment (OTA) (1992), as "testing methods that require students to create an answer or product that demonstrates their knowledge and skills," performance assessment can take many forms including:

*Conducting experiments.

*Writing extended essays.

*Doing mathematical computations.

Performance assessment is best understood as a continuum of assessment formats ranging from the simplest student-constructed responses to comprehensive demonstrations or collections of work over time. Whatever format, common features of performance assessment involve:

1. Students' construction rather than selection of a response.

2. Direct observation of student behavior on tasks resembling those commonly required for functioning in the world outside school.

3. Illumination of students' learning and thinking processes along with their answers (OTA, 1992).

Performance assessments measure what is taught in the curriculum. There are two terms that are core to depicting performance assessment:

1. Performance: A student's active generation of a response that is observable either directly or indirectly via a permanent product.

2. Authentic: The nature of the task and context in which the assessment occurs is relevant and represents "real world" problems or issues.

HOW DO YOU ADDRESS VALIDITY IN PERFORMANCE ASSESSMENTS?

The validity of an assessment depends on the degree to which the interpretations and uses of assessment results are supported by empirical evidence and logical analysis. According to Baker and her associates (1993), there are five internal characteristics that valid performance assessments should exhibit:

1. Have meaning for students and teachers and motivate high performance.

2. Require the demonstration of complex cognition, applicable to important problem areas.

3. Exemplify current standards of content or subject matter quality.

4. Minimize the effects of ancillary skills that are irrelevant to the focus of assessment.

5. Possess explicit standards for rating or judgment.

When considering the validity of a performance test, it is important to first consider how the test or instrument "behaves" given the content covered. Questions such as the following should be asked (the first is illustrated in the brief sketch after this list):

*How does this test relate to other measures of a similar construct?

*Can the measure predict future performances?

*Does the assessment adequately cover the content domain?
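As a rough numeric illustration of the first question in the list above, the Python sketch below computes a Pearson correlation between performance-assessment scores and scores from another measure of a similar construct; a reasonably strong positive correlation would count as one piece of convergent-validity evidence. The score lists are invented for illustration.

# Minimal sketch: relating performance-assessment scores to another measure of
# a similar construct (one piece of convergent-validity evidence).
# The score lists are invented for illustration.
from math import sqrt

performance_scores = [12, 15, 9, 18, 14, 11, 16, 13]
other_measure = [55, 70, 48, 82, 66, 52, 75, 61]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

if __name__ == "__main__":
    print(f"Pearson r between the two measures: {pearson_r(performance_scores, other_measure):.2f}")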

It is also important to review the intended effects of using the assessment instrument. Questions about the use of a test typically focus on the test's ability to reliably differentiate individuals into groups and guide the methods teachers use to teach the subject matter covered by the test.

A word of caution: Unintended uses of assessments can have precarious effects. To prevent the misuse of assessments, the following questions should be considered:

*Does use of the instrument result in discriminatory practices against various groups of individuals?

*Is it used to evaluate others (e.g., parents or teachers) who are not directly assessed by the test?

PROVIDING EVIDENCE FOR THE RELIABILITY AND VALIDITY OF PERFORMANCE ASSESSMENT

The technical qualities and scoring procedures of performance assessments must meet high standards for reliability and validity. To ensure that sufficient evidence exists for a measure, the following four issues should be addressed:

1. Assessment as a Curriculum Event. Externally mandated assessments that bear little, if any, resemblance to subject area domain and pedagogy cannot provide a valid or reliable indication of what a student knows and is able to do. The assessment should reflect what is taught and how it is taught.

Making an assessment a curriculum event means reconceptualizing it as a series of theoretically and practically coherent learning activities that are structured in such a way that they lead to a single predetermined end. When planning for assessment as a curriculum event, the following factors should be considered:

*The content of the instrument.

*The length of activities required to complete the assessment.

*The type of activities required to complete the assessment.

*The number of items in the assessment instrument.

*The scoring rubric.

2. Task Content Alignment with Curriculum. Content alignment between what is tested and what is taught is essential. What is taught should be linked to valued outcomes for students in the district.

3. Scoring and Subsequent Communications with Consumers. In large scale assessment systems, the scoring and interpretation of performance assessment instruments is akin to a criterion-referenced approach to testing. A student's performance is evaluated by a trained rater who compares the student's responses to multitrait descriptions of performances and then gives the student a single number corresponding to the description that best characterizes the performance. Students are compared directly to scoring criteria and only indirectly to each other.

In the classroom, every student needs feedback when the purpose of performance assessment is diagnosis and monitoring of student progress. Students can be shown how to assess their own performances when:

*The scoring criteria are well articulated.

*Teachers are comfortable with having students share in their own evaluation process.

4. Linking and Comparing Results Over Time. Linking is a generic term that includes a variety of approaches to making results of one assessment comparable to those of another. Two appropriate and manageable approaches to linking in performance assessment include:

*Statistical Moderation. This approach is used to compare performances across content areas for groups of students who have taken a test at the same point in time (a rough numeric sketch follows this list).

*Social Moderation. This is a judgmental approach that is built on consensus of raters. The comparability of scores assigned depends substantially on the development of consensus among professionals.
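As a very rough sketch of the statistical-moderation idea, the Python example below rescales scores from two content-area assessments, taken by the same group at the same time, onto a common standardized (z-score) scale so that performances can be compared across the two areas. The scores and the z-score rescaling are illustrative assumptions, not a prescribed procedure.

# Very rough sketch of statistical moderation: putting scores from two
# content-area assessments, taken by the same group at the same time, onto a
# common standardized (z-score) scale so group results can be compared.
# The scores and the z-score approach are illustrative assumptions.
from statistics import mean, stdev

math_scores = [62, 75, 58, 80, 71, 66, 90, 54]
writing_scores = [3.1, 3.8, 2.9, 4.0, 3.5, 3.2, 4.5, 2.6]

def standardize(scores):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

if __name__ == "__main__":
    math_z = standardize(math_scores)
    writing_z = standardize(writing_scores)
    for student, (zm, zw) in enumerate(zip(math_z, writing_z), start=1):
        print(f"student {student}: math z = {zm:+.2f}, writing z = {zw:+.2f}")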

HOW CAN TEACHERS INFLUENCE STUDENTS' PERFORMANCES?

Performance assessment is a promising method that is achievable in the classroom. In classrooms, teachers can use data gathered from performance assessment to guide instruction. Performance assessment should interact with instruction that precedes and follows an assessment task.

When using performance assessments, students' performances can be positively influenced by:

1. Selecting assessment tasks that are clearly aligned or connected to what has been taught.

2. Sharing the scoring criteria for the assessment task with students prior to working on the task.

3. Providing students with clear statements of standards and/or several models of acceptable performances before they attempt a task.

4. Encouraging students to complete self-assessments of their performances.

5. Interpreting students' performances by comparing them to standards that are developmentally appropriate, as well as to other students' performances.

ERIC Identifier: ED381985
Publication Date: 1995-06-00
Author: Elliott, Stephen N.
Source: ERIC Clearinghouse on Disabilities and Gifted Education Reston VA.


REFERENCES
Baker, E. L., O'Neill, H. F., Jr., & Linn, R. L. (1993). Policy and validity prospects for performance-based assessments. American Psychologist, 48, 1210-1218.

U.S. Congress, Office of Technology Assessment. (1992, February). Testing in American schools: Asking the right questions. (OTA-SET-519). Washington, DC: U.S. Government Printing Office.

Derived from: Elliott, S. N. (1994). Creating Meaningful Performance Assessments: Fundamental Concepts. Reston, VA: The Council for Exceptional Children. Product #P5059.

Performance Assessment

Number 2

September 1993

by David Sweet

WHAT IS IT? Performance assessment, also known as alternative or authentic assessment, is a form of testing that requires students to perform a task rather than select an answer from a ready-made list. For example, a student may be asked to explain historical events, generate scientific hypotheses, solve math problems, converse in a foreign language, or conduct research on an assigned topic. Experienced raters--either teachers or other trained staff--then judge the quality of the student's work based on an agreed-upon set of criteria. This new form of assessment is most widely used to directly assess writing ability based on text produced by students under test instructions.

HOW DOES IT WORK? Following are some methods that have been used successfully to assess performance:

* Open-ended or extended response exercises are questions or other prompts that require students to explore a topic orally or in writing. Students might be asked to describe their observations from a science experiment, or present arguments an historic character would make concerning a particular proposition. For example, what would Abraham Lincoln argue about the causes of the Civil War?

* Extended tasks are assignments that require sustained attention in a single work area and are carried out over several hours or longer. Such tasks could include drafting, reviewing, and revising a poem; conducting and explaining the results of a science experiment on photosynthesis; or even painting a car in auto shop.

* Portfolios are selected collections of a variety of performance-based work. A portfolio might include a student's "best pieces" and the student's evaluation of the strengths and weaknesses of several pieces. The portfolio may also contain some "works in progress" that illustrate the improvements the student has made over time.

These methods, like all types of performance assessments, require that students actively develop their approaches to the task under defined conditions, knowing that their work will be evaluated according to agreed-upon standards. This requirement distinguishes performance assessment from other forms of testing.

WHY TRY IT? Because they require students to actively demonstrate what they know, performance assessments may be a more valid indicator of students' knowledge and abilities. There is a big difference between answering multiple choice questions on how to make an oral presentation and actually making an oral presentation.

More important, performance assessment can provide impetus for improving instruction, and increase students' understanding of what they need to know and be able to do. In preparing their students to work on a performance task, teachers describe what the task entails and the standards that will be used to evaluate performance. This requires a careful description of the elements of good performance, and allows students to judge their own work as they proceed.

WHAT DOES THE RESEARCH SAY? Active learning. Research suggests that learning how and where information can be applied should be a central part of all curricular areas. Also, students exhibit greater interest and levels of learning when they are required to organize facts around major concepts and actively construct their own understanding of the concepts in a rich variety of contexts. Performance assessment requires students to structure and apply information, and thereby helps to engage students in this type of learning.

Curriculum-based testing. Performance assessments should be based on the curriculum rather than constructed by someone unfamiliar with the particular state, district or school curriculum. This allows the curriculum to "drive" the test, rather than be encumbered by testing requirements that disrupt instruction, as is often the case. Research shows that most teachers shape their teaching in a variety of ways to meet the requirements of tests. Primarily because of this impact of testing on instruction, many practitioners favor test reform and the new performance assessments.

Worthwhile tasks. Performance tasks should be "worth teaching to"; that is, the tasks need to present interesting possibilities for applying an array of curriculum-related knowledge and skills. The best performance tasks are inherently instructional, actively engaging students in worthwhile learning activities. Students may be encouraged by them to search out additional information or try different approaches, and in some situations, to work in teams.

WHAT DOES IT COST? These positive features of performance assessment come at a price. Performance assessment requires a greater expense of time, planning and thought from students and teachers. One teacher reports, "We can't just march through the curriculum anymore. It's hard. I spend more time planning and more time coaching. At first, my students just wanted to be told what to do. I had to help them to start thinking."

Users also need to pay close attention to technical and equity issues to ensure that the assessments are fair to all students. This is all the more important as there has been very little research and development on performance assessment in the environment of a high stakes accountability system, where administrative and resource decisions are affected by measures of student performance.

What are examples of successful strategies and programs?

* Charlotte Haguchi is a third- and fourth-grade teacher at Farmdale Elementary School in Los Angeles. Regarding assessment and instruction as inseparable aspects of teaching, Ms. Haguchi uses a wide array of assessment strategies to determine how well her students are doing and to make instructional decisions. She uses systematic rating procedures, keeps records of student performances on tasks, and actively involves students in keeping journals and evaluating their own work. Ms. Haguchi can be seen in action along with other experts and practitioners in the videotape Alternatives for Measuring Performance by NCREL and CRESST. (See Jeri Nowakowski and Ron Dietel, below.)

* William Symons is the superintendent of Alcoa City Schools in Alcoa, Tennessee. Seeking higher, more meaningful student standards through curriculum reform, Dr. Symons works with school staff and the community to create a new curriculum focused on standards and an assessment linked to the curriculum. Comments and advice from Dr. Symons and other practitioners and experts are available on the audiotape Conversations About Authentic Assessment by Appalachia Educational Laboratory. (See Helen Saunders, below.)

* Richard P. Mills is the commissioner of education in the Vermont Department of Education. Vermont is assessing fourth- and eighth-grade students in writing and mathematics using three methods: a portfolio, a "best piece" from the portfolio, and a set of performance tasks. Other states that have been very active in developing and implementing performance assessments include: California, Arizona, Maryland, New York, Connecticut, and Kentucky. (See Ed Roeber and state officers, below.)

Where can I get more information?
Richard P. Mills
Commissioner
Vermont Department of Education
Montpelier, VT 05602
(802)828-3135

Carolyn D. Byrne
Division of Educational Testing
New York State Education Department
Room 770 EBA
Albany, NY 12234
(518)474-5902

Dale Carlson
California Department of Education
721 Capitol Mall
Sacramento, CA 95814
(916)657-3011

Don Chambers
National Center for Research in
Mathematical Sciences Education
University of Wisconsin at Madison
1025 West Johnson Street
Madison, WI 53706
(608)263-4285

Ron Dietel
National Center for Research on Evaluation,
Standards, and Student Testing (CRESST)/UCLA
145 Moore Hall
405 Hilgard Avenue
Los Angeles, CA 90024-1522
(310)206-1532

Steven Ferrara
Program Assessment Branch
Maryland Department of Education
200 West Baltimore Street
Baltimore, MD 21201
(410)333-2369

James Gilchrist
New Standards Project
Learning, Research and Development Center
3939 O'Hara Street
Pittsburgh, PA 15260
(412)624-8319

Paul Koehler
Arizona Department of Education
1535 West Jefferson
Phoenix, AZ 85007
(602)542-5754

Kate Maloy
National Research Center on Student Learning/LRDC
3939 O'Hara Street
Pittsburgh, PA 15260
(412)624-7457

Joe McDonald
Coalition of Essential Schools
Brown University
Box 1969
Providence, RI 02912
(401)863-3384

Jeri Nowakowski
North Central Regional Educational Laboratory (NCREL)
1900 Spring Road, Suite 300
Oak Brook, IL 60521
(708)571-4700

Edward Reidy
Office of Assessment and Accountability
Kentucky Department of Education
19th Floor Capital Plaza Tower
500 Mero Street
Frankfort, KY 40601
(502)564-4394

Douglas Rindone
Division of Research, Evaluation and Assessment
Connecticut Department of Education
Box 2219
Hartford, CT 06145
(203)566-1684

Ed Roeber
Council of Chief State School Officers
1 Massachusetts Avenue NW
Suite 700
Washington, DC 20001-1431
(202)336-7045

Larry Rudner
ERIC Clearinghouse/AIR
3333 K Street NW
Suite 300
Washington, DC 20007
(202)342-5060

Helen Saunders
Appalachia Educational Laboratory
1031 Quarrier Street
P.O. Box 1348
Charleston, WV 25325
(304)347-0400

An Open-Ended Exercise in Mathematics: A Twelfth Grade Student's Performance

[image omitted]

Reprinted by permission, from A Question of Thinking: A First Look at Students' Performance on Open-ended Questions in Mathematics, copyright 1989, California Department of Education, P.O. Box 271, Sacramento, CA 95812-0271.

by David Sweet

This is the second Education Research CONSUMER GUIDE--a new series published for teachers, parents, and others interested in current education themes.

OR 92-3056r
ED/OERI 92-38
Editor: Jacquelyn Zimmermann

This Consumer Guide is produced by the Office of Research, Office of Educational Research and Improvement (OERI) of the U.S. Department of Education.

Richard W. Riley, Secretary of Education
Sharon P. Robinson, Assistant Secretary, OERI
Joseph C. Conaty, Acting Director, OR

-###-

Item Bias Review

by Ronald Hambleton and Jane H. Rodgers
University of Massachusetts at Amherst

When important decisions are made based on test scores, it is critical to avoid bias, which may unfairly influence examinees' scores. Bias is the presence of some characteristic of an item that results in differential performance for individuals of the same ability but from different ethnic, sex, cultural, or religious groups.

This article introduces three issues to consider when evaluating items for bias -- fairness, bias, and stereotyping. The issues are presented and sample review questions are posed. A comprehensive item bias review form based on these principles is listed in the references and is available from ERIC/AE. This article and the review form are intended to help both item writers and reviewers.

In any bias investigation, the first step is to identify the subgroups of interest. Bias reviews and studies generally focus on differential performance for sex, ethnic, cultural, and religious groups. In the discussion below, the term designated subgroups of interest (DSI) is used to avoid repeating a list of possible subgroups.

Fairness vs. Bias

In preparing an item bias review form, each question can be evaluated from two perspectives: Is the item fair? Is the item biased? While the difference may seem trivial, some researchers contend that judges cannot detect bias in an item, but can assess an item's fairness. Perhaps the best approach is to include both types of questions on the review form. (Box 1 offers a list of questions addressing fairness.)


Box 1--Sample Questions Addressing Fairness
Does the item give a positive representation of designated subgroups of interest (DSI)?
Is the test item material balanced in terms of being equally familiar to every DSI?
Are members of DSI highly visible and positively portrayed in a wide range of traditional and nontraditional roles?
Are DSI represented at least in proportion to their incidence in the general population?
Are DSI referred to in the same way with respect to the use of first names and titles?
Is there an equal balance (across items in the test) of proper names? ethnic groups? activities for all groups? roles for both sexes? adult role models (worker, parent)? character development? settings?
Is there greater opportunity on the part of members of one group to be acquainted with the vocabulary?
Is there greater opportunity on the part of members of one group to experience the situation or become acquainted with the process presented by the items?
Are the members of a DSI portrayed as uniformly having certain aptitudes, interests, occupations, or personality traits?

Different Kinds of Bias

Bias comes in many forms. It can be sex, cultural, ethnic, religious, or class bias. An item may be biased if it contains content or language that is differentially familiar to subgroups of examinees, or if the item structure or format is differentially difficult for subgroups of examinees. An example of content bias against girls would be an item in which students are asked to compare the weights of several objects, including a football. Since girls are less likely to have handled a football, they might find the item more difficult than boys do, even though they have mastered the concept measured by the item (Scheuneman, 1982a).

An item may be language biased if it uses terms that are not commonly used statewide or if it uses terms that have different connotations in different parts of the state. An example of language bias against blacks is found in an item in which students were asked to identify an object that began with the same sound as "hand." While the correct answer was "heart," black students more often chose "car" because, in black slang, a car is referred to as a "hog." The black students had mastered the concept but were selecting the wrong answer because of language differences (Scheuneman, 1982b). Questions that might be asked to detect content, language, and item structure and format bias are listed in Box 2.


Box 2--Sample Bias Questions

Content Bias
Does the item contain content that is different or unfamiliar to different DSI?
Will members of DSI get the item correct or incorrect for the wrong reason?
Does the content of the item reflect information and/or skills that may not be expected to be within the educational background of all examinees?

Language Bias
Does the item contain words that have different or unfamiliar meanings for DSI?
Is the item free of difficult vocabulary?
Is the item free of group specific language, vocabulary, or reference pronouns?

Item Structure and Format Bias
Are clues included in the item that would facilitate the performance of one group over another?
Are there any inadequacies or ambiguities in the test instructions, item stem, keyed response, or distractors?
Does the explanation concerning the nature of the task required to successfully complete the item tend to differentially confuse members of DSI?



Stereotyping and Inadequate Representation of Minorities

Stereotyping and inadequate or unfavorable representation of DSI are undesirable properties of tests to which judges should be sensitized. Tests should be free of material that may be offensive, demeaning, or emotionally charged. While the presence of such material may not make the item more difficult for the candidate, it may cause him or her to become "turned off," and result in lowered performance. An example of emotionally charged material would be an item dealing with the high suicide rate among Native Americans. An example of offensive material would be an item that implied the inferiority of a certain group, which would be offensive to that group. Terms that are generally unacceptable in test items include lower class, housewife, Chinaman, colored people, and red man.

Additional terms to avoid include job designations that end in "man." For example, use police officer instead of policeman; firefighter instead of fireman. Other recommendations to eliminate stereotyping:

* Avoid material that is controversial or inflammatory for DSI.
* Avoid material that is demeaning or offensive to members of DSI.
* Avoid depicting members of DSI as having stereotypical occupations (e.g., Chinese launderer) or in stereotypical situations (e.g., boys as creative and successful, girls as needing help with problems).

Author: Hambleton, Ronald & Rodgers, Jane (1995). Item bias review. Practical Assessment, Research & Evaluation, 4(6). Retrieved October 4, 2011 from http://PAREonline.net/getvn.asp?v=4&n=6 .


Recommended Reading

This article is based on Hambleton, R.K. and Rogers, H.J. (1996), Developing an Item Bias Review Form, which is available through ERIC/AE.

Berk, R.A. (Ed.). (1982). Handbook of methods for detecting test bias. Baltimore, MD: The Johns Hopkins University Press.

Chipman, S.F. (1988, April). Word problems: Where test bias creeps in. Paper presented at the meeting of AERA, New Orleans.

Hambleton, R.K., & Jones, R.W. (in press). Comparisons of empirical and judgemental methods for detecting differential item functioning. Educational Research Quarterly.

Lawrence, I.M., Curley, W.E., & McHale, F.J. (1988, April). Differential item functioning of SAT-verbal reading subscore items for male and female examinees. Paper presented at the meeting of AERA, New Orleans.

Mellenbergh, G.J. (1984, December). Finding the biasing trait(s). Paper presented at the Advanced Study Institute Human Assessment: Advances in Measuring Cognition and Motivation, Athens, Greece.

Mellenbergh, G.J. (1985, April). Item bias: Dutch research on its definition, detection, and explanation. Paper presented at the meeting of AERA, Chicago.

Scheuneman, J.D. (1982a). A new look at bias in aptitude tests. In P. Merrifield (Ed.), New directions for testing and measurement: Measuring human abilities, No. 12. San Francisco: Jossey-Bass.

Scheuneman, J.D. (1982b). A posteriori analyses of biased items. In R. A. Berk (Ed.), Handbook of methods for detecting test bias. Baltimore, MD: The Johns Hopkins University Press.

Scheuneman, J.D. (1984). A theoretical framework for the exploration of causes and effects of bias in testing. Educational Psychology, 19(4), 219-225.

Schmitt, A.P., Curley, W.E., Blaustein, C.A., & Dorans, N.J. (1988, April). Experimental evaluation of language and interest factors related to differential item functioning for Hispanic examinees on the SAT-verbal. Paper presented at the meeting of AERA, New Orleans.

Tittle, C.K. (1982). Use of judgmental methods in item bias studies. In R.A. Berk (Ed.), Handbook of methods for detecting item bias. Baltimore, MD: The Johns Hopkins University Press.

Basic Item Analysis for Multiple-Choice Tests.

by Jerard Kehoe
Virginia Polytechnic Institute and State University


This article offers some suggestions for improving multiple-choice tests using "item analysis" statistics. These statistics are typically provided by measurement services where tests are machine-scored, as well as by testing software packages.

The basic idea that we can capitalize on is that the statistical behavior of "bad" items is fundamentally different from that of "good" items. Of course, the items have to be administered to students in order to obtain the needed statistics. This fact underscores our point of view that tests can be improved by maintaining and developing a pool of "good" items from which future tests will be drawn in part or in whole. This is particularly true for instructors who teach the same course more than once.

WHAT MAKES AN ITEM PSYCHOMETRICALLY GOOD?

In answering this question, it is desirable to restrict the discussion to tests written to cover a unified portion of course material, so that a student is unlikely to do well on one part of the test and poorly on another. If that is not the case, the comments that follow apply only if the corresponding topics are tested separately. In any event, separate testing would be preferred, because otherwise scores would be ambiguous reports of students' achievement.

Once the instructor is satisfied that the test items meet the above criterion and that they are indeed appropriately written, what remains is to evaluate the extent to which they discriminate among students. The degree to which this goal is attained is the basic measure of item quality for almost all multiple-choice tests. For each item, the primary indicator of its power to discriminate among students is the correlation coefficient reflecting the tendency of students who select the correct answer to have high total scores. This coefficient is reported by typical item analysis programs as the item discrimination coefficient or, equivalently, as the point-biserial correlation between item score and total score. It should be positive, indicating that students answering correctly tend to have higher scores. Similar coefficients may be provided for the wrong choices; these should be negative, which means that students selecting those choices tend to have lower scores.
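As an illustration only (it is not part of the original article), the following Python sketch computes this discrimination coefficient from an examinees-by-items matrix of 0/1 scores; the function name and data are invented.

```python
# Illustrative sketch only: the item discrimination (point-biserial)
# coefficient described above, computed as the Pearson correlation between
# each item's 0/1 score and the examinee's total test score.
import numpy as np

def item_discrimination(responses):
    """responses: examinees x items matrix of 0/1 scores."""
    x = np.asarray(responses, dtype=float)
    totals = x.sum(axis=1)
    # Point-biserial = Pearson correlation when one variable is 0/1.
    # (Some programs correlate each item with the total of the *other*
    # items so the item does not inflate its own coefficient.)
    return [float(np.corrcoef(x[:, j], totals)[0, 1]) for j in range(x.shape[1])]

# Invented data: 6 examinees, 4 items.
scores = [[1, 1, 0, 1],
          [1, 0, 0, 1],
          [1, 1, 1, 1],
          [0, 0, 0, 1],
          [1, 1, 1, 0],
          [0, 0, 1, 0]]
print(item_discrimination(scores))
```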

Alternatively, some item analysis programs provide the percentages of examinees scoring in the top, middle, and bottom thirds who select each option. In this case, one would hope to find that large proportions of the high scorers answered correctly, while larger proportions of low scorers selected the distractors.
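A hedged sketch of that thirds breakdown follows, again with invented names and data, assuming the scoring output includes each examinee's chosen option and total score.

```python
# Illustrative sketch only: percentage of the top, middle, and bottom thirds
# (by total score) selecting each option of one item.
import numpy as np

def option_by_thirds(choices, totals):
    """choices: option letter chosen by each examinee for one item;
    totals: the same examinees' total test scores."""
    choices = np.asarray(choices)
    order = np.argsort(-np.asarray(totals))        # highest scorers first
    thirds = np.array_split(order, 3)              # top, middle, bottom
    return {opt: [round(100 * float(np.mean(choices[idx] == opt)), 1)
                  for idx in thirds]
            for opt in sorted(set(choices))}

# Invented data: 9 examinees, item keyed "B".
print(option_by_thirds(list("BBABCBACD"), [28, 27, 20, 25, 15, 26, 14, 12, 10]))
# -> "B" is drawn mostly from the top third, the distractors from the lower thirds.
```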

The proportion of students answering an item correctly also affects its discrimination power. This point may be summarized by saying that items answered correctly (or incorrectly) by a large proportion of examinees (more than 85%) have markedly reduced power to discriminate. On a good test, most items will be answered correctly by 30% to 80% of the examinees.

A general indicator of test quality is the reliability estimate usually reported on the test scoring/analysis printout. Referred to as KR-20 or Coefficient Alpha, it reflects the extent to which the test would yield the same ranking of examinees if readministered with no effect from the first administration; in other words, it indexes the test's accuracy or power of discrimination. Values as low as .5 are satisfactory for short tests (10-15 items), though tests with over 50 items should yield KR-20 values of .8 or higher (1.0 is the maximum). In any event, important decisions concerning individual students should not be based on a single test score when the corresponding KR-20 is less than .8. Unsatisfactorily low KR-20s are usually due to an excess of very easy (or very hard) items, poorly written items that do not discriminate, or violation of the precondition that the items test a unified body of content.
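For readers who want to compute the estimate themselves, here is a minimal sketch of KR-20, assuming dichotomously scored (0/1) items; it is not the scoring service's code, and the use of the sample variance of total scores is simply one common convention.

```python
# Minimal sketch of KR-20, assuming dichotomously scored (0/1) items.
import numpy as np

def kr20(responses):
    """responses: examinees x items matrix of 0/1 scores."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]                          # number of items
    p = x.mean(axis=0)                      # proportion correct per item
    q = 1.0 - p
    total_var = x.sum(axis=1).var(ddof=1)   # sample variance of total scores
    return float((k / (k - 1.0)) * (1.0 - (p * q).sum() / total_var))
```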

IMPROVING THE ABILITY OF ITEMS TO DISCRIMINATE

The statistics usually reported by a test scoring service provide the information needed to keep a record of each item's performance. One approach is simply to tape a copy of each item onto a 5 x 7 card, with the test content area briefly described at the top. In addition, tape the corresponding line from the computer printout for that item to the card each time the item is used. Alternatively, item banking programs may provide for the inclusion of the proportions marking each option and the item discrimination coefficient along with each item's content.
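As a purely illustrative sketch of that electronic alternative (an invented structure, not any particular item banking program), an item record might accumulate each administration's statistics alongside the item's content:

```python
# Purely illustrative: a record that accumulates the printout statistics for
# one item across administrations, alongside the item's content.
from dataclasses import dataclass, field

@dataclass
class ItemRecord:
    content_area: str
    stem: str
    options: list
    key: str
    history: list = field(default_factory=list)   # one entry per administration

    def add_administration(self, option_proportions, discrimination):
        """option_proportions: e.g. {"A": 0.10, "B": 0.70, ...};
        discrimination: the item discrimination coefficient reported."""
        self.history.append({"proportions": option_proportions,
                             "discrimination": discrimination})
```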

A few basic rules for item development follow:

1. Items that correlate less than .15 with total test score should probably be restructured. One's best guess is that such items do not measure the same skill or ability as does the test on the whole or that they are confusing or misleading to examinees. Generally, a test is better (i.e., more reliable) the more homogeneous the items. Just how to restructure the item depends largely on careful thinking at this level. Begin by applying the rules of stem and option construction discussed in ERIC Digest TM 95-3 (ED 398 236). If there are any apparent violations, correct them on the 5x7 card or in the item bank. Otherwise, it's probably best to write a new item altogether after considering whether the content of the item is similar to the content objectives of the test.

2. Distractors that are not chosen by any examinees should be replaced or eliminated. They are not contributing to the test's ability to discriminate the good students from the poor students. One should not be concerned if each distractor is not chosen by the same number of examinees. Different kinds of mistakes may very well be made by different numbers of students. Also, the fact that a majority of students miss an item does not imply that the item should be changed, although such items should be double-checked for their accuracy. One should be suspicious about the correctness of any item in which a single distractor is chosen more often than all other options, including the answer, and especially so if that distractor's correlation with the total score is positive.

3. Items that virtually everyone gets right are useless for discriminating among students and should be replaced by more difficult items. This recommendation holds particularly if you take the traditional attitude toward letter grades, namely that grades should more or less fit a predetermined distribution. (A sketch of the checks in rules 2 and 3 follows this list.)
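The option-level checks in rules 2 and 3 can be run mechanically from the same examinee-by-item data. The sketch below is illustrative only; the names are invented, and the .85 cutoff echoes the 85% figure cited earlier.

```python
# Illustrative sketch of the option-level checks in rules 2 and 3: flag
# distractors nobody chose, distractors that outdraw the keyed answer, and
# items answered correctly by nearly everyone.
import numpy as np

def flag_item(choices, options, key, totals, easy_cutoff=0.85):
    """choices: option letter chosen by each examinee for one item;
    options: all options offered, e.g. ["A", "B", "C", "D"];
    key: the keyed (correct) option; totals: total test scores."""
    choices = np.asarray(choices)
    totals = np.asarray(totals, dtype=float)
    counts = {opt: int((choices == opt).sum()) for opt in options}
    flags = []
    if counts[key] / len(choices) > easy_cutoff:
        flags.append("answered correctly by nearly everyone")
    for opt in options:
        if opt == key:
            continue
        if counts[opt] == 0:
            flags.append(f"distractor {opt} chosen by no one")
        elif counts[opt] > counts[key]:
            # A distractor outdrawing the key -- especially one correlating
            # positively with total score -- suggests a mis-keyed item.
            r = float(np.corrcoef((choices == opt).astype(float), totals)[0, 1])
            flags.append(f"distractor {opt} outdraws the key (r with total = {r:.2f})")
    return flags
```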

By constructing, recording, and adjusting items in this fashion, teachers can develop a pool of items for specific content areas with conveniently available resources.

SOME FURTHER ISSUES

The suggestions here focus on the development of tests which are homogeneous, that is, tests intended to measure a unified content area. Only for such tests is it reasonable to maximize item-test correlations or, equivalently, KR-20 or Coefficient Alpha (reliability), which is the objective of step 1 above. The extent to which a high average item-test correlation can be achieved depends to some extent on the content area.

It is generally acknowledged that well-constructed tests in vocabulary or mathematics are more homogeneous than well-constructed tests in the social sciences. This circumstance suggests that particular content areas have optimal levels of homogeneity and that these vary from discipline to discipline. Perhaps psychologists should strive for lower test homogeneity than mathematicians because their course content is less homogeneous.

A second issue involving test homogeneity is that of the precision of a student's obtained test score as an estimate of that student's "true" score on the skill tested. Precision (reliability) increases as the average item-test correlation increases, all else the same; and precision decreases as the number of items decreases, all else the same.

These two relationships lead to an interesting paradox: often the precision of a test can be increased simply by discarding the items with low item-test correlations. For example, a 30-item multiple-choice test administered by the author resulted in a reliability of .79, and discarding the seven items with item-test correlations below .20 yielded a 23-item test with a reliability of .88. That is, by dropping the worst items from the test, the students' obtained scores on the shorter version are judged to be more precise estimates than the same students' obtained scores on the longer version.
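A minimal sketch of that pruning step, assuming the same kind of 0/1 response matrix as above (illustrative, not the author's actual procedure); the .20 cutoff matches the example in the text.

```python
# Illustrative sketch of the pruning described above: drop items whose
# item-total correlation falls below a cutoff, then re-estimate reliability
# on the shortened test (e.g. with the kr20() sketch shown earlier).
import numpy as np

def prune_items(responses, cutoff=0.20):
    """Return the response matrix restricted to items whose item-total
    correlation is at least `cutoff`, plus the boolean keep-mask."""
    x = np.asarray(responses, dtype=float)
    totals = x.sum(axis=1)
    r = np.array([np.corrcoef(x[:, j], totals)[0, 1] for j in range(x.shape[1])])
    keep = r >= cutoff
    return x[:, keep], keep

# Invented usage, paralleling the 30-item example in the text:
#   shorter, kept = prune_items(scores, cutoff=0.20)
#   print(kr20(scores), kr20(shorter))   # reliability before vs. after pruning
```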

The reader may question whether it is ethical to throw out poorly performing questions when some students may have answered them correctly based on their knowledge of course material. Our opinion is that this practice is completely justified. The purpose of testing is to determine each student's rank, and retaining psychometrically unsatisfactory questions is contrary to this goal and degrades the accuracy of the resulting ranking.

This article was adapted with permission from "Testing Memo 5: Constructing Multiple-Choice Tests--Part II," Office of Measurement and Research Services, Virginia Polytechnic Institute and State University, Blacksburg, VA 24060.

Author: Kehoe, Jerard (1995). Basic item analysis for multiple-choice tests. Practical Assessment, Research & Evaluation, 4(10). Retrieved October 4, 2011 from http://PAREonline.net/getvn.asp?v=4&n=10 .


FURTHER READING

Airasian, P. (1994). Classroom Assessment, Second Edition. NY: McGraw-Hill.

Brown, F. (1983). Principles of Educational and Psychological Testing, Third Edition. NY: Holt, Rinehart & Winston. Chapter 11.

Cangelosi, J. (1990). Designing Tests for Evaluating Student Achievement. NY: Addison-Wesley.

Gronlund, N. (1993). How to Make Achievement Tests and Assessments, 5th Edition. MA: Allyn and Bacon.

-----

Authentic Reading Assessment.

by Peggy Dutcher
Michigan State Department of Education

Since 1977, significant advances in anthropology, cognitive psychology, education, linguistics, and sociology have made it possible to expand how reading is viewed. These advances indicate that reading is a dynamic process in which the reader actively participates. As a result, difficulty is no longer viewed as a property of a particular reading skill or task, but rather as an interaction among the reader, text, and context of the reading situation.

This article looks at authentic reading assessment as a response to this evolving concept of reading. An examination of the Michigan State Board of Education's reading assessment--one of the nation's most innovative--shows how one state is implementing authentic reading assessment using authentic reading material.

MICHIGAN'S ESSENTIAL SKILLS READING TEST

In 1983, the Michigan Department of Education adopted this definition of reading:

"Reading is the process of constructing meaning through the dynamic interaction among the reader's existing knowledge, the information suggested by the written language, and the context of the reading situation."

This definition suggests the need for an interactive model of reading that combines the top-down (whole language) and bottom-up (skills) models. In this interactive model, skills are taught in the context of real reading of real text.

From this perspective, a good reader is one who can apply various reading skills independently and flexibly in a variety of reading situations, not one who simply demonstrates mastery of those skills.

Given this theoretical focus, a variety of factors needs to be addressed in instruction and assessment. These include:

* the influence of the reader's prior knowledge on reading comprehension,

* how the reader structures that knowledge,

* which strategies the reader uses to construct meaning,

* which skills the reader needs to perform a particular reading task,

* the type of methods and materials being used, and

* the setting in which reading occurs.

These are not new concerns, but reading research and theory have only recently enabled educators to integrate these issues into instructional and assessment practices.

Michigan's "goal" for reading education is to develop strategic motivated readers. This goal highlights the importance of a reader having knowledge of the reading process, knowledge about strategies and skills that are essential to constructing meaning or comprehending text, and knowledge about how to appropriately apply these strategies. The goal also includes motivation to read.

Unlike other formal assessments of reading, Michigan's Essential Skills Reading Test uses intact, full-length stories and subject-area reading selections taken from real life materials, such as children's magazines, literature anthologies, and textbooks for different grade levels. The reading selections are then the driving force for developing test items.

TYPES OF ITEMS ON THE TEST

The Essential Skills Reading Test has three types of constructing-meaning items:

1. Intersentence, in which the answer can be found in two or three contiguous sentences within the reading selection;

2. Text, where one or more paragraphs of the reading selection must be read to construct meaning; and

3. Beyond Text, where the reader not only constructs meaning from the text but also must bring in some of his or her own prior knowledge to answer the test item.

In addition to the constructing-meaning items, knowledge-about-reading items might ask students whether they know the purpose of a chart, graph, or illustration. Other items of this type might ask whether they know how a reading selection is organized and how that organization influences comprehension. This can include items about which strategies to consider when readers encounter unfamiliar situations.

Student-self-report items specifically ask the students about their interest in the reading selection, how they felt about their performance in understanding the reading selections and answering the test items, and the amount of effort they put into reading the selections and answering the test items. Like the knowledge-about-reading items, these items also are specific to the reading selections in the test.

REACTIONS OF PARENTS AND TEACHERS

Overall, reactions to the Michigan test have been positive. Many welcomed the long overdue change in the approach to assessing reading performance, although some were not as eager to adopt the new approach. Change is occurring, but perhaps not as rapidly as had been hoped.

Michigan officials made extensive efforts to inform the public about the new tests and their purpose. They prepared people for the results by emphasizing that "low scores are good news." As a result, when the low scores arrived, many parents and teachers accepted them as an opportunity to improve education in Michigan.

CURRENT RESEARCH IN READING ASSESSMENT

The impact of the continuing research on reading and reading assessment is reflected in the 1992 National Assessment of Educational Progress (NAEP) reading objectives framework. This framework introduces several innovations in reading assessment, such as:

* developing a large proportion of open-ended items that assess constructing-meaning, including essay questions;

* expanding the constructing-meaning scale to include three different levels (building an initial understanding, developing an interpretation, responding personally or critically);

* including reading for three purposes: to gain literary experience, to get information, and to perform a task; and

* providing a measure of reading fluency or oral reading proficiency.

The new framework will continue the past emphasis on assessing the use of effective strategies, increasing knowledge about reading, and developing positive reading habits and practices. It will use longer, naturally occurring passages. The results will be reported on three scales, one for each reading purpose cited above, and will be summarized on one overall scale.

In Michigan, the Reading Test Development Coordinating Committee is in the process of reviewing and revising its Essential Skills Reading Test. Some of the issues the committee is considering include:

* formats other than multiple-choice (such as open-ended items) and other types of reading materials (such as those related to employability skills)

* test length--whether the current selections are too long and what length would be sufficient

* asking questions across tests to see if students can use more than one source of material to gather information and make decisions

* classroom assessment

Author: Dutcher, Peggy (1990). Authentic reading assessment. Practical Assessment, Research & Evaluation, 2(6). Retrieved October 4, 2011 from http://PAREonline.net/getvn.asp?v=2&n=6 .


ADDITIONAL READING

Dutcher, P. (1989). Reading the MEAP Test Report Forms.

Dutcher, P. & Roeber, E. (1989). New MEAP Reading Test Debuts. Michigan Association of School Boards, 50, 9.

Michigan State Board of Education (1988). Essential Goals and Objectives for Reading Education.

Michigan State Board of Education (1989). Essential Skills Reading Test Blueprint, (Fifth Edition).

Roeber, E. & Dutcher, P. (1989). Michigan's Innovative Assessment of Reading. Educational Leadership, 46, 7.

Wixson, K. & Peters, C. Reading Redefined: A Michigan Reading Association Position Paper.

Authentic Writing Assessment.

by Carmen Chapman
Illinois State Department of Education

In view of the role writing plays in people's academic, vocational, social, and personal lives, the development of students' ability to write is a main priority of schooling. Since educators can use writing to stimulate students' higher-order thinking skills--such as the ability to make logical connections, to compare and contrast solutions to problems, and to adequately support arguments and conclusions--authentic assessment seems to offer excellent criteria for teaching and evaluating writing.

This article discusses some of the ways authentic writing assessment can be used in education. Using the Illinois Writing Program as an example, this article also looks at some of the goals, solutions, and experiences of a program that is implementing authentic writing assessment.

EMERGING IDEAS IN AUTHENTIC WRITING ASSESSMENT

New directions in authentic assessment are aimed at getting beyond writing as an isolated subject unto itself. The goal is to integrate writing into the teaching of all subject areas, including science and mathematics. For example, if mathematics instructors have students write explanations for their procedures for solving problems, the instructors can evaluate the students' ability to perform the task without relying solely on the correct--or incorrect--numerical answer to measure achievement.

Literature teachers can use authentic assessment to help students discover the natural connections among themes, the importance of setting, character development, and comparisons and contrasts of ancient and modern story plots. Students' writing in response to reading is one of the most valid indices of whether a student has derived meaning from the text. Many believe that traditional multiple-choice response formats cannot duplicate the thinking and constructing necessary to evaluate a piece of literature.

THE FORMAT FOR AN AUTHENTIC WRITING ASSESSMENT

An authentic writing assessment should reflect various types of writing as well as levels of complexity related to the task assigned in the prompt. For example, a writing assessment assignment can be:

* totally open-ended, where the student is asked to construct an essay either requiring or not requiring certain background knowledge

* limited to specific components of the writing process, such as planning, outlining, or even revising

* used for short answers which may be either a part of planning or an abbreviated check for a basic understanding of key points

Assessment formats are also related to the amount of time one has for the assessment.

An increasingly popular format is portfolio assessment, in which students complete a body of writing over a prolonged period of time. Portfolios typically include several types of writing, and teachers consider a student's entire portfolio--not just single assignments--providing a more naturalistic approach to teaching and evaluation. As with authentic assessment programs in general, the drawbacks to portfolio assessment include technical issues of reliability for applying criteria across students and time.

AN EXAMPLE: THE ILLINOIS WRITING PROGRAM

The founders of the Illinois Writing Program are philosophically committed to integrating instruction and assessment. To accomplish this, their assessment specifications require:

* representing defined writing skills, status, and growth,

* verifying that the methods used to construct, conduct, and verify the assessment meet technical standards, and

* implementing an information network for classroom and district personnel to use test results to improve instruction.

To give a descriptive profile of a student's command of fundamental techniques of clear writing, the program has a rating system with the following analytic criteria:

* Focus: Is the main idea, theme, or point of view clear and consistently maintained?

* Support/Elaboration: Are arguments and conclusions adequately supported and explained?

* Organization: Is the logical flow of ideas clear and connected?

* Conventions: Are standard English conventions (spelling, grammar, punctuation) properly followed?

The assessment also produces a focused holistic score, Integration, which reflects how well the composition as a whole accomplishes the assignment.

This rating system emphasizes stages of development and avoids pejorative classifications. For example, writing at the lower end of the scale is described as "not being developed" rather than being "poor" or "weak."

The Illinois Writing Program, which presents assessment results as a score profile, is also designed to help teachers determine areas of instructional need. When teachers are instructed in the use of the scoring system for assessment, the major emphasis becomes defining what the teacher expects students to be able to write.

TEACHER INVOLVEMENT IN THE PROGRAM

In Illinois, teacher workshops are held to teach the system as a model that may be modified to meet classroom needs. Participants are given an overview of the assessment system and then are introduced to each analytic feature. The teachers then practice scoring sample papers that represent the full scale of underdeveloped to developed writing. Teachers must not only understand the assessment, but also adapt their teaching methods to help students prepare for it.

Five years after the program began, more than 1,000 teachers have been trained in the writing assessment model. Survey and anecdotal information from the trainers indicates that teachers are overwhelmingly supportive and enthusiastic about the workshops and the information tools they receive. This is especially true for elementary teachers who, for the most part, have never received instruction in teaching writing beyond grammar, spelling, and the Palmer Method of penmanship.

Positive results occur in writing instruction not only because teachers have received information that they can use in their classrooms, but also because they are in charge of the training. They have a vested interest and ownership in the entire project. Teacher trainers explain to workshop participants not only the mechanics of the system, but also the ways in which they have adapted and adopted the system for their own students.

Author: Chapman, Carmen (1990). Authentic writing assessment. Practical Assessment, Research & Evaluation, 2(7). Retrieved October 4, 2011 from http://PAREonline.net/getvn.asp?v=2&n=7 .

ADDITIONAL READING

Chapman, C.W., Fyans, L.J., & Kerins, C.T. (1984). Writing assessment in Illinois. Educational Measurement, 3, 24-26.

Chapman, C.W. (1989). Teaching to the writing test is O.K.: Integrating classroom instruction with assessment. Excellence in Teaching, 6, 9-11.

Chapman, C.W. (1991). Write On, Illinois. Springfield, Illinois: Illinois State Board of Education.

Illinois State Board of Education Student Assessment Section (1991, in process). Results of the 1990 language arts assessment. Springfield, Illinois: State Board of Education.

Quellmalz, E.S. (1984). Toward successful large-scale writing assessment: Where are we now? Where do we go from here? Educational Measurement, 3, 29-32.