Archive for the ‘Language Assessment’ Category

Objective Tests

Saturday, April 7th, 2007

Objective tests measure one’s ability to remember facts and figures as well as one’s understanding of course materials. These tests are often designed to make test-takers think independently. Good objective tests require test-takers to employ high-level critical reasoning and make fine discriminations to determine the best answer.

The most common objective test questions are:

  •     multiple-choice
  •     true-false
  •     matching items
  •     cloze

The most common of these is the multiple-choice question (MCQ) test, in which students must select the correct answer from a number of possible answers.

The incorrect answers in MCQs are termed distractors.

Distractors should contain:

  •     misconceptions
  •     partly correct answers
  •     common errors of fact or reasoning (these distract students who are not well prepared for the test from giving the correct answer)

MCQs are usually used to test the test-taker’s ability to:

  •     recall information
  •     interpret data/diagrams
  •     analyse/evaluate material

Main strengths of MCQs:

  •     test a wide range of issues in a short time
  •     assessment is not affected by a student’s ability to write
  •     can be reliably marked as all answers are predetermined
  •     can be quickly marked by computer
  •     computer marking gives easy access to an item analysis of questions to pinpoint problem areas for students
  •     a large bank of questions can be built up to reduce future preparation time
  •     can be used for quick revision at the start or end of a class and marked by the students
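
The item analysis mentioned above can be sketched in a few lines. This is only an illustration: the answer key, the responses, and the function name `item_analysis` are hypothetical and not taken from any particular marking system. For each item it reports the proportion of students who answered correctly and how often each option was chosen, which shows which distractors are attracting students.

```python
from collections import Counter

# Hypothetical answer key and student responses (illustrative data only).
answer_key = ["B", "D", "A"]
responses = [
    ["B", "D", "A"],
    ["B", "C", "A"],
    ["A", "C", "A"],
    ["B", "C", "B"],
]

def item_analysis(key, all_responses):
    """Return, per item, the proportion correct and a count of each chosen option."""
    report = []
    for i, correct in enumerate(key):
        chosen = Counter(student[i] for student in all_responses)
        facility = chosen[correct] / len(all_responses)
        report.append({"item": i + 1, "facility": facility, "choices": dict(chosen)})
    return report

report = item_analysis(answer_key, responses)
for row in report:
    print(row)
```

In this made-up data, most students choose option C on the second item rather than the keyed answer, so that item (or the teaching behind it) would be worth reviewing.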

Main weaknesses of MCQs:

  •     do not test the student’s ability to develop and organize ideas and present these in a coherent argument
  •     takes a long time to write plausible distractors (especially in cases where higher order cognitive skills are being tested)
  •     restrictions are placed on the test-taker’s answers as they must select from given alternatives
  •     guessing may result (but plausible distractors will result in intelligent guessing)
  •     questions are often re-used which means special attention to security
  •     questions need to be pre-tested and items reviewed to ensure the validity of the items

Writing MCQs is a relatively difficult task. However, the effort expended in item construction is rewarded by the ease and reliability of marking.

MCQs must have:

  •     a clear and unambiguous stem
  •     a correct answer
  •     several (usually 3 or 4) distractors which appear plausible to students who do not know the correct answer
  •     coherence with the content matter to be examined

E.g.

Sample MCQ

Tips for constructing MCQs:

  •     use simply worded stems
  •     present only one issue in the stem
  •     avoid use of negative premises (may especially disadvantage ESL students)
  •     ensure that the answer to one question cannot be obtained from another
  •     keep the distractors brief and as homogeneous as possible
  •     ensure the distractors are plausible (i.e. common errors made by students)
  •     use at least 3 distractors (reduces chance of guessing the correct answer)
  •     avoid distractors that provide clues (e.g. phrases from text books)
  •     group similar types of MCQs together
  •     avoid using a pattern for the position of the correct response
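
The effect of the number of distractors on guessing can be illustrated with a small calculation. It assumes blind guessing, i.e. every option is equally likely to be picked; the function name `chance_score` and the 40-item test length are hypothetical.

```python
def chance_score(num_items, num_distractors):
    """Expected number of items answered correctly by blind guessing."""
    options = num_distractors + 1  # the distractors plus the one correct answer
    return num_items / options

# Expected chance score on a hypothetical 40-item test.
for distractors in (1, 2, 3, 4):
    print(distractors, "distractors:", chance_score(40, distractors))
```

With one distractor a student who guesses blindly can expect half marks, while with three distractors the expected chance score drops to a quarter of the items. Plausible distractors matter as well, since an option nobody would choose does not really reduce the odds.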

Table of Specifications

Sunday, March 25th, 2007

A Table of Specifications is a two-way chart which describes the topics to be covered in a test and the number of items or points which will be associated with each topic. Sometimes the types of items are described as well.

The purpose of a Table of Specifications is to identify the achievement domains being measured and to ensure that a fair and representative sample of questions appears on the test.

As it is impossible, in a test, to assess every topic from every aspect, a Table of Specifications allows us to ensure that our test focuses on the most important areas and weights different areas based on their importance / time spent teaching. A Table of Specifications also gives us the proof we need to make sure our test has content validity.

Tables of Specifications are designed based on:

  •     course objectives
  •     topics covered in class
  •     amount of time spent on those topics
  •     textbook chapter topics
  •     emphasis and space provided in the text

A Table of Specifications can be designed in 3 simple steps:

1. identify the domain that is to be assessed

2. break the domain into levels (e.g. knowledge, comprehension, application …)

3. construct the table

The more detailed a table of specifications is, the easier it is to construct the test.
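
As a rough sketch of step 3, the snippet below builds a small two-way table by weighting topics by teaching time and then splitting each topic's items across levels. All topic names, hours, and level weights are hypothetical, and allocating items by largest remainders is just one reasonable way to round to whole items, not a prescribed method.

```python
# Hypothetical topics with hours of class time (illustrative numbers only).
hours_taught = {"Grammar": 6, "Vocabulary": 3, "Reading": 9, "Writing": 2}
# Hypothetical relative weights for each level of the domain.
level_weights = {"knowledge": 5, "comprehension": 3, "application": 2}

def allocate(total, weights):
    """Split `total` items in proportion to `weights` using largest remainders."""
    weight_sum = sum(weights.values())
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = total - sum(counts.values())
    # Give the leftover items to the entries with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

def build_table(total_items):
    """Return {topic: {level: number of items}} for a test of `total_items` items."""
    per_topic = allocate(total_items, hours_taught)
    return {topic: allocate(n, level_weights) for topic, n in per_topic.items()}

table = build_table(40)
for topic, row in table.items():
    print(topic, row, "total:", sum(row.values()))
```

Each row of the result is a topic and each column a level, so the row totals show how heavily each topic is weighted on the test.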

Test Bias

Tuesday, March 13th, 2007

Test bias is the presence of some characteristic of a test item that results in differential performance by individuals of the same ability who belong to different sub-groups.

When important decisions are made based on test scores, it is critical to avoid bias which may unfairly influence test-takers’ scores.

Fairness and bias are not the same thing.

Fairness has to do with how a test is used.

A biased test may be used fairly.

E.g.

Suppose a test is biased such that males score 5 points higher on average than females. If we simply add 5 points to the observed scores of the females and use the adjusted scores for making decisions, the biased test will prove to be fair in use.

An item may be biased if it contains content or language that is differentially familiar to different subgroups and/or if the item structure or format is differentially difficult to different subgroups.

Reliability

Tuesday, March 13th, 2007

Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly.

E.g.

If a test is designed to measure a specific trait, then each time the test is administered to a subject, the results should be approximately the same.

Unfortunately, it is impossible to calculate reliability exactly, but there are several different ways to estimate it. The different types of reliability that can be estimated are:

  •     Test-Retest Reliability
  •     Inter-rater Reliability
  •     Parallel-Forms Reliability
  •     Internal Consistency Reliability

To gauge test-retest reliability, the test is administered twice at two different points in time. This kind of reliability is used to assess the consistency of a test over a period of time. Test-retest reliability assumes that there will be no change in the quality or construct that is being measured.
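
One common way to put a number on this is to correlate the scores from the two administrations. The sketch below uses the Pearson correlation coefficient as the estimate, which is an assumption on our part rather than something prescribed above; the scores and the helper name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five test-takers on two occasions.
first_sitting = [55, 62, 70, 81, 90]
second_sitting = [58, 60, 73, 79, 92]

r = pearson_r(first_sitting, second_sitting)
print(round(r, 3))
```

A coefficient close to 1 suggests that test-takers keep roughly the same rank order across the two sittings. The same calculation could be applied to scores from two parallel forms.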

Inter-rater reliability is assessed by having two or more independent raters score the test, then comparing the scores to determine the consistency of the raters’ estimates.
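
For categorical scores such as pass/fail, one possible way to compare two raters is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. Choosing kappa here is our assumption; the ratings and the function name `cohens_kappa` are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters on categorical scores, corrected for chance.

    Undefined (division by zero) if chance agreement is already 100%.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category at random.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail judgements by two raters on the same ten scripts.
rater_1 = ["P", "P", "F", "P", "F", "P", "P", "F", "P", "F"]
rater_2 = ["P", "P", "F", "F", "F", "P", "P", "F", "P", "P"]

kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

In this made-up data the raters agree on 8 of 10 scripts, but because some of that agreement would occur by chance, kappa comes out noticeably lower than 0.8.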

Parallel-forms reliability is estimated by comparing different tests that were created using the same content. The two tests should then be administered to the same subjects at the same time.

Internal consistency reliability is used to judge the consistency of results across items on the same test, i.e. test items that measure the same construct are compared in order to determine the test’s internal consistency.
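
A widely used estimate of internal consistency is Cronbach's alpha. The sketch below is a minimal version of that calculation, offered as one option rather than the only method; the 0/1 item scores and the function name `cronbach_alpha` are hypothetical.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` has one row per test-taker, one column per item."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical right/wrong (1/0) scores of five test-takers on four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Alpha rises when the items vary together, that is, when test-takers who do well on one item tend to do well on the others.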

Validity

Tuesday, March 13th, 2007

Validity is the extent to which a test measures what it claims to measure.

Testing is a matter of making judgments about test-takers’ competence in view of their performance on certain tasks.

These judgments are inferences, as tests do not collect concrete evidence about test-takers’ ability in its natural state; they only allow abstract inferences to be made about it.

Evidence of test performance is used to draw conclusions about candidates’ ability to handle the demands of the criterion situation.

For high-stakes tests, steps need to be taken to investigate the procedure by which the conclusions were drawn.

Test validation is this process of investigating the quality of the test-based conclusions.

The different types of validity are:

  •     Content validity
          •     Face validity
          •     Content (sampling) validity
  •     Criterion-related validity
          •     Concurrent validity
          •     Predictive validity
  •     Construct validity

Face validity is the extent to which a test meets the expectations of those involved in its use, i.e. the stakeholders.

This type of validation is designed to decrease opposition by ensuring that nobody is too unhappy with it.

An example of an instrument often cited for its face validity is Rosenberg’s self-esteem scale.

When a test has content (sampling) validity, the items on the test represent the entire range of possible items the test should cover.

To ensure this, individual test questions may be drawn from a large pool of items that cover a broad range of topics.

Content validity establishes that the measure covers the full range of the concept’s meaning, i.e. covers all dimensions of a concept.

When a test has content validity, the test reflects the syllabus on which it is based.

A test is said to have criterion-related validity when it is demonstrated to be effective in predicting a criterion or indicators of a construct.

There are two different types of criterion-related validity:

  •     concurrent validity
  •     predictive validity

Concurrent validity occurs when the criterion measures are obtained at the same time as the test scores.

This indicates the extent to which the test scores accurately estimate an individual’s current state with regard to the criterion.

Predictive validity occurs when the criterion measures are obtained at a time after the test.

A test has construct validity if it demonstrates an association between the test scores and the prediction of a theoretical trait.

Construct under-representation and construct-irrelevant variance are two major threats to validity.

A test is said to demonstrate construct under-representation if the tasks included in the test fail to measure important dimensions of the construct. If this happens, the results of the test are unlikely to reveal the test-taker’s ability within the domain the test claims to measure.

A test is said to demonstrate construct-irrelevant variance if its tasks measure variables which are irrelevant to the domain the test claims to measure. This type of invalidity can take two forms:

  •     construct-irrelevant easiness
  •     construct-irrelevant difficulty