
LCN606 Reading Assessment


PRESENTATION OUTLINE

Reading Assessment

Stacey, Jennifer & Gillian
Introduce the group.
Photo by mikecogh

1. The Thinking Stage

We needed a starting point for the thinking stage: a rough idea of a student group, to which we then matched a coursebook.

Developing Specifications

  • Test purpose and target group
  • Skills tested
  • Test methods and text selection criteria
  • Weighting and timing
Slide 3: From the starting point we zoomed out to consider the test purpose (Who would use this textbook? How? Why?) and the target group, acknowledging that a student is more than their language ability (Who are the target group? What is their motivation? Context? Goal?)

Slide 4: From this point we zoomed in on the skills tested, using the coursebook and the CEF level reading descriptors as a guide. We discovered a discrepancy between the two.

CEF A1 Descriptors

  • Extremely limited users of English
  • Can read simple short texts by understanding familiar names, words and basic phrases.
CEF A1 descriptor.
Photo by Enokson

Untitled Slide

Compare the CEF descriptor to an example activity from the coursebook. We managed this discrepancy by balancing the two and used it to inform the writing stage.

Developing Specifications

  • Test purpose and target group
  • Skills tested
  • Test methods and text selection criteria
  • Weighting and timing
Slide 5: Zooming in closer for text selection and test methods, we looked at Unit 8 of the coursebook for example texts and analysed text type, topics and language features.

We then zoomed out to think about weighting (straightforward, as responses were fairly uniform despite different test methods) and timing (less straightforward, as it is difficult to imagine how a student at this level reads).

As a group we used Lyn’s template and worked collaboratively to produce draft specifications that were checked for completeness and consistency before starting the writing stage.

2. The Writing Stage

The writing stage.
The final unit of our coursebook had an inventions theme. One of its reading comprehension tasks was a 250-word text about four inventions. Because students were familiar with this, we decided to use it as a model for our reading task.
As Stacey outlined, our test takers are extremely limited users of English. It was therefore not possible to find a suitable existing text about inventions, so we had to create our own.
Photo by solution_63

Untitled Slide

Here is the example from the coursebook of the text about four inventions. We used it to create a grammar and vocabulary outline and to decide on our readability criteria.
Then we looked for inventions that would appeal to our age group (11-14 years). In this example, we have chosen square watermelons…
Using the grammar patterns and vocabulary we had identified as a guide, we used the information highlighted in the article to write a simple paragraph about square watermelons.
We ran it through a Flesch-Kincaid readability check to make sure it was similar to our original text in the coursebook. We did this for each of our inventions to create the final reading text (which you should have a copy of in front of you).
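As an aside on how such a check can be run: below is a minimal sketch of a Flesch-Kincaid grade-level calculation. The syllable counter is a rough vowel-group heuristic (published tools use dictionaries), and the sample paragraph is invented for illustration; it is not our actual test text.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

# Invented sample in the style of our reading text.
sample = ("Square watermelons come from Japan. Farmers grow them in glass boxes. "
          "They are easy to store because they do not roll.")
print(f"Flesch-Kincaid grade level: {flesch_kincaid_grade(sample):.1f}")
```

Comparing the grade level of each draft paragraph against the coursebook text gives a quick, repeatable readability check of the kind we relied on.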

Developing Questions

  • Developed a pool of questions for each invention
  • Created a spreadsheet of inventions and question types
  • Ensured short-answer responses tested reading, not writing
  • Ease of marking: one mark per question
In developing our questions:
Firstly, we decided that everyone in the group would write an MCQ and a short-answer question for each invention. In addition, everyone would write some other question types, such as a gapped-summary question. This gave us a pool of questions to choose from for each invention.
Secondly, we created a spreadsheet to ensure questions were spread evenly over the four inventions and that there was a combination of 'Wh-' and 'How' questions (a coverage check of this kind is sketched below).
Thirdly, with our short-answer questions we chose ones that could be answered in one to four words, to ensure that we were testing reading skills, not writing skills.
Finally, for ease of marking we decided on one mark per question, and therefore avoided questions with multiple answers.
Photo by Leo Reynolds
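The spreadsheet itself was a simple grid; purely as an illustration, here is a sketch of the same coverage check in code. The draft pool below is hypothetical, not our real question list.

```python
from collections import Counter

# Hypothetical draft pool: (invention, question type) for each draft question.
draft_pool = [
    ("square watermelons", "MCQ"), ("square watermelons", "short answer"),
    ("chocolate chip cookies", "MCQ"), ("chocolate chip cookies", "short answer"),
    ("Walkcar", "MCQ"), ("Walkcar", "gapped summary"),
    ("Walkcar", "short answer"),
]

coverage = Counter(draft_pool)
inventions = sorted({inv for inv, _ in draft_pool})
q_types = sorted({qt for _, qt in draft_pool})

# Print a coverage grid; low counts flag inventions that need more questions.
print("invention".ljust(26) + "".join(qt.ljust(16) for qt in q_types))
for inv in inventions:
    counts = "".join(str(coverage[(inv, qt)]).ljust(16) for qt in q_types)
    print(inv.ljust(26) + counts)
```

In the spreadsheet, an empty cell played the same role as a zero here: it showed at a glance where a question type was missing for an invention.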

Original Question 5:
Which two countries buy square watermelons from Japan?
Answer: Russia and Canada

Modified Question 5:
How many countries buy square watermelons from Japan?
Answer: Two

Here is an example of a question that we originally rejected because it yielded two answers.
Question 5 was originally "Which two countries buy square watermelons from Japan?"
The answer was "Russia and Canada" (giving us a question worth two marks, or potentially half-marks).
We modified the question to "How many countries buy square watermelons from Japan?"
The answer is "two" (potentially a more difficult question, as it prevents test-takers from simply copying from the text).
It also revised grammar from a previous unit (understanding the difference between "How many?" and "Which?") and maintained the consistency of one mark per question.

Developing the Answer Key

  • IELTS marking key used as a template
  • Careful choice of questions / discussion of answers
  • Refined during feedback process
In developing the answer key, we used the IELTS marking key as a model.
During the question writing process, we discussed acceptable responses and chose our questions carefully to make marking as simple as possible.
Feedback from our reviewers also helped us to refine our marking key.
Photo by Leo Reynolds
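To show how a one-mark-per-question key can be applied consistently, here is a small sketch. The key entries and normalisation rules are assumptions for illustration; they are not our actual marking key.

```python
# Hypothetical answer key: each question maps to its set of acceptable responses.
ANSWER_KEY = {
    5: {"two", "2"},                       # "How many countries buy square watermelons from Japan?"
    6: {"glass boxes", "in glass boxes"},  # invented short-answer item
}

def normalise(response: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't cost marks."""
    return " ".join(response.lower().split())

def mark(question: int, response: str) -> int:
    """One mark per question: 1 if the normalised response is acceptable, else 0."""
    return 1 if normalise(response) in ANSWER_KEY[question] else 0

print(mark(5, " Two "))    # 1: acceptable after normalisation
print(mark(5, "Russia"))   # 0: names a country instead of a number
```

Keeping each question's acceptable responses to a small, discussed set is what made one mark per question workable.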

3. The Piloting Stage

At this point in the project we were ready to move onto the piloting stage in order to obtain feedback from experienced teachers.

We designed a feedback form that asked about our:

1. Reading text: the difficulty level, content and vocabulary

2. Questions: their clarity, level of difficulty, and whether they matched the skill being tested

3. Answers: appropriateness of the distractors, the answer key, and other possible responses to consider

4. Time allocation (40 minutes).

Two Experienced Teachers

An EAL/D teacher from a local high school.

An ELICOS teacher with extensive test writing experience.

We approached two experienced teachers:

1. An EAL/D teacher from a local high school.

2. An ELICOS teacher from QUT International College with extensive test writing experience.

So, what did they say?

Reading Text:
too basic; needed more complex/compound sentences

Timing:
40 minutes is too long;
30 minutes is sufficient

1. Firstly, both teachers believed that the text was too basic and needed more complex and compound sentences.

However, as Stacey and Jennifer previously explained, our text needed to be heavily modified in order to match both the CEF A1 descriptors and the Flesch-Kincaid scores obtained from analysis of the student workbook. This process of text modification resulted in the use of simple sentences, which enhanced our construct validity and reliability.

2. Secondly, both teachers believed that 30 minutes was sufficient for completing the test.

Why did Wakefield make choc chip cookies?

  • they tasted good
  • they sold quickly
  • she didn't have any chocolate powder
  • she sold her recipe (the distractor flagged in feedback)

Revisions to the final distractor:
  • Wording: 'to sell her recipe' (a better fit for the question)
  • Length: 'to sell her recipe to a chocolate company' (longer, so the correct response is less obvious)
Other feedback related to our MCQ distractors.

In this example, the feedback concerned the wording and length of the final distractor.

1. One teacher felt the final distractor did not fit the question (Why did Ruth Wakefield make chocolate chip cookies? - "she sold her recipe"), so it was replaced with 'to sell her recipe', which was a better fit.

2. The other teacher noted that the correct response was longer than the three distractors.
So we lengthened the final distractor to make the correct response less obvious: 'to sell her recipe to a chocolate company'.
Photo by xmartenx

Untitled Slide

This final example shows a table requiring the students to fill in missing information.

Feedback received concerned:

1. Rewording of the instruction to enhance clarity.
2. Removing 'example' from the table as it was considered confusing.

3. Considering the response 'American' placed under the heading 'Where?'. We decided that students would not score a point for this answer, as they needed to understand both the question type AND the difference in vocabulary between 'America' and 'American' (both of which had been previously covered in class).

4. Finally, both teachers suggested we extend the table to include ALL four inventions. However, we had previously discussed this and intentionally selected only two: had we included all four, students could have used this information to answer subsequent questions on the test.

Construct Validity

Construct validity

Using the evaluation questions designed by Bachman and Palmer (1996), we are satisfied that the validity of our test is sound: any threats can be explained and are outweighed by measures in the test design.

Threats include the limited range of possible test tasks, which can be explained by the simplicity of the text; the possibility of copying answers from the text (addressed in the test design); and the transferability of scores from this test to overall student ability.

The validity of our test is a result of a test design which focused on ensuring that only the reading skill was tested (not grammar or spelling); strong relevance, as the test tasks, topic and text type are familiar to the students (limiting bias); and strong coverage, as skills were tested multiple times and experienced teachers agreed the questions tested the identified skills.
Photo by Paco CT

Reliability

Brown (2004) states that there are four threats to reliability, which we address in turn.
1. Student-related reliability: Grabe (2009, p. 354) states that we should use tasks that reflect the material taught in class and the skills practised, and our test-takers should be familiar with the format and testing techniques (Hughes, 2003). We feel this is a strength of our test, as we were careful to create a task similar to the coursebook's.

2. Rater reliability: our marking key and rubrics help to ensure consistency of marking and of teacher feedback.

3. Test administration reliability: to minimise inconsistencies in the test conditions, we have developed an administration information sheet for the invigilators.

4. Test reliability (or internal consistency; Popham, 2008, p. 34): this addresses how the items in a test are functioning. The more items there are, the greater the internal consistency is likely to be. Our specifications also ensure that an alternate form of this test could be created, enabling a consistent level of challenge across two parallel tests.
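Popham's point about item count can be made concrete with KR-20, a standard internal-consistency statistic for dichotomously scored items such as ours. We did not compute it ourselves; the pilot scores below are invented purely for illustration.

```python
from statistics import pvariance

def kr20(scores: list[list[int]]) -> float:
    """Kuder-Richardson 20 for 0/1-scored items (rows = students, columns = items)."""
    k = len(scores[0])                       # number of items
    totals = [sum(row) for row in scores]    # each student's total score
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / len(scores)  # proportion correct on item j
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / pvariance(totals))

# Invented pilot results: 4 students x 5 items (1 = correct, 0 = incorrect).
pilot = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
]
print(f"KR-20: {kr20(pilot):.2f}")  # closer to 1 = more internally consistent
```

All else being equal, adding more items of similar quality raises this statistic, which is the sense in which more items mean greater internal consistency.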

References

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. White Plains, NY: Pearson Longman.
Grabe, W. (2009). Reading in a second language: Moving from theory to practice. New York: Cambridge University Press.
Hughes, A. (2003). Lecture slides.
Popham, W. J. (2008). Classroom assessment: What teachers need to know. Boston: Pearson.
Photo by mikecogh

Authenticity

The authenticity of a test task relates to the correspondence between the test task and the target language use (TLU) in the real world (Bachman & Palmer, 1996).

Strengths:
1. Texts were derived from real primary sources that were easily accessible to the students.

2. Interesting inventions, such as the Walkcar and square watermelons, were chosen to engage the students in order to promote relevance and a positive affective response.

3. Appealing pictures were selected to add interest and enhance the test's correspondence to the TLU domain.

Weaknesses:
1. In order to strengthen the construct validity and reliability of our test, primary sources were heavily modified. These modifications compromise the authenticity of our text and the generalisability of score interpretations to the TLU domain.

Bachman and Palmer (1996) remind us that the primary purpose of a test is to measure; therefore validity and reliability are essential.

Reflection

Your reflection on the experience:

  • what you have learned
  • what was challenging
  • what you would like to focus on in the future in terms of continuing to develop your language assessment expertise