Assignment 7: Evaluation Plan and Progress Report IV

Due: Wednesday, November 13 at 4:00pm. Submit on Canvas.


To convince people that your idea is correct, you need to show an expert that you have evaluated it fairly and correctly. In this assignment, you will develop an evaluation plan for your research project and write it up.

Some projects, which are more study- or measurement-oriented, need more lead time to complete their evaluation. If you are in this set, turn this assignment in early so that you can proceed with data collection. Other projects, which are more engineering- or design-oriented, need more lead time to design and construct their approach. If you are in this set, you can still turn the assignment in early, but it's not as strongly required.

Articulate Your Thesis

As we discussed in class, the first step in planning an evaluation is to articulate the main thesis of your work. (Remember from Assignment 3, Project Introduction, that the main thesis of the work is likely embedded in the topic sentence of your bit flip paragraph.) Go back and reflect on that statement, and tweak it if necessary based on what you've learned from your project so far.

Write out your thesis at the top of your submission.

Derive Your Claim

We discussed in lecture how theses imply a claim. For example, "x > y"-type ("X is better than Y") theses imply a claim that x is in fact better/faster/more performant/more enjoyable than y, and "∃ x"-type ("there exists an X") theses imply a claim that, whereas x could not exist before, it does exist with your system. Discuss with your team the claim implied by your thesis.

Write down your claim.

Design Your Evaluation

Now, you need to work from your claim to design a specific evaluation plan. How do you prove what you have claimed? This evaluation plan typically specifies:

  • DV: what is your dependent variable? (This is the variable you measure as the outcome, such as accuracy on a test set.)
  • IV: what is your independent variable? (This is the variable you manipulate for comparison to create conditions, such as the algorithm or the interface used.)
  • Task: what is the specific task that is being performed in order to measure the DV? (This might be executing a benchmark, a known ML classification task, or a specific sequence of behaviors that a user must perform.)
  • Threats: what are the factors that might influence your outcome? For example, in what situations might your result hold or not hold? What biases might creep in that you need to make sure to account for?
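
The four parts above can be kept together in one small record so that nothing gets dropped when the plan is shared with your team. The following is only a sketch of one way to do that: the class name, field names, and the example project (two interfaces compared on task completion time) are hypothetical placeholders, not a required format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvaluationPlan:
    """Hypothetical container for the parts of an evaluation plan."""
    thesis: str
    claim: str
    dependent_variable: str      # DV: the outcome you measure
    independent_variable: str    # IV: the thing you manipulate
    conditions: List[str]        # the levels of the IV being compared
    task: str                    # what is performed to measure the DV
    threats: List[str] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """Return the names of any parts that are still empty."""
        missing = [name for name, value in vars(self).items() if not value]
        # A comparison needs at least two conditions of the IV.
        if 0 < len(self.conditions) < 2:
            missing.append("conditions (need at least two to compare)")
        return missing


# Placeholder example: every string here is invented for illustration.
plan = EvaluationPlan(
    thesis="Interface X lets novices finish the task faster than interface Y.",
    claim="Completion time with X is lower than with Y.",
    dependent_variable="task completion time in seconds",
    independent_variable="interface used",
    conditions=["interface X", "interface Y"],
    task="complete a fixed sequence of five editing steps",
)

print(plan.missing_parts())  # threats have not been written down yet
```

Leaving `threats` empty in the example is deliberate: the check flags it, which is a reminder that a plan without threats considered is incomplete.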

You don't need to re-invent the wheel here. Often your nearest neighbor paper or other papers in your related work establish an evaluation paradigm that you can import to your paper. In fact, this is often preferred: because the paradigm is already in the literature, you don't need to convince a reader that your approach is valid. So, go review the evaluations used in your related work and use those to develop a few possible models. Then, share those models with your team and work together to develop a variant that works well for your project.

Based on your project's setup, your model might look slightly different than what is laid out above. If you believe this to be the case, talk to your TA about it.

Next, run the following unit test on your proposed design: does it directly test the thesis you articulated above? Imagine a few possible outcomes from your evaluation. Depending on how it comes out, does it directly prove or disprove your thesis, or only obliquely shed light on whether your thesis is correct?
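
For an "x > y"-type thesis, one way to try this check is to write the analysis before collecting any data and feed it a few imagined outcomes. The sketch below assumes a two-condition design and uses a one-sided permutation test on the difference in means; that choice of test, the function name, and all of the numbers are assumptions made up for illustration, and your area may call for a different analysis.

```python
import random
from statistics import mean


def permutation_p_value(x_scores, y_scores, n_resamples=2000, seed=0):
    """One-sided permutation test on the difference in means.

    Estimates how often a random relabeling of the pooled scores gives
    a difference (mean of x minus mean of y) at least as large as the
    observed one.
    """
    rng = random.Random(seed)
    observed = mean(x_scores) - mean(y_scores)
    pooled = list(x_scores) + list(y_scores)
    n_x = len(x_scores)
    at_least_as_large = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_x]) - mean(pooled[n_x:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_resamples + 1)


# Imagined outcomes (invented numbers, not real results).
baseline_y = [0.80, 0.82, 0.79, 0.81, 0.83, 0.78]
imagined_clear_win = [0.91, 0.89, 0.93, 0.90, 0.92, 0.94]
imagined_no_difference = list(baseline_y)

p_clear_win = permutation_p_value(imagined_clear_win, baseline_y)
p_no_difference = permutation_p_value(imagined_no_difference, baseline_y)
print(p_clear_win, p_no_difference)
```

If the first imagined outcome would count as support for the thesis and the second as a failure to support it, the design speaks to the thesis directly. If neither outcome would change your mind about the thesis, the DV or the task is probably measuring something adjacent to it.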

Write out the DV, IV, Task, and Threats for your evaluation design. Summarize your explanation of why that design directly tests your thesis.

Write Your Evaluation

Your goal is now to write up your evaluation plan as it would appear in a published paper. Ideally, you will be able to reuse and update this for your final paper submission. Having a clear sense of what the evaluation will look like helps make sure that your vectoring targets the goals that matter for the final paper.

Different areas structure their evaluation writeups differently. For example, in HCI, it is common to have a Method section separate from a Results section; in Systems, less so. Use your related work to develop a model of how evaluations should be reported in your area. Each section has also provided a sample strong evaluation section for you to use as a rough template below:

You obviously won't have final results ready at this point, so either make up the results you might reasonably expect to see or use any current pilot data that you do have. Include any graphs that you are going to want to include in your final writeup. You will, of course, be able to update this for your final submission as your project progresses, including changing your evaluation design (with clearance from your TA) and updating your results.

In a full paper, typically the Evaluation section is 2000-3000 words. However, that length includes a detailed analysis of the results, and at this point, you will only have pilot data at best. So, for this assignment, the requirement is 1000-1500 words, mostly on methods and a bit of results scaffolding. For the final paper, we will expect 2000-3000 words for the Evaluation section.

Progress Report IV

As you've been doing up to this point, continue submitting your weekly progress report as well. See previous progress report assignments for the instructions and format.


Executing this project will be your team's focus for the rest of the course. Here is a reminder of the timeline:

  • Week 4: Introduction due; start executing the project
  • Weeks 5-8: work on the project, with weekly check-in assignments
  • Week 8: Evaluation plan due in addition to the weekly check-in assignment
  • Week 9: Draft paper due
  • Week 10: Final presentation in class
  • Finals: Final paper due


Submit two PDFs: one with (1) your thesis, (2) your claim, (3) your evaluation design, and (4) your evaluation writeup, and the other with your progress report. Submit a progress report slide as usual. This is a group assignment; create a group for your team, and have one member submit on behalf of the group.


Your evaluation will be graded on the following rubric:

  • Claim: is the thesis a correct articulation of the project, and does your claim derive from the thesis? (5pt)
  • Evaluation design: does the design of the evaluation correctly evaluate the thesis and follow through on the claim? (5pt)
  • Description: does the writeup clearly and correctly describe the design to an expert in the area? (5pt)

Your progress report will be graded on the following rubric:

  • Vector: Did you select a vector that captures the main source of uncertainty in your project right now? (5pt)
  • Plan: Did you create a plan that should reduce uncertainty in that vector? Is that plan achievable in a week? (5pt)
  • Velocity: Did you make reasonable progress on your plan? We assume that plans may need to evolve as the week goes and you may need to re-vector midweek. We're not grading on whether you stuck exactly to the plan, but on whether you maintained high velocity on your project. (5pt)

Remember that the progress report is scaled differently than the evaluation in the overall class grades.