Overcoming Challenges in Automated Grading Systems

autograder l.w
1 / 50
Embed
Share

Explore the drawbacks of traditional test-based feedback systems and the scalability challenges in grading large groups of students. Learn how constraint-based synthesis can enhance the autograding process, making it more efficient and effective for both students and instructors.

  • Automated Grading
  • Feedback Systems
  • Scalability
  • Constraint-Based Synthesis

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Autograder RISHABH SINGH, SUMIT GULWANI, ARMANDO SOLAR-LEZAMA

  2. Test-cases based feedback Hard to relate failing inputs to errors Manual feedback by TAs Time consuming and error prone Feedback on Programming Assignments

  3. "Not only did it take 1-2 weeks to grade problem, but the comments were entirely unhelpful in actually helping us fix our errors. . Apparently they don't read the code -- they just ran their tests and docked points mercilessly. What if I just had a simple typo, but my algorithm was fine? ...." 6.00 Student Feedback (2013)

  4. Scalability Challenges (>100k students) Bigger Challenge in MOOCs

  5. Todays Grading Workflow def computeDeriv(poly): deriv = [] zero = 0 if (len(poly) == 1): return deriv for e in range(0, len(poly)): if (poly[e] == 0): zero += 1 else: deriv.append(poly[e]*e) return deriv def computeDeriv(poly): deriv = [] zero = 0 if (len(poly) == 1): return deriv for e in range(0, len(poly)): if (poly[e] == 0): zero += 1 else: deriv.append(poly[e]*e) return deriv replace derive by [0] Grading Rubric Teacher s Solution

  6. Autograder Workflow def computeDeriv(poly): deriv = [] zero = 0 if (len(poly) == 1): return deriv for e in range(0, len(poly)): if (poly[e] == 0): zero += 1 else: deriv.append(poly[e]*e) return deriv def computeDeriv(poly): deriv = [] zero = 0 if (len(poly) == 1): return deriv for e in range(0, len(poly)): if (poly[e] == 0): zero += 1 else: deriv.append(poly[e]*e) return deriv replace derive by [0] Error Model Teacher s Solution

  7. Technical Challenges Large space of possible corrections Minimal corrections Dynamically-typed language Constraint-based Synthesis to the rescue

  8. Running Example

  9. computeDeriv Compute the derivative of a polynomial poly = [10, 8, 2] #f(x) = 10 + 8x +2x2 => [8, 4] #f (x) = 8 + 4x

  10. Teachers solution def computeDeriv(poly): result = [] if len(poly) == 1: return [0] for i in range(1, len(poly)): result += [i * poly[i]] return result

  11. Demo

  12. Simplified Error Model return a return {[0],?a} range(a1, a2) range(a1+1,a2) a0 == a1 False

  13. Autograder Algorithm

  14. Algorithm Rewriter Translator . ?? Solver Feedback .out ---- ---- .py .sk

  15. Algorithm: Rewriter Rewriter Translator Solver Feedback .py . ??

  16. Rewriting using Error Model range(0, len(poly)) range({0 ,1}, len(poly)) default choice a a+1

  17. Rewriting using Error Model range(0, len(poly)) range({0 ,1}, len(poly)) a a+1

  18. Rewriting using Error Model range(0, len(poly)) range({0 ,1}, len({poly, poly+1})) a a+1

  19. Rewriting using Error Model range(0, len(poly)) range({0 ,1}, {len({poly, poly+1}), len({poly, poly+1})+1}) a a+1

  20. Rewriting using Error Model ( ??) def computeDeriv(poly): deriv = [] zero = 0 if ({len(poly) == 1, False}): Problem: Find a program that minimizes cost metric and is functionally equivalent with teacher s solution return {deriv,[0]} for e in range({0,1}, len(poly)): if (poly[e] == 0): zero += 1 else: deriv.append(poly[e]*e) return {deriv,[0]}

  21. Algorithm: Translator Rewriter Translator . ?? Solver Feedback .sk

  22. Sketch [Solar-Lezama et al. ASPLOS06] void main(int x){ int k = ??; assert x + x == k * x; } void main(int x){ int k = 2; assert x + x == k * x; } Statically typed C-like language with holes

  23. ??Translation to Sketch (1) Handling python s dynamic types (2) Translation of expression choices

  24. (1) Handling Dynamic Types struct MultiType{ int type; int ival; bit bval; MTString str; MTList lst; MTDict dict; MTTuple tup; } ival bval bool int Type lst list

  25. Python Constants using MultiType - bool 2 - bool 1 1 2 int int INT INT - - - - list list - - [1,2] bool int LIST - len=2, lVals = { *, * } list

  26. Python Exprs using MultiType x + y 5 10 int 15 int - - - int bool bool bool INT INT INT - - - - - - list list list

  27. Python Exprs using MultiType x + y - - - - - - int bool int int bool bool LIST LIST LIST [1,2,3] list [1,2] list [3] list - - -

  28. Python Expressions using MultiType x + y Typing rules are encoded as constraints 5 - - - int bool int bool LIST INT [3] list - - - list

  29. (2) Translation of Expression Choices { , } MultiType modifyMTi( , ){ if(??) return else choicei = True return } - - - - - - - - - - - - - - - - - // default choice - - - // non-default choice - - - -

  30. Translation to Sketch (Final Step) harness main(int N, int[N] poly){ MultiType polyMT = MTFromList(poly); MultiType result1 = computeDeriv_teacher(polyMT); MultiType result2 = computeDeriv_student(polyMT); assert MTEquals(result1,result2); }

  31. Translation to Sketch (Final Step) harness main(int N, int[N] poly){ totalCost = 0; MultiType polyMT = MTFromList(poly); if(choicek) totalCost++; assert MTEquals(result1,result2); minimize(totalCost); } MultiType result1 = computeDeriv_teacher(polyMT); MultiType result2 = computeDeriv_student(polyMT); Minimum Changes

  32. Algorithm: Solver Rewriter Translator Solver Feedback .out .sk

  33. Solving for minimize(x) Sketch Uses CEGIS multiple SAT calls Binary search for x no reuse of learnt clauses MAX-SAT too much overhead Incremental linear search reuse learnt clauses

  34. Incremental Solving for minimize(x) (P,x) (P1,x=7) Sketch {x<7} Sketch {x<4} UNSAT (P2,x=4) Sketch

  35. Algorithm: Feedback Rewriter Translator Solver Feedback .out ---- ----

  36. Feedback Generation Correction rules associated with Feedback Template Extract synthesizer choices to fill templates

  37. Evaluation

  38. Autograder Tool for Python Currently supports: - Integers, Bool, Strings, Lists, Dictionary, Tuples - Closures, limited higher-order fn, list comprehensions

  39. Benchmarks Exercises from first five weeks of 6.00x and 6.00 int: prodBySum, compBal, iterPower, recurPower, iterGCD tuple: oddTuple list: compDeriv, evalPoly string: hangman1, hangman2 arrays(C#): APCS dynamic programming (Pex4Fun)

  40. Benchmark evalPoly-6.00 compBal-stdin-6.00 compDeriv-6.00 hangman2-6.00x prodBySum-6.00 oddTuples-6.00 hangman1-6.00x evalPoly-6.00x compDeriv-6.00x oddTuples-6.00x iterPower-6.00x recurPower-6.00x iterGCD-6.00x Test Set 13 52 103 218 268 344 351 541 918 1756 2875 2938 2988

  41. 35 30 25 Time (in s) 20 15 10 5 0 Average Running Time (in s)

  42. 90.00% 80.00% Feedback Percentage 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00% Feedback Generated (Percentage)

  43. 90.00% 80.00% Feedback Percentage 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00% Feedback Generated (Percentage)

  44. 90.00% 80.00% Feedback Percentage 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00% Feedback Generated (Percentage)

  45. 90.00% 80.00% Feedback Percentage 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00% Feedback Generated (Percentage)

  46. Generated Feedback 8579 TestSet 13365 Percentage 64.19% AvgTime(s) 9.91 Average Performance

  47. Why low % in some cases? Completely Incorrect Solutions Unimplemented Python Features Timeout comp-bal-6.00 Big Conceptual errors

  48. Big Error: Misunderstanding APIs eval-poly-6.00x def evaluatePoly(poly, x): result = 0 for i in list(poly): result += i * x ** poly.index(i) return result

  49. Big Error: Misunderstanding Spec hangman2-6.00x def getGuessedWord(secretWord, lettersGuessed): for letter in lettersGuessed: secretWord = secretWord.replace(letter, _ ) return secretWord

  50. A technique for automated feedback generation Error Models, Constraint-based synthesis Provide a basis for automated feedback for MOOCs Towards building a Python Tutoring System rishabh@csail.mit.edu Thanks! Conclusion

Related


More Related Content