‘StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code’

“Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming.”

Find the paper and the full list of authors on arXiv.
