‘MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation’

“Large language models have demonstrated the ability to generate both natural language and programming language text. Although contemporary code generation models are trained on corpora with several programming languages, they are tested using benchmarks that are typically monolingual. The most widely used code generation benchmarks only target Python, so there is little quantitative evidence of how code generation models perform on other programming languages. We propose MultiPL-E, a system for translating unit test-driven code generation benchmarks to new languages.”

Find the paper and the full list of authors in IEEE Transactions on Software Engineering.
