‘Knowledge Transfer From High-Resource to Low-Resource Programming Languages for Code LLMs’

“Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as a building block for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. … This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM.”

Find the paper and the list of authors on arXiv.
