‘USB: A Unified Summarization Benchmark Across Tasks and Domains’

“An abundance of datasets exist for training and evaluating models on the task of summary generation. However, these datasets are often derived heuristically, and lack sufficient annotations to support research into all aspects of summarization. … We introduce a benchmark comprising 8 tasks that require multi-dimensional understanding of summarization, e.g., surfacing evidence for a summary, assessing its correctness, and gauging its relevance to different topics. We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models.”

Find the paper and the full list of authors on arXiv.
