“Software developers typically rely upon a large network of dependencies to build their applications. … However, prior work on NPM dataset construction typically has two limitations: 1) only metadata is scraped, and 2) packages or versions that are deleted from NPM can not be scraped. … We present npm-follower, a dataset and crawling architecture which archives metadata and code of all packages and versions as they are published and is thus able to retain data which is later deleted.”
Find the paper and the full list of authors on arXiv.