All Work
Title
Topic

‘How to Combine MembershipInference Attacks on Multiple Updated Machine Learning Models’
“A large body of research has shown that machine learning models are vulnerable to membership inference (MI) attacks that violate the privacy of the participants in the training data. Most MI research focuses on the case of a single standalone model, while production machinelearning platforms often update models over time, on data that often shifts in distribution, giving the attacker more information. This paper proposes new attacks that take advantage of one or more model updates to improve MI.” Find the paper and the full list of authors in the Proceedings on Privacy Enhancing Technologies Symposium.

‘TMI! Finetuned Models Leak Private Information from their Pretraining Data’
“Transfer learning has become an increasingly popular technique in machine learning as a way to leverage a pretrained model … to assist with building a finetuned model. … There are reasons to believe that the data used for pretraining is still sensitive, making it essential to understand how much information the finetuned model leaks about the pretraining data. In this work we propose a new membershipinference threat model where the adversary only has access to the finetuned model and would like to infer the membership of the pretraining data.” Find the paper and full list of authors at ArXiv.

‘Differentially Private Medians and Interior Points for NonPathological Data’
“We construct differentially private estimators with low sample complexity that estimate the median of an arbitrary distribution over ℝ satisfying very mild moment conditions. Our result stands in contrast to the surprising negative result of Bun et al. (FOCS 2015) that showed there is no differentially private estimator with any finite sample complexity that returns any nontrivial approximation to the median of an arbitrary distribution.” Find the paper and the full list of authors at ArXiv.

‘DIALITE: Discover, Align and Integrate Open Data Tables’
“We demonstrate a novel table discovery pipeline called DIALITE that allows users to discover, integrate and analyze open data tables. DIALITE has three main stages. First, it allows users to discover tables from open data platforms using stateoftheart table discovery techniques. Second, DIALITE integrates the discovered tables to produce an integrated table. Finally, it allows users to analyze the integration result by applying different downstreaming tasks over it. Our pipeline is flexible such that the user can easily add and compare additional discovery and integration algorithms.” Find the paper and the full list of authors at ArXiv.

‘A Statistical Approach for Finding PropertyAccess Errors’
“We study the problem of finding incorrect property accesses in JavaScript where objects do not have a fixed layout, and properties (including methods) can be added, overwritten, and deleted freely throughout the lifetime of an object. Since referencing a nonexistent property is not an error in JavaScript, accidental accesses to nonexistent properties … can go undetected without thorough testing, and may manifest far from the source of the problem. We propose a twophase approach for detecting property access errors based on the observation that, in practice, most property accesses will be correct.” Find the paper and full list of authors…

‘Safe Environmental Envelopes of Discrete Systems’
“A safety verification task involves verifying a system against a desired safety property under certain assumptions about the environment. However, these environmental assumptions may occasionally be violated due to modeling errors or faults. Ideally, the system guarantees its critical properties even under some of these violations, i.e., the system is robust against environmental deviations. This paper proposes a notion of robustness as an explicit, firstclass property of a transition system that captures how robust it is against possible deviations in the environment.” Find the paper and the full list of authors at ArXiv.

Huselid named SIOP and NAHR fellow for ‘highest level of achievement in the human resources profession’
“D’AmoreMcKim School of Business Distinguished Professor Mark Huselid was recently named a fellow by the Society for Industrial and Organizational Psychology (SIOP) and the National Academy of Human Resources (NAHR). SIOP fellows must have made significant and sustained contributions that have shaped the field of IndustrialOrganizational (IO) psychology–the scientific study of working and the application of that science to workplace issues facing individuals, teams, and organizations. Huselid joins D’AmoreMcKim professors Paula Caligiuri and Cynthia Lee who are also SIOP fellows.” NAHR fellowship, which “he was also recently inducted as … is regarded as the most prestigious honor in the human resources field.” h

Hoitash and Pei win 2023 Ronald Copeland Best Paper Awards
“Udi Hoitash, Lilian L. and Harry A. Cowan Professor of Accounting, and Amy Pei, Assistant Professor, Marketing, have authored papers awarded the 2023 Copeland Best Paper Awards. Each Group Research Committee nominated a paper published in 2022 by a faculty member in the Group to be considered for a Ronald Copeland Best Paper Award. Two papers were selected as this year’s winners as recommended by the DMSB Research Committee and confirmed by Dean Emery Trahan. Each D’AmoreMcKim author will receive a $1,000 stipend.”

Sanders awarded Sushil K. Gupta Distinguished Service Award for contributions to ‘premier professional society’
Distinguished professor Nada Sanders has been awarded the Sushil K. Gupta Distinguished Service Award for her contributions to the Production and Operations Management Society, which she described as “our premier professional society” in a LinkedIn post.

‘Mississippi River LowFlows: Context, Causes and Future Projections’
“The Mississippi River represents a major commercial waterway, and periods of anomalously low river levels disrupt riverine transport. These lowflow events occur periodically, with a recent event in the fall of 2022 slowing barge traffic and generating sharp increases in riverine transportation costs. Here we … evaluate historical trends and future projections. … Model simulations from the LENS2 dataset show that, under a moderatehigh emissions scenario (SSP37.0), the severity and duration of lowflow events is projected to decrease through to the end of the 21st century. ” Find the paper and the full list of authors in Environmental Research: Climate.

‘JustinTime for Supply Chains in Turbulent Times’
“The Covid19 pandemic and other recent disruptions in the early 2020s led to sections in the business press blaming justintime (JIT) practices for operational failings. … Some scholars argue that JIT is not resilient, while others maintain that JIT can continue providing superior performance even with disruptions. Motivated by this debate, we discuss various misconceptions about JIT that underlie this debate … and argue that companies can improve their supply chain performance if JIT supply chain segments are chosen fittingly—even more so—during disruptions.” Find the paper and the full list of authors at Production and Operations Management.

‘Why Not Yet: Fixing a Topk Ranking That is Not Fair to Individuals’
“This work considers whynot questions in the context of topk queries and scorebased ranking functions. Following the popular linear scalarization approach for multiobjective optimization, we study rankings based on the weighted sum of multiple scores. A given weight choice may be controversial or perceived as unfair. … We introduce various notions of such whynotyet queries and formally define them as satisfiability or optimization problems, whose goal is to propose alternative ranking functions that address the placement of the entities of interest.” Find the paper and the full list of authors at the Association for Computing Machinery.

‘Fast Optimal Locally Private Mean Estimation via Random Projections’
“We study the problem of locally private mean estimation of highdimensional vectors in the Euclidean ball. Existing algorithms for this problem either incur suboptimal error or have high communication and/or runtime complexity. We propose a new algorithmic framework, ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity, and incur optimal error up to a 1+o(1)factor. Our framework is deceptively simple: each randomizer projects its input to a random lowdimensional subspace, normalizes the result, and then runs an optimal algorithm.” Find the paper and the full list of authors at ArXiv.

‘Online and Streaming Algorithms for Constrained kSubmodular Maximization’
“Constrained ksubmodular maximization is a general framework that captures many discrete optimization problems such as ad allocation, influence maximization, personalized recommendation, and many others. In many of these applications, datasets are large or decisions need to be made in an online manner, which motivates the development of efficient streaming and online algorithms. In this work, we develop singlepass streaming and online algorithms for constrained ksubmodular maximization with both monotone and general (possibly nonmonotone) objectives subject to cardinality and knapsack constraints.” Find the paper and the full list of authors at ArXiv.

‘Poisoning Network Flow Classifiers’
“As machine learning (ML) classifiers increasingly oversee the automated monitoring of network traffic, studying their resilience against adversarial attacks becomes critical. This paper focuses on poisoning attacks, specifically backdoor attacks, against network traffic flow classifiers. We investigate the challenging scenario of cleanlabel poisoning where the adversary’s capabilities are constrained to tampering only with the training data—without the ability to arbitrarily modify the training labels or any other component of the training process.” Find the paper and the full list of authors at ArXiv.

‘Unleashing the Power of Randomization in Auditing Differentially Private ML’
“We present a rigorous methodology for auditing differentially private machine learning algorithms by adding multiple carefully designed examples called canaries. … First, we introduce Lifted Differential Privacy (LiDP) that expands the definition of differential privacy to handle randomized datasets. … Second, we audit LiDP by trying to distinguish between the model trained with K canaries versus K−1 canaries in the dataset, leaving one canary out. … Third, we introduce novel confidence intervals that take advantage of the multiple test statistics by adapting to the empirical higherorder correlations. Together, this new recipe demonstrates significant improvements in sample complexity.” Find the paper and the full list…

‘Freaky Leaky SMS: Extracting User Locations by Analyzing SMS Timings’
“Short Message Service (SMS) remains one of the most popular communication channels since its introduction. … In this paper, we demonstrate that merely receiving silent SMS messages regularly opens a stealthy sidechannel that allows other regular network users to infer the whereabouts of the SMS recipient. The core idea is that receiving an SMS inevitably generates Delivery Reports whose reception bestows a timing attack vector at the sender. We conducted experiments across various countries, operators, and devices to show that an attacker can deduce the location of an SMS recipient.” Find the paper and the full list of authors at ArXiv.

‘How Creative Versus Technical Constraints Affect Individual Learning in an Online Innovation Community’
“Online innovation communities allow for a search for novel solutions within a design space bounded by constraints. Past research has focused on the effect of creative constraints on individual projects, but less is known about how constraints affect learning from repeated design submissions and the effect of the technical constraints that are integral to online platforms. How do creative versus technical constraints affect individual learning in exploring a design space in online communities? … We find that creative constraints lead to high rates of learning only if technical constraints are sufficiently relaxed.” Find the paper and full list of authors…

‘Homo in Machina: Improving Fuzz Testing Coverage via Compartment Analysis’
“Fuzz testing is often automated, but also frequently augmented by experts who insert themselves into the workflow in a greedy search for bugs. In this paper, we propose Homo in Machina, or HMfuzzing, in which analyses guide the manual efforts, maximizing benefit. As one example … we introduce compartment analysis. Compartment analysis uses a wholeprogram dominator analysis to estimate the utility of reaching new code, and combines this with a dynamic analysis indicating drastically undercovered edges guarding that code.” Find the paper and full list of authors in the proceedings of the 2023 IEEE Conference on Software Testing, Verification and…

‘Incremental NonGaussian Inference for SLAM Using Normalizing Flows’
“This article presents normalizing flows for incremental smoothing and mapping (NFiSAM), a novel algorithm for inferring the full posterior distribution in SLAM problems with nonlinear measurement models and nonGaussian factors. NFiSAM exploits the expressive power of neural networks, and trains normalizing flows to model and sample the full posterior. By leveraging the Bayes tree, NFiSAM enables efficient incremental updates similar to iSAM2, albeit in the more challenging nonGaussian setting.” Find the paper and the full list of authors in the IEEE Transactions on Robotics.

‘A Variational Perspective on Generative Flow Networks’
“Generative flow networks (GFNs) are a class of probabilistic models for sequential sampling of composite objects, proportional to a target distribution that is defined in terms of an energy function or a reward. GFNs are typically trained using a flow matching or trajectory balance objective, which matches forward and backward transition models over trajectories. … We introduce a variational objective for training GFNs, which is a convex combination of the reverse and forward KL divergences, and compare it to the trajectory balance objective when sampling from the forward and backward model, respectively.” Find the paper and full list of authors…

‘String Diagrams with Factorized Densities’
“A growing body of research on probabilistic programs and causal models has highlighted the need to reason compositionally about model classes that extend directed graphical models. Both probabilistic programs and causal models define a joint probability density over a set of random variables, and exhibit sparse structure that can be used to reason about causation and conditional independence. This work builds on recent work on Markov categories of probabilistic mappings to define a category whose morphisms combine a joint density … with a deterministic mapping from samples to return values.” Find the paper and the full list of authors at ArXiv.

‘Efficient Computation of Quantiles over Joins’
“We present efficient algorithms for Quantile Join Queries, abbreviated as %JQ. … Our goal is to avoid materializing the set of all join answers, and to achieve quasilinear time in the size of the database, regardless of the total number of answers. Even for basic ranking functions beyond sum, such as min or max over different attributes, so far it is not known whether there is any nontrivial tractable %JQ. In this work, we develop a new approach to solving %JQ.” Find the paper and the full list of authors at ArXiv.

‘Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries’
“We study the question of when we can provide direct access to the kth answer to a Conjunctive Query (CQ) according to a specified order over the answers in time logarithmic in the size of the database, following a preprocessing step that constructs a data structure in time quasilinear in database size. Specifically, we embark on the challenge of identifying the tractable answer orderings, that is, those orders that allow for such complexity guarantees.” Find the paper and the full list of authors in the ACM Transactions on Database Systems.

‘FibeRed: Fiberwise Dimensionality Reduction of Topologically Complex Data with Vector Bundles’
“Datasets with nontrivial large scale topology can be hard to embed in lowdimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers, while preserving the large scale topology. We formalize this point of view and … describe a dimensionality reduction algorithm based on topological inference for vector bundles.” Find the paper and full list of authors at Dagstuhl Research Online.

‘Toroidal Coordinates: Decorrelating Circular Coordinates with Lattice Reduction’
“The circular coordinates algorithm of de Silva, Morozov, and VejdemoJohansson takes as input a dataset together with a cohomology class representing a 1dimensional hole in the data; the output is a map from the data into the circle that captures this hole, and that is of minimum energy in a suitable sense. However, when applied to several cohomology classes, the output circlevalued maps can be ‘geometrically correlated’ even if the chosen cohomology classes are linearly independent. … We identify a formal notion of geometric correlation between circlevalued maps.” Find the paper and full list of authors at Dagstuhl Research Online.

‘Topological Data Analysis of Electroencephalogram Signals for Pediatric Obstructive Sleep Apnea’
“Topological data analysis (TDA) is an emerging technique for biological signal processing. TDA leverages the invariant topological features of signals in a metric space for robust analysis of signals even in the presence of noise. In this paper, we leverage TDA on brain connectivity networks derived from electroencephalogram (EEG) signals to identify statistical differences between pediatric patients with obstructive sleep apnea (OSA) and pediatric patients without OSA … and show that TDA enables us to see a statistical difference between the brain dynamics of the two groups.” Find the paper and the full list of authors at ArXiv.

‘A VideoBased EndtoEnd Pipeline for NonNutritive Sucking Action Recognition and Segmentation in Young Infants’
“We present an endtoend computer vision pipeline to detect nonnutritive sucking (NNS)—an infant sucking pattern with no nutrition delivered—as a potential biomarker for developmental delays, using offtheshelf baby monitor video footage. One barrier to clinical (or algorithmic) assessment of NNS stems from its sparsity, requiring experts to wade through hours of footage to find minutes of relevant activity.” Find the paper and the full list of authors at ArXiv.

‘StudentEval: A Benchmark of StudentWritten Prompts for Large Language Models of Code’
“Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of nonexpert prompters: beginning programmers. StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming.” Find the paper and the full list of authors at ArXiv.