All Work
-
‘Fast Optimal Locally Private Mean Estimation via Random Projections’
“We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball. Existing algorithms for this problem either incur sub-optimal error or have high communication and/or run-time complexity. We propose a new algorithmic framework, ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity, and incur optimal error up to a 1+o(1)-factor. Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an optimal algorithm.” Find the paper and the full list of authors at ArXiv.
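The first two steps the abstract describes (random projection, then normalization) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the Gaussian projection matrix, the function name `project_and_normalize`, and the parameter `k` are all assumptions made for the example, and the final private randomizer (the "optimal algorithm" that runs on the low-dimensional unit vector) is deliberately left out.

```python
import math
import random


def project_and_normalize(x, k, rng):
    """Project x onto k random Gaussian directions, then rescale to unit norm.

    Assumption for illustration only: the projection is a k x d matrix with
    i.i.d. N(0, 1/k) entries. The paper may use a different projection.
    """
    d = len(x)
    scale = 1.0 / math.sqrt(k)  # standard deviation of each matrix entry
    rows = [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(k)]

    # Low-dimensional image of x under the random linear map.
    y = [sum(w * xi for w, xi in zip(row, x)) for row in rows]

    # Normalize so the result lies on the unit sphere in k dimensions.
    norm = math.sqrt(sum(v * v for v in y))
    if norm == 0.0:
        return y  # degenerate input (e.g. the zero vector); left unnormalized
    return [v / norm for v in y]


rng = random.Random(0)
x = [1.0 / math.sqrt(8)] * 8          # a unit vector in 8 dimensions
u = project_and_normalize(x, k=3, rng=rng)
# u would then be handed to a locally private randomizer (not shown here).
```

The output `u` has only `k` coordinates, which is where the communication and run-time savings mentioned in the abstract would come from in a scheme of this shape.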
-
‘Online and Streaming Algorithms for Constrained k-Submodular Maximization’
“Constrained k-submodular maximization is a general framework that captures many discrete optimization problems such as ad allocation, influence maximization, personalized recommendation, and many others. In many of these applications, datasets are large or decisions need to be made in an online manner, which motivates the development of efficient streaming and online algorithms. In this work, we develop single-pass streaming and online algorithms for constrained k-submodular maximization with both monotone and general (possibly non-monotone) objectives subject to cardinality and knapsack constraints.” Find the paper and the full list of authors at ArXiv.
-
‘Poisoning Network Flow Classifiers’
“As machine learning (ML) classifiers increasingly oversee the automated monitoring of network traffic, studying their resilience against adversarial attacks becomes critical. This paper focuses on poisoning attacks, specifically backdoor attacks, against network traffic flow classifiers. We investigate the challenging scenario of clean-label poisoning where the adversary’s capabilities are constrained to tampering only with the training data—without the ability to arbitrarily modify the training labels or any other component of the training process.” Find the paper and the full list of authors at ArXiv.
-
‘Unleashing the Power of Randomization in Auditing Differentially Private ML’
“We present a rigorous methodology for auditing differentially private machine learning algorithms by adding multiple carefully designed examples called canaries. … First, we introduce Lifted Differential Privacy (LiDP) that expands the definition of differential privacy to handle randomized datasets. … Second, we audit LiDP by trying to distinguish between the model trained with K canaries versus K−1 canaries in the dataset, leaving one canary out. … Third, we introduce novel confidence intervals that take advantage of the multiple test statistics by adapting to the empirical higher-order correlations. Together, this new recipe demonstrates significant improvements in sample complexity.” Find the paper and the full list…
-
‘Freaky Leaky SMS: Extracting User Locations by Analyzing SMS Timings’
“Short Message Service (SMS) remains one of the most popular communication channels since its introduction. … In this paper, we demonstrate that merely receiving silent SMS messages regularly opens a stealthy side-channel that allows other regular network users to infer the whereabouts of the SMS recipient. The core idea is that receiving an SMS inevitably generates Delivery Reports whose reception bestows a timing attack vector at the sender. We conducted experiments across various countries, operators, and devices to show that an attacker can deduce the location of an SMS recipient.” Find the paper and the full list of authors at ArXiv.
-
‘How Creative Versus Technical Constraints Affect Individual Learning in an Online Innovation Community’
“Online innovation communities allow for a search for novel solutions within a design space bounded by constraints. Past research has focused on the effect of creative constraints on individual projects, but less is known about how constraints affect learning from repeated design submissions and the effect of the technical constraints that are integral to online platforms. How do creative versus technical constraints affect individual learning in exploring a design space in online communities? … We find that creative constraints lead to high rates of learning only if technical constraints are sufficiently relaxed.” Find the paper and full list of authors…
-
‘Homo in Machina: Improving Fuzz Testing Coverage via Compartment Analysis’
“Fuzz testing is often automated, but also frequently augmented by experts who insert themselves into the workflow in a greedy search for bugs. In this paper, we propose Homo in Machina, or HM-fuzzing, in which analyses guide the manual efforts, maximizing benefit. As one example … we introduce compartment analysis. Compartment analysis uses a whole-program dominator analysis to estimate the utility of reaching new code, and combines this with a dynamic analysis indicating drastically under-covered edges guarding that code.” Find the paper and full list of authors in the proceedings of the 2023 IEEE Conference on Software Testing, Verification and…
-
‘Incremental Non-Gaussian Inference for SLAM Using Normalizing Flows’
“This article presents normalizing flows for incremental smoothing and mapping (NF-iSAM), a novel algorithm for inferring the full posterior distribution in SLAM problems with nonlinear measurement models and non-Gaussian factors. NF-iSAM exploits the expressive power of neural networks, and trains normalizing flows to model and sample the full posterior. By leveraging the Bayes tree, NF-iSAM enables efficient incremental updates similar to iSAM2, albeit in the more challenging non-Gaussian setting.” Find the paper and the full list of authors in the IEEE Transactions on Robotics.
-
‘A Variational Perspective on Generative Flow Networks’
“Generative flow networks (GFNs) are a class of probabilistic models for sequential sampling of composite objects, proportional to a target distribution that is defined in terms of an energy function or a reward. GFNs are typically trained using a flow matching or trajectory balance objective, which matches forward and backward transition models over trajectories. … We introduce a variational objective for training GFNs, which is a convex combination of the reverse- and forward KL divergences, and compare it to the trajectory balance objective when sampling from the forward- and backward model, respectively.” Find the paper and full list of authors…
-
‘String Diagrams with Factorized Densities’
“A growing body of research on probabilistic programs and causal models has highlighted the need to reason compositionally about model classes that extend directed graphical models. Both probabilistic programs and causal models define a joint probability density over a set of random variables, and exhibit sparse structure that can be used to reason about causation and conditional independence. This work builds on recent work on Markov categories of probabilistic mappings to define a category whose morphisms combine a joint density … with a deterministic mapping from samples to return values.” Find the paper and the full list of authors at ArXiv.
-
‘Efficient Computation of Quantiles over Joins’
“We present efficient algorithms for Quantile Join Queries, abbreviated as %JQ. … Our goal is to avoid materializing the set of all join answers, and to achieve quasilinear time in the size of the database, regardless of the total number of answers. Even for basic ranking functions beyond sum, such as min or max over different attributes, so far it is not known whether there is any nontrivial tractable %JQ. In this work, we develop a new approach to solving %JQ.” Find the paper and the full list of authors at ArXiv.
-
‘Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries’
“We study the question of when we can provide direct access to the k-th answer to a Conjunctive Query (CQ) according to a specified order over the answers in time logarithmic in the size of the database, following a preprocessing step that constructs a data structure in time quasilinear in database size. Specifically, we embark on the challenge of identifying the tractable answer orderings, that is, those orders that allow for such complexity guarantees.” Find the paper and the full list of authors in the ACM Transactions on Database Systems.
-
‘FibeRed: Fiberwise Dimensionality Reduction of Topologically Complex Data with Vector Bundles’
“Datasets with non-trivial large scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers, while preserving the large scale topology. We formalize this point of view and … describe a dimensionality reduction algorithm based on topological inference for vector bundles.” Find the paper and full list of authors at Dagstuhl Research Online.
-
‘Toroidal Coordinates: Decorrelating Circular Coordinates with Lattice Reduction’
“The circular coordinates algorithm of de Silva, Morozov, and Vejdemo-Johansson takes as input a dataset together with a cohomology class representing a 1-dimensional hole in the data; the output is a map from the data into the circle that captures this hole, and that is of minimum energy in a suitable sense. However, when applied to several cohomology classes, the output circle-valued maps can be ‘geometrically correlated’ even if the chosen cohomology classes are linearly independent. … We identify a formal notion of geometric correlation between circle-valued maps.” Find the paper and full list of authors at Dagstuhl Research Online.
-
‘Topological Data Analysis of Electroencephalogram Signals for Pediatric Obstructive Sleep Apnea’
“Topological data analysis (TDA) is an emerging technique for biological signal processing. TDA leverages the invariant topological features of signals in a metric space for robust analysis of signals even in the presence of noise. In this paper, we leverage TDA on brain connectivity networks derived from electroencephalogram (EEG) signals to identify statistical differences between pediatric patients with obstructive sleep apnea (OSA) and pediatric patients without OSA … and show that TDA enables us to see a statistical difference between the brain dynamics of the two groups.” Find the paper and the full list of authors at ArXiv.
-
‘A Video-Based End-to-End Pipeline for Non-Nutritive Sucking Action Recognition and Segmentation in Young Infants’
“We present an end-to-end computer vision pipeline to detect non-nutritive sucking (NNS)—an infant sucking pattern with no nutrition delivered—as a potential biomarker for developmental delays, using off-the-shelf baby monitor video footage. One barrier to clinical (or algorithmic) assessment of NNS stems from its sparsity, requiring experts to wade through hours of footage to find minutes of relevant activity.” Find the paper and the full list of authors at ArXiv.
-
‘StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code’
“Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming.” Find the paper and the full list of authors at ArXiv.
-
‘StarCoder: May the Source Be with You!’
“The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase. … StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages.” Find the paper and the full list of authors at ArXiv.
-
‘Pretraining Language Models with Human Preferences’
“Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs.” Read the paper and see the full list of authors at ArXiv.
-
Dean Abowd receives lifetime achievement award for work in human-computer interaction
Dean Gregory Abowd received the Lifetime Achievement Award from SIGCHI, the ACM Special Interest Group on Computer-Human Interaction. He later re-presented his award acceptance speech at an event on Northeastern’s Boston campus. Abowd hopes “to inspire others to dispel fear of the unknown and unlock their potential,” he says in the presentation abstract. “Life, like research, is best when shared with others whom you can respect and befriend.” Find the recorded speech on YouTube.
-
‘Problematic Advertising and its Disparate Exposure on Facebook’
“Targeted advertising remains an important part of the free web browsing experience. … However, given the wide use of advertising, this also enables using ads as a vehicle for problematic content, such as scams or clickbait. … In this paper, we study Facebook—one of the internet’s largest ad platforms—and investigate key gaps in our understanding of problematic online advertising. … We categorize over 32,000 ads collected from this panel (n=132); and survey participants’ sentiments toward their own ads to identify four categories of problematic ads.” Read the paper and see the full list of authors at ArXiv.
-
‘Genome-Wide Phage Susceptibility Analysis in Acinetobacter Baumannii Reveals Capsule Modulation Strategies’
“Phage have gained renewed interest as an adjunctive treatment for life-threatening infections with the resistant nosocomial pathogen Acinetobacter baumannii. Our understanding of how A. baumannii defends against phage remains limited. … We identified 41 candidate loci that increase susceptibility to Loki when disrupted, and 10 that decrease susceptibility. Combined with spontaneous resistance mapping, our results support the model that Loki uses the K3 capsule as an essential receptor, and that capsule modulation provides A. baumannii with strategies to control vulnerability to phage.” Find the paper and the full list of authors at PLOS Pathogens.
-
‘Dose-dependent effects of GAT107…: A BOLD phMRI and connectivity study on awake rats’
“Alpha 7 nicotinic acetylcholine receptor (α7nAChR) agonists have been developed to treat schizophrenia but failed in clinical trials due to rapid desensitization. GAT107, a type 2 allosteric agonist-positive allosteric modulator (ago-PAM) to the α7 nAChR was designed to activate the α7 nAChR while reducing desensitization. We hypothesized GAT107 would alter brain neural circuitry associated with cognition, emotion, and sensory perception. Methods: The present study used pharmacological magnetic resonance imaging (phMRI) to evaluate the dose-dependent effect of GAT107 on brain activity in awake male rats.” Read the paper and see the full list of authors at Frontiers in Neuroscience.
-
Biodegradable nanogenerators lead to less electronic waste
In “Aligned PLLA electrospun fibres based biodegradable triboelectric nanogenerator,” the authors present a new construction method for potential components in energy harvesters. These energy harvesters—like solar cells—“are not always developed using sustainable materials.” Creating components that are biodegradable could make these devices more environmentally friendly. “The presented approach,” the authors argue, “can provide attractive green energy harvesting machine to power portable devices at a large scale—without having to worry about the end-of-life electronic waste management.” Find the full list of authors in Nano Energy.
-
‘MSNetViews: Geographically Distributed Management of Enterprise Network Security Policy’
“Commercially-available software defined networking (SDN) technologies will play an important role in protecting the on-premises resources that remain as enterprises transition to zero trust architectures. However, existing solutions assume the entire network resides in a single geographic location, requiring organizations with multiple sites to manually ensure consistency of security policy across all sites. In this paper, we present MSNetViews, which extends a single, globally-defined and managed, enterprise network security policy to many geographically distributed sites.” Read the paper and see the full list of authors at the Association for Computing Machinery.
-
Using ‘recycled plastic as a building material’ on exhibit at 2023 Venice Biennale
Assistant professor of architecture Ang Li exhibited work at the U.S. Pavilion of the 2023 Venice Biennale. Li’s work conducts “investigations into the use of recycled plastic as a building material and structural system.” Read more about Li’s work and the other invited artists at Archinect.
-
‘Why We Should Care About Moral Foundations When Preparing for the Next Pandemic: Insights From Canada, the UK and the US’
“Health behaviors that do not effectively prevent disease can negatively impact psychological wellbeing and potentially drain motivations to engage in more effective behavior, potentially creating higher health risk. Despite this, studies linking ‘moral foundations’ (i.e., concerns about harm, fairness, purity, authority, ingroup, and/or liberty) to health behaviors have generally been limited to a narrow range of behaviors, specifically effective ones. We therefore explored the degree to which moral foundations predicted a wider range of not only effective but ineffective (overreactive) preventative behaviors during the COVID-19 pandemic.” Read the paper and see the full list of authors at PLOS ONE.
-
‘Resting-State MRI Functional Connectivity as a Neural Correlate of Multidomain Lifestyle Adherence’
“Prior research has demonstrated the importance of a healthy lifestyle to protect brain health and diminish dementia risk in later life. While a multidomain lifestyle provides an ecological perspective to voluntary engagement, its association with brain health is still under-investigated. … Self-reported exercise engagement, cognitive activity engagement, healthy diet adherence, and social activity engagement were included to examine potential phenotypes of an individual’s lifestyle adherence. … The features that show consistently high importance to the classification model were functional connectivity mainly between nodes located in different prior-defined functional networks.” Find the paper and the full list of authors in Scientific…
-
Closas receives Best Paper in Track Award for work on signal jamming
“Associate Professor Pau Closas received the Best Paper in Track Award at the 2023 IEEE/ION Position, Location and Navigation Symposium (PLANS) for the work ‘Jammer Classification with Federated Learning,’ with electrical engineering students Peng Wu and Helena Calatrava, and Associate Research Scientist Tales Imbiriba.”