Research

Groundbreaking work and published results in peer-reviewed journals across disciplines.

  • ‘RedHOT: A Corpus of Annotated Medical Questions, Experiences and Claims on Social Media’

    “We present Reddit Health Online Talk (RedHOT), a corpus of 22,000 richly annotated social media posts from Reddit spanning 24 health conditions. … We collect additional granular annotations on identified claims. Specifically, we mark snippets that describe patient Populations, Interventions, and Outcomes (PIO elements) within these. Using this corpus, we introduce the task of retrieving trustworthy evidence relevant to a given claim made on social media. We propose a new method to automatically derive (noisy) supervision for this task which we use to train a dense retrieval model.” Find the paper and full list of authors at ACL Anthology.

  • ‘SemEval-2023 Task 8: Causal Medical Claim Identification and Related PIO Frame Extraction From Social Media Posts’

    “Identification of medical claims from user-generated text data is an onerous but essential step for various tasks including content moderation, and hypothesis generation. SemEval-2023 Task 8 is an effort towards building those capabilities and motivating further research in this direction. This paper summarizes the details and results of shared task 8 at SemEval-2023 which involved identifying causal medical claims and extracting related Populations, Interventions, and Outcomes (“PIO”) frames from social media (Reddit) text.” Find the paper and full list of authors at ACL Anthology.

  • ‘Speak Much, Remember Little: Cryptography in the Bounded Storage Model, Revisited’

    “The goal of the bounded storage model (BSM) is to construct unconditionally secure cryptographic protocols, by only restricting the storage capacity of the adversary, but otherwise giving it unbounded computational power. Here, we consider a streaming variant of the BSM, where honest parties can stream huge amounts of data to each other so as to overwhelm the adversary’s storage, even while their own storage capacity is significantly smaller than that of the adversary.” Find the paper and full list of authors at Advances in Cryptology—EUROCRYPT 2023.

  • ‘Exploring the Role of Audio in Video Captioning’

    “Recent focus in video captioning has been on designing architectures that can consume both video and text modalities, and using large-scale video datasets with text transcripts for pre-training, such as HowTo100M. … In this work, we present an audio-visual framework, which aims to fully exploit the potential of the audio modality for captioning. Instead of relying on text transcripts extracted via automatic speech recognition (ASR), we argue that learning with raw audio signals can be more beneficial, as audio has additional information including acoustic events, speaker identity, etc.” Find the paper and full list of authors at ArXiv.

  • ‘Semi-Quantitative Detection of Pseudouridine Modifications and Type I/II Hypermodifications in Human mRNAs … Direct Long-Read Sequencing’

    “Here, we develop and apply a semi-quantitative method for the high-confidence identification of pseudouridylated sites on mammalian mRNAs via direct long-read nanopore sequencing. A comparative analysis of a modification-free transcriptome reveals that the depth of coverage and specific k-mer sequences are critical parameters for accurate basecalling. By adjusting these parameters for high-confidence U-to-C basecalling errors, we identify many known sites of pseudouridylation and uncover previously unreported uridine-modified sites, many of which fall in k-mers that are known targets of pseudouridine synthases.” Find the paper and full list of authors in Nature Communications.

  • ‘Statistical Detection of Differentially Abundant Proteins in Experiments with Repeated Measures Designs and Isobaric Labeling’

    “Repeated measures experimental designs, which quantify proteins in biological subjects repeatedly over multiple experimental conditions or times, are commonly used in mass spectrometry-based proteomics. Such designs distinguish the biological variation within and between the subjects and increase the statistical power of detecting within-subject changes in protein abundance. Meanwhile, proteomics experiments increasingly incorporate tandem mass tag (TMT) labeling. … This manuscript proposes a family of linear mixed-effects models for differential analysis of proteomics experiments with repeated measures and TMT multiplexing.” Find the paper and list of authors in the Journal of Proteome Research.

  • ‘Pulcherrimin Protects Bacillus subtilis Against Oxidative Stress During Biofilm Development’

    “Pulcherrimin is an iron-binding reddish pigment produced by various bacterial and yeast species. In the soil bacterium Bacillus subtilis, this pigment is synthesized intracellularly as the colorless pulcherriminic acid by using two molecules of tRNA-charged leucine as the substrate; pulcherriminic acid molecules are then secreted and bind to ferric iron extracellularly to form the red-colored pigment pulcherrimin. … In this study, we identified that pulcherrimin is primarily produced under biofilm conditions and provides protection to cells in the biofilm against oxidative stress.” Find the paper and full list of authors in npj Biofilms and Microbiomes.

  • ‘Dynamic Structure of T4 Gene 32 Protein Filaments Facilitates Rapid Noncooperative Protein Dissociation’

    “Bacteriophage T4 gene 32 protein (gp32) is a model single-stranded DNA (ssDNA) binding protein, essential for DNA replication. … Detailed understanding of gp32 filament structure and organization remains incomplete. … Moreover, it is unclear how these tightly-bound filaments dissociate from ssDNA during complementary strand synthesis. We use optical tweezers and atomic force microscopy to probe the structure and binding dynamics of gp32 on long (∼8 knt) ssDNA substrates. … Cooperative binding of gp32 rigidifies ssDNA while also reducing its contour length, consistent with the ssDNA helically winding around the gp32 filament.” Find the paper and full list of authors at…

  • ‘Infinite Neural Network Quantum States: Entanglement and Training Dynamics’

    “We study infinite limits of neural network quantum states (∞-NNQS), which exhibit representation power through ensemble statistics, and also tractable gradient descent dynamics. Ensemble averages of entanglement entropies are expressed in terms of neural network correlators, and architectures that exhibit volume-law entanglement are presented. The analytic calculations of entanglement entropy bound are tractable because the ensemble statistics are simplified in the Gaussian process limit.” Find the paper and full list of authors in Machine Learning: Science and Technology.

  • ‘Mentoring Black, Indigenous and People of Color (BIPOC) Nursing Faculty toward Leadership Excellence’

    “Although the importance of mentoring is emphasized across healthcare professions and among diverse disciplines and situations, the success of nursing faculty who are BIPOC, termed as Black, Indigenous, and Nursing Faculty of Color (BINFOC), is heavily influenced by the mentoring received early in their academic career. … The analysis [aims] to present a model case using a novel strategy of integrating historical research, present attributes that have been influential in legacy building for more than 30 years, and elucidate the unique situation of mentoring BINFOC toward leadership excellence.” Find the paper and full list of authors in ABNF Journal.

  • Dewan publishes chapter on growing importance of nurse anesthetists in global health context

    Assistant clinical professor Janet A. Dewan, with co-author Aaron K. Sonah of Phebe Ester Bacon College of Health Science, has published a book chapter titled “Universal Health Coverage and Nurse Anesthetists” in “Nurse Practitioners and Nurse Anesthetists: The Evolution of the Global Roles.” The chapter notes that “the central concept that the measure of essential surgical and anesthesia care is an indication of the quality of a health system has gained traction … Access to safe surgery and anesthesia is linked to a health system’s ability to meet Universal Health Coverage benchmarks.”

  • Research on de-centering damage and trauma in human-computer interactions wins Best Paper Award

    The paper “Flourishing in the Everyday: Moving Beyond Damage-Centered Design in HCI for BIPOC Communities,” written by several contributors, including assistant professor Alexandra To and PhD student Dilruba Showkat, has won a Best Paper Award from the Association for Computing Machinery: Designing Interactive Systems Conference. The abstract reads, in part: “Research and design in human-computer interaction centers problem-solving, causing a downstream effect of framing work with and for marginalized communities predominantly from the lens of deficit and damage. … However, we observe an additional need to center positive aspects of humanity … particularly for Black, Indigenous, and People of Color.”

  • ‘Ecosystem Graphs: The Social Footprint of Foundation Models’

    “Foundation models (e.g. ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention. While the models themselves garner much attention, to accurately characterize their impact, we must consider the broader sociotechnical ecosystem. We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem. Ecosystem Graphs is composed of assets (datasets, models, applications) linked together by dependencies that indicate technical (e.g. how Bing relies on GPT-4) and social (e.g. how Microsoft relies on OpenAI) relationships. To supplement the graph structure, each asset is further enriched with fine-grained metadata.” Find the paper and full list of authors at ArXiv.

  • ‘Study on High Availability and Fault Tolerance’

    “Availability is one of the most important requirements for modern computing systems. In cloud computing, it is common to use it as a key factor in adopting a cloud service. This paper studies the breakdown in calculating the availability and proposes a conceptual model as middleware. … Through simulations tests, we verified that the proposed model is able to detect the system crash in sub-seconds and improve the overall availability of the system compared to currently used industry solutions.” Find the paper and full list of authors at the 2023 International Conference on Computing, Networking and Communications.

  • ‘Real-Time Search and Rescue Using Remotely Piloted Aircraft System With Frame Dropping’

    “Usage of Artificial Intelligence (AI) technology to aid the Remotely Piloted Aircraft System (RPAS) helps to get accurate imagery along with vital ground details, which as a result boosts the Search and Rescue operations. Since the search must be done quickly, real-time video processing is essential for survival. Our solution attempts to integrate image processing, more specifically, the You Only Look Once (YOLO) algorithm to detect humans in all environmental conditions.” Find the paper and full list of authors at the 2023 International Conference on Computing, Networking and Communications.

  • ‘Emergency Surgical Scheduling Model Based on Moth-Flame Optimization Algorithm’

    “In this paper, we propose an optimization approach based on an improved Moth Flame optimization (MFO) algorithm for solving emergency operating room scheduling problems. The purpose of the MFO is to minimize the maximum span of operations, ensuring patients receive their surgeries in a timely manner. This nature-inspired algorithm stimulates the moth’s special navigation method at night called transverse orientation. The moth uses the moonlight to sustain a fixed angle to the moon, therefore, guaranteeing a straight line.” Find the paper and full list of authors at the 2023 International Conference on Computing, Networking and Communications.

  • ‘Multi-Agent Reinforcement Learning Based on Representational Communication for Large-Scale Traffic Signal Control’

    “Traffic signal control (TSC) is a challenging problem within intelligent transportation systems and has been tackled using multi-agent reinforcement learning (MARL). … Many deep MARL communication frameworks proposed for TSC allow agents to communicate with all other agents at all times, which can add to the existing noise in the system and degrade overall performance. In this study, we propose a communication-based MARL framework for large-scale TSC. Our framework allows each agent to learn a communication policy that dictates ‘which’ part of the message is sent ‘to whom’.” Find the paper and full list of authors at IEEE Access.

  • ‘On Centralized Critics in Multi-Agent Reinforcement Learning’

    “Centralized Training for Decentralized Execution, where agents are trained offline in a centralized fashion and execute online in a decentralized manner, has become a popular approach in Multi-Agent Reinforcement Learning (MARL). In particular, it has become popular to develop actor-critic methods that train decentralized actors with a centralized critic … [however,] using a centralized critic in this context has yet to be sufficiently analyzed theoretically or empirically. In this paper, we therefore formally analyze centralized and decentralized critic approaches.” Find the paper and full list of authors at the Journal of Artificial Intelligence Research.

  • ‘Discovering Variable Binding Circuitry With Desiderata’

    “Recent work has shown that computation in language models may be human-understandable, with successful efforts to localize and intervene on both single-unit features and input-output circuits. Here, we introduce an approach which extends causal mediation experiments to automatically identify model components responsible for performing a specific subtask by solely specifying a set of desiderata, or causal attributes of the model components executing that subtask.” Find the paper and full list of authors at ArXiv.

  • ‘Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task’

    “Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question in a synthetic setting by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello.” Find the paper and full list of authors at Open Review.

  • ‘Mass-Editing Memory in a Transformer’

    “Recent work has shown exciting promise in updating large language models with new memories, so as to replace obsolete information or add specialized knowledge. However, this line of work is predominantly limited to updating single associations. We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by orders of magnitude.” Find the paper and full list of authors at Open Review.

  • ‘Environmental and Geographical Factors Structure Cauliflower Coral’s Algal Symbioses Across the Indo-Pacific’

    “The symbioses between corals and endosymbiotic dinoflagellates have been described as a flexible relationship whose dynamics could serve as a source of resilience for coral reef ecosystems. However, the factors that drive the establishment and maintenance of this co-evolutionary relationship remain unclear. We examined the environmental and geographical factors structuring dinoflagellate communities in a wide-ranging Indo-Pacific coral to begin to address this gap. … We provide further support for the hypothesis that coral’s Symbiodiniaceae communities could facilitate host resilience to thermal stress.” Find the paper and full list of authors at the Journal of Biogeography.

  • Northeastern University launches fully automated and virtualized O-RAN private 5G network with AI automation

    “The Institute for the Wireless Internet of Things (WIoT) at Northeastern University and its Open6G R&D Center announce the availability of the first production-ready private 5G network fully automated through Artificial Intelligence (AI). The system is built on open-source components enabling a fully virtualized, programmable O-RAN compliant network in a campus environment.”

  • ‘Semantics-Aware Dataset Discovery From Data Lakes With Contextualized Column-Based Representation Learning’

    “Dataset discovery from data lakes is essential in many real application scenarios. In this paper, we propose Starmie, an end-to-end framework for dataset discovery from data lakes (with table union search as the main use case). Our proposed framework features a contrastive learning method to train column encoders from pre-trained language models in a fully unsupervised manner. The column encoder of Starmie captures the rich contextual semantic information within tables by leveraging a contrastive multi-column pre-training strategy.” Find the paper and the full list of authors in the Proceedings of the VLDB Endowment.

  • ‘Explaining Dataset Changes for Semantic Data Versioning With Explain-Da-V’

    “In multi-user environments in which data science and analysis is collaborative, multiple versions of the same datasets are generated. While managing and storing data versions has received some attention in the research literature, the semantic nature of such changes has remained under-explored. In this work, we introduce Explain-Da-V, a framework aiming to explain changes between two given dataset versions. Explain-Da-V generates explanations that use data transformations to explain changes. We further introduce a set of measures that evaluate the validity, generalizability, and explainability of these explanations.” Find the paper and full list of authors in VLDB Endowment proceedings.

  • ‘Table Discovery in Data Lakes: State-of-the-Art and Future Directions’

    “Data discovery refers to a set of tasks that enable users and downstream applications to explore and gain insights from massive collections of data sources such as data lakes. In this tutorial, we will provide a comprehensive overview of the most recent table discovery techniques developed by the data management community. We will cover table understanding tasks such as domain discovery, table annotation, and table representation learning which help data lake systems capture semantics of tables.” Find the paper and the full list of authors in the Companion of the 2023 International Conference on Management of Data.

  • ‘SANTOS: Relationship-Based Semantic Table Union Search’

    “Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of the union search. Consequently, we introduce a new notion of unionability that considers relationships between columns, together with the semantics of columns, in a principled way.” Find the paper and full list of authors in the Proceedings of ACM on Management of Data.

  • ‘Direct Superpoints Matching for Fast and Robust Point Cloud Registration’

    “Although deep neural networks endow the downsampled superpoints with discriminative feature representations, directly matching them is usually not used alone in state-of-the-art methods. … Existing approaches use the coarse-to-fine strategy to propagate the superpoints correspondences to the point level, which are not discriminative enough and further necessitates the postprocessing refinement. In this paper, we present a simple yet effective approach to extract correspondences by directly matching superpoints using a global softmax layer in an end-to-end manner, which are used to determine the rigid transformation between the source and target point cloud.” Find the paper and full list of authors at…

  • ‘Do Machine Learning Models Produce TypeScript Types That Type Check? (Artifact)’

    “Type migration is the process of adding types to untyped code to gain assurance at compile time. TypeScript and other gradual type systems facilitate type migration by allowing programmers to start with imprecise types and gradually strengthen them. … In this paper we argue that accuracy can be misleading, and we should address a different question: can an automatic type migration tool produce code that passes the TypeScript type checker? We present TypeWeaver, a TypeScript type migration tool that can be used with an arbitrary type prediction model.” Find the paper and full list of authors at Dagstuhl Research Online…

  • ‘Online Learning in Multi-Unit Auctions’

    “We consider repeated multi-unit auctions with uniform pricing, which are widely used in practice for allocating goods such as carbon licenses. In each round, K identical units of a good are sold to a group of buyers that have valuations with diminishing marginal returns. The buyers submit bids for the units, and then a price p is set per unit so that all the units are sold. We consider two variants of the auction, where the price is set to the K-th highest bid and (K+1)-st highest bid, respectively.” Find the paper and full list of authors at ArXiv.
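
The two pricing rules named in the abstract can be illustrated with a minimal Python sketch of a single auction round. It is an illustration, not the paper's method: the function name `uniform_price_auction`, the `variant` labels, the tie-breaking by input order, and the fallback price of 0 when there are fewer bids than needed are all assumptions made here.

```python
from typing import Hashable, List, Tuple


def uniform_price_auction(
    bids: List[Tuple[Hashable, float]],
    k: int,
    variant: str = "kth",
) -> Tuple[float, List[Hashable]]:
    """Run one round of a uniform-price auction for k identical units.

    bids holds one (bidder, amount) entry per unit a bidder bids on.
    variant "kth" sets the price to the K-th highest bid;
    variant "k_plus_1" sets it to the (K+1)-st highest bid.
    Both labels are hypothetical names chosen for this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if variant not in ("kth", "k_plus_1"):
        raise ValueError("variant must be 'kth' or 'k_plus_1'")

    # Sort bids from highest to lowest; sorted() is stable, so ties
    # keep their input order (an assumed tie-breaking rule).
    ranked = sorted(bids, key=lambda bid: bid[1], reverse=True)

    # The k highest bids each win one unit.
    winners = [bidder for bidder, _ in ranked[:k]]

    # Zero-based index of the price-setting bid.
    index = k - 1 if variant == "kth" else k

    # Assumed fallback: price 0 if there is no bid at that rank.
    price = ranked[index][1] if index < len(ranked) else 0.0
    return price, winners


if __name__ == "__main__":
    example_bids = [("a", 10), ("a", 7), ("b", 9), ("b", 4), ("c", 6)]
    print(uniform_price_auction(example_bids, 3, "kth"))
    print(uniform_price_auction(example_bids, 3, "k_plus_1"))
```

In both variants every winner pays the same per-unit price; the only difference is whether that price is the lowest winning bid or the highest losing bid.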
