Reasoning through the picture: Machine learning between words and images

Researchers have introduced a new “cross-modal retrieval” method that operates across the “language and vision domains.” From their abstract: “To address this issue, we introduce an intuitive and interpretable model to learn a common embedding space for alignments between images and text descriptions. Specifically, our model first incorporates the semantic relationship information into visual and textual features by performing region or word relationship reasoning.”
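To make the quoted idea concrete, here is a minimal, hypothetical sketch of a common embedding space: region and word features each pass through a toy "relationship reasoning" step (a simple self-attention mix, an assumption standing in for the paper's actual module), get projected into a shared space, and are compared by cosine similarity. The feature sizes, projection matrices, and pooling are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, eps=1e-8):
    """Scale a vector to unit length."""
    return x / (np.linalg.norm(x) + eps)

def relation_reasoning(feats):
    """Toy self-attention: each feature attends to all the others,
    mixing in pairwise relationship information. This is a stand-in
    assumption, not the paper's exact reasoning module."""
    scores = feats @ feats.T / np.sqrt(feats.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ feats

# Hypothetical inputs: 4 image-region features and 6 word features.
img_regions = rng.normal(size=(4, 128))
txt_words = rng.normal(size=(6, 128))

# Random linear projections into a shared 64-d embedding space
# (in a real system these weights would be learned).
W_img = rng.normal(size=(128, 64)) * 0.1
W_txt = rng.normal(size=(128, 64)) * 0.1

# Reason over relationships, project, mean-pool, and normalize.
img_emb = l2_normalize((relation_reasoning(img_regions) @ W_img).mean(axis=0))
txt_emb = l2_normalize((relation_reasoning(txt_words) @ W_txt).mean(axis=0))

# Cosine similarity in the common space would drive retrieval ranking.
similarity = float(img_emb @ txt_emb)
print(f"cosine similarity: {similarity:.4f}")
```

In a trained system the projections would be optimized (for example with a contrastive or triplet loss) so that matching image-text pairs score higher than mismatched ones; here the random weights simply show where each piece of the pipeline sits.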

Read “Image-Text Embedding Learning via Visual and Textual Semantic Reasoning,” published in IEEE Transactions on Pattern Analysis and Machine Intelligence, for the full paper and complete list of authors.
