Reasoning through the picture: Machine learning between words and images

Researchers have introduced a new “cross-modal retrieval” method that operates across the “language and vision domains.” From their abstract: “To address this issue, we introduce an intuitive and interpretable model to learn a common embedding space for alignments between images and text descriptions. Specifically, our model first incorporates the semantic relationship information into visual and textual features by performing region or word relationship reasoning.”
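To make the quoted idea concrete, here is a minimal, hypothetical sketch of a common embedding space: region and word features each pass through a toy "relationship reasoning" step (a simple self-attention mix, an assumption standing in for the paper's actual module), get projected into a shared space, and are compared by cosine similarity. The feature sizes, projection matrices, and pooling are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, eps=1e-8):
    """Scale a vector to unit length."""
    return x / (np.linalg.norm(x) + eps)

def relation_reasoning(feats):
    """Toy self-attention: each feature attends to all the others,
    mixing in pairwise relationship information. This is a stand-in
    assumption, not the paper's exact reasoning module."""
    scores = feats @ feats.T / np.sqrt(feats.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ feats

# Hypothetical inputs: 4 image-region features and 6 word features.
img_regions = rng.normal(size=(4, 128))
txt_words = rng.normal(size=(6, 128))

# Random linear projections into a shared 64-d embedding space
# (in a real system these weights would be learned).
W_img = rng.normal(size=(128, 64)) * 0.1
W_txt = rng.normal(size=(128, 64)) * 0.1

# Reason over relationships, project, mean-pool, and normalize.
img_emb = l2_normalize((relation_reasoning(img_regions) @ W_img).mean(axis=0))
txt_emb = l2_normalize((relation_reasoning(txt_words) @ W_txt).mean(axis=0))

# Cosine similarity in the common space would drive retrieval ranking.
similarity = float(img_emb @ txt_emb)
print(f"cosine similarity: {similarity:.4f}")
```

In a trained system the projections would be optimized (for example with a contrastive or triplet loss) so that matching image-text pairs score higher than mismatched ones; here the random weights simply show where each piece of the pipeline sits.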

Read “Image-Text Embedding Learning via Visual and Textual Semantic Reasoning,” published in IEEE Transactions on Pattern Analysis and Machine Intelligence, for the full paper and complete list of authors.
