‘Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection’

“We present PARQ – a multi-view 3D object detector with transformer and pixel-aligned recurrent queries. Unlike previous works that use learnable features or only encode 3D point positions as queries in the decoder, PARQ leverages appearance-enhanced queries initialized from reference points in 3D space and updates their 3D location with recurrent cross-attention operations. Incorporating pixel-aligned features and cross attention enables the model to encode the necessary 3D-to-2D correspondences and capture global contextual information of the input images.”
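The abstract describes the decoder only at a high level. The NumPy sketch below is a toy illustration under our own assumptions, not PARQ's implementation. We assume pinhole cameras with known intrinsics `K` and world-to-camera extrinsics `T_cam_from_world`. "Pixel-aligned" features are taken to mean bilinear samples at each 3D point's projection, averaged over the views that see it. One single-head attention over all flattened image tokens stands in for the cross-attention. Random matrices `W_pos` and `W_offset` replace learned layers. Every function name here (`project_points`, `bilinear_sample`, `pixel_aligned_features`, `cross_attention`, `refine_queries`) is hypothetical.

```python
import numpy as np

# Hypothetical toy sketch of pixel-aligned recurrent queries; NOT the authors' PARQ code.

def project_points(points, K, T_cam_from_world):
    """Project (N, 3) world points with a pinhole camera; return (N, 2) pixels and a front mask."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # homogeneous (N, 4)
    pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]                      # camera frame (N, 3)
    depth = pts_cam[:, 2]
    in_front = depth > 1e-6
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / np.where(in_front, depth, 1.0)[:, None]  # safe divide for points behind camera
    return uv, in_front

def bilinear_sample(feat, uv):
    """Bilinearly sample an (H, W, C) map at (N, 2) pixel coords; return (N, C) and an inside mask."""
    H, W, _ = feat.shape
    u, v = uv[:, 0], uv[:, 1]
    inside = (u >= 0) & (u <= W - 1) & (v >= 0) & (v <= H - 1)
    u, v = np.clip(u, 0, W - 1), np.clip(v, 0, H - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, W - 1), np.minimum(v0 + 1, H - 1)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bottom = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    return top * (1 - dv) + bottom * dv, inside

def pixel_aligned_features(points, feats, Ks, Ts):
    """Average the features sampled at each point's projection over the views that see it."""
    acc = np.zeros((len(points), feats[0].shape[-1]))
    count = np.zeros((len(points), 1))
    for feat, K, T in zip(feats, Ks, Ts):
        uv, in_front = project_points(points, K, T)
        sampled, inside = bilinear_sample(feat, uv)
        valid = (in_front & inside)[:, None]
        acc += sampled * valid
        count += valid
    return acc / np.maximum(count, 1.0)  # unseen points get a zero feature

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: (N, C) queries over (M, C) keys/values."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

def refine_queries(ref_points, feats, Ks, Ts, W_pos, W_offset, num_iters=3):
    """Recurrently rebuild appearance-enhanced queries, attend globally, and move the 3D points."""
    tokens = np.concatenate([f.reshape(-1, f.shape[-1]) for f in feats], axis=0)  # all views
    points = ref_points.copy()
    history = [points.copy()]
    for _ in range(num_iters):
        appearance = pixel_aligned_features(points, feats, Ks, Ts)  # (N, C)
        queries = appearance + points @ W_pos                        # add a 3D position term
        context = cross_attention(queries, tokens, tokens)          # global image context
        points = points + np.tanh(context @ W_offset)                # bounded 3D offset update
        history.append(points.copy())
    return points, history

rng = np.random.default_rng(0)
H, W, C, num_views, num_queries = 32, 48, 16, 2, 8
feats = [rng.standard_normal((H, W, C)) for _ in range(num_views)]
K = np.array([[40.0, 0.0, W / 2], [0.0, 40.0, H / 2], [0.0, 0.0, 1.0]])
Ks = [K] * num_views
Ts = [np.eye(4) for _ in range(num_views)]
Ts[1][:3, 3] = [-0.5, 0.0, 0.0]  # second camera centred at x = +0.5
ref_points = rng.uniform([-0.5, -0.5, 2.0], [0.5, 0.5, 4.0], size=(num_queries, 3))
W_pos = 0.1 * rng.standard_normal((3, C))     # stand-in for a learned positional encoder
W_offset = 0.1 * rng.standard_normal((C, 3))  # stand-in for a learned offset head
final, history = refine_queries(ref_points, feats, Ks, Ts, W_pos, W_offset)
print(final.shape, len(history))  # -> (8, 3) 4
```

In this sketch the features are re-sampled at every iteration, so each moved point re-reads the images where it now projects. That is the sense in which the queries stay pixel-aligned while being updated recurrently. The tanh bound on the offsets and the simple per-view averaging are our illustrative choices; the paper may fuse views and parameterize updates differently.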

Find the paper and full list of authors at arXiv.
