Abstract: Pre-trained vision-language models (VLMs) and language models (LMs) have recently garnered significant attention due to their remarkable ability to represent textual concepts, opening up new ...
Abstract: Recent real-time detection transformers (DETRs) have gained popularity due to their simplicity and efficiency. However, these detectors do not explicitly model object rotation, especially in ...