T-Rex Object Counting Model: Accurate & Interactive

What is T-Rex?

T-Rex is a pioneering interactive object counting model. Its primary function is to detect and count objects in an image, a task it accomplishes with notable precision and flexibility.

The model's distinctive features include:

Open-Set Capability: Unlike many of its counterparts, T-Rex is not limited to predefined categories. It has the remarkable ability to count any object, offering a broad range of applications.
Visual Prompting: Users can directly influence the counting process by providing visual examples. This feature enhances the model's accuracy and adaptability to specific tasks.
Intuitive Visual Feedback: T-Rex employs a detection-based approach, which includes visual feedback like detected boxes. This allows users to easily verify and assess the accuracy of the results.
Interactive Nature: The model's interactive design lets users participate in the counting process, offering opportunities to correct errors and refine results.

*T-Rex is an object counting model distinguished by four key features: it is detection-based, visually promptable, interactive, and open-set in nature [1].*

How does T-Rex work?

T-Rex incorporates various workflows to facilitate interactive object counting and detection:

Positive-only prompt mode: In this mode, T-Rex identifies and counts similar objects from a single click or drawn box. Users can add further prompts for more complex scenarios, such as densely packed or small objects.
Positive with negative prompt mode: This mode is particularly useful for correcting false detections. Users can place negative prompts on falsely detected objects to improve the accuracy of the results.
Cross-image prompt mode: This mode enables counting across different images. By prompting on a reference image, T-Rex can detect objects in other target images. This feature is especially useful for automatic annotation, although it is still under development.
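
To make the difference between the three modes concrete, here is a minimal Python sketch of how a counting request could be represented. It is not the T-Rex API: the `CountingRequest` class, its field names, and the `mode()` helper are hypothetical, and prompts are limited to boxes for simplicity (T-Rex also accepts clicks).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in pixels; an assumed box convention.
Box = Tuple[float, float, float, float]


@dataclass
class CountingRequest:
    """Hypothetical container for one interactive counting request."""

    target_image: str                      # image in which objects are counted
    positive_boxes: List[Box]              # examples of the object to count
    negative_boxes: List[Box] = field(default_factory=list)  # false detections to suppress
    reference_image: Optional[str] = None  # image the prompts were drawn on, if different

    def mode(self) -> str:
        """Name the workflow implied by the prompts that were supplied."""
        if not self.positive_boxes:
            raise ValueError("at least one positive prompt is required")
        if self.reference_image is not None and self.reference_image != self.target_image:
            return "cross-image"
        if self.negative_boxes:
            return "positive-with-negative"
        return "positive-only"
```

A request with only positive boxes maps to the positive-only mode, adding a negative box switches it to positive-with-negative, and setting a different `reference_image` makes it a cross-image request.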

*T-Rex provides three major interactive workflows, designed to be versatile and applicable across a wide range of real-world scenarios. [1]*

Overview of the T-Rex model

T-Rex functions as a detection-based model with three main components:

Image Encoder: This extracts image features from both the target image and optionally a reference image. In cases where there is no separate reference image, the target image itself serves as the reference.
Prompt Encoder: Utilizing user-drawn boxes or points as prompts on the reference image, this encoder extracts the encoded visual prompt from the reference image feature.
Box Decoder: This component combines the target image feature with the encoded visual prompt, resulting in detected boxes along with their confidence scores. A predetermined score threshold is then applied to filter these boxes, with the remaining ones being counted to determine the final object count.
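
The final counting step described above (filter the decoded boxes by score, then count what remains) can be sketched in a few lines of Python. This illustrates only the thresholding logic, not the T-Rex implementation: the `Detection` type, the `count_objects` function, and the default threshold of 0.3 are assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One decoded box with its confidence score (hypothetical structure)."""

    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    score: float                            # confidence in [0, 1]


def count_objects(
    detections: List[Detection], score_threshold: float = 0.3
) -> Tuple[int, List[Detection]]:
    """Keep boxes whose score reaches the threshold and return their count.

    The threshold value is an assumed placeholder; the source only states
    that a predetermined score threshold is applied.
    """
    kept = [d for d in detections if d.score >= score_threshold]
    return len(kept), kept
```

Returning the kept boxes alongside the count mirrors the detection-based design: the boxes are what a user inspects to verify the number.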

Performance evaluation of T-Rex

T-Rex's performance stands out for its combination of zero-shot counting accuracy and adaptability across domains.

Exceptional zero-shot counting and adaptability

T-Rex demonstrates outstanding proficiency in zero-shot counting, setting a new benchmark in this area. Its adaptability across various domains further enhances its appeal. In the comparison reported in [1], T-Rex outperforms other leading models such as Grounding DINO and GPT-4V.

This performance is attributed to its zero-shot counting abilities: guided only by the visual prompt, T-Rex can accurately count objects it never encountered in its training datasets.

Benchmark setting in diverse domains

The adaptability of T-Rex is particularly noteworthy. Unlike models that are constrained to specific categories or settings, T-Rex can be applied to an extensive range of domains. Whether it's in complex industrial settings, crowded urban landscapes, or intricate biological environments, T-Rex maintains its high accuracy and reliability.

Mean Absolute Error (MAE) for detecting and counting objects in an image; lower is better. [1]
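
For readers unfamiliar with the metric, MAE is the average of the absolute differences between the predicted and the true object counts over a set of images. The short Python sketch below shows the calculation; the function name and the sample counts are made up for illustration and are not taken from [1].

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one image is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Illustrative counts for three images (invented values).
predicted_counts = [10, 7, 23]
true_counts = [12, 7, 20]
mae = mean_absolute_error(predicted_counts, true_counts)
```

With these sample values the per-image errors are 2, 0, and 3, so the MAE is 5/3, or about 1.67 objects per image.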

Applications of T-Rex

T-Rex's versatility allows it to be applied across various domains, including but not limited to:

Agriculture, Industry, and Livestock: For counting and monitoring purposes.
Biology and Medicine: Useful in research and diagnostics.
Retail, Electronics, and Transportation: Helpful in inventory and logistics management.
Human-related Applications: Can be used for crowd counting and monitoring.

As an open-set object detector, T-Rex is exceptionally useful for automatic annotation, particularly in dense and overlapping scenes. Its zero-shot detection capability makes it a powerful tool in scenarios where predefined object categories are either unavailable or insufficient.

Conclusion and Future Perspectives

The ability of T-Rex to adapt and perform with high accuracy in zero-shot counting scenarios marks a significant leap forward in the field of object detection and counting. Its success heralds a new era of intelligent, adaptable, and user-friendly machine learning models that can cater to a wide array of industries and applications.

Integration with Ikomia API

An exciting development in this realm is the integration of such models with the Ikomia API, which serves as a gateway to advanced models like Grounding DINO and SAM. For those keen to try either one, the following guides show how to run them through the Ikomia API:

Guide to Segment Anything Model (SAM) →

Explore Grounding DINO zero-shot detection model →
