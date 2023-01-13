Instance segmentation aims to extract a pixel-wise mask for each object of interest, and it is useful in applications such as autonomous driving, robotic manipulation, image editing, and cell segmentation. The field has made significant strides in recent years thanks to the strong learning capacity of modern CNN and transformer architectures. However, most available instance segmentation models are trained in a fully supervised manner, which relies heavily on pixel-level annotations of the instance masks and therefore makes labeling expensive and time-consuming. On COCO, producing the polygon-based mask of an object typically takes 79.2 seconds, whereas annotating its bounding box takes only 7 seconds.

Box-supervised instance segmentation, which replaces pixel-wise mask labels with simple, label-efficient box annotations, has been proposed to address this issue. It has recently attracted considerable academic interest because it makes instance segmentation more accessible for new categories and scene types. Some existing techniques use auxiliary saliency data or post-processing methods such as MCG (multiscale combinatorial grouping) and CRF (conditional random fields) to generate pseudo labels, so that pixel-wise supervision becomes possible from box annotations alone. These approaches, however, require several independent stages, which complicates the training pipeline and adds more hyper-parameters to tune.
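
To make the cost gap concrete, here is a minimal Python sketch of the arithmetic behind the COCO figures quoted above, together with a small helper showing how much information a box keeps from a mask. The dataset size of 1,000 objects and the function names `annotation_hours` and `mask_to_box` are illustrative assumptions, not part of any method described here.

```python
# Per-object annotation times quoted in the text for COCO.
MASK_SECONDS = 79.2  # polygon-based mask
BOX_SECONDS = 7.0    # bounding box


def annotation_hours(num_objects, seconds_per_object):
    """Total labeling time in hours for num_objects objects."""
    return num_objects * seconds_per_object / 3600.0


def mask_to_box(mask):
    """Tight box (x_min, y_min, x_max, y_max) around the nonzero pixels
    of a 2D mask given as a list of rows. Returns None for an empty mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


if __name__ == "__main__":
    num_objects = 1000  # hypothetical dataset size, for illustration only
    print("mask labeling hours:", annotation_hours(num_objects, MASK_SECONDS))
    print("box labeling hours:", round(annotation_hours(num_objects, BOX_SECONDS), 2))
    print("cost ratio:", round(MASK_SECONDS / BOX_SECONDS, 1))

    # A box keeps only the extent of the object, not its shape.
    toy_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
    ]
    print("box for toy mask:", mask_to_box(toy_mask))
```

Under these figures, mask annotation costs roughly eleven times as much as box annotation per object. The helper also shows what is lost: many different masks map to the same box, and that ambiguity is what a box-supervised method has to resolve.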

