Is Deep Learning All You Need for Unsupervised Saliency Detection?

Pre-trained networks have recently achieved great success in computer vision. At present, most deep learning-based saliency detection methods, whether supervised or unsupervised, use pre-trained networks to extract features. However, we found that when unsupervised saliency detection is performed on grayscale biomedical images, pre-trained networks such as VGG cannot effectively extract salient features. We hypothesize that VGG is not able to learn salient information from grayscale biomedical images and that its performance depends greatly on RGB cues and on the quality of the training set. To test this hypothesis, we construct an adversarial data set featuring a low signal-to-noise ratio (SNR), low resolution, and rich salient objects, and conduct a series of probing experiments. Furthermore, to explore what VGG has learned, we visualize its intermediate feature maps. To the best of our knowledge, we are the first to investigate the reliability of deep learning methods for unsupervised saliency detection on grayscale biomedical images. Notably, our adversarial data set also provides a more robust evaluation of saliency detection and may serve as a standard benchmark in future work on this task.
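
As a rough illustration of two terms used above, the following sketch shows one common way to quantify SNR and the usual channel-replication step for passing a grayscale image to an RGB-pretrained network such as VGG. The function names, the power-ratio definition of SNR, the additive Gaussian noise model, and the channel replication are all assumptions made here for illustration; the abstract does not state how the data set was built, how SNR was measured, or how grayscale inputs were fed to the network.

```python
import math
import random


def snr_db(clean, noisy):
    """SNR in dB as 10*log10(signal power / noise power).

    `clean` and `noisy` are flat lists of pixel intensities of equal length.
    This power-ratio definition is an assumption; other definitions exist.
    """
    n = len(clean)
    signal_power = sum(c * c for c in clean) / n
    noise_power = sum((v - c) ** 2 for c, v in zip(clean, noisy)) / n
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def add_gaussian_noise(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise (hypothetical degradation model)."""
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, sigma) for p in pixels]


def gray_to_rgb(pixels):
    """Replicate the single gray channel three times.

    An RGB-pretrained network expects three input channels, so a grayscale
    image carries no color cues beyond this duplicated intensity.
    """
    return [(p, p, p) for p in pixels]


if __name__ == "__main__":
    image = [0.2, 0.8, 0.5, 0.9, 0.1, 0.7]  # toy grayscale "image"
    for sigma in (0.05, 0.2, 0.5):
        noisy = add_gaussian_noise(image, sigma)
        print(f"sigma={sigma}: SNR = {snr_db(image, noisy):.2f} dB")
    print(gray_to_rgb(image[:2]))
```

Increasing `sigma` lowers the measured SNR, which is the sense in which a data set can be made harder for a feature extractor that was pre-trained on clean RGB photographs.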