Abstract:
Zero-Shot Learning (ZSL) is a learning paradigm in which a model predicts, at test time, classes that it did not see during training. ZSL is effective when features learned from both images and text are mapped to one another using different techniques, enabling the model to categorize the images. ZSL relies on learning an intermediate semantic layer and an inference layer to predict the unseen classes during testing. ZSL also comes in several flavours of model learning. Feature learning can be broadly categorized into three areas: the semantic space, the visual space, and the ZSL model space. The semantic space focuses on aligning semantic attributes with class labels. The visual space extracts features from the seen images using pre-trained networks. Transfer learning in the semantic space and the visual space is discussed in detail in this paper. The ZSL model space focuses on learning the relationship between the visual space and the semantic space. In inductive ZSL, only the data of the source classes are accessible during the training stage. In transductive ZSL, by contrast, both the labelled source data and the unlabelled target data are accessible for training. Generalized zero-shot learning (GZSL) is an extension of ZSL in which the images to be predicted at test time come from both seen and unseen classes. This survey presents the hierarchy of techniques in each of the three areas, compares the different techniques, and discusses future trends in ZSL.
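As an illustration of how the three spaces fit together, the following is a minimal sketch of inductive ZSL, not a method taken from this survey. It assumes a hypothetical class-attribute matrix as the semantic space, synthetic vectors standing in for features from a pre-trained network as the visual space, and a ridge-regression mapping with cosine-similarity matching as one simple choice for the ZSL model space. All names, sizes, and data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Semantic space (assumed): one attribute vector per class.
attributes = np.array([
    [1.0, 0.0, 0.0],  # seen class 0
    [0.0, 1.0, 0.0],  # seen class 1
    [0.0, 0.0, 1.0],  # seen class 2
    [1.0, 1.0, 0.0],  # unseen class 3
    [0.0, 1.0, 1.0],  # unseen class 4
])
seen, unseen = np.array([0, 1, 2]), np.array([3, 4])

# Visual space (assumed): synthetic 8-d features that stand in for the
# output of a pre-trained network; a real system would extract these from images.
mixing = rng.normal(size=(3, 8))

def sample_features(labels, noise=0.05):
    clean = attributes[labels] @ mixing
    return clean + noise * rng.normal(size=clean.shape)

def fit_visual_to_semantic(X, S, reg=1.0):
    """Ridge regression from visual features (n, d) to attributes (n, k)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ S)

def predict(X, W, candidates):
    """Project features into attribute space and pick the candidate class
    whose attribute vector has the highest cosine similarity."""
    projected = X @ W
    projected = projected / (np.linalg.norm(projected, axis=1, keepdims=True) + 1e-12)
    protos = attributes[candidates]
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return candidates[np.argmax(projected @ protos.T, axis=1)]

# Inductive training: only labelled data from the seen (source) classes.
y_train = np.repeat(seen, 50)
W = fit_visual_to_semantic(sample_features(y_train), attributes[y_train])

# ZSL test: samples and candidate labels come from unseen classes only.
y_zsl = np.repeat(unseen, 20)
zsl_acc = np.mean(predict(sample_features(y_zsl), W, unseen) == y_zsl)

# GZSL test: samples and candidate labels span seen and unseen classes.
all_classes = np.arange(len(attributes))
y_gzsl = np.repeat(all_classes, 20)
gzsl_acc = np.mean(predict(sample_features(y_gzsl), W, all_classes) == y_gzsl)

print(f"ZSL accuracy: {zsl_acc:.2f}, GZSL accuracy: {gzsl_acc:.2f}")
```

The only difference between the ZSL and GZSL evaluations in this sketch is the candidate label set passed to `predict`. A transductive variant would additionally use the unlabelled target features during training, which is not shown here.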