Article summary written by Chase Tenewitz for discussion on 12/01/16:
Paper: Sezgin M, Sankur B. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 2004; 13: 146-168.
The research publication, Survey over image thresholding techniques and quantitative performance evaluation, is an in-depth study of 40 selected image thresholding methods. To analyze these 40 techniques, Mehmet Sezgin and Bulent Sankur categorized the methods, expressed their formulas under a uniform notation, and compared them against a common set of performance criteria. The techniques were evaluated on two kinds of test data: images from nondestructive testing (NDT) applications, in which the properties of a material, component, or system are evaluated without damaging it, and document images, such as pages containing printed characters. The test set consisted of 40 NDT images and 40 document images.
To differentiate between the methods, the authors sorted them into six categories according to the information each method exploits: histogram shape-based, clustering-based, entropy-based, attribute similarity, spatial, and locally adaptive. Histogram shape-based methods choose the threshold from shape properties of the gray-level histogram. Clustering-based methods rely on cluster analysis, grouping the gray-level samples into two classes, background and foreground. Entropy-based methods examine the entropy of the distribution of the gray levels in an image. Attribute similarity methods select the threshold value according to a measure of quality or similarity between the thresholded binary image and the original gray-level image. Spatial methods exploit the dependency between neighboring pixels. Finally, locally adaptive methods set a threshold for each individual pixel based on statistics computed in the neighborhood of that pixel.
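As an illustration of the clustering-based category, the following is a minimal sketch of Otsu's method, one well-known clustering-based technique. It is written for this summary and is not taken from the paper; the function names otsu_threshold and binarize, the flat list of integer gray levels, and the default of 256 levels are all assumptions made for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    `pixels` is assumed to be a flat list of integers in [0, levels).
    """
    # Build the gray-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(g * count for g, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels, t):
    """Label pixels above the threshold as 1 and the rest as 0."""
    return [1 if p > t else 0 for p in pixels]


# Toy data: a dark cluster and a bright cluster.
sample = [10, 12, 11, 13, 200, 205, 198, 202]
t = otsu_threshold(sample)
mask = binarize(sample, t)
```

On the toy data, any threshold between the two clusters separates them equally well, so the sketch simply keeps the first level that reaches the maximum.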
To compare all 40 techniques on the NDT and document images, a set of performance criteria had to be defined. The authors defined their own because there is no single established procedure for comparing thresholding performance. In choosing how to assess the methods, they had to account for noise in the segmentation map as well as deformation of the characters. Ultimately, the authors used 5 criteria: misclassification error, which gives the percentage of incorrectly assigned pixels; edge mismatch, which marks differences between the edge map of the thresholded image and that of the gray-level image; region nonuniformity, which specifies the distinguishability of the foreground and background; relative foreground area error, which compares the area of the thresholded object with that of the object in the reference image; and shape distortion penalty via Hausdorff distance, which measures the shape similarity of the thresholded regions to the ground-truth shapes. To provide an average performance, the 5 criteria were combined in two different ways. In the first approach, each image was thresholded with a given algorithm and the 5 criteria were averaged for that image. These per-image averages were then combined over all images, which gave the overall performance of each thresholding technique. The second approach was rank averaging. Here, each image was thresholded with all methods, and the methods were ranked 1 through 40 for that image. The ranks were then averaged over all images for each thresholding technique and compared.
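The misclassification error and the rank-averaging step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the flat lists of 0/1 labels, the dictionary of per-image scores, and the tie handling (ties keep the input order of the methods) are all choices made for the example.

```python
def misclassification_error(truth, result):
    """Fraction of pixels whose label differs from the ground truth.

    Both arguments are assumed to be flat lists of 0/1 labels
    (0 = background, 1 = foreground) of equal length.
    """
    if len(truth) != len(result):
        raise ValueError("images must have the same number of pixels")
    wrong = sum(1 for t, r in zip(truth, result) if t != r)
    return wrong / len(truth)


def average_ranks(scores):
    """Average the per-image rank of each method.

    `scores` maps a method name to a list of scores, one per image,
    where a lower score is better. Rank 1 is the best method on an image.
    Ties are not treated specially: tied methods keep their input order.
    """
    methods = list(scores)
    n_images = len(scores[methods[0]])
    rank_sums = {m: 0 for m in methods}
    for i in range(n_images):
        ordered = sorted(methods, key=lambda m: scores[m][i])
        for rank, m in enumerate(ordered, start=1):
            rank_sums[m] += rank
    return {m: rank_sums[m] / n_images for m in methods}


# Hypothetical scores for three made-up methods on two images.
example_scores = {
    "method_a": [0.1, 0.3],
    "method_b": [0.2, 0.1],
    "method_c": [0.3, 0.2],
}
mean_ranks = average_ranks(example_scores)
```

With the made-up scores above, method_b has the best (lowest) average rank even though method_a wins on the first image, which shows how rank averaging rewards consistency across images.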
Under these criteria and ranking schemes, the results differed between the NDT and document applications; however, the Cluster_Kittler algorithm ranked first in both. For NDT, the top seven ranked algorithms all came from the clustering and entropy categories. For the document applications, the other top results came from the locally adaptive and shape categories. Although each application showed a clear trend, every method performed poorly in at least one case.