Glaucoma, cataract, age-related macular degeneration (AMD), and diabetic retinopathy (DR) are among the leading retinal diseases. Thus, there is an active effort to develop methods that automate the screening of retinal diseases. Many computer-aided diagnosis (CAD) systems have been developed and are widely used for ocular diseases. Recently, deep neural networks (DNNs) have been adopted in ophthalmology and applied to fundus images, achieving detection of retinal abnormalities from retinal images. There are essentially two approaches: the first is a hybrid method that employs image processing for preprocessing, feature extraction, and post-processing, with a DNN used only for classification; the second is a fully automated method in which the DNN performs both feature extraction and classification. Several DNN models and their variants, such as AlexNet, VGG, GoogleNet, Inception, U-Net, Residual Net (ResNet), and DenseNet, have been proposed for the detection of retinal abnormalities. The aim of this work is to provide the background and methodology to conduct a benchmarking analysis, including the computational aspects, of representative DNNs proposed in the state of the art for DR detection. For each DNN, different characteristics and performance indices (i.e., model complexity, computational complexity, inference time, memory use) as well as disease-detection performance (i.e., accuracy rate) must be taken into account to find the most accurate model. The public-domain datasets used for training and testing the DNN models, such as Kaggle, MESSIDOR, and EyePACS, are outlined and analyzed, in particular for DR detection.
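The complexity and timing indices mentioned above can be illustrated with a minimal sketch. The helper names (`dense_param_count`, `time_inference`) are hypothetical, not part of any benchmark suite cited here; the parameter count is exact only for fully connected layers and serves as a stand-in for general model complexity:

```python
import time

def dense_param_count(layer_sizes):
    # Model complexity for a fully connected network:
    # weights (n_in * n_out) plus biases (n_out) for each layer pair.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def time_inference(model_fn, sample, runs=10):
    # Average wall-clock inference time over several runs.
    start = time.perf_counter()
    for _ in range(runs):
        model_fn(sample)
    return (time.perf_counter() - start) / runs

# Example: a 784-100-10 classifier has 79,510 trainable parameters.
params = dense_param_count([784, 100, 10])
```

Memory use and FLOP counts would be gathered analogously, per model, before comparing accuracy on the DR datasets.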
This paper presents the real-time implementation of deep neural networks on smartphone platforms to detect and classify diabetic retinopathy from eye fundus images. This implementation extends a previously reported implementation by considering all five stages of diabetic retinopathy. Two deep neural networks are first trained, one to detect four stages and the other to further classify the last stage into two more stages, using fundus images from the EyePACS and APTOS datasets and transfer learning. Then, it is shown how these trained networks are turned into a smartphone app, in both Android and iOS versions, to process images captured by smartphone cameras in real-time. The app is designed in such a way that fundus images can be captured and processed in real-time by smartphones together with commercially available lens attachments. The developed real-time smartphone app provides a cost-effective and widely accessible approach for conducting first-pass diabetic retinopathy eye exams in remote clinics or areas with limited access to fundus cameras and ophthalmologists.
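The two-network cascade described above can be sketched as follows. The function and parameter names are hypothetical, and the models are represented as plain callables; the paper's actual networks are trained CNNs, not shown here:

```python
def cascade_classify(image, stage_model, refine_model):
    # Stage 1: coarse grading into four classes (0..3).
    # Classes 0-2 are final; class 3 is passed to the second network.
    coarse = stage_model(image)
    if coarse < 3:
        return coarse
    # Stage 2: the second network splits the last coarse class
    # into two finer DR grades (3 or 4), covering all five stages.
    return 3 + refine_model(image)
```

Splitting the problem this way lets each network specialize: the first separates the easier early stages, while the second focuses on the harder distinction between the two most severe grades.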
Optical Character Recognition (OCR) systems have been designed to operate on text contained in scanned documents and images. They include text detection and character recognition, in which characters are described and then classified. In the classification step, characters are identified according to their features or template descriptions; a given classifier is then employed to identify them. In this context, we have proposed the unified character descriptor (UCD) to represent characters based on their features, with matching employed to perform the classification. This recognition scheme achieves good OCR accuracy on homogeneous scanned documents; however, it cannot discriminate characters with high font variation and distortion. To improve recognition, classifiers based on neural networks can be used. The multilayer perceptron (MLP) ensures high recognition accuracy when robustly trained. Moreover, the convolutional neural network (CNN) is nowadays gaining a lot of popularity for its high performance. However, both CNN and MLP may suffer from the large amount of computation in the training phase. In this paper, we establish a comparison between MLP and CNN. We provide the MLP with the UCD descriptor and an appropriate network configuration. For the CNN, we employ the convolutional network designed for handwritten and machine-printed character recognition (LeNet-5) and adapt it to support 62 classes, covering both digits and letters. In addition, GPU parallelization is studied to speed up both the MLP and CNN classifiers. Based on our experiments, we demonstrate that the real-time CNN performs twice as well as the MLP when classifying characters.
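The 62-class label space (10 digits plus 26 uppercase and 26 lowercase letters) can be made concrete with a small sketch. The class ordering below is an assumption for illustration; the abstract only states the class count:

```python
import string

# 62-class label space: 10 digits + 26 uppercase + 26 lowercase letters.
# The ordering is an assumption; the paper only specifies the count.
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase

def label_to_index(ch):
    # Map a character label to the index of the network's output neuron.
    return CLASSES.index(ch)

def index_to_label(i):
    # Inverse mapping: output neuron index back to a character label.
    return CLASSES[i]
```

Adapting LeNet-5 to this setting amounts to resizing its final layer from 10 outputs to 62, one per entry of `CLASSES`.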
Several approaches have been proposed to extract text from scanned documents. However, text extraction in heterogeneous documents remains a real challenge. Indeed, text extraction in this context is a difficult task because of the variation of the text due to differences in size, style, and orientation, as well as the complexity of the document region background. Recently, we proposed the improved hybrid binarization based on the K-means method (I-HBK)5 to suitably extract text from heterogeneous documents. In this method, the Page Layout Analysis (PLA) algorithm, part of the Tesseract OCR engine, is used to identify text and image regions, after which our hybrid binarization is applied separately to each kind of region. On one side, gamma correction is employed before processing image regions; on the other side, binarization is performed directly on text regions. Then, a foreground and background color study is performed to correct inverted region colors. Finally, characters are located in the binarized regions using the PLA algorithm. In this work, we extend the integration of the PLA algorithm within the I-HBK method. In addition, to speed up the text and image separation step, we employ an efficient GPU acceleration. Through the performed experiments, we demonstrate the high F-measure accuracy of the PLA algorithm, reaching 95% on the LRDE dataset. In addition, we compare the sequential and parallel PLA versions. The obtained results give a speedup of 3.7x when comparing the parallel PLA implementation on a GPU GTX 660 to the CPU version.
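The gamma-correction-then-threshold flow for image regions can be sketched with a simplified 1-D two-means threshold. This is a toy stand-in for the I-HBK pipeline, operating on a flat list of grayscale values rather than real document regions; all function names are illustrative:

```python
def gamma_correct(pixels, gamma):
    # Power-law correction: normalize to [0,1], apply gamma, rescale to [0,255].
    return [round(255 * (p / 255) ** gamma) for p in pixels]

def two_means_threshold(pixels, iters=10):
    # 1-D k-means with k=2 on intensities; the threshold is the
    # midpoint between the foreground and background cluster means.
    lo, hi = min(pixels), max(pixels)
    for _ in range(iters):
        mid = (lo + hi) / 2
        fg = [p for p in pixels if p <= mid]
        bg = [p for p in pixels if p > mid]
        if not fg or not bg:
            break
        lo = sum(fg) / len(fg)
        hi = sum(bg) / len(bg)
    return (lo + hi) / 2

def binarize(pixels, gamma=1.0):
    # Image regions get gamma correction first; text regions would
    # skip it (gamma=1.0) and be thresholded directly.
    corrected = gamma_correct(pixels, gamma)
    t = two_means_threshold(corrected)
    return [0 if p <= t else 255 for p in corrected]
```

In the actual method, this clustering runs per region after PLA has separated text from images, and each pixel column of work is independent, which is what makes the GPU parallelization effective.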
PNG (Portable Network Graphics) is a lossless compression method for real-world pictures. Since its specification, it has continued to attract the interest of the image processing community. Indeed, PNG is an extensible file format for portable and well-compressed storage of raster images. In addition, it supports black-and-white (binary mask), grayscale, indexed-color, and truecolor images. Within the framework of the Demat+ project, which intends to propose a complete solution for the storage and retrieval of scanned documents, we address in this paper a hardware design to accelerate the PNG encoder for binary mask compression on FPGA. For this, an optimized architecture is proposed as part of a hybrid software and hardware co-operating system. For its evaluation, the newly designed PNG IP has been implemented on the ALTERA Arria II GX EP2AGX125EF35 FPGA. The experimental results show a good match between the achieved compression ratio, the computational cost, and the hardware resources used.
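The compression-ratio metric for a binary mask can be illustrated in software with `zlib`, since PNG's IDAT stream uses DEFLATE. This sketch packs one bit per pixel and omits PNG's per-scanline filtering and chunk framing, so it only approximates what the hardware encoder produces:

```python
import zlib

def pack_bits(mask_rows):
    # Pack a binary mask (rows of 0/1) into bytes, one bit per pixel,
    # padding the last byte of each row, as in PNG 1-bit grayscale.
    out = bytearray()
    for row in mask_rows:
        byte, nbits = 0, 0
        for bit in row:
            byte = (byte << 1) | bit
            nbits += 1
            if nbits == 8:
                out.append(byte)
                byte, nbits = 0, 0
        if nbits:
            out.append(byte << (8 - nbits))
    return bytes(out)

def compression_ratio(mask_rows):
    # Ratio of raw packed size to DEFLATE-compressed size
    # (level 9, as commonly used by PNG encoders).
    raw = pack_bits(mask_rows)
    compressed = zlib.compress(raw, 9)
    return len(raw) / len(compressed)
```

Document binary masks are highly redundant (long runs of background), which is why DEFLATE achieves large ratios on them and why the encoder is a worthwhile FPGA acceleration target.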