KEYWORDS: Data modeling, Deep learning, Visual process modeling, Process modeling, Object detection, Machine vision, Project management, Image classification, Artificial intelligence
Deep learning (DL) for machine vision tasks has seen enormous growth and success in recent years. However, the complexity of the DL model development workflow and the prevalence of code-based solutions, which are powerful for research but have limited scalability, pose a challenge in managing a large number of DL projects and datasets. In this paper, we propose an integrated platform named MIMOS Machine Vision Package (MiMVP) Deep Learning. MiMVP provides a single, desktop-based visual interface to manage DL projects and guide the user through the workflow of developing and training DL models, from preparing data, through training and tuning models, to comparing and analyzing results. By streamlining the DL workflow, the platform can enhance the efficiency of DL model development.
With the advance of science and technology, there are more and more vehicles on the road, and emergency vehicles such as ambulances have a hard time bypassing busy lanes. This paper proposes an ambulance siren tracking system based on the self-organizing map (SOM) algorithm. Using SOM techniques, the location and direction of an ambulance siren can be tracked. A support vector machine (SVM) is then used to classify the sound of ambulance sirens. To improve the classification of ambulance sirens, pre-processing steps such as a bandpass filter are adopted, with 600 Hz and 1600 Hz as the lower and upper cutoff frequencies, respectively. For this system, we developed a mobile application, named Siren Tracking system with Emergency Support (STES), to allow drivers on the road to track ambulance sirens. To examine the system performance, we subjected the system to several real-time scenarios using St John ambulance sirens. Based on the experimental results, the system is shown to reliably localize the ambulance.
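As a rough illustration of the pre-processing and classification stages described above, the sketch below applies a 600-1600 Hz Butterworth bandpass filter followed by an SVM classifier. The frame-level features (band energy and zero-crossing rate) and all variable names are illustrative assumptions, and the SOM-based localization stage is not shown.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.svm import SVC


def bandpass(signal, fs, low=600.0, high=1600.0, order=4):
    # 600-1600 Hz band matches the cutoff frequencies described above
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


def frame_features(signal, frame_len=2048):
    # Hypothetical features: per-frame energy and zero-crossing rate
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, frame_len)]
    feats = []
    for f in frames:
        energy = float(np.sum(f ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(f))) > 0))
        feats.append([energy, zcr])
    return np.array(feats)


# X_train / y_train would come from labelled siren and non-siren recordings:
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# predictions = clf.predict(frame_features(bandpass(test_audio, fs)))
```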
Nowadays, the increasing number of careless drivers on the road has resulted in more accident cases. Drivers' decisions and behaviors are key to maintaining road safety. However, many drivers tend to perform secondary tasks such as playing with their phone, adjusting the radio, eating or drinking, answering phone calls, or, in the worst case, reading phone texts. In previous efforts, many approaches have been introduced to recognize and capture potential problems related to careless driving inside the car. In this project, the work focuses on the driver's secondary task recognition using an action detection method. A camera is set up inside the car to capture the driver's actions in real time. The video undergoes a human pose estimator framework that extracts human pose frames without background. Inside this framework, raw images are fed into a convolutional neural network (CNN) that computes activation maps for human key-points. The key-point coordinates are then computed from the output activation maps and drawn on a new blank frame. These frames are then fed into a classification CNN for action classification. If an action performed by the driver is considered a dangerous secondary task, an alert is given. The proposed framework achieves a higher speed than existing frameworks when run on a Raspberry Pi CPU. It is able to detect 10 different driver actions, of which only talking to passengers and normal driving will not trigger the buzzer to alert the driver.
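As an illustrative sketch of the key-point step described above (not the authors' implementation), the following converts hypothetical per-joint activation maps into pixel coordinates and draws them on a blank frame; the heatmap-producing CNN and the downstream classification CNN are assumed and not shown.

```python
import cv2
import numpy as np


def heatmaps_to_keypoints(heatmaps, frame_shape):
    """Convert per-joint activation maps of shape (K, h, w) to (x, y) pixel coordinates."""
    k, h, w = heatmaps.shape
    coords = []
    for joint in range(k):
        idx = np.argmax(heatmaps[joint])
        y, x = divmod(idx, w)
        # Rescale from heatmap resolution to frame resolution
        coords.append((int(x * frame_shape[1] / w), int(y * frame_shape[0] / h)))
    return coords


def draw_pose_frame(coords, frame_shape):
    """Draw key-points on a blank frame, discarding the background as described above."""
    blank = np.zeros((frame_shape[0], frame_shape[1], 3), dtype=np.uint8)
    for (x, y) in coords:
        cv2.circle(blank, (x, y), 4, (0, 255, 0), -1)
    return blank
```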
Nowadays, many people adopt a pet not to guard the house but as a companion. However, many pet owners may not be able to spend time taking care of their pets, especially when they are away on a business trip. Unlike stray animals, a stay-at-home pet has little or no survival skill for finding food outside and could not survive on its own. Therefore, pet owners usually ask their friends to care for the pet, or look for a real-time food dispensing system that can feed the pet at scheduled times with monitoring functions. This paper proposes a monitoring system with an automatic food dispenser and several other useful functions to assist pet owners. The proposed system uses a Raspberry Pi that controls and interconnects several subsystems: (1) a real-time monitoring subsystem, (2) a door managing subsystem with software support to ease the owner's burden, and (3) a food dispensing subsystem. The wireless food dispensing subsystem allows the owner to feed their pet automatically, according to a schedule, or manually, according to the owner's preference. The real-time monitoring subsystem allows the owner to monitor their pet through a camera and check whether stray animals try to enter the cage. Finally, the door managing subsystem allows the owner to lock or unlock the door, giving the pet some freedom. This system proves useful in reducing the chance of pets being abandoned.
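A minimal sketch of scheduled dispensing on a Raspberry Pi is given below; the GPIO pin, feeding times, and motor-drive duration are hypothetical assumptions, and the monitoring and door subsystems are not shown.

```python
import time
from datetime import datetime

import RPi.GPIO as GPIO  # available on the Raspberry Pi

DISPENSER_PIN = 18                 # hypothetical GPIO pin driving the dispenser motor
FEED_TIMES = {"08:00", "18:00"}    # example feeding schedule (HH:MM)

GPIO.setmode(GPIO.BCM)
GPIO.setup(DISPENSER_PIN, GPIO.OUT)


def dispense(seconds=2):
    """Run the dispenser motor briefly to release one portion of food."""
    GPIO.output(DISPENSER_PIN, GPIO.HIGH)
    time.sleep(seconds)
    GPIO.output(DISPENSER_PIN, GPIO.LOW)


try:
    last_fired = None
    while True:
        now = datetime.now().strftime("%H:%M")
        if now in FEED_TIMES and now != last_fired:
            dispense()
            last_fired = now
        time.sleep(10)  # poll every 10 s; manual feeding could simply call dispense()
finally:
    GPIO.cleanup()
```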
Action recognition is one of the popular research areas in computer vision because it can be applied to solve many problems, especially in security surveillance, behavior analysis, healthcare, and so on. Some well-known convolutional neural networks (CNNs) for action classification using 3D convolution are C3D, I3D, and R(2+1)D. These 3D CNNs assume that the spatial and temporal dimensions of motion are uniform, so the 3D filters are uniformly shaped. However, motion can follow any path, and a uniformly shaped filter might not capture non-uniform spatial motion, which limits classification performance. To address this problem, we incorporate a 3D deformable filter in a C3D network for action classification. The deformable convolution adds offsets to the regular grid sampling locations of the standard convolution, resulting in non-uniform sampling locations. We also investigate the performance of the network when the 3D deformable convolution is applied in different layers, as well as the effect of different dilation sizes of the 3D deformable filter. The UCF101 dataset is used in the experiments. From our experiments, applying the deformable convolution in a lower layer yields better results compared to other layers. Our experiments show that placing the deformable convolution in Conv1a achieves an accuracy of 48.50%.
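For illustration only: torchvision ships a 2D deformable convolution (DeformConv2d), whereas the 3D variant used in this work would require a custom operator. The sketch below shows the usual offset-prediction pattern in 2D, which a 3D deformable filter extends along the temporal axis; the block and variable names are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d  # 2D analogue; a 3D version needs a custom op


class DeformableBlock2D(nn.Module):
    """Offsets are predicted by a small convolution and passed to the deformable
    convolution, so sampling locations deviate from the regular grid (2D illustration)."""

    def __init__(self, in_ch, out_ch, k=3, dilation=1):
        super().__init__()
        pad = dilation * (k // 2)
        # Two offsets (dx, dy) per kernel element
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k,
                                     padding=pad, dilation=dilation)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k,
                                        padding=pad, dilation=dilation)

    def forward(self, x):
        offsets = self.offset_conv(x)
        return self.deform_conv(x, offsets)


# x = torch.randn(1, 3, 112, 112)       # one frame; C3D itself consumes 5D clip tensors
# y = DeformableBlock2D(3, 64)(x)       # a 3D version of this block would replace Conv1a
```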
Driver distraction is one of the leading causes of road accidents. To reduce driver distraction, a real-time gesture recognition system for ADAS aims to simplify and enhance human-computer interaction by implementing a vision-based technique that allows the driver to operate the vehicle infotainment functions using natural mid-air hand gestures. Therefore, in this paper, we propose a system to track and recognize static human hand gestures. The system is separated into five steps: image acquisition, background subtraction, hand segmentation, feature extraction, and gesture recognition. First, the image frame is captured and resized. Thereafter, the region of interest is determined to minimize the required processing and increase performance. Next, the foreground model is extracted and converted to the HSV color space. Then, a skin filter is applied to extract the skin region. In the next step, the image is transformed into a binary image by thresholding and smoothed by applying morphological transformations. Contour detection and approximation are then performed. Finally, hand features such as the hand center, palm radius, fingertips, defect points, hull area, hand area, and finger angles are extracted to build the gesture recognition model. Experimental results show that the system achieves 86.25% recognition accuracy in a room environment and 80% recognition accuracy in a car environment. In comparison, the average classification accuracy is 92.5% and 90% for the room and car environments, respectively.
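A minimal OpenCV sketch of the segmentation and feature steps is shown below; the HSV skin thresholds and the assumption that the largest contour is the hand are illustrative, and the background subtraction and final recognition model are omitted.

```python
import cv2
import numpy as np

# Hypothetical HSV skin range; real thresholds depend on lighting and camera
SKIN_LOW = np.array([0, 30, 60], dtype=np.uint8)
SKIN_HIGH = np.array([20, 150, 255], dtype=np.uint8)


def segment_hand(frame_bgr):
    """Skin filter -> binary mask -> morphological smoothing -> largest contour, hull, defects."""
    roi = cv2.resize(frame_bgr, (320, 240))            # resized region of interest
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOW, SKIN_HIGH)       # skin-colored pixels
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)          # assume the largest contour is the hand
    hull = cv2.convexHull(hand)
    hull_idx = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull_idx) if len(hand) > 3 else None
    # These quantities feed the features listed above (hull area, hand area, defect points, ...)
    return {"contour": hand,
            "hull_area": cv2.contourArea(hull),
            "hand_area": cv2.contourArea(hand),
            "defects": defects}
```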
Fisheries is one of the few disciplines in biology where data collection continues to rely on the removal and destruction of the objects being studied. This practice has come under increasing scrutiny in recent years as research projects have been terminated due to denied permits. In some instances, research budgets have had to absorb the cost of purchasing quota for the fish captured, with difficulty in publishing results due to animal welfare concerns. In this paper, we propose a non-extractive sampling system to localize fish in underwater images obtained at aquaculture farms, which suffer from several issues: 1) low luminance, which can significantly hinder fish detection, 2) severe water turbidity due to the mass of fish caged in one area, and 3) the protective enclosure designed for the camera, from which fish shy away. Images acquired in highly turbid waters are difficult to recover due to 1) the fish feeding process, in which fish feed adds noise to the already turbid water, and 2) the existing healthy biodiversity at the aquaculture farms. In this work, we investigate the performance of Faster R-CNN in localizing fish in this highly turbid dataset under different base network architectures. Different base networks, such as MobileNet, MobileNetV2, DenseNet, and ResNet, are employed. Experimental results show that MobileNetV2, with a learning rate of 0.01, 500 iterations, 15 epochs, and 87.52% classification accuracy, is the most feasible to deploy in a resource-constrained environment, with about 6.7M parameters requiring 27.2 MB of storage. These findings will be useful when the Faster R-CNN is embedded in equipment placed underwater for monitoring purposes.
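As a hedged sketch of pairing Faster R-CNN with a MobileNetV2 base network, the snippet below follows the standard torchvision recipe rather than the authors' exact configuration; the two-class setup (background and fish) and the commented optimizer are assumptions.

```python
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# MobileNetV2 feature extractor as the base network of Faster R-CNN
backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
backbone.out_channels = 1280  # channels of the last MobileNetV2 feature map

anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = MultiScaleRoIAlign(featmap_names=["0"], output_size=7, sampling_ratio=2)

# Two classes: background and fish
model = FasterRCNN(backbone, num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

# params = [p for p in model.parameters() if p.requires_grad]
# optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)  # lr 0.01 as reported above
```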
Aquaculture farms provide a solution to the overfishing phenomenon. However, maintaining large-scale farms manually requires going through hours of video footage to collect important information about the fish. This footage is usually taken by an underwater camera affixed in various ways within the farm's cage and is manually analyzed by human operators. Since human operators are subject to biological limitations, issues such as a wandering attention span or human error may occur. This paper proposes a non-intrusive and automated way of extracting meaningful information, such as the number of fish, from underwater video footage using image processing techniques. Experimental results show that the system achieves 74.59% accuracy in correctly counting fish.
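A simple sketch of one possible counting approach, based on background subtraction and contour filtering, is given below; the MOG2 parameters and the minimum blob area are illustrative assumptions rather than the authors' method.

```python
import cv2


def count_fish_in_video(path, min_area=400):
    """Rough per-frame fish count via background subtraction and contour filtering."""
    cap = cv2.VideoCapture(path)
    bg = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25,
                                            detectShadows=False)
    counts = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = bg.apply(frame)            # moving fish against the static cage background
        mask = cv2.medianBlur(mask, 5)    # suppress speckle noise from turbidity
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Keep only blobs large enough to plausibly be fish; min_area is a tuning parameter
        counts.append(sum(1 for c in contours if cv2.contourArea(c) > min_area))
    cap.release()
    return counts
```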