Besides facial expression or gestures, human speech is still the main channel of communication in ordinary human life. In addition to speech content, this signal also contains additional source / human status information. Gender, age, but also the emotional state of man can be extracted from spoken speech. This research is focused on the classification of the emotional state of man, the stress in particular. Accordingly, we have created a speech database of emergency phone calls. The database contains recordings of the Integrated Rescue System (IRS) of 112 emergency line from Czech Republic. It was designed to detect the stress from the human voice. Due to the detection of stress from a neutral (resting) state, the database was divided into neutral speech and human speech in stress. The neutral subgroup consists of voice recordings of the IRS operator. The stress subgroup is made up of people in danger. We have deliberately selected events with great stressful stimuli such as car accident, domestic violence, situations close to death, and so on. The speech signal is then pre-processed and analyzed for the feature extraction. The feature vectors represents classifier input data. Old-fashioned classification methods such as Support Vector Machine (SVM) or k-Nearest Neighbors (k-NN) classifiers and new artificial intelligence methods such as Convolutional Neural Networks (CNN) are used to detect and recognize human stress. The applications of achieved results are broad: from phone services through Smart Health to security components analysis.
This article discusses the speaker identification for the improvement of the security communication between law enforcement units. The main task of this research was to develop the text-independent speaker identification system which can be used for real-time recognition. This system is designed for identification in the open set. It means that the unknown speaker can be anyone. Communication itself is secured, but we have to check the authorization of the communication parties. We have to decide if the unknown speaker is the authorized for the given action. The calls are recorded by IP telephony server and then these recordings are evaluate using classification If the system evaluates that the speaker is not authorized, it sends a warning message to the administrator. This message can detect, for example a stolen phone or other unusual situation. The administrator then performs the appropriate actions. Our novel proposal system uses multilayer neural network for classification and it consists of three layers (input layer, hidden layer, and output layer). A number of neurons in input layer corresponds with the length of speech features. Output layer then represents classified speakers. Artificial Neural Network classifies speech signal frame by frame, but the final decision is done over the complete record. This rule substantially increases accuracy of the classification. Input data for the neural network are a thirteen Mel-frequency cepstral coefficients, which describe the behavior of the vocal tract. These parameters are the most used for speaker recognition. Parameters for training, testing and validation were extracted from recordings of authorized users. Recording conditions for training data correspond with the real traffic of the system (sampling frequency, bit rate). The main benefit of the research is the system developed for text-independent speaker identification which is applied to secure communication between law enforcement units.
This article describes a system for evaluating the credibility of recordings with emotional character. Sound recordings form Czech language database for training and testing systems of speech emotion recognition. These systems are designed to detect human emotions in his voice. The emotional state of man is useful in the security forces and emergency call service. Man in action (soldier, police officer and firefighter) is often exposed to stress. Information about the emotional state (his voice) will help to dispatch to adapt control commands for procedure intervention. Call agents of emergency call service must recognize the mental state of the caller to adjust the mood of the conversation. In this case, the evaluation of the psychological state is the key factor for successful intervention. A quality database of sound recordings is essential for the creation of the mentioned systems. There are quality databases such as Berlin Database of Emotional Speech or Humaine. The actors have created these databases in an audio studio. It means that the recordings contain simulated emotions, not real. Our research aims at creating a database of the Czech emotional recordings of real human speech. Collecting sound samples to the database is only one of the tasks. Another one, no less important, is to evaluate the significance of recordings from the perspective of emotional states. The design of a methodology for evaluating emotional recordings credibility is described in this article. The results describe the advantages and applicability of the developed method.
This paper presents a method for detecting speech under stress using Self-Organizing Maps. Most people who are exposed to stressful situations can not adequately respond to stimuli. Army, police, and fire department occupy the largest part of the environment that are typical of an increased number of stressful situations. The role of men in action is controlled by the control center. Control commands should be adapted to the psychological state of a man in action. It is known that the psychological changes of the human body are also reflected physiologically, which consequently means the stress effected speech. Therefore, it is clear that the speech stress recognizing system is required in the security forces. One of the possible classifiers, which are popular for its flexibility, is a self-organizing map. It is one type of the artificial neural networks. Flexibility means independence classifier on the character of the input data. This feature is suitable for speech processing. Human Stress can be seen as a kind of emotional state. Mel-frequency cepstral coefficients, LPC coefficients, and prosody features were selected for input data. These coefficients were selected for their sensitivity to emotional changes. The calculation of the parameters was performed on speech recordings, which can be divided into two classes, namely the stress state recordings and normal state recordings. The benefit of the experiment is a method using SOM classifier for stress speech detection. Results showed the advantage of this method, which is input data flexibility.
KEYWORDS: Current controlled current source, Mobile communications, Networks, Computer simulations, Telecommunications, Wireless communications, Neodymium, Unmanned aerial vehicles, Modeling and simulation, Defense systems
A mobile ad hoc network is a collection of mobile nodes which communicate without a fixed backbone or centralized infrastructure. Due to the frequent mobility of nodes, routes connecting two distant nodes may change. Therefore, it is not possible to establish a priori fixed paths for message delivery through the network. Because of its importance, routing is the most studied problem in mobile ad hoc networks. In addition, if the Quality of Service (QoS) is demanded, one must guarantee the QoS not only over a single hop but over an entire wireless multi-hop path which may not be a trivial task. In turns, this requires the propagation of QoS information within the network. The key to the support of QoS reporting is QoS routing, which provides path QoS information at each source. To support QoS for real-time traffic one needs to know not only minimum delay on the path to the destination but also the bandwidth available on it. Therefore, throughput, end-to-end delay, and routing overhead are traditional performance metrics used to evaluate the performance of routing protocol. To obtain additional information about the link, most of quality-link metrics are based on calculation of the lost probabilities of links by broadcasting probe packets. In this paper, we address the problem of including multiple routing metrics in existing routing packets that are broadcasted through the network. We evaluate the efficiency of such approach with modified version of DSDV routing protocols in ns-3 simulator.
This article discusses the impact of multilayer neural network parameters for speaker identification. The main task of speaker identification is to find a specific person in the known set of speakers. It means that the voice of an unknown speaker (wanted person) belongs to a group of reference speakers from the voice database. One of the requests was to develop the text-independent system, which means to classify wanted person regardless of content and language. Multilayer neural network has been used for speaker identification in this research. Artificial neural network (ANN) needs to set parameters like activation function of neurons, steepness of activation functions, learning rate, the maximum number of iterations and a number of neurons in the hidden and output layers. ANN accuracy and validation time are directly influenced by the parameter settings. Different roles require different settings. Identification accuracy and ANN validation time were evaluated with the same input data but different parameter settings. The goal was to find parameters for the neural network with the highest precision and shortest validation time. Input data of neural networks are a Mel-frequency cepstral coefficients (MFCC). These parameters describe the properties of the vocal tract. Audio samples were recorded for all speakers in a laboratory environment. Training, testing and validation data set were split into 70, 15 and 15 %. The result of the research described in this article is different parameter setting for the multilayer neural network for four speakers.
KEYWORDS: Global system for mobile communications, Networks, Quality measurement, Multimedia, Open source software, Mobile communications, Molybdenum, Databases, Oxygen, Algorithm development
The paper is focused on the building ad-hoc GSM network based on open source software and low-cost hardware. The created Base Transmission Station can be deployed and put into operation in a few minutes in a required area to ensure private communication between connected GSM mobile terminals. The convergence between BTS station and the other networks is possible through IP network. The paper tries to define connection parameters to provide sufficient quality of voice service between the GSM network and IP Multimedia Subsystem. The paper brings practical results of voice call quality measurement between users inside BTS station mobile network and users inside IP Multimedia Subsystem network. The calls are simulated by low-cost embedded solution for speech quality measurement in GSM network. This tool is under development of our laboratory and allows automatic speech quality measurement of any GSM or UMTS mobile network. The Perceptual Evaluation of Speech Quality method is used to get final comparable results. The communication between BTS station and connected networks has to be secured against the interception from the third party. The influence of the securing method for quality of service is presented in detail. Paper, apart from the quality of service measurement section, describes technical requirements for successful interconnection between BTS and IMS networks. The authentication, authorization and accounting methods in roaming between BTS and IMS system are presented too.
It is well known that Quantum Key Distribution (QKD) can be used with the highest level of security for distribution of the secret key, which is further used for symmetrical encryption. B92 is one of the oldest QKD protocols. It uses only two non-orthogonal states, each one coding for one bit-value. It is much faster and simpler when compared to its predecessors, but with the idealized maximum efficiencies of 25% over the quantum channel. B92 consists of several phases in which initial key is significantly reduced: secret key exchange, extraction of the raw key (sifting), error rate estimation, key reconciliation and privacy amplification. QKD communication is performed over two channels: the quantum channel and the classical public channel. In order to prevent a man-in-the-middle attack and modification of messages on the public channel, authentication of exchanged values must be performed. We used Wegman-Carter authentication because it describes an upper bound for needed symmetric authentication key. We explained the reduction of the initial key in each of QKD phases.
Impact of changes in blood pressure and pulse from human speech is disclosed in this article. The symptoms of increased physical activity are pulse, systolic and diastolic pressure. There are many methods of measuring and indicating these parameters. The measurements must be carried out using devices which are not used in everyday life. In most cases, the measurement of blood pressure and pulse following health problems or other adverse feelings. Nowadays, research teams are trying to design and implement modern methods in ordinary human activities. The main objective of the proposal is to reduce the delay between detecting the adverse pressure and to the mentioned warning signs and feelings. Common and frequent activity of man is speaking, while it is known that the function of the vocal tract can be affected by the change in heart activity. Therefore, it can be a useful parameter for detecting physiological changes. A method for detecting human physiological changes by speech processing and artificial neural network classification is described in this article. The pulse and blood pressure changes was induced by physical exercises in this experiment. The set of measured subjects was formed by ten healthy volunteers of both sexes. None of the subjects was a professional athlete. The process of the experiment was divided into phases before, during and after physical training. Pulse, systolic, diastolic pressure was measured and voice activity was recorded after each of them. The results of this experiment describe a method for detecting increased cardiac activity from human speech using artificial neural network.
Emotional states of humans and their impact on physiological and neurological characteristics are discussed in this
paper. This problem is the goal of many teams who have dealt with this topic. Nowadays, it is necessary to increase the
accuracy of methods for obtaining information about correlations between emotional state and physiological changes. To
be able to record these changes, we focused on two majority emotional states. Studied subjects were psychologically
stimulated to neutral - calm and then to the stress state. Electrocardiography, Electroencephalography and blood pressure
represented neurological and physiological samples that were collected during patient’s stimulated conditions. Speech
activity was recording during the patient was reading selected text. Feature extraction was calculated by speech
processing operations. Classifier based on Gaussian Mixture Model was trained and tested using Mel-Frequency Cepstral
Coefficients extracted from the patient's speech. All measurements were performed in a chamber with electromagnetic
compatibility. The article discusses a method for determining the influence of stress emotional state on the human and
his physiological and neurological changes.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
INSTITUTIONAL Select your institution to access the SPIE Digital Library.
PERSONAL Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.