We present analysis methods that may be used to geolocate emitters using one or more moving receivers. While
some of the methods we present may apply to a broader class of signals, our primary interest is locating and
tracking ships from short pulsed transmissions, such as the maritime Automatic Identification System (AIS).
The AIS signal is difficult to process and track since the pulse duration is only 25 milliseconds, and the pulses
may only be transmitted every six to ten seconds. In this article, we address several problems including accurate
TDOA and FDOA estimation methods that do not require searching a two-dimensional surface, such as the
cross-ambiguity surface. As an example, we apply these methods to identify and process AIS pulses from a
single emitter, making it possible to geolocate the AIS signal using a single moving receiver.
We present analysis methods that may be used to geolocate emitters using one or more moving receivers. While
some of the methods we present may apply to a broader class of signals, our primary interest is locating and
tracking ships from short pulsed transmissions, such as the maritime Automatic Identification System (AIS).
The AIS signal is difficult to process and track since the pulse duration is only 25 milliseconds, and the pulses
may only be transmitted every six to ten seconds. Several fundamental problems are addressed, including
demodulation of AIS/GMSK signals, verification of the emitter location, accurate frequency and delay estimation
and identification of pulse trains from the same emitter. In particular, we present several new correlation methods,
including cross-cross correlation that greatly improves correlation accuracy over conventional methods and cross-
TDOA and cross-FDOA functions that make it possible to estimate time and frequency delay without the need of
computing a two-dimensional cross-ambiguity surface. By isolating pulses from the same emitter and accurately
tracking the received signal frequency, we are able to accurately estimate the emitter location from the received
Doppler characteristics.
We address the problem of determining the source location of an electromagnetic signal from the signal received by one or more moving receivers. We base our process on cross-spectral methods that were developed in the early 1980s for analysis and demodulation/despreading of communication and spread spectrum signals and were later applied to speech processing and speech enhancement. In this article, we expand the concept of robust polynomial tracking, which we demonstrate may be used to solve for the emitter location in closed form. This is accomplished by generating and solving a system of equations representing curves, each of which passes through the emitter location.
We present methods for accurately estimating and tracking instantaneous frequency and relative time delay of narrowband signal components. These methods are applied to the problem of estimating the location of an emitter from the signal(s) received by one or more receivers. Both instantaneous frequency estimation and time delay estimation are based on previously reported cross-spectral methods that have been applied successfully to a variety of signal processing problems. Accurate geolocation is accomplished by matching the Doppler characteristics of the received signal to Doppler characteristics estimated from the known emitter motion and possible emitter locations.
We present a new cross-spectral variation of the cross-ambiguity function (CAF), the CS-CAF, and demonstrate its use in obtaining improved frequency difference of arrival (FDOA) estimates. Unlike the conventional CAF process, which treats the FDOA of two signals received by moving receivers as approximately constant over a short observation time, the CS-CAF models FDOA as a slowly varying continuous function. Under the CS-CAF model, we apply cross-spectral estimation methods to estimate and track the instantaneous FDOA of the received signals. This has two important advantages: the cross-spectral frequency estimation methods are extremely accurate, and by modeling the FDOA as a continuous function of time, we resolve the issue of assigning an event time to the estimated FDOA. In addition, by recovering an FDOA component from the received signals, we may apply Lagrange interpolation to track the instantaneous phase of the FDOA component, enabling an even more accurate estimate of the instantaneous FDOA.
KEYWORDS: Receivers, Doppler effect, Transmitters, Signal processing, Fourier transforms, Strontium, Statistical analysis, Signal detection, Signal analyzers, Defense and security
We present two methods to accurately estimate the location of an emitter from the received signal. The first
of these methods is a variation of the standard cross-ambiguity function (CAF) for which we introduce a cross-
spectral frequency estimation technique to replace the conventional methods based on the power spectrum. We
demonstrate the use of the CSCAF in estimating the source location of an RF emission. The CSCAF is computed
as the product of the complex-valued CAF and the conjugate of the time-delayed CAF. The magnitude of the
CSCAF is the conventional CAF energy surface, and the argument of the CSCAF is the unquantized frequency
difference of arrival (FDOA) computed as the phase of the CAF differentiated with respect to time. The
advantage of the CSCAF is that it provides an extremely accurate estimate of FDOA. We demonstrate the use
of the CSCAF in providing emitter location estimates that are superior to those provided by the conventional
CAF. The second method presented provides a precision geolocation of the emitter from the signal received by
a single moving receiver. This method matches the Doppler characteristics of the received signal to the Doppler
characteristics estimated from the geometry of the transmitter and receiver. Both the CSCAF and the single
receiver methods are enabled by cross-spectral frequency estimation methods that provide extremely accurate
frequency estimation and tracking.
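The CSCAF construction above can be illustrated numerically. The sketch below is our own minimal interpretation, not the authors' code: `caf` evaluates a complex cross-ambiguity surface on a delay/Doppler grid, and `cscaf` multiplies the CAF of a time-advanced segment by the conjugate of the CAF of the original segment, so that its argument encodes an unquantized FDOA. All names and the segment offset `k` are illustrative choices.

```python
import numpy as np

def caf(x, y, delays, dopplers, fs):
    """Complex cross-ambiguity surface on a grid of integer-sample
    delays and trial Doppler shifts (Hz)."""
    n = np.arange(len(x))
    surf = np.empty((len(delays), len(dopplers)), dtype=complex)
    for i, d in enumerate(delays):
        yd = np.roll(y, d)
        for j, f in enumerate(dopplers):
            # correlate x against a delayed, frequency-shifted copy of y
            surf[i, j] = np.vdot(yd * np.exp(2j * np.pi * f * n / fs), x)
    return surf

def cscaf(x, y, delays, dopplers, fs, k=16):
    """CAF of a k-sample-advanced segment times the conjugate of the
    CAF of the original segment.  The magnitude is essentially the
    usual CAF energy surface; the argument, divided by 2*pi*k/fs,
    is an unquantized FDOA estimate."""
    c0 = caf(x[:-k], y[:-k], delays, dopplers, fs)
    c1 = caf(x[k:], y[k:], delays, dopplers, fs)
    return c1 * np.conj(c0)
```

For two tones offset by 5.2 Hz, the coarse CAF peak on a 1 Hz Doppler grid lands at 5 Hz, while the CSCAF argument at that cell recovers 5.2 Hz without refining the grid; the estimate is unambiguous only while |FDOA| stays below fs/(2k).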
Presented is a method to blindly estimate the location of a transmitter from the signal observed by a single
moving receiver. This process is based on the fact that the observed Doppler characteristics are essentially
uniquely determined by the transmission frequency, the location of the transmitter, and the time-varying flight
path of the receiver. We accurately estimate the instantaneous frequency of the received signal and blindly
calculate the transmitted frequency from the received signal and the instantaneous position and velocity of the
receiver. The transmitter location is then estimated by minimizing a cost function representing the difference
between the Doppler characteristics calculated from the relative geometry of the transmitter and receiver and
the Doppler characteristics estimated from the received signal. The method has the advantages that only one
receiving antenna is required and the emitter may be located with no a priori knowledge of the emitter location
or frequency. In addition, the process is essentially independent of the flight path of the receiver.
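The cost-function matching described above can be sketched as follows. This is our own simplified two-dimensional construction, not the authors' implementation: for each candidate location, the unknown transmit frequency is eliminated by a per-candidate least-squares fit, and the candidate minimizing the residual Doppler mismatch is selected. The grid search, names, and constant `C` are illustrative assumptions.

```python
import numpy as np

C = 3.0e8  # propagation speed (m/s)

def doppler_profile(emitter, rx_pos, rx_vel):
    """Received/transmitted frequency ratio along the receiver track."""
    los = emitter - rx_pos                      # line of sight, per epoch
    rng = np.linalg.norm(los, axis=1)
    v_r = np.sum(rx_vel * los, axis=1) / rng    # closing speed (m/s)
    return 1.0 + v_r / C

def locate(f_meas, rx_pos, rx_vel, candidates):
    """Grid search: the unknown transmit frequency is removed by a
    least-squares fit per candidate, so no a priori knowledge of the
    emitter frequency is needed; the residual mismatch is the cost."""
    best, best_cost = None, np.inf
    for e in candidates:
        g = doppler_profile(e, rx_pos, rx_vel)
        f_t = np.dot(g, f_meas) / np.dot(g, g)  # blind carrier estimate
        cost = np.sum((f_meas - f_t * g) ** 2)
        if cost < best_cost:
            best, best_cost = e, cost
    return best
```

Note that a straight flight path leaves a mirror ambiguity about the track; the candidate grid in the test below is restricted to one side, which is one simple way the ambiguity can be resolved.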
KEYWORDS: Doppler effect, Signal processing, Receivers, Radar, Signal to noise ratio, Transmitters, Monte Carlo methods, Motion estimation, Correlation function, Fourier transforms
There are many applications for which it is important to resolve the location and motion of a target.
For the static situation in which a target transmitter and several receivers are not in motion, the target may be
completely resolved by triangulation using relative time delays estimated by several receivers at known locations.
These delays are normally estimated from the location of peaks in the magnitude of the cross-correlation function.
For active radars, a transmitted signal is reflected by the target, and range and radial velocity are estimated
from the delay and Doppler effects on the received signal. In this process, Doppler effects are conventionally
modeled as a shift in frequency, and delay and Doppler are estimated from a cross-ambiguity function (CAF)
in which delay and Doppler frequency shift are assumed to be independent and approximately constant. Delay
and Doppler are jointly estimated as the location of the peak magnitude of the CAF plane. We present methods
for accurately estimating delay for the static case and delay and the time-varying Doppler effects for non-static
models, such as the radar model.
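The static delay estimate described above, peak-picking the magnitude of the cross-correlation, can be sketched in a few lines (illustrative code with our own naming):

```python
import numpy as np

def tdoa_samples(x, y):
    """Integer-sample delay of y relative to x, from the peak magnitude
    of the full cross-correlation (lags run -(N-1) .. N-1)."""
    r = np.correlate(y, x, mode="full")
    lags = np.arange(-(len(x) - 1), len(y))
    return lags[np.argmax(np.abs(r))]
```

With several receivers at known locations, delays obtained this way give the range differences used for triangulation; the sketch stops at the delay estimate itself.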
Automatic Identification Systems (AIS) are commonly used in navigation for collision avoidance, and AIS
signals (GMSK modulation) contain a vessel's identity, position, course and speed - information which is
also vital in safeguarding U.S. ports. AIS systems employ Self Organizing Time Division Multiple Access
(SOTDMA) regions in which users broadcast in dedicated time slots to prevent AIS collisions. However,
AIS signals broadcast from outside a SOTDMA region may collide with those originating inside, and
demodulation in co-channel interference is desirable. In this article we compare two methods for performing
such demodulation. The first method involves Laurent's Amplitude Modulated Pulse (AMP) decomposition
of constant amplitude binary phase modulated signals. Kaleh has demonstrated that this method is
highly accurate for demodulating a single GMSK signal in additive Gaussian white noise (AWGN). Here
we evaluate the performance of this Laurent-Kaleh method for demodulating a target AIS signal through a
collision with an interfering AIS signal. We also introduce a second, far simpler demodulation method
which employs a set of filters matched to tribit states and phases of GMSK signals. We compute the bit
error rate (BER) for these two methods in demodulating a target AIS signal through a collision with
another AIS signal, both as a function of the signal-to-interference ratio (SIR) and as a function of the carrier
frequency difference (CFD) between the two signals. Our experiments show that there is no outstanding
advantage for either of these methods over a wide range of SIR and CFD values. However, the matched filter
approach is conceptually much simpler, easier to motivate and implement, while the Laurent-Kaleh
method involves a highly complex and non-intuitive signal decomposition.
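For readers unfamiliar with the signal format, a minimal GMSK baseband modulator is sketched below. This is the standard textbook construction, not either demodulation method compared in the paper; parameter names are ours. AIS transmits GMSK with BT = 0.4 and modulation index 0.5.

```python
import numpy as np

def gmsk_baseband(bits, sps=8, bt=0.4, span=3):
    """GMSK complex envelope: NRZ symbols -> Gaussian frequency pulse
    -> integrated phase with modulation index h = 0.5."""
    nrz = np.repeat(2.0 * np.asarray(bits, dtype=float) - 1.0, sps) / sps
    t = np.arange(-span * sps, span * sps + 1) / float(sps)  # bit periods
    sigma = np.sqrt(np.log(2.0)) / (2.0 * np.pi * bt)        # Gaussian width
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()                  # unit area: +/- pi/2 phase per bit
    freq = np.convolve(nrz, g, mode="same")
    phase = 0.5 * np.pi * np.cumsum(freq)                    # h = 0.5
    return np.exp(1j * phase)
```

A run of identical bits advances the phase by pi/2 per bit in steady state, and the envelope is exactly constant, which is what makes Laurent's AMP decomposition of the phase-modulated signal nontrivial.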
KEYWORDS: Signal processing, Demodulation, Modulation, Phase shift keying, Electronic filtering, Artificial intelligence, Global system for mobile communications, Interference (communication), Fused deposition modeling, Frequency shift keying
Gaussian Minimum Shift Keying (GMSK) is a modulation method used by GSM phone networks and the
Automatic Identification System (AIS) used by commercial ships. Typically these systems transmit data in
short bursts and accommodate a large number of users by time, frequency and power management. Co-channel
interference is not a problem unless the system is heavily loaded. This system load is a function of the density of
users and the footprint of the receiver. We consider the problem of demodulation of burst GMSK signals in the
presence of severe noise and co-channel interference. We further examine the problem of signal detection and
blind estimation and tracking of all of the parameters required in the demodulation process. These parameters
include carrier frequency, carrier phase, baud rate, baud phase, modulation index and the start and duration of
the signal.
KEYWORDS: Electromagnetism, Signal to noise ratio, Error analysis, Receivers, Transmitters, Super resolution, Statistical analysis, Signal processing, Doppler effect, Motion models
For wide-band transmission, geolocation modeling using the wide-band cross-ambiguity function (WBCAF) is preferable
to conventional CAF modeling, which assumes that the transmitted signal is essentially a sinusoid. We compare
the accuracy of two super-resolution techniques for joint estimation of the time-scale (TS) and TDOA
parameters in the WBCAF geolocation model. Assuming a complex-valued signal representation, both techniques
exploit the fact that the maximum value of the magnitude of the WBCAF is attained when the WBCAF is real-valued.
The first technique enhances a known joint estimation method based on sinc interpolation and 2-D Newton root-finding
by (1) extending the original algorithm to handle complex-valued signals, and (2) reformulating the original algorithm
to estimate the difference in radial velocities of the receivers (DV) rather than time scale, which avoids machine
precision problems encountered with the original method. The second technique makes a rough estimate of TDOA on
the sampling lattice by peak-picking the real part of the cross-correlation function of the received signals. Then, by
interpolating the phase of the WBCAF, it obtains a root of the phase in the vicinity of this correlation peak, which
provides a highly accurate TDOA estimate. TDOA estimates found in this way are differentiated in time to obtain DV
estimates. We evaluate both super-resolution techniques applied to simulated received electromagnetic signals which
are linear combinations of complex sinusoids having randomly generated amplitudes, phases, TS, and TDOA. Over a
wide SNR range, TDOA estimates found with the enhanced sinc/Newton technique are at least an order of magnitude
more accurate than those found with conventional CAF, and the phase interpolated TDOA estimates are 3-4 times
more accurate than those found with the enhanced sinc/Newton technique. In the 0-10 dB SNR range, TS estimates
found with the enhanced sinc/Newton technique are a little more accurate than those found with phase interpolation.
Moreover, the TS estimate errors observed with both super-resolution techniques are too small for a CAF-type grid
search to realize in comparable time.
KEYWORDS: Doppler effect, Receivers, Signal processing, Correlation function, Super resolution, Signal to noise ratio, Error analysis, Electromagnetism, Acoustics, Radar
Presented is a super resolution method for estimating the relative time delay of transmitted and received signals.
The method is applied to the problem of accurately estimating both delay and Doppler effects from transmitted
and received signals where the transmitter and receiver are moving with respect to each other. Unlike conventional
methods based on the cross-ambiguity function (CAF), we use only the cross-correlation function and estimate
only delay with enough accuracy that accurate scale estimates may be obtained from the delay function. While
CAF processes are two-dimensional and are based on a linear approximation of the Doppler process, the method
presented here represents a one-dimensional solution based on the exact model of the Doppler process. While
we address the problem in the context of resolving both delay and Doppler, the method may be used to obtain
super resolution estimates of correlation delay in the case that delay is constant.
The conventional cross-ambiguity function (CAF) process assumes that the transmitted signal is a sinusoid
having slowly varying complex modulation, and models a received signal as a delayed version of the transmitted
signal, Doppler shifted by the dominant frequency. For wide-band transmitted signals, it is more accurate to
model a received signal as a time-scaled version of the transmitted signal, combined with a time delay, and
wide-band cross-ambiguity models are well-known. We provide derivations of time-dependent wide-band cross-ambiguity
functions appropriate for estimating radar target range and velocity, and time-difference of arrival
(TDOA) and differential receiver velocity (DV) for geolocation. We demonstrate through simulations that for
wide-band transmission, these scale CAF (SCAF) models are significantly more accurate than CAF for estimating
target range and velocity, TDOA and DV. In these applications, it is critical that the SCAF surface be evaluated
in real-time, and we provide a method for fast computation of the scale correlation in SCAF, using only the
discrete Fourier transform (DFT). SCAF estimates of delay and scale are computed on a discrete lattice, which
may not provide sufficient resolution. To address this issue we further demonstrate simple methods, based on
the DFT and phase differentiation of the time-dependent SCAF surface, by which super resolution of delay and
scale may be achieved.
In the basic correlation process, a sequence of time-lag-indexed correlation coefficients is computed as the inner
or dot product of segments of two signals. The time-lag(s) for which the magnitude of the correlation coefficient
sequence is maximized is the estimated relative time delay of the two signals. For discrete sampled signals, the
delay estimated in this manner is quantized with the same relative accuracy as the clock used in sampling the
signals. In addition, the correlation coefficients are real if the input signals are real. There have been many
methods proposed to estimate signal delay with greater accuracy than the sample interval of the digitizer clock,
with some success. These methods include interpolation of the correlation coefficients, estimation of the signal
delay from the group delay function, and beam forming techniques, such as the MUSIC algorithm. For spectral
estimation, techniques based on phase differentiation have been popular, but these techniques have apparently
not been applied to the correlation problem. We propose a phase-based delay estimation method (PBDEM)
based on the phase of the correlation function that provides a significant improvement of the accuracy of time
delay estimation. In the process, the standard correlation function is first calculated. A time lag error function
is then calculated from the correlation phase and is used to interpolate the correlation function. The signal
delay is shown to be accurately estimated as the zero crossing of the correlation phase near the index of the
peak correlation magnitude. This process is nearly as fast as the conventional correlation function on which it is
based. For real valued signals, a simple modification is provided, which results in the same correlation accuracy
as is obtained for complex valued signals.
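A sketch of the idea follows. This is our own reduction of the process described above, not the published algorithm: form analytic signals (the modification needed for real-valued inputs), peak-pick the cross-correlation magnitude, then take the zero crossing of the correlation phase near that peak as the sub-sample delay. Names are illustrative.

```python
import numpy as np

def analytic(x):
    """Analytic signal of a real sequence via a one-sided spectrum."""
    X = np.fft.fft(x)
    h = np.zeros(len(x))
    h[0] = 1.0
    h[1:(len(x) + 1) // 2] = 2.0
    if len(x) % 2 == 0:
        h[len(x) // 2] = 1.0
    return np.fft.ifft(X * h)

def phase_delay(x, y):
    """Sub-sample delay of y relative to x: coarse peak of |r|, then a
    linear interpolation of the zero crossing of the phase of r."""
    xa = analytic(np.asarray(x, dtype=float))
    ya = analytic(np.asarray(y, dtype=float))
    r = np.correlate(ya, xa, mode="full")
    lags = np.arange(-(len(x) - 1), len(y))
    p = np.angle(r)
    k = int(np.argmax(np.abs(r)))
    k0 = k if p[k] <= 0.0 else k - 1   # bracket the phase zero crossing
    return lags[k0] + p[k0] / (p[k0] - p[k0 + 1])
```

For narrowband content, the correlation phase near the peak is approximately linear in lag with slope set by the center frequency, so its zero crossing is an unquantized delay estimate.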
We present a brief review of time-varying spectral analysis, and we discuss the applicability of the methods to the case of x-ray bursts, where it is known that there are time-varying frequency components. A preliminary analysis is performed on experimental x-ray burst data. Two methods are presented to estimate the instantaneous frequency, and both methods give approximately the same results.
We argue that when individuals
enunciate sounds which are perceived to be the same, the sounds have the
commonality that their spectra can be transformed into a new domain which
results in identical spectra except for a speaker dependent translation
factor. We call the transformation function the speech scale. The speech scale
is experimentally obtained. In this paper we explore the mathematical issues
involved and obtain various criteria for when a transformation to a new domain
results in a speaker independent transform.
We address the problem of efficient resolution, detection and estimation of weak tones in a potentially massive
amount of data. Our goal is to produce a relatively small reduced data set characterizing the signals in the environment
in time and frequency. The requirements for this problem are that the process must be computationally
efficient, high gain and able to resolve signals and efficiently compress the signal information into a form that
may be easily displayed and further processed. To meet these requirements, we propose a concentrated peak representation
(CPR) in which the spectral energy is concentrated in spectral peaks, and only the magnitudes and
locations of the peaks are retained. We base our process on the cross spectral representation we have previously
applied to other problems. In selecting this method, we have considered other representations and estimation
methods such as the Wigner distribution and Welch's method. We compare our method to these methods. The
spectral estimation method we propose is a variation of Welch's method and the cross-power spectral (CPS)
estimator, which was first applied to signal estimation and detection in the mid-1980s. The CPS algorithm and
the method we present here are based on the principles first described by Kodera et al.
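The cross-spectral product at the heart of this representation can be sketched as follows. This is our own minimal version, not the CPS algorithm itself: phase differentiation along time is implemented as the product of each STFT frame with the conjugate of the previous frame, giving a refined per-bin instantaneous frequency. Window choice, sizes, and names are illustrative.

```python
import numpy as np

def stft(x, nfft=256, hop=64):
    """Hann-windowed short time Fourier transform, frames in rows."""
    w = np.hanning(nfft)
    return np.array([np.fft.rfft(w * x[i:i + nfft])
                     for i in range(0, len(x) - nfft + 1, hop)])

def cross_spectral_if(x, fs, nfft=256, hop=64):
    """Per-bin instantaneous-frequency estimates from the phase of the
    cross-spectral product of adjacent STFT frames."""
    S = stft(x, nfft, hop)
    fbin = np.fft.rfftfreq(nfft, d=1.0 / fs)     # bin centre frequencies
    dt = hop / fs
    cs = S[1:] * np.conj(S[:-1])                 # cross-spectral product
    # residual phase advance relative to each bin centre, wrapped to +/- pi
    dev = np.angle(cs * np.exp(-2j * np.pi * fbin * dt))
    return fbin + dev / (2.0 * np.pi * dt)
```

For a 50.3 Hz tone sampled at 1 kHz, the refined frequency at the nearest bin (centre 50.78 Hz) reads 50.3 Hz, far inside the 3.9 Hz bin spacing; the deviation is unambiguous only while a component stays within fs/(2*hop) of its bin centre.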
KEYWORDS: Signal to noise ratio, Interference (communication), Time-frequency analysis, Computer simulations, Signal detection, Stars, Defense and security, Fourier transforms, Numerical simulations, Signal processing
We argue that the standard definition of signal to noise ratio may be misleading when the signal
or the noise is nonstationary. We introduce a new measure that we call the local signal-to-noise ratio
(LSNR), which is well suited to nonstationary situations. The advantage of our
measure is that it is a local property, unlike the standard SNR, which is a single number computed over
the entire duration of the signal. We simulated a number of cases to show that our measure is more
indicative of the noise and signal level for nonstationary situations.
KEYWORDS: Signal processing, Signal detection, Picosecond phenomena, Signal to noise ratio, Time-frequency analysis, Interference (communication), Fourier transforms, Signal analyzers, Environmental sensing, Data modeling
We address the problem of efficient resolution, detection and estimation of weak tones in a potentially massive
amount of data. Our goal is to produce a relatively small reduced data set characterizing the signals in the
environment in time and frequency. The requirements for this problem are that the process must be computationally
efficient, high gain and able to resolve signals and efficiently compress the signal information into a form
that may be easily displayed and further processed. We base our process on the cross spectral representation we
have previously applied to other problems. In selecting this method, we have considered other representations
and estimation methods such as the Wigner distribution and Welch's method. We compare our method to these
methods. The spectral estimation method we propose is a variation of Welch's method and the cross-power
spectral (CPS) estimator, which was first applied to signal estimation and detection in the mid-1980s. The CPS
algorithm and the method we present here are based on the principles first described by Kodera et al.,
now frequently called the reassignment principle.
A fundamental issue of speech is that when different individuals enunciate the same perceived sound, the corresponding spectra are different; but since we perceive them to be the same, there must be a commonality that the ear extracts to recognize the same perceived sound. In previous publications we have established this commonality and have argued that the "spectra of sounds made by different individuals and perceived to be the same can be transformed into each other by a universal warping function". We call the warping function the speech scale, and in previous works we have obtained it experimentally from actual speech. In this paper we give the mathematical equation that allows one to obtain the transformation function so that the transformation results in identical warped spectra except for a translation factor.
We present a method for computing the theoretically exact estimate of the instantaneous frequency of a signal from local values of its short time Fourier transform under the assumption that the complex logarithm of the signal is a polynomial in time. We apply the method to the problem of estimating and separating non-stationary components of a multi-component signal. Signal estimation and separation are based on a linear time-frequency (TF) model in which the value of the signal at each time is distributed in frequency. This is a significant departure from the conventional nonlinear model in which signal energy is distributed in time and frequency. We further demonstrate by a simple example that the instantaneous frequency estimated by the higher-order method is significantly better than that obtained with previously used first-order methods.
KEYWORDS: Time-frequency analysis, Interference (communication), Signal to noise ratio, Signal detection, Sensors, Defense and security, Fourier transforms, Signal processing, Detection and tracking algorithms, Stars
When one calculates a time-frequency distribution of white noise there sometimes appear transients of short duration. Superficially, these transients appear to be real signals but they are not. This comes about by random chance in the noise and also because particular types of distributions do not resolve components well in time. These fictitious signals can be misclassified by detectors and hence it is important to understand their origin and statistical properties. We present experimental studies regarding these false transients, and by simulation we statistically quantify their duration for various distributions. We compare the number and duration of the false transients when different distributions are used.
We describe an algorithm to accurately estimate the carrier frequency
of a single sideband HF speech signal. The algorithm is based on an
outer product applied to a complex-valued time-frequency representation in which instantaneous frequency is encoded as the complex argument and the spectrogram is encoded as magnitude. Simple matrix operations are applied to isolate and estimate the carrier. The algorithm is fast, efficient, easily coded and converges rapidly to a very accurate carrier estimate.
KEYWORDS: Signal processing, Fourier transforms, Fermium, Frequency modulation, Signal detection, Nonlinear optics, Composites, Digital filtering, Linear filtering, Defense and security
We describe a new linear time-frequency paradigm in which the
instantaneous value of each signal component is mapped to the
curve functionally representing its instantaneous frequency.
The transform by which this surface is generated is linear,
uniquely defined by the signal decomposition and satisfies linear
marginal-like distribution properties. We further demonstrate
that such a surface may be estimated from the short time Fourier
transform by a concentration process based on the phase of the STFT
differentiated with respect to time. Interference may be identified
on the concentrated STFT surface, and the signal with the interference
removed may be estimated by applying the linear time marginal to the
concentrated STFT surface from which the interference components have
been removed.
We present new signal processing methods which may be used to estimate a high-resolution spectrum. These methods are ideally suited to the analysis of non-stationary signal components, such as speech formants or pitch. In addition, we present a new spectral correlation method, which may be used to provide accurate estimates of the frequency differences of the formants of linguistically similar utterances spoken by different speakers. These algorithms are accurate and simple and may be easily automated. They are based on a deterministic speech model and a cross-spectral representation computed from a short time Fourier transform.
In previous works, Umesh et al. demonstrated that phonetically similar vowels spoken by different individuals are related by a simple translation in a universal warped spectral representation. They experimentally derived this function and called it the “speech-scale”. We present further experimental evidence, based on a large data set, validating the speech-scale. We also estimate speaker-specific scale factors based on the speech-scale, and we present a vowel classification experiment, which demonstrates a significant performance improvement through a normalization based on the speech-scale. The results we present are based on formant estimates of vowels in a Western Michigan vowel database.
Speech is metered if the stresses occur at a nearly regular rate. Metered speech is common in poetry, and it can occur naturally in speech if the speaker is spelling a word or reciting words or numbers from a list. In radio communications, the CQ request, call sign and other codes are frequently metered. In tactical communications and air traffic control, location, heading and identification codes may be metered. Moreover, metering may be expected to survive even in HF communications, which are corrupted by noise, interference and mistuning. For this environment, speech recognition and conventional machine-based methods are not effective.
We describe time-frequency methods which have been adapted successfully to the problem of mitigation of HF signal conditions and detection of metered speech. These methods are based on modeled time and frequency correlation properties of nearly harmonic functions. We derive these properties and demonstrate a performance gain over conventional correlation and spectral methods. Finally, for HF single sideband (SSB) communications,
the problems of carrier mistuning, interfering signals, such as manual Morse, and fast automatic gain control (AGC) must be addressed. We demonstrate simple methods which may be used to blindly mitigate mistuning and narrowband interference, and to effectively invert the fast automatic gain function.
We use the tube model of speech production to study the speech-hearing connection. Recently, using real speech, we showed that sounds made by different individuals and perceived to be the same can be transformed into each other by a universal warping function. We call the transformation function the speech scale, and we have shown that it is similar to the Mel scale, thus experimentally establishing the speech-hearing connection. In this paper we explore the possible origins of the speech scale and attempt to understand it from the point of view of the tube model of speech. We use the two-tube model for various vowels and study the effect of varying the lengths of the tubes on the location of formant frequencies. We show that if we use the commonly used assumption that the length of the front tube does not change significantly when compared to the back tube for different individuals enunciating the same sound, then their corresponding formant frequencies are non-uniformly scaled. Using the same method we used for real speech, we compute the warping
function.
In speech analysis, a recurring acoustical problem is the estimation of resonant structure of a tube of non-uniform cross-sectional area. We model such tubes as a finite sequence of cylindrical tubes of arbitrary, non-uniform length. From this model, we derive a closed form expression of the resonant structure of the model and analytically derive the boundary conditions for the case of a constant group delay. Since it has been noted in the literature that the group delay of the vocal tract is constant, these boundary conditions hold for the vocal tract. In the limiting case, the
non-uniform tube model reduces to the well-studied uniform tube model. For this limiting case, we derive an expression of the tube resonant structure in terms of a Fourier transform. Finally, we derive wave equations from the model, which are consistent with the wave equations for the telegraph wire problem.
We have previously reported experimental results that directly connect speech and hearing and lead to the concept of a universal warping function. In this paper we report further experiments based on a large database collected by Hillenbrand et al. These new results further validate the concept of a universal warping function.
We address two classical problems relating to harmonic signals. The first is the blind recovery of the carrier of a single-sideband AM communication signal, and the second is the isolation and blind estimation of the fundamental of a time-varying harmonic signal. The methods are based on cross-spectra estimated from the short time Fourier transform, a generalization of the Chinese remainder theorem, and joint Fourier and autocorrelation representations of the signal spectrum. These tools are developed, and their utility is demonstrated in the solutions of the two problems. By an additional application of a frequency-lag autocorrelation function, it is demonstrated that the harmonic fundamental can be recovered even if it is not present in the original spectrum.
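The frequency-lag autocorrelation idea can be sketched numerically. The following is our own illustration (signal parameters are assumed, not taken from the paper): a harmonic signal whose fundamental component is absent still produces a spectrum-magnitude autocorrelation peak at the fundamental's bin spacing.

```python
import math, cmath

fs, N = 8000.0, 512
k0 = 10                      # fundamental at bin 10 -> 156.25 Hz
f0 = k0 * fs / N
# Harmonics 2..5 only: the fundamental itself is missing from the signal.
x = [sum(math.cos(2 * math.pi * m * f0 * n / fs) for m in range(2, 6))
     for n in range(N)]
mag = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
       for k in range(N // 2)]

def r(lag):
    """Frequency-lag autocorrelation of the magnitude spectrum."""
    return sum(mag[k] * mag[k + lag] for k in range(N // 2 - lag))

best = max(range(5, 31), key=r)
f0_est = best * fs / N       # peak sits at the harmonic spacing
```

The autocorrelation peaks at lag 10 because adjacent harmonics (bins 20, 30, 40, 50) are all spaced by the fundamental's bin offset, even though bin 10 itself is empty.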
KEYWORDS: Databases, Detection and tracking algorithms, Fourier transforms, Signal detection, Signal processing, Californium, Data processing, Fermium, Frequency modulation, Statistical analysis
We describe a fast and efficient algorithm for automatic detection and estimation of the fundamental frequency F0 of a harmonic time-domain signal. The method is based on differentiation of the short time Fourier transform (STFT) phase, which is implemented as a cross-spectral product. In estimating and isolating the fundamental frequency, several enhancement processes are developed and applied to the TF surface to improve the signal quality. We describe the algorithm in detail and demonstrate the processing gain achieved at each step. In addition, we apply the algorithm to human speech to recover the pitch fundamental F0 and report the evaluation of the algorithm's performance on the Western Michigan vowel corpus.
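The core of the cross-spectral product can be sketched for a single tone (a simplified illustration with assumed parameters; the full algorithm's TF-surface enhancement steps are not shown): the phase of the product of two DFT bins computed one sample apart resolves the frequency far below the bin spacing.

```python
import math, cmath

fs, f0, N = 8000.0, 210.0, 512
w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # Hann
x = [math.cos(2 * math.pi * f0 * n / fs) for n in range(N + 1)]

def dft_bin(frame, k):
    n_len = len(frame)
    return sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n_len)
               for i in range(n_len))

k = round(f0 * N / fs)                   # coarse bin, resolution fs/N
X1 = dft_bin([x[n] * w[n] for n in range(N)], k)
X2 = dft_bin([x[n + 1] * w[n] for n in range(N)], k)  # one-sample hop
# The phase of the cross-spectral product is 2*pi*f0/fs, so the
# frequency is recovered to a small fraction of the 15.6 Hz bin width.
f_est = cmath.phase(X2 * X1.conjugate()) * fs / (2 * math.pi)
```

This is the sense in which phase differentiation of the STFT is "implemented as a cross-spectral product": the sample-shifted conjugate product turns a phase derivative into a single complex multiply per bin.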
We study time series of the X-ray intensity of the binary XTE-J1550-564 with the goal of estimating its instantaneous power spectrum. We develop a method that, from the initial sequence of photon arrival times, is able to estimate the time-frequency spectrum in conjunction with noise reduction techniques. This method clearly highlights the presence of a quasi-periodic oscillation (QPO), a spectral component the frequency of which changes in time. Furthermore, the QPO is extracted by using signal processing methods in the time-frequency plane. The method is also validated using a synthetic signal to show the quality and reliability of its performance.
Speech is a signal produced when a combination of frication and a quasi-periodic train of glottal pulses excites the vocal tract and causes it to resonate. Information is encoded on the signal as the vocal tract changes configuration, resulting in a rapid change of the resonant frequencies. We develop methods, based on differentiation of the short time Fourier transform (STFT) phase, which effectively demodulate the speech signal and produce accurate, high resolution time-frequency estimates of both the resonances and the signal excitation. The method effectively condenses the STFT surface along curves representing the instantaneous frequencies of the vocal tract resonances and the channel group delay function.
We address the problem of estimation of biological signal parameters and present methods based on the phase gradient of the short time Fourier transform which may be used to accurately estimate signal parameters. The methods are robust and are well suited to the analysis of non-stationary multicomponent signals. Specifically addressed are the problems of recovery of crisp narrow-band time-frequency representations from very small data sets, accurate estimation of speech formants, blind recovery of the group delay of the transmission channel and equalization of time-frequency representations.
We present a method for accurate estimation of formant frequencies. The method is based on differentiating the phase of the short time Fourier transform. The motivation for the method is its application to the estimation of the recently introduced 'universal warping function,' which is aimed at separating the speaker dependence from the phonetic content of a speech utterance. The universal warping function is determined by the nature of the relationship between formants of different speakers for phonetically similar sounds and requires an accurate estimate of formants. The proposed method provides sufficient accuracy for its estimation.
KEYWORDS: Fourier transforms, Signal detection, Electronic filtering, Signal processing, Digital signal processing, Radar, Statistical analysis, Filtering (signal processing), Multidimensional signal processing, Interference (communication)
We generalize the short time Fourier transform and the cross-spectral methods developed by the author and others over the past two decades. The method presented is based on application of a multidimensional Fourier transform. Each iteration of the Fourier transform channelizes the signal into successively narrower channels, and a single application of phase differentiation stabilizes the Fourier phase and effectively places all the signal components in their correct locations in the spectrum. Finally, it is shown that it is possible to recover spectral components from spectral observations which are remote from the components being estimated.
KEYWORDS: Databases, Biometrics, Statistical analysis, Detection and tracking algorithms, Algorithm development, Fourier transforms, Ear, Data processing, Signal processing, Standards development
We address the problem of classification of speakers based on measurements of features obtained from their speech. The process is an adaptation of biometric methods used to identify people. The process for speech differs since speech is not stationary. We therefore propose the classification of speakers by the statistical distributions of parameters which may be accurately estimated by modern signal processing techniques. The intent is to develop a speaker clustering algorithm which is independent of the transmission channel and insensitive to language variations, and which may be re-trained, with minimal data, to include a new speaker. We demonstrate effectiveness on the problem of identification of the speaker's gender, and present evidence that the methods may be extended to the general problem of speaker identification.
We present a number of methods that use image and signal processing techniques for removal of noise from a signal. The basic idea is to first construct a time-frequency density of the noisy signal. The time-frequency density, which is a function of two variables, can then be treated as an 'image,' thereby enabling use of image processing methods to remove noise and enhance the image. Having obtained an enhanced time-frequency density, one then reconstructs the signal. Various time-frequency densities are used, and a number of image processing methods are investigated. Examples of human speech and whale sounds are given. In addition, new methods are presented for estimation of signal parameters from the time-frequency density.
We present a discussion of methods based on the complex cross-spectrum and the application of these methods to the analysis of speech. The cross-spectral methods developed here are an extension of methods developed in the 1980s by one of the authors for accurately estimating stationary and cyclo-stationary parameters of signals buried deep in noise. Since speech is non-stationary and therefore supports very little integration, the methods have been re-developed to address issues such as non-stationarity, harmonic structure and rapidly changing resonances. Cross-spectral methods are presented as complex-valued time-frequency surface methods which provide signal parameter estimation by taking advantage of signal structure. These methods have proven to be very powerful.
We report recognition results using scale-transform based cepstral features in a telephone-based digit recognition task. The method is based on the use of scale-transform based features for speaker-independent applications, which are insensitive to linear frequency-scaling effects and therefore reduce inter-speaker variability due to differences in vocal-tract lengths. We have implemented a digit recognition task using the proposed scale-transform based features and have compared the recognition accuracy with that obtained using mel-cepstrum based front-end features.
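The property motivating scale-transform features is that frequency scaling changes only an overall factor of the scale-transform magnitude. A small numeric sketch of this invariance (our own construction; the test function and scale factor are arbitrary):

```python
import math, cmath

def scale_transform(f, c, lo=-8.0, hi=8.0, step=0.004):
    """D(c) = integral f(t) e^{-jc ln t} / sqrt(t) dt, evaluated in the
    tau = ln(t) domain by a Riemann sum."""
    total, tau = 0j, lo
    while tau < hi:
        t = math.exp(tau)
        total += f(t) * math.exp(tau / 2.0) * cmath.exp(-1j * c * tau) * step
        tau += step
    return total

f = lambda t: math.exp(-math.log(t) ** 2)   # smooth test "spectrum"
g = lambda t: f(2.0 * t)                    # frequency-scaled copy
c = 2.0
# |D_g(c)| = |D_f(c)| / sqrt(2): scaling contributes only a constant
# factor, so scale-normalized magnitudes are invariant in c.
ratio = abs(scale_transform(g, c)) * math.sqrt(2.0) / abs(scale_transform(f, c))
```

This is why features built on the scale-transform magnitude are, to first order, unaffected by the vocal-tract-length scaling the abstract describes.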
KEYWORDS: Fermium, Frequency modulation, Transform theory, Fourier transforms, Receivers, Signal to noise ratio, Ear, Signal analyzers, Filtering (signal processing), Acoustics
We describe experiments that address the relation between the same enunciations by different speakers. Our previous work indicated that frequencies are approximately scaled uniformly. In this paper we report results addressing possible corrections to uniform scaling. Our results show that the scaling is non-uniform; that is, the formant frequencies of different speakers scale differently at different frequencies. We discuss how this leads to the mathematical issue of separating the spectrum into a speaker-dependent part and a speaker-independent part. We introduce the concept of a universal scaling function that is aimed at achieving this separation. The fundamental idea is to find a frequency-axis transformation (warping function) which transforms the energy density spectrum (the squared absolute value of the Fourier transform of the enunciation) in such a way that a further Fourier transform of the resulting function achieves this separation. We discuss this procedure and relate it to the scale transform. Using real speech data we obtain such a transformation function. The resulting function is very similar to the Mel scale, which has previously been obtained only from psychoacoustic (hearing based) experiments. That similar scales are obtained from both hearing and speech production (as reported here) is fundamental to the understanding of speech and hearing.
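For orientation, the standard Mel formula (the textbook approximation, not the warping function estimated in this work) illustrates the distinction at issue: a purely logarithmic warp turns uniform scaling into a constant shift, while a Mel-like warp does not.

```python
import math

def mel(f_hz):
    """Standard Mel-scale approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

a = 1.2  # illustrative uniform scale factor between two speakers
# Under a purely logarithmic warp, f -> a*f becomes the same additive
# shift log(a) at every frequency:
shifts = [math.log(a * f) - math.log(f) for f in (500.0, 1500.0, 2500.0)]
# Under the Mel warp the shift varies with frequency, which is the
# signature of non-uniform scaling:
mel_shifts = [mel(a * f) - mel(f) for f in (500.0, 1500.0, 2500.0)]
```

If formant scaling were exactly uniform, a log warp would already achieve the speaker/phoneme separation; the non-uniformity reported here is why the warp must be estimated from speech data.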
KEYWORDS: Detection and tracking algorithms, Sensors, Algorithm development, Signal detection, Interference (communication), Signal to noise ratio, Signal processing, Signal analyzers, Data analysis, Dimension reduction
Computationally efficient algorithms which perform speech activity detection have significant potential economic and labor-saving benefit, by automating an extremely tedious manual process. In many applications, it is desirable to extract intervals of speech which are separated by segments of other signal types. In the past, algorithms have been developed which successfully discriminate between speech and one specific other signal type. Frequently, these algorithms fail when the specific non-speech signal is replaced by a different one. Typically, several signal-specific discriminators are blindly combined, with predictable negative results. Moreover, when a large number of discriminators are involved, dimension reduction is achieved using Principal Components, which optimally compresses signal variance into the fewest number of dimensions. Unfortunately, these new coordinates are not necessarily optimal for discrimination. In this paper we apply graphical tools to determine a set of discriminators which produce excellent speech vs. non-speech clustering, thereby eliminating the guesswork in selecting good feature vectors. This cluster structure provides a basis for a general multivariate speech vs. non-speech discriminator, which compares very favorably with the TALKATIVE speech extraction algorithm.
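The Principal Components caveat can be made concrete with a toy example (our own construction, not from the paper): two classes whose separation lies entirely along a low-variance axis, so the leading principal component is useless for discrimination.

```python
# Two classes share large variance along x but differ only along y.
class_a = [(float(t), 0.5) for t in range(-10, 11)]
class_b = [(float(t), -0.5) for t in range(-10, 11)]
data = class_a + class_b

# Pooled covariance entries (both class means are zero by construction).
n = len(data)
cxx = sum(x * x for x, _ in data) / n
cyy = sum(y * y for _, y in data) / n
cxy = sum(x * y for x, y in data) / n

# cxx >> cyy and cxy = 0, so the leading principal component is the
# x-axis -- yet the classes overlap completely when projected onto it:
x_a = sorted(x for x, _ in class_a)
x_b = sorted(x for x, _ in class_b)
# The y-axis, which carries almost no variance, separates them with a
# clear margin.
gap_y = min(y for _, y in class_a) - max(y for _, y in class_b)
```

Variance compression and class separation are different objectives, which is the point the graphical feature-selection approach addresses.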
In this paper, we present a new class of representations of signals in the time-frequency (TF) plane. These representations are complex valued, linear, and satisfy reconstruction conditions in which the signal and its complex spectrum may be uniquely reconstructed from their TF representation. These surfaces are generalizations of 1D linear transforms, with which they share many properties. The primary advantage of these representations is that the phase of the surface may be used to recover signal information which is not contained in real TF surfaces. Linearity guarantees that the cross-terms normally associated with TF distributions do not exist in these representations. Several examples of invertible surfaces are presented, and it is demonstrated that these surfaces agree with normal intuition. Finally, a method based on the phase gradient is proposed for modifying Fourier surfaces to produce representations which are more focused or more concentrated in time and frequency.
In this paper, we present improvements over the original scale-cepstrum proposed. The scale-cepstrum was proposed as an acoustic feature for speech analysis and was motivated by a desire to normalize the first-order effects of differences in vocal-tract lengths for a given vowel. Our subsequent work has shown that a more appropriate frequency-warping than the log-warping used is necessary to account for the frequency dependency of the scale-factor. Using this more appropriate frequency-warping and a modified method of computing the scale-cepstrum we have obtained improved features that provide better separability between vowels than before, and are also robust to noise. We have used the generalized F-ratio test as a measure of separability and have compared the proposed improved features with the melcepstral features. The data used in the comparison consist of ten vowels extracted from sentences spoken by different speakers in the TIMIT database.
Speech signals have the property that they are broad-band while conveying information at a very low rate. The resulting signal has a time-frequency representation which is redundant and slowly varying in both time and frequency. In this paper, a new method for separating speech from noise and interference is presented. This new method uses image enhancement techniques applied to time-frequency representations of the corrupted speech signal. The image enhancement techniques are based on the assumption that the speech and/or the noise and interference may be locally represented as a mixture of two-dimensional Gaussian distributions. The signal surface is expanded using a Hermite polynomial expansion, and the signal surface is separated from the noise surface by a principal-component process. A Wiener gain surface is calculated from the enhanced image, and the enhanced signal is reconstructed from the Wiener gain surface using a time-varying filter constructed from a basis of prolate-spheroidal filters.
In this paper, we derive a frequency-warping function by analyzing speech data obtained from the TIMIT database. Until now, numerous frequency scales have been proposed, based purely on psychoacoustic studies. Many speech recognition algorithms have been using such frequency scales for the spectral analysis at the signal processing front-end. The motivation for the use of such psychoacoustic frequency scales is that, since they are based on the properties of human auditory perception, they may provide an accurate representation of the relevant spectral information in speech. Since the preference of one scale over another is ad hoc, and since the goal is to achieve better recognition, experiments are conducted to determine if better recognition rates are indeed obtained using any one such scale. In this paper, we analyze actual speech data, and present evidence of the kind of frequency-warping that may be necessary to achieve speaker-independent recognition of vowels. This provides us with the motivation to use such frequency-warping functions in speech recognition. Surprisingly, the frequency-warping obtained is similar to the Mel scale obtained from psychoacoustic studies. This suggests that the ear may be using such a frequency-warping to remove extraneous speaker-specific information while identifying and recognizing phonemes.
KEYWORDS: Fourier transforms, Signal processing, Ear, Modulation, Acoustics, Time-frequency analysis, Signal attenuation, Copper, Signal generators, Defense and security
We argue that an important aspect of the human speech signal is scaling in the frequency domain. We discuss the two physical mechanisms responsible for the scaling. The first is that when a harmonic signal's fundamental is frequency modulated, the spectrum is a sum of scaled functions. The second comes about from the consideration that while different speakers have very different size vocal tracts (for example, an adult and a child), we nonetheless produce speech which is similar in some sense. We argue and present evidence to show that these speaker differences result in scaling in the frequency domain. We further discuss how one can handle scale processing.
KEYWORDS: Signal processing, Signal to noise ratio, Detection and tracking algorithms, Modulation, Fermium, Frequency modulation, Signal detection, Algorithm development, Receivers, Amplitude modulation
There are many applications for which it is desirable to reliably detect the presence of speech. Examples of these applications are speech compression, voice activated devices and machine speech recognition. In this paper, a method of speech detection is developed which uses a frequency-domain pitch-based signal-to-noise ratio (SNR) estimate. This method takes full advantage of the spectral structure of pitch, which is the primary speech excitation function. The primary output of the detection algorithm is a decision that speech is present or not present. In addition, the algorithm provides an estimate of the speech SNR which may be used to estimate signal quality. This SNR estimate is important for applications such as estimating the reliability of machine-based recognition processes. Additional advantages of this method are that it is independent of signal gain and it works well under adverse conditions such as poor SNR and in the presence of interference. A by-product of the pitch-based detection process is a method for automatic recovery of the frequency offset of mistuned analog speech. Mistuning is a condition which can arise in the demodulation of single-side-band amplitude-modulated (SSB-AM) speech if the precise carrier is not used in the demodulation process. This can cause severe problems since speech becomes nearly unintelligible if it is mistuned by more than 100 Hz. The methods presented here use a double complex correlation of the complex speech spectrum to recover the carrier offset. This process provides significantly better resolution than more conventional correlation processes based on the speech power-spectrum.
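The harmonic-sum core of a pitch-based detector can be sketched as follows (a noiseless, rectangular-window simplification with assumed parameters; the SNR estimate and the double complex correlation are not shown):

```python
import math, cmath

fs, N = 8000.0, 512
f0 = 156.25                  # placed exactly on bin 10, so no leakage
# Harmonic signal with rolling-off partials, mimicking voiced speech.
x = [sum(math.cos(2 * math.pi * m * f0 * n / fs) / m for m in range(1, 5))
     for n in range(N)]
mag = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
       for k in range(N // 2)]

def harmonic_sum(k0, n_harm=4):
    """Spectral energy collected at harmonics of candidate bin k0."""
    return sum(mag[m * k0] for m in range(1, n_harm + 1))

best = max(range(4, 40), key=harmonic_sum)   # roughly 60-600 Hz search
f0_est = best * fs / N
```

In a full detector, the ratio of this harmonic-comb energy to the remaining spectral energy serves as the pitch-based SNR that drives the speech/no-speech decision.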
KEYWORDS: Modulation, Signal processing, Signal to noise ratio, Phase shift keying, Interference (communication), Fourier transforms, Error analysis, Silicon, Signal detection, Signal analyzers
We review the standard frequency domain methods for modulation classification as a baseline for comparison with new approaches. A new procedure based on the singular value decomposition of a particular data matrix is developed. This matrix has some very interesting properties that facilitate a solution of the difficult problem of QPSK versus MSK classification. Performance of the new method is assessed via simulations and compared against results in the published literature.