Recent advances in communications signal processing and VLSI technology are fostering tremendous interest in transmitting high-speed digital data over ordinary telephone lines at bit rates substantially above the ISDN Basic Access rate (144 kbit/s). Two new technologies, high-bit-rate digital subscriber lines and asymmetric digital subscriber lines, promise transmission over most of the embedded loop plant at 1.544 Mbit/s and beyond. Stimulated by these research promises, by rapid advances in video coding techniques, and by the related standards activity, information networks around the globe are now exploring possible business opportunities for offering quality video services (such as distance learning, telemedicine, and telecommuting) through this high-speed digital transport capability in the copper loop plant. Visual communications for residential customers have become more feasible than ever, both technically and economically.
In this talk, I will describe two aspects of video compression algorithms: the first area has to do with very low bit rate video coding, and the second one with scalable video compression.
This paper presents a very low bit-rate coding algorithm based on image splitting, in which the picture is represented by an adaptive multigrid supported by a binary tree structure. Independently of this tree representation, the picture is segmented via a watershed procedure, and several criteria are combined to automatically extract the interesting areas of the image. This object information is not transmitted but is used to reduce picture complexity, and therefore the bit rate, while keeping good subjective quality. This is achieved by a merge procedure that homogenizes the values of tree subblocks belonging to the same non-interesting object. This treatment affects both intra- and inter-images. For intra-images, the resulting tree structure is entropy coded, while its leaves are encoded through a DPCM procedure followed by a multi-Huffman coder. For inter-images, a motion field is estimated by an adaptive block matching algorithm, a variant of BMA in which the block size is chosen to reach a sufficient level of confidence. Residues, essential to correct motion compensation artifacts, are sent through local intra-trees or, if the bit rate allows it, through DCT blocks, so that an arbitrary level of quality can be reached. During the reconstruction step, an object-oriented approach combined with overlapping functions reduces block artifacts while keeping sharp edges.
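The merging idea above can be illustrated in miniature (the block-based version below and its names are our simplification, not the paper's tree procedure): blocks that contain no "interesting" pixels are flattened to their mean, which makes them trivially cheap to code.

```python
def homogenize(img, interest, b):
    """Replace each b x b block that contains no 'interesting' pixel
    by its mean value, reducing complexity outside the region of
    interest while leaving interesting areas untouched."""
    out = [row[:] for row in img]
    for y0 in range(0, len(img), b):
        for x0 in range(0, len(img[0]), b):
            if any(interest[y][x] for y in range(y0, y0 + b)
                   for x in range(x0, x0 + b)):
                continue  # block overlaps an interesting object: keep it
            m = sum(img[y][x] for y in range(y0, y0 + b)
                    for x in range(x0, x0 + b)) / (b * b)
            for y in range(y0, y0 + b):
                for x in range(x0, x0 + b):
                    out[y][x] = m
    return out
```

In the paper the same effect is obtained on tree subblocks per segmented object rather than on a fixed block grid.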
We make the case, using simulation studies on established benchmark videos, that taking the number of bits needed to code each motion vector into account when estimating motion for video compression results in significantly better performance at low bit rates. First, by modifying a `vanilla' implementation of the H.261 standard, we show that choosing motion vectors explicitly to minimize rate (in a greedy manner), subject to implicit constraints on distortion, yields better rate-distortion tradeoffs than minimizing notions of prediction error. Locally minimizing a linear combination of rate and distortion results in further improvements. Using a heuristic function of the prediction error and the motion vector code-length gives compression performance comparable to the more computationally intensive coders while requiring a practically small amount of computation. We also show that making coding control decisions to minimize rate yields further improvements.
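The "linear combination of rate and distortion" can be sketched as a full search that minimizes a Lagrangian cost J = D + lambda * R instead of distortion alone. The toy rate model and function names below are illustrative assumptions, not taken from the paper.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def mv_bits(dx, dy):
    """Toy rate model: longer vectors cost more bits to code."""
    return 1 + 2 * (abs(dx) + abs(dy))

def best_vector(cur, ref, bx, by, n, search, lam):
    """Full search over [-search, search]^2 minimizing D + lam * R."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = by + dy, bx + dx
            if ry < 0 or rx < 0 or ry + n > len(ref) or rx + n > len(ref[0]):
                continue
            cand = [row[rx:rx + n] for row in ref[ry:ry + n]]
            cur_blk = [row[bx:bx + n] for row in cur[by:by + n]]
            cost = sad(cur_blk, cand) + lam * mv_bits(dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]
```

With lam = 0 this reduces to plain SAD minimization; raising lam biases the choice toward short, cheap vectors, which is the rate-distortion tradeoff the paper exploits.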
This paper presents an object-based, object-scalable mesh design and tracking algorithm for very low bitrate video coding, which consists of three stages: object segmentation, object boundary coding, and 2D mesh design and tracking within each object. Here, we use pre-segmented test sequences; hence, object/motion segmentation is not treated. The boundary of each individual object is approximated by a polygon. Next, a node point selection algorithm followed by constrained Delaunay triangulation is employed, where the line segments representing the boundaries of the object polygons form the constraints.
Embedded captions in TV programs such as news broadcasts, documentaries and coverage of sports events provide important information on the underlying events. In digital video libraries, such captions represent a highly condensed form of key information on the contents of the video. In this paper we propose a scheme to automatically detect the presence of captions embedded in video frames. The proposed method operates on reduced image sequences which are efficiently reconstructed from compressed MPEG video and thus does not require full frame decompression. The detection, extraction and analysis of embedded captions help to capture the highlights of visual contents in video documents for better organization of video, to present succinctly the important messages embedded in the images, and to facilitate browsing, searching and retrieval of relevant clips.
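As a toy illustration of the underlying idea (not the authors' actual detector): overlaid text tends to produce a high density of strong horizontal gradients, which can be flagged row by row directly on a reduced DC image. The thresholds and names here are assumptions for the sketch.

```python
def caption_rows(dc_image, grad_thresh, density_thresh):
    """Flag rows of a reduced (DC-coefficient) image whose density of
    strong horizontal gradients suggests overlaid caption text."""
    flagged = []
    for y, row in enumerate(dc_image):
        strong = sum(1 for x in range(len(row) - 1)
                     if abs(row[x + 1] - row[x]) >= grad_thresh)
        if strong / (len(row) - 1) >= density_thresh:
            flagged.append(y)
    return flagged
```

Because the DC image is reconstructed cheaply from the compressed stream, such a test never requires full-frame decompression, which is the efficiency argument made above.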
Scanline algorithms are popular in computer graphics for complex geometric manipulations. Their main characteristic is that a geometric transformation is decomposed into multiple passes, with each pass operating only along row or column scanlines. This converts 2D image manipulation problems into straightforward 1D problems, resulting in simple and systematic methods. The goal of this work is to examine the scanline approach for manipulating transform-compressed images without decompressing them. We show how scanline algorithms for rotation and projective mapping can be developed for JPEG/DCT images. The performance of the proposed scanline algorithms is evaluated with respect to quality, speed, and control and memory overhead.
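The classic scanline decomposition of a rotation is the three-shear factorization (x-shear, y-shear, x-shear), each pass touching only rows or only columns. A minimal sketch of the factorization, checked against the direct rotation matrix:

```python
import math

def xshear(a):
    """Pass along rows: x' = x + a*y, y unchanged."""
    return [[1.0, a], [0.0, 1.0]]

def yshear(b):
    """Pass along columns: y' = y + b*x, x unchanged."""
    return [[1.0, 0.0], [b, 1.0]]

def matmul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotation_as_shears(theta):
    """Compose xshear * yshear * xshear into a rotation by theta,
    using a = -tan(theta/2) and b = sin(theta)."""
    a = -math.tan(theta / 2.0)
    b = math.sin(theta)
    return matmul2(xshear(a), matmul2(yshear(b), xshear(a)))
```

Applying the three passes in sequence to an image implements the same mapping one scanline at a time, which is what makes a blockwise DCT-domain version of rotation feasible.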
The large quantity of data associated with visual information necessitates the use of compression techniques. In this paper, we propose a novel compressed-domain technique to implement spatial scalability directly on compressed image/video data. In contrast to the spatial-domain technique (our baseline for comparison), the proposed technique removes the unnecessary decompression and re-compression procedures. The computational complexity is greatly reduced by using certain approximations. We note that, depending on the image/video content, only marginal quality degradation (almost unnoticeable subjectively) may be introduced. Simulation results confirm the substantial reductions in computational complexity of the proposed technique at a performance comparable to the spatial-domain technique.
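One widely used compressed-domain shortcut of this kind (a general sketch, not necessarily the authors' exact method) derives a half-resolution block directly from the 8x8 DCT coefficients: keep the 4x4 low-frequency corner, rescale, and apply a 4-point inverse DCT, skipping full decompression and re-compression.

```python
import math

def dct_mat(n):
    """Orthonormal DCT-II basis matrix C, so coefficients = C @ x."""
    mat = []
    for k in range(n):
        s = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        mat.append([s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n)])
    return mat

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    c = dct_mat(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    c = dct_mat(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def halve_block(block8):
    """4x4 pixel block at half resolution, computed from the 8x8 DCT:
    keep the low-frequency 4x4 corner, scale by sqrt(4/8) per
    dimension (0.5 in 2D), and take a 4-point inverse DCT."""
    coeffs = dct2(block8)
    low = [[coeffs[i][j] * 0.5 for j in range(4)] for i in range(4)]
    return idct2(low)
```

Discarding high-frequency coefficients is itself the approximation: the result is close to, but not identical with, pixel-domain averaging, which matches the "marginal quality degradation" noted above.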
In this paper we present a scalable coding scheme for HDTV based on a hierarchical overlapped block motion-compensated (OBMC) wavelet transform and the lapped orthogonal transform (LOT). The scheme eliminates `drift' artifacts, increases the robustness of the HDTV signal against transmission noise, and gives users of existing MPEG applications the benefits of forward/backward compatibility. `Drift' is one of the major problems in classical scalable video coding, such as frequency-scalable coding, where the different resolutions do not have their own accurate motion information. A data loss concealment scheme is also required to provide graceful degradation, especially in HDTV broadcast and transmission environments. Our research not only exploits the advantages of the discrete wavelet transform (DWT) and hierarchical OBMC to solve the `drift' problem, but also enhances the MPEG standard by replacing the DCT with the LOT to improve data loss concealment and mitigate the degradation of picture quality due to transmission noise. Furthermore, this enhanced MPEG scheme, which is applied to the lowest subband, provides compatibility with the widely adopted MPEG standard and its existing applications. Three resolutions are targeted, for HDTV, conventional TV, and videophone applications, and optimal bit allocation is developed for each resolution. Simulation results show that very good subjective and objective quality can be achieved at all three resolutions. The elimination of `drift' artifacts is observed, and the LOT and the DCT, each combined with the DWT, are compared in terms of error concealment and coding gain.
We explore here the implementation of Shapiro's embedded zerotree wavelet (EZW) image coding algorithm on an array of parallel processors. To this end, we first consider the problem of parallelizing the basic wavelet transform, discussing past work in this area and the compatibility of that work with the zerotree coding process. From this discussion, we present a parallel partitioning of the transform which is computationally efficient and which allows the wavelet coefficients to be coded with little or no additional inter-processor communication. The key to achieving low data dependence between the processors is to ensure that each processor contains only entire zerotrees of wavelet coefficients after the decomposition is complete. We next quantify the rate-distortion tradeoffs associated with different levels of parallelization for a few variations of the basic coding algorithm. Studying these results, we conclude that the quality of the coder decreases as the number of parallel processors used to implement it increases. Noting that the performance of the parallel algorithm might be unacceptably poor for large processor arrays, we also develop an alternate algorithm which always achieves the same rate-distortion performance as the original sequential EZW algorithm, at the cost of higher complexity and reduced scalability.
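The ownership idea can be sketched directly from the parent-child geometry of zerotrees: a coefficient at (y, x) has its four children at (2y+dy, 2x+dx) one scale finer, so handing each coarsest-band root (and everything below it) to one processor keeps whole trees processor-local. A toy partitioner, with names of our own choosing:

```python
def descendants(y, x, levels):
    """All positions in the zerotree rooted at (y, x): each node's four
    children sit at (2y, 2x) .. (2y+1, 2x+1) one scale finer."""
    nodes, frontier = [], [(y, x)]
    for _ in range(levels):
        frontier = [(2 * cy + dy, 2 * cx + dx)
                    for cy, cx in frontier
                    for dy in (0, 1) for dx in (0, 1)]
        nodes.extend(frontier)
    return nodes

def partition_trees(root_rows, root_cols, levels, nproc):
    """Deal out coarsest-band roots round-robin; every processor then
    owns only complete zerotrees, so the coding pass needs no
    inter-processor exchange of coefficients."""
    owned = [[] for _ in range(nproc)]
    roots = [(y, x) for y in range(root_rows) for x in range(root_cols)]
    for i, (y, x) in enumerate(roots):
        owned[i % nproc].append(((y, x), descendants(y, x, levels)))
    return owned
```

Because the trees of distinct roots never overlap, the union of the per-processor trees tiles each subband exactly, which is the property that removes the communication step.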
Software implementations of MPEG decompression provide flexibility at low cost but suffer performance problems, including poor cache behavior. For MPEG video, decompressing the video in the implied order does not take advantage of coherence generated by dependent macroblocks and, therefore, undermines the effectiveness of processor caching. In this paper, we investigate the caching performance gain which is available to algorithms that use different traversal algorithms to decompress these MPEG streams. We have found that the total cache miss rate can be reduced considerably at the expense of a small increase in instructions. To show the potential gains available, we have implemented the different traversal algorithms using the standard Berkeley MPEG player. Without optimizing the MPEG decompression code itself, we are able to obtain better cache performance for the traversal orders examined. In one case, faster decompression rates are achieved by making better use of processor caching, even though additional overhead is introduced to implement the different traversal algorithm. With better instruction-level support in future architectures, low cache miss rates will be crucial for the overall performance of software MPEG video decompression.
This paper describes an implementation of a software H.261 codec for the PC that takes advantage of the fast computational algorithms for DCT-based video compression presented by the author at the February 1995 SPIE/IS&T meeting. The motivation for developing the H.261 prototype system is to demonstrate the feasibility of a real-time, software-only videoconferencing solution operating across a wide range of network bandwidths, frame rates, and input video resolutions. As network bandwidths increase, video of higher frame rate and resolution can be transmitted, which in turn requires a software codec able to compress pictures of CIF (352 X 288) resolution at up to 30 frames/sec. Running on a Pentium 133 MHz PC, the codec presented is capable of compressing video in CIF format at 21 - 23 frames/sec. This result is comparable to known hardware-based H.261 solutions, but it does not require any specific hardware. The methods used to achieve high performance and the program optimization techniques for the Pentium microprocessor are presented, along with a performance profile showing the actual contribution of the different encoding/decoding stages to the overall computational process.
The first step of the coding technique proposed in the MPEG standard is motion compensation. It reduces the residual error energy at the cost of a fraction of the total bit rate used to transmit motion information. Motion compensation is performed using a block matching approach, though the algorithm to compute the motion vectors is not specified in the MPEG standard. Usually, an exhaustive search around the macroblock position is used. This solution (proposed in the test model) gives the lowest error but has the highest complexity. In this work we propose an algorithm that reduces the complexity of the block matching procedure while achieving performance comparable to the exhaustive search. The proposed solution is particularly attractive for the spatially scalable version of the coder, in which both a full-resolution and a spatially downsampled sequence are transmitted. The algorithm uses a multiresolution motion compensation scheme. Exhaustive-search block matching is performed on the downsampled sequence, and the vector field computed there is used as an estimate of the motion vectors for the full-resolution sequence, so that only a refinement needs to be computed. This allows a considerable reduction of the computation time with respect to exhaustive search at the full resolution level, while the residual error energy increases only slightly.
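A minimal sketch of this two-level scheme (function names are ours): exhaustive search on a 2:1 downsampled pair, then a +-1 pixel refinement of the doubled vector at full resolution.

```python
def sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def block(img, y, x, n):
    return [row[x:x + n] for row in img[y:y + n]]

def downsample(img):
    """2:1 downsampling by 2x2 averaging."""
    return [[(img[2*y][2*x] + img[2*y][2*x+1] +
              img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]

def search_mv(cur, ref, bx, by, n, cands):
    """Best SAD vector among the candidate displacements."""
    tgt, best = block(cur, by, bx, n), None
    for dx, dy in cands:
        y, x = by + dy, bx + dx
        if y < 0 or x < 0 or y + n > len(ref) or x + n > len(ref[0]):
            continue
        d = sad(tgt, block(ref, y, x, n))
        if best is None or d < best[0]:
            best = (d, dx, dy)
    return best[1], best[2]

def hierarchical_mv(cur, ref, bx, by, n, search):
    # Pass 1: exhaustive search on the 2:1 downsampled pair.
    cur2, ref2 = downsample(cur), downsample(ref)
    coarse = [(dx, dy) for dy in range(-(search // 2), search // 2 + 1)
              for dx in range(-(search // 2), search // 2 + 1)]
    dx2, dy2 = search_mv(cur2, ref2, bx // 2, by // 2, n // 2, coarse)
    # Pass 2: refine the doubled vector by +-1 pixel at full resolution.
    fine = [(2 * dx2 + ex, 2 * dy2 + ey)
            for ey in (-1, 0, 1) for ex in (-1, 0, 1)]
    return search_mv(cur, ref, bx, by, n, fine)
```

The coarse pass visits roughly a quarter of the candidates on quarter-sized blocks, and the fine pass only nine positions, which is where the computation savings come from.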
The wavelet transform has proven to be a valuable tool for image and video coding applications. Recently, a multiresolution motion estimation (MRME) technique has been proposed for wavelet-based video compression. The MRME technique estimates the motion vectors hierarchically from the low-resolution to the high-resolution wavelet subimages, thereby reducing the computational complexity. In this paper, we propose two techniques to enhance the coding performance of the baseline MRME technique. First, we propose using an adaptive threshold to determine whether a motion vector should be sent to the receiver, resulting in a reduced number of motion vectors and hence a lower bit rate. Second, we propose a bi-directional motion estimation technique in the wavelet transform domain: we estimate the temporal flags (direction information) only for the blocks in the lowest-resolution subimages and reuse the same information for the corresponding blocks in the higher-resolution subimages. The proposed techniques provide superior coding performance compared to the baseline MRME technique.
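The first idea can be sketched as a content-adaptive pruning rule: a block's vector is worth its bits only if it beats the zero vector by more than a threshold derived from the data itself. Everything below (names, the mean-gain rule) is our illustration, not the paper's exact criterion.

```python
def prune_vectors(blocks, alpha=0.5):
    """Keep a block's motion vector only when its SAD gain over the
    zero vector exceeds alpha times the mean gain -- an adaptive
    (content-dependent) threshold rather than a fixed one.
    `blocks` maps block index -> (sad_zero, sad_best, mv)."""
    gains = {i: sz - sb for i, (sz, sb, _) in blocks.items()}
    thresh = alpha * (sum(gains.values()) / len(gains))
    return {i: mv for i, (sz, sb, mv) in blocks.items()
            if gains[i] > thresh}
```

Blocks whose vector is pruned are predicted with the zero vector, so no vector bits are spent on them.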
Block matching algorithms (BMAs) are often employed for motion estimation (ME) in video coding. Most conventional BMAs treat ME as an optimization problem and employ certain search schemes to find a solution. Apart from the time-consuming full search (FS) algorithm, fast algorithms such as the three-step search (TSS), which operate on a reduced search range, cannot guarantee optimal solutions: the search is often trapped at local minima, and the ME results are thus often unsatisfactory. Few of these algorithms make explicit use of the information inherent in the images. We propose a new ME algorithm which reduces the search range while guaranteeing global optimality in most cases. Microblock visual patterns are designed to extract edge information to guide block matching: the search is carried out only at places where the true match is most likely, that is, where similar edge features are present. The proposed algorithm is about 7 - 8 times as fast as FS with the same search range. The prediction quality is very close to that of FS and much better than that of TSS. The algorithm produces MPEG-1 and MPEG-2 compatible motion vectors, can be extended to model-based ME, and is also suitable for parallel implementation. Moreover, the visual patterns are a potential resource for video indexing to facilitate content-based retrieval, which is important for applications such as VOD.
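The local-minimum problem with reduced-range searches is easy to reproduce. The sketch below (a generic TSS, not the paper's pattern-guided method) runs a three-step search on a smooth gradient image where the coarse first step commits to the wrong neighborhood, while full search finds the true displacement:

```python
def sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def block(img, y, x, n):
    return [row[x:x + n] for row in img[y:y + n]]

def best_of(cur, ref, bx, by, n, cands):
    """(sad, dx, dy) of the best in-bounds candidate displacement."""
    out = []
    for dx, dy in cands:
        y, x = by + dy, bx + dx
        if 0 <= y and 0 <= x and y + n <= len(ref) and x + n <= len(ref[0]):
            out.append((sad(block(cur, by, bx, n), block(ref, y, x, n)),
                        dx, dy))
    return min(out)

def full_search(cur, ref, bx, by, n, search):
    cands = [(dx, dy) for dy in range(-search, search + 1)
             for dx in range(-search, search + 1)]
    return best_of(cur, ref, bx, by, n, cands)[1:]

def three_step(cur, ref, bx, by, n):
    """Classic TSS: 9 points around the current center, halving step."""
    cx = cy = 0
    step = 4
    while step >= 1:
        _, cx, cy = best_of(cur, ref, bx, by, n,
                            [(cx + dx, cy + dy)
                             for dy in (-step, 0, step)
                             for dx in (-step, 0, step)])
        step //= 2
    return cx, cy
```

On this example, TSS settles on a nearby but wrong vector, which is exactly the trapping behavior the abstract criticizes.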
In this paper, we present an approach to characterizing video sequences using information-theoretic measures. This characterization is then used to efficiently represent a volume of video. In a typical video sequence, structure is sometimes revealed by texture and sometimes by motion; in addition, the temporal and spatial extents of that structure vary. This work attempts to recover this structure by examining a given region over a multiplicity of frames and scales using entropy measures. We then present a hierarchically structured class of coders that efficiently represent this volume of video. The structure built in the analysis stage is used to control and select among this class of coders.
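A toy version of such a measure (the binning and names are ours): estimate the entropy of a region's intensity histogram at several spatial extents, pooled over a run of frames; flat regions score near zero and textured ones higher.

```python
import math
from collections import Counter

def entropy(values, bins=16, lo=0, hi=256):
    """Shannon entropy (bits) of a binned histogram of pixel values."""
    counts = Counter(min(int((v - lo) * bins / (hi - lo)), bins - 1)
                     for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def multiscale_entropy(frames, y, x, sizes):
    """Entropy of the same region at several spatial extents, pooled
    over a run of frames -- a toy version of the structure measure."""
    return [entropy([f[yy][xx] for f in frames
                     for yy in range(y, y + s)
                     for xx in range(x, x + s)])
            for s in sizes]
```

A selector could then route low-entropy regions to a cheap coder and high-entropy ones to a richer coder, which is the spirit of the coder-selection stage described above.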
Variable length decoders (VLDs) constitute one of the principal bottlenecks in building HDTV decoders. In these systems, the VLD must decode at a rate of about 100 million code words per second or higher. The VLD speed of operation, however, is limited by long propagation delays in the word-length decoding loop, caused by wide barrel shifting, large table decoding, multiplexing, and arithmetic operations. These stringent speed requirements present major challenges for VLD implementation in current VLSI technology. Several techniques for VLD throughput enhancement are discussed here. Some of these methods increase the speed of the loop hardware, while others introduce some parallelism in processing the inherently serial bit-stream data. These techniques can be combined or used independently to provide advantages in different applications. The techniques of Type-Independent Length Decoding Loop Acceleration and Scalable Quasi-Parallel Processing produce very good results in professional applications (studio, medical, etc.). For consumer applications, the One-Hot architecture, along with the technique of Adaptive Acceleration in Processing of Huffman Coded Bit Streams, promises to deliver feasible and inexpensive VLD implementations in VLSI, with the benefit of operation at clock rates lower than those required by the architectures traditionally employed. A method of Dynamic State Machine Partitioning in Tree Searching VLD implementations is also considered for a `pre-VLD' application, where word boundary information is extracted to enhance the performance of the actual VLD.
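The serial bottleneck is the feedback loop itself: the decoder cannot locate the next code word until it knows the length of the current one. A toy table-driven decoder showing that loop (the code table is illustrative, not a standard one):

```python
# Toy prefix-free (Huffman-style) code table: code string -> symbol.
CODES = {"0": "a", "10": "b", "110": "c", "111": "d"}

def vld_decode(bits):
    """Serial table-lookup VLD: the decoded word's length feeds back
    to advance the bit pointer -- the loop the paper accelerates."""
    out, pos = [], 0
    while pos < len(bits):
        for length in range(1, 4):
            word = bits[pos:pos + length]
            if word in CODES:
                out.append(CODES[word])
                pos += length  # feedback: the shift depends on the decoded length
                break
        else:
            raise ValueError("invalid prefix at bit %d" % pos)
    return out
```

Hardware acceleration attacks exactly this dependency: speeding up the length determination, or speculatively decoding several stream positions in parallel.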
Traditional data compression algorithms for 2D images operate under the information-theoretic paradigm, attempting to remove as much redundant information as possible. However, through the use of a depletion algorithm that takes advantage of characteristics of the human visual system, images can be displayed using only half or a quarter of the original information with no appreciable loss of quality.
The rate control algorithm plays an important role in improving and stabilizing the playback quality of video coded with the MPEG standard. Several optimal control techniques have been proposed, aiming at the best possible quality for a given channel rate and buffer size. Some of these approaches are complex in that they require the rate and distortion characteristics of the input data to be measured. This motivates us to pursue a method for approximating the rate and distortion functions in order to reduce the computations. Previous work has been based to a large extent on modeling the distortion as a negative exponential function of the rate. This type of model ignores many factors in a real MPEG encoding process and is not general enough for all video sources. In this paper, we use piecewise polynomials to approximate the frame-level rate and distortion. The dependency between a predictive frame and its reference frames is also considered in our model. Compared to other models, our method is relatively more complex but gives more accurate results. We observe low average relative model errors, which indicates that the model is accurate for most quantization settings. We use the model within our gradient-based rate control algorithm and show that, with the model, one can closely approximate the solution obtained using the actual data. Finally, we apply a simplified version of the model to a new fast algorithm derived from MPEG Test Model 5, and demonstrate that both the quality (in terms of PSNR) and the stability of the quality can be improved.
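The simplest member of the piecewise-polynomial family is piecewise-linear: measure rate (or distortion) at a few quantizer settings and interpolate between them, instead of assuming a global exponential model. A sketch with made-up sample points:

```python
def piecewise(points):
    """Piecewise-linear interpolant through measured (q, value) pairs,
    clamped to the end values outside the measured range."""
    pts = sorted(points)

    def f(q):
        if q <= pts[0][0]:
            return pts[0][1]
        for (q0, v0), (q1, v1) in zip(pts, pts[1:]):
            if q <= q1:
                # Linear segment between the two bracketing samples.
                return v0 + (v1 - v0) * (q - q0) / (q1 - q0)
        return pts[-1][1]

    return f
```

A rate control loop can then evaluate such a model at any candidate quantizer without re-encoding the frame, which is the computational saving argued for above.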
Effective quantization and rate control are very important issues in real-time video compression, in order to control picture quality and maintain the target bit rate. MPEG-2 Test Model 5 (TM5) describes an adaptive quantization scheme that exploits human visual system properties to improve subjective quality. The TM5 algorithm, however, produces blocking artifacts and distortion along edges on flat backgrounds in some pictures. Moreover, the TM5 quantization scheme cannot effectively control the quality in predicted pictures, since the activity classification performed in the pixel domain is weakly correlated with the actual quantization performed in the transform domain. In this paper, we propose a new quantization scheme that addresses these deficiencies and also takes into account the variation in the compaction property of the DCT kernel as it relates to the orientation of edges and structures within a block. The proposed scheme involves a two-step activity determination procedure. In the first step, activity is determined from the actual block variance in the pixel domain; in the second step, a correction factor is applied depending on the effectiveness of the DCT kernel in the transform domain. Our results show that the proposed scheme yields improved picture quality and better bit distribution compared to MPEG-2 TM5.
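For reference, the pixel-domain first step resembles the standard TM5 adaptive quantization: activity is derived from block variance and the quantizer is modulated by the TM5 normalization formula (the formula below is the standard TM5 one; the helper names are ours).

```python
def block_activity(block):
    """Pixel-domain activity: 1 + variance of the block (TM5 takes the
    minimum of this over the 8x8 luminance sub-blocks)."""
    n = len(block) * len(block[0])
    mean = sum(map(sum, block)) / n
    return 1 + sum((p - mean) ** 2 for row in block for p in row) / n

def adaptive_mquant(q, act, avg_act):
    """TM5 normalized activity: flat blocks (act < avg_act) get a
    finer quantizer, busy blocks a coarser one, within a factor of 2."""
    n_act = (2.0 * act + avg_act) / (act + 2.0 * avg_act)
    return q * n_act
```

The paper's second step would then multiply in a transform-domain correction factor reflecting how well the DCT compacts the block's particular edge orientation.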
We describe a motion-compensated filtering scheme for preprocessing of video, based on motion vectors computed by an MPEG encoder. A modified clustering filter is used to filter the pixels in the spatiotemporal kernel generated by the motion trajectories. The filtering scheme can be incorporated into the encoder with very slight modifications, since it uses the results of the encoder's motion estimation to filter each image prior to performing the rest of the encoding functions. In this way, our work differs from previous work, which requires additional motion estimation. We have tested our scheme with different types of noise. We observe that the motion estimation in the encoder becomes more accurate, resulting in a motion-compensated difference signal with smaller energy, which in turn allows finer quantization. The visual quality of the compressed video is significantly enhanced. Furthermore, the reduced noise leads to better performance of other modules that rely on the statistics of the intermediate signals (for example, the intra/inter coding decision).
The forthcoming introduction of helical scan digital data tape recorders with high access bandwidth and large capacity will facilitate the recording and retrieval of a wide variety of multimedia information from different sources, such as computer data and digital audio and video. For the compression of digital audio and video, the MPEG standard has been accepted internationally. Although helical scan tape recorders can store and play back MPEG compressed signals transparently, they are not well suited for special playback modes, in particular fast forward and fast reverse: only random portions of the original MPEG bitstream are recovered on fast playback. Unfortunately, these shreds of information cannot be interpreted by a standard MPEG decoder, owing to loss of synchronization and missing reference pictures. In the EC-sponsored RACE project DART (Digital Data Recorder Terminal), the possibilities for recording and fast playback of MPEG video on a helical scan recorder have been investigated. In the approach we present in this paper, we assume that no transcoding is carried out on the incoming bitstream at recording time and that no additional information is recorded. To use the shreds of information for the reconstruction of interpretable pictures, a bitstream validator has been developed that achieves conformance to the MPEG-2 syntax during fast playback. The concept has been validated by realizing hardware demonstrators that connect to a prototype helical scan digital data tape recorder.
The goal of the MOVIE VLSI chip is to facilitate the development of software-only solutions for real-time video processing applications. The chip can be seen as a building block for SIMD arrays of processing elements, and its architecture has been designed to facilitate high-level language programming. The basic architectural building block associates a sub-array of computational processors with an I/O processor. A module can be seen as a small linear, systolic-like array of processing elements, connected at each end to the I/O processor. The module can communicate with its two nearest neighbors via two communication ports. The chip architecture also includes three 16-bit video ports. An important aspect of the programming environment is the C-stolic programming language. C-stolic is a C-like language augmented with parallel constructs that distinguish between the array-controller variables (scalar variables) and the local variables in the array structure (systolic variables). A statement operating on systolic variables implies simultaneous execution on all the cells of the structure. Implementation examples of MOVIE-based architectures dealing with video compression algorithms are given.
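The scalar/systolic distinction can be illustrated with a toy emulation. This is not C-stolic itself, only a sketch of its semantics: the class name and the sequential emulation of the simultaneous per-cell execution are our own.

```python
class Systolic:
    """Toy model of a C-stolic 'systolic variable': a value held in
    every cell of the array. An operation on it applies to all cells
    simultaneously on the hardware; here the broadcast is emulated
    sequentially."""

    def __init__(self, values):
        self.cells = list(values)

    def __add__(self, scalar):
        # A scalar (array-controller) operand is broadcast to every cell.
        return Systolic(c + scalar for c in self.cells)
```

A statement such as `s + 1`, where `s` is systolic, thus updates every processing element in one logical step, whereas a scalar variable lives only in the array controller.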
Massively parallel processing architectures have matured primarily through image processing and computer vision applications. The similarity of processing requirements between these areas and video processing suggests that such architectures should be very appropriate for video processing applications. This research describes an associative massively parallel processing system for video compression, including an architectural and system description, a discussion of the implementation of compression tasks such as DCT/IDCT, motion estimation, and quantization, and a system evaluation. The core of the processing system is the ASP (Associative String Processor) architecture: a modular, massively parallel, programmable, and inherently fault-tolerant fine-grain SIMD processing architecture incorporating a string of identical APEs (Associative Processing Elements), a reconfigurable inter-processor communication network, and a Vector Data Buffer for fully overlapped data input-output. For video compression applications, a prototype system has been developed that uses ASP modules to implement the required compression tasks. This scheme leads to a linear speed-up of the computation by simply adding more APEs to the modules.
Affine transforms are a crucial operation in image and video processing applications. Typical applications of affine transforms include fractal block coding, camera operation detection, affine motion estimation, etc. Affine transforms involve complex operations and are hence difficult to implement in real time. In this paper, we present a novel architecture for real-time implementation of affine transforms. First, we derive two fundamental operations from affine transforms and then propose an efficient method of implementing these operations. As an example of the application of the ATP (Affine Transform Processor), we propose a high-performance video compression algorithm mapped onto the proposed architecture. This algorithm is based on combined affine transform and vector quantization (ATVQ), where the intra-frame and inter-frame redundancy in the video sequence is exploited through piecewise self-similarity on a block-wise basis within a frame and between frames. ATVQ has the advantage of superior coding performance at significantly reduced computational complexity. ATVQ has been mapped onto the ATP, and real-time execution is demonstrated using a VHDL (VHSIC Hardware Description Language) implementation of the ATP.
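For reference, the six-parameter affine transform underlying such architectures maps each pixel coordinate as below. The coefficient names are generic, not the paper's notation, and this sketch ignores the resampling/interpolation a real implementation must perform.

```python
def affine_map(points, a, b, c, d, e, f):
    """Apply the affine transform
        x' = a*x + b*y + c
        y' = d*x + e*y + f
    to a list of (x, y) coordinates."""
    return [(a * x + b * y + c, d * x + e * y + f) for (x, y) in points]
```

Rotation, scaling, shearing, and translation are all special cases of this map, which is why a single affine datapath can serve fractal coding, camera-motion modeling, and affine motion estimation alike.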
Conventional approaches for implementing powerful block-matching algorithms suffer from a serious limitation with regard to input data rate: as soon as the amount of data imported into the motion estimator exceeds a certain limit, these approaches fail to meet the application's specifications. This paper presents a new approach that is more cost-effective and better supports high input data rates.
Block-based motion estimation is an efficient interframe predictor, making it an important component in video coding schemes. A significant portion of a video codec's computational budget, however, is allocated to the task of computing motion vectors. For low bit-rate video coding applications such as teleconferencing, motion vector information occupies a substantial percentage of the available channel bandwidth. In this paper we present a method that accelerates motion vector computation by using spatio-temporal prediction to bias the search (in a statistical sense) towards the most probable direction of motion, using object trajectories from previously computed frames. Furthermore, since the motion vectors are linearly predicted, they can be coded efficiently. Linear predictive motion vector coding compares favorably to other motion estimation methods and can be incorporated within existing video compression standards.
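A minimal sketch of the linear-prediction idea, assuming a uniform-weight predictor over spatio-temporal neighbour vectors (the paper's actual weights and neighbour set are not specified here): the search is then centred on the prediction, and only the small residual vector needs to be coded.

```python
def predict_mv(neighbours):
    """Linearly predict a block's motion vector as the (rounded) average
    of previously computed neighbouring vectors -- e.g. the left and
    above blocks and the co-located block in the previous frame.
    Uniform weights are an illustrative assumption."""
    n = len(neighbours)
    return (round(sum(v[0] for v in neighbours) / n),
            round(sum(v[1] for v in neighbours) / n))
```

Because the true vector is usually close to the prediction, the residual `(mv_x - pred_x, mv_y - pred_y)` is small and entropy-codes cheaply, which is where the bandwidth saving comes from.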
In block-based motion-compensated video coding, a fixed-resolution motion field with one motion vector per image block is used to improve the prediction of the frame to be coded. All motion vectors are encoded with the same fixed accuracy, typically 1 or 1/2 pixel accuracy. In this work, we explore the benefits of encoding the motion vectors with other accuracies, and of encoding different motion vectors with different accuracies within the same frame. To do this, we analytically model the effect of motion vector accuracy and derive expressions for the encoding rates for both motion vectors and difference frames, in terms of the accuracies. Minimizing these expressions leads to simple formulas that indicate how accurately to encode the motion vectors in a classical block-based motion-compensated video coder. These formulas also show that the motion vectors must be encoded more accurately where more texture is present, and less accurately when there is much interframe noise. We implement video coders based on our analysis and present experimental results on real video frames. These results suggest that our equations are accurate, and that significant bit rate savings can be achieved when our optimal motion vector accuracies are used.
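The mechanics of variable accuracy reduce to snapping each motion component onto a per-block grid. The sketch below shows only that quantization step; the paper's formulas for choosing the accuracy (finer for textured blocks, coarser under interframe noise) are not reproduced.

```python
def quantize_mv(component, accuracy):
    """Round a real-valued motion-vector component to the chosen
    accuracy grid, e.g. accuracy = 1.0 (full pel), 0.5 (half pel)
    or 0.25 (quarter pel)."""
    return round(component / accuracy) * accuracy
```

Coarser grids need fewer bits per vector but leave more energy in the difference frame; the paper's contribution is the closed-form trade-off between those two rates.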
While block motion compensation has been the preferred method for reducing inter-frame dependencies in most standards for video coding (H.261, MPEG), a new proposal for very low bit-rate video coding (H.263) has included overlapped block motion compensation (OBMC) as an optional mode of operation. In this paper, we present fast algorithms for motion estimation when compensating with OBMC. Standard block-matching motion vectors are not optimal for OBMC. Our algorithms estimate which block motion vectors will yield the greatest improvement when optimized, order the blocks accordingly, and optimize the motion vectors based on that ordering. The estimation is based on readily available information about block matching, viz., prediction errors over blocks. As simulation results will demonstrate, the algorithms achieve near-optimal performance at low computational cost. An additional advantage of the algorithms is that they may be terminated after a few motion vectors have been optimized and still deliver high performance gains. This is advantageous in situations where the available computational power at the encoder varies (as in a videophony situation where the frame rate adapts depending on scene activity or available bandwidth) and it becomes desirable that the motion vectors chosen for optimization yield the highest gains possible.
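The ordering idea can be sketched with a simple heuristic, assuming the block-matching prediction error alone drives the ranking (the paper's estimator is more elaborate):

```python
def obmc_optimisation_order(block_errors):
    """Rank block indices by decreasing block-matching prediction
    error: blocks that block matching predicted worst are the most
    promising candidates for OBMC motion-vector re-optimisation.
    The list can be truncated when encoder compute runs out."""
    return sorted(range(len(block_errors)), key=lambda i: -block_errors[i])
```

Because the ranking front-loads the highest-gain blocks, stopping early (the variable-compute case described above) still captures most of the achievable improvement.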
We report two techniques for variable size block matching (VSBM) motion compensation. First, an algorithm is described which, based on a quad-tree structure, results in the optimal selection of variable-sized square blocks. It is applied in a VSBM scheme in which the total mean squared error is minimized. This provides the best achievable performance for a quad-tree based VSBM technique. Although it is computationally demanding and hence impractical for real-time codecs, it does provide a yardstick by which the performance of other VSBM techniques can be measured. Second, a new VSBM algorithm which adopts a 'bottom-up' approach is described. The technique starts by computing sets of 'candidate' motion vectors for fixed-size small blocks. Blocks are then effectively merged in a quad-tree manner if they have similar motion vectors. The result is a computationally efficient VSBM technique which attempts to estimate the 'true' motion within the image. Both methods have been tested on a number of real image sequences. In all cases the new 'bottom-up' technique was only marginally worse than the optimal VSBM method but significantly better than fixed-size block matching and other known VSBM implementations.
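The merge test at the heart of the bottom-up approach can be sketched as follows; the tolerance parameter and the "agree within a few pels" similarity criterion are illustrative assumptions, not the paper's exact rule.

```python
def can_merge(child_mvs, tol=1):
    """Decide whether four sibling blocks may be merged into their
    quad-tree parent: merge when their candidate motion vectors agree
    to within `tol` pels in both components (hypothetical criterion)."""
    xs = [v[0] for v in child_mvs]
    ys = [v[1] for v in child_mvs]
    return (max(xs) - min(xs) <= tol) and (max(ys) - min(ys) <= tol)
```

Applying this test recursively up the tree grows large blocks over coherently moving regions while keeping small blocks at motion boundaries, which is why the result approximates the 'true' motion field.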
This paper illustrates a method for affine-warping-based motion compensation that exploits the same prediction mechanism as the H.263 Advanced Prediction Mode, introducing only new constant weighting matrices into the H.263 Overlapped Motion Compensation algorithm. In particular, we show that, with reference to a regular-mesh based motion estimation algorithm, image prediction using affine morphing can be easily performed with fixed coefficients when a proper linear resampling is used. The performance of an H.263-like coder based on affine transformation and linear resampling is illustrated through experimental data.
In this paper, a motion field segmentation scheme for video compression is presented. A split-and-merge segmentation technique and linear regression are used to segment the field, and an affine motion model is used to describe the movements of the regions. In the regression, a linearization of the displaced frame difference is minimized directly. The results are compared with block-based motion estimates and MPEG-style coding.
Providing high bit-rate real-time video services has been a major driving factor in the advancement of high-speed networking technology such as ATM-based BISDN. In this paper, we describe MPEG2Tool, an X-window-based software implementation of the MPEG-2 video compression algorithm with many additional useful functions. The ultimate goal of designing this toolkit was to facilitate the study of MPEG video transmission over ATM-based networks. The toolkit consists of four major modules, which appear as four push-buttons in the main Motif menu: (1) encoding, (2) statistical analysis, (3) transmission simulation, and (4) decoding.
Video data encoded using the Moving Picture Experts Group (MPEG) standards is highly susceptible to errors during transmission or storage. We investigate whether the error tolerance of coded MPEG video can be improved by varying the size of each slice depending on the class of coded picture. We evaluate the effect of varying the slice size within each of the three MPEG picture classes on the error tolerance of the coded sequence. We encode a number of test sequences and subject each one to simulated transmission errors. We show that reducing the slice size in I and P pictures improves the decoded quality in the presence of errors. The slice size in B pictures can be increased without significantly reducing the tolerance to errors. Reducing the slice size in I and P pictures whilst increasing the slice size in B pictures can significantly improve the tolerance of the coded sequence to transmission errors without increasing the amount of coded data. A video sequence encoded in this way complies with the MPEG1 and MPEG2 standards.
In this paper we present error concealment techniques for MPEG-2 video coded and multiplexed streams damaged by ATM cell losses. Decoder early resynchronization limits the effects of transmission errors by decoding some information that is normally discarded from the damaged MPEG-2 video bitstreams. Part of this information cannot be completely decoded due to its differential coding among macroblocks (DC levels, motion vectors). Three different techniques are presented for the case of DC level recovery in intra pictures. Two of them are predictive techniques, one operating in the frequency domain and the other in the spatial domain. The third technique provides an exact reconstruction of DC values using special data coded into the user-data area of the MPEG-2 video bitstream. For areas that are not resynchronized, classical temporal and spatial concealment techniques are used. These techniques have been tested in a simulated environment which includes an implementation of MPEG-2 elementary video coding and decoding and the MPEG-2 systems standard. The ATM transmission part has been simulated by means of specialized simulation software. Results for the presented concealment techniques are included.
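A minimal sketch of spatial-domain DC recovery, assuming the lost DC level is predicted as the plain mean of available neighbouring macroblocks' DC levels (the paper's predictive techniques are more specific than this):

```python
def conceal_dc(neighbour_dcs):
    """Predict the DC level of a damaged intra macroblock from the DC
    levels of its available neighbours; unavailable neighbours (also
    lost, or outside the picture) are passed as None."""
    available = [d for d in neighbour_dcs if d is not None]
    return sum(available) / len(available)
```

This works because DC levels vary smoothly across natural images; the frequency-domain variant exploits the same smoothness on the coefficient side, and the third technique sidesteps prediction entirely by carrying exact DC values in the user-data area.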
This paper considers the problem of data recovery and reconstruction in erroneous MPEG-2 video sequences. The basic resynchronization point in an MPEG-2 video bitstream is the slice header, a slice usually denoting a full row of macroblocks. When an error occurs, the rest of the damaged slice is lost up to the next slice. In order to improve the efficiency of conventional error concealment schemes, we propose to exploit the error detection information to force the decoding of error-free bits immediately after a lost area, before reaching either the next resynchronization point or the next erroneous area. This early resynchronization is achieved by trying to decode variable length codes until some macroblocks are recognized. In order to retrieve differentially coded data such as DC coefficients and macroblock positions, a specific algorithm has been designed which uses not only available AC coefficients and neighboring data, but also differential values decoded from the early-resynchronized bitstream. This algorithm has been embedded in an MPEG-2 software decoder, combined with classical spatial and temporal error concealment techniques to interpolate the remaining lost areas. Simulation results show that up to 70% of the lost macroblocks can be retrieved in an intraframe coded picture. This retrieval along with temporal propagation yields a gain of several dBs as well as a visual enhancement over the whole sequence.
Rate control is an important issue in video coding, since it significantly affects video quality. In this paper, we discuss joint encoder and channel rate control for variable bit-rate (VBR) video over packet-switched Asynchronous Transfer Mode (ATM) networks. Since variable bit-rate traffic is allowed in such networks, an open-loop encoder without rate control can generate consistent-quality video. But in order to improve the statistical multiplexing gain (SMG), an encoder buffer is essential to smooth the highly variable video bitstream. Due to the finite buffer size, some form of encoder rate control has to be enforced, and consequently video quality varies. We argue that a rate control scheme has to balance consistent video quality on the encoder side against bitstream smoothness for SMG on the network side. We present a joint encoder and channel rate control algorithm for ATM networks, with leaky buckets as open-loop source flow control models. The encoder rate control is separated into a sustainable-rate control and a unidirectional instantaneous-rate control. It mitigates the leaky-bucket saturation problem exhibited in previous work. Experimental results with MPEG video are presented; they verify our analysis and show the effectiveness of the proposed algorithm.
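The leaky-bucket flow-control model referred to above can be sketched as follows. The class name, the per-tick drain, and the conform/reject interface are illustrative assumptions; the paper builds its sustainable-rate and instantaneous-rate controls on top of this basic mechanism.

```python
class LeakyBucket:
    """Minimal leaky-bucket model: the bucket drains at `rate` bits per
    tick, and an offered frame conforms only if adding its bits does
    not overflow the bucket `size`."""

    def __init__(self, rate, size):
        self.rate, self.size, self.fill = rate, size, 0.0

    def offer(self, bits):
        self.fill = max(0.0, self.fill - self.rate)  # drain one tick
        if self.fill + bits > self.size:
            return False  # bucket would saturate: rate control must act
        self.fill += bits
        return True
```

When `offer` returns False the bucket is saturated and the encoder must reduce its output rate, which is exactly the quality-disturbing event the proposed joint control tries to avoid.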
The effects of digital transmission errors on H.263 codecs are analyzed and the transmission of H.263 coded video over a TDMA radio link is investigated. Numerical results for the channel SNR required for providing acceptable video quality under various channel coding and interleaving strategies are presented. Fading on radio channels causes significant transmission errors and H.263 coded bit streams are very vulnerable to errors. Therefore, powerful forward error correction (FEC) codes are necessary to protect the data so that it can be successfully transmitted at acceptable signal power levels. However, FEC imposes a high bandwidth overhead. In order to make best use of the available channel bandwidth and to alleviate the overall impact of errors on the video sequence, a two-layer source coding and unequal error protection scheme based on H.263 is also studied. The scheme can tolerate more transmission errors and leads to more graceful degradation in quality when the channel SNR decreases. In lossy environments, it yields better video quality at no extra bandwidth cost.
We earlier proposed a binocular-disparity-based segmentation scheme that compactly represents one image of a stereoscopic image pair given the other image. That scheme adapted the excess bit count needed to code the additional image to the binocular disparity detail present in the image pair. This paper addresses the issue of extending such a segmentation in the temporal dimension to achieve efficient stereoscopic sequence compression. The easiest conceivable temporal extension would be to code one of the sequences using an MPEG-type scheme while the frames of the other stream are coded based on the segmentation. However, such independent compression of one of the streams fails to take advantage of the segmentation or the additional disparity information available. To achieve better compression by exploiting this additional information, we propose the following scheme. Each frame in one of the streams is segmented based on disparity. An MPEG-type frame structure is used for motion-compensated prediction of the segments in this segmented stream. The corresponding segments in the other stream are encoded by reversing the disparity map obtained during the segmentation. Areas without correspondence in this stream, arising from binocular occlusions and disparity estimation errors, are filled in using a disparity-map-based predictive error concealment method. Over a test set of several different stereoscopic image sequences, high perceived stereoscopic image quality was achieved at an excess bandwidth roughly 40% above that of a highly compressed monoscopic sequence. Stereo perception can be achieved at significantly smaller excess bandwidths, albeit with a perceivable loss in image quality.
A video interpolation algorithm for spatio-temporal video pyramid coding is presented. The temporal and the spatial predictions of the present frame are interactively combined to produce the best prediction of the present frame. For this purpose, the temporal prediction error variance and the lower resolution image variance are measured and compared for each region. The proposed region-based video interpolator is shown to give visually correct predicted images by minimizing the spatial and the temporal prediction artifacts such as aliasing, uncovered background and blocking effect. Promising results are obtained for lossy compression.
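One plausible reading of the region-wise combination is an inverse-variance blend, sketched below; the exact combination rule used by the paper may differ, and the scalar per-region values stand in for whole prediction regions.

```python
def combine_predictions(temporal, spatial, var_t, var_s):
    """Blend the temporal and spatial (lower-resolution) predictions of
    a region, weighting each inversely to its measured error variance
    so the more reliable predictor dominates in that region."""
    w_t = var_s / (var_t + var_s)
    return w_t * temporal + (1.0 - w_t) * spatial
```

In uncovered-background regions the temporal variance is large, so the spatial prediction takes over; in aliased or blocky regions the spatial variance dominates and the temporal prediction wins, which matches the artifact behavior described above.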
In this paper, we present a new algorithm that adaptively selects the best possible reference frame for the predictive coding of generalized, or multi-view, video signals, based on estimated prediction similarity with the desired frame. We define similarity between two frames as the absence of occlusion, and we estimate this quantity from the variance of composite displacement vector maps. The composite maps are obtained without requiring the computationally intensive process of motion estimation for each candidate reference frame. We provide prediction and compression performance results for generalized video signals using both this scheme and schemes where the reference frames were heuristically pre- selected. When the predicted frames were used in a modified MPEG encoder simulation, the signal compressed using the adaptively selected reference frames required, on average, more than 10% fewer bits to encode than the non-adaptive techniques; for individual frames, the reduction in bits was sometimes more than 80%. These gains were obtained with an acceptable computational increase and an inconsequential bit-count overhead.
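The selection rule can be sketched as follows, with each composite displacement map reduced here to a flat list of vector magnitudes (an illustrative simplification of the paper's 2-D vector maps):

```python
def select_reference(candidate_maps):
    """Pick the index of the candidate reference frame whose composite
    displacement map has the smallest variance -- low variance serving
    as the proxy for high similarity (absence of occlusion)."""
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    return min(range(len(candidate_maps)),
               key=lambda i: variance(candidate_maps[i]))
```

The key economy is that the composite maps are assembled from already-computed vectors, so no per-candidate motion estimation is needed before the choice is made.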
This paper addresses the problem of quality estimation of digitally coded video sequences. The topic is of great interest since many digital video products are about to be released, and it is thus important to have robust methodologies for testing and performance evaluation of such devices. The inherent problem is that human vision has to be taken into account in order to assess the quality of a sequence with good correlation to human judgment. It is well known that the commonly used metric, the signal-to-noise ratio, is not correlated with human vision. A metric for the assessment of video coding quality is presented. It is based on a multi-channel model of human spatio-temporal vision that has been parameterized for video coding applications by psychophysical experiments. The mechanisms of human vision are simulated by a spatio-temporal filter bank. The decomposition is then used to account for phenomena such as contrast sensitivity and masking. Once the amount of distortion actually perceived is known, quality can be assessed at various levels. The described metric is able to rate the overall quality of the decoded video sequence as well as the rendition of important features of the sequence such as contours or textures.
This paper presents a performance comparison of different loop filtering techniques in a generic hybrid video coding algorithm. This study will compare the performance of the filtering techniques by integrating each of the loop filters separately into an MPEG-1-compliant codec, and coding a number of video sequences at various bit-rates and motion compensation (MC) accuracies. The performance of the filters will be assessed in terms of the energy of the displaced frame difference. Comparisons are conducted between four filtering techniques: (1) the 1:2:1 loop filter described in ITU Recommendation H.261; (2) an MC-accuracy dependent 3-tap filter, whose tap weights are based upon a first-order Markov model of the source; (3) a spatially-adaptive filter for the blocking-effect based on the theory of Projections Onto Convex Sets (POCS); and (4) an anisotropic filter for the reduction of the blocking-effect. Our results will examine the effect that traditional low-pass loop filters have on MC prediction quality, and compare this to a POCS-based loop filter. The filtering of only the blocking-effect will also provide an indication of the contribution that the blocking-effect has on the overall high-frequency distortions that are reduced by the low-pass loop filters.
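The first of the four techniques, the H.261 1:2:1 filter, is simple enough to sketch directly. The filter is separable (applied horizontally then vertically); the one-dimensional pass is shown below, with edge samples left unfiltered as in the Recommendation.

```python
def loop_filter_121(samples):
    """Apply the 1:2:1 low-pass loop filter of ITU-T H.261 along one
    dimension of a block: out[i] = (s[i-1] + 2*s[i] + s[i+1]) / 4,
    with the two edge samples passed through unchanged."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        out[i] = (samples[i - 1] + 2 * samples[i] + samples[i + 1]) / 4
    return out
```

Applying this pass to every row and then every column of the predicted block smooths high-frequency prediction noise, which is the effect the paper measures through the displaced-frame-difference energy.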