Journal of Electronic Imaging, October - December 2009
J. Electron. Imaging 18, 040501 (2009) (3 pages)
©2009 SPIE and IS&T. All rights reserved.


High-precision synchronization of video cameras using a single binary light source

Qi Zhao1 and
Yan Qiu Chen2

1Fudan University, School of Information Science and Engineering, Shanghai, China
2Fudan University, School of Computer Science, Shanghai, China

(Received: 3 April 2009; revised: 25 August 2009; accepted: 26 August 2009; published online: 20 October 2009)

Camera synchronization is necessary for multicamera applications. We propose a simple yet effective approach, termed random on-off light source (ROOLS), to synchronize video sequences. It uses a single light source such as an LED to generate a random binary-valued signal that is captured by the video cameras. The captured binary-valued sequences are then matched, and the temporal offset of the cameras is computed to subframe-interval precision. We test the proposed method on video sequences captured under a variety of illumination conditions, and the results are verified against the ground truth provided by an LED array clock. The main contribution of the proposed method is that it reliably achieves high-precision synchronization at the low cost of adding only a simple light source. In addition, it is suited for synchronization in both laboratory and outdoor environments.


Introduction

For multicamera systems, synchronization is a must to provide accurate temporal correlation for incorporating image information from multiple viewpoints.

Synchronicity can be achieved through real-time hardware synchronization1 or by establishing a time relationship between sequences recorded by unsynchronized video cameras.2 While ensuring high precision synchronization, hardware solutions are costly and complex.

In a scenario where hardware-synchronized video sequences are not feasible, it is still possible to obtain synchronicity using image features.3,4,5 These feature-based methods depend on the existence of salient and robust features in the scene. The absence of such features, or errors in detecting, tracking, and matching them, leads to incorrect synchronization.

In this paper, we present a simple and yet effective method, termed random on-off light source (ROOLS), to recover the temporal offset at subframe accuracy. It utilizes an auxiliary light source such as an LED to provide temporal cues. Compared to special-purpose hardware approaches, our method is far less complex and is inexpensive. Compared to feature-based approaches, ROOLS is more robust since it is completely independent of scene properties.

Problem Statement

Without loss of generality, we consider the case of two video cameras. Let the time instants of the video frames taken by the $\alpha$th camera be denoted by

$$\mathbf{T}^{\alpha} = (t_1^{\alpha}, t_2^{\alpha}, \ldots, t_{N_\alpha}^{\alpha}), \quad \alpha \in \{1,2\}, \quad N_\alpha \in \mathbb{N}, \qquad (1)$$

where $N_\alpha$ denotes the length of the $\alpha$th sequence, and $t_k^{\alpha}$ denotes the time of the $k$th frame in the $\alpha$th sequence. Note that $\mathbf{T}^1$ and $\mathbf{T}^2$ are measured by a common clock.

In a typical situation, identical video cameras with a constant frame interval $\Delta T$ are used, so that

$$\mathbf{T}^{\alpha} = (t_1^{\alpha}, t_1^{\alpha} + \Delta T, \ldots, t_1^{\alpha} + (N_\alpha - 1)\,\Delta T). \qquad (2)$$

Synchronizing two video sequences in this situation is equivalent to measuring the temporal offset between their initial frames,

$$T_{\mathrm{diff}} = t_1^1 - t_1^2. \qquad (3)$$

Proposed Method

A.Formulation

We propose to use a single temporally coded light source such as an LED as the signal to be captured by the cameras for synchronization. The light signal is essentially a time-continuous binary-valued function denoted as

$$f: \mathbb{R} \to \{0,1\}. \qquad (4)$$

It is sampled at $\mathbf{T}^{\alpha}$ by the $\alpha$th camera, producing a time-discrete binary-valued sequence

$$[f_{\mathrm{cam}}^{\alpha}(n)]_{n=1}^{N_\alpha} = [f(t_n^{\alpha})]_{n=1}^{N_\alpha} = \{f[(n-1)\Delta T + t_1^{\alpha}]\}_{n=1}^{N_\alpha}. \qquad (5)$$
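As an illustration of Eq. (5), the following minimal Python sketch samples a binary signal at the frame times of one camera. It is only a sketch under stated assumptions: the signal is represented by a sorted list of its transition times, sampling is treated as instantaneous (no exposure integration), and the names `light_signal`, `sample_camera`, and the `initial` parameter are hypothetical and do not come from the paper.

```python
import bisect


def light_signal(t, transitions, initial=0):
    """Value of the binary signal f at time t.

    Assumption: f starts at `initial` and flips at every time listed in
    the sorted list `transitions`.
    """
    flips = bisect.bisect_right(transitions, t)  # transitions at or before t
    return initial ^ (flips & 1)


def sample_camera(transitions, t_first, frame_interval, n_frames, initial=0):
    """Sample f at t_first + n * frame_interval for n = 0 .. n_frames - 1.

    List position n corresponds to frame n + 1 in the 1-based notation
    of Eq. (5).
    """
    return [
        light_signal(t_first + n * frame_interval, transitions, initial)
        for n in range(n_frames)
    ]
```

For example, a signal that switches on at time 1.0 and off at time 2.5, sampled every 0.5 time units starting at 0.2, yields one "on" run of three frames.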

Since $f$ is binary valued, it can be characterized by the time instants where its value rises from 0 to 1 or drops from 1 to 0. We term each of these instants a transition event. Let $\boldsymbol{\Phi}$ denote a subsequence of all transition events in $f$,

$$\boldsymbol{\Phi} = (\phi_1, \phi_2, \ldots, \phi_N). \qquad (6)$$

For each $\phi_k$, we have

$$\forall \epsilon > 0, \quad f(\phi_k - \epsilon) \oplus f(\phi_k + \epsilon) = 1, \qquad (7)$$

where $\oplus$ denotes the exclusive-or operator, and $\epsilon$ is an arbitrarily small positive real number. For the $\alpha$th camera, let $\boldsymbol{\Phi}^{\alpha}$ denote the transition events corresponding to $\boldsymbol{\Phi}$. As part of $\mathbf{T}^{\alpha}$, $\boldsymbol{\Phi}^{\alpha}$ can be expressed as follows:

$$\boldsymbol{\Phi}^{\alpha} = [t^{\alpha}_{\mathbf{I}^{\alpha}(1)}, t^{\alpha}_{\mathbf{I}^{\alpha}(2)}, \ldots, t^{\alpha}_{\mathbf{I}^{\alpha}(N)}], \qquad (8)$$

where $\mathbf{I}^{\alpha}$ is a subsequence of $(1, 2, \ldots, N_\alpha)$. Each $\mathbf{I}^{\alpha}(k)$ satisfies

$$f_{\mathrm{cam}}^{\alpha}[\mathbf{I}^{\alpha}(k)] \oplus f_{\mathrm{cam}}^{\alpha}[\mathbf{I}^{\alpha}(k) - 1] = 1. \qquad (9)$$

These notations and their relationships are illustrated in Fig. 1.
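The condition in Eq. (9) translates directly into code: a frame index belongs to the transition index sequence when its sample differs from the previous one. The sketch below is illustrative only; the function name `transition_indices` is hypothetical, and the input is assumed to be an already binarized list of 0/1 samples.

```python
def transition_indices(samples):
    """Return the 1-based frame indices n with f_cam(n) XOR f_cam(n - 1) == 1.

    `samples[0]` is frame 1. Frame 1 has no predecessor, so it can never
    be reported as a transition.
    """
    return [
        n + 1  # convert 0-based list position to 1-based frame index
        for n in range(1, len(samples))
        if samples[n] ^ samples[n - 1]
    ]
```

For the sample list `[0, 0, 1, 1, 1, 0]`, the rising edge is reported at frame 3 and the falling edge at frame 6.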

Figure 1.

B.Achieving Sub-Frame-Interval Precision

Given $\boldsymbol{\Phi}$ and $\boldsymbol{\Phi}^{\alpha}$, consider the difference between a pair of corresponding transition events:

$$\delta_k^{\alpha} = \boldsymbol{\Phi}_k^{\alpha} - \boldsymbol{\Phi}_k = t^{\alpha}_{\mathbf{I}^{\alpha}(k)} - \phi_k. \qquad (10)$$

Suppose that the $\delta_k^{\alpha}$ are independent and identically distributed (i.i.d.) random variables, uniformly distributed over $[0, \Delta T]$, with mean $\mu = \Delta T/2$ and variance $\sigma^2 = \Delta T^2/12$. The averaged difference $\overline{\delta_{\alpha}} = (1/N)\sum_{k=1}^{N} \delta_k^{\alpha}$ is then a random variable with mean $\tilde{\mu} = \mu$ and variance $\tilde{\sigma}^2 = \sigma^2/N$. Moreover,

$$\overline{\delta_{\alpha}} = \frac{1}{N}\sum_{k=1}^{N} [t^{\alpha}_{\mathbf{I}^{\alpha}(k)} - \phi_k] = \overline{t^{\alpha}_{\mathbf{I}^{\alpha}}} - \overline{\phi}, \qquad (11)$$

where $\overline{t^{\alpha}_{\mathbf{I}^{\alpha}}}$ and $\overline{\phi}$ are the mean positions of transition events with respect to $f_{\mathrm{cam}}^{\alpha}$ and $f$, as illustrated in Fig. 1. According to the central limit theorem,6 the average of a sufficiently large number of i.i.d. random variables, each with finite mean and variance, is approximately normally distributed. Hence, $\overline{\delta_{\alpha}}$ approximately follows a normal distribution with mean $\mu$ and variance $\Delta T^2/(12N)$. From Eq. (11) we obtain

$$\overline{t^{1}_{\mathbf{I}^{1}}} - \overline{t^{2}_{\mathbf{I}^{2}}} = \overline{\delta_{1}} - \overline{\delta_{2}}. \qquad (12)$$

Since $\overline{t^{\alpha}_{\mathbf{I}^{\alpha}}} = (1/N)\sum_{k=1}^{N} \{t_1^{\alpha} + [\mathbf{I}^{\alpha}(k) - 1]\,\Delta T\}$ by Eq. (2), Eq. (12) can be further written as

$$T'_{\mathrm{diff}} = \frac{1}{N}\sum_{k=1}^{N} [\mathbf{I}^{2}(k) - \mathbf{I}^{1}(k)] + \delta', \qquad (13)$$

where $T'_{\mathrm{diff}} = T_{\mathrm{diff}}/\Delta T$ and $\delta' = (\overline{\delta_{1}} - \overline{\delta_{2}})/\Delta T \sim \mathcal{N}(0, 1/(6N))$. When $N$ is sufficiently large, the variance of $\delta'$ is negligibly small, leading to a high-precision estimate of $T'_{\mathrm{diff}}$.
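Substituting the constant-frame-interval model of Eq. (2) and the offset definition of Eq. (3) into Eq. (12) shows that the normalized offset, T_diff divided by the frame interval, equals the mean of the matched transition-index differences I2(k) − I1(k), up to the zero-mean error term. The following Python sketch computes that mean. It assumes the transitions of the two cameras have already been matched one to one; the function name `estimate_offset` is hypothetical.

```python
from statistics import fmean


def estimate_offset(i1, i2):
    """Estimate T_diff / DeltaT from matched 1-based transition frame indices.

    i1[k] and i2[k] must be the frame indices at which cameras 1 and 2
    first observe the same k-th transition of the light source.
    """
    if len(i1) != len(i2) or not i1:
        raise ValueError("need two non-empty index lists of equal length")
    # Mean of I2(k) - I1(k); the residual error shrinks as more transitions
    # are averaged.
    return fmean([b - a for a, b in zip(i1, i2)])
```

With this sign convention, a positive result means that camera 1 started recording later than camera 2, since T_diff is defined as the start time of camera 1 minus that of camera 2.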

C.Transition Detection Accuracy

In real-world applications, the binary sequence is obtained by quantizing the image intensity of the light source with a threshold $\tau$. For samples taken across a transition event, the quantized binary value might flip, causing the detected transition to shift one frame backward. For instance, if a sample is taken right before the edge where the signal rises from 0 to 1, its intensity may already be close to 1 and be incorrectly quantized to 1. Equation (13) shows that such a shift introduces additional error into the estimate. Suppose the light source intensity is normalized and $\tau = 0.5$ is chosen as the threshold so that the probabilities for transitions to flip from 0 to 1 and from 1 to 0 are identical. Let $x_i^{\alpha}$ denote a single shift event in video sequence $\alpha$. Its probability mass function is $p(x_i^{\alpha} = -1) = p(x_i^{\alpha} = 0) = 0.5$, with expectation $\mu_{x_i^{\alpha}} = -0.5$ and variance $\sigma^2_{x_i^{\alpha}} = 0.25$. Let $\overline{x^{\alpha}} = (1/N)\sum_{i=1}^{N} x_i^{\alpha}$ denote the averaged transition shift. Because the $x_i^{\alpha}$ are i.i.d. random variables, the central limit theorem again gives $\overline{x^{\alpha}} \sim \mathcal{N}(-0.5, 1/(4N))$. According to Eq. (13), the extra error $\delta_s$ introduced by transition shifts in the two video sequences is $\overline{x^{1}} - \overline{x^{2}}$. It follows that $\delta_s$ is normally distributed with mean 0 and variance $1/(2N)$. Including $\delta_s$, the total error of the temporal offset estimate is

$$\delta = \delta_s + \delta' \sim \mathcal{N}\!\left(0, \frac{1}{1.5N}\right). \qquad (14)$$
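The variance in Eq. (14) can be checked with a short calculation. The steps below assume, as the text implicitly does, that the averaged shifts of the two cameras are independent of each other and of the sampling error, so that variances add:

```latex
\begin{aligned}
\mathrm{E}[\delta_s] &= \mathrm{E}[\overline{x^{1}}] - \mathrm{E}[\overline{x^{2}}] = (-0.5) - (-0.5) = 0, \\
\operatorname{Var}(\delta_s) &= \operatorname{Var}(\overline{x^{1}}) + \operatorname{Var}(\overline{x^{2}})
  = \frac{1}{4N} + \frac{1}{4N} = \frac{1}{2N}, \\
\operatorname{Var}(\delta) &= \operatorname{Var}(\delta_s) + \operatorname{Var}(\delta')
  = \frac{1}{2N} + \frac{1}{6N} = \frac{2}{3N} = \frac{1}{1.5N}.
\end{aligned}
```

The standard deviation of the total error therefore scales as $\sqrt{2/(3N)}$, so quadrupling the number of transitions halves it.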

D.Random Binary Sequence Design

The proposed method requires $\delta_k^{\alpha} = t^{\alpha}_{\mathbf{I}^{\alpha}(k)} - \phi_k$ to be i.i.d. random variables uniformly distributed in $[0, \Delta T]$. To achieve this, we set the transition time

$$\phi_k = \phi_{k-1} + \chi, \qquad (15)$$

where $\phi_{k-1}$ is the time of the previous transition and $\chi$ is uniformly distributed in $[\iota\Delta T, (\iota + \kappa)\Delta T]$, with $\iota, \kappa \in \mathbb{N}$. Transition times generated in this way can be proved to ensure that $\delta_k^{\alpha}$ meets the requirement.
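A minimal Python sketch of the recurrence in Eq. (15) is given below. It only illustrates how such a transition schedule could be generated in software; how the LED is actually driven is not described here. The function name `generate_transitions` and the parameters `start` and `rng` are hypothetical, and the defaults of 2 and 6 for iota and kappa are taken from the experimental configuration.

```python
import random


def generate_transitions(n, frame_interval, iota=2, kappa=6, start=0.0, rng=None):
    """Generate n transition times phi_k = phi_{k-1} + chi, as in Eq. (15).

    chi is drawn uniformly from
    [iota * frame_interval, (iota + kappa) * frame_interval].
    `start` plays the role of phi_0 and is not itself returned.
    """
    if rng is None:
        rng = random.Random()
    times = []
    phi = start
    for _ in range(n):
        phi += rng.uniform(iota * frame_interval, (iota + kappa) * frame_interval)
        times.append(phi)
    return times
```

Passing a seeded generator, for example `rng=random.Random(1)`, makes the schedule reproducible.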

E.Transition Matching

The estimation of the temporal offset requires that the transition events of the two cameras be matched. We refer to this process as transition matching. Let the segment between two consecutive transition events be denoted as $\mathbf{D}^{\alpha}(k)$:

$$\mathbf{D}^{\alpha}(k) = \{f_{\mathrm{cam}}^{\alpha}[\mathbf{I}^{\alpha}(k+1)] - f_{\mathrm{cam}}^{\alpha}[\mathbf{I}^{\alpha}(k)]\} \times [\mathbf{I}^{\alpha}(k+1) - \mathbf{I}^{\alpha}(k)]. \qquad (16)$$

The binary sequence $[f_{\mathrm{cam}}^{\alpha}(n)]_{n=1}^{N_\alpha}$ can be equivalently represented by a sequence of transition segments. Let $\lambda(i,j) = |\mathbf{D}^{1}(i) - \mathbf{D}^{2}(j)|$ denote the difference between two segments. Transition matching can then be achieved by matching the two sequences of transition segments. Based on the observation that the difference between corresponding segments is small, the optimal transition matching is obtained by solving

$$\arg\min_{l,i,j} \left( \frac{\sum_{\gamma=0}^{l-1} \lambda(i+\gamma, j+\gamma)}{\max\left[\sum_{\gamma=0}^{l-1} |\mathbf{D}^{1}(i+\gamma)|, \sum_{\gamma=0}^{l-1} |\mathbf{D}^{2}(j+\gamma)|\right]} + \exp\left\{ -\frac{\sum_{\gamma=0}^{l-1} |\mathbf{D}^{1}(i+\gamma)| + \sum_{\gamma=0}^{l-1} |\mathbf{D}^{2}(j+\gamma)|}{2 \max\left[\sum_{k=1}^{M_1-1} |\mathbf{D}^{1}(k)|, \sum_{k=1}^{M_2-1} |\mathbf{D}^{2}(k)|\right]} \right\} \right), \qquad (17)$$

where $l$ denotes the number of overlapping segments, and $M_1$ and $M_2$ are the lengths of the two segment sequences. The first term assesses the similarity of the two overlapping segment sequences. However, considering only the first term may lead to erroneous matching when the overlap length $l$ is short. To avoid this, the second term imposes a large penalty for small $l$. The optimal solution is sought by evaluating all combinations of $i$ and $j$.
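The exhaustive search over Eq. (17) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names `signed_segments` and `match_segments` are hypothetical, indices are 0-based, every segment is assumed to have nonzero length, and the normalizing sum in the penalty term is taken over all segments of each sequence.

```python
import math


def signed_segments(samples):
    """Eq. (16): signed frame counts between consecutive transitions."""
    idx = [n for n in range(1, len(samples)) if samples[n] != samples[n - 1]]
    return [(samples[b] - samples[a]) * (b - a) for a, b in zip(idx, idx[1:])]


def match_segments(d1, d2):
    """Brute-force minimization of the cost in Eq. (17).

    Returns ((i, j, l), cost): 0-based start positions in d1 and d2 and
    the overlap length l of the best alignment.
    """
    total = max(sum(abs(d) for d in d1), sum(abs(d) for d in d2))
    best_cost, best = math.inf, None
    for i in range(len(d1)):
        for j in range(len(d2)):
            diff = s1 = s2 = 0
            # Grow the overlap one segment at a time, updating running sums.
            for g in range(min(len(d1) - i, len(d2) - j)):
                diff += abs(d1[i + g] - d2[j + g])
                s1 += abs(d1[i + g])
                s2 += abs(d2[j + g])
                # Similarity term plus the penalty for short overlaps.
                cost = diff / max(s1, s2) + math.exp(-(s1 + s2) / (2 * total))
                if cost < best_cost:
                    best_cost, best = cost, (i, j, g + 1)
    return best, best_cost
```

The running sums avoid recomputing each overlap from scratch, so the search costs on the order of M1 × M2 × min(M1, M2) operations, which is modest for a few hundred transitions.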

Experiments

A.Hardware and Configuration

The experimental system consists of two Sony HVR-V1 high-definition (HD) video cameras, an LED array clock providing the ground truth, and a single temporally encoded LED light source. The video cameras operate at 200 frames per second. The values of $\iota$ and $\kappa$ in Eq. (15) are set to 2 and 6, which ensures that there are at least two frames between adjacent transition events and avoids ambiguity in transition matching. We selected $\tau = 0.5$ to quantize the image intensity of the LED to a binary value.

B.Experiment Results

We conducted three groups of experiments under different illumination conditions: daylight, fluorescent lighting, and darkness. The results are shown in Table I. In all the tests, only 200 transition events were used. The average estimation error was about 0.08 frame intervals, and all estimation errors are less than 0.2 frame intervals. This bound is explained in the Analysis and Discussions subsection.

C.Comparison with Other Methods

The proposed method was compared with the feature-based approaches3,4,5 and the results are summarized in Table II. The comparison indicates ROOLS achieves higher estimation accuracy than existing approaches.

D.Analysis and Discussions

A property of the normal distribution is that values within 3 standard deviations of the mean account for about 99.7% of the distribution. When $N = 200$ transitions are used, according to Eq. (14), the standard deviation is $\sqrt{1/(1.5 \times 200)} \approx 0.058$ frame intervals, so with about 99.7% probability the estimation error lies within $3 \times 0.058 \approx 0.17$ frame intervals. This explains why the estimation errors in Table I are all below 0.2 frame intervals. The performance of the proposed method can be improved by increasing $N$.

Conclusion

We presented an innovative approach to synchronizing commercial video cameras. It achieves high-precision synchronization at the low cost of adding only a simple temporally coded light source. The requirement that the video cameras have identical frame rates is not a serious limitation, since using identical video cameras for one task is convenient and typical.

Acknowledgments

The research work presented in this paper is supported by the National Natural Science Foundation of China, Grant No. 60875024.

REFERENCES


  1. T. Kanade, H. Saito, and S. Vedula, The 3D Room: Digitizing Time-Varying 3D Events by Synchronized Multiple Video Streams, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA (1998).
  2. A. Whitehead, R. Laganiere, and P. Bose, “Temporal synchronization of video sequences in theory and in practice,” in Proc. IEEE Workshop on Motion and Video Computing, Vol. 2, pp. 132–137 (2005).
  3. Y. Caspi and M. Irani, “Spatio-temporal alignment of sequences,” IEEE Trans. Pattern Anal. Mach. Intell. 24, 1409–1424 (2002).
  4. C. Rao, A. Gritai, M. Shah, and T. Syeda-Mahmood, “View-invariant alignment and matching of video sequences,” in Proc. 9th IEEE Int. Conf. on Computer Vision, pp. 939–945 (2003).
  5. S. N. Sinha and M. Pollefeys, “Synchronization and calibration of camera networks from silhouettes,” in Proc. 17th Int. Conf. on Pattern Recognition ICPR 2004, Vol. 1, pp. 116–119, IEEE Computer Society, Washington, D.C. (2004).
  6. W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 2, Wiley, New York (1971).

FIGURES


Fig. 1. Achieving subframe-interval estimation: (1) the transition events of $f$, $f_{\mathrm{cam}}^{1}$, and $f_{\mathrm{cam}}^{2}$ are highlighted by black dots; (2) black vertical bars denote the mean position of corresponding transition events in $f$, $f_{\mathrm{cam}}^{1}$, and $f_{\mathrm{cam}}^{2}$.

TABLES

Table I. Results of 10 experiments (values in frame intervals).
Illum.        Estimated   Ground truth   Error
Daylight      −3.410      −3.2653        0.1447
Daylight      0.6150      0.6000         0.0150
Daylight      0.5600      0.6122         0.0522
Fluorescent   −0.4900     −0.6122        0.1222
Fluorescent   0.1100      0              0.1100
Darkness      −2.9200     −2.9605        0.0405
Darkness      1.1150      1.1475         0.0325
Darkness      −2.8850     −2.7961        0.0889
Darkness      1.1900      1.3158         0.1258
Darkness      −2.7300     −2.6273        0.1027

Table II. Average temporal offset error of various approaches
                Method [3]   Method [4]   Method [5]   ROOLS
Average error   0.110.20.08

