
METRICS FOR PERFORMANCE EVALUATION OF VIDEO OBJECT SEGMENTATION AND TRACKING WITHOUT GROUND-TRUTH

Çiğdem Eroğlu Erdem*, A. Murat Tekalp**, Bülent Sankur*

(*)Department of Electrical and Electronics Engineering

Boğaziçi University, İstanbul 80815, Turkey.

(**)Department of Electrical and Computer Engineering,

University of Rochester, Rochester, NY 14627, USA.

ABSTRACT

We present metrics to evaluate the performance of video object segmentation and tracking methods quantitatively when ground-truth segmentation maps are not available. The proposed metrics are based on the color and motion differences along the boundary of the estimated video object plane and the color histogram differences between the current object plane and its temporal neighbors. These metrics can be used to localize (spatially and/or temporally) regions where segmentation results are good or bad, or combined to yield a single numerical measure to indicate the goodness of the boundary segmentation and tracking results. Experimental results are presented to evaluate the segmentation map of the "Man" object in the "Hall Monitor" sequence, both in terms of a single numerical measure and in terms of localization of the good and bad segments of the boundary.

1. INTRODUCTION

Although a variety of algorithms have been proposed in the literature for object segmentation and tracking that address many applications, only a few methods for quantitative evaluation of their performance have been proposed [1, 2]. Furthermore, the performance evaluation metrics proposed in [1, 2] are useful only if ground-truth segmentation maps are available. The aim of this work is to evaluate the performance of object tracking and segmentation methods quantitatively when ground-truth segmentation maps for each frame are not available. To this effect, we present metrics based on intra-frame and inter-frame color and motion features of the segmented video object planes. Often a single numerical measure is not sufficient to evaluate a segmentation/tracking result, since certain parts (spatially or temporally) of the object boundary are better segmented/tracked than others, depending on the variation of color and motion features between the object and background regions. Hence, we also propose a localization of the good and bad segments of the object boundary using the proposed metrics.

2. COLOR METRICS

The proposed performance metrics using color features are based on the following assumptions, which are true for most video sequences and are also assumed by many segmentation algorithms:

This work was supported by the Scientific and Technical Research Council of Turkey (TÜBİTAK-BAYG) and the Boğaziçi University Research Fund under project 99A203.


Fig. 1. (a) The video object plane for a frame of the "Hall monitor" sequence. (b) The boundary of the video object plane with the normal lines. (c) A close-up of a normal line drawn to the boundary. The two points 'just inside' and 'just outside' of the boundary are shown with the symbols p_i^in and p_i^out, respectively.

1) Object boundaries coincide with color boundaries. 2) The color histogram of the object is stationary from frame to frame. 3) The color histogram of the background is different from the color histogram of the object. Note that the background and its color histogram are not restricted to be stationary from frame to frame. There are also no restrictions on the shape and rigidity of the segmented/tracked object.

Based on the above assumptions, we present two metrics for evaluating the fidelity of the segmented video object plane. The first color metric is based on the intra-frame color differences along the estimated object boundary and is presented in Section 2.1. The second color metric uses the inter-frame color histogram differences and is described in Section 2.2.

2.1. Intra-frame Color Differences Along the Boundary

In order to evaluate the performance of the tracking algorithm based on the assumptions above, the colors of the pixels 'just inside' and 'just outside' of the estimated object boundary can be compared. In order to define 'just inside' and 'just outside', we draw short normal lines of length L to the estimated object boundary, at equal intervals, towards the inside and outside of the object. The points at the ends of these normal lines are marked as illustrated in Figure 1(b); the marked points are shown with plus signs. A closer look at one of these normal lines is given in Figure 1(c). We define the color difference metric calculated along the

boundary of the object in frame t as:

D_c(t) = (1/N) Σ_{i=1}^{N} d_i(t)    (1)

d_i(t) = || c_in(i,t) − c_out(i,t) ||    (2)

where N is the total number of normal lines drawn to the boundary of the object at equal intervals in frame t, and c_out(i,t) is the average color calculated in the neighborhood of the pixel p_i^out using the YCbCr color space. The average inside color c_in(i,t) is defined similarly. Instead of the averaging operation, the α-trimmed mean can also be used in Eqn. (2), which will be described shortly.
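As a concrete illustration, the per-frame boundary color metric of Eqns. (1)-(2) can be sketched in Python as follows; the function name, array layout and window size are illustrative assumptions, and the endpoints of the normal lines are assumed to be given as (row, col) pixel coordinates:

```python
import numpy as np

def boundary_color_metric(frame_ycbcr, inside_pts, outside_pts, win=2):
    """Sketch of Eqns. (1)-(2): mean YCbCr distance between the average
    colors 'just inside' and 'just outside' the estimated boundary.

    frame_ycbcr             -- (H, W, 3) array of the frame in YCbCr
    inside_pts, outside_pts -- lists of N (row, col) normal-line endpoints
    win                     -- half-size of the averaging neighborhood
    """
    def avg_color(p):
        r, c = p
        patch = frame_ycbcr[max(r - win, 0):r + win + 1,
                            max(c - win, 0):c + win + 1]
        return patch.reshape(-1, 3).mean(axis=0)

    # d_i(t): distance between the local average colors across the boundary
    d = [float(np.linalg.norm(avg_color(pi) - avg_color(po)))
         for pi, po in zip(inside_pts, outside_pts)]
    # D_c(t): mean over the N normal lines
    return float(np.mean(d)), d
```

The α-trimmed mean described below could replace the plain mean inside `avg_color` to reduce sensitivity to outlier pixels in the neighborhoods.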

We define the color metric for the whole sequence as:

D_c = f( D_c(1), ..., D_c(T) )    (3)

where T is the number of frames in the sequence. The function f(·) can be defined in different ways, such as the mean function or the α-trimmed mean function [3]. The α-trimmed mean is defined as:

m_α = ( 1 / (n − 2⌈αn⌉) ) Σ_{j=⌈αn⌉+1}^{n−⌈αn⌉} x_(j)    (4)

where ⌈·⌉ denotes the integer ceiling function and x_(j) denotes the j-th element of the sorted array x_(1) ≤ ... ≤ x_(n). When α is zero, the above expression is identical to the sample mean. When n is odd and α approaches 0.5, the α-trimmed mean is the same as the median of the sample set.
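A minimal Python sketch of the α-trimmed mean of Eqn. (4) (the function name is illustrative):

```python
import math

def alpha_trimmed_mean(samples, alpha=0.0):
    """Alpha-trimmed mean (Eqn. 4): sort the samples, discard the
    ceil(alpha*n) smallest and largest values, and average the rest."""
    x = sorted(samples)
    n = len(x)
    k = math.ceil(alpha * n)
    if n - 2 * k > 0:
        kept = x[k:n - k]        # middle portion of the sorted array
    else:
        kept = [x[n // 2]]       # fully trimmed: fall back to the median
    return sum(kept) / len(kept)
```

Setting alpha = 0 reproduces the sample mean; for odd n, letting alpha approach 0.5 leaves only the middle sample, i.e. the median.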

2.2. Inter-frame Color Histogram Differencing

In order to test whether the object is segmented/tracked correctly in each frame, we make use of the second assumption stated above, i.e. the color histogram of the object is assumed to be stationary from frame to frame. If a part of the background is included in the segmentation map by mistake, the color histogram of the segmented object is expected to follow the background histogram changes.

We can evaluate the stationarity of the color histogram of the segmented object by calculating the pairwise color histogram difference of the video object planes at times t and t−1. Another approach, which makes the histogram differencing more robust to self-occlusions and mild intensity variations, is to look at the difference between the color histogram of the video object plane at frame t and the smoothed color histogram of the video object planes for frames t−K, ..., t−1. This smoothing can be achieved by simple averaging or median filtering of the corresponding bins in the histograms of the object planes in frames t−K, ..., t−1.

Let us denote the color histogram of the video object, calculated using the YCbCr color space at time t, as H_t. The locally smoothed color histogram H'_t is calculated using the formula:

H'_t(b) = (1/K) Σ_{j=1}^{K} H_{t−j}(b),  b = 1, ..., B    (5)

where B denotes the total number of bins in the color histogram. The color histogram is represented as a 1-D vector obtained by concatenating the histograms for the Y, Cb and Cr components.
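The concatenated YCbCr histogram and the temporal smoothing can be sketched as follows; the bin count, value range and function names are assumptions for illustration:

```python
import numpy as np

def object_histogram(pixels_ycbcr, bins=32):
    """H_t: concatenated Y/Cb/Cr histogram of the object-plane pixels,
    represented as a single 1-D vector of 3*bins entries.
    pixels_ycbcr -- (M, 3) array of the pixels inside the segmentation map."""
    parts = [np.histogram(pixels_ycbcr[:, c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    return np.concatenate(parts).astype(float)

def smoothed_histogram(previous_hists):
    """Smoothed histogram: bin-wise average of the histograms of the K
    previous object planes; np.median would give the median-filter variant."""
    return np.mean(np.stack(previous_hists), axis=0)
```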

The discrepancy between the color histograms H_t and H'_t is estimated using four different metrics as described below [4, 5], namely the L1, L2, χ² and histogram intersection metrics. In the following formulae, the scaling parameters n_t and n'_t (the total numbers of pixels counted in H_t and H'_t) are used to normalize the data when the total numbers of elements in the two histograms are different:

1. The L1 Metric: The L1 distance between the two histograms is calculated and normalized to the range [0, 1] as follows:

d_L1(t) = (1/2) Σ_{b=1}^{B} | H_t(b)/n_t − H'_t(b)/n'_t |    (6)

2. The L2 Metric: The L2 distance between the two histograms is calculated and normalized to the range [0, 1] as follows:

d_L2(t) = (1/√2) ( Σ_{b=1}^{B} ( H_t(b)/n_t − H'_t(b)/n'_t )² )^(1/2)    (7)

3. The χ² Metric: The χ² test is used to compare two binned data sets, and to determine if they are drawn from the same distribution function [5]. It is defined and normalized to the range [0, 1] as follows (bins where the denominator is zero are skipped):

d_χ²(t) = (1/2) Σ_{b=1}^{B} ( H_t(b)/n_t − H'_t(b)/n'_t )² / ( H_t(b)/n_t + H'_t(b)/n'_t )    (8)

4. Histogram Intersection Metric: To quantify the difference of the two histograms using the histogram intersection method, we define the histogram intersection metric as:

d_HI(t) = 1 − ∩(H_t, H'_t) / min(n_t, n'_t)    (9)

where ∩(H_t, H'_t) determines the number of pixels that share the same color in the two histograms [6]:

∩(H_t, H'_t) = Σ_{b=1}^{B} min( H_t(b), H'_t(b) )    (10)

Note that when n_t = n'_t, i.e. the numbers of pixels in the two histograms are equal, the histogram intersection metric is equivalent to the L1 metric [6]. The experimental results for local color histogram differences are given in Section 5.2 for the above metrics. The sensitivity of these metrics to deviations in the location of the segmentation map is also analyzed by randomly shifting the correct segmentation maps.
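A compact Python sketch of the four discrepancy measures; the concrete normalization (scaling each histogram by its pixel total) is an assumed choice for illustration:

```python
import numpy as np

def hist_metrics(h1, h2):
    """L1, L2, chi-square and histogram-intersection discrepancies,
    each mapped to [0, 1]; h1, h2 are non-negative count vectors."""
    n1, n2 = h1.sum(), h2.sum()
    p, q = h1 / n1, h2 / n2                       # remove size differences
    d_l1 = 0.5 * np.abs(p - q).sum()              # L1 distance
    d_l2 = np.linalg.norm(p - q) / np.sqrt(2.0)   # L2 distance
    s = p + q
    m = s > 0                                     # skip empty bins
    d_chi2 = 0.5 * (((p[m] - q[m]) ** 2) / s[m]).sum()  # chi-square
    inter = np.minimum(h1, h2).sum()              # shared pixel mass
    d_hi = 1.0 - inter / min(n1, n2)              # histogram intersection
    return d_l1, d_l2, d_chi2, d_hi
```

When the two histograms count the same number of pixels, `d_hi` coincides with `d_l1`, matching the remark in the text.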

Let D_h(t) denote the histogram difference metric calculated using one of the four metrics defined above. We define the histogram difference metric for the whole sequence as:

D_h = f( D_h(1), ..., D_h(T) )    (11)

where the function f(·) can be chosen as discussed in the previous section.

3. MOTION METRIC

The assumptions that we make about the motion of the segmented object are as follows: 1) The motion vectors of the object that are 'just inside' of the object boundary and the background motion vectors that are 'just outside' of the object boundary are different. In other words, motion boundaries coincide with the object boundaries. 2) The background is either stationary or has global motion, which shall be compensated for.

In order to quantify how well the estimated object boundaries coincide with actual motion boundaries, we use an approach similar to the one used for color. We draw small normal lines to the boundary at regular intervals as shown in Figure 1(b), and we look at the difference of the motion vectors around the points p_i^in and p_i^out. The motion metric estimated following this approach for frame t can be expressed as follows:

D_m(t) = ( Σ_{i=1}^{N} r_i(t) m_i(t) ) / ( Σ_{i=1}^{N} r_i(t) )    (12)

m_i(t) = Δ( v_in(i,t), v_out(i,t) )    (13)

r_i(t) = r(p_i^in, t) · r(p_i^out, t)    (14)

where v_in(i,t) and v_out(i,t) denote the average motion vectors calculated in a square window around the points p_i^in and p_i^out, respectively, and Δ(·,·) denotes the distance between the two average motion vectors, which is calculated as:

Δ(v_1, v_2) = || v_1 − v_2 ||    (15)

In Eq. (14), r(p, t) denotes the reliability of the motion vector at point p [7]:

r(p, t) = exp( −γ | I(p, t) − I(p + v_B(p, t), t−1) | )

where v_B(p, t) denotes the backward motion vector at location p in frame t, I denotes the color intensity, and the parameter γ of this mapping can be chosen freely.
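The reliability-weighted motion metric can be sketched as follows; the exponential, displaced-frame-difference-based reliability is an assumed concrete form in the spirit of [7], and gamma is an illustrative parameter, not a value from the paper:

```python
import numpy as np

def motion_reliability(frame_t, frame_prev, p, v_back, gamma=0.05):
    """r(p, t): assumed DFD-based reliability weight -- close to 1 when the
    backward motion vector predicts the intensity at p well."""
    r, c = p
    r2 = int(np.clip(r + v_back[0], 0, frame_prev.shape[0] - 1))
    c2 = int(np.clip(c + v_back[1], 0, frame_prev.shape[1] - 1))
    dfd = abs(float(frame_t[r, c]) - float(frame_prev[r2, c2]))
    return float(np.exp(-gamma * dfd))

def boundary_motion_metric(v_in, v_out, rel):
    """Reliability-weighted average of the Euclidean distances between the
    mean motion vectors just inside and just outside the boundary.
    v_in, v_out -- (N, 2) average motion vectors; rel -- N weights r_i."""
    d = np.linalg.norm(np.asarray(v_in) - np.asarray(v_out), axis=1)
    w = np.asarray(rel, dtype=float)
    return float((w * d).sum() / w.sum())
```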

We define the motion metric for the whole sequence as:

D_m = f( D_m(1), ..., D_m(T) )    (16)

4. PERFORMANCE EVALUATION

In this section, we derive a single numerical measure to evaluate the performance of object segmentation and tracking results, as well as spatial and temporal localization of incorrect boundary segments.

4.1. Combining Color and Motion Metrics

A single numerical measure can be obtained to evaluate the performance of spatio-temporal segmentation of a video object by combining the color and motion metrics defined above as follows:

D = λ_c D_c + λ_h D_h + λ_m D_m    (17)

where the parameters λ_c, λ_h and λ_m can be adjusted according to the characteristics of the video sequence and the relative importance and accuracy of the color and motion features. Note that if the summation λ_c + λ_h + λ_m is restricted to be one, then the metric D takes values between 0 and 1. If this measure is above a certain threshold, it is possible to localize incorrect boundary segments in time and space as described next.

4.2. Temporal Localization

The temporal localization can be achieved by checking the color and motion components of the measure

D(t) = λ_c D_c(t) + λ_h D_h(t) + λ_m D_m(t)    (18)

at each frame against a threshold.
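The combination is a plain weighted sum; a minimal sketch (the weight names are illustrative):

```python
def combined_metric(d_color, d_hist, d_motion, w=(1/3, 1/3, 1/3)):
    """Weighted combination of the color, histogram and motion metrics.
    If the components lie in [0, 1] and the weights sum to one, the
    combined score also lies in [0, 1]."""
    wc, wh, wm = w
    assert abs(wc + wh + wm - 1.0) < 1e-9, "weights should sum to one"
    return wc * d_color + wh * d_hist + wm * d_motion
```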

4.3. Spatial Localization

In frames for which D(t) is above the threshold, we can identify the segments of the boundary that have been tracked incorrectly using the color and motion scores that are obtained from the 'inside' and 'outside' points. Using Eqns. (2) and (13), if

λ_c d_i(t) + λ_m m_i(t) > T_s    (19)

where T_s is a threshold value, we mark the segment between points p_i and p_{i+1} of the estimated object boundary as incorrect.
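The per-segment test of Eqn. (19) then reduces to thresholding the combined per-line scores (all names are illustrative):

```python
def mark_incorrect_segments(color_diffs, motion_diffs, wc, wm, thresh):
    """Eqn. (19): flag the boundary segment at normal line i as
    incorrectly tracked when wc*d_i + wm*m_i exceeds the threshold.
    color_diffs -- per-line d_i(t); motion_diffs -- per-line m_i(t)."""
    return [wc * d + wm * m > thresh
            for d, m in zip(color_diffs, motion_diffs)]
```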

5. EXPERIMENTAL RESULTS

In order to test the effectiveness of the proposed metrics, we quantitatively evaluate the ground-truth segmentation maps of the Hall monitor sequence, which are obtained by hand for frames 32-230. A sample video object plane is shown in Figure 1(a).

5.1. Experiments with intra-frame color differences along the boundary

The color differences along the boundary of the Hall monitor sequence are calculated in the YCbCr color space using the Euclidean distance.

In the first column of Table 1, the color metric D_c is given, which is calculated using the expression given in Eqn. (3) for frames 32-230, for different values of the normal-line length L. The second column shows the variance of the per-frame values D_c(t). In order to observe the sensitivity of the metric to shifts in the correct segmentation masks, we randomly shifted the ground-truth segmentation masks in an attempt to simulate incorrect segmentation. The percentage increases in D_c and in the variance of D_c(t) are given in Table 1, which shows that the maximum increases occur when L=2 and L=3, respectively. Although the variance of D_c(t) depends on the characteristics of the background and object color, we expect it to be small if the object is correctly segmented and the background surrounding the object is not cluttered.

5.2. Experiments with inter-frame color histogram differences

The results for the metric based on histogram differences are summarized in Table 2. We can observe that the χ² metric is the most sensitive to shifts in the segmentation map, since the percentage increases in D_h and in the variance of D_h(t) are largest for the χ² metric when the segmentation map is shifted. This indicates that it is able to discriminate the imperfections in the segmentation/tracking results better than the other three metrics. The variance of D_h(t) is ideally expected to be low when the segmentation masks are correctly located, since the color histogram of the object is not expected to change much between frames. In Figure 2, a plot of the D_h(t) metric is given, calculated using unshifted segmentation masks up to frame 100 and shifted masks for frames 101-230. As seen in the figure, the histogram difference metric based on the χ² distance calculation signals the incorrectness of the segmentation mask successfully.

Fig. 2. The color histogram differences between H_t and H'_t, calculated with the χ² metric, using segmentation maps that are shifted starting from frame 100.

5.3. Experiments with the motion differences

Forward and backward motion estimation between successive frames of the Hall monitor sequence is performed using a hierarchical version of the Lucas-Kanade motion estimation algorithm [8]. In the first two columns of Table 3, the values of D_m and the variance of D_m(t) are given for different L values. The last two columns show the percentage increases in D_m and in the variance of D_m(t) for the shifted segmentation maps. When the background is stationary or moving uniformly, the variance of D_m(t) is expected to be small if the object is segmented correctly. When the segmentation map shifts from its correct location, the variance is expected to increase, which is the case for the Hall monitor sequence as shown in Table 3.

5.4. Localization of incorrect segmentation

In Figure 3, we show the video object plane for a frame of the Hall monitor sequence (downloaded from the web page of the COST 211 group). As observed, the boundary of the object is located incorrectly except for a short segment around the shirt. The correctly located boundary segments are marked with solid lines and the incorrectly located segments are marked with dashed lines. The metric (19) is able to support the subjective observations quantitatively.


Fig. 3. (a) The video object plane for a frame of the Hall monitor sequence, which is downloaded from the web page of the COST 211 group. (b) Correctly segmented regions of the boundary are marked with solid lines and incorrectly segmented regions are marked with dashed lines.

6. CONCLUSIONS

We presented three different performance evaluation metrics for quantitative evaluation of video object segmentation and tracking algorithms, and we tested the sensitivity of the proposed metrics to shifts in the segmentation map. Using the proposed metrics, it is possible to locate the regions of the boundary where the segmentation is not correct. An object tracking scheme that optimizes its

         No shift                   Shifted maps
L    D_c     var(D_c(t))    Perc. Incr. D_c    Perc. Incr. var(D_c(t))
5    8.66    6.33           5.6                21.1
4    8.62    5.86           6.4                22.7
3    8.58    4.40           8.5                55.8
2    8.65    4.81           9                  16.6

Table 1. The scores for the color difference metric along the object boundary.

            No shift                   Shifted maps
Metric   D_h     var(D_h(t))    Perc. Incr. D_h    Perc. Incr. var(D_h(t))
χ²       0.54    0.649          238.5              2392.96
L1       4.54    71.49          80.12              43.17
L2       4.54    16.74          80.14              54.5
HI       3.73    19.65          90.53              76.64

Table 2. The scores for the color histogram difference metric.

         No shift                   Shifted maps
L    D_m     var(D_m(t))    Perc. Incr. D_m    Perc. Incr. var(D_m(t))
5    4.3     7.75           28                 59.8
7    6.35    11.24          24.4               55.29

Table 3. The scores for the motion difference metric along the object boundary.

performance based on the proposed metrics has been developed and will be presented elsewhere.

7. REFERENCES

[1] Ç. E. Erdem and B. Sankur, "Performance evaluation metrics for object-based video segmentation," in Proc. X European Signal Processing Conference, September 2000, vol. 2, pp. 917-920.

[2] X. Marichal and P. Villegas, "Objective evaluation of segmentation masks in video sequences," in Proc. X European Signal Processing Conference, September 2000, vol. 4.

[3] J. Bednar and T. L. Watt, "Alpha-trimmed means and their relationship to median filters," IEEE Trans. Acoust., Speech, and Signal Processing, vol. 32, no. 1, pp. 145-153, 1984.

[4] A. M. Ferman, A. M. Tekalp, and R. Mehrotra, "Robust histogram descriptors for video segment retrieval and identification," submitted to IEEE Trans. on Image Processing, 2000.

[5] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, Cambridge University Press, 1992.

[6] M. J. Swain and D. H. Ballard, "Color indexing," Int. Journal of Computer Vision, vol. 7, no. 1, pp. 11-32, 1991.

[7] Y. Fu, A. T. Erdem, and A. M. Tekalp, "Tracking visible boundary of objects using occlusion adaptive motion snake," IEEE Trans. Image Processing, vol. 9, no. 12, pp. 2051-2060, December 2000.

[8] A. M. Tekalp, Digital Video Processing, Prentice-Hall, New Jersey, 1995.
