Search and Transitioning for Motion Captured Sequences
Suddha Basu∗
IIT Bombay

Shrinath Shanbhag†
IIT Bombay

Sharat Chandran‡
University of Maryland & IIT Bombay
Abstract
Animators today have started using motion captured (mocap) sequences to drive characters. Mocap allows rapid acquisition of highly realistic animation data. Consequently, animators have at their disposal an enormous number of mocap sequences. This, ironically, has created a new retrieval problem. While working with mocap databases, an animator often needs to work with a subset of "useful" clips. Once the animator selects a candidate working set of motion clips, she then needs to identify appropriate transition points amongst these clips for maximal reuse.
In this paper, we describe methods for querying mocap databases and identifying transitions for a given set of clips. We preprocess clips (and clip subsequences) and precompute frame locations to allow interactive stitching. In contrast with existing methods that view each individual clip as a node, we reduce the granularity for optimal reuse.
1 Introduction
Virtual humanoid characters are often driven by hierarchical skeletons consisting of 30 or more degrees of freedom. Animating such characters manually is a daunting task at best. Instead, in recent times, the data for such characters can be acquired directly from a live performer using motion capture devices. Indeed, this method has gained wide acceptance, as motion capture is the fastest way to generate rich, realistic animation data.
While working on a new animation sequence, animators may either choose to capture the data afresh, or they may choose to synthesize the sequence using existing precaptured clips. The decision is influenced by several factors: the number of existing clips available for reuse, the suitability of existing clips for creating the new sequence, the suitability of current state-of-the-art techniques for reusing existing clips, the availability of the performer for capturing a new sequence, and the time and cost required for capturing the new sequence. Nevertheless, with time, the number of clips already captured increases relentlessly.
Recent motion editing methods ([Bruderlin and Williams 1995], [Unuma et al. 1995], [Gleicher 1997], [Gleicher 1998], [Lee and Shin 1999]) allow animators to synthesize new animation in a number of interesting ways. While attempting such synthesis, the animator needs to perform two fundamental tasks: search the collection of existing clips for clips matching certain criteria, and compute transition points within a working set of such clips. The complexity of each of these tasks increases exponentially with the number of clips. Hence, schemes to assist the animator are in order.
Prior schemes such as [Kovar et al. 2002] are one way to compute transition information between clips. However, our observation is that such techniques, when operated at a clip level, do not possess the desired granularity to allow transitioning out at arbitrary frames. This is because they view entire motion clips as a node and only record a single transition between any two given clips. Other schemes such as [Arikan and Forsyth 2002] do not allow the animator to explicitly choose motion clips, except for allowing selection of frames at sparsely defined constraint points. While such schemes are able to approximate specified motion requirements, animators often demand more control over the process of motion assembly.

∗e-mail: skbasu@cse.iitb.ac.in
†e-mail: svs@it.iitb.ac.in
‡e-mail: sharat@
1.1 Our contributions
We automatically chop individual clips and collect similar sub-clip sequences. Such a subsequence could be as short as half a dozen frames. As a result, reuse is possible at a much smaller granularity: seemingly unrelated clips are brought together and thus potentially reused. All such sequences are collected into nodes in a cluster graph that works much the same way as motion graphs.
Clips, when chopped and linked, produce too large a graph (technically a hypergraph). In order to focus our clustering on clips meaningful in the animator's context, as a first step our method allows the animator to query by example a working set of relevant clips from the motion database. Although the number of retrieved frames is large, only similar clips are retrieved. These selected clips are a small fraction of the original motion capture database and are provided back to the animator.
The rest of the paper is organized as follows. In the next section, we discuss our work in the context of related work. Section 3 has the three main parts of our work. We first discuss how matching clips can be retrieved when presented with a query. Next, we discuss how these clips can be placed into a cluster graph. Finally, we show how subsequences within a node in a cluster graph can be aligned. This enables stitching a sequence where one may enter a node, visit one of the subsequences, transit to another subsequence, and thus exit to any other clip. In the last section, we make concluding remarks.

2 Related Work
[Kovar et al. 2002] and [Lee et al. 2002] describe techniques to create new motion sequences from a corpus of motion data. Each technique essentially clusters similar motion into nodes, then builds a graph of nodes, where each edge represents a transition between nodes. A walk through the cluster node graph results in the synthesis of new motion sequences. The techniques differ in the metrics used for clustering, the pruning schemes, and the control criteria for the node walk. [Arikan and Forsyth 2002] describe a technique using a novel search method based on dynamic programming to interactively synthesize motion from annotations. The user paints a timeline with annotations, and the system assembles the motion sequence by extracting suitable frames from a motion database. A scheme for synthesizing missing degrees of freedom, and adding details to specified degrees of freedom, for a roughly specified motion sequence is described in [Pullen and Bregler 2000] and [Pullen and Bregler 2002]. Their method uses correlation between the various degrees of freedom within each motion sequence. Research in indexing mocap data for quick search is in its nascent stage. [Keogh et al. 2004] have proposed a novel technique, based on bounding envelopes, to speed up similarity search under uniform scaling for indexing human motion. [Liu et al. 2005] demonstrate a data-driven approach for representing, compressing, and indexing human motion, based on piecewise-linear components obtained using K-means clustering.
3 Our Method
In this section we describe our query-by-example and cluster graph scheme to enable search and transitions for mocap data. We view motion data as consisting of a bundle of signals. Each signal represents a sequence of sampled values for one degree of freedom (DOF) of an articulated skeletal model. These signals are sampled at discrete instances of time with a uniform sampling interval to yield a motion clip. In each frame, the sampled values of the different DOFs at each joint determine the configuration of an articulated figure for that frame. Often the root of the skeletal hierarchy contains six DOFs (three for translation and three for rotation about the x, y, and z axes), whereas the rest of the nodes contain three DOFs (rotation only).
3.1 Query By Example
Our first step is to search for the best match clips in the mocap database, corresponding to an animator-sketched data sequence. For the sake of exposition, we use only the signal corresponding to the most significant DOF. Later, we relax this and consider a set of signals. Thus we have the problem:

"Given a set S of N signals, and a query signal, find the best P matches for this query from S."

We observe that searching from a set of N signals is equivalent to searching from a single signal comprising all these N signals concatenated on the time axis.
The first step in retrieving queries is to identify the continuous monotonic portions (termed fragments) of the signals. The signals are clipped at points where the first derivative changes sign. This is a reasonable choice for a motion sequence, as at these points the joint angles just start changing from one direction to another. Fragments are important in our method. For example, matches are allowed to be disjoint; that is, matches found may not necessarily have the i-th and (i+1)-th fragments occurring consecutively in the original signal. Fragments enable "mining" of possibly unusually related clips. Fragments also enable robust scaling. For instance, if the query signal is sampled at half or double the rate, the method returns similar matches. For brevity, we skip details of scaling in this version of the paper.
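The fragmentation step can be sketched as follows. This is a minimal illustration (the function name is ours, not the paper's), cutting a 1-D joint-angle signal wherever the finite-difference derivative changes sign:

```python
import numpy as np

def fragment_signal(signal):
    """Split a 1-D joint-angle signal into monotonic fragments.

    Cut points are placed where the first (finite-difference)
    derivative changes sign, as described in the text.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    signal = np.asarray(signal, dtype=float)
    sign = np.sign(np.diff(signal))          # sign of the first derivative
    cuts = [0]
    for i in range(1, len(sign)):
        # A strict sign flip marks the boundary between two monotonic runs.
        if sign[i] != 0 and sign[i - 1] != 0 and sign[i] != sign[i - 1]:
            cuts.append(i)
    cuts.append(len(signal))
    return [(cuts[k], cuts[k + 1]) for k in range(len(cuts) - 1)]

# A rising-then-falling joint angle yields two monotonic fragments.
angles = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
print(fragment_signal(angles))   # → [(0, 3), (3, 7)]
```

The choice of assigning the turning point to the following fragment is arbitrary here; any consistent convention works for the matching step.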
3.1.1 Matching
Matching between two fragments of equal length is easy and is based on the error err_{i,j} between fragment i of the query signal and candidate fragment j of one of the target signals. For each i, we find the K minimum values of err_{i,j} over all j and store the corresponding values of j. These values are the indices of the best matching fragments for fragment i and enable us to determine useful clips.

For the sake of determining the quality of the match, we not only retrieve clips, but also stitch portions of them together. We construct (as in [Pullen and Bregler 2002]) our matching windows in a way that maximizes the number of consecutive fragments in the path.
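A minimal sketch of this per-fragment matching. The paper only requires an error err_{i,j} between equal-length fragments; the L2-after-resampling metric, the resampling length, and the function names below are our own assumptions:

```python
import numpy as np

def fragment_error(query_frag, cand_frag, samples=16):
    """err_{i,j}: L2 distance after resampling both fragments to a
    common length (an assumed metric for equal-length comparison)."""
    def resample(f):
        f = np.asarray(f, dtype=float)
        x = np.linspace(0, len(f) - 1, samples)
        return np.interp(x, np.arange(len(f)), f)
    return float(np.linalg.norm(resample(query_frag) - resample(cand_frag)))

def best_k_matches(query_frag, target_frags, k=3):
    """Return the indices j of the K candidate fragments with the
    smallest err_{i,j}, smallest error first."""
    errs = [(fragment_error(query_frag, t), j)
            for j, t in enumerate(target_frags)]
    errs.sort()
    return [j for _, j in errs[:k]]

# Fragment 1 is an exact match; fragment 0 is closer than fragment 2.
print(best_k_matches([0, 1, 2], [[0, 2, 4], [0, 1, 2], [5, 5, 5]], k=2))
```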
3.1.2 Multiple DOF Consideration
Here the query consists of a set of s signals, each representing a different degree of freedom for some joint angle, but occurring at the same time. Similarly, the search space now consists of N sets of signals. Each set has s signals, and the q-th signal represents the same joint angle as the q-th signal in the query set.

Two significant changes are required to apply the single-DOF algorithm to this multi-signal case. First, the error calculation is over the s signals. Additionally, we choose a driving signal which drives the fragmentation of the signals. Fragmenting each individual signal at points where its first derivative changes sign would be an incorrect approach, simply because these break points would not coincide in time. Thus we first choose an index q_d and fragment the q_d-th signal in each set using the method discussed earlier. The remaining signals in each set are fragmented at the same points in time as the q_d-th signals.
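The driving-signal fragmentation can be sketched as below; the names and the list-of-lists layout are our illustrative assumptions:

```python
import numpy as np

def cut_points(signal):
    """Cut indices where the first derivative changes sign."""
    sign = np.sign(np.diff(np.asarray(signal, dtype=float)))
    cuts = [0]
    for i in range(1, len(sign)):
        if sign[i] != 0 and sign[i - 1] != 0 and sign[i] != sign[i - 1]:
            cuts.append(i)
    cuts.append(len(signal))
    return cuts

def fragment_set(signal_set, q_d):
    """Fragment a set of s per-DOF signals using signal q_d as the
    driving signal: every signal is cut at the same time indices,
    so fragment boundaries coincide across DOFs."""
    cuts = cut_points(signal_set[q_d])
    return [[sig[cuts[k]:cuts[k + 1]] for k in range(len(cuts) - 1)]
            for sig in signal_set]

# Two DOF signals; the first (q_d = 0) drives the cut at time index 3.
signals = [[0, 1, 2, 3, 2, 1, 0], [5, 4, 3, 2, 3, 4, 5]]
print(fragment_set(signals, 0))
```

Note that the second signal is not cut at its own turning point, only at the driving signal's; this is exactly the coincidence-in-time property the text requires.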
3.1.3 Experimental Results and Observations
The aim of experimentation was to see how the above methods work on real data sets. For our experiments, we have used various .bvh files [bvh 2004]. In all the examples cited here, we have taken walk sequences as queries.
Single Sequence Queries. As shown in Fig. 1(c), we find the results quite satisfactory. This means that there is a significant resemblance of the query subsequence from the walk clip with the dance sequence, for the joint angle and degree of freedom considered. Other experiments are available on our website.
Figure 1: Sample result of querying by example. (a) A dance clip appearing in the database. (b) Query sequence extracted from a walk clip. (c) Matching, stitched, dance clip fragment sequences.
We carried out a series of experiments to examine the qualitative performance of the algorithm. Different sub-sequences, as in Figure 1(b), from the walk clip were queried against the dance clip in Figure 1(a). The performance was judged using, as parameters, the total error in the best match found and the number of sudden jumps in the synthesized paths. Table 1 shows some results of the experiments.

Observations. Query-by-example using a single degree of freedom allows us to mine a mocap database, retrieve interesting fragments, and stitch them together. Two drawbacks stymie this approach, however.
First, the time complexity for finding the min-cost path is exponential in the number of fragments in the query: O(K^{N_frag}), where N_frag is the number of fragments of the query signal. For K matches per fragment, the total number of paths is K^{N_frag}. Here a path is just an ordered tuple of indices {j_1, j_2, j_3, ..., j_{N_frag}}, where each j_i is among the best K matches for fragment i. Although K is chosen to be 3, and N_frag varies from 2 to 7 in most cases, optimization of this step is essential. In our implementation, we discarded paths which do not have any '0-cost' transition (paths in which all joining fragments are non-consecutive). This reduces the storage space to vary linearly with the total number of '0-cost' transitions. Nevertheless, when N_frag is large, we do not have the luxury of choosing K too generously.
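The pruning just described might be sketched as follows. The brute-force enumeration is only an illustration of the K^{N_frag} path space, and a '0-cost' transition is taken to mean a pair of adjacent fragments that are consecutive (j_{i+1} = j_i + 1) in the target signal:

```python
from itertools import product

def candidate_paths(best, require_zero_cost=True):
    """Enumerate paths (j_1, ..., j_{N_frag}) from the per-fragment
    K-best match lists in `best`.  With pruning on, keep only paths
    containing at least one '0-cost' transition, i.e. a pair of
    adjacent fragments that are consecutive in the target signal."""
    paths = product(*best)
    if not require_zero_cost:
        return list(paths)
    return [p for p in paths
            if any(p[i] + 1 == p[i + 1] for i in range(len(p) - 1))]

# K = 2 best matches for each of 3 query fragments (hypothetical indices).
best = [[4, 9], [5, 2], [6, 0]]
print(candidate_paths(best))   # 3 of the 8 possible paths survive
```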
Second, and perhaps as significant, when we ran the method using multiple degrees of freedom, we found the quality of the match unacceptable.
3.2 Cluster Graph
Our experiments in the previous section forced a different approach. Instead of trying to provide a fully automated way of searching and synthesizing a motion clip, we use only the first step to retrieve candidate clips. Subsequences of these candidate clips are organized into a cluster graph. The cluster graph is then used for synthesizing new clips.
The cluster graph structure consists of nodes and edges. Nodes contain frames from one or more clips. Frames within a node are "similar;" that is, the error between any two frames is below a threshold. Edges are obtained from the natural sequential ordering of clips within nodes. Figure 2 shows an example.
Figure 2: Cluster graphs contain clip subsequences organized efficiently.
Within a node, the frames are sorted by clip and time. Contiguous sequences of frames are collected together into a structure called a clip-frame sequence. We maintain out-transitions for each clip-frame sequence in each cluster node. Maintaining one transition per clip-frame sequence automatically prunes away transitions from contiguous frames.
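One possible in-memory layout for these structures, assuming integer clip and frame identifiers (the type and field names are our own, not the paper's):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ClipFrameSequence:
    """A contiguous run of frames from one clip inside a cluster node."""
    clip_id: int
    start: int                              # first frame index in the source clip
    end: int                                # last frame index, inclusive
    out_transition: Optional[tuple] = None  # one out-transition per sequence

@dataclass
class ClusterNode:
    """Frames whose pairwise error is below the clustering threshold,
    sorted by clip and time and grouped into clip-frame sequences."""
    sequences: list = field(default_factory=list)

def group_into_sequences(frames):
    """Group (clip_id, frame_index) pairs, sorted by clip and time,
    into contiguous clip-frame sequences."""
    seqs = []
    for clip_id, idx in sorted(frames):
        if seqs and seqs[-1].clip_id == clip_id and seqs[-1].end + 1 == idx:
            seqs[-1].end = idx              # extend the current contiguous run
        else:
            seqs.append(ClipFrameSequence(clip_id, idx, idx))
    return seqs
```

Because only one out-transition is stored per sequence, near-duplicate transitions from adjacent frames are pruned for free, as the text notes.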
3.2.1 Cluster graph advantages
• Cluster graphs provide a level of granularity smaller than that of motion graphs. Traditional motion graphs record only one transition point between each pair of clips. As a result, this data structure has been used to find transitions between two clips without enforcing a hard time constraint. For the realtime version, time is an important factor: we may need to transition between two clips at precise, or more controlled, instants in time. Therefore it is useful to maintain as many distinct transition points as possible. Note further that transition points occur in bunches; cluster graphs prune multiple transition points lying very close to each other temporally.

• Cluster graphs enable looping; that is, they enable an animator to create a walk sequence that can go on forever, instead of being limited to the 10-second footage. A node containing multiple clip-frame sequences from the same clip indicates the possibility of looping.
3.2.2 Constructing cluster graphs
Once a similarity metric has been selected between two frames, an algorithm reminiscent of Kruskal's minimum spanning tree algorithm is used to create clip-frame sequences. We first preprocess all n frames so that we have the distance set D for all n-choose-2 frame pairs. Next,

1. Sort D into π = (o_1, o_2, ..., o_k) by non-decreasing values.
2. Start with F_0 = {}.
3. Repeat Step 4 for o_q = o_1, ..., o_k, k = |D|.
4. Construct F_q given F_{q-1} as follows. Let o_q correspond to the frame pair (i, j). If o_q is small compared to a threshold, then
   (a) Add frames (i, j) to F_{q-1} and refer to this pair as a single frame called I_j.
   (b) Remove from D all references to either i or j.
   (c) Set the distance for all frame pairs (I_j, k), k ≠ i, k ≠ j, to be the minimum of the distances (i, k) and (j, k). Insert these distances maintaining the sorted order.
The algorithm can be efficiently implemented. Specifically, each insert and removal in Step 4 is O(log n).
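Under the min-distance update of Step 4(c) this is single-linkage clustering, so processing pairs in sorted order with a union-find (exactly as in Kruskal's algorithm) makes the merged-frame bookkeeping implicit. A sketch, with the threshold and names as our assumptions:

```python
import heapq

def cluster_frames(distances, n, threshold):
    """Agglomerative clustering reminiscent of Kruskal's MST algorithm.

    `distances` maps frame pairs (i, j), i < j, to an error value.
    Pairs are visited in non-decreasing order; when the current pair
    is below `threshold` its two groups are merged.  Because the
    update rule is single linkage (take the minimum of the two old
    distances), the merge order is unchanged and no explicit distance
    re-insertion is needed.  Returns a cluster label per frame."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    heap = [(d, i, j) for (i, j), d in distances.items()]
    heapq.heapify(heap)                     # sorted processing order
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > threshold:
            break                           # all remaining pairs are larger
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri                 # merge the two super-frames
    return [find(x) for x in range(n)]
```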
3.3 Clip-frame Sequence Transitioning
Once clip-frame sequences are clustered in a graph, each cluster node contains similar frames. We now find the best frames in each motion clip within a node for smooth intra-cluster transitions.

First, we find the best point of transition between two motion clip-frame sequences. We choose the frame pair for which the sum of the distances between them, and between the derivatives at the respective points, is minimized. Using derivatives ensures continuity of the stitched motion curves. Next, we iteratively determine a set of ordered m-tuples, where m is the number of clip-frame sequences. Each element of the set contains m frame indices such that these frames in the corresponding clip-frame sequences are the closest matches for each other. Hence, we transition between two clip-frame sequences in the cluster node at these frames to get a smooth motion sequence.
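The best-transition search for one pair of clip-frame sequences might look as follows; the equal weighting of the frame-distance and derivative-distance terms follows the text's "sum", while the gradient estimate and names are our assumptions:

```python
import numpy as np

def best_transition(seq_a, seq_b, w=1.0):
    """Find the frame pair (a, b) minimizing the frame distance plus
    the (weighted) distance between derivatives, so the stitched
    curves stay continuous.  seq_a, seq_b: arrays of shape
    (frames, dofs).  The weight w on the derivative term is an
    assumption; the paper simply sums the two terms."""
    seq_a = np.asarray(seq_a, dtype=float)
    seq_b = np.asarray(seq_b, dtype=float)
    da = np.gradient(seq_a, axis=0)         # per-frame derivative estimate
    db = np.gradient(seq_b, axis=0)
    best, best_pair = float("inf"), None
    for a in range(len(seq_a)):
        for b in range(len(seq_b)):
            cost = (np.linalg.norm(seq_a[a] - seq_b[b])
                    + w * np.linalg.norm(da[a] - db[b]))
            if cost < best:
                best, best_pair = cost, (a, b)
    return best_pair, best

# Two short 1-DOF sequences with the same slope; values coincide
# where seq_a reaches 2.0 and seq_b starts at 2.0.
pair, cost = best_transition([[0.0], [1.0], [2.0], [3.0]],
                             [[2.0], [3.0], [4.0]])
print(pair)   # → (2, 0)
```

Since the sequences in a node are small, the quadratic scan over frame pairs is affordable.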
Figure 3 shows plots, for three different degrees of freedom, of this implementation transitioning between two clips, one drawn solid and one dashed (black). We transition from the solid clip to the dashed clip. The dashed portion of the solid clip corresponds to the trailing part of the dashed signal after transitioning to it. The dotted (blue) curve is the part of the original solid clip made redundant by the transition.
No. | N_frag of query | error in best match (K=2 / K=3 / K=4) | no. of sudden jumps (K=2 / K=3 / K=4)
 1  |  7 | 0.538 / 0.540 / 1.005 | 6 / 5 / 3
 2  |  6 | 1.024 / 1.024 / 1.024 | 4 / 4 / 4
 3  |  5 | 1.412 / 1.412 / 1.737 | 4 / 4 / 3
 4  |  8 | 1.525 / 1.525 / 1.525 | 5 / 5 / 5
 5  |  7 | 0.555 / 0.726 / 0.726 | 5 / 3 / 3
 6  |  5 | 2.330 / 2.330 / 2.618 | 4 / 4 / 3
 7  |  5 | 2.105 / 2.105 / 2.105 | 4 / 4 / 4
10  |  6 | 1.024 / 1.024 / 1.024 | 4 / 4 / 4
15  |  5 | 2.105 / 2.105 / 2.105 | 4 / 4 / 4
25  | 13 | 7.169 / 7.169 / 7.169 | 6 / 6 / 6

Table 1: Sample results from a series of experiments with the dance sequence (Figure 1(a)) as target. The number of fragments in the target was approximately 319, and walk sequences similar to Figure 1(b) were the queries.
S = L_{1,2}
for i = 2 to m−1 do
    T = {}
    for each element (a, ..., p) in S do
        if (p, q) ∈ L_{i,i+1} then
            Add (a, ..., p, q) to T.
        end if
    end for
    S = T
end for

Figure 4: The alignment algorithm.
3.3.1 Details
We apply a correlation-based technique on clip-frame sequences M_i and M_{i+1} and get a list L_{i,i+1} of the k most suitable pairs of frame indices. L_{i,i+1} contains several pairs of indices (a, b) such that frames f^i_a and f^{i+1}_b resemble each other, and can be obtained in O(n log n) expected time, where n is the size of a typical clip-frame sequence (usually small, between 10 and 100). Once all L_{i,i+1} are computed (1 ≤ i < m), we look for common indices in adjacent L_{i,i+1}, as seen in Figure 4. This algorithm runs in O(m · |S| · max|L_{i,i+1}|) time. S and L_{i,i+1} are usually no longer than 20, and m varies from 3 to 15.
S typically contains at least one index sequence j_1, j_2, ..., j_m. Thus if we now want to transition from clip M_4 to M_7, we play M_4 till the j_4-th frame and then switch to the (j_7+1)-th frame in clip M_7. The transition is without any jerks and bumps.
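The alignment loop of Figure 4 translates almost directly to code; a sketch assuming 0-based lists of L_{i,i+1} pairs (names are ours):

```python
def align(pair_lists):
    """Chain the per-adjacent-pair match lists L_{i,i+1} into m-tuples
    (j_1, ..., j_m) such that each consecutive pair of indices appears
    in the corresponding list, as in Figure 4.

    pair_lists[i-1] plays the role of L_{i,i+1}: a list of
    (frame_in_M_i, frame_in_M_{i+1}) index pairs."""
    S = [tuple(p) for p in pair_lists[0]]        # S = L_{1,2}
    for L in pair_lists[1:]:                     # i = 2 .. m-1
        # Index the current list by its first component for O(1) lookup.
        successors = {}
        for a, b in L:
            successors.setdefault(a, []).append(b)
        T = []
        for tup in S:                            # each element (a, ..., p) in S
            for q in successors.get(tup[-1], []):   # (p, q) ∈ L_{i,i+1}
                T.append(tup + (q,))             # add (a, ..., p, q) to T
        S = T
    return S

# Three clip-frame sequences (m = 3); hypothetical match lists.
L12 = [(3, 7), (5, 9)]
L23 = [(7, 2), (8, 4)]
print(align([L12, L23]))   # → [(3, 7, 2)]
```

Only chains whose links agree at every step survive, which is exactly the "common indices in adjacent L_{i,i+1}" condition of the text.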
4 Concluding Remarks
In this paper we have provided efficient schemes (with big-Oh expected running times, and supported by experimental results) for the following activities:

• Given a (possibly rough) animation sequence, search for clips that contain similar frame sequences, and optionally stitch these frame sequences in an optimal way.

• Our theoretical and experimental studies indicate that the stitching time is exponential, and sometimes the quality of the automatic stitch may be less than desirable. We therefore create a cluster graph data structure. Cluster graphs preserve the fine granularity of fragments, can be created efficiently, and have several advantages (Section 3.2.1).

• Cluster graphs are used interactively by the animator. Although we do not create a fully automated stitched sequence, we efficiently compute the location of possible transition points within a node in the cluster graph, and thus between multiple clips.
References

ARIKAN, O., AND FORSYTH, D. A. 2002. Interactive Motion Generation from Examples. In Proc. of Siggraph '02, 483–490.

BRUDERLIN, A., AND WILLIAMS, L. 1995. Motion Signal Processing. In Proc. of Siggraph '95.

BVH, 2004. http://www.bvhfi…/bvh archive/. Dec.

GLEICHER, M. 1997. Motion Editing with Spacetime Constraints. In Proc. of the 1997 Symposium on Interactive 3D Graphics.

GLEICHER, M. 1998. Retargetting Motion to New Characters. In Proc. of Siggraph '98.

KEOGH, E. J., PALPANAS, T., ZORDAN, V. B., GUNOPULOS, D., AND CARDLE, M. 2004. Indexing large human-motion databases. In VLDB, 780–791.

KOVAR, L., GLEICHER, M., AND PIGHIN, F. 2002. Motion Graphs. In Proc. of Siggraph '02.

LEE, J., AND SHIN, S. Y. 1999. A Hierarchical Approach to Interactive Motion Editing for Human-like Figures. In Proc. of Siggraph '99.

LEE, J., CHAI, J., REITSMA, P. S. A., HODGINS, J. K., AND POLLARD, N. S. 2002. Interactive Control of Avatars Animated with Human Motion Data. In Proc. of Siggraph '02.

LIU, G., ZHANG, J., WANG, W., AND MCMILLAN, L. 2005. A system for analyzing and indexing human-motion databases. In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, ACM Press, New York, NY, USA, 924–926.

PULLEN, K., AND BREGLER, C. 2000. Animating by Multi-level Sampling. In Proc. of IEEE Computer Animation 2000.

PULLEN, K., AND BREGLER, C. 2002. Motion Capture Assisted Animation: Texturing and Synthesis. In Proc. of Siggraph '02.

UNUMA, M., ANJYO, K., AND TAKEUCHI, R. 1995. Fourier principles for emotion-based human figure animation. In Proceedings of Siggraph '95.