Search and Transitioning for Motion Captured Sequences
Suddha Basu∗
IIT Bombay

Shrinath Shanbhag†
IIT Bombay

Sharat Chandran‡
University of Maryland & IIT Bombay
Abstract
Animators today have started using motion captured (mocap) sequences to drive characters. Mocap allows rapid acquisition of highly realistic animation data. Consequently, animators have at their disposal an enormous number of mocap sequences. This, ironically, has created a new retrieval problem. While working with mocap databases, an animator often needs to work with a subset of "useful" clips. Once the animator selects a candidate working set of motion clips, she then needs to identify appropriate transition points amongst these clips for maximal reuse.
In this paper, we describe methods for querying mocap databases and identifying transitions for a given set of clips. We preprocess clips (and clip subsequences) and precompute frame locations to allow interactive stitching. In contrast with existing methods that view each individual clip as a node, we reduce the granularity for optimal reuse.
1 Introduction
Virtual humanoid characters are often driven by hierarchical skeletons consisting of 30 or more degrees of freedom. Animating such characters manually is a daunting task at best. Instead, in recent times, the data for such characters can be acquired directly from a live performer using motion capture devices. Indeed, this method has gained wide acceptance, as motion capture is the fastest way to generate rich, realistic animation data.
While working on a new animation sequence, animators may either choose to capture the data afresh, or they may choose to synthesize the sequence using existing precaptured clips. The decision is influenced by several factors: the number of existing clips available for reuse, the suitability of existing clips for creating the new sequence, the suitability of current state-of-the-art techniques for reusing existing clips, the availability of the performer for capturing a new sequence, and the time and cost required for capturing the new sequence. Nevertheless, with time, the number of clips already captured increases relentlessly.
Recent motion editing methods ([Bruderlin and Williams 1995], [Unuma et al. 1995], [Gleicher 1997], [Gleicher 1998], [Lee and Shin 1999]) allow animators to synthesize new animation in a number of interesting ways. While attempting such synthesis, the animator needs to perform two fundamental tasks: search the collection of existing clips for clips matching certain criteria, and compute transition points within a working set of such clips. The complexity of each of these tasks increases exponentially with the number of clips. Hence, schemes to assist the animator are in order.
Prior schemes such as [Kovar et al. 2002] are one way to compute transition information between clips. However, our observation is that such techniques, when operated at a clip level, do not possess the desired granularity to allow transitioning out at arbitrary frames. This is because they view entire motion clips as a node and only record a single transition between any two given clips. Other schemes such as [Arikan and Forsyth 2002] do not allow the animator to explicitly choose motion clips, except for allowing selection of frames at sparsely defined constraint points. While such schemes are able to approximate specified motion requirements, animators often demand more control over the process of motion assembly.

∗e-mail: skbasu@cse.iitb.ac.in
†e-mail: svs@it.iitb.ac.in
‡e-mail: sharat@
1.1 Our contributions
We automatically chop individual clips and collect similar sub-clip sequences. Such a subsequence could be as short as half a dozen frames. As a result, reuse is possible at a much smaller granularity: seemingly unrelated clips are brought together and thus potentially reused. All such sequences are collected into nodes in a cluster graph that works much the same way as motion graphs.
Clips, when chopped and linked, produce too large a graph (technically a hypergraph). In order to focus our clustering on clips meaningful in the animator's context, as a first step our method allows the animator to query by example a working set of relevant clips from the motion database. Although the number of retrieved frames is large, only similar clips are retrieved. These selected clips are a small fraction of the original motion capture database and are provided back to the animator.
The rest of the paper is organized as follows. In the next section, we discuss our work in the context of related work. Section 3 has the three main parts of our work. We first discuss how matching clips can be retrieved when presented with a query. Next, we discuss how these clips can be placed into a cluster graph. Finally, we show how subsequences within a node in a cluster graph can be aligned. This enables stitching a sequence where one may enter a node, visit one of the subsequences, transit to another subsequence, and thus exit to any other clip. In the last section, we make concluding remarks.

2 Related Work
[Kovar et al. 2002] and [Lee et al. 2002] describe techniques to create new motion sequences from a corpus of motion data. Each technique essentially clusters similar motion into nodes, then builds a graph of nodes, where each edge represents a transition between nodes. A walk through the cluster node graph results in the synthesis of new motion sequences. The techniques differ in the metrics used for clustering, the pruning schemes, and the control criteria for the node walk. [Arikan and Forsyth 2002] describe a technique using a novel search method based on dynamic programming to interactively synthesize motion from annotations. The user paints a timeline with annotations, and the system assembles the motion sequence by extracting suitable frames from a motion database. A scheme for synthesizing missing degrees of freedom, and adding details to specified degrees of freedom, for a roughly specified motion sequence is described in [Pullen and Bregler 2000] and [Pullen and Bregler 2002]. Their method uses correlation between the various degrees of freedom within each motion sequence. Research in indexing mocap data for quick search is in its nascent stage. [Keogh et al. 2004] have proposed a novel technique, based on bounding envelopes, to speed up similarity search under uniform scaling for indexing human motion. [Liu et al. 2005] demonstrate a data-driven approach for representing, compressing, and indexing human motion, based on piecewise-linear components obtained using K-means clustering.
3 Our Method
In this section we describe our query-by-example and cluster graph scheme to enable search and transitions for mocap data. We view motion data as consisting of a bundle of signals. Each signal represents a sequence of sampled values for one degree of freedom (DOF) of an articulated skeletal model. These signals are sampled at discrete instances of time with a uniform sampling interval to yield a motion clip. In each frame, the sampled values of the different DOFs at each joint determine the configuration of an articulated figure for that frame. Often the root of the skeletal hierarchy contains six DOFs (three for translation and three for rotation about the x, y, and z axes), whereas the rest of the nodes contain three DOFs (rotation only).
3.1 Query By Example
Our first step is to search for the best match clips in the mocap database, corresponding to an animator-sketched data sequence. For the sake of exposition, we use only the signal corresponding to the most significant DOF. Later, we relax this and consider a set of signals. Thus we have the problem:

"Given a set S of N signals, and a query signal, find the best P matches for this query from S."

We observe that searching from a set of N signals is equivalent to searching from a single signal comprising all these N signals concatenated on the time axis.
The first step in retrieving queries is to identify the continuous monotonic portions (termed fragments) of the signals. The signals are clipped at points where the first derivative changes sign. This is a reasonable choice for a motion sequence, as at these points the joint angles just start changing from one direction to another. Fragments are important in our method. For example, matches are allowed to be disjoint; that is, matches found may not necessarily have the i-th and (i+1)-th fragments occurring consecutively in the original signal. Fragments enable "mining" of possibly unusually related clips. Fragments also enable robust scaling. For instance, if the query signal is sampled at half or double the rate, the method returns similar matches. For brevity, we skip details of scaling in this version of the paper.
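The fragmentation step can be sketched as follows. This is a minimal illustration (the function name is ours, not the paper's), cutting a 1-D joint-angle signal wherever the finite-difference derivative changes sign:

```python
import numpy as np

def fragment_signal(signal):
    """Split a 1-D joint-angle signal into monotonic fragments.

    Cut points are placed where the first (finite-difference)
    derivative changes sign, as described in the text.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    signal = np.asarray(signal, dtype=float)
    sign = np.sign(np.diff(signal))          # sign of the first derivative
    cuts = [0]
    for i in range(1, len(sign)):
        # A strict sign flip marks the boundary between two monotonic runs.
        if sign[i] != 0 and sign[i - 1] != 0 and sign[i] != sign[i - 1]:
            cuts.append(i)
    cuts.append(len(signal))
    return [(cuts[k], cuts[k + 1]) for k in range(len(cuts) - 1)]

# A rising-then-falling joint angle yields two monotonic fragments.
angles = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
print(fragment_signal(angles))   # → [(0, 3), (3, 7)]
```

The choice of assigning the turning point to the following fragment is arbitrary here; any consistent convention works for the matching step.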
3.1.1 Matching
Matching between two fragments of equal length is easy and is based on the error err_{i,j} between fragment i of the query signal and candidate fragment j of one of the target signals. For each i, we find the K minimum values of err_{i,j} over all j and store the corresponding values of j. These values are the indices of the best matching fragments for fragment i and enable us to determine useful clips.

For the sake of determining the quality of the match, we not only retrieve clips, but also stitch portions of them together. We construct (as in [Pullen and Bregler 2002]) our matching windows in a way that maximizes the number of consecutive fragments in the path.
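A minimal sketch of this per-fragment matching. The paper only requires an error err_{i,j} between equal-length fragments; the L2-after-resampling metric, the resampling length, and the function names below are our own assumptions:

```python
import numpy as np

def fragment_error(query_frag, cand_frag, samples=16):
    """err_{i,j}: L2 distance after resampling both fragments to a
    common length (an assumed metric for equal-length comparison)."""
    def resample(f):
        f = np.asarray(f, dtype=float)
        x = np.linspace(0, len(f) - 1, samples)
        return np.interp(x, np.arange(len(f)), f)
    return float(np.linalg.norm(resample(query_frag) - resample(cand_frag)))

def best_k_matches(query_frag, target_frags, k=3):
    """Return the indices j of the K candidate fragments with the
    smallest err_{i,j}, smallest error first."""
    errs = [(fragment_error(query_frag, t), j)
            for j, t in enumerate(target_frags)]
    errs.sort()
    return [j for _, j in errs[:k]]

# Fragment 1 is an exact match; fragment 0 is closer than fragment 2.
print(best_k_matches([0, 1, 2], [[0, 2, 4], [0, 1, 2], [5, 5, 5]], k=2))
```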
3.1.2 Multiple DOF Consideration
Here the query consists of a set of s signals, each representing a different degree of freedom for some joint angle, but occurring at the same time. Similarly, the search space now consists of N sets of signals. Each set has s signals, and the q-th signal represents the same joint angle as the q-th signal in the query set.

Two significant changes are required to apply the single-DOF algorithm to this multi-signal case. First, the error calculation is over the s signals. Additionally, we choose a driving signal which drives the fragmentation of the signals. Fragmenting each individual signal at points where its first derivative changes sign would be an incorrect approach, simply because these break points would not coincide in time. Thus we first choose an index q_d and fragment the q_d-th signal in each set using the method discussed earlier. The remaining signals in each set are fragmented at the same points in time as the q_d-th signals.
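The driving-signal fragmentation can be sketched as below; the names and the list-of-lists layout are our illustrative assumptions:

```python
import numpy as np

def cut_points(signal):
    """Cut indices where the first derivative changes sign."""
    sign = np.sign(np.diff(np.asarray(signal, dtype=float)))
    cuts = [0]
    for i in range(1, len(sign)):
        if sign[i] != 0 and sign[i - 1] != 0 and sign[i] != sign[i - 1]:
            cuts.append(i)
    cuts.append(len(signal))
    return cuts

def fragment_set(signal_set, q_d):
    """Fragment a set of s per-DOF signals using signal q_d as the
    driving signal: every signal is cut at the same time indices,
    so fragment boundaries coincide across DOFs."""
    cuts = cut_points(signal_set[q_d])
    return [[sig[cuts[k]:cuts[k + 1]] for k in range(len(cuts) - 1)]
            for sig in signal_set]

# Two DOF signals; the first (q_d = 0) drives the cut at time index 3.
signals = [[0, 1, 2, 3, 2, 1, 0], [5, 4, 3, 2, 3, 4, 5]]
print(fragment_set(signals, 0))
```

Note that the second signal is not cut at its own turning point, only at the driving signal's; this is exactly the coincidence-in-time property the text requires.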
3.1.3 Experimental Results and Observations
The aim of experimentation was to see how the above methods work on real data sets. For our experiments, we have used various .bvh files [bvh 2004]. In all the examples cited here, we have taken walk sequences as queries.
Single Sequence Queries. As shown in Fig. 1(c), we find the results quite satisfactory. This means that there is a significant resemblance of the query subsequence from the walk clip with the dance sequence, for the joint angle and degree of freedom considered. Other experiments are available on our website.
Figure 1: Sample result of querying by example. (a) A dance clip appearing in the database. (b) Query sequence extracted from a walk clip. (c) Matching, stitched, dance clip fragment sequences.
We carried out a series of experiments to examine the qualitative performance of the algorithm. Different sub-sequences, as in Figure 1(b), from the walk clip were queried against the dance clip in Figure 1(a). The performance was judged using, as parameters, the total error in the best match found and the number of sudden jumps in the synthesized paths. Table 1 shows some results of the experiments.

Observations. Query-by-example using a single degree of freedom allows us to mine a mocap database, retrieve interesting fragments, and stitch them together. Two drawbacks stymie this approach, however.
First, the time complexity for finding the min-cost path is exponential in the number of fragments in the query: O(K^{N_frag}), where N_frag is the number of fragments of the query signal. For K matches per fragment, the total number of paths is K^{N_frag}. Here a path is just an ordered tuple of indices {j_1, j_2, j_3, ..., j_{N_frag}}, where each j_i is among the best K matches for fragment i. Although K is chosen to be 3, and N_frag varies from 2 to 7 in most cases, optimization of this step is essential. In our implementation, we discarded paths which do not have any '0-cost' transition (paths in which all joining fragments are non-consecutive). This reduces the storage space to vary linearly with the total number of '0-cost' transitions. Nevertheless, when N_frag is large, we do not have the luxury of choosing K too generously.
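The pruning just described might be sketched as follows. The brute-force enumeration is only an illustration of the K^{N_frag} path space, and a '0-cost' transition is taken to mean a pair of adjacent fragments that are consecutive (j_{i+1} = j_i + 1) in the target signal:

```python
from itertools import product

def candidate_paths(best, require_zero_cost=True):
    """Enumerate paths (j_1, ..., j_{N_frag}) from the per-fragment
    K-best match lists in `best`.  With pruning on, keep only paths
    containing at least one '0-cost' transition, i.e. a pair of
    adjacent fragments that are consecutive in the target signal."""
    paths = product(*best)
    if not require_zero_cost:
        return list(paths)
    return [p for p in paths
            if any(p[i] + 1 == p[i + 1] for i in range(len(p) - 1))]

# K = 2 best matches for each of 3 query fragments (hypothetical indices).
best = [[4, 9], [5, 2], [6, 0]]
print(candidate_paths(best))   # 3 of the 8 possible paths survive
```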
Second, and perhaps as significant, when we ran the method using multiple degrees of freedom, we found the quality of the match unacceptable.
3.2 Cluster Graph
Our experiments in the previous section forced a different approach. Instead of trying to provide a fully automated way of searching and synthesizing a motion clip, we use only the first step to retrieve candidate clips. Subsequences of these candidate clips are organized into a cluster graph. The cluster graph is then used for synthesizing new clips.
The cluster graph structure consists of nodes and edges. Nodes contain frames from one or more clips. Frames within a node are "similar;" that is, the error between any two frames is below a threshold. Edges are obtained from the natural sequential ordering of clips within nodes. Figure 2 shows an example.
Figure 2: Cluster graphs contain clip subsequences organized efficiently.
Within a node, the frames are sorted by clip and time. Contiguous sequences of frames are collected together into a structure called a clip-frame sequence. We maintain out-transitions for each clip-frame sequence in each cluster node. Maintaining one transition per clip-frame sequence automatically prunes away transitions from contiguous frames.
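One possible in-memory layout for these structures, assuming integer clip and frame identifiers (the type and field names are our own, not the paper's):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ClipFrameSequence:
    """A contiguous run of frames from one clip inside a cluster node."""
    clip_id: int
    start: int                              # first frame index in the source clip
    end: int                                # last frame index, inclusive
    out_transition: Optional[tuple] = None  # one out-transition per sequence

@dataclass
class ClusterNode:
    """Frames whose pairwise error is below the clustering threshold,
    sorted by clip and time and grouped into clip-frame sequences."""
    sequences: list = field(default_factory=list)

def group_into_sequences(frames):
    """Group (clip_id, frame_index) pairs, sorted by clip and time,
    into contiguous clip-frame sequences."""
    seqs = []
    for clip_id, idx in sorted(frames):
        if seqs and seqs[-1].clip_id == clip_id and seqs[-1].end + 1 == idx:
            seqs[-1].end = idx              # extend the current contiguous run
        else:
            seqs.append(ClipFrameSequence(clip_id, idx, idx))
    return seqs
```

Because only one out-transition is stored per sequence, near-duplicate transitions from adjacent frames are pruned for free, as the text notes.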
3.2.1 Cluster graph advantages
• Cluster graphs provide a level of granularity smaller than that of motion graphs. Traditional motion graphs record only one transition point between each pair of clips. As a result, this data structure has been used to find transitions between two clips without enforcing a hard time constraint. For the realtime version, time is an important factor: we may need to transition between two clips at precise, or more controlled, instants in time. Therefore it is useful to maintain as many distinct transition points as possible. Note further that transition points occur in bunches; cluster graphs prune multiple transition points lying very close to each other temporally.

• Cluster graphs enable looping; that is, they enable an animator to create a walk sequence that can go on forever, instead of being limited to the 10-second footage. A node containing multiple clip-frame sequences from the same clip indicates the possibility of looping.
3.2.2 Constructing cluster graphs
Once a similarity metric has been selected between two frames, an algorithm reminiscent of Kruskal's minimum spanning tree algorithm is used to create clip-frame sequences. We first preprocess all n frames so that we have the distance set D for all n-choose-2 frame pairs. Next,

1. Sort D into π = (o_1, o_2, ..., o_k) by non-decreasing values.
2. Start with F_0 = {}.
3. Repeat Step 4 for o_q = o_1, ..., o_k, k = |D|.
4. Construct F_q given F_{q-1} as follows. Let o_q correspond to the frame pair (i, j). If o_q is small compared to a threshold, then
   (a) Add frames (i, j) to F_{q-1} and refer to this pair as a single frame called I_j.
   (b) Remove from D all references to either i or j.
   (c) Set the distance for all frame pairs (I_j, k), k ≠ i, k ≠ j, to be the minimum of the distances (i, k) and (j, k). Insert these distances maintaining the sorted order.
The algorithm can be efficiently implemented. Specifically, each insert and removal in Step 4 is O(log n).
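Under the min-distance update of Step 4(c) this is single-linkage clustering, so processing pairs in sorted order with a union-find (exactly as in Kruskal's algorithm) makes the merged-frame bookkeeping implicit. A sketch, with the threshold and names as our assumptions:

```python
import heapq

def cluster_frames(distances, n, threshold):
    """Agglomerative clustering reminiscent of Kruskal's MST algorithm.

    `distances` maps frame pairs (i, j), i < j, to an error value.
    Pairs are visited in non-decreasing order; when the current pair
    is below `threshold` its two groups are merged.  Because the
    update rule is single linkage (take the minimum of the two old
    distances), the merge order is unchanged and no explicit distance
    re-insertion is needed.  Returns a cluster label per frame."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    heap = [(d, i, j) for (i, j), d in distances.items()]
    heapq.heapify(heap)                     # sorted processing order
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > threshold:
            break                           # all remaining pairs are larger
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri                 # merge the two super-frames
    return [find(x) for x in range(n)]
```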
3.3 Clip-frame Sequence Transitioning
Once clip-frame sequences are clustered in a graph, each cluster node contains similar frames. We now find the best frames in each motion clip within a node for smooth intra-cluster transitions.

First, we find the best point of transition between two motion clip-frame sequences. We choose the frame pair for which the sum of the distances between them, and between the derivatives at the respective points, is minimized. Using derivatives ensures continuity of the stitched motion curves. Next, we iteratively determine a set of ordered m-tuples, where m is the number of clip-frame sequences. Each element of the set contains m frame indices such that these frames in the corresponding clip-frame sequences are the closest matches for each other. Hence, we transition between two clip-frame sequences in the cluster node at these frames to get a smooth motion sequence.
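The best-transition search for one pair of clip-frame sequences might look as follows; the equal weighting of the frame-distance and derivative-distance terms follows the text's "sum", while the gradient estimate and names are our assumptions:

```python
import numpy as np

def best_transition(seq_a, seq_b, w=1.0):
    """Find the frame pair (a, b) minimizing the frame distance plus
    the (weighted) distance between derivatives, so the stitched
    curves stay continuous.  seq_a, seq_b: arrays of shape
    (frames, dofs).  The weight w on the derivative term is an
    assumption; the paper simply sums the two terms."""
    seq_a = np.asarray(seq_a, dtype=float)
    seq_b = np.asarray(seq_b, dtype=float)
    da = np.gradient(seq_a, axis=0)         # per-frame derivative estimate
    db = np.gradient(seq_b, axis=0)
    best, best_pair = float("inf"), None
    for a in range(len(seq_a)):
        for b in range(len(seq_b)):
            cost = (np.linalg.norm(seq_a[a] - seq_b[b])
                    + w * np.linalg.norm(da[a] - db[b]))
            if cost < best:
                best, best_pair = cost, (a, b)
    return best_pair, best

# Two short 1-DOF sequences with the same slope; values coincide
# where seq_a reaches 2.0 and seq_b starts at 2.0.
pair, cost = best_transition([[0.0], [1.0], [2.0], [3.0]],
                             [[2.0], [3.0], [4.0]])
print(pair)   # → (2, 0)
```

Since the sequences in a node are small, the quadratic scan over frame pairs is affordable.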
Figure 3 shows plots, for three different degrees of freedom, of this implementation transitioning between two clips, one drawn solid and one dashed (black). We transition from the solid clip to the dashed clip. The dashed portion of the solid clip corresponds to the trailing part of the dashed signal after transitioning to it. The dotted (blue) curve is the part of the original solid clip made redundant by the transition.
No. | N_frag of query | error in best match (K=2 / K=3 / K=4) | no. of sudden jumps (K=2 / K=3 / K=4)
 1  |  7 | 0.538 / 0.540 / 1.005 | 6 / 5 / 3
 2  |  6 | 1.024 / 1.024 / 1.024 | 4 / 4 / 4
 3  |  5 | 1.412 / 1.412 / 1.737 | 4 / 4 / 3
 4  |  8 | 1.525 / 1.525 / 1.525 | 5 / 5 / 5
 5  |  7 | 0.555 / 0.726 / 0.726 | 5 / 3 / 3
 6  |  5 | 2.330 / 2.330 / 2.618 | 4 / 4 / 3
 7  |  5 | 2.105 / 2.105 / 2.105 | 4 / 4 / 4
10  |  6 | 1.024 / 1.024 / 1.024 | 4 / 4 / 4
15  |  5 | 2.105 / 2.105 / 2.105 | 4 / 4 / 4
25  | 13 | 7.169 / 7.169 / 7.169 | 6 / 6 / 6

Table 1: Sample results from a series of experiments with the dance sequence (Figure 1(a)) as target. The number of fragments in the target was approximately 319, and walk sequences similar to Figure 1(b) were the queries.
S = L_{1,2}
for i = 2 to m−1 do
    T = {}
    for each element (a, ..., p) in S do
        if (p, q) ∈ L_{i,i+1} then
            Add (a, ..., p, q) to T.
        end if
    end for
    S = T
end for

Figure 4: The alignment algorithm.
3.3.1 Details
We apply a correlation-based technique on clip-frame sequences M_i and M_{i+1} and get a list L_{i,i+1} of the k most suitable pairs of frame indices. L_{i,i+1} contains several pairs of indices (a, b) such that frames f^i_a and f^{i+1}_b resemble each other, and can be obtained in O(n log n) expected time, where n is the size of a typical clip-frame sequence (usually small, between 10 and 100). Once all L_{i,i+1} are computed (1 ≤ i < m), we look for common indices in adjacent L_{i,i+1}, as seen in Figure 4. This algorithm runs in O(m · |S| · max|L_{i,i+1}|) time. S and L_{i,i+1} are usually no longer than 20, and m varies from 3 to 15.
S typically contains at least one index sequence j_1, j_2, ..., j_m. Thus if we now want to transition from clip M_4 to M_7, we play M_4 till the j_4-th frame and then switch to the (j_7+1)-th frame in clip M_7. The transition is without any jerks and bumps.
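The alignment loop of Figure 4 translates almost directly to code; a sketch assuming 0-based lists of L_{i,i+1} pairs (names are ours):

```python
def align(pair_lists):
    """Chain the per-adjacent-pair match lists L_{i,i+1} into m-tuples
    (j_1, ..., j_m) such that each consecutive pair of indices appears
    in the corresponding list, as in Figure 4.

    pair_lists[i-1] plays the role of L_{i,i+1}: a list of
    (frame_in_M_i, frame_in_M_{i+1}) index pairs."""
    S = [tuple(p) for p in pair_lists[0]]        # S = L_{1,2}
    for L in pair_lists[1:]:                     # i = 2 .. m-1
        # Index the current list by its first component for O(1) lookup.
        successors = {}
        for a, b in L:
            successors.setdefault(a, []).append(b)
        T = []
        for tup in S:                            # each element (a, ..., p) in S
            for q in successors.get(tup[-1], []):   # (p, q) ∈ L_{i,i+1}
                T.append(tup + (q,))             # add (a, ..., p, q) to T
        S = T
    return S

# Three clip-frame sequences (m = 3); hypothetical match lists.
L12 = [(3, 7), (5, 9)]
L23 = [(7, 2), (8, 4)]
print(align([L12, L23]))   # → [(3, 7, 2)]
```

Only chains whose links agree at every step survive, which is exactly the "common indices in adjacent L_{i,i+1}" condition of the text.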
4 Concluding Remarks
In this paper we have provided efficient schemes (with big-Oh expected running times, and supported by experimental results) for the following activities:

• Given a (possibly rough) animation sequence, search for clips that contain similar frame sequences, and optionally stitch these frame sequences in an optimal way.

• Our theoretical and experimental studies indicate that the stitching time is exponential, and sometimes the quality of the automatic stitch may be less than desirable. We therefore create a cluster graph data structure. Cluster graphs preserve the fine granularity of fragments, can be created efficiently, and have several advantages (Section 3.2.1).

• Cluster graphs are used interactively by the animator. Although we do not create a fully automated stitched sequence, we efficiently compute the location of possible transition points within a node in the cluster graph, and thus between multiple clips.
References

ARIKAN, O., AND FORSYTH, D. A. 2002. Interactive Motion Generation from Examples. In Proc. of Siggraph '02, 483–490.

BRUDERLIN, A., AND WILLIAMS, L. 1995. Motion Signal Processing. In Proc. of Siggraph '95.

BVH, 2004. http://www.bvhfi…/bvh archive/. Dec.

GLEICHER, M. 1997. Motion Editing with Spacetime Constraints. In Proc. of the 1997 Symposium on Interactive 3D Graphics.

GLEICHER, M. 1998. Retargetting Motion to New Characters. In Proc. of Siggraph '98.

KEOGH, E. J., PALPANAS, T., ZORDAN, V. B., GUNOPULOS, D., AND CARDLE, M. 2004. Indexing large human-motion databases. In VLDB, 780–791.

KOVAR, L., GLEICHER, M., AND PIGHIN, F. 2002. Motion Graphs. In Proc. of Siggraph '02.

LEE, J., AND SHIN, S. Y. 1999. A Hierarchical Approach to Interactive Motion Editing for Human-like Figures. In Proc. of Siggraph '99.

LEE, J., CHAI, J., REITSMA, P. S. A., HODGINS, J. K., AND POLLARD, N. S. 2002. Interactive Control of Avatars Animated with Human Motion Data. In Proc. of Siggraph '02.

LIU, G., ZHANG, J., WANG, W., AND MCMILLAN, L. 2005. A system for analyzing and indexing human-motion databases. In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, ACM Press, New York, NY, USA, 924–926.

PULLEN, K., AND BREGLER, C. 2000. Animating by Multi-level Sampling. In Proc. of IEEE Computer Animation 2000.

PULLEN, K., AND BREGLER, C. 2002. Motion Capture Assisted Animation: Texturing and Synthesis. In Proc. of Siggraph '02.

UNUMA, M., ANJYO, K., AND TAKEUCHI, R. 1995. Fourier principles for emotion-based human figure animation. In Proceedings of Siggraph '95.