
Automatic Tuning of Collective Communication Operations in MPI


Rajesh Nishtala, Neil Patel, Kaushal Sanghavi, Kushal Chakrabarti

December 2003: CS262A Final Project

Computer Science Division

University of California, Berkeley

rajeshn@…, {neilp, kaushal, kushalc}@…

Abstract

In this paper we present an adaptive approach to tuning MPI collective communication algorithms. The approach was arrived at in two separate steps. In the first, we observed the standard vendor implementation of several collective communication operations to be naive and inflexible. To make these operations faster and more efficient we developed a family of algorithms for each of four collective operations that often showed impressive improvement over the standard implementations. While observing that some of these new algorithms performed better, we also noticed that their level of performance changed relative to each other over time. These changes persisted when we varied a number of factors affecting the context of the operation, including the number of processes over which to perform the operation, the size of the data to be communicated, and the physical cluster on which the operation was run. These observations led us to believe that the best approach to optimizing collective communication operations is to dynamically choose best-performing algorithms based on empirical results on recent performance. In the second step, we developed a lottery scheduler that would manage these results and probabilistically choose a globally optimal algorithm. We observed that with a scheduler, a long-running application would choose algorithm implementations whose performance was near the optimum performance.

1 Introduction

Over the past decade, computer systems have gotten significantly faster and more powerful. One of the consequences of this rapid technological advancement, however, is that the complexity of modern systems has increased dramatically. Although most system administrators, developers, and researchers possess sufficient knowledge of computer science to be able to fully exploit the power of these machines, there are many problems with restricting their use to such individuals. For instance, distributed systems are widely used by scientists from fields as diverse as physics, statistics, and chemistry. However, these users cannot be expected to manually tune either their applications or the underlying systems to fully exploit the available computational power. At the same time, however, it is important to tune these applications because the tuned versions can result in significant performance improvement (up to 800% in our work) [5].

In practice, distributed systems and applications are often manually and tediously tuned by professional system administrators. This approach is unfortunate because such tuning is not only very expensive but is often outpaced by the rate of technological innovation. At the same time, it is interesting to examine the notion of optimization: systems are not designed to be optimal for every possible application; indeed, they cannot be. All these problems motivate the need for a system that can automatically tune these applications based on run-time parameters.

Here, we focus on the automatic optimization of collective communication operations, the transfer of data across many processes, on distributed memory computing clusters. The complex architecture of these systems, which are characterized by the presence of a high-bandwidth, low-latency interconnect that networks together many heterogeneous machines, creates significant opportunity for optimization. For instance, as one can see in Table 1, different clusters are associated with different processor speeds, physical memory sizes, and network topologies, all of which can be used to produce specifically tuned implementations. Even more interestingly, many hardware vendors implement their own versions of point-to-point communication software, which creates the possibility of optimizing over another dimension in the tuning space.

The Message Passing Interface (MPI) is a commonly used library for inter-process communication on these systems [7]. We therefore concentrate on its optimization. In fact, most scientific computing applications use the collective communications implemented by MPI for bulk data transfers and distribution of data across different nodes for processing. For instance, parallel scientific applications that perform matrix multiplication could use the scatter() function in MPI to distribute submatrices to different machines on the cluster, and recollect them with gather() after completion of processing.

However, the presence of such a large parameter space precludes the possibility of manual tuning and motivates the need for automatic tuning. The choice of the optimal implementation of the algorithm varies across the topology of the cluster, the number of processes in the operation, and the size of the message that we wish to transfer. In this paper we analyze four common collective communication operations: broadcast, scatter, gather, and reduce. For each of these operations we have implemented a family of implementations. These implementations vary the tree structure used to disseminate the data as well as the minimum unit of transfer (henceforth called the segment size). Because the optimal choice of implementation is based on many different run-time parameters such as network load and layout of the processes within the network, we present a mechanism that will dynamically choose the correct algorithm based on the lottery scheduling mechanism [11].

The structure of this report is as follows: in Section 2, we further examine the intricacies of MPI and the operations on which we worked. In Section 3, we discuss the variations on the different implementations of the operations. We then present our experimental methodology in Section 4, give an initial evaluation of the data in Section 5, and give the motivations for a dynamic choice of algorithm. In Section 6, we discuss the implementation of a lottery scheduler and present its results. We conclude with related and future work in Section 7.

2 Relevant Message Passing Interface (MPI) Functions

The Message Passing Interface (MPI) is used for high-performance clustered computing. Particularly popular communications using MPI involve the transfer of information between one process (the root process in the MPI context) and every other process in its communication group. These transfers, henceforth referred to as collective communication operations, are described below.

2.1 Broadcast

Broadcast() is intuitively network broadcast. The root process sends the same message to all the other processes in the communication group. It is defined as

BROADCAST(buffer,count,datatype,root,comm):

IN/OUT buffer starting address of buffer

IN count number of entries in buffer

IN datatype data type of entries in buffer

IN root rank of broadcast root

IN comm communication group

For the purposes of this paper, we say that broadcast() is an unspecialized operation because the data received by each node is not specific (or specialized) to it. The spirit of this definition is that each process, after receiving the appropriate amount of data, need only transmit one piece of data to its receiver.
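As a point of reference, a minimal C sketch of invoking this operation through the standard binding is shown below; the buffer size, datatype, and root rank are illustrative choices rather than values taken from this work.

    #include <mpi.h>

    /* Every process in the communicator ends up with the same 1024 integers;
     * only the root's buffer contents matter on entry. */
    void broadcast_example(int *data)
    {
        const int root = 0;
        MPI_Bcast(data, 1024, MPI_INT, root, MPI_COMM_WORLD);
    }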

2.2 Reduce

Reduce() is structurally similar to Broadcast(), but is, in fact, its inverse. Here, every process sends data to the root process, instead of the root transmitting to each process. An important difference, however, is that reduce() also takes in an aggregation operation that combines the data received from each of the processes into a single data set.

REDUCE(sendbuf,recvbuf,count,datatype,op,root,comm)

OUT recvbuf address of receive buffer;significant only at root

IN sendbuf address of send buffer

IN count number of elements in send buffer

IN datatype data type of elements of send buffer

IN op aggregation operation handle

IN root rank of root process

IN comm communication group

An important feature of these aggregation operations is that they are global. In other words, these operations can be performed on all the data sent by all the processes in the communication group. For instance, the MPI interface supplies default implementations of sum, min, and max. These operations are commutative and associative, since the order in which the root receives data from the processes is not defined. Because of these constraints, the reduce() operation can be considered unspecialized. Here, the operation corresponds to the spirit of the definition of unspecialization because each process, upon receipt of its senders' data, need only forward a single piece of data to the corresponding recipient. This aggregation can, in fact, be performed in arbitrary sequence because of the constraint that the aggregation function be associative and commutative.
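For illustration, a minimal sketch of a sum reduction through the standard C binding follows; the single-double payload and root rank 0 are our own choices.

    #include <mpi.h>

    /* Each process contributes one double; the root receives the global sum.
     * recvbuf is significant only at the root, mirroring the signature above. */
    void reduce_example(double local_value, double *global_sum)
    {
        const int root = 0;
        MPI_Reduce(&local_value, global_sum, 1, MPI_DOUBLE, MPI_SUM, root,
                   MPI_COMM_WORLD);
    }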

2.3 Scatter

Scatter() is the operation where the root needs to send different sets of data to all processes in its communication group. Hence, its sendbuf is broken up into different blocks of data, and the i-th block of sendbuf is the starting point for the data that the root needs to send to the process with rank i.

SCATTER(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype,root,comm)

IN sendbuf address of send buffer;significant only at root

IN sendcount number of elements sent to each process;...

IN sendtype data type of send buffer elements;...

OUT recvbuf address of receive buffer

IN recvcount number of elements in receive buffer

IN recvtype data type of receive buffer elements

IN root rank of sending process

IN comm communication group

For the purposes of this paper again, we consider scatter() to be a specialized operation because each process receives a piece of data specific to it. When compared to unspecialization, the spirit of specialization is essentially that nodes must transmit data specific to each recipient.

An example of the usage of scatter() was provided in Section 1.
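A minimal C sketch of that Section 1 usage follows; the block-size constants are illustrative and assume the matrix rows divide evenly among the processes.

    #include <mpi.h>

    #define ROWS_PER_PROC 4     /* illustrative block size  */
    #define COLS          128   /* illustrative row length  */

    /* The root holds nprocs * ROWS_PER_PROC rows; each process (the root
     * included) receives its own contiguous block of rows. */
    void scatter_example(double *full_matrix, double *my_rows)
    {
        const int root = 0;
        MPI_Scatter(full_matrix, ROWS_PER_PROC * COLS, MPI_DOUBLE,
                    my_rows,     ROWS_PER_PROC * COLS, MPI_DOUBLE,
                    root, MPI_COMM_WORLD);
    }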

2.4 Gather

Gather is the exact inverse operation of Scatter. The root collects a different piece of information from each of the processes in the communication group. Gather is defined as:

GATHER(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype,root,comm)

IN sendbuf starting address of send buffer

IN sendcount number of elements in send buffer

IN sendtype data type of send buffer elements

OUT recvbuf address of receive buffer choice;significant only at root

IN recvcount number of elements in any single receive;...

IN recvtype data type of recv buffer elements;...

IN root rank of receiving process

IN comm communication group

We state, without further elaboration, that gather() is a specialized operation by analogy to the previously mentioned operations.
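To mirror the scatter sketch above, a minimal gather of the processed blocks back to the root might look as follows, under the same illustrative block-size assumptions.

    #include <mpi.h>

    #define ROWS_PER_PROC 4     /* illustrative block size, as in the scatter sketch */
    #define COLS          128

    /* Each process contributes its block of rows; the root reassembles them,
     * rank by rank, into one contiguous buffer. */
    void gather_example(double *my_rows, double *full_matrix)
    {
        const int root = 0;
        MPI_Gather(my_rows,     ROWS_PER_PROC * COLS, MPI_DOUBLE,
                   full_matrix, ROWS_PER_PROC * COLS, MPI_DOUBLE,
                   root, MPI_COMM_WORLD);
    }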

3 Static Optimizations: Trees & Pipelining

For each of the operations described in Section 2 we have created a family of implementations. These implementations vary in the tree structure used to disseminate the data, along with the segment size. In this section, we first discuss the structure and ramifications of the various trees, and then describe the characteristics and importance of pipelining data through the tree structure.

Figure 1: The different trees used to disseminate the data (sequential, binomial, binary, and chain).

3.1 Tree Structures

In order to improve upon the existing vendor-provided implementation, we have implemented each of the previously mentioned collective communication operations on four different tree structures. We expect that, for certain operations, the trees will allow better parallelization of both processing and network bandwidth usage. These four different tree structures are described below (and shown as directed graphs in Figure 1).

In each of these trees, if the root process is transmitting data, every process sends to its children in the corresponding tree. On the other hand, if the root process is collecting data, every process sends to its parent in the tree. For simplicity, the following discussion focuses on the former case; the latter case follows straightforwardly.

Binary The standard binary tree is important for a number of reasons. First, it allows meaningful parallelization of processing and network bandwidth usage, while enforcing that no process incurs the cost of sending to more than two other processes. This network bandwidth parallelization is important because, at each time step, an increasing number of processes can use their network links to send data. Second, it limits the length of the longest chain of consecutive sends to O(lg N), where N is the number of processes in the communication group. Other scaling characteristics of this tree are similar to binomial, and are discussed below. No known standard MPI implementation uses the binary tree.¹

Binomial The binomial tree extends the parallelization seen in the binary tree by (1) allowing a process to send to an increased number of nodes, and (2) creating a natural order in which these sends can take place. With regard to the former, a process could be required to send to or receive from up to O(lg N) other processes. This is particularly meaningful for large communication groups, where a larger number of processes can begin to simultaneously use available network bandwidth (relative to binary). At the same time, if each process sends to the child with the greatest number of descendants first (see Figure 1) with blocking sends, it can be shown that every process receives the data at the same time. The binomial tree requires that a process receive data for all of its descendant processes in the tree, like binary. Here, however, there is no straightforward expression for the amount of data that every node receives; in the worst case, a node might receive exponentially more data than it needs. As before, no additional data is sent in the special cases of broadcast() and reduce(). The standard MPICH implementation uses the binomial tree for the broadcast() and reduce() operations.

¹ This observation is conditioned upon our inability to access the proprietary MPI implementation on the IBM Seaborg cluster. This condition holds for the remaining trees.

Figure 2: Effect of Segmenting. This plot shows the effect of segmenting on the Millennium Cluster. Each of the lines shows the speedup of the best segment size for each tree versus the unsegmented implementation of that tree.

Chain (Linear) The straightforward chain tree causes every process to send to the process that has the next highest identifier. In this case, every process first receives the data for every process with an identifier greater than or equal to its own. Given this data, it cleaves off the data intended for it, and forwards the remaining data to the appropriate process. In the special cases of broadcast() and reduce(), the chain tree does not induce any increased overhead in terms of network bandwidth. In the general case, however, it requires the transmission of O(NM) data, where M is the size of each message. No known standard implementation uses the chain tree.

Sequential The sequential tree is the naive tree in which the root process directly transmits data to every other process in its communication group. There are no parallelization benefits from this tree that are apparent to the authors of this report. In fact, this tree seems to enforce that the root transmit independent streams of data to each of its children in a way that precludes parallelization of both network bandwidth and processing. The standard MPICH implementation of scatter() and gather() uses the sequential tree.
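As one concrete illustration of how such a tree can be realized in code, the sketch below lays out a binomial broadcast in the style commonly used by MPICH-like implementations; it is our own rendering under the assumption that ranks are numbered relative to the root, not the exact code used in this work.

    #include <mpi.h>

    /* Binomial-tree broadcast sketch: each process receives the whole message
     * from its parent, then forwards it to its children, largest subtree first. */
    void binomial_bcast(void *buf, int count, MPI_Datatype type, int root,
                        MPI_Comm comm)
    {
        int rank, size, mask;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int rel = (rank - root + size) % size;      /* rank relative to the root */

        /* Find the parent: clear the lowest set bit of the relative rank.       */
        for (mask = 1; mask < size; mask <<= 1) {
            if (rel & mask) {
                int parent = (rank - mask + size) % size;
                MPI_Recv(buf, count, type, parent, 0, comm, MPI_STATUS_IGNORE);
                break;
            }
        }
        /* Send to children in order of decreasing subtree size.                 */
        for (mask >>= 1; mask > 0; mask >>= 1) {
            if (rel + mask < size) {
                int child = (rank + mask) % size;
                MPI_Send(buf, count, type, child, 0, comm);
            }
        }
    }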

3.2 Pipelining

The observation that trees allow parallelization can be further leveraged by the use of pipelining. Such pipelining involves (1) the segmenting of messages and (2) the simultaneous non-blocking transmission and receipt of data. Messages can be segmented by breaking up the larger message into smaller segments and sending these smaller messages through the network. The main advantage of segmenting is that it allows the receiver to begin forwarding a segment while receiving another segment.

Data pipelining produces a number of significant improvements. First, pipelining masks the processor and network latencies that are known to be an important bottleneck in high-bandwidth networks, such as those found in the clusters studied here. Second, because it allows the simultaneous transmission and receipt of data, pipelining exploits the full duplex nature of the interconnect links. Third, because these links are known to support very high throughput, they could in fact support the simultaneous transmission of data to multiple children, thereby decreasing the total time of transmission.

The pipeline for broadcast() and reduce() is very straightforward because the aggregated data is not specific to individual processes. We can thus leverage the parallelism of having processes receive one segment of data and resend multiple copies of that same segment. Thus, network bandwidth can be readily parallelized. For reduce(), even processing can be parallelized within the network.

In this model, however, it is very difficult to pipeline the scatter and gather operations because the messages are not generic and must be routed properly through the network. Thus, if we use anything besides a sequential tree to disseminate the information, there will be additional network traffic and unnecessary transmission in the network. With respect to our tree implementations, intermediary nodes act as "packet forwarders" that route the data to their children; it is this procedure that is pipelined. We believe that, despite the dramatic increase of data on the network, pipelining on these operations still allows meaningful optimization through a greater number of processes simultaneously receiving and sending data. For instance, if there is sufficient bandwidth in the network, the transmission of additional data can occur without significantly increased cost, while still allowing the masking of network and processor latencies. In fact, in Section 5 we will show that performance increases are indeed observed in practice. Figure 2 shows the effect of this segmenting.
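To make the mechanism concrete, below is a minimal sketch of a segmented chain broadcast with overlapped communication; it assumes the root is rank 0 and uses byte-granularity segments, which are our simplifications rather than details taken from this work.

    #include <mpi.h>

    /* Segmented ("pipelined") chain broadcast: rank r receives each segment from
     * r-1 and forwards it to r+1 while the next segment is still arriving. */
    void chain_bcast_segmented(char *buf, int total_bytes, int seg_bytes,
                               MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int nsegs = (total_bytes + seg_bytes - 1) / seg_bytes;
        MPI_Request fwd = MPI_REQUEST_NULL;

        for (int s = 0; s < nsegs; s++) {
            char *seg   = buf + s * seg_bytes;
            int   bytes = (s == nsegs - 1) ? total_bytes - s * seg_bytes : seg_bytes;

            if (rank > 0)        /* receive segment s from the parent in the chain */
                MPI_Recv(seg, bytes, MPI_BYTE, rank - 1, s, comm, MPI_STATUS_IGNORE);
            MPI_Wait(&fwd, MPI_STATUS_IGNORE);      /* finish forwarding segment s-1 */
            if (rank + 1 < size) /* start forwarding s; it overlaps the next receive */
                MPI_Isend(seg, bytes, MPI_BYTE, rank + 1, s, comm, &fwd);
        }
        MPI_Wait(&fwd, MPI_STATUS_IGNORE);
    }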

4 Experimental Methodology

We have developed a variety of performance profilers to evaluate the performance of our implementations. These profilers, in particular, measure the performance of one of the collective communication operations across a range of segment sizes, message sizes, and communication group sizes. The ranges for these variables were as follows: message sizes ranged from 1 KB to 1 MB for scatter and gather, increasing by a factor of 4. For broadcast and reduce, the upper limit was extended to 8 MB. Segment sizes ranged from 1 KB to the size of the message, increasing by a factor of 2. The size of the communication group ranged from 2 to 32, 50, and 64 for CITRIS, Millennium, and Seaborg, respectively, increasing by 2.

Performance was measured in terms of median running time on each of the clusters. These times were measured on Millennium and Seaborg with the standard MPI wall-clock timing interface, whereas times on CITRIS were measured with PAPI [1]. We were required to implement the PAPI measurement on CITRIS because the resolution of the standard MPI wall-clock implementation there is approximately 4 milliseconds, far too inaccurate for our measurements. Although the former measures total elapsed time between operation initiation and termination, and the latter measures processor ticks during actual execution, i.e. excluding time spent outside of a context switch, we believe this difference to be negligible because we never directly compare absolute times across clusters. Each operation was executed ten times for each parameter set to account for experimental error. The data shown in the following graphs display the median run times of these runs. It is important to note, however, that the CITRIS and Millennium clusters do not employ load balancing and prevent the user from declaring acceptable load levels. For these reasons, experiments on these two clusters are not as repeatable as those on the Seaborg cluster. Table 1 shows a summary of the clusters that were used in our experiments.
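The exact measurement harness is not reproduced here; a minimal sketch of the MPI wall-clock approach described above might look like the following, where the ten trials mirror the methodology while the barrier and the median-taking details are our own additions.

    #include <mpi.h>
    #include <stdlib.h>

    #define TRIALS 10

    static int cmp_double(const void *a, const void *b)
    {
        double d = *(const double *)a - *(const double *)b;
        return (d > 0) - (d < 0);
    }

    /* Time a broadcast TRIALS times and return the median elapsed time in
     * seconds, as seen by the calling process. */
    double profile_bcast(void *buf, int count, MPI_Datatype type, MPI_Comm comm)
    {
        double t[TRIALS];
        for (int i = 0; i < TRIALS; i++) {
            MPI_Barrier(comm);                  /* keep the processes in step */
            double start = MPI_Wtime();
            MPI_Bcast(buf, count, type, 0, comm);
            t[i] = MPI_Wtime() - start;
        }
        qsort(t, TRIALS, sizeof(double), cmp_double);
        return t[TRIALS / 2];                   /* median of the measured times */
    }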

5 Initial Data Analysis

Although our profilers supported analysis of operations across a number of different parameters, the entire parameter space was explored only for broadcast profiling. For scatter(), reduce(), and gather(), we only analyzed performance across the four trees, segment sizes, and communication group sizes; message sizes and physical clusters were not varied. We did this for several reasons. First, the availability of computational resources was limited on the shared clusters and forced us to be selective in gathering data. Second, we observed from early trials that broadcast exhibited interesting variation across the entire range of parameters. Finally, analysis on broadcast alone simplified our dataset while still providing suitable evidence that variation in the cluster environment exists.

                      Millennium [2]          CITRIS [2]              Seaborg (dense) [3]   Seaborg (sparse) [3]
Processor Type        Pentium II Xeon         Itanium 2               IBM Power3            IBM Power3
Processor Clock Rate  500-700 MHz             900 MHz-1.3 GHz         375 MHz               375 MHz
Processors per Node   2-4                     2                       16                    1
Physical Memory       2-5 GB                  4-5 GB                  16-64 GB              16-64 GB
Network Topology      Star (symmetric links)  Star                    Two Level             Star
Interconnect Type     TCP/IP Gigabit Ethernet TCP/IP Gigabit Ethernet CSS                   CSS
Nodes in Network      50                      32                      4                     64

Table 1: Cluster Summary. This table shows a summary of the pertinent facts about the clusters used in our experiments. Note that there are two different versions of the Seaborg cluster here. For Seaborg (dense), 64 processors across 4 nodes were used, while Seaborg (sparse) indicates 64 processors across 64 nodes. Since the interconnects within a node are presumably faster than the network, we say that Seaborg (dense) is in essence a two-level cluster: the first level is all the processors within a node, while the second level is the interconnect of all the nodes.

Our data is organized as follows: Section 5.1 shows and analyzes the performance of each of the collective communication operations with each tree implementation, ranging across the number of processes, all on the CITRIS cluster. Section 5.2 describes similar data for the broadcast operation across four different clusters (CITRIS, Millennium, Seaborg (sparse), and Seaborg (dense)). Section 5.3 examines the performance of segmentation as a function of communication group size.

5.1 Data Analysis Across Operations

We observed the performance of each of the four collective communication operations (broadcast, gather, scatter, and reduce) on the CITRIS cluster with each of the four tree implementations. Plots of these performance measurements are shown in Figure 3.

There are substantial performance gains relative to the vendor implementation for broadcast(), reduce(), and gather(); however, our various implementations of scatter() exhibited no improvement over the standard. Nevertheless, it is interesting to note that in every operation, including scatter(), there is an implementation that performs as well as, if not better than, the vendor implementation. For instance, the chain reduce() implementation scales independently of the number of processors, whereas the standard MPICH binomial reduce() implementation scales logarithmically. Similarly, we see that even for scatter(), the chain and binomial implementations perform just as well as the standard MPICH sequential implementation.

In general, we see that specialized operations scale linearly with the number of processors, whereas sensible implementations of unspecialized operations scale logarithmically or independently of the number of processors. The additional constant cost for specialized sends is incurred because adding an additional node simply means sending up (or down) one additional level of the tree; with a pipelined implementation this is a constant cost per process. Extra unspecialized sends add little or no cost because including an additional node in the operation involves adding a single send or receive made in parallel with the original sends and receives. The sequential tree, moreover, incurs this cost because this parallelism is not present.

The similar performance of different implementations of particular collective communication operations is significant because transient network conditions could cause one implementation to suddenly perform significantly better than its comparable implementation. This is particularly relevant for cases where many comparable implementations are, in fact, the best implementations: for instance, a chain scatter() could be adversely affected by network traffic at node 2, in which case a comparable binomial implementation could significantly outperform it (see Figure 3). This observation is critical to the motivation behind lottery scheduling and is discussed further in Section 6.1.2.

Figure 3: Varying the Operation on CITRIS. These plots show the differences across operations on the CITRIS cluster. For a given number of processors, the graph shows the time taken for the best segment size for each of the given trees.

5.2 Data Analysis Across Clusters

The broadcast operation was run on four different clusters. Figure 4 shows the performance of the four tree implementations on each of these clusters. On the CITRIS and Millennium clusters, the binary tree and chain tree implementations perform the best, and are comparable with respect to each other in the time taken to complete the operation. Nevertheless, on the Millennium cluster the binary tree would be preferred for broadcasting on a relatively small communication group (between 2 and 32 processes), while chain is always preferred over binary on CITRIS. This suggests that small changes in the network environment could result in one tree structure being better than another for a period of time. Thus we need a way to automatically choose the optimal MPI implementation based on network and processor parameters.

Similarly, we see that the binary tree implementation is the best performer on Sparse Seaborg but does relatively poorly on Dense Seaborg. Additionally, the cost of additional processes on the Sparse cluster is close to zero for all algorithms but the sequential tree. This indicates that we must strive for a way to make the same implementation of the MPI operations use the most efficient algorithm, regardless of which type of architecture it is installed on. This is in stark contrast to the current solution, which involves manually fine-tuning the implementation before installing it on a system.

One important observation is that the different trees perform better on different clusters. For example, on CITRIS the optimal chain implementations dominate the other implementations, while on the other clusters the other trees take times comparable to chain. One speculation is that the CITRIS cluster is a bandwidth-limited cluster while the others are not.² The observation is that the higher the fanout of a tree, the more send operations get queued at some level of the software stack.

² The reason that we believe CITRIS is bandwidth limited is that the processors used in this network are significantly faster than the processors used in the other clusters. The rate at which the Itanium 2 processors can ship data to the network cards is a lot higher than the rate at which the network cards can put the data on the link; therefore the processors are not the limiting factor. With the other processors this is not the case, implying that the processors themselves are the bottleneck and the network cards can send data at the same rate at which the processor can feed the network card.

Figure 4: Varying the Cluster on a Broadcast. These plots show the differences across clusters for a broadcast operation. For a given number of processors, the graph shows the time taken for the best segment size for each of the given trees.

Thus, on bandwidth-limited networks the length of this send queue could be an important factor, implying that chain has the best performance since it has the shortest queues at every processor. However, when the network bandwidth is not a limiting factor, the time spent in the send queues for the various segments is a negligible effect, implying that the parallelism that the trees can provide can be leveraged. This would explain why the chain trees dominate on CITRIS but not on the other clusters.

5.3 Across Parameters

Pipelining the tree algorithms is most effective when it is done with a segment size that is agreeable with the cluster's underlying architecture. We see from Figure 5 that clusters respond in unique ways to segment sizes. On Millennium, segmenting is crucial for the broadcast of an 8 MB message, where a 16 KB segment performed the best for all four tree implementations. As Millennium is a processor-constrained cluster, sending with smaller segment sizes is probably necessary to utilize link bandwidth and to mask the latency created by slow processors. Figure 6 shows the effect of segmenting on the Millennium cluster: each of the different trees has different valleys, which imply an optimal segment size. On the other hand, the same operation on CITRIS did not rely on fine-grain segmenting for best performance, as a 2 MB segment size was optimal for all trees. On smaller message sizes, the optimal segment was observed to be half the size of the full message. Finally, we see that on Seaborg (Sparse and Dense), no single segment size was settled on as a best size across all communication group sizes. This variation in behavior with respect to segment size among the four clusters is initially surprising given that all four saw the same implementation of broadcast, chain tree, perform best. This means that even if an algorithm implementation emerges as the universal best performer, there are cluster-specific parameters that must be tuned to ensure optimal performance. Thus an adaptive approach to determining the best algorithm for a given cluster would be ideal. If we could dynamically determine the best algorithm (and parameters) given the operation to execute, the communication group size, and the message size, then we could avoid the repetitive re-implementation that is prevalent in current vendor-implemented systems to deliver optimal performance on physically unique clusters. An empirical approach to choosing the optimum would be preferred to a modelling approach, as models cannot capture real-time changes in network conditions accurately.

Figure 5: Varying the Cluster on a Broadcast. These plots show the differences across clusters for a broadcast operation. For a given number of processors, the graph shows the best segment size for each of the given trees.

Figure 6: Effect of Segment Size Across Processors. These plots show the differences across trees for different segment sizes and different numbers of processors involved in the communication.

This reasoning motivated the development of a probabilistic lottery scheduler for choosing best-performing operation implementations. (Needed: some reasoning about CITRIS's lack of segmenting and Seaborg's fluctuating best segment; some reasoning about the heavy use of Millennium vs. the light use of CITRIS and how this may affect latency and bandwidth.)

6 Dynamic Optimizations: Lottery Scheduler

We have implemented a naive version of a broadcast lottery scheduler and can, using this, show performance results that consistently outperform the equivalent MPI broadcast implementation (Figure 7). Similar results are expected for lottery scheduler implementations for gather() and reduce().³

6.1 Theory and Implementation

6.1.1 Architecture

Intuitively speaking, the lottery scheduler should be able to adapt to particular clusters and transient cluster conditions by preferring efficient implementations.⁴ This, within our framework, requires that the lottery scheduler (1) occasionally explore the implementation space and attempt to discover the most efficient implementation available to it (exploration phase), and (2) disproportionately select this most efficient implementation (execution phase). In the following subsection, we describe the theory and implementation behind the development of such a lottery scheduler.

First, we explain the actual process undergone by collective communication lottery scheduling. Upon execution of a specific collective communication operation, the originating node selects a particular implementation of the operation according to a previously defined probability distribution. In practice, each implementation is allocated, at any particular time, a certain number of lottery tickets; the probability that a particular implementation is chosen is exactly the ratio of the number of tickets that it holds over the total number of tickets. After having chosen the implementation, the originating node also chooses the number of times that this choice will be valid, i.e. a time-to-live (TTL), as a function of the implementation's relative number of tickets. The originating node then transmits, according to a statically known implementation of broadcast, e.g. the vendor-supplied broadcast(), an encoding of the function choice and TTL to every other node in its communication group. Upon receipt of this encoding, every node in the communication group (1) executes the particular collective communication operation according to the specified implementation, and (2) stores it and the TTL into memory. Every future call to the same operation checks if the TTL is positive: if it is, the call decrements it and executes the same implementation; otherwise, it chooses a new implementation based upon the aforementioned protocol, and continues similarly. Throughout the execution of the operation, the lottery scheduler measures, at each node, performance characteristics that are consistent throughout the communication group, e.g. total time until completion. Based upon these measured characteristics, each node independently updates its own ticket allocation.
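A minimal sketch of the selection step follows; the number of implementations, the random-number source, and the rule that scales the TTL with the ticket share are illustrative assumptions, and the broadcast of the encoded choice to the rest of the communication group is omitted.

    #include <stdlib.h>

    #define NUM_IMPLS 16                    /* illustrative: trees x segment sizes  */

    typedef struct {
        int    tickets[NUM_IMPLS];          /* current lottery ticket allocation    */
        int    total_tickets;
        int    current_impl;                /* implementation chosen last time      */
        int    ttl;                         /* calls remaining before a new draw    */
        double avg_time[NUM_IMPLS];         /* exponential average of running times */
    } lottery_state;

    /* Pick an implementation with probability proportional to its tickets, and
     * keep that choice alive for a TTL proportional to its ticket share. */
    int lottery_choose(lottery_state *st)
    {
        if (st->ttl > 0) {                  /* previous choice is still valid       */
            st->ttl--;
            return st->current_impl;
        }
        int draw = rand() % st->total_tickets;
        int impl = 0;
        while (draw >= st->tickets[impl])   /* walk the cumulative ticket ranges    */
            draw -= st->tickets[impl++];
        st->current_impl = impl;
        st->ttl = (10 * st->tickets[impl]) / st->total_tickets;  /* illustrative    */
        return impl;
    }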

6.1.2 Ticket Allocation

To the extent that this ticket allocation determines the frequency at which particular implementations are chosen, the policy is critical to the sensible operation of a lottery scheduler. Early observation of this, and the fact that there is a wide variety of such policies, led us to implement a lottery scheduler that allows the straightforward incorporation of diverse policies. For the purposes of this report, however, we have implemented a simple ticket allocation policy that nevertheless performs satisfactorily in practice. The lottery scheduler, in particular, maintains for every implementation an exponential average of its running times. (It is this statistic that is updated at the end of every collective communication operation.) Based upon these averages, the lottery scheduler allocates a large proportion α, e.g. 80%, of the tickets to the implementation that has the smallest average time. The remaining implementations are uniformly allocated the residual tickets.

³ On the other hand, increased performance is not expected for MPI_Scatter(). To the extent that (1) we have been unable to develop a better implementation of this collective communication, and (2) our lottery scheduler discovers the best implementation, it should always prefer the equivalent MPI implementation.

⁴ Here, we define an implementation of a collective communication operation to be a corresponding procedure that disseminates data to the processes in its communication group according to a particular tree structure and segment size.

This ticket policy has at least a few subtle benefits. Most obviously, it ensures that the fastest algorithm is chosen a disproportionately high number of times. This is important because the data show that there are a large number of algorithms that perform very poorly; this policy minimizes their negative contribution. Said another way, this policy ensures that the average running time of lottery scheduled implementations is not a weighted average of all the algorithms; such a weighted average would be unsatisfactory because (1) there are a large number of non-optimal algorithms, and (2) these non-optimal algorithms have extremely poor running times (see Figure 7). A somewhat more subtle benefit is that this policy ensures that the lottery scheduler only improves its prediction: if a particular implementation is allocated 80% of the tickets at a particular time, another implementation can be allocated 80% of the tickets at a future time only if it has a faster running time.
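Continuing the sketch above, the update step might look like the following; the exponential-average weight, the total ticket count, and the fixed 80% share are illustrative stand-ins for the α described in the text.

    #define ALPHA_BEST    0.80   /* share of tickets given to the fastest average */
    #define EWMA_WEIGHT   0.25   /* weight of the newest observation (assumed)    */
    #define TOTAL_TICKETS 1000

    /* Fold the latest timing into the exponential average of the implementation
     * that just ran, then hand ALPHA_BEST of the tickets to the implementation
     * with the smallest average and spread the rest uniformly. */
    void lottery_update(lottery_state *st, int impl, double elapsed)
    {
        st->avg_time[impl] = EWMA_WEIGHT * elapsed
                           + (1.0 - EWMA_WEIGHT) * st->avg_time[impl];

        int best = 0;
        for (int i = 1; i < NUM_IMPLS; i++)
            if (st->avg_time[i] < st->avg_time[best])
                best = i;

        int residual = (int)((1.0 - ALPHA_BEST) * TOTAL_TICKETS) / (NUM_IMPLS - 1);
        for (int i = 0; i < NUM_IMPLS; i++)
            st->tickets[i] = residual;
        st->tickets[best] = TOTAL_TICKETS - residual * (NUM_IMPLS - 1);
        st->total_tickets = TOTAL_TICKETS;
    }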

Furthermore, every sensible lottery scheduler policy allows adaptive reaction to transient network or processor conditions. Unlike static compile-time and installation approaches, which can only utilize static performance data, lottery scheduling allows the cluster to dynamically choose the implementation that is best suited for current cluster conditions. For instance, suppose that (1) a static approach had decided upon the scatter() chain implementation, and (2) the first child (process 2), because of some massive transfer at the corresponding node, suffered a severe loss of bandwidth. In this case, the huge amount of data transferred through this child in scatter(), i.e. data for nodes 1 through N, would cause huge bottlenecks. On the other hand, a lottery scheduled approach would discover the poor performance of chain and choose, say, the binomial scatter() implementation: here, (1) process two receives only its own data, and (2) it cannot affect the performance of any other node. This is particularly relevant because the scientific computing operations that run on such clusters generally execute for large amounts of time; this simultaneously (1) creates a huge cost for the static implementation, and (2) allows sufficient time for the lottery scheduler to discover another optimal implementation.

6.1.3 Optimizations

Here, it is important to note that, over time, the lottery scheduler will find the implementation that, on average, is optimal. Although a theoretical analysis of this time follows, we can ensure that the lottery scheduler rapidly chooses optimal (or, at least, nearly optimal) implementations. Upon program initiation, the lottery scheduler reads from a predefined disk location, e.g. a file, previously generated exponential averages, ticket allocations, and other bookkeeping information. Lottery schedulers, throughout their execution, continue to update this information only within memory. Upon program termination, however, the lottery scheduler writes the updated information back to this disk location. This approach has the beneficial consequence of allowing the use of prior knowledge without significant overhead: because program initiation requires several disk accesses anyway, an additional read is not particularly significant; because program termination does not affect the performance of the program, disk accesses there are relatively inconsequential. On the other hand, this allows the possibility of having different programs overwrite updates from other programs, i.e. the critical section problem. We, however, believe that the steps necessary to counteract this problem are far too costly, e.g. disk writes at the completion of every operation, and consider this approach to be an appropriate balance.

In fact, the amount of time necessary for discovery of the correct implementation can be meaningfully computed according to straightforward statistical and probabilistic techniques. This characteristic is important as it provides a theoretically meaningful approach to determining and optimizing the degree of responsiveness of the lottery scheduler. In general, the expected number of iterations E[I] necessary to converge to the optimal implementation is

E[I] ∝ |Z| lg|Z| / (1 - α),

where I is the number of iterations, |Z| is the number of implementations, and (1 - α) is the aforementioned probability of exploration. The intuition behind this expression lies in the observation that the lottery scheduler must "touch" every implementation some number of times. From probability theory (and the coupon collector problem), we know that, on average, |Z| lg|Z| attempts are necessary to randomly select each of |Z| items. Because the lottery scheduler will, in fact, be using non-optimal entries with probability (1 - α), the lottery scheduler will iterate, on average, 1/(1 - α) times before attempting a non-optimal implementation. Hence, to try each of the |Z| implementations at least once, the lottery scheduler will require |Z| lg|Z| / (1 - α) iterations.

Figure 7: Millennium Overlay Plot. Performance of the broadcast operation as described in the previous sections, with the performance of the lottery scheduler superimposed.

Finally, because measurements are essentially noisy observations, the lottery scheduler will need to try each of the |Z| implementations some constant (and bounded) number of times. In practice, the statistics community uses ten to fifteen observations as a general rule of thumb.
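As a purely illustrative instance of this bound (the numbers are ours, not parameters reported in this work): with |Z| = 16 candidate implementations and α = 0.8, the expression gives E[I] ∝ 16 · lg 16 / (1 - 0.8) = 64 / 0.2 = 320 collective calls, and scaling by the ten-to-fifteen-observation rule of thumb suggests on the order of a few thousand calls before the scheduler has sampled every implementation often enough to trust its averages.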

6.2 Results

Looking at Figure 7 and Figure 8, it is apparent that the lottery scheduler performs relatively well. In general, its median performance closely follows that of the optimal implementation throughout the domain. On the other hand, it is important to note that the average performance slowly diverges from the median performance of the optimal algorithm as the number of processors increases.⁵

This divergence is readily explained through Figure 8, where we can see the presence of a small number of extremely poorly performing iterations. For instance, in Figure 8b, we see that the overwhelming majority of lottery scheduler iterations require less than two seconds, a reasonable upper performance bound for well-performing implementations, i.e. chain and binary. The problem, however, is that a very small number of extremely poorly performing iterations (see the implementations that require more than four seconds in Figure 8b) push up the average performance measurement; because there are so few calls to these implementations, the median performance measurement is not affected.

Although this problem cannot be entirely avoided, there are a couple of improvements that warrant inspection.

⁵ Although it might seem that we should measure median lottery scheduler performance with the other medians, this is not true. In general, a median of measurements is taken when it is believed that (1) the measurements are generated independently of each other, and (2) the presence of noisy (and extreme) measurements would inappropriately bias the actual observation. Here, however, extreme measurements are a deterministic and intended result of lottery scheduler exploration. To the extent that these extreme measurements are non-erroneous, a measurement of lottery scheduler performance should include them.

Figure 8: Performance of the lottery scheduler with 48 processors. The first plot shows the time taken by every iteration, while the second plot shows a distribution of those times.

The most important proposal involves the allocation of exploratory tickets to non-optimal implementations in inverse proportion to their running time. Doing so acts to decrease the rate at which poor implementations are selected. This solution is, unfortunately, incomplete: it still causes the average performance of the lottery scheduler to be biased by very poorly performing implementations. For instance, in Figure 7, we see that exploration of sequential broadcast implementations would cause the inclusion of implementations that are several orders of magnitude worse than the optimal implementation. In order to circumvent this problem, we propose that the lottery scheduling algorithm only explore implementations whose running times are at most two standard deviations more than the optimal running time. This heuristic allows for the exploration of approximately optimal implementations, while precluding the implementations that are extremely poor. Although this approximation may preclude discovery of the globally optimal implementation, we believe that this could be a valid tradeoff against the tremendous cost of exploring very poor implementations.

7 Related Work

1. The work by Gabriel, Resch, and Rühle [4] optimizes the broadcast and reduce operations only. Moreover, they optimize only the binomial tree that is used in the MPI implementation. We improve upon this by testing with different trees, as described in Section 3.1.

2. In [10], the automatic tuning is done by automatically re-arranging the nodes so that the arrangement matches how the cluster is structured. Also, the root dynamically sends messages to each node informing them what they should do with the data that they have received. We improve upon this by choosing the algorithm to run by running a lottery, as described in Section 6. Hence, each different algorithm type gets a chance to perform well. Moreover, by calculating the time taken by an algorithm based on an exponential average, a new algorithm type is not chosen based on a knee-jerk reaction.

3. [8] improves the existing set of MPICH implementations by optimizing the message size. However, as we explained in Section 6, this 'optimal' message size can change with changes in the environment of the cluster. As a result, automatic tuning is essential for the collective operations to perform optimally.

4. Karonis et al. [6] have optimized collective operations, but they have done it with a view toward wide-area networks, not clusters. As a result, many of their links are much slower than those in the traditional high-speed clusters that we have concentrated on.

5. Shroff and van de Geijn [9] have set benchmarks for common MPI operations by comparing different implementations by different vendors, many of which are customized for different types of clusters. We believe that by letting the lottery scheduler dynamically pick the best algorithm, we have provided a generalized solution that would work well on most systems.

8 Future Work

• Adding different types of trees. We have currently implemented the trees as forwarding trees, i.e. each process receives many pieces of data, determines whether the data is meant for it, and if not, immediately passes it on to the required child process.

• Although we have optimized a few of the more often used MPI operations, we will optimize more MPI operations, such as MPI_ALLGATHER, MPI_ALLREDUCE, etc.

• Extensions to the current simplistic version of the lottery scheduler.

• Test more extensively, especially with more clusters (like Lemieux and Clerc) and different interconnects like Myrinet, to make sure that the automatic tuning works across clusters and with more varied architectures.

9 Conclusion

The gains from our work emerged in two steps. In the first, we found the standard implementations of several MPI collective communication operations to be under-performing. To improve performance we developed a family of implementations for each such operation that performed remarkably well relative to the naive implementations. In the process we observed that the algorithms behaved differently with respect to a variety of variables, including the operation, the size of the communication group, the size of the message, and the cluster on which the operation was to run. Additionally, we found that these algorithms responded to a pipelined approach with varying degrees of success across these same variables. Reasoning about the non-uniformity of performance led us to believe that the results reflect the fact that (1) cluster hardware architectures are diverse and thus give us little hope of finding globally optimal implementations, and (2) conditions on networks are often unpredictably transient in nature and thus locally optimal implementations may differ over time. These observations led us to conclude that the best way to optimize collective communications in a flexible way is to adaptively choose locally optimal algorithms based on empirical data. This approach would be more flexible than the current approach of re-implementing the operations for each cluster to maintain optimal performance. The lottery scheduler was an initial attempt at achieving this goal, and its results were quite encouraging. In the future we envision that a suitably general but accurate scheduler could be used across all clusters, readily incorporating new, more effective algorithms for the collective communication operations than the current set of tree-based implementations.

References

[1] S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of Supercomputing, November 2000.

[2] UC Berkeley Millennium Cluster.

[3] NERSC High Performance Computing Facility, LBNL, and IBM.

[4] E. Gabriel, M. Resch, and R. Rühle. Implementing MPI with optimized algorithms for metacomputing, 1999.

[5] Jack J. Dongarra, James W. Demmel, and Katherine A. Yelick. Automatic tuning for large scale scientific applications. NSF ITR grant proposal, 2003.

[6] N. T. Karonis, B. R. de Supinski, I. Foster, W. Gropp, E. Lusk, and J. Bresnahan. Exploiting hierarchy in parallel computer networks to optimize collective operation performance. Pages 377-386.

[7] Message Passing Interface Forum. MPI: A Message Passing Interface. In Proceedings of Supercomputing '93, pages 878-883. IEEE Computer Society Press, 1993.

[8] Improving the performance of collective operations in MPICH.

[9] Mohak Shroff and Robert A. van de Geijn. CollMark: MPI collective communication benchmark.

[10] Sathish S. Vadhiyar, Graham E. Fagg, and Jack Dongarra. Automatically tuned collective communications. Pages 46-46, 2000.

[11] Carl A. Waldspurger and William E. Weihl. Lottery scheduling: Flexible proportional-share resource management. In Operating Systems Design and Implementation, pages 1-11, 1994.


连锁店标准运营管理手册DOC

连锁店标准运营手册目录 前言 员工必读 第一篇员工的人事管理制度 一、员工入职条件 二、员工离职手续的处理 三、福利与报酬 四、考勤制度 五、工作调动 六、员工培训 七、工作成绩考核及评核 八、晋升机会 第二篇店铺日常管理制度 第三篇工作职责篇 一、店铺运作工作(每日既定工作)

二、店铺运作工作(每周既定工作) 三、店铺运作工作(每月既定工作) 四、店长(副店长)工作范围 五、资深营业员(见习营业员)工作范围 六、营业员(见习营业员)工作范围 七、收银员的工作范围内 第四篇产品知识篇 一、鞋系列 二、服装系列 三、配件系列 第五篇服务篇 一、服务的认识 二、兼顾内、外顾客服务 三、快乐服务宝典 四、微笑的服务 五、服务的要求 第六篇附则 第一篇 员工人事管理制度

一、员工入职的条件 1、入职:员工在入职前经过直营部主管面试,面试合格新员工根据所接到入职通知时间携带身份证原件、复印件,学历证原件、复印件,及1寸红底彩照2张到直营部办理相关的入职手续,手续办理完毕,由直营主管安排至指定的门店实习; 2、试用期:实习员工试用期为1-3个月,试用期间,上级主管将指定专人对新进员工进行一对一带教,帮助新员工学习岗位职责、业务知识、业务流程等,员工试用期满上级主管发给《试用期考核表》,由员工自评后交部门主管复评,考核通过即给予转正;新员工试用期满仍无法达到公司工作标准的,根据实际情况将延长观察期或是解聘。 二、员工的离职手续的办理: 1、主动离职 1)员工在试用期内,因个人原因申请离职,须提前3天向店长递交书面辞职申请,直营主管批准后办理相关离职手续 2)正式员工因个人原因申请离职,须提前15天向店长递交书面辞职申请,直营主管批准后办理相关离职手续 2、自动离职 1)员工未办理正常离职手续离职 2)员工连续旷工三天,视为自动离职 3)自动离职的员工扣发当月薪资作为公司补偿 三、福利与报酬 1、试用期员工与正式员工分别享受相应的的薪酬待遇,试用期员工转

大足石刻英语导游词

Hello everybody,may I get your attention please?I’m your local guide from rock carving travel agency. Today I will get you to Dazu rock carving. I’m so glade to do this and I’m always at your service. Ok first let me give a brief introduction of dazu rock carving. Dazu County was located in the southeast of the Chongqing municipality. It was found in first year of Qianyuan(建元) in Tang dynasty (758 a.d.),The name of Dazu Cou nty which suggested that of harvest and abundance has a long history of more than 1240 years. The Dazu Rock Carvings started around 650 during the Tang Dynasty (AD 618-907) and continued through the Ming Dynasty and the Qing Dynasty (1616-1911). Today, it is enjoys equal popularity with Dunhuang frescos壁画 and forms the trilogy(三部曲) with Yungang and Longmen Grottoes . And in 1999,it was listed as world cultural heritage by UNESCO. Go up to mount Omei, and go down to Baoding Shan. The spots of rock carvings at Baoding Shan, Beishan, Nan Shan, Shizhuanshan and Shimenshan have been tourist resorts since ancient times. 14 km away from Dazu County ,stone caring on Mount Baoding were created from 1179 to 1249 in the South Song Dynasty. The person in charge was Zhao Zhifeng,who was born in Xueliang township of Dazu County,several km away from mount Baoding.He became a monk when he was five, and moved to the western part of Sichuan Province to learn Buddhism when he was 16.Then he returned home and had the Buddhism “Daochang”built under his care. He dedicated his following 70 years to this course until he passed away at eh age of 90,when the forest of statues was completed at a preliminary level.The Buddha Bay we are seeing now is the major part of the stone carvings on mount Baoding. Baoding Shan, Which has come to be known as the Mountain of Efficacy{efikes} under Heaven world, receive tens of thousands of visitors and tourists on the 19th of the second month of each Chinese lunar year, the date said to be the birthday of Thousand-Armed Awalokitesvara.

连锁店运营店运营方案

精品连锁店运营店运营方案 月17日《关于在朝阳市区及省内各市建立“悦牛”精品连锁店》的会议精神,牛业公司经市场调查和研究,制定具体运营方案如下: 一、经营理念:建精品店,售精品货,做优质服务,创悦牛品牌 二、经营模式 1、在公司的统一领导下实行经理、店长负责制。 2、根据市场运营情况确定销售额和利润指标。 3、实行基础工资加利润分成的薪酬管理制度。 4、实行统一店面、统一管理、统一配货、统一价格、统一培训、统一核算、统一着装的“七统一”管理模式; 5、土特产品要体现精、新、奇、特。 二、店面选址在消费群体集中的街道或小区,对购买力进行分析基础上进行选址。 三、专营店建立的标准: 1、营业面积:40—200 m2o 2、店面设计:由专业策划装饰公司统一设计、统一装修,从门脸到地面,从设置摆放到宣传板块布置实行统一标准,由牛业公司提供相关资料。

四、店内配置2 1、根据店面的不同面积,确定店内设施配置,由公司统一购买,规格一致。(1)冰柜(冻品)(2)冷鲜柜(3)货架(4)电子称(5)保鲜展示柜(样品柜)(6)刨片机(7)条码秤(8 )电脑(9 )收款机(10)监控系统(1 1 )存包柜(12 )、提货筐 2、办公用品的配置(桌椅)。 3、人员配置:精品连锁店设店长1人,营业员2-6人,收银员1-2人,更夫1人。 五、店内经营区域设置 1、精品区 2、中档区六、经营品种: 1、冻肉产品系列 2、鲜肉产品系列 3>副产品系列 4、熟食系列 5、蔬菜水果系列(由加盟商自主经营,统一管理,收取一定数额的加盟费) 6、调料、酒、饮料系列 7、粮油系列 8、土特产系列七、产品的运输配置3由公司暂配一台厢式货

大足石刻介绍

大足石刻 大足石刻是世界文化遗产,世界八大石窟之一,位于重庆市大足区境内,是以佛教题材为主的宗教摩崖石刻,儒、道教造像并陈,是著名的艺术瑰宝、历史宝库和佛教圣地,有“东方艺术明珠”之称。 大足石刻最初开凿于初唐,历经晚唐、五代,盛于两宋,明清时期亦有所增刻,最终形成了一处规模庞大,集中国石刻艺术精华之大成的石刻群。 大足石刻群有75处,5万余尊宗教石刻造像,总计10万多躯,铭文10万余字,其中以宝顶山和北山摩崖石刻最为著名,其以佛教造像为主,是中国晚期石窟造像艺术的典范,与云岗石窟、龙门石窟和莫高窟相齐名,是古代汉族劳动人民卓越才能和艺术创造力的体现。大足石刻的千手观音是国内唯一真正的千手观音,约1006个,被誉为“世界石刻艺术之瑰宝”、“国宝中的国宝”。北山造像依岩而建,龛窟密如蜂房。 宝顶山大佛湾造像长达500米,气势磅礴,雄伟壮观。变相与变文并举,图文并茂;布局构图谨严,教义体系完备,是世界上罕见的有总体构思、历经七十余年建造的一座大型石窟密宗道场。造像既追求形式美,又注重内容的准确表达。 宝顶山大佛所显示的故事内容和宗教、生活哲理对世人能晓之以理,动之以情,诱之以福乐,威之以祸苦。它涵盖社会思想博大,令人省度人生,百看不厌。南山、石篆山、石门山摩崖造像精雕细琢,是中国石窟艺术群中不可多得的释、道、儒“三教”造像的珍品。 大足石刻时间跨度从9世纪到13世纪,以其艺术品质极高、题材丰富多变而闻名遐迩,从世俗到宗教,反映了中国这一时期的日常社会生活,证明了这一时期佛教、道教和儒家思想的和谐相处局面,被誉为9世纪末至13世纪中叶石窟艺术陈列馆。 大足石刻规模之宏大,艺术之精湛,内容之丰富,保存之完好,更是世界罕见。毫无疑问,在渝西走廊这条旅游线路上,大足石刻犹如一颗闪闪发亮的珍珠,杰出的艺术价值和悠久的历史文化都让它熠熠生辉。

连锁运营系列

产品连锁运营系列日期:5月15-16日地点:广州 主题:专卖店的经营与管理 主讲:中国终端培训第一人刘子滔 主办:益策(中国)学习管理机构商战名家网 一、课程前言 在当今这个生产、流通、消费全球同步的时代,一个企业要想生存、发展与壮大,必须有足够大的力量应对全球化的竞争与挑战。连锁专卖因为有统一化、规模化的效应,能迅速壮大企业实力,已成为当今流通型企业谋求发展的最佳选择。 然而,在终端专卖店经营与管理过程中,普遍遇到以下问题: 易受外部市场大环境的影响; 经营管理跟不上业务扩张的步伐…… 很多企业容易将所有的问题归咎于市场,但在市场大环境无法改变的情况下,我们该如何解决专卖店的经营与管理问题? 针对以上问题,我们特邀中国终端培训第一人刘子滔先生担任主讲嘉宾,他以深厚的理论与丰富的实践经验,与我们一同分享专卖店经营管理的秘诀。机会难得,不容错过! 二、课程目的 (一)规范终端销售服务的标准化,提升单店业绩;

(二)建立标准化培训模式,降低人员招聘的难度,提升培训终端结果; (三)提升店铺整体形象,以视觉冲击来迎接新的消费时代; (四)脱离打折买赠的红海,以多元化促销为竞争手段; (五)建立有效的客户系统,从客户资源当中重新开发资源; (六)重新定位店长的职能,为多店化运作打造基石; (七)建立考核机制,厘清人性化管理的误区; (八)建立双赢激励的机制,软硬兼施提升人员工作激情; (九)加强加盟商愿景规划与管理,让员工从制度和愿景当中看到未来与希望。 三、课程特色 讲师的授课风格风趣、幽默,以简单、易理解的语言表述系统的理论体系,以案例、工具帮助学员清晰掌握实战操作方法,把理论与实操相结合,达到最佳的效果! 四、课程提纲 (一)引言 1、终端店铺四项收入分析 2、可控与不可控因素分析

潼南大佛寺景区简介

潼南大佛寺景区简介 省级风景名胜区潼南大佛寺景区位于重庆市潼南县城西郊,是潼南——大足——合川“石刻金三角”的重要组成部分。 大佛寺依山面江,风景佳绝。寺周里许之地,荟萃有我国第一大室内摩岩饰金大佛“八丈金仙”、我国最早使用全琉璃顶的古建筑“大像阁”、我国古代四大回音建筑之一的“石磴琴声”、我国最大的摩岩书法石刻“顶天佛字”、中外文物专家誉为石刻瑰宝的“千佛岩”和奇妙的天然回音岩“海潮音”,传说神奇的“黄罗帐”、“翠屏秋月”、“仙女洞”及“百仙岩”、“鉴亭”、、“读书台”、“合掌峰”、“滴水岩”、“瑞莲池”、“鹰蛙石”、“关刀石”、“云岩飞霞”十八胜景。 大佛寺旧名“南禅寺”,始建于唐咸通三年(862年),北宋治平年间(1064-1067年)赐额“定明院”。遗存有始于隋,盛于唐,继于宋,续于元,承于明清,晚迄民国,年代一直延续未断,时间长达1400多年之久的儒、释、道三教造像125龛928尊。于岩壁和殿宇木柱、门枋、栿壁之上,还遗留下身居显赫地位之官吏所撰写之碑文以及历代文人学士为记趣揽胜而书刻的题咏87通,造像记31则,字体各异之楹联22副,记录历代水文、重大灾害之题刻7则。其摩岩造像的年代,最早为隋“开皇十一年”,即公元591年,距今已有1420年的历史,比大足石刻早200余年。其摩岩造像之历史为重庆市最早者,亦属我国早期宗教造像地区之一。 尤其是大佛殿内摩岩凿造的弥勒大佛,身高18.43米。佛首凿于唐长庆四年(824年),北宋靖康丙午(1126年)续凿佛身,南宋绍兴壬申(1152年)为大佛装金,粧成“佛如金山,处于琉璃阁中,金碧争光,晃耀天际”,誉称“八丈金仙”。整个大佛像的开凿,共用时330年之久,是我国儒释道三教融合、通力协作的典范,也是我国大佛造像家族中耗时最多的大佛造像。金大佛虽然历经330余年始成,但风格统一,比例匀称,线条圆润,手法娴熟,面目慈祥,庄严肃穆,雕刻精美,栩栩如生,被众多中外文物专家誉为“金佛之冠”。 大佛寺景区文化内涵丰富、历史底蕴厚重,自然风光旖旎,古迹名胜众多,是我国难得的历史文化瑰宝。1956年,公布大佛寺摩岩造像为四川省第一批文物保护单位;2006年,被国务院公布为第六批全国重点文物保护单位;2009年大佛寺景区被评为中国著名文化旅游景区;2014年被评为国家AAAA级景区。 景区于2008年启动建设,目前核心区已基本建成。规划面积379公顷,将充分利用大佛寺景区丰富的佛家、道家、儒家文化资源和山水资源,挖掘厚重的人文历史资源,自然生态景观资源,整合宗教旅游文化,营建“神圣、形胜、意盛”儒、释、道共融互生的文化氛围,科学保护历史文化遗产,恢复古南禅寺历史风貌,逐步修复十八胜景,完善和优化旅游基础设施,开发利用滨江水域,通过景区、景点的衔接和串联,建成集商务会议、观光旅游、文化体验、休闲养生、互动娱乐、餐饮食宿等多功能一体化的综合景区。

连锁店的运营体制修订版

连锁店的运营体制修订 版 IBMT standardization office【IBMT5AB-IBMT08-IBMT2C-ZZT18】

五、连锁店的营运体制规范

(一)总公司的营运体制——经营策略 是针对连锁体系的经营策略,通过内外因素相互组合运用。 (1)外在因素可包括顾客策略(顾客层的掌握,市场区隔化的展开,固定顾客层的培养,消费需求 的探讨)及竞争状况(市场情报的充分把握、同行业的深入分析、市场机会点的掌握、市场问题点的突破等项)。 (2)内在因素可包括行销策略(商品战略、价格战略、销售渠道战略)及管理制度(人事、财务、行政、营业)。 1.连锁发展计划 首先要确立型态的选择及短、中、长期计划的拟定,连锁化经营有示范店的必要性,借以了解市场需求状况,发现其优点与缺点,诸如商店的店面设计、装潢设施、卖场气氛、商品计划及服务结构、顾客交易情形、作业程序、帐务处理方法及存货控制方式、人员的训练内容及现场操作要领,突发状况的处理等。 唯有通过示范店的实验,才能正确拟订政策。 2.综合形象的运用 (1)CI的形象塑造 具体的实施CI体制,必须深入到顾客形象的建立,广告媒体的运用,从业人员的士气提升,乃至关系企业的形象强化。 (2)CI提高的要领

必须针对商品层面、店铺表现层面、卖场构成层面、基本服务层面、销售层面、推广层面、顾客层面等诸多角度,借以塑造商店形象。 连锁经营必须具备的要素 ①统一的招牌。 1统一的广告。 2统一的采购。 3统一的教育。 ⑤统一的装潢。 4统一的制造。 5统一的价格。 ⑧统一的品质。 3.管理制度的展开 连锁店管理的情报体系。 有关信息包括以下: 1财务管理。 2存量管理。 3季节库存管理。

相关文档
相关文档 最新文档