
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System*

Jiwei Lu, Howard Chen, Rao Fu, Wei-Chung Hsu, Bobbie Othmer, Pen-Chung Yew
Department of Computer Science and Engineering, University of Minnesota, Twin Cities
{jiwei, chenh, rfu, hsu, bothmer, yew}@cs.umn.edu

Dong-Yuan Chen
Microprocessor Research Lab, Intel Corporation
dychen@intel.com

Abstract

Traditional software controlled data cache prefetching is often ineffective due to the lack of runtime cache miss and miss address information. To overcome this limitation, we implement runtime data cache prefetching in the dynamic optimization system ADORE (ADaptive Object code RE-optimization). Its performance has been compared with static software prefetching on the SPEC 2000 benchmark suite. Runtime cache prefetching shows better performance. On an Itanium 2 based Linux workstation, it can increase performance by more than 20% over static prefetching on some benchmarks. For benchmarks that do not benefit from prefetching, the runtime optimization system adds only 1%-2% overhead. We have also collected cache miss profiles to guide static data cache prefetching in the ORC compiler. With that information the compiler can effectively avoid generating prefetches for loops that hit well in the data cache.

1. Introduction

Software controlled data cache prefetching is an efficient way to hide cache miss latency. It has been very successful for dense matrix oriented numerical applications. However, for other applications that include indirect memory references, complicated control structures and recursive data structures, the performance of software cache prefetching is often limited due to the lack of cache miss and miss address information. In this paper, we try to keep the applicability of software data prefetching by using cache miss profiles and a runtime optimization system.

* This work is supported in part by the U.S. National Science Foundation under grants CCR-0105574 and EIA-0220021, and grants from Intel, Hewlett Packard and Unisys.

1.1. Impact of Program Structures

We first give a simple example to illustrate several issues involved in software controlled data cache prefetching. Fig. 1 shows a typical loop nest used to perform a matrix multiplication.

void matrix_multiply(long A[N][N], long B[N][N], long C[N][N])
{
    int i, j, k;
    for ( i = 0; i < N; ++i )
        for ( j = 0; j < N; ++j )
            for ( k = 0; k < N; ++k )
                A[i][j] += B[i][k] * C[k][j];
}

Figure 1. Matrix Multiplication

Experienced programmers know how to use cache blocking, loop interchange, unroll and jam, or library routines to increase performance. However, typical programmers rely on the compiler to conduct optimizations. In this simple example, the innermost loop is a candidate for static cache prefetching. Note that the three arrays in the example are passed as parameters to the function. This introduces the possibility of aliasing, and results in less precise data dependency analysis. We used two compilers for the Itanium system in this study: the Intel C/C++ Itanium Compiler (i.e. ECC V7.0) [15] and the ORC open research compiler 2.0 [26]. Both compilers have implemented static cache prefetch optimizations that are turned on at O3. For the above example function, the ECC compiler generates cache prefetches for the innermost loop, but the ORC compiler does not. Moreover, if the above example is re-coded to treat the three arrays as global variables, the ECC compiler generates much more efficient code (over 5x faster on an Itanium 2 machine) by using loop unrolling. This simple example shows that program structure has a significant impact on the performance of static cache prefetching. Some program structures require more complicated analysis (such as interprocedural analysis) to conduct efficient and effective cache prefetching.

1.2. Impact of Runtime Memory Behavior

It is in general difficult to predict memory working set size and reference behavior at compile time. Consider Gaussian Elimination, for example. The execution of the loop nest usually generates frequent cache misses at the beginning of the execution and few cache misses near the end of the loop nest. Initially, the sub-matrix to be processed is usually too large to fit in the data caches; hence frequent cache misses will be generated. As the sub-matrices to be processed get smaller, they may fit in the cache and produce fewer cache misses. It is hard for the compiler to generate one binary that meets the cache prefetch requirements for both ends.

The memcpy library routine is another example. In some applications, the call to memcpy may involve a large amount of data movement and intensive cache misses. In some other applications, the calls to the memcpy routine have few cache misses. Once again, it is not easy¹ to provide one memcpy routine that meets all the requirements.

¹ A compiler may generate multiple versions of memcpy to handle different cases.

void daxpy( double *x, double *y, double a, int n )
{
    int i;
    for ( i = 0; i < n; ++i )
        y[i] += a * x[i];
}

Figure 2. DAXPY

1.3. Impact of Micro-architectural Constraints

Microarchitecture can also limit the effectiveness of software controlled cache prefetching. For example, the issue bandwidth of memory operations, the memory bus and bank bandwidth [14][34], the miss latency, the non-blocking degree of caches, and the memory request buffer size will affect the effectiveness of software cache prefetching. Consider the DAXPY loop in Fig. 2, for example. On the latest Itanium 2 processor, two iterations of this loop can be computed in one cycle (2 ldfpd's, 2 stfd's, 2 fma's, which can fit in two MMF bundles). If prefetches must be generated for both the x and y arrays, the requirement of two extra memory operations per iteration would exceed the "two bundles per cycle" constraint. Since the array references in this example exhibit unit stride, the compiler could unroll the loop to reduce the number of prefetch instructions. For non-unit stride loops, prefetch instructions are more difficult to reduce.
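As a concrete illustration of the unit-stride case, the following sketch (not taken from the paper) unrolls the DAXPY loop so that a single prefetch covers a whole cache line of each array. It assumes a 128-byte line (16 doubles) and a hypothetical prefetch distance AHEAD, and uses the GCC __builtin_prefetch intrinsic as a stand-in for the Itanium lfetch instruction; __builtin_prefetch does not fault on an out-of-range address, so the distance does not need to be clipped at the end of the array.

#define AHEAD 256    /* hypothetical prefetch distance, in elements */

void daxpy_unrolled( double *x, double *y, double a, int n )
{
    int i = 0;
    for ( ; i + 16 <= n; i += 16 ) {
        __builtin_prefetch(&x[i + AHEAD], 0, 0);   /* one prefetch per line of x */
        __builtin_prefetch(&y[i + AHEAD], 1, 0);   /* one prefetch per line of y */
        for ( int j = 0; j < 16; ++j )
            y[i + j] += a * x[i + j];
    }
    for ( ; i < n; ++i )                           /* remainder iterations */
        y[i] += a * x[i];
}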

Stride-based prefetching is easy to perform efficiently. Prefetching for pointer chasing references and indirect memory references [23][25][29][35][22] is relatively challenging since it incurs a higher overhead and must be used more carefully. A typical compiler would not attempt high overhead prefetching unless there is sufficient evidence that a code region has frequent data cache misses.

Due to the lack of cache miss information, static cache prefetches usually are less aggressive in order to avoid undesirable runtime overhead. Therefore we attempt to conduct data cache prefetching at runtime through a dynamic optimization system. This dynamic optimization system automatically monitors the performance of the execution of the binary, identifies the performance critical loops/traces with frequent data cache misses, re-optimizes the selected traces by inserting cache prefetch instructions, and patches the binary to redirect the subsequent execution to the optimized traces. Using cache miss and cache miss address information collected from the hardware performance monitors, our runtime system conducts more efficient and effective software cache prefetching, and significantly increases the performance of several benchmark programs over static cache prefetching. Also in this paper, we describe a separate experiment that feeds cache miss profiles back to the ORC compiler so that the compiler performs cache prefetch optimizations only for the code regions/loops that have frequent cache misses.

The remainder of the paper is organized as follows. Section 2 introduces the performance monitoring features on Itanium processors and the framework of our runtime optimization system. Section 3 discusses the implementation of runtime data cache prefetching. In Section 4, we present the performance evaluation of runtime prefetching and a profile-guided static prefetching approach. Section 5 highlights related work and Section 6 offers the conclusion and future work.

2. Runtime Optimization and ADORE

The ADORE (ADaptive Object code REoptimization) system is a trace based user-mode dynamic optimization system. Unlike other dynamic optimization systems [10][3][7], its profiling and trace selection are based on sampling of hardware performance monitors. This section explains the performance monitoring, trace selection and optimization mechanisms in ADORE. Runtime prefetching is part of the ADORE system and will be discussed in detail in Section 3.

2.1. Performance Monitoring and Sampling on Itanium Processors

Intel's Itanium architecture offers support for performance monitoring [19]. The Itanium 2 processor provides more than one hundred performance counters that measure events such as branch prediction rates, CPU cycles, retired instruction counts, and pipeline stalls. There are two usage models for these performance counters: workload characterization, which gives the overall runtime cycle breakdown, and profiling, which provides information for locating program bottlenecks. Both models are supported in the latest IA64 Linux kernel. For these two models, Stéphane Eranian developed a kernel interface called perfmon [28] to help configure the IA64 PMU (Performance Monitoring Unit) and collect performance profiles on Itanium processors. The profiling component of ADORE is built on top of this kernel interface. In ADORE, sample collection is performed by a signal handler communicating with the kernel. Perfmon samples the IA64 PMU every N CPU cycles (by default N=300,000 cycles). Once the kernel sampling buffer fills up, it throws a buffer-overflow signal to a handler routine, which moves the samples out to a user buffer for trace selection/optimization.

Each PMU sample consists of three accumulative counter values required by ADORE: CPU cycles, Retired Instruction Count, and Data Cache Load Miss Count. In addition, ADORE needs the samples of the Branch Trace Buffer (BTB) registers and the Data Event Address Registers (DEAR). The BTB is a circular register file recording the most recent 4 branch outcomes and the source/target address information. The DEAR holds the most recent address/latency information related to a data cache load miss, a TLB miss or an ALAT miss. For example, ADORE will sample the DEAR for the latest data cache miss events with load latency ≥ 8 cycles. Finally, each sample is in the form of an n-tuple.

2.2. Dynamic Optimization System

ADORE is implemented as a shared library on Linux for IA64 that can be automatically linked to the application at startup. It is also a runtime trace optimizer like Dynamo [3], but ADORE relies on the Itanium hardware performance monitor (HPM) to identify hotspots instead of using interpretation. A trace is a single entry, multi-exit code sequence. In Dynamo, a trace is selected once the reference count of the target block of a backwards taken branch exceeds a threshold. In ADORE, trace selection is based on the branch trace samples collected by the HPM. Fig. 3 illustrates the framework of ADORE. In ADORE, there are two threads at runtime: one is the original main thread running the unmodified executable; the other is a dynamic optimization thread in charge of phase detection, trace selection and runtime prefetching. When the Linux system starts a program, it invokes a libc entry-point routine named __libc_start_main, within which the main function is called. We modified this routine to include our startup code, as shown in Fig. 4.

The functions dyn_open and dyn_close are used to open/close the dynamic optimizer. dyn_open carries out four tasks. First, it creates a large shared-memory block for the original process. This is the trace pool, which stores the optimized traces. Second, it initiates the perfmon kernel interface. Perfmon resets the PMU (Performance Monitoring Unit) in the Itanium processor, determines the sampling rate and creates a kernel sampling buffer, which is called the System Sampling Buffer (SSB) in this paper. Third, dyn_open installs the signal handler to copy all sample events from the SSB to a larger circular User Event Buffer (UEB) every time the SSB overflows. Finally, it creates a new thread for dynamic optimization that has the same lifetime as the main thread. Next, dyn_close is registered by the library function on_exit so that it can be invoked at the end of main program execution. dyn_close frees the memory and notifies the optimizer thread that the main program is complete.
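A minimal sketch of these startup/shutdown hooks is given below. Only the names dyn_open and dyn_close come from the paper; the helper routines, the signal used for the overflow notification, and the pool size are hypothetical stand-ins, not ADORE's actual implementation.

#include <pthread.h>
#include <signal.h>
#include <sys/mman.h>

#define TRACE_POOL_SIZE (1 << 24)          /* hypothetical trace pool size */

static void *trace_pool;                   /* shared memory block holding optimized traces */
static pthread_t opt_thread;               /* dynamic optimization thread */

extern void *optimizer_main(void *arg);    /* phase detection, trace selection, prefetching */
extern void perfmon_setup(long interval);  /* hypothetical: program the PMU and kernel SSB  */
extern void ssb_overflow_handler(int sig); /* copies SSB samples into the circular UEB      */

void dyn_open(void)
{
    /* 1. create the trace pool for the original process */
    trace_pool = mmap(NULL, TRACE_POOL_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    /* 2. initialize the perfmon kernel interface (sampling rate, kernel SSB) */
    perfmon_setup(300000);
    /* 3. install the buffer-overflow signal handler (SSB -> UEB copy) */
    signal(SIGPROF, ssb_overflow_handler);
    /* 4. start the dynamic optimization thread */
    pthread_create(&opt_thread, NULL, optimizer_main, NULL);
}

void dyn_close(int status, void *arg)      /* registered with on_exit() */
{
    (void)status; (void)arg;
    munmap(trace_pool, TRACE_POOL_SIZE);   /* free the trace pool                    */
    pthread_cancel(opt_thread);            /* tell the optimizer thread to shut down */
}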

2.3. Coarse-Grain Phase Detection

Recent research [31][13][6] shows that a running program often exhibits multiple phases during execution. In order to detect execution phases, ADORE implements a coarse-grain phase detector. This phase detector is lightweight, quick to respond, and reasonably accurate in detecting stable phases and catching phase changes. Since a stable phase may have different meanings in different contexts, we define a stable phase for the purpose of prefetching as a stage in which the program is repeatedly executing the same set of code with a relatively stable CPI and cache miss rate. The phase detection algorithm used in ADORE is as follows:

We define a profile window as the period of time for the SSB to fill up. Let SIZE_UEB = SIZE_SSB × W, where W is a positive number (W=16 in ADORE). Hence the UEB can consist of up to W profile windows. By setting the sampling interval to R cycles/sample and SIZE_SSB to N samples, the SSB will overflow every R×N cycles. The UEB will then contain the latest performance profile of W×R×N cycles. For instance, if W=8, R=250,000 and N=4,000, the UEB can contain the most recent 8 seconds of performance history on a 1 GHz machine.

To decide whether the main program incurs a phase change or starts a stable phase, the phase detector is invoked every 100 milliseconds to check whether a new profile window has been added to the UEB. If so, it computes the CPI (Cycles Per Instruction), DPI (D-cache Load Misses Per Instruction) and PC_center for this profile window. The PC_center is computed as the arithmetic mean of all of the pc addresses from the samples in that window. Thus, each profile window in the UEB has three values: CPI, DPI, PC_center. If the phase detector detects that consecutive profile windows have low standard deviations for the above three factors, it signals that a stable phase has occurred. Likewise, high deviations suggest a change of the current stable phase. We compute the PC_center to estimate the center of the code area of each profile window and compute the standard deviations to determine the fluctuation of the code area centers of consecutive profile windows. To improve accuracy, the algorithm removes noise in the above computations. For runtime cache prefetching, we ignore phases that do not have a high cache miss rate. Moreover, if a phase turns out to be from the trace pool, this phase will be skipped to avoid re-optimization (but we may continue to monitor the execution of the optimized trace to detect and fix nonprofitable ones).
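The following is a minimal sketch of the stable-phase test described above, under assumptions of our own: the window count K and the three thresholds are hypothetical values, and ADORE's noise removal is omitted. It computes the standard deviation of CPI, DPI and PC_center over the most recent profile windows and compares each against a threshold.

#include <math.h>

typedef struct { double cpi, dpi, pc_center; } window_stats_t;

/* hypothetical window count and thresholds */
#define K            4
#define CPI_THRESH   0.05
#define DPI_THRESH   0.05
#define PC_THRESH    (1 << 12)

static double stddev(const double *v, int n)
{
    double mean = 0.0, var = 0.0;
    for (int i = 0; i < n; ++i) mean += v[i];
    mean /= n;
    for (int i = 0; i < n; ++i) var += (v[i] - mean) * (v[i] - mean);
    return sqrt(var / n);
}

/* returns 1 if the last K profile windows look like a stable phase */
int stable_phase(const window_stats_t *w, int n)
{
    if (n < K) return 0;
    double cpi[K], dpi[K], pc[K];
    for (int i = 0; i < K; ++i) {
        cpi[i] = w[n - K + i].cpi;
        dpi[i] = w[n - K + i].dpi;
        pc[i]  = w[n - K + i].pc_center;
    }
    return stddev(cpi, K) < CPI_THRESH &&
           stddev(dpi, K) < DPI_THRESH &&
           stddev(pc,  K) < PC_THRESH;
}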

Our study shows that occasionally a stable phase cannot be detected for a long time, i.e., the deviations are always greater than the thresholds. In such cases, the phase detector doubles the size of the profile window in case the window is too small to accommodate a large phase.

2.4. Trace Selection

The trace selector starts to build traces after a stable phase has been detected. We will only briefly discuss trace selection here and trace patching in Section 2.5, since the focus of this paper is runtime cache prefetching.

Trace selection starts by reading all samples in the UEB (User Event Buffer). Remember that the Branch Trace Buffer in performance monitoring allows for 4 consecutive branch outcomes to be recorded in each sample. These branch outcomes form a fraction of a path profile [4]. The trace selector uses one hash table to save the path profiles and a secondary hash table to save all the branch targets. When selecting a trace, the trace selector starts from the target address having the largest reference count and builds the traces based on the path profiles. This work is not too hard, but there are issues unique to the Itanium processor family. For instance, IA64 instructions are encoded in bundles of 2-3 instructions with different template types. It is common that the second slot in a bundle is a taken branch. Thus we have to break the current bundle and connect the prior instruction stream with the instructions starting from the taken branch's target address, discarding the remaining instruction in the fall-through path. Furthermore, the IA64 ISA [18] provides predication, so a single basic block may have disjoint execution paths, which complicates data analysis. Nested predicates also make branch conversion (flipping a taken branch into a fall-through) difficult.

Instructions along the hottest path are added to the current trace until a trace stop-point is reached. This stop-point can be a function return, a back-edge branch that makes the trace into a loop, or a conditional branch whose taken/fall-through bias is balanced. At this point, the trace selector adds the current trace into a trace queue and continues to select the next trace. After trace selection is completed, control will be transferred to the dynamic optimizer.

2.5. Trace Patching

Trace patching involves writing optimized traces into the trace pool. At this stage, the trace patcher prepares an unused memory area in the trace pool for each trace. Labels and branch targets must then be mapped into the trace pool. During patching, branch instructions that jump to the original code of other traces will be modified to branch to the new traces. Furthermore, the first instruction bundle of each trace in the original code is replaced by a new bundle that has only a branch instruction jumping to the optimized trace in the trace pool. The replaced bundle is not simply overwritten; it is saved so that if the dynamic optimizer wants to unpatch the trace later on, it only needs to write this bundle back.

3. Runtime Prefetching

The purpose of runtime software prefetching is to insert prefetch instructions into the binary code to hide memory latency. In the current version of ADORE, data cache prefetching is the major optimization implemented. Our experiments show that by inserting prefetch instructions at runtime, half of the SPEC 2000 benchmark programs compiled with O2 gain performance.

Just like traditional software prefetching, our runtime optimizer merges the prefetching code directly into the traces to hide large cache miss latency after identifying the delinquent loads in hot loops. The approach of runtime prefetching in ADORE is as follows: (a) Use performance samples to locate the most recent delinquent loads. (b) If the load instruction is in a loop-type trace, extract its dependent instructions for address calculation. (c) Determine its data reference pattern. (d) Calculate the stride if it has spatial or structural locality. Otherwise, insert special code to predict strides for pointer-chasing references. (e) Schedule the prefetches.

3.1. Tracking Delinquent Loads

Each sample contains the latest data cache miss event with latency no less than 8 cycles. On Itanium based systems, this much latency implies L2 or L3 cache misses. In general, there are more L1 cache misses. However, the performance loss due to L2/L3 cache misses is usually higher because of the greater miss latency. Therefore prefetching for L2 and L3 cache misses can be more cost effective.

To track a delinquent load, the source address, the latency and the miss address of each cache miss event are mapped to the corresponding load instruction in a selected trace (if any). With path profile based trace selection, it is possible that one delinquent load appears in two or more selected traces. But in most cases, there is only one loop trace that contains the load instruction, and the current implementation of runtime prefetching targets loop traces only. Finally, prefetching in ADORE is applied to at most the top three miss instructions in each loop-type trace, i.e., the load instructions with the greatest percentage of overall latency.
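A minimal sketch of this selection step is shown below, using hypothetical data structures (the paper does not give ADORE's internal representation): accumulate the sampled miss latency per load instruction within a loop trace, then keep the three loads with the largest share.

#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t pc;             /* address of the load instruction in the trace */
    uint64_t total_latency;  /* accumulated DEAR miss latency for this load  */
} load_stat_t;

static int by_latency_desc(const void *a, const void *b)
{
    const load_stat_t *x = a, *y = b;
    return (y->total_latency > x->total_latency) - (y->total_latency < x->total_latency);
}

/* Return (at most) the top three delinquent loads of one loop-type trace. */
int pick_delinquent_loads(load_stat_t *loads, int n, load_stat_t out[3])
{
    qsort(loads, n, sizeof *loads, by_latency_desc);
    int k = n < 3 ? n : 3;
    for (int i = 0; i < k; ++i)
        out[i] = loads[i];
    return k;
}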

3.2. Data Reference Pattern Detection

For software prefetching, there are three important data reference patterns in loops: direct array reference, indirect array reference and pointer-based reference. It is not obvious which pattern a delinquent load belongs to at the binary level. For pointer-chasing prefetching, some papers [11][34] propose checking the load address to see if it is referring to the memory heap. This is impractical for software based prefetching. To recognize the above three reference patterns, the runtime prefetcher in ADORE analyzes the dependent instructions for the address calculation of each delinquent load.

Figure 5. Data Reference Patterns and Dependence Code Slices

Figure 6. Prefetching for Different Reference Patterns

3.2.1. Direct/Indirect Array References

Fig. 5 illustrates the three data reference patterns recognized by ADORE's dynamic optimizer. Cases A and B in Fig. 5 show direct array reference (single-level memory access) and indirect array reference (multi-level memory access) patterns that exhibit spatial locality. Load instructions in bold fonts are delinquent loads. Other instructions are related to the address computation taken from the selected loop-type traces. For example, the stride of case A in Fig. 5 is calculated by incrementing r14 by 4 three times. So the stride is 4+4+4=12. Case B in Fig. 5 shows a 2-level memory reference, in which both levels of references have significant miss penalties. For such cases, two different code slices should be inserted to prefetch for both levels of memory access, and the prefetch for the first level reference must be issued several iterations ahead of that for the second level reference (see Fig. 6).
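To make the two array patterns concrete, the sketch below expresses in C what the inserted prefetches of Fig. 6 compute. It is an illustration rather than the paper's IA-64 code; the distances D1 and D2 are hypothetical values, and __builtin_prefetch stands in for lfetch.

#define D1 64   /* hypothetical prefetch distance (iterations) for first-level references  */
#define D2 16   /* hypothetical prefetch distance (iterations) for the second-level access */

/* Assumes every idx[i] is a valid index into data[]. */
long sum_patterns(const long *a, const long *idx, const long *data, int n)
{
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        if (i + D1 < n) {
            __builtin_prefetch(&a[i + D1], 0, 0);         /* case A: direct, constant stride  */
            __builtin_prefetch(&idx[i + D1], 0, 0);       /* case B, level 1: the index array */
        }
        if (i + D2 < n)
            __builtin_prefetch(&data[idx[i + D2]], 0, 0); /* case B, level 2: indirect data   */
        sum += a[i] + data[idx[i]];
    }
    return sum;
}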

3.2.2. Pointer Chasing Reference

Pointer-based references are difficult for software prefetching. Wu [35] proposes a profile-guided prefetching technique to find the regular strides in irregular programs. His method requires profiles of address data of consecutive iterations. In the current runtime prefetching in ADORE, an approach similar to induction pointers [33] is used to help approximate the data traversal of pointer-chasing references. However, to insert correct prefetches, the dynamic optimizer must find the recurrent pointer from dependence analysis. Case C in Figure 5 gives a typical example in 181.mcf. In this example, r11 is the pointer critical to the data traversal because r11 is used to compute the miss address and "ld8 r34=[r11]" is the delinquent load. Therefore r11 is chosen to apply the induction pointer based prefetching in this case.

The prefetching algorithm is straightforward. For the above example, an unused integer register remembers the value of r11 at the beginning of the loop. After r11 is updated, the address distance is calculated and multiplied by an "iterations ahead" count. Finally this amplified distance is used to prefetch future data along the traversal path (see Fig. 6). This approach has been shown to be useful for linked lists with partially regular strides. As for data structures like graphs and trees, this approach is less applicable if cache misses are evenly distributed along all traversal paths.
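In C, the induction-pointer idea described above looks roughly like the sketch below; this is a hypothetical illustration, not the IA-64 code ADORE inserts. It remembers the pointer at the top of the loop body, measures how far it moved in one iteration, and prefetches that distance multiplied by an iterations-ahead count. __builtin_prefetch is used as a non-faulting stand-in for lfetch, and the address arithmetic is done on integers so a bogus end-of-list target is harmless.

#include <stdint.h>

#define AHEAD 4   /* hypothetical "iterations ahead" count */

struct node { long payload; struct node *next; };

long walk_list(struct node *p)
{
    long sum = 0;
    while (p) {
        struct node *prev = p;                           /* pointer value at the loop top  */
        sum += p->payload;                               /* the delinquent load            */
        p = p->next;                                     /* pointer update                 */
        intptr_t stride = (intptr_t)p - (intptr_t)prev;  /* observed per-iteration stride  */
        uintptr_t target = (uintptr_t)p + (uintptr_t)(stride * AHEAD);
        __builtin_prefetch((const void *)target, 0, 0);  /* does not fault if invalid      */
    }
    return sum;
}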

3.3. Prefetch Generation

New registers are needed to compute the prefetching address. On many RISC machines with a base+offset addressing mode, the computation of the prefetching address can be avoided by folding the prefetch distance into the base address, for example, "lfetch [r11+80]". However, due to the lack of this addressing mode in the IA64 ISA, we must generate explicit instructions to calculate the prefetching address. There are multiple ways to obtain new registers on Itanium: (1) dynamically allocate more registers using alloc, (2) spill existing registers, (3) work with the compiler to reserve some registers. In the current implementation, we use the third approach. Specifically, we ask the static compiler to reserve four global integer registers (r27-r30) and one global predicate register (p6) from the IA64 register files. We have also tried the first approach, but that requires the identification of the immediately dominating alloc, which may fail if multiple allocs are used in a subroutine. However, the use of reserved registers makes our system less transparent, so we are now looking for a robust mechanism to acquire free registers at runtime.

Fig. 6 illustrates the inserted prefetch instructions for all three reference patterns in the above examples. Notice that for cases A and B, initialization code must be inserted before the loop to preset the prefetch distance. Since the accurate miss latency of each cache miss event is available in ADORE, the prefetch distance is easily computed as: distance = ⌈average latency / loop body cycles⌉. In addition, for small strides in integer programs, prefetch distances are aligned to the L1D cache line size (not for FP operations since they bypass the L1 cache).
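For illustration, this is the same distance computation in integer arithmetic; the latency and loop-body cycle counts in the comment are made-up numbers, not measurements from the paper.

/* distance = ceil(average latency / loop body cycles), in loop iterations */
int prefetch_distance(int avg_miss_latency, int loop_body_cycles)
{
    return (avg_miss_latency + loop_body_cycles - 1) / loop_body_cycles;
}

/* e.g. a 200-cycle average miss latency over a 12-cycle loop body
 * gives a distance of 17 iterations ahead. */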

3.4. Prefetch Code Optimization

Prefetch code often exhibits redundancy, and hence should be optimized as well. For example, in Fig. 6, case A, one "lfetch [r27], 12" is sufficient for both data prefetching and stride advancing. Such optimization is important because it reduces the number of instructions executed and consumes fewer registers.

3.5. Prefetch Scheduling

Prefetch code should be scheduled into otherwise wasted empty slots so that the introduced cost is kept as small as possible. For instance, it would be better if each lfetch/ld.s were put in an instruction group having a free memory slot. Ineffective insertion of prefetches may increase the number of bundles and cause significant performance loss. Details of the Itanium microarchitecture can be found in [18][19].

3.6. Implementation Issues

Although architectural state preservation and precise exception handling [17] are among the critical issues to be solved in dynamic optimization systems, the two optimizations implemented in ADORE, trace layout and prefetching, are considered safe. Prefetch instructions use reserved registers and non-faulting loads (ld.s), so they do not generate exceptions or change the architectural state. The original program's execution sequence is not changed either, because the traces are only duplicates of the original code, and the current optimizer does not schedule other instructions. Self-modifying code can be detected by write-protecting text pages.

4. Performance Results

4.1. Methodology

To evaluate the performance of runtime prefetching, nine SPEC FP2000 benchmarks and eight SPEC INT2000 benchmarks [32] are tested with reference inputs.² Our test machine is a 2-CPU 900 MHz Itanium 2 zx6000 workstation. The operating system is Redhat Linux 7.2 (kernel version 2.4.18.e25) with glibc 2.2.4. The ORC compiler v2.0 [26] is chosen to compile the benchmark programs.

² Other benchmarks either cannot be compiled properly at all optimization levels or do not have a stable execution time in our experiment. For benchmarks having multiple data inputs, only the first input data is used in our measurement.

Figure 7. Performance of Runtime Prefetching

4.2. Static Prefetching Guided by Sample Profile

Before evaluating the performance of runtime optimization in the ADORE system, we assess a profile-guided static prefetching scheme in which we modified the existing prefetching algorithm of the ORC compiler to reduce the number of prefetches for loops, guided by performance sampling profiles. The sampling profile used here has the same format as that used in runtime prefetching (Section 2.1), except that the runtime prefetcher uses a smaller, most-recent profile.

The existing prefetching algorithm in the ORC compiler is activated when compiling at O3. It is similar to Todd Mowry's algorithm [24]. Like other compile-time prefetching, this algorithm requires accurate array bounds and locality information. It also generates unnecessary prefetches for loads that might at runtime hit well in the data caches. In our study, we merely modify this algorithm to select loops for prefetching under the profile's guidance. We did not rewrite the whole algorithm to more aggressively prefetch for data reference patterns such as pointer chasing. We believe profile-guided software prefetching can be further improved in such cases.

Using the sampling profiles, the static compiler sorts the delinquent loads in decreasing order of total miss latency. Then these delinquent loads are added one by one to a list until the total latency caused by the loads in the list covers 90% of all profiled cache miss latency. Static prefetching is then invoked as usual (with O3), except that the compiler now generates prefetches only for loops containing at least one delinquent load in that list. Comparing this method with normal O3 optimization, 83% of loops scheduled for prefetching have been filtered out on average (see Table 1). Static code size is reduced by as much as 9%.

              loops scheduled       normalized            normalized
              for prefetch          execution time        binary size
SPEC2000      O3     O3+Profile     O3     O3+Profile     O3     O3+Profile
ammp          113    13             1      0.989          1      0.980
applu         52     19             1      0.998          1      0.998
art           39     20             1      0.985          1      0.964
bzip2         65     11             1      1.007          1      0.927
equake        34     4              1      0.997          1      0.992
facerec       94     12             1      0.997          1      0.970
fma3d         1023   39             1      0.996          1      0.990
gap           553    18             1      1.008          1      0.938
gcc           651    21             1      0.993          1      0.986
gzip          85     2              1      1.004          1      0.939
lucas         59     23             1      0.999          1      0.992
mcf           7      3              1      0.986          1      0.973
mesa          583    14             1      0.995          1      0.911
parser        67     5              1      0.990          1      0.958
swim          19     9              1      1.001          1      0.995
vortex        20     0              1      0.995          1      0.999
vpr           120    5              1      0.990          1      0.987

Table 1. Profile Guided Static Prefetching

This result adds evidence to the hypothesis that only a few instructions in a program dominate the memory access latencies during execution, for which runtime prefetching would be ideal. In fact, the reason that there is no obvious speedup from this profile-guided prefetching is that we merely use the profile to filter out unnecessary prefetches. Profile-guided software prefetching may attain greater performance if the profile provides sufficient evidence and information to guide more aggressive but expensive prefetching transformations.³

³ This assumes the cache miss profile collected by a training run is able to reliably predict the actual data references.
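A small sketch of the 90% selection rule described above is given below; the data structures are hypothetical and mirror the per-load latency totals used in the earlier sketch in Section 3.1.

#include <stdint.h>
#include <stdlib.h>

typedef struct { uint64_t pc; uint64_t total_latency; } load_stat_t;

/* comparison: decreasing total miss latency */
static int by_latency_desc(const void *a, const void *b)
{
    const load_stat_t *x = a, *y = b;
    return (y->total_latency > x->total_latency) - (y->total_latency < x->total_latency);
}

/* Keep adding the hottest loads until they cover 90% of all profiled miss
 * latency; returns how many loads made it into the list. */
int select_delinquent_loads(load_stat_t *loads, int n)
{
    uint64_t total = 0, covered = 0;
    for (int i = 0; i < n; ++i)
        total += loads[i].total_latency;

    qsort(loads, n, sizeof *loads, by_latency_desc);

    int count = 0;
    while (count < n && covered * 10 < total * 9) {   /* covered < 90% of total */
        covered += loads[count].total_latency;
        ++count;
    }
    return count;
}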

4.3. Runtime Prefetching

Runtime prefetching is more transparent than profile-guided static prefetching because it needs no extra run to collect profiles. In our test, runtime prefetching is applied to the benchmark programs from the two most commonly used compilations: O2 and O3. As mentioned, at O2 the ORC compiler does not generate static prefetching, while at O3 it does. In both compilations, the compiler reserved 4 integer registers and 1 predicate register and disabled software pipelining. We disable software pipelining because our dynamic optimization currently does not handle software-pipelined loops with rotating registers. However, this limitation may not significantly change our results and we will discuss the performance impact later in this section. All benchmark programs run reference data inputs.

Figure 8. Runtime Prefetching for 179.art

Figure 9. Runtime Prefetching for 181.mcf

Fig. 7(a) and Fig. 7(b) illustrate the performance impact of O2 + Runtime-Prefetching and O3 + Runtime-Prefetching. In Fig. 7(a), 9 out of 17 SPEC 2000 benchmarks have speedups from 3% to 57%. For the remaining 8 programs that show no benefit from dynamic optimization, the performance differences are around -2% to +1%. A further examination of the traces generated by the dynamic optimizer shows that our runtime prefetcher did locate the right delinquent loads in applu, swim, vpr and gap. The failure to improve the performance of these programs is due to three reasons. First, for some programs, the cache misses are evenly distributed among hundreds of loads in several large loops (e.g. applu). Each load may have only 2-3% of the total latency, and their miss penalties are effectively overlapped through instruction scheduling. Furthermore, the current dynamic optimizer can only deal with the top three delinquent loads. With only four integer registers available for the dynamic optimizer, we need a more sophisticated algorithm to handle a large number of prefetches in a loop. Second, some delinquent loads have complex address calculation patterns (e.g. function calls or fp-int conversion), causing the dynamic optimizer to fail in computing the stride information (in vpr, lucas and gap). Third, the optimizer may be unable to insert prefetches far enough ahead to hide latencies if the loop contains few instructions and has a small iteration count. For integer benchmarks, except for mcf, runtime data prefetching has only slight speedup. gzip's execution time is too short (less than 1 minute) for ADORE to detect a stable phase. vortex is sped up by 2%, but that is partly due to the improvement of I-cache locality from trace layout. gcc, in contrast, suffers from increased I-cache misses plus sampling overhead and ends up with a 3.8% performance loss. This may be improved by further tuning of trace selection or I-cache prefetching.

As expected, runtime prefetching shows different results when applied to the O3 binaries (Fig. 7(b)). For programs like mcf, art and equake, the current static prefetching cannot efficiently reduce the data miss latency, but runtime prefetching is able to. The performance improvement is almost as much as that received from the O2 binaries. However, the remaining programs have been fully optimized by O3, so the runtime prefetcher skips many traces since they either don't have cache misses or already have compiler generated "lfetch". For this reason the performance differences for many programs are around -3% to +2%.

Now let's look at an example to understand how runtime prefetching works for these benchmark programs. In Fig. 8, the left graph shows the runtime CPI change for 179.art with/without runtime prefetching (O2 binary). The right graph shows the change of DEAR Load Misses per 1000 instructions. There are two clear phases shown in both graphs. One is from the beginning of the execution; the other starts at about 1/4 of the way into the execution. Phase detection works effectively in this case. The first phase is detected about ten seconds after startup and prefetching code is applied immediately. Both CPI and DEAR Load Misses per 1000 instructions are reduced by almost half. About one and a half minutes later, a phase change occurs, followed by the second stable phase till the end. The phase detector catches this second phase too. Since the prefetching effectively reduces the running time, the second lines in both graphs are shorter than the top lines. Fig. 9 is the same type of graph for 181.mcf. Other programs like bzip2, fma3d, swim and equake also exhibit similar patterns.

SpecFP2000          ammp  applu  art  equake  facerec  fma3d  lucas  mesa  swim
direct array        0     2      11   0       6        17     1      16    19
indirect array      2     0      6    1       0        2      0      0     0
pointer-chasing     2     0      0    0       0        0      0      0     0
optimized phase #   3     2      2    1       3        4      1      1     1

SpecINT2000         bzip2  gap  gcc  gzip  mcf  parser  vortex  vpr
direct array        10     3    2    0     0    1       2       1
indirect array      6      0    0    0     0    0       0       0
pointer-chasing     0      0    0    0     3    2       0       0
optimized phase #   2      3    2    0     2    1       2       1

Table 2. Prefetching Data Analysis

Only a small number of prefetches have been inserted into the trace code to achieve the above speedup. Table 2 shows the total number of stable phases applied with runtime prefetching and the individual number of the three reference patterns prefetched in each benchmark program (O2 binary). The majority of the speedup comes from prefetching for direct/indirect array references. Prefetching for pointer chasing references is not widely applicable because not many LDS (linked data structure) intensive applications exhibit regular strides. For such kinds of data structures, runtime prefetching should consider more elaborate approaches such as correlation prefetching [25].

Since we have disabled software-pipelined loops and reserved 4 gr's when compiling the benchmarks, we must evaluate the impact on performance. Fig. 10 measures this impact by comparing the original O2 with our restricted O2. For most of the 17 programs, the impact of fewer registers and disabling software pipelining is minor. Four programs show differences greater than 3%. They are equake, mcf, facerec and swim. These performance differences come primarily from SWP (Software Pipelining).

Figure 10. Impacts of Register Count and Software Pipelining to Performance

However, the reason for disabling SWP is that the current dynamic optimizer in ADORE cannot insert prefetches into software-pipelined loops where rotating registers are used. In the future, this limitation can be relaxed. At this time, the performance differences can be minimized by applying runtime prefetching directly to these programs compiled with O3, although the speedups for other programs might be lower. On the other hand, for mcf, equake, art, bzip2, fma3d, mesa, vpr, and vortex, their O2 binaries (without SWP), when optimized by runtime prefetching, are always faster than the O3 binary, whether with SWP or not.

At the end of this section, we evaluate the runtime overhead incurred by our dynamic optimization. In this system, the major overhead is introduced by continuous sampling, phase detection and trace optimization. Our experience suggests the sampling interval be no less than 100,000 cycles/sample. The phase detector polls the sample buffer in a while-loop. We let it hibernate for 100 milliseconds after each poll to save CPU time. Furthermore, the current working mechanism of our phase detection model prevents trace optimization from being frequent. Consequently, although the dynopt thread runs on the second processor, that processor is idle almost all of the time.⁴ Fig. 11 shows the benchmark programs' "real clock time" when prefetch insertion is disabled in ADORE (compared with the O2 binary). It is measured using the shell command time. The "user cpu time" values are not shown here since they are always smaller than "real clock time" in our experiment. These results demonstrate that the extra overhead of the ADORE system is trivial.

⁴ The same speedup can be achieved on a single CPU system.

Figure 11. Overhead of Runtime Prefetching

5. Related Work

5.1. Software Prefetching

Many software based prefetching techniques have been proposed in the past few years. Todd C. Mowry et al. first presented a general compiler prefetching algorithm working effectively with scientific programs [24]. Later Luk and Mowry proposed a compiler-based prefetching scheme for recursive data structures [22]. This requires extra storage at runtime. Santhanam et al. discussed the major implementation issues of compile time prefetching on a particular RISC processor: the HP PA-8000 [30]. Jump Pointer [29], which has been used widely to break the serial LDS (linked data structure) traversal, stores pointers several iterations ahead in the node currently visited. Other research tried to improve spatial locality and runtime memory reference efficiency with improved compilation methods. In a recent work, Gautam Doshi et al. [14] discussed the downside of software prefetching and exploited the use of rotating registers and predication to reduce the instruction overhead.

5.2. Profile Guided Software Prefetching

Software prefetching is ineffective for pointer-based programs. To address this problem, Chi K. Luk et al. presented Profile-Guided Post-Link Stride Prefetching [23], using a stride profile to obtain prefetching guidance for the compilers. In earlier research, Luk and Mowry presented a correlation-profiling scheme [25] to help software-based techniques detect data access correlations.

Recently, Chilimbi and Hirzel [8] explored the use of burst profiling to prefetch for hot data streams in their dynamic optimization framework, where three stages (profiling, optimization, hibernation) repeat in a fixed timing sequence. In the profiling stage, hot stream patterns are captured in the form of pairs. The optimizer, in the next stage, generates detection code and prefetches into the procedures duplicated from the program's code segment. The detection code behaves like a finite state machine that matches the prefixes of a hot data stream and triggers prefetching for the suffixes. Although their framework is also a real dynamic system, the binaries must be statically instrumented for profiling and pre-linked with a dynamic optimizer library, which is not required by ADORE.

5.3. Hardware Prefetching

Among the many studies of hardware prefetching [12][1][20][11], Collins et al. attempted to prefetch delinquent loads in a multithreaded architecture based on the Itanium ISA [12]. Annavaram et al. proposed a hardware Dependence Graph Precomputation mechanism (DGP) [1] aiming to reduce the latency of a pending data cache miss.

Although hardware techniques can exhibit more flexibility and aggressiveness in data prefetching, they may be expensive to implement. Design and performance evaluation of the above schemes are mostly carried out by simulation.

5.4. Dynamic Optimization Systems

Software runtime optimization systems are commonly seen in Java Virtual Machines (JVMs) [9][2][27], where Just-In-Time engines apply recompilation at runtime to achieve higher performance. On these systems, the JIT usually employs adaptive profile-feedback optimization (e.g. by runtime instrumentation or interpretation) to take advantage of Java programs' dynamic nature.

Dynamo [3] is a transparent dynamic native-to-native optimization system. Dynamo starts running a statically compiled executable by interpretation, waiting for hot traces to show up. Once hot traces are detected, Dynamo stops the program and generates code fragments for these traces. Subsequent execution of the same trace will be redirected to the newly optimized code in the fragment cache. The interpretation approach in Dynamo is expensive, and as a result, Dynamo tries to avoid interpretation by converting as many hot traces as possible to the fragment cache. To achieve this goal, it uses a small threshold to quickly determine if a trace is hot. This approach often ends up translating too much code and producing less effective traces. Therefore, in a recent work called DynamoRIO [5], this feature has been changed. Other research on dynamic optimization has also explored dynamic translation [7][10], continuous program optimization [21] and binary transformation [16].

6. Conclusion and Future Research

In this paper we propose a runtime prefetching mechanism in a dynamic optimization system. The overhead of this prefetching scheme and the dynamic optimization is very low due to the use of sampling on the Itanium processor's performance monitoring unit. Using this scheme, we can improve runtime performance by as much as 57% on some SPEC 2000 benchmarks compiled at O2 for Itanium 2 processors. In contrast, compile time prefetching approaches need sophisticated analysis to achieve similar or lower performance gains. For binaries compiled at O3 with static prefetching, our system can improve performance by as much as 20%. In this paper, we also examined profile-guided static prefetching using the HPM based sampling profiles. The results show that a minor modification of existing static prefetching to use cache miss profiles can avoid generating most of the unbeneficial prefetches without losing performance.

For future work on runtime prefetching, we plan to enhance our algorithm to also handle software pipelined loops. The current phase detection scheme in our system does not work very well on programs with rapid phase changes, and hence needs to be improved. Finally, we are investigating the possibility of adding selective runtime instrumentation to collect information not available from the HPM.

References

[1] M. Annavaram, J. M. Patel, and E. S. Davidson. Data Prefetching by Dependence Graph Precomputation. In ISCA-28, pages 52–61. ACM Press, 2001.

[2] M. Arnold, S. Fink, D. Grove, M. Hind, and P. F. Sweeney. Adaptive Optimization in the Jalapeño JVM. In OOPSLA'00, pages 47–65. ACM Press, 2000.

[3] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: A Transparent Dynamic Optimization System. In PLDI'00, pages 1–12. ACM Press, 2000.

[4] T. Ball and J. R. Larus. Efficient Path Profiling. In Micro-29, pages 46–57. IEEE Computer Society Press, 1996.

[5] D. Bruening, T. Garnett, and S. Amarasinghe. An Infrastructure for Adaptive Dynamic Optimization. In CGO'03, pages 265–275, 2003.

[6] H. Chen, W.-C. Hsu, J. Lu, P.-C. Yew, and D.-Y. Chen. Dynamic Trace Selection Using Performance Monitoring Hardware Sampling. In CGO'03, pages 79–90, 2003.

[7] A. Chernoff, M. Herdeg, R. Hookway, C. Reeve, N. Rubin, T. Tye, S. Bharadwaj Yadavalli, and J. Yates. FX!32: A Profile-Directed Binary Translator. IEEE Micro, 18(2):56–64, Mar/Apr 1998.

[8] T. M. Chilimbi and M. Hirzel. Dynamic Hot Data Stream Prefetching for General-Purpose Programs. In PLDI'02, pages 199–209. ACM Press, 2002.

[9] M. Cierniak, G.-Y. Lueh, and J. M. Stichnoth. Practicing JUDO: Java Under Dynamic Optimizations. In PLDI'00, pages 13–26. ACM Press, 2000.

[10] R. S. Cohn, D. W. Goodwin, and P. G. Lowney. Optimizing Alpha Executables on Windows NT with Spike. Digital Technical Journal, 9(4), Jun 1998.

[11] J. Collins, S. Sair, B. Calder, and D. M. Tullsen. Pointer Cache Assisted Prefetching. In Micro-35, pages 62–73. IEEE Computer Society Press, 2002.

[12] J. D. Collins, H. Wang, D. M. Tullsen, C. Hughes, Y.-F. Lee, D. Lavery, and J. P. Shen. Speculative Precomputation: Long-Range Prefetching of Delinquent Loads. In ISCA-28, pages 14–25. ACM Press, 2001.

[13] A. S. Dhodapkar and J. E. Smith. Managing Multi-Configuration Hardware via Dynamic Working Set Analysis. In ISCA-29, pages 233–244. IEEE Computer Society, 2002.

[14] G. Doshi, R. Krishnaiyer, and K. Muthukumar. Optimizing Software Data Prefetches with Rotating Registers. In PACT'01, pages 257–267, 2001.

[15] Intel C++ Compiler for Linux. http://www.intel.com/software/products/compilers/clin/.

[16] A. Edwards, A. Srivastava, and H. Vo. Vulcan: Binary Transformation in a Distributed Environment. Technical Report MSR-TR-2001-50, Apr 2001.

[17] M. Gschwind and E. Altman. Optimization and Precise Exceptions in Dynamic Compilation. ACM SIGARCH Computer Architecture News, 29(1):66–74, 2001.

[18] Intel Corp. Intel IA-64 Architecture Software Developer's Manual, revision 2.1 edition, Oct 2002.

[19] Intel Corp. Intel Itanium 2 Processor Reference Manual for Software Development and Optimization, Jun 2002.

[20] D. Joseph and D. Grunwald. Prefetching Using Markov Predictors. In ISCA-24, pages 252–263. ACM Press, 1997.

[21] T. Kistler and M. Franz. Continuous Program Optimization: Design and Implementation. IEEE Transactions on Computers, 50(6):549–566, Jun 2001.

[22] C.-K. Luk and T. C. Mowry. Compiler-Based Prefetching for Recursive Data Structures. In ASPLOS-7, pages 222–233. ACM Press, 1996.

[23] C.-K. Luk, R. Muth, H. Patil, R. Weiss, P. G. Lowney, and R. Cohn. Profile-Guided Post-link Stride Prefetching. In ICS-16, pages 167–178. ACM Press, 2002.

[24] T. C. Mowry, M. S. Lam, and A. Gupta. Design and Evaluation of a Compiler Algorithm for Prefetching. In ASPLOS-5, pages 62–73. ACM Press, 1992.

[25] T. C. Mowry and C.-K. Luk. Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling. In Micro-30, pages 314–320. IEEE Computer Society Press, 1997.

[26] Open Research Compiler for Itanium Processor Family. http://ipf-orc.sourceforge.net.

[27] M. Paleczny, C. Vick, and C. Click. The Java HotSpot Server Compiler. In Java VM'02, 2001.

[28] http://www.hpl.hp.com/research/linux/perfmon.

[29] A. Roth and G. S. Sohi. Effective Jump-Pointer Prefetching for Linked Data Structures. In ISCA-26, pages 111–121. IEEE Computer Society Press, 1999.

[30] V. Santhanam, E. H. Gornish, and W.-C. Hsu. Data Prefetching on the HP PA-8000. In ISCA-24, pages 264–273. ACM Press, 1997.

[31] T. Sherwood, S. Sair, and B. Calder. Phase Tracking and Prediction. In ISCA-30, Jun 2003.

[32] SPEC: http://www.spec.org/cpu2000.

[33] A. Stoutchinin, J. N. Amaral, G. R. Gao, J. C. Dehnert, S. Jain, and A. Douillet. Speculative Prefetching of Induction Pointers. In CC-10, April 2001.

[34] Z. Wang, D. Burger, S. K. Reinhardt, K. S. McKinley, and C. C. Weems. Guided Region Prefetching: A Cooperative Hardware/Software Approach. In ISCA-30, 2003.

[35] Y. Wu. Efficient Discovery of Regular Stride Patterns in Irregular Programs and Its Use in Compiler Prefetching. In PLDI'02, pages 210–221. ACM Press, 2002.

HACCP体系及其应用准则国际食品法典委员会

HACC体系及其应用准则(国际食品法典委员会) 1. 前言 1.1 本指南制定了危害分析关键控制点(HACC)P 的基本原则及实施指导,以帮助食 品企业提高食品安全的管理水平,保证食品卫生质量,维护消费者利益。HACC的具体 实施应结合食品企业生产经营的实际情况和具体条件。 1.2 HACCI可以应用在整个食品供应链-从初级(原料)生产到最终消费。并且应以健康危害方面的科学依据为导向进行实施。HACC的实施还有助于政府对食品安全的监督,并通过提高食品安全的可信度促进经济发展。 1.3 HACCP的成功实施要求企业管理层及工作小组的充分支持和参与。HACCP勺实施相容于质量管理体系(例如ISO9000 系列),是在质量管理体系下管理食品安全的一种系统方法。 1.4国家鼓励各类食品企业自觉实施HACC管理,并对已经实施HACCP管理的企业进行指导和评价。 2. HACCP简介 20世纪60 年代初,美国的食品生产者与美国航天规划署合作,首次建立起了HACCI 系统。1993年,国际食品法典委员会(CAC推荐HACC系统为目前保障食品安全最经济有效的途径。 HACCI是以科学为基础,通过系统性地确定具体危害及其控制措施,以保证食品安全性的系统。HACCP勺控制系统着眼于预防而不是依靠终产品的检验来保证食品的安全。任何一个HACCP系统均能适应设备设计的革新、加工工艺或技术的发展变化。HACC是一个适用于各类食品企业的简便、易行、合理、有效的控制体系。 3. 定义 本指南涉及的术语、定义如下: 3.1 危害分析(Hazard Analysis ):指收集和评估有关的危害以及导致这些危害存在的资料,以确定哪些危害对食品安全有重要影响因而需要在HACC计划中予以解决的 过程。 3.2关键控制点(Critical Control Point , CCP):指能够实施控制措施的步骤。该步骤对于预防和消除一个食品安全危害或将其减少到可接受水平非常关键。 3.3必备程序(Prerequisite Programs):为实施HACC体系提供基础的操作规范,包括良好生产规范(GMP和卫生标准操作程序(SSOP等。 3.4良好生产规范(GoodManufacture Practice,简称GMP:是为保障食品安全、质量而制定的贯穿食品生产全过程一系列措施、方法和技术要求。它要求食品生产企业应具备良好的生

国外食品安全法律法规标准清单

境外销售目的国食品安全法律法规标准清单 一、CAC食品法典委员会 CAC RCP 1-1969(Rev.3-1997,Amd.1999) 食品卫生实践通则 CAC GL 2 1985(Rev.1-1993,Amd.2-2006)食品标签法典准则 CAC GL 30 1999 微生物风险评估准则和导则 CAC GL 36 1989(2011修订)食品添加剂类名和国际编码系统 CAC GL 69 2008 食品安全控制措施确认指南 CAC CODEX STAN 192-1995-2015 食品添加剂标准 CAC GL21 1997食品微生物标准建立和应用原则 CAC GL 44 2003 现代生物技术食品的风险分析原则 CAC GL 63-2007国际食品法典微生物风险管理(MRM)行为原则和准则 CAC GSFA,Codex Stan 192-1995 食品添加剂通用标准 CAC CODEX STAN 239-2003 食品添加剂的通用分析方法 CAC 食品中农药残留量2014年7月更新 CAC MRL 2 2015食品中兽药残留 CAC Codex Stan 193食品中污染物和毒素通用标准 CAC/GL 23-1997营养和健康声称使用指南 CAC/GL 24-1997 “清真”术语使用通用导则 CAC/GL 019-1995 食品安全控制紧急情况时信息交流的法典导则 CAC/GL 020-1995 食品进出口检验和出证原则 CAC/GL 025-1997 食品进口过程中拒收情况下两国信息交流导则 CAC/GL 034-1999 食品进出口检验与出证系统中增进等同互认性导则 XOT 02-1987 有关食品添加剂在食品中转移的原则 CACMISC 6 -2001食品添加剂参考规格目录 二、欧盟 1、欧盟水质标准9883EEC 2、欧盟853号规章20040429 3、欧洲议会和理事会(EC)No 852规章20040429 4、欧盟食品添加剂名单NO1129 20111111 5、欧盟EC 1441 2007微生物限量中文版20071205 6、欧盟委员会第EC 2073 2005号条例关于食品的微生物标准 7、欧盟委员会183 2005条例关于食品卫生监测的要求制定(内容与EEA相关) 8、欧盟委员会第234 2011条例(EU)关于建立食品添加剂、食品酶和食品调味料对共同批准程序的欧洲议会和理事会实施条例 9、欧盟委员会第80 1089 EEC号建议关于食品添加剂安全性评价的测试 10、欧盟委员会第953 2009 EC号条例关于在食品中添加特定营养用途的物质 11、欧洲议会与理事会令2011 91 EU 关于识别食品所属批次的标记或标示 12、欧盟委员会条例(EU)为某些食品中二恶英二恶英类多氯联苯的成分的官方控制和制定取样和分析方法以及修订条例(EC)No 18832006 13、欧盟委员会第16 -2011号条例(EU)为食品和饲料的快速警报系统制定实施措施

ISO22000:2015质量和食品安全管理手册(全套)

文件编号:QSM-0001 版次: A/0质量和食品安全 管理手册 符合:ISO22000:2005及ISO9001:2015标准 文件编号:QSM-0001 版/ 次: A / 0 受控状态: 批准:

文件编号:QSM-0001 版次: A/0 0.1 目彔 0.1目彔 0.2修订控制页 0.3质量和食品安全管理手册颁布令 0.4公司简介 1范围 2引用标准 3术语和缩写 4质量和食品安全管理体系 4.1总要求 4.2文件要求 5管理责任 5.1管理承诺1 5.2以客户为关注焦点 5.3质量和食品安全方针 5.4策划 5.5职责、权限与沟通 5.6管理评审 5.7突发事件准备和响应 6资源管理 6.1资源的提供 6.2人力资源 6.3基础设施 6.4工作环境 7实施与运行 7.1产品安全和实现策划 7.2与客户有关的过程 7.3设计和开发 7.4采购 7.5生产和服务提供 7.6监视和测量装置的控制 8监视、测量、分析和改进 8.1总则 8.2监视和测量 8.3不合格品和潜在不安全品控制 8.4数据分析 8.5改进 9附录 9.1质量和食品安全管理体系组织架构及职责 9.2管理者代表(食品安全小组组长)任命书 9.3质量和食品安全管理体系方针颁布令 9.4质量和食品安全管理体系职能分配表 9.5体系模式图

文件编号:QSM-0001 版次: A/0

文件编号:QSM-0001 版次: A/0 0.3 质量和食品安全管理手册颁布令 东莞市XX食品厂质量和食品安全管理手册依据国际标准化组织制定的《质量管理体系—要求》国际标准化组织制定的《食品安全管理体系——对整个食品链中组织的要求》[ISO22000:2005]和国际食品法典委员会制定的《食品卫生通则》[CAC/RCP1-1969,(Rev.4-2003)]及其附件《HACCP体系及应用准则》[Annex to CAC/RCP1-1969,(Rev.4-2003)]并结合本公司实际情况编制而成,阐述了公司质量和食品安全方针,质量和食品安全目标及质量和食品安全管理体系的过程、过程关系及其管理方法。经认真审核,符合国际质量和食品安全管理标准的要求,适合公司的需要,现予颁布。 公司质量和食品安全管理手册对内是公司质量和食品安全管理体系有效运行,规范公司持续改进和提供优质产品服务的纲领性文件,是质量和食品安全管理活动的基本依据;对外是证实公司有能力稳定地提供满足客户和法律法规要求的产品服务的证实文件。 公司质量和食品安全管理手册(A/0版)定于2017-03-01发布并实施。要求公司各部门、全体员工必须正确理解并严格贯彻执行。公司质量和食品安全管理手册在实施后可能会修订,在使用手册时,应注意检查是否有修订记录。 总经理: 日期:

食品中污染物的中国国家标准及国际法典标准对比

食品中污染物的中国国家标准及国际法典标准对比民以食为天,食物中的污染物是涉及我们切身利益的事情,也是关系到我们每一个人的问题。食物中污染物的标准就尤为的重要,而中国的标准与国际的标准又有一些不同,本文将就污染物中的铅、砷、镉的不同标准进行对比。 世贸组织(WTO)的《卫生和植物卫生措施应用协定》(SPS协定)指出,其成员国应将本国食品安全标准与CAC制定的食品法典相协调。CAC制定的标准、准则在保护消费者健康和保证国际食品贸易的公平性方面有重要作用,它是解决国际食品贸易争端的标尺。由于我国已加人世贸组织,协调我国食品卫生标准与国际食品法典(CAC)标准的形势越来越紧迫。CAC的污染物标准是由其分委员会食品添加剂和污染物法典委员会(CCFAC)制定的,CCFAC在制定荇染物标准时以FAO /WHO食品添加剂专家委员会(JECFA)提供的污染物评价资料为依据。CCFAC首先根据污染物对人类健康的危害程度及对贸易的影响程度列出JEcFA的优先评价名单,JECFA根据污染物的毒理学资料、人群暴露量资料和各国的污染水平等,确定名单中污染物的摄入量限量,对有蓄积毒性的污染物制定出暂定可耐受的每周摄入量(PTWI)或暂定的每日最大耐受摄入量(PMTDI)。CCFAC根据这些资料制定相关标准,并征求各国的意见,通过一定的程序(共8步)最终由CAC大会通过决定成为法典标准。CAC对污染物的规定与我国类似,在《食品中污染物和毒素的通用标准》(cDdexStart 193)和一些产品标准中均涉及到有关污染物指标。 目前CAC标准CodexStan 193中设定了限量值的污染物有16种:铅、砷、镉、汞、铜、锡、铁、硒、硝酸盐、亚硝酸盐、氰乙烯单体、丙烯腈、黄曲霉毒素B.、黄曲霉毒紊M.、展青霉素、棕曲霉毒素。上述16种污染物对在我国的食品卫生标准中均有限量规定,但我国棕曲霉毒隶只有检验方法,投有规定限量值。除此之外我国的食品卫生标准中还有下列指标在法典标准中未作规定:铝、铬、氟、稀土、N.亚硝基化合物、多氯联苯、3一氯.1,2.丙二醇(3-MCPD)、丙烯腈-丁二烯一苯乙烯(ABS)、脱氧雪腐镰刀菌烯醇、游离棉酚、组胺。 食品中污染物的法典标准与我国标准之间的差别将对我国加人世贸组织后的进出口贸易产生深远影响。有重要意义的有下面3方面: (1)法典标准比我国国家标准的指标严格,将影响到我国的出口食品对于符合我国国家标准而未达到法典水平的产品.有可能被其他国家拒绝进口。

国际食品法典委员会

国际食品法典委员会 一、国际食品法典委员会的建立 全球经济一体化的发展,以及人们对食品安全问题的日益重视,使得全世界食品生产者、安全管理者和消费者越来越认识到建立全球统一的食品标准是公平的食品贸易、各国制定和执行有关法律的基础,正是在这样一个大的背景下,1962年,联合国的两个组织—联合国粮食和农业组织(FAO)和联合国世界卫生组织(WHO)共同创建了国际食品法典委员会(CAC),成为唯一的政府间有关食品管理法规、标准问题的协调机构。 在食品法典委员会的成立过程中,1960年和1961年是两个历史性的时期。1960年10月,第一届FAO欧洲地区会议提出了一个广泛认同的观点:“作为保护消费者健康,确保食品质量和减少贸易壁垒的重要手段,特别是在迅速形成的欧洲共同市场的形势下,需要就基本食品标准及有关问题达成国际协定。” 在此次会议过去四个月后,FAO开始与WHO、欧洲经济委员会(ECE)、联合国经济合作与发展组织(OECD)以及欧洲食品法典理事会共同讨论有关建立一个国际食品标准计划的意向。1961年11月,FAO第11次会议决议决定成立食品法典委员会,并敦促WHO尽快共同建立FAO/WHO联合食品标准计划。1962年,FAO/WHO联合食品标准会议召开,决定成立食品法典委员会(CAC)实施计划,共同制定食品法典。1963年5月,世界卫生大会第16次会议也批准了,并且通过了食品法典委员会章程。 自成立之日起,CAC在食品安全领域做了大量工作。1985年,联合国大会通过消费者保护指导纲要;1991年,召开了FAO/WHO食品安全、食物中化学物和食品贸易大会;1992年,举办了FAO/WHO国际营养大会;1995年,参与签署了SPS协议和TBT协议;1996年,举办FAO世界食物大会。 二、国际食品法典委员会宗旨 CAC的工作宗旨通过建立国际协调一致的食品标准体系,保护消费者健康和促进国际间公平食品贸易。 三、国际食品法典委员会的组织、管理和运作 CAC的组织机构包括秘书处、执行委员会、地区协调委员会,一般专题委员会、商品委员会和政府间特别工作组。CAC的秘书处负责日常事务,执行委员会负责全面协调,并有一个主席和三个副主席。 法典委员会系统的程序规定委员会成立两类分支机构,一类是法典工作委员会,负责标准草案的准备和呈交工作;一类是法典协调委员会,负责协调区域或成员国家间在该地区的食品标准,包括制定和协调地区标准。 委员会系统的特点是,各分支机构委员会有一个主办成员国主持,并委派主席,各分支机构委员会分为专题委员会和商品委员会。CAC的主要工作是通过其分委员会和其他分支机构来完成,他们制定食品的横向和纵向规定,建立起一套完整的食品国际标准体系,以“食品法典”的形式向所有成员国发布。 CAC每两年举行一次会议,在罗马粮农组织总部和日内瓦世界卫生组织总部轮流进行,有时会举行更多特殊或特别会议。大会参加人员有600人之多。自2001年,大会开始采用阿拉伯语、汉语、英语、法语和西班牙语五种语言作为工作语言。会议以国家为单位。代表团通常由会员国政府所任命的高级官员率领,代表团通常由工业、消费者组织和学术机构的代表组成。还没有成为委员会成员国的国家可派代表以观察员身份出席。 目前,CAC已有173个成员国和1个成员国组织(欧盟)加入该组织,如荷兰、美国、德国、加拿大、法国等,覆盖全球99%的人口,CAC在世界的影响力越来越大。 四、国际食品法典委员会的“食品法典” CAC作为全球唯一的一个食品安全领域的国际资讯组织,一贯致力于在全球范围内推广

国际食品法典标准减肥用低能量配方食品CODEXSTAN201995

国际食品法典标 准减肥用低能量配方食品CODEX STAN 203-1995

1范围 本标准适用于第2.1条款定义的减轻体重用低能量配方食品。产品作为特殊医用食品,中度或严重肥胖者必须遵医嘱使用,处方食品销售事宜由国家级机构决定。 本标准不适用于传统食物形式的预包装食品。 2定义 减轻体重用低能量配方食品(Formula foods for use in very low energy diets for weight):是指一种含有最少量的碳水化合物和每日必需营养素,作为唯一能量来源提供450?800 kcal能量的特制食品。 3基本成分和质量指标 销售的产品应符合下列成分和质量要求: 3.1能量含量 减轻体重用低能量配方食品作为唯一能量来源,提供每日摄入能量450?800 kcal(lkcal=4.184kJ)。 3.2营养素含量 3.2.1蛋白质 在推荐每日摄入能量时,营养1中不少于50g的蛋白相当于校正蛋白消化率的氨基酸评分值为1。 必需氨基酸的添加量,仅为能够改善蛋白质品质目的的最少量。除蛋氨酸外,只能使用L-型氨基酸。 3.2.2脂肪 减轻体重用低能量配方食品含有3g以上的亚油酸和0.5g以下的α-亚麻酸,且亚油酸与α-亚麻酸含量比值推荐为5?15。 3.2.3碳水化合物 按推荐的每日能量摄入量,可利用碳水化合物不少于50g。 3.2.4维生素和矿物质 产品应含有100%推荐的每日维生素和矿物质摄入量,也包括下表未列出的其他必需营养素。 维生素1)和矿物质每日摄入量维生素1)和矿物质每日摄入量维生素1)和矿物质每日摄入量 维生素A 600g 维生素D 2.5g 维生素E 10mg 200g 镁350mg 维生素C 30mg 叶酸(以单谷氨酸盐 表示) 硫胺素0.8mg 矿物质*-- 铜 1.5mg 核黄素 1.2mg 钙500mg 锌6mg 烟酸11mg 磷500mg 钾 1.6g 1FAO/WHO 联合专家咨询组关于蛋白质品质评价报告Bethesda,MD USA,4-8 December 1989,FAO Food and Nutrition Paper No. 51,1991,Rome,p. 23。

国际食品法典委员会简介

国际食品法典委员会简介 一、国际食品法典委员会的建立和宗旨 国际食品法典委员会(Codex Alimentarius Commission, CAC)是由联合国粮农组织(FAO)和世界卫生组织(WHO)共同建立,以保障消费者的健康和确保食品贸易公平为宗旨的一个制定国际食品标准的政府间组织。自1961年第11届粮农组织大会和1963年第16届世界卫生大会分别通过了 创建CAC的决议以 来,已有173个成 员国和1个成员国 组织(欧盟)加入 该组织,覆盖全球 99%的人口。CAC 下设秘书处、执行 委员会、6个地区协 调委员会,21个专 业委员会(包括10 个综合主题委员 会、11个商品委员 会)和1个政府间 特别工作组。组织

结构见右图。 所有国际食品法典标准都主要在其各下属委员会中讨论和制定,然后经CAC大会审议后通过。CAC标准都是以科学为基础,并在获得所有成员国的一致同意的基础上制定出来的。CAC成员国参照和遵循这些标准,既可以避免重复性工作又可以节省大量人力和财力,而且有效地减少国际食品贸易摩擦,促进贸易的公平和公正。 二、食品法典的作用 食品法典已成为全球消费者、食品生产和加工者、各国食品管理机构和国际食品贸易重要的基本参照标准。法典对食品生产、加工者的观念以及消费者的意识已产生了巨大影响,并对保护公众健康和维护公平食品贸易做出了不可估量的贡献。 食品法典对保护消费者健康的重要作用已在1985年联合国第39/248号决议中得到强调,为此食品法典指南采纳并加强了消费者保护政策的应用。该指南提醒各国政府应充分考虑所有消费者对食品安全的需要,并尽可能地支持和采纳食品法典的标准。 食品法典与国际食品贸易关系密切,针对业已增长的全球市场,特别是作为保护消费者而普遍采用的统一食品标准,食品法典具有明显的优势。因此,实施卫生与植物卫生措施协定(SPS)和技术性贸易壁垒协定(TBT)均鼓励采用协调一致的国际食品标准。作为乌拉圭回合多边贸易谈判的产物,SPS协议引用了法典标准、指南及推荐技术标准,以此作为促进国际食品贸易的措施。因此,法典标准已成为在乌拉圭回合协议法律框架内衡量一个国家食品措施和法规是否一致的基准。

世界各国安规认证标志简介及常见标识

世界各国安规认证标志一览表及简介 序号国家及地区安规标志安规简介产品验証适用范围备注 1 全球60多个国家 及地区IEC国际电工委员会范围: 组织起草、制定,电子电气器材等国际化 标准及法规。评估和协调各国标准可行性。 是由各国电工委 员会组成的世界 性标准化组织,其 目的是为了促进 世界电工电子领 域的标准化。 2 全球54个国家及 地区全球性相互认証标志(CB体系的正式名 称是“Scheme of the IECEE for Mutual Recognition of Test Certificates for Electrical Equipment”–“IECEE电工产 品测试证书互认体系”。CB体系的缩写名称 意思是“Certification Bodies’Scheme” –“认证机构体系”。) CB体系覆盖的产品是IECEE系统所承认的 IEC标准范围内的产品。 IECEE是国际电 工委员会电工产 品合格测试与认 证组织 3 欧盟CE系欧洲通用安规认証标志认証范围针对: 工业设备、机械设备、通讯设备、电气产 品、个人防护用品、玩具等产品。 4 欧洲ENEC (European Norms Electrica l Certification,欧洲标准电器认 证)。ENEC标志是欧洲安全认证通用 标志,该标志是欧洲厂商基于调和欧洲 安全标准进行测试的基础之上所采用 的。认証范围针对IT(信息)、设备(EN60950、 变压器(EN60742,EN61558)、照明灯饰 (EN60598)和相关档(EN60920,EN60440)、 电器开关 01 西班牙02 比利 时03 意大利04 葡萄牙05 荷兰06 爱尔兰07 卢森堡08 法国09 希腊10 德国11 奥
