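Fisher's Exact Test and a Worked Example

Fisher's exact test

Fisher's exact test checks whether two binary variables are independent. It analyzes a 2x2 contingency table and produces an exact p-value for testing the following hypotheses:

· H0: the row variable and the column variable are independent

· H1: the row variable and the column variable are associated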

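The p-value from Fisher's exact test is accurate for every sample size, whereas the chi-squared test of the same hypotheses can be inaccurate when cell counts are small.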
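The contrast between the exact test and the chi-squared approximation can be illustrated with a short R sketch. The table `tab` below is hypothetical and chosen only because its expected counts are small; it is not taken from the original text.

```r
# Hypothetical 2x2 table with small cell counts (illustrative values only).
tab <- matrix(c(3, 1, 1, 3), nrow = 2,
              dimnames = list(group = c("g1", "g2"),
                              outcome = c("yes", "no")))

# Exact test: valid for any sample size.
exact <- fisher.test(tab)

# Chi-squared test: relies on a large-sample approximation. R issues a
# warning here because the expected counts are below 5, so the warning is
# suppressed to keep the output readable.
approx <- suppressWarnings(chisq.test(tab))

exact$p.value
approx$p.value
approx$expected   # every expected count equals 2
```

With counts this small the two p-values need not agree, and the exact one is the value to rely on.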

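For example, Fisher's exact test can be applied to the following contingency table of election results to determine whether the vote is independent of the voter's sex.

Candidate A Candidate B

For this table, Fisher's exact test gives a p-value of 0.263. Because this p-value is larger than the commonly used α levels, the data are consistent with the null hypothesis. There is therefore no evidence that a voter's sex influenced their choice in the election.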

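Fisher's exact test can also be used to determine whether two population proportions are equal. In this application, the null hypothesis states that the two population proportions are equal (H0: p1 = p2); the alternative hypothesis can be left-tailed (p1 < p2), right-tailed (p1 > p2), or two-tailed (p1 ≠ p2). Fisher's exact test is useful as a test of two proportions because it is accurate for every sample size, whereas the two-proportion test based on the normal approximation can be inaccurate when the number of events is less than 5, or when the number of trials minus the number of events is less than 5.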
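A two-proportion comparison might be set up in R as sketched below. The event and trial counts are hypothetical, and the layout (rows are samples, columns are event and non-event) is an assumption of this sketch; with that layout, p1 < p2 corresponds to an odds ratio below 1, hence `alternative = "less"`.

```r
# Hypothetical data: 2 events in 15 trials vs 8 events in 16 trials.
events <- c(2, 8)
trials <- c(15, 16)

# Rows = samples, columns = event / non-event.
tab <- cbind(event = events, non_event = trials - events)
rownames(tab) <- c("sample1", "sample2")

# Left-tailed alternative p1 < p2, i.e. odds ratio below 1.
res_less <- fisher.test(tab, alternative = "less")

# Two-tailed alternative p1 != p2.
res_two <- fisher.test(tab, alternative = "two.sided")

res_less$p.value
res_two$p.value
```

Swapping the rows of `tab` would reverse the direction, so the left- or right-tailed choice always has to be read against the table layout.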

Fisher's exact test is based on the hypergeometric distribution. The p-value is therefore conditional on the marginal totals of the table.
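To make the hypergeometric connection concrete, the sketch below computes the probability of one particular 2x2 table given its margins in two ways: with `dhyper` and with binomial coefficients. The counts are hypothetical and serve only as an illustration.

```r
# Hypothetical 2x2 table (illustrative counts only).
# matrix() fills column by column: a = 2, c = 6, b = 5, d = 1.
tab <- matrix(c(2, 6, 5, 1), nrow = 2)

a  <- tab[1, 1]
r1 <- sum(tab[1, ])   # first row total
r2 <- sum(tab[2, ])   # second row total
c1 <- sum(tab[, 1])   # first column total

# Probability of this exact table with all margins fixed,
# from the hypergeometric probability mass function.
p_hyper <- dhyper(a, m = r1, n = r2, k = c1)

# The same probability written out with binomial coefficients.
p_formula <- choose(r1, a) * choose(r2, c1 - a) / choose(r1 + r2, c1)

all.equal(p_hyper, p_formula)
```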

Example: the test is carried out in R below:

> x=c(1,9,11,3)

> alle<-matrix(x, nrow=2)

> fisher.test(alle,alternative ="two.sided")

Fisher's Exact Test for Count Data

data: alle

p-value = 0.002759

alternative hypothesis: true odds ratio is not equal to 1

95 percent confidence interval:

0.0006438284 0.4258840381

sample estimates:

odds ratio

0.03723312

Run > help(fisher.test) to view the documentation. alternative = "two.sided" requests a two-sided test; following the documentation, it can be changed to "greater" or "less" for a one-sided test.

fisher.test {stats}    R Documentation

Fisher's Exact Test for Count Data

Description

Performs Fisher's exact test for testing the null of independence of rows and columns in a contingency table with fixed marginals.

Usage

fisher.test(x, y = NULL, workspace = 200000, hybrid = FALSE,

control = list(), or = 1, alternative = "two.sided",

conf.int = TRUE, conf.level = 0.95,

simulate.p.value = FALSE, B = 2000)

Arguments

x    either a two-dimensional contingency table in matrix form, or a factor object.

y    a factor object; ignored if x is a matrix.

workspace    an integer specifying the size of the workspace used in the network algorithm, in units of 4 bytes. Only used for non-simulated p-values of tables larger than 2 by 2.

hybrid    a logical. Only used for larger than 2 by 2 tables, in which cases it indicates whether the exact probabilities (default) or a hybrid approximation thereof should be computed. See ‘Details’.

control    a list with named components for low level algorithm control. At present the only one used is "mult", a positive integer ≥ 2 with default 30, used only for larger than 2 by 2 tables. This says how many times as much space should be allocated to paths as to keys: see file ‘fexact.c’ in the sources of this package.

or    the hypothesized odds ratio. Only used in the 2 by 2 case.

alternative    indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter. Only used in the 2 by 2 case.

conf.int    logical indicating if a confidence interval for the odds ratio in a 2 by 2 table should be computed (and returned).

conf.level    confidence level for the returned confidence interval. Only used in the 2 by 2 case and if conf.int = TRUE.

simulate.p.value    a logical indicating whether to compute p-values by Monte Carlo simulation, in larger than 2 by 2 tables.

B    an integer specifying the number of replicates used in the Monte Carlo test.

Details

If x is a matrix, it is taken as a two-dimensional contingency table, and hence its entries should be nonnegative integers. Otherwise, both x and y must be vectors of the same length. Incomplete cases are removed, the vectors are coerced into factor objects, and the contingency table is computed from these.
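The two input forms described above can be compared directly. In the sketch below the vectors `sex` and `vote` are hypothetical raw observations (one element per subject), including one missing value to show that incomplete cases are dropped.

```r
# Hypothetical raw observations, one element per subject.
sex  <- c("F", "F", "F", "F", "M", "M", "M", "M", "M", NA)
vote <- c("A", "A", "A", "B", "B", "B", "B", "A", "B", "A")

# Two vectors of equal length: fisher.test removes incomplete cases,
# coerces both vectors to factors and cross-tabulates them.
res_vec <- fisher.test(sex, vote)

# Equivalent call with an explicit contingency table.
# table() also drops the NA under its default settings.
tab <- table(sex, vote)
res_tab <- fisher.test(tab)

tab
all.equal(res_vec$p.value, res_tab$p.value)
```

Passing a precomputed table is usually the clearer choice when the counts are already known, while the vector form is convenient for raw data.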

For 2 by 2 cases, p-values are obtained directly using the (central or non-central) hypergeometric distribution. Otherwise, computations are based on a C version of the FORTRAN subroutine FEXACT which implements the network developed by Mehta and Patel (1986) and improved by Clarkson, Fan and Joe (1993). Note this fails (with an error message) when the entries of the table are too large. (It transposes the table if necessary so it has no more rows than columns. One constraint is that the product of the row marginals be less than 2^31 - 1.)

For 2 by 2 tables, the null of conditional independence is equivalent to the hypothesis that the odds ratio equals one. ‘Exact’ inference can be based on observing that in general, given all marginal totals fixed, the first element of the contingency table has a non-central hypergeometric distribution with non-centrality parameter given by the odds ratio (Fisher, 1935). The alternative for a one-sided test is based on the odds ratio, so alternative = "greater" is a test of the odds ratio being bigger than or.

Two-sided tests are based on the probabilities of the tables, and take as ‘more extreme’ all tables with probabilities less than or equal to that of the observed table, the p-value being the sum of such probabilities.
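The two-sided rule just described can be reproduced by hand for the 2 by 2 table used in the worked example above. This is a minimal sketch under the null odds ratio of 1; the small relative tolerance used to catch floating-point ties is an assumption of the sketch, not a statement about the constant used inside fisher.test.

```r
# Counts from the worked example: matrix(c(1, 9, 11, 3), nrow = 2).
alle <- matrix(c(1, 9, 11, 3), nrow = 2)

m <- sum(alle[, 1])   # first column total
n <- sum(alle[, 2])   # second column total
k <- sum(alle[1, ])   # first row total
x <- alle[1, 1]       # observed first cell

# All values the first cell can take with the margins held fixed.
support <- max(0, k - n):min(k, m)
probs <- dhyper(support, m, n, k)

# Sum the probabilities of all tables that are no more likely
# than the observed one (tolerance guards against rounding ties).
p_obs <- dhyper(x, m, n, k)
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

p_two_sided
fisher.test(alle)$p.value
```

Both values should agree with the p-value of 0.002759 printed in the example session.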

For larger than 2 by 2 tables and hybrid = TRUE, asymptotic chi-squared probabilities are only used if the ‘Cochran conditions’ are satisfied, that is if no cell has count zero, and more than 80% of the cells have counts at least 5: otherwise the exact calculation is used.

Simulation is done conditional on the row and column marginals, and works only if the marginals are strictly positive. (A C translation of the algorithm of Patefield (1981) is used.)
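For tables larger than 2 by 2, the simulated and exact p-values can be compared as in the following sketch. The 3 by 4 table is hypothetical, and the seed and number of replicates are arbitrary choices made here for reproducibility.

```r
# Hypothetical 3x4 table with strictly positive margins.
# matrix() fills column by column, so each line below is one column.
tab <- matrix(c(4, 2, 1,
                3, 5, 2,
                1, 3, 6,
                2, 1, 4), nrow = 3)

set.seed(1)  # make the Monte Carlo result reproducible

# Monte Carlo p-value, conditional on the row and column margins.
sim <- fisher.test(tab, simulate.p.value = TRUE, B = 10000)

# Exact p-value from the network algorithm (feasible for a table this small).
exact <- fisher.test(tab)

sim$p.value
exact$p.value
```

The simulated value fluctuates around the exact one, with the spread shrinking as B grows.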

Value

A list with class "htest" containing the following components:

p.value the p-value of the test.

conf.int    a confidence interval for the odds ratio. Only present in the 2 by 2 case and if argument conf.int = TRUE.

estimate    an estimate of the odds ratio. Note that the conditional Maximum Likelihood Estimate (MLE) rather than the unconditional MLE (the sample odds ratio) is used. Only present in the 2 by 2 case.

null.value    the odds ratio under the null, or. Only present in the 2 by 2 case.

alternative    a character string describing the alternative hypothesis.

method    the character string "Fisher's Exact Test for Count Data".

data.name    a character string giving the names of the data.
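The components of the returned object can be inspected directly. The sketch below reuses the counts from the worked example and contrasts the conditional MLE stored in `estimate` with the sample (cross-product) odds ratio, which is computed by hand here for comparison.

```r
# Counts from the worked example: matrix(c(1, 9, 11, 3), nrow = 2).
alle <- matrix(c(1, 9, 11, 3), nrow = 2)
res <- fisher.test(alle)

# Unconditional MLE: the sample (cross-product) odds ratio, 3 / 99.
sample_or <- (alle[1, 1] * alle[2, 2]) / (alle[1, 2] * alle[2, 1])

sample_or
unname(res$estimate)   # conditional MLE reported by fisher.test
res$null.value         # odds ratio under the null
res$conf.int           # confidence interval for the odds ratio
class(res)             # "htest"
```

The two odds-ratio estimates differ slightly, which is why the value printed by fisher.test does not match the cross-product ratio computed from the table.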

References

Agresti, A. (1990) Categorical data analysis. New York: Wiley. Pages 59–66.

Agresti, A. (2002) Categorical data analysis. Second edition. New York: Wiley. Pages 91–101.

Fisher, R. A. (1935) The logic of inductive inference. Journal of the Royal Statistical Society Series A, 98, 39–54.

Fisher, R. A. (1962) Confidence limits for a cross-product ratio. Australian Journal of Statistics, 4, 41.

Fisher, R. A. (1970) Statistical Methods for Research Workers. Oliver & Boyd.

Mehta, C. R. and Patel, N. R. (1986) Algorithm 643. FEXACT: A Fortran subroutine for Fisher's exact test on unordered r*c contingency tables. ACM Transactions on Mathematical Software, 12, 154–161.

Clarkson, D. B., Fan, Y. and Joe, H. (1993) A Remark on Algorithm 643: FEXACT: An Algorithm for Performing Fisher's Exact Test in r x c Contingency Tables. ACM Transactions on Mathematical Software, 19, 484–488.

Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. Applied Statistics, 30, 91–97.