Reproducible Sign Rates

This package is designed to analyze data from the following setting.

  • We are interested in real-valued parameters $\theta_1,\theta_2,\ldots,\theta_n$. For example, $\theta_i$ might indicate the effect of a particular drug on a particular gene.
  • We have designed an experimental procedure which can estimate these parameters.
  • We have performed the experiment twice, yielding two replicates.

Given data of this form, we assume the user can calculate three quantities of interest for each parameter $\theta_i$.

  1. Use the first replicate ("the training replicate") to produce a $p$-value, denoted $\rho_i$, for the null hypothesis that $\theta_i=0$.
  2. Use the training replicate to produce an estimate $\hat Y_i$ of the sign of $\theta_i$ (i.e., if the estimator is accurate, $\hat Y_i=1$ if $\theta_i>0$ and $\hat Y_i=-1$ if $\theta_i<0$).
  3. Use the second replicate ("the validation replicate") to produce an independent estimate $Y_i$ of the sign of $\theta_i$.
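
The three steps above can be sketched as follows. This is only an illustration under stated assumptions, not part of the package: it assumes each replicate yields a point estimate per parameter, that the training replicate also yields a standard error, and that a two-sided z-test is an acceptable way to obtain $\rho_i$. The data values and the helper names `two_sided_z_pvalue` and `sign` are hypothetical.

```python
import math


def two_sided_z_pvalue(estimate, std_err):
    """Two-sided p-value for H0: theta = 0 under a normal approximation (an assumption)."""
    z = abs(estimate) / std_err
    return math.erfc(z / math.sqrt(2.0))


def sign(x):
    """Return 1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


# Hypothetical estimates of four parameters from each replicate.
train_est = [2.0, -0.1, -3.0, 0.5]
train_se = [1.0, 1.0, 1.0, 1.0]
valid_est = [1.5, 0.3, -2.2, -0.4]

# Step 1: p-values from the training replicate.
rho = [two_sided_z_pvalue(e, s) for e, s in zip(train_est, train_se)]
# The API functions documented below take log p-values.
logrho = [math.log(p) for p in rho]
# Step 2: sign estimates from the training replicate.
Yhat = [sign(e) for e in train_est]
# Step 3: independent sign estimates from the validation replicate.
Y = [sign(e) for e in valid_est]
```

Any other valid test or sign estimator could be substituted, as long as $Y$ comes only from the validation replicate.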

This package uses $\rho,\hat Y,Y$ to visualize whether the two replicates yielded the same results. It does so using a quantity called the Reproducible Sign Proportion, defined as

$\mathrm{RSP}(Y;\hat Y,\rho,\alpha) \triangleq \frac{|\{i:\ \hat Y_i = Y_i,\ \rho_i\leq \alpha\}|}{|\{i:\ \rho_i\leq \alpha\}|}$
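
As a concrete reading of this definition, the following minimal sketch computes the RSP directly from the formula for a single threshold $\alpha$. It is not the package's implementation; the function name `reproducible_sign_proportion`, the toy data, and the choice to return NaN when no $\rho_i$ falls below the threshold are assumptions made for illustration.

```python
import math


def reproducible_sign_proportion(Y, Yhat, rho, alpha):
    """Fraction of parameters with rho_i <= alpha whose two sign estimates agree."""
    selected = [i for i, p in enumerate(rho) if p <= alpha]
    if not selected:
        # The ratio is undefined when nothing is selected; NaN is an assumed convention.
        return float("nan")
    agree = sum(1 for i in selected if Yhat[i] == Y[i])
    return agree / len(selected)


# Hypothetical inputs for five parameters.
rho = [0.001, 0.02, 0.04, 0.30, 0.70]
Yhat = [1, -1, 1, 1, -1]
Y = [1, -1, -1, 1, 1]

# Three parameters have rho <= 0.05, and two of them agree in sign.
rsp = reproducible_sign_proportion(Y, Yhat, rho, alpha=0.05)
```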

In some cases each experimental procedure includes many subexperiments, and each subexperiment is approximately independent of the others. When subexperiments are available, this package can also estimate a confidence interval for the Reproducible Sign Rate, defined as $\mathrm{RSR}(\hat Y,\rho,\alpha)\triangleq \mathbb{E}_Y[\mathrm{RSP}(Y;\hat Y,\rho,\alpha)]$.

API Documentation

process_from_matrices

This function processes $(\log \rho, \hat Y, Y)$ into an object which can be used for further processing. It assumes the parameters (and $\rho,\hat Y,Y$) can be organized into a matrix in which each row holds the values from a different subexperiment.

Returns a ReproducibleSignRateInfo object, storing various properties of two experimental replicates in terms of how well they agree. Each row of logrho, Yhat, Y is assumed to correspond to a different subexperiment.

If Yhat[i,j]==0, then the parameter associated with index i,j is ignored (regardless of the associated rho value).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| logrho | NxM matrix | confidence in a null hypothesis, expressed as a log p-value (from training replicate) | required |
| Yhat | NxM matrix, int | estimate of the sign of a parameter theta (from training replicate) | required |
| Y | NxM matrix, int | another estimate of the sign of theta (from validation replicate) | required |
| n_thresholds | int | number of thresholds on logrho to use | 1000 |

process

This function processes $(\log \rho, \hat Y, Y)$ into an object which can be used for further processing. To use this function, one must explicitly specify the subexperiment associated with each parameter.

Returns a ReproducibleSignRateInfo object, storing various properties of two experimental replicates in terms of how well they agree. Subexperiments for each parameter must be given explicitly in subexperiment_ids.

If Yhat[i]==0, then the parameter associated with index i is ignored (regardless of the associated rho value).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| logrho | N vector | confidence in a null hypothesis, expressed as a log p-value (from training replicate) | required |
| Yhat | N vector, int | estimate of the sign of a parameter theta (from training replicate) | required |
| Y | N vector, int | another estimate of the sign of theta (from validation replicate) | required |
| subexperiment_ids | N vector | subexperiment for each parameter | required |
| n_thresholds | int | number of thresholds on logrho to use | 1000 |
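
The relationship between the matrix form and the flat form can be sketched as below. This is an illustration only: it assumes, since the documentation treats each matrix row as one subexperiment, that a row index is an acceptable subexperiment id. The helper `flatten_with_subexperiment_ids` is hypothetical and is not part of the package.

```python
def flatten_with_subexperiment_ids(logrho, Yhat, Y):
    """Flatten row-per-subexperiment matrices into vectors plus one id per entry."""
    flat_logrho, flat_Yhat, flat_Y, subexperiment_ids = [], [], [], []
    for row_index, (lr_row, yh_row, y_row) in enumerate(zip(logrho, Yhat, Y)):
        for lr, yh, y in zip(lr_row, yh_row, y_row):
            flat_logrho.append(lr)
            flat_Yhat.append(yh)
            flat_Y.append(y)
            # Assumption: the row index serves as the subexperiment id.
            subexperiment_ids.append(row_index)
    return flat_logrho, flat_Yhat, flat_Y, subexperiment_ids


# Hypothetical 2x3 inputs: two subexperiments, three parameters each.
logrho_matrix = [[-6.9, -3.9, -1.6], [-4.6, -0.7, -3.5]]
Yhat_matrix = [[1, -1, 1], [0, -1, 1]]
Y_matrix = [[1, 1, 1], [-1, -1, 1]]

flat_logrho, flat_Yhat, flat_Y, ids = flatten_with_subexperiment_ids(
    logrho_matrix, Yhat_matrix, Y_matrix
)
```

The four resulting sequences have the shapes that the parameters of process describe.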

ReproducibleSignRateInfo

Stores various properties of two experimental replicates in terms of how well they agree.

Attributes:

| Name | Type | Description |
|---|---|---|
| logrho | N vector | confidence in null hypotheses (from training replicate) |
| Yhat | N vector | estimate of sign of parameters (from training replicate) |
| Y | N vector | estimate of sign of parameters (from validation replicate) |
| thresholds | L vector | different thresholds on logrho to use |
| subexperiment_ids | N vector | subexperiment associated with each parameter |
| RSP | L vector | reproducible sign proportion at each threshold |
| median_rejections | L vector | median rejections per subexperiment at each threshold |
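
To make the per-threshold attributes concrete, here is a rough sketch of how quantities like RSP and median_rejections could be computed from the flat inputs. It is not the package's code. It assumes a "rejection" means logrho[i] <= threshold, that entries with Yhat == 0 are dropped first (as stated for process), that subexperiments with no rejections count as zero, and that NaN marks a threshold with no rejections. The function name and data are hypothetical.

```python
import math
import statistics


def rsp_and_median_rejections(logrho, Yhat, Y, subexperiment_ids, thresholds):
    """Return (RSP, median rejections per subexperiment) for each threshold."""
    # Drop parameters whose training sign estimate is 0.
    keep = [i for i in range(len(logrho)) if Yhat[i] != 0]
    subexperiments = sorted(set(subexperiment_ids))
    rsp, median_rejections = [], []
    for t in thresholds:
        rejected = [i for i in keep if logrho[i] <= t]
        agree = sum(1 for i in rejected if Yhat[i] == Y[i])
        rsp.append(agree / len(rejected) if rejected else float("nan"))
        # Count rejections within each subexperiment, including zero counts.
        counts = {s: 0 for s in subexperiments}
        for i in rejected:
            counts[subexperiment_ids[i]] += 1
        median_rejections.append(statistics.median(counts.values()))
    return rsp, median_rejections


# Hypothetical data: two subexperiments with three parameters each.
logrho = [math.log(p) for p in (0.001, 0.02, 0.2, 0.01, 0.5, 0.03)]
Yhat = [1, -1, 1, 0, -1, 1]
Y = [1, 1, 1, -1, -1, 1]
ids = [0, 0, 0, 1, 1, 1]
thresholds = [math.log(0.005), math.log(0.05), math.log(1.0)]

rsp, med = rsp_and_median_rejections(logrho, Yhat, Y, ids, thresholds)
```

How the package actually chooses its thresholds from n_thresholds is not specified here, so the thresholds above are passed in explicitly.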

ReproducibleSignRateInfo.confidence_interval

Computes a confidence interval with nominal coverage probability (1-alpha). Returns a lower and upper bound for each threshold in self.thresholds.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| alpha | float | one minus coverage probability | required |
| alternative | string | two-sided, lower-bound, or upper-bound | 'two-sided' |