Reproducible Sign Rates

This package is designed to analyze data arising in the following setting.

We are interested in real-valued parameters $\theta_1,\theta_2,\ldots,\theta_n$. For example, $\theta_i$ might indicate the effect of a particular drug on a particular gene.
We have designed an experimental procedure that estimates these parameters.
We have performed the experiment twice, producing two replicates.

Given data of this form, we assume the user can calculate three quantities of interest for each parameter $\theta_i$:

Use the first replicate ("the training replicate") to produce a $p$-value, denoted $\rho_i$, for the null hypothesis that $\theta_i=0$.
Use the training replicate to produce an estimate $\hat Y_i$ of the sign of $\theta_i$ (i.e., if the estimator is accurate, $\hat Y_i=1$ if $\theta_i>0$ and $\hat Y_i=-1$ if $\theta_i<0$).
Use the second replicate ("the validation replicate") to produce an independent estimate $Y_i$ of the sign of $\theta_i$.
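The package leaves the computation of these three quantities to the user. As one illustration, assuming each replicate yields an approximately normal point estimate with a known standard error, a two-sided z-test on the training replicate gives $\log\rho_i$, and the signs of the two point estimates give $\hat Y_i$ and $Y_i$. The helper `inputs_from_estimates` is hypothetical and not part of the package.

```python
import numpy as np
from scipy.stats import norm


def inputs_from_estimates(est_train, se_train, est_valid):
    """Hypothetical helper: build (logrho, Yhat, Y) from point estimates.

    Assumes approximately normal estimators and a two-sided z-test of
    theta_i = 0 on the training replicate.
    """
    est_train = np.asarray(est_train, dtype=float)
    se_train = np.asarray(se_train, dtype=float)
    est_valid = np.asarray(est_valid, dtype=float)

    z = est_train / se_train
    # Two-sided p-value 2 * P(Z > |z|), computed on the log scale so that
    # very small p-values do not underflow to 0.
    logrho = np.log(2.0) + norm.logsf(np.abs(z))
    Yhat = np.sign(est_train).astype(int)  # sign estimate from the training replicate
    Y = np.sign(est_valid).astype(int)     # independent sign estimate from the validation replicate
    return logrho, Yhat, Y


logrho, Yhat, Y = inputs_from_estimates([0.0, 1.96, -3.0], [1.0, 1.0, 1.0], [0.5, 2.0, 1.0])
```

Because `np.sign(0) == 0`, an exactly zero training estimate yields $\hat Y_i = 0$, which the API functions documented here ignore.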

This package uses $\rho,\hat Y,Y$ to visualize how well the two replicates agree. It does so using a quantity called the Reproducible Sign Proportion (RSP), defined for a significance threshold $\alpha$ as

$\mathrm{RSP}(Y;\hat Y,\rho,\alpha) \triangleq \frac{|\{i:\ \hat Y_i = Y_i,\ \rho_i\leq \alpha\}|}{|\{i:\ \rho_i\leq \alpha\}|}$
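A direct NumPy translation of the RSP formula for a single $\alpha$ might look as follows. The function name is hypothetical, and returning NaN when no parameter satisfies $\rho_i\leq\alpha$ (an empty denominator) is an assumption, since the definition does not cover that case.

```python
import numpy as np


def reproducible_sign_proportion(Y, Yhat, rho, alpha):
    """RSP(Y; Yhat, rho, alpha): among parameters with rho_i <= alpha, the
    fraction whose training sign estimate Yhat_i matches the validation sign Y_i."""
    Y, Yhat, rho = np.asarray(Y), np.asarray(Yhat), np.asarray(rho)
    rejected = rho <= alpha           # parameters selected by the training replicate
    n_rejected = rejected.sum()
    if n_rejected == 0:
        return np.nan                 # empty denominator: ratio is undefined
    return np.sum(rejected & (Yhat == Y)) / n_rejected


rho = [0.01, 0.02, 0.2, 0.04]
Yhat = [1, -1, 1, 1]
Y = [1, 1, 1, 1]
# Parameters 0, 1 and 3 are selected; 0 and 3 agree, 1 does not, so RSP = 2/3.
print(reproducible_sign_proportion(Y, Yhat, rho, alpha=0.05))
```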

In some cases, each experimental procedure consists of many subexperiments that are approximately independent of one another. With subexperiments, this package can also estimate a confidence interval for the Reproducible Sign Rate (RSR), defined as $\mathrm{RSR}(\hat Y,\rho,\alpha)\triangleq \mathbb{E}_Y[\mathrm{RSP}(Y;\hat Y,\rho,\alpha)]$, where the expectation is over the validation replicate $Y$ with the training quantities $\hat Y$ and $\rho$ held fixed.
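Because the denominator of the RSP depends only on the training quantities, linearity of expectation reduces the RSR to the average, over the selected parameters, of $P(Y_i=\hat Y_i)$. The sketch below checks this numerically under a purely hypothetical model in which $Y_i=\mathrm{sign}(\theta_i+\sigma\varepsilon_i)$ with $\varepsilon_i\sim N(0,1)$, so that $P(Y_i=\hat Y_i)=\Phi(\hat Y_i\theta_i/\sigma)$. In real data $\theta$ is unknown; the model and both function names are assumptions used only to illustrate the definition.

```python
import numpy as np
from scipy.stats import norm


def rsr_exact(theta, sigma, Yhat, rho, alpha):
    # Hypothetical model: Y_i = sign(theta_i + sigma * eps_i), eps_i ~ N(0, 1),
    # so P(Y_i = Yhat_i) = Phi(Yhat_i * theta_i / sigma) for Yhat_i in {-1, +1}.
    theta, Yhat, rho = np.asarray(theta, float), np.asarray(Yhat), np.asarray(rho)
    rejected = rho <= alpha
    return norm.cdf(Yhat[rejected] * theta[rejected] / sigma).mean()


def rsr_monte_carlo(theta, sigma, Yhat, rho, alpha, n_draws=20000, seed=0):
    # Average the RSP over fresh draws of the validation replicate Y,
    # keeping the training quantities Yhat and rho fixed.
    theta, Yhat, rho = np.asarray(theta, float), np.asarray(Yhat), np.asarray(rho)
    rng = np.random.default_rng(seed)
    rejected = rho <= alpha
    Y = np.sign(theta + sigma * rng.standard_normal((n_draws, theta.size)))
    matches = Y[:, rejected] == Yhat[rejected]   # shape (n_draws, n_rejected)
    return matches.mean(axis=1).mean()           # mean over draws of the per-draw RSP
```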

API Documentation

process_from_matrices

This function processes $(\log \rho, \hat Y, Y)$ into an object for further analysis. It assumes the parameters (and hence $\rho,\hat Y,Y$) can be organized into matrices with one row per subexperiment.

Returns a ReproducibleSignRateInfo object describing how well the two experimental replicates agree. Each row of logrho, Yhat, and Y is assumed to correspond to a different subexperiment.

If Yhat[i,j]==0, the parameter at index (i,j) is ignored, regardless of its rho value.

Parameters:

Name	Type	Description	Default
`logrho`	`NxM matrix`	log p-value for the null hypothesis theta = 0 (from the training replicate)	required
`Yhat`	`NxM matrix, int`	estimate of the sign of theta (from the training replicate)	required
`Y`	`NxM matrix, int`	independent estimate of the sign of theta (from the validation replicate)	required
`n_thresholds`	`int`	number of thresholds on logrho at which results are computed	`1000`
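The following sketch computes per-threshold summaries that mirror the RSP and median_rejections attributes of ReproducibleSignRateInfo, assuming rows are subexperiments, an entry is rejected at threshold $t$ when logrho $\leq t$ (that is, $\rho\leq e^t$), and entries with Yhat == 0 are dropped. How the package actually places its n_thresholds thresholds is not documented here; spacing them evenly between the smallest and largest logrho is an assumption, as is the function name.

```python
import numpy as np


def rsp_curve_from_matrices(logrho, Yhat, Y, n_thresholds=1000):
    """Hypothetical sketch: RSP and median rejections per subexperiment (row)
    at each threshold on logrho."""
    logrho = np.asarray(logrho, dtype=float)
    Yhat = np.asarray(Yhat)
    Y = np.asarray(Y)
    valid = Yhat != 0  # entries with Yhat == 0 are ignored
    # Assumed threshold placement: evenly spaced over the observed logrho range.
    thresholds = np.linspace(logrho[valid].min(), logrho[valid].max(), n_thresholds)

    rsp = np.full(n_thresholds, np.nan)
    median_rejections = np.zeros(n_thresholds)
    for k, t in enumerate(thresholds):
        rejected = valid & (logrho <= t)  # equivalent to rho <= exp(t)
        n_rej = rejected.sum()
        if n_rej > 0:
            rsp[k] = np.sum(rejected & (Yhat == Y)) / n_rej
        # Count rejections within each row (subexperiment), then take the median.
        median_rejections[k] = np.median(rejected.sum(axis=1))
    return thresholds, rsp, median_rejections
```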

process

This function processes $(\log \rho, \hat Y, Y)$ into an object for further analysis. Unlike process_from_matrices, it takes vectors and requires the subexperiment associated with each parameter to be specified explicitly.

Returns a ReproducibleSignRateInfo object describing how well the two experimental replicates agree. The subexperiment of each parameter must be given explicitly in subexperiment_ids.

If Yhat[i]==0, the parameter at index i is ignored, regardless of its rho value.

Parameters:

Name	Type	Description	Default
`logrho`	`N vector`	log p-value for the null hypothesis theta = 0 (from the training replicate)	required
`Yhat`	`N vector, int`	estimate of the sign of theta (from the training replicate)	required
`Y`	`N vector, int`	independent estimate of the sign of theta (from the validation replicate)	required
`subexperiment_ids`	`N vector`	identifier of the subexperiment containing each parameter	required
`n_thresholds`	`int`	number of thresholds on logrho at which results are computed	`1000`
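If the data are already in matrix form but you want to call process, one natural conversion flattens each matrix row by row and uses the row index as the subexperiment id; that this matches process_from_matrices exactly is an assumption, not documented behavior. The sketch also counts rejections per subexperiment for arbitrary ids (integers or strings), the quantity whose median the median_rejections attribute reports; excluding Yhat == 0 from the counts is assumed. Both helper names are hypothetical.

```python
import numpy as np


def matrices_to_vectors(logrho, Yhat, Y):
    """Flatten NxM matrices (row = subexperiment) into length N*M vectors
    plus a subexperiment_ids vector holding each entry's row index."""
    logrho, Yhat, Y = (np.asarray(a) for a in (logrho, Yhat, Y))
    n_rows, n_cols = logrho.shape
    # np.repeat over row indices matches the row-major order of ravel().
    subexperiment_ids = np.repeat(np.arange(n_rows), n_cols)
    return logrho.ravel(), Yhat.ravel(), Y.ravel(), subexperiment_ids


def rejections_per_subexperiment(logrho, Yhat, subexperiment_ids, threshold):
    """Count rejections (logrho <= threshold and Yhat != 0) in each subexperiment."""
    ids, inverse = np.unique(subexperiment_ids, return_inverse=True)
    rejected = (np.asarray(logrho) <= threshold) & (np.asarray(Yhat) != 0)
    counts = np.bincount(inverse, weights=rejected.astype(float), minlength=ids.size)
    return ids, counts.astype(int)
```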

ReproducibleSignRateInfo

Stores various properties of two experimental replicates in terms of how well they agree.

Attributes:

Name	Type	Description
`logrho`	`N vector`	log p-values for the null hypotheses (from the training replicate)
`Yhat`	`N vector`	estimate of sign of parameters (from the training replicate)
`Y`	`N vector`	estimate of sign of parameters (from the validation replicate)
`thresholds`	`L vector`	thresholds on logrho at which results are computed
`subexperiment_ids`	`N vector`	subexperiment associated with each parameter
`RSP`	`L vector`	reproducible sign proportion at each threshold
`median_rejections`	`L vector`	median rejections per subexperiment at each threshold

ReproducibleSignRateInfo.confidence_interval

Computes a confidence interval for the Reproducible Sign Rate with (1-alpha) nominal coverage probability. Returns lower and upper bounds for each threshold in self.thresholds.

Parameters:

Name	Type	Description	Default
`alpha`	`float`	one minus coverage probability	required
`alternative`	`string`	two-sided, lower-bound, or upper-bound	`'two-sided'`
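The interval construction used by confidence_interval is not described here. As a rough stand-in, the sketch below builds a normal-approximation interval for the RSR at a single threshold, treating subexperiments as the independent units given the training replicate and using the usual ratio-estimator variance from subexperiment-level residuals. The function name, its inputs, and the exact strings accepted for `alternative` are assumptions modeled on the parameter table above, not the package's API.

```python
import numpy as np
from scipy.stats import norm


def rsr_confidence_interval(matches, n_rejected, alpha, alternative="two-sided"):
    """Hypothetical normal-approximation interval for the RSR at one threshold.

    matches[s]:    rejected parameters in subexperiment s with Yhat == Y
    n_rejected[s]: rejected parameters in subexperiment s
    """
    matches = np.asarray(matches, dtype=float)
    n_rejected = np.asarray(n_rejected, dtype=float)
    total = n_rejected.sum()
    rsp = matches.sum() / total
    # Ratio-estimator standard error from residuals m_s - RSP * n_s,
    # treating subexperiments as independent given the training replicate.
    se = np.sqrt(np.sum((matches - rsp * n_rejected) ** 2)) / total

    if alternative == "two-sided":
        z = norm.ppf(1 - alpha / 2)
        lo, hi = rsp - z * se, rsp + z * se
    elif alternative == "lower-bound":
        lo, hi = rsp - norm.ppf(1 - alpha) * se, 1.0
    elif alternative == "upper-bound":
        lo, hi = 0.0, rsp + norm.ppf(1 - alpha) * se
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    # The RSR is a proportion, so clip the bounds to [0, 1].
    return max(lo, 0.0), min(hi, 1.0)
```

Repeating this at each threshold would give one pair of bounds per entry of thresholds, matching the shape of output described above.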