runSimulation wraps around setModel and the phenotype component functions (genFixedEffects, genBgEffects, noiseBgEffects, noiseFixedEffects and correlatedBgEffects), rescales each component and combines them into the final phenotype. For details on all parameters, see the respective functions.

runSimulation(N, P, genVar = NULL, h2s = NULL, theta = 0.8,
  h2bg = NULL, eta = 0.8, noiseVar = NULL, rho = NULL,
  delta = NULL, gamma = 0.8, phi = NULL, alpha = 0.8,
  tNrSNP = 5000, cNrSNP = 20, SNPfrequencies = c(0.1, 0.2, 0.4),
  genotypefile = NULL, format = "delim", genoFilePrefix = NULL,
  genoFileSuffix = NULL, genoDelimiter = ",", skipFields = NULL,
  header = FALSE, probabilities = FALSE, chr = NULL,
  NrSNPsOnChromosome = NULL, NrChrCausal = NULL, kinshipfile = NULL,
  kinshipHeader = FALSE, kinshipDelimiter = ",", standardise = TRUE,
  distBetaGenetic = "norm", mBetaGenetic = 0, sdBetaGenetic = 1,
  pTraitsAffectedGenetics = 1, pIndependentGenetic = 0.4,
  pTraitIndependentGenetic = 0.2, keepSameIndependentSNPs = FALSE,
  NrFixedEffects = 1, NrConfounders = 10, distConfounders = "norm",
  mConfounders = 0, sdConfounders = 1, catConfounders = NULL,
  probConfounders = NULL, distBetaConfounders = "norm",
  mBetaConfounders = 0, sdBetaConfounders = 1,
  pTraitsAffectedConfounders = 1, pIndependentConfounders = 0.4,
  pTraitIndependentConfounders = 0.2,
  keepSameIndependentConfounders = FALSE, pcorr = 0.8,
  corrmatfile = NULL, meanNoiseBg = 0, sdNoiseBg = 1,
  nonlinear = NULL, logbase = 10, expbase = NULL, power = NULL,
  customTransform = NULL, transformNeg = "abs",
  proportionNonlinear = 0, sampleID = "ID_", phenoID = "Trait_",
  snpID = "SNP_", seed = 219453, verbose = FALSE)

Arguments

N

Number [integer] of samples to simulate.

P

Number [integer] of phenotypes to simulate.

genVar

Proportion [double] of total genetic variance.

h2s

Proportion [double] of genetic variance of genetic variant effects.

theta

Proportion [double] of variance of shared genetic variant effects.

h2bg

Proportion [double] of genetic variance of infinitesimal genetic effects; either h2s or h2bg have to be specified and h2s + h2bg = 1.

eta

Proportion [double] of variance of shared infinitesimal genetic effects.

noiseVar

Proportion [double] of total noise variance.

rho

Proportion [double] of noise variance of correlated effects; the sum of rho, delta and phi has to equal 1.

delta

Proportion [double] of noise variance of non-genetic covariate effects; the sum of rho, delta and phi has to equal 1.

gamma

Proportion [double] of variance of shared non-genetic covariate effects.

phi

Proportion [double] of noise variance of observational noise effects; the sum of rho, delta and phi has to equal 1.

alpha

Variance [double] of shared observational noise effect.

tNrSNP

Total number [integer] of SNPs to simulate; these SNPs are used for kinship estimation.

cNrSNP

Number [integer] of causal SNPs; used as genetic variant effects.

SNPfrequencies

Vector of allele frequencies [double] from which to sample.

genotypefile

Needed when reading external genotypes (into memory), path/to/genotype file [string] in format specified by format.

format

Needed when reading external genotypes, specifies the format of the genotype data; has to be one of plink, oxgen, genome, bimbam and delim when reading files into memory, or one of oxgen, bimbam or delim if sampling genetic variants from file; for details see readStandardGenotypes and getCausalSNPs.

genoFilePrefix

Needed when sampling causal SNPs from file, full path/to/chromosome-wise-genotype-file-ending-before-"chrChromosomeNumber" (no '~' expansion!) [string].

genoFileSuffix

Needed when sampling causal SNPs from file, following chromosome number including fileformat (e.g. ".csv") [string]

genoDelimiter

Field separator [string] of genotypefile or genoFile if format == delim.

skipFields

Number [integer] of fields (columns) to skip in genoFilePrefix-genoFileSuffix-file. See details in getCausalSNPs if format == delim.

header

[logical] Can be set to indicate if genoFilePrefix-genoFileSuffix file has a header for format == 'delim'. See details in getCausalSNPs.

probabilities

[bool]. If set to TRUE, the genotypes in the files described by genoFilePrefix and genoFileSuffix are provided as triplets of probabilities (p(AA), p(Aa), p(aa)) and are converted into their expected genotype frequencies by 0*p(AA) + 1*p(Aa) + 2*p(aa) via probGen2expGen.

chr

Numeric vector of chromosomes [integer] to choose NrCausalSNPs from; only used when external genotype data is sampled, i.e. !is.null(genoFilePrefix).

NrSNPsOnChromosome

Specifies the number of SNPs [integer] per entry in chr (see above); has to be the same length as chr. If not provided, lines in genoFilePrefix-genoFileSuffix file will be counted (which can be slow for large files).

NrChrCausal

Number [integer] of causal chromosomes to choose NrCausalSNPs from (as opposed to the actual chromosomes to choose from via chr); only used when external genotype data is sampled, i.e. !is.null(genoFilePrefix).

kinshipfile

path/to/kinshipfile [string]; if provided, kinship for simulation of genetic background effect will be read from file.

kinshipHeader

[boolean] If TRUE kinship file has header information.

kinshipDelimiter

Field separator [string] of kinship file.

standardise

[boolean] If TRUE genotypes will be standardised for kinship estimation (recommended).

distBetaGenetic

Name [string] of distribution to use to simulate effect sizes of genetic variants; one of "unif" or "norm".

mBetaGenetic

Mean/midpoint [double] of normal/uniform distribution for effect sizes of genetic variants.

sdBetaGenetic

Standard deviation/extension from midpoint [double] of normal/uniform distribution for effect sizes of genetic variants.

pTraitsAffectedGenetics

Proportion [double] of traits affected by the genetic variant effect. For non-integer results of pTraitsAffected*P, the ceiling of the result is used. This allows, for instance, simulating different levels of pleiotropy.

pIndependentGenetic

Proportion [double] of genetic variant effects to have a trait-independent fixed effect.

pTraitIndependentGenetic

Proportion [double] of traits influenced by independent genetic variant effects.

keepSameIndependentSNPs

[boolean] If set to TRUE, the independent SNPs effects always influence the same subset of traits.

NrFixedEffects

Number [integer] of different non-genetic covariate effects to simulate; this allows simulating non-genetic covariate effects from different distributions or with different parameters.

NrConfounders

Number [integer] of non-genetic covariates; used as non-genetic covariate effects.

distConfounders

Vector of name(s) [string] of distributions to use to simulate confounders; one of "unif", "norm", "bin", "cat_norm", "cat_unif".

mConfounders

Vector of mean(s)/midpoint(s) [double] of normal/uniform distribution for confounders.

sdConfounders

Vector of standard deviation(s)/extension from midpoint(s) [double] of normal/uniform distribution for confounders.

catConfounders

Vector of confounder categories [factor]; required if distConfounders is "cat_norm" or "cat_unif".

probConfounders

Vector of probability(ies) [double] of binomial confounders (0/1); required if distConfounders is "bin".

distBetaConfounders

Vector of name(s) [string] of distribution to use to simulate effect sizes of confounders; one of "unif" or "norm".

mBetaConfounders

Vector of mean(s)/midpoint(s) [double] of normal/uniform distribution for effect sizes of confounders.

sdBetaConfounders

Vector of standard deviation(s)/extension from midpoint(s) [double] of normal/uniform distribution for effect sizes of confounders.

pTraitsAffectedConfounders

Proportion(s) [double] of traits affected by the non-genetic covariates. For non-integer results of pTraitsAffected*P, the ceiling of the result is used.

pIndependentConfounders

Vector of proportion(s) [double] of non-genetic covariate effects to have a trait-independent effect.

pTraitIndependentConfounders

Vector of proportion(s) [double] of traits influenced by independent non-genetic covariate effects.

keepSameIndependentConfounders

[boolean] If set to TRUE, the independent confounder effects always influence the same subset of traits.

pcorr

Correlation [double] between phenotypes.

corrmatfile

path/to/corrmatfile.csv [string] with comma-separated [P x P] numeric [double] correlation matrix; if provided, correlation matrix for simulation of correlated background effect will be read from file; file should NOT contain an index or header column.

meanNoiseBg

Mean [double] of the normal distributions for the simulation of the observational noise effects.

sdNoiseBg

Standard deviation [double] of the normal distributions for the simulation of the observational noise effects.

nonlinear

Non-linear transformation method [string]; one of exp (exponential), log (logarithm), poly (polynomial), sqrt (square root) or custom (user-supplied function); if log or exp, the base can be specified via logbase or expbase; if poly, the power can be specified via power; if custom, a custom function has to be supplied via customTransform. Non-linear transformation is optional; the default is NULL, i.e. no transformation (see details).

logbase

[int] base of logarithm for non-linear phenotype transformation (see details).

expbase

[int] base of exponential function for non-linear phenotype transformation (see details).

power

[double] power of polynomial function for non-linear phenotype transformation.

customTransform

[function] custom transformation function accepting a single argument.

transformNeg

[string] transformation method for negative values in non-linear phenotype transformation. One of abs (absolute value) or set0 (set all negative values to zero). If nonlinear==log and transformNeg==set0, negative values are set to 1e-5.

proportionNonlinear

[double] proportion of the phenotype to be non-linear (see details).

sampleID

Prefix [string] for naming samples (will be followed by sample number from 1 to N when constructing sample IDs); only used if genotypes/kinship are simulated/do not have sample IDs.

phenoID

Prefix [string] for naming traits (will be followed by phenotypes number from 1 to P when constructing phenotype IDs).

snpID

Prefix [string] for naming SNPs (will be followed by SNP number from 1 to NrSNP when constructing SNP IDs).

seed

Seed [integer] to initiate random number generation.

verbose

[boolean] If TRUE, progress info is printed to standard output.
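The conversion described for the probabilities argument can be illustrated in base R. This is only a sketch of the stated formula, 0*p(AA) + 1*p(Aa) + 2*p(aa); the helper name probsToExpectedGenotype and the input values are made up for illustration, and the code is not the implementation of probGen2expGen.

```r
# Illustrative sketch only: NOT the package's probGen2expGen implementation.
# Input: a numeric vector holding, for one sample, three consecutive
# probabilities p(AA), p(Aa), p(aa) per SNP.
probsToExpectedGenotype <- function(probs) {
  stopifnot(length(probs) %% 3 == 0)
  # one row per SNP, columns: p(AA), p(Aa), p(aa)
  triplets <- matrix(probs, ncol = 3, byrow = TRUE)
  # expected genotype: 0*p(AA) + 1*p(Aa) + 2*p(aa)
  as.vector(triplets %*% c(0, 1, 2))
}

# two SNPs for a single sample (made-up values)
probs <- c(0.9, 0.1, 0.0,
           0.0, 0.5, 0.5)
expected <- probsToExpectedGenotype(probs)
print(expected)
```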

Value

Named list of i) dataframe of proportion of variance explained for each component (varComponents), ii) a named list with the final simulated phenotype components (phenoComponentsFinal), iii) a named list with the intermediate simulated phenotype components (phenoComponentsIntermediate), iv) a named list of parameters describing the model setup (setup) and v) a named list of raw components (rawComponents) used for genetic effect simulation (genotypes and/or kinship, eigenvalues and eigenvectors of kinship)
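To make the nested proportions behind the variance components more concrete, the following base R sketch multiplies them out. It assumes that genVar and noiseVar sum to 1 and that each component's share of the total variance is the product of its group total and its within-group proportion. The helper partitionVariance and the element names are hypothetical and do not reflect the layout of varComponents.

```r
# Hypothetical helper, not part of the package: multiplies the nested
# proportions to obtain each component's share of the total variance.
partitionVariance <- function(genVar, h2s, rho, delta, phi) {
  h2bg <- 1 - h2s          # h2s + h2bg = 1
  noiseVar <- 1 - genVar   # assumption: genVar + noiseVar = 1
  stopifnot(isTRUE(all.equal(rho + delta + phi, 1)))
  c(geneticVariants    = genVar * h2s,
    infinitesimal      = genVar * h2bg,
    correlatedNoise    = noiseVar * rho,
    covariates         = noiseVar * delta,
    observationalNoise = noiseVar * phi)
}

shares <- partitionVariance(genVar = 0.4, h2s = 0.25,
                            rho = 0.1, delta = 0.3, phi = 0.6)
print(shares)
```

Under these assumptions the five shares always sum to 1, which is a quick sanity check on a chosen parameter set.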

Details

Phenotypes are modeled under a linear additive model where Y = WA + BX + G + C + Phi, with WA the non-genetic covariates, BX the genetic variant effects, G the infinitesimal genetic effects, C the correlated background effects and Phi the observational noise. For more information on these components, see the respective function descriptions (see also). Optionally, the phenotypes can be non-linearly transformed via: Y_trans = (1-alpha) x Y + alpha x f(Y). Here, alpha is the proportion of non-linearity of the phenotype (set via proportionNonlinear, not the alpha argument) and f is a non-linear transformation, one of exp, log, poly, sqrt or a custom function (see nonlinear).
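The optional transformation Y_trans = (1-alpha) x Y + alpha x f(Y) can be sketched in base R as follows. The helper mixNonlinear is hypothetical, and applying the absolute value only inside f is an assumption made for this illustration; the package may handle negative values differently.

```r
# Illustrative sketch only, not the package's implementation.
# alpha is the proportion of non-linearity of the phenotype.
mixNonlinear <- function(Y, alpha, f) {
  (1 - alpha) * Y + alpha * f(Y)
}

Y <- c(-100, 1, 10)
# log is undefined for negative values, so absolute values are taken first
# (assumption: loosely mirrors transformNeg = "abs")
f <- function(y) log(abs(y), base = 10)
Y_trans <- mixNonlinear(Y, alpha = 0.5, f = f)
print(Y_trans)
```

With alpha = 0 the phenotype is returned unchanged, and with alpha = 1 it is replaced entirely by f(Y).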

See also

Examples

# simulate phenotype of 100 samples, 5 traits from genetic and noise
# background effects, with variance explained of 0.2 and 0.8 respectively
genVar <- 0.2
simulatedPhenotype <- runSimulation(N=100, P=5, cNrSNP=10,
  genVar=genVar, h2s=1, phi=1)
#> Warning: The genetic model does not contain infinitesimal genetic effects but the total number of SNPs to simulate (tNrSNP: 5000 ) is larger than the number of genetic
#> variant effects SNPs (cNrSNP: 10 ). If genotypes are not needed, consider setting
#> tNrSNPs=cNrSNPs to speed up computation