| Title: | Detection of Chromosomal Aneuploidies in Ancient DNA Studies |
|---|---|
| Description: | An R implementation of ChASM (Chromosomal Aneuploidy Screening Methodology): a statistically rigorous Bayesian approach for screening data sets for autosomal and sex chromosomal aneuploidies. This package takes as input the number of (deduplicated) reads mapping to chromosomes 1-22 and the X and Y chromosomes, and models these using a Dirichlet-multinomial distribution. From this, This package returns posterior probabilities of sex chromosomal karyotypes (XX, XY, XXY, XYY, XXX and X) and full autosomal aneuploidies (trisomy 13, trisomy 18 and trisomy 21). This package also returns two diagnostic statistics: (i) a posterior probability addressing whether contamination between XX and XY may explain the observed sex chromosomal aneuploidy, and (ii) a chi-squared statistic measuring whether the observed read counts are too divergent from the underlying distribution (and may represent abnormal sequencing/quality issues). |
| Authors: | Adam B. Rohrlach [aut, cph] (ORCID: <https://orcid.org/0000-0002-4204-5018>), Jono Tuke [aut, cre] (ORCID: <https://orcid.org/0000-0002-1688-8951>), Wolfgang Haak [aut] (ORCID: <https://orcid.org/0000-0003-2475-2007>) |
| Maintainer: | Jono Tuke <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.1 |
| Built: | 2026-06-05 06:41:04 UTC |
| Source: | https://github.com/jonotuke/rchasm |
Adjusts alpha vector for contamination karyotypes
adjustA(a, c1, c2, g, correction = 1e-06)adjustA(a, c1, c2, g, correction = 1e-06)
a |
vector to change |
c1 |
X multiplier for karyotype 1 |
c2 |
X multiplier for karyotype 2 |
g |
mixing parameter |
correction |
error rate for Y mapping for non-Y karyotypes |
adjusted alpha vector
bounds number to be in [0,1] (contamination estimation)
bound(x)bound(x)
x |
number |
bounded number
Take prior and observed counts and calls karyotypes and Z-scores
callKaryotypes(indat, inDirichlet, p_contamination = 0.1)callKaryotypes(indat, inDirichlet, p_contamination = 0.1)
indat |
full input tibble |
inDirichlet |
associated Dirichlet parameters |
p_contamination |
the assumed probability of contamination |
karyotype calls
Combine both autosomal and sex chromosomal analyses into one output
combineData(calls.auto, calls.sca, z.scores, printMissingIDs = FALSE)combineData(calls.auto, calls.sca, z.scores, printMissingIDs = FALSE)
calls.auto |
a list or predefined piped string of sample IDs to look for |
calls.sca |
the autosomal karyotype calls for the samples |
z.scores |
the sex chromosomal karyotype calls for the samples |
printMissingIDs |
print the IDs of individuals not in all 3 input files (or just numbers)? |
combined data
calculate Dirichlet probabilities with filtering
ddirichletultinomial(nvector, avector, cvector, log = TRUE, correction = 1e-06)ddirichletultinomial(nvector, avector, cvector, log = TRUE, correction = 1e-06)
nvector |
observed read counts |
avector |
alpha vector to change |
cvector |
change vector for new karyotype |
log |
return value in log space |
correction |
error rate for Y mapping for non-Y karyotypes |
Dirichlet probabilities
calculate Dirichlet parameters with filtering
dirichlet.mle_filter(y, whichK)dirichlet.mle_filter(y, whichK)
y |
input values |
whichK |
which parameter to return |
parameters
Estimate contamination from XX and XY based on observed counts
estimateGamma(nvector, avector, c1, c2, correction)estimateGamma(nvector, avector, c1, c2, correction)
nvector |
observed read counts |
avector |
vector to change |
c1 |
X multiplier for karyotype 1 |
c2 |
X multiplier for karyotype 2 |
correction |
error rate for Y mapping for non-Y karyotypes |
estimated contamination
An example data set for RChASM
example_dataexample_data
A data frame with 266 rows and 26 variables:
"Ben Rohrlach"
filter for outliers
filterOutliers(v)filterOutliers(v)
v |
vector to filter |
vector for filtering
make c vector for new karyotypes
makeC(x, y, n)makeC(x, y, n)
x |
value to replace 1 |
y |
location to make replacement |
n |
length of vector |
c vector
function to make a Dirichlet prior
makeDirichlet( indat, refType, min_reads = 30000, max_reads = 1e+09, show_plot = TRUE )makeDirichlet( indat, refType, min_reads = 30000, max_reads = 1e+09, show_plot = TRUE )
indat |
full input tibble |
refType |
"auto"="autosomal"; "sca"="sex chromosomal"; "diagnostic" |
min_reads |
min number of reads per sample |
max_reads |
max number of reads per sample |
show_plot |
boolean to show plot |
Dirichlet prior
Calculates Z-scores per chromosome
makeZscores(indat, refType = "auto", min_reads = 30000, max_reads = 1e+09)makeZscores(indat, refType = "auto", min_reads = 30000, max_reads = 1e+09)
indat |
full input tibble |
refType |
"auto"="autosomal"; "sca"="sex chromosomal"; "diagnostic" |
min_reads |
min number of reads per sample |
max_reads |
max number of reads per sample |
Z-scores
Plots diagnostic plots
plot_diagnostic(IDs, inChASM, addLabels = FALSE)plot_diagnostic(IDs, inChASM, addLabels = FALSE)
IDs |
a list or predefined piped string of sample IDs to look for |
inChASM |
parameter object |
addLabels |
add "A", "B" and "C" to the plot panels? |
plots
example_calls <- runChASM(rawReadCountsIn = example_data) plot_diagnostic(IDs = 'Ind_255_1', inChASM = example_calls, addLabels = TRUE)example_calls <- runChASM(rawReadCountsIn = example_data) plot_diagnostic(IDs = 'Ind_255_1', inChASM = example_calls, addLabels = TRUE)
A function to print the result of the combined analysis to screen
printChASM(inChASM, lines = 20)printChASM(inChASM, lines = 20)
inChASM |
the result of a full ChASM analysis (output from runChASM) |
lines |
the number of lines to print to screen |
print output
example_calls <- runChASM(rawReadCountsIn = example_data) printChASM(inChASM = example_calls, lines = 10)example_calls <- runChASM(rawReadCountsIn = example_data) printChASM(inChASM = example_calls, lines = 10)
Task: takes read counts and makes useful file
processReadCounts( rawReadCountsIn, refType, minTotal = 1000, minSamplesPerProtocol = 30 )processReadCounts( rawReadCountsIn, refType, minTotal = 1000, minSamplesPerProtocol = 30 )
rawReadCountsIn |
the reads counts via KP |
refType |
for capture, use only on-target (F)? |
minTotal |
minimum number of protocol counts to be included |
minSamplesPerProtocol |
"auto"="autosomal"; "sca"="sex chromosomal" |
outreadcounts
Title
runChASM( rawReadCountsIn, minSamplesPerProtocol = 30, min_reads = 60000, max_reads = 1e+09, p_contamination = 0.01, show_plot = TRUE, printMissingIDs = FALSE )runChASM( rawReadCountsIn, minSamplesPerProtocol = 30, min_reads = 60000, max_reads = 1e+09, p_contamination = 0.01, show_plot = TRUE, printMissingIDs = FALSE )
rawReadCountsIn |
the reads counts for each chromosome |
minSamplesPerProtocol |
minimum number of reads per protocol for parameter estimation |
min_reads |
the minimum number of reads for Dirichlet parameter estimation |
max_reads |
the maximum number of reads for Dirichlet parameter estimation |
p_contamination |
the probability of a sample yielding significant contamination |
show_plot |
show the clustering plot for sex chromosomal aneuploidy Dirichlet estimation? |
printMissingIDs |
when combining karyotype calls, return names that are missing (or just the number of missing IDs)? |
summary
runChASM(rawReadCountsIn = example_data)runChASM(rawReadCountsIn = example_data)
A function for saving the results of a ChASM analysis as a tsv
saveChASM(inChASM, file, sort_by_samplename = FALSE)saveChASM(inChASM, file, sort_by_samplename = FALSE)
inChASM |
the result of a full ChASM analysis (output from runChASM) |
file |
the full path and file name to place output |
sort_by_samplename |
reorder alphabetically by sample names? |
saves results
example_calls <- runChASM(rawReadCountsIn = example_data) ## Not run: saveChASM(inChASM = example_calls) ## End(Not run)example_calls <- runChASM(rawReadCountsIn = example_data) ## Not run: saveChASM(inChASM = example_calls) ## End(Not run)
Combine both autosomal and sex chromosomal analyses into one output
summary_calls( inChASM, minTotal = 60000, minPosterior = 0.95, ignoreUnusual = FALSE, printProtocol = FALSE )summary_calls( inChASM, minTotal = 60000, minPosterior = 0.95, ignoreUnusual = FALSE, printProtocol = FALSE )
inChASM |
the result of combining both autosomal and sca analyses |
minTotal |
minimum total for autosomal or sca analyses |
minPosterior |
minimum value maxP can take (i.e. minimum posterior probability accepted) |
ignoreUnusual |
filter our unusual observations? |
printProtocol |
print protocol column? |
combined calls
example_calls <- runChASM(rawReadCountsIn = example_data) summary_calls(inChASM = example_calls, minTotal = 6e4, minPosterior = 0.95)example_calls <- runChASM(rawReadCountsIn = example_data) summary_calls(inChASM = example_calls, minTotal = 6e4, minPosterior = 0.95)