Title: | Utility Functions for 'GATK' |
---|---|
Description: | Provides utility functions used by the Genome Analysis Toolkit ('GATK') to load tables and plot data. The 'GATK' is a toolkit for variant discovery in high-throughput sequencing data. |
Authors: | Kiran Garimella |
Maintainer: | Louis Bergelson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.2.1 |
Built: | 2024-10-10 04:59:05 UTC |
Source: | https://github.com/broadinstitute/gsalib |
Utility functions for analysis of genome sequence data with the GATK
This package is primarily meant to be used programmatically by GATK tools. However, the gsa.read.gatkreport() function can also be called directly to read data from a GATKReport, a multi-table document generated by GATK tools.
Kiran Garimella
Maintainer: Louis Bergelson <[email protected]>
https://gatk.broadinstitute.org/hc/en-us/articles/360035532172-GATKReport-and-gsalib
test_file = system.file("extdata", "test_gatkreport.table", package = "gsalib")
report = gsa.read.gatkreport(test_file)
This function reads in data from a GATKReport. A GATKReport is a document containing multiple tables produced by the GATK. Each table is loaded as a separate data.frame object in a list.
gsa.read.gatkreport(filename)
filename |
The path to the GATKReport file. |
The GATKReport format replaces the multi-file output format used previously by many GATK tools and provides a single, consolidated file format. This format accommodates multiple tables and is still R-loadable through this function.
Returns a LIST object, where each key is the TableName and the value is the data.frame object with the contents of the table. If multiple tables with the same name exist, each one after the first will be given names of TableName.v1, TableName.v2, ..., TableName.vN.
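The naming rule for repeated tables can be illustrated with a small base R sketch. The helper `dedupe_table_names` is hypothetical and is not part of gsalib; it only mimics the convention described above (the first occurrence keeps its name, later ones get `.v1`, `.v2`, and so on) and may not match how the package computes names internally.

```r
# Hypothetical helper (not part of gsalib): apply the TableName.vN convention
# to a vector of table names in the order they appear in a report.
dedupe_table_names <- function(table_names) {
  # For each name, count how many earlier entries share it (0 for the first).
  idx <- ave(seq_along(table_names), table_names, FUN = seq_along) - 1L
  # The first occurrence is unchanged; later ones get a ".vN" suffix.
  ifelse(idx == 0, table_names, paste0(table_names, ".v", idx))
}

deduped <- dedupe_table_names(c("A", "B", "A", "A"))
print(deduped)
```

With this convention, a lookup such as `report[["A.v1"]]` would refer to the second table named `A` in the file.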
This function accepts different versions of the GATKReport format by making internal calls to gsa.read.gatkreportv0() or gsa.read.gatkreportv1() as appropriate.
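A version dispatch of this kind could be sketched as follows. This is an illustration only, not the package's code: the helper name `detect_report_version`, the assumption that the version is encoded in the first line of the file, and the example header strings are all assumptions made for this sketch.

```r
# Hypothetical helper (not part of gsalib): guess the report version from the
# first line. The header pattern below is an assumption about the file format.
detect_report_version <- function(lines) {
  if (length(lines) == 0) {
    stop("no lines to inspect")
  }
  header <- lines[1]
  if (grepl("^#+:GATKReport\\.v1", header, ignore.case = TRUE)) {
    "v1"  # a reader such as gsa.read.gatkreportv1() would handle this case
  } else {
    "v0"  # otherwise fall back to a reader such as gsa.read.gatkreportv0()
  }
}

# Example header strings; both are assumed, not taken from a real report.
detect_report_version("#:GATKReport.v1.1:5")
detect_report_version("##:GATKReport.v0.2 ExampleTable : example description")
```

Reading all lines once and then handing them to a version-specific parser fits the signatures documented here, where both internal readers take `lines` rather than a filename.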
Kiran Garimella
https://gatk.broadinstitute.org/hc/en-us/articles/360035532172-GATKReport-and-gsalib
test_file = system.file("extdata", "test_gatkreport.table", package = "gsalib")
report = gsa.read.gatkreport(test_file)
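Because the return value is a named list of data.frame objects, it can be explored with ordinary list tools. The following is a minimal sketch that assumes gsalib is installed and uses the bundled test file from the example; it makes no assumption about which table names that file contains.

```r
# Assumes the gsalib package is installed.
test_file <- system.file("extdata", "test_gatkreport.table", package = "gsalib")
report <- gsalib::gsa.read.gatkreport(test_file)

# Each list element is one table, keyed by its table name.
table_names <- names(report)
print(table_names)

# Summarize every table without relying on specific names.
for (name in table_names) {
  tbl <- report[[name]]
  cat(name, ":", nrow(tbl), "rows x", ncol(tbl), "columns\n")
}
```

An individual table can then be pulled out with `report[[name]]` and handled like any other data.frame.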
This function reads in data from a version 0.x GATKReport. It should not be called directly; instead, use gsa.read.gatkreport().
gsa.read.gatkreportv0(lines)
lines |
The lines read in from the input file. |
Returns a LIST object, where each key is the TableName and the value is the data.frame object with the contents of the table. If multiple tables with the same name exist, each one after the first will be given names of TableName.v1, TableName.v2, ..., TableName.vN.
Kiran Garimella
https://gatk.broadinstitute.org/hc/en-us/articles/360035532172-GATKReport-and-gsalib
This function reads in data from a version 1.x GATKReport. It should not be called directly; instead, use gsa.read.gatkreport().
gsa.read.gatkreportv1(lines)
lines |
The lines read in from the input file. |
Returns a LIST object, where each key is the TableName and the value is the data.frame object with the contents of the table. If multiple tables with the same name exist, each one after the first will be given names of TableName.v1, TableName.v2, ..., TableName.vN.
Kiran Garimella
https://gatk.broadinstitute.org/hc/en-us/articles/360035532172-GATKReport-and-gsalib
Given a GATKReport generated by GenotypeConcordance (as output by gsa.read.gatkreport()), this function reshapes the concordance for a specified sample into a matrix with the EvalGenotypes in rows and the CompGenotypes in columns (see the documentation for GenotypeConcordance for the definition of Eval and Comp).
gsa.reshape.concordance.table(report, table.name = "GenotypeConcordance_Counts", sample.name = "ALL")
report |
A GATKReport as output by gsa.read.gatkreport(). |
table.name |
The table name in the GATKReport to reshape. Defaults to "GenotypeConcordance_Counts", but could also be one of the proportion tables ("GenotypeConcordance_EvalProportions", "GenotypeConcordance_CompProportions"). This value can also be |
sample.name |
The sample name within |
Returns a two-dimensional matrix with Eval genotypes in the rows and Comp genotypes in the columns. The genotypes themselves (HOM_REF, NO_CALL, etc.) are specified in the row and column names of the matrix.
Phillip Dexheimer
test_file = system.file("extdata", "test_genconcord.table", package = "gsalib")
report = gsa.read.gatkreport(test_file)
gsa.reshape.concordance.table(report)
## Output looks like:
##              CompGenotypes
##EvalGenotypes NO_CALL HOM_REF     HET HOM_VAR UNAVAILABLE MIXED
##  NO_CALL           0       0       0       0           0     0
##  HOM_REF           0       0       0       0           0     0
##  HET               0       0   13463      90        3901     0
##  HOM_VAR           0       0    2935   18144        4448     0
##  UNAVAILABLE       0       0 2053693 1326112       11290     0
##  MIXED             0       0       0       0           0     0
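Since the result is a plain matrix with named dimensions, individual cells and marginal totals can be read by genotype name. The sketch below does not call gsalib; it rebuilds the matrix by hand from the example output shown above, purely to illustrate the indexing.

```r
# Rebuild the example output by hand (values copied from the output above).
genotypes <- c("NO_CALL", "HOM_REF", "HET", "HOM_VAR", "UNAVAILABLE", "MIXED")
counts <- matrix(
  c(0, 0,       0,       0,     0, 0,
    0, 0,       0,       0,     0, 0,
    0, 0,   13463,      90,  3901, 0,
    0, 0,    2935,   18144,  4448, 0,
    0, 0, 2053693, 1326112, 11290, 0,
    0, 0,       0,       0,     0, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(EvalGenotypes = genotypes, CompGenotypes = genotypes)
)

# Rows are Eval genotypes, columns are Comp genotypes.
het_vs_homvar <- counts["HET", "HOM_VAR"]
print(het_vs_homvar)

# Total number of sites with a HOM_VAR call in Eval, across all Comp genotypes.
homvar_eval_total <- rowSums(counts)[["HOM_VAR"]]
print(homvar_eval_total)
```

The same indexing applies to the matrix returned by gsa.reshape.concordance.table(report).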