Package 'gsalib'

Title: Utility Functions for 'GATK'
Description: Provides utility functions used by the Genome Analysis Toolkit ('GATK') to load tables and plot data. The 'GATK' is a toolkit for variant discovery in high-throughput sequencing data.
Authors: Kiran Garimella
Maintainer: Louis Bergelson
License: MIT + file LICENSE
Version: 2.2.1
Built: 2024-10-10 04:59:05 UTC
Source: https://github.com/broadinstitute/gsalib



Utility functions for GATK

Description

Utility functions for analysis of genome sequence data with the GATK.

Details

This package is primarily meant to be used programmatically by GATK tools. However, the gsa.read.gatkreport() function can also be used directly to read data from a GATKReport, a multi-table document generated by GATK tools.

Author(s)

Kiran Garimella

Maintainer: Louis Bergelson

References

https://gatk.broadinstitute.org/hc/en-us/articles/360035532172-GATKReport-and-gsalib

Examples

test_file = system.file("extdata", "test_gatkreport.table", package = "gsalib")
report = gsa.read.gatkreport(test_file)

Function to read in a GATKReport

Description

This function reads in data from a GATKReport. A GATKReport is a document containing multiple tables produced by the GATK. Each table is loaded as a separate data.frame object in a list.

Usage

gsa.read.gatkreport(filename)

Arguments

filename

The path to the GATKReport file.

Details

The GATKReport format replaces the multi-file output previously produced by many GATK tools with a single, consolidated file. One GATKReport can hold multiple tables, all of which can be loaded into R with this function.

Value

Returns a list in which each element is named after its table (TableName) and holds a data.frame with that table's contents. If multiple tables share the same name, the first keeps TableName and each subsequent one is named TableName.v1, TableName.v2, ..., TableName.vN.
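The naming rule for repeated table names can be illustrated with a small, self-contained R sketch. The helper add_report_table() is hypothetical and not part of gsalib; it only reproduces the documented TableName, TableName.v1, TableName.v2 behavior on a plain list, and gsalib's internal implementation may differ.

```r
# Hypothetical helper (not part of gsalib): store a table in a list,
# following the documented naming rule for repeated table names.
add_report_table <- function(tables, table_name, df) {
  existing <- as.character(names(tables))
  # Count tables already stored as table_name or table_name.vN
  suffix <- substring(existing, nchar(table_name) + 3)
  same_base <- existing == table_name |
    (startsWith(existing, paste0(table_name, ".v")) & grepl("^[0-9]+$", suffix))
  n_used <- sum(same_base)
  # First occurrence keeps its name; later ones get .v1, .v2, ...
  key <- if (n_used == 0) table_name else paste0(table_name, ".v", n_used)
  tables[[key]] <- df
  tables
}

tables <- list()
tables <- add_report_table(tables, "CountVariants", data.frame(n = 1))
tables <- add_report_table(tables, "CountVariants", data.frame(n = 2))
tables <- add_report_table(tables, "CountVariants", data.frame(n = 3))
names(tables)  # CountVariants, CountVariants.v1, CountVariants.v2
```

Tables in a report returned by gsa.read.gatkreport() can be accessed the same way, for example report[["TableName"]] or report[["TableName.v1"]].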

Note

This function accepts different versions of the GATKReport format by making internal calls to gsa.read.gatkreportv0() or gsa.read.gatkreportv1() as appropriate.
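The version dispatch described in this note can be sketched in R as follows. This is an illustration under stated assumptions, not gsalib's code: it assumes the first line of a report contains a token of the form GATKReport.vX.Y (the header lines used below are illustrative), and gatkreport_major_version() and choose_reader() are hypothetical helpers. The sketch returns the name of the reader it would call instead of calling it, so it runs without gsalib installed.

```r
# Hypothetical helper: extract the major format version from the first
# line of a GATKReport. Assumes the line contains "GATKReport.vX.Y".
gatkreport_major_version <- function(first_line) {
  m <- regmatches(first_line,
                  regexec("GATKReport\\.v([0-9]+)\\.[0-9]+", first_line))[[1]]
  if (length(m) == 0) stop("not a recognized GATKReport header: ", first_line)
  as.integer(m[2])  # m[1] is the full match, m[2] the captured major version
}

# Hypothetical dispatcher: returns the name of the reader that would be
# used, rather than calling it, so the sketch runs without gsalib.
choose_reader <- function(first_line) {
  switch(as.character(gatkreport_major_version(first_line)),
         "0" = "gsa.read.gatkreportv0",
         "1" = "gsa.read.gatkreportv1",
         stop("unsupported GATKReport version in: ", first_line))
}

choose_reader("#:GATKReport.v1.1:2")                  # "gsa.read.gatkreportv1"
choose_reader("##:GATKReport.v0.2 CountVariants : x") # "gsa.read.gatkreportv0"
```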

Author(s)

Kiran Garimella

References

https://gatk.broadinstitute.org/hc/en-us/articles/360035532172-GATKReport-and-gsalib

Examples

test_file = system.file("extdata", "test_gatkreport.table", package = "gsalib")
report = gsa.read.gatkreport(test_file)

Function to read in an old-style GATKReport

Description

This function reads data from a version 0.x GATKReport. It should not be called directly; use gsa.read.gatkreport() instead.

Usage

gsa.read.gatkreportv0(lines)

Arguments

lines

The lines read in from the input file.

Value

Returns a list in which each element is named after its table (TableName) and holds a data.frame with that table's contents. If multiple tables share the same name, the first keeps TableName and each subsequent one is named TableName.v1, TableName.v2, ..., TableName.vN.

Author(s)

Kiran Garimella

References

https://gatk.broadinstitute.org/hc/en-us/articles/360035532172-GATKReport-and-gsalib


Function to read in a new-style GATKReport

Description

This function reads data from a version 1.x GATKReport. It should not be called directly; use gsa.read.gatkreport() instead.

Usage

gsa.read.gatkreportv1(lines)

Arguments

lines

The lines read in from the input file.

Value

Returns a list in which each element is named after its table (TableName) and holds a data.frame with that table's contents. If multiple tables share the same name, the first keeps TableName and each subsequent one is named TableName.v1, TableName.v2, ..., TableName.vN.

Author(s)

Kiran Garimella

References

https://gatk.broadinstitute.org/hc/en-us/articles/360035532172-GATKReport-and-gsalib


Reshape a Concordance Table

Description

Given a GATKReport generated by GenotypeConcordance and loaded with gsa.read.gatkreport, this function reshapes the concordance values for a specified sample into a matrix with the Eval genotypes in rows and the Comp genotypes in columns. See the GenotypeConcordance documentation for the definitions of Eval and Comp.

Usage

gsa.reshape.concordance.table(
  report, 
  table.name="GenotypeConcordance_Counts", 
  sample.name="ALL")

Arguments

report

A GATKReport as output by gsa.read.gatkreport. If table.name is NULL, report is assumed to be the vector of concordance values to reshape.

table.name

The table name in the GATKReport to reshape. Defaults to "GenotypeConcordance_Counts", but could also be one of the proportion tables ("GenotypeConcordance_EvalProportions", "GenotypeConcordance_CompProportions"). This value can also be NULL, in which case report is reshaped directly.

sample.name

The sample name within table.name whose values are reshaped (default "ALL").

Value

Returns a two-dimensional matrix with Eval genotypes in the rows and Comp genotypes in the columns. The genotypes themselves (HOM_REF, NO_CALL, etc.) are given as the row and column names of the matrix.
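The reshaping can be pictured with the following self-contained R sketch. It assumes, hypothetically, that the concordance values for one sample are stored as 36 numbers in Eval-major order (all Comp genotypes for the first Eval genotype, then all for the second, and so on) alongside a Sample column; the genotype order is taken from the example output on this page. reshape_concordance() and concordance_for_sample() are illustrative helpers, not gsalib functions, and the real table layout may differ.

```r
# Genotype order assumed from the example output on this page.
genotypes <- c("NO_CALL", "HOM_REF", "HET", "HOM_VAR", "UNAVAILABLE", "MIXED")

# Hypothetical helper: reshape a flat vector of Eval/Comp values into a
# square matrix. Assumes Eval-major order (byrow = TRUE), so values[1:6]
# are the Comp columns for Eval = NO_CALL, values[7:12] for HOM_REF, etc.
reshape_concordance <- function(values) {
  stopifnot(length(values) == length(genotypes)^2)
  matrix(as.numeric(values), nrow = length(genotypes), byrow = TRUE,
         dimnames = list(EvalGenotypes = genotypes,
                         CompGenotypes = genotypes))
}

# Hypothetical helper: pick one sample's row from a table that has a
# Sample column followed by one column per Eval/Comp pair.
concordance_for_sample <- function(counts, sample_name = "ALL") {
  row <- counts[counts$Sample == sample_name,
                setdiff(names(counts), "Sample"), drop = FALSE]
  if (nrow(row) != 1) stop("expected exactly one row for sample ", sample_name)
  reshape_concordance(unlist(row))
}

# Toy data (not real GenotypeConcordance output): two samples, 36 values each.
counts <- data.frame(Sample = c("ALL", "S1"), rbind(1:36, 36:1))

m <- concordance_for_sample(counts, "ALL")
m["HET", "HOM_VAR"]  # 16: third Eval row (HET), fourth Comp column (HOM_VAR)
```

Naming the dimnames (EvalGenotypes, CompGenotypes) is what makes the printed matrix carry those labels as axis headers.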

Author(s)

Phillip Dexheimer

See Also

gsa.read.gatkreport

Examples

test_file = system.file("extdata", "test_genconcord.table", package = "gsalib")
report = gsa.read.gatkreport(test_file)
gsa.reshape.concordance.table(report)

## Output looks like:
##               CompGenotypes
## EvalGenotypes NO_CALL HOM_REF     HET HOM_VAR UNAVAILABLE MIXED
##   NO_CALL           0       0       0       0           0     0
##   HOM_REF           0       0       0       0           0     0
##   HET               0       0   13463      90        3901     0
##   HOM_VAR           0       0    2935   18144        4448     0
##   UNAVAILABLE       0       0 2053693 1326112       11290     0
##   MIXED             0       0       0       0           0     0