99 changes: 49 additions & 50 deletions R/createArbalistMAE.R
@@ -2,52 +2,54 @@
#'
#' Import scATAC-seq or multiome results into a MultiAssayExperiment.
#'
#' @param sample.names String vector containing the sample names. Must be the
#' same length as fragment.files & filtered.feature.matrix.files.
#' @param fragment.files String vector containing the fragment file names and
#' @param sample.names Character vector containing the sample names. Must be the
#' same length as fragment.files & feature.matrix.files.
#' @param fragment.files Character vector containing the fragment file names and
#' paths. ex. 10x Cell Ranger result output atac_fragments.tsv.gz or
#' fragments.tsv.gz
#' @param filtered.feature.matrix.files String vector containing the filtered
#' @param feature.matrix.files Character vector containing the raw or filtered
#' feature matrix file names and paths. ex. 10x Cell Ranger result output
#' filtered_feature_bc_matrix.h5 or filtered_tf_bc_matrix.h5
#' @param barcode.annotation.files String vector containing barcode annotation
#' to put in the SingleCellExperiment colData. ex. 10x Cell Ranger result
#' output per_barcode_metrics.csv or singlecell.csv
#' @param sample.annotation.files String vector containing sample annotation to
#' put in the MultiAssayExperiment colData. ex. 10x Cell Ranger result output
#' summary.csv
#' @param multiome Logical whether to use createMultiomeRNASCE on the
#' filtered.feature.matrix.files to extract the RNA features and create a
#' SingleCellExperiment. If NULL, then will become TRUE if
#' filtered.feature.matrix.files contain "filtered_feature_bc_matrix.h5"
#' otherwise FALSE.
#' @param min.frags Number specifying the minimum number of mapped ATAC-seq
#' fragments required per cell to pass filtering for use in downstream
#' analyses. Cells containing greater than or equal to min.frags total
#' fragments will be retained.
#' @param max.frags Number specifying the maximum number of mapped ATAC-seq
#' fragments required per cell to pass filtering for use in downstream
#' analyses. Cells containing less than or equal to max.frags total fragments
#' will be retained.
#' @param gene.grs Genomic Ranges specifying gene coordinates for creating the
#' gene score matrix. If NULL, then the geneset will be selected based on the
#' genome version.
#' @param use.alt.exp Logical for selecting the MultiAssayExperiment structure.
#' TRUE means that there will only be one experiment in the
#' MultiAssayExperiment and all other experiments will be in alternative
#' experiments. This option is only available if the columns are the same for
#' all Matrices. FALSE means that each Matrix will be a separate experiment in
#' the MAE.
#' @param barcode.annotation.files Character vector containing barcode
#' annotation to include in the \linkS4class{SingleCellExperiment}'s colData.
#' ex. 10x Cell Ranger result output per_barcode_metrics.csv or singlecell.csv
#' @param sample.annotation.files Character vector containing sample annotation
#' to include in the \linkS4class{MultiAssayExperiment}'s colData. ex. 10x
#' Cell Ranger result output summary.csv
#' @param multiome Logical scalar whether to extract the RNA features using
#' \code{createMultiomeRNASCE} on the feature.matrix.files and create a
#' \linkS4class{SingleCellExperiment}. If \code{NULL}, this is inferred based
#' on the feature.matrix.files - \code{TRUE} if feature.matrix.files contain
#' "filtered_feature_bc_matrix.h5" otherwise \code{FALSE}.
#' @param min.frags Integer scalar specifying the minimum number of mapped
#' ATAC-seq fragments required per cell to pass filtering for use in
#' downstream analyses. Only cells containing total fragments greater than or
#' equal to \code{min.frags} will be retained.
#' @param max.frags Integer scalar specifying the maximum number of mapped
#' ATAC-seq fragments required per cell to pass filtering for use in
#' downstream analyses. Only cells containing total fragments less than or
#' equal to \code{max.frags} will be retained.
#' @param gene.grs \linkS4class{GRanges} specifying gene coordinates for
#' creating the gene score matrix. If \code{NULL}, then the gene set will be
#' selected based on the genome version inferred from the fragment files.
#' @param use.alt.exp Logical scalar for selecting the
#' \linkS4class{MultiAssayExperiment} structure. \code{TRUE} means there will
#' only be one experiment (a \linkS4class{SingleCellExperiment}) in the
#' \linkS4class{MultiAssayExperiment} and all other experiments will be in
#' alternative experiments. This option should only be used if the columns are
#' the same across all matrices. \code{FALSE} means each matrix will be a
#' separate experiment in the MAE.
#' @param main.exp.name String containing the name of the experiment that will
#' be the main experiment when use.alt.exp is TRUE.
#' @param filter.rna.features.without.intervals Logical whether to remove
#' GeneExpression Matrix features from the h5.files that do not have interval
#' specified. Often these are mitochondria genes.
#' be used as the main experiment in the \linkS4class{SingleCellExperiment}
#' when \code{use.alt.exp} is \code{TRUE}.
#' @param filter.rna.features.without.intervals Logical scalar whether to remove
#' 'GeneExpression' matrix features from the \code{feature.matrix.files} that
#' do not have an interval specified. Often these are mitochondrial genes.
#' @inheritParams createMultiomeRNASCE
#' @inheritParams getExpListFromFragments
#'
#' @return A \linkS4class{MultiAssayExperiment}
#'
#' @return A \linkS4class{MultiAssayExperiment}.
#'
#' @author Natalie Fox
#' @importFrom MultiAssayExperiment colData colData<-
#' @importFrom SingleCellExperiment altExp<-
@@ -90,7 +92,7 @@
#' mae <- createArbalistMAE(
#' sample.names = "Sample1",
#' fragment.files = f,
#' filtered.feature.matrix.files = h5,
#' feature.matrix.files = h5,
#' seq.lengths = seq_lengths,
#' output.dir = example_dir,
#' BPPARAM = BiocParallel::SerialParam()
@@ -99,7 +101,7 @@
#' @export
createArbalistMAE <- function(sample.names,
fragment.files,
filtered.feature.matrix.files,
feature.matrix.files,
barcode.annotation.files = NULL,
sample.annotation.files = NULL,
output.dir = tempdir(),
@@ -115,8 +117,8 @@ createArbalistMAE <- function(sample.names,
BPPARAM = bpparam()) {
if (length(fragment.files) != length(sample.names)) {
stop('fragment.files and sample names need to be the same length.')
} else if (length(filtered.feature.matrix.files) != length(sample.names)) {
stop('filtered.feature.matrix.files and sample names need to be the same length.')
} else if (length(feature.matrix.files) != length(sample.names)) {
stop('feature.matrix.files and sample names need to be the same length.')
} else if (!is.null(barcode.annotation.files) &&
length(barcode.annotation.files) != length(sample.names)) {
stop('barcode.annotation.files and sample names need to be the same length.')
@@ -128,9 +130,9 @@ createArbalistMAE <- function(sample.names,
barcodes.list <- list()
barcode.anno.list <- list()
barcode.anno <- NULL
for (i in seq_along(filtered.feature.matrix.files)) {
filtered.file <- filtered.feature.matrix.files[i]
h5_barcodes <- h5read(filtered.feature.matrix.files[i], 'matrix/barcodes')
for (i in seq_along(feature.matrix.files)) {
filtered.file <- feature.matrix.files[i]
h5_barcodes <- h5read(feature.matrix.files[i], 'matrix/barcodes')
if (sample.names[i] %in% names(barcodes.list)) {
barcodes.list[[paste0(sample.names[i], '_', length(grep(
sample.names[i], names(barcodes.list)
@@ -171,10 +173,7 @@ createArbalistMAE <- function(sample.names,

# Add a SingleCellExperiment for RNA results if this is a multiome result
if (is.null(multiome)) {
if (any(grepl(
"filtered_feature_bc_matrix.h5",
filtered.feature.matrix.files
))) {
if (any(grepl("filtered_feature_bc_matrix.h5", feature.matrix.files))) {
multiome <- TRUE
} else {
multiome <- FALSE
@@ -183,7 +182,7 @@ createArbalistMAE <- function(sample.names,

if (multiome) {
all.exp[['GeneExpressionMatrix']] <- createMultiomeRNASCE(
h5.files = filtered.feature.matrix.files,
h5.files = feature.matrix.files,
sample.names = sample.names,
filter.features.without.intervals = filter.rna.features.without.intervals
)
6 changes: 3 additions & 3 deletions R/createArbalistMAEFromCellrangerDirs.R
@@ -3,12 +3,12 @@
#' Import results from scATAC-seq or multiome Cell Ranger results directories
#' into a MultiAssayExperiment.
#'
#' @param cellranger.res.dirs Vector of strings specifying a Cell Ranger
#' @param cellranger.res.dirs Named character vector specifying a Cell Ranger
#' scATAC-seq or multiome results directory. Vector names need to be sample
#' names.
#' @inheritParams createArbalistMAE
#'
#' @return A \linkS4class{MultiAssayExperiment}
#' @return A \linkS4class{MultiAssayExperiment}.
#'
#' @author Natalie Fox
#' @export
@@ -70,7 +70,7 @@ createArbalistMAEFromCellrangerDirs <- function(cellranger.res.dirs,
cellranger.res.dirs,
c('atac_fragments.tsv.gz', 'fragments.tsv.gz')
),
filtered.feature.matrix.files = .getFilesFromResDirs(
feature.matrix.files = .getFilesFromResDirs(
cellranger.res.dirs,
c(
'filtered_feature_bc_matrix.h5',
20 changes: 11 additions & 9 deletions R/createMultiomeRNASCE.R
@@ -3,18 +3,19 @@
#' Import RNA results from multiome Cell Ranger results into a
#' SingleCellExperiment.
#'
#' @param h5.files Vector of strings specifying filtered_feature_bc_matrix.h5
#' path. ex. could just be filtered_feature_bc_matrix.h5. Vector must be the
#' same length as sample.names.
#' @param sample.names Vector of strings specifying sample names. Vector must be
#' the same length as h5.files.
#' @param h5.files Character vector specifying paths to
#' filtered_feature_bc_matrix.h5 files. ex. could just be
#' filtered_feature_bc_matrix.h5. Vector must be the same length as
#' \code{sample.names}.
#' @param sample.names Character vector specifying sample names. Vector must be the
#' same length as \code{h5.files}.
#' @param feature.type String specifying the feature type to select from
#' filtered_feature_bc_matrix.h5.
#' @param filter.features.without.intervals Logical whether to remove features
#' from the h5.files that do not have interval specified. Often these are
#' mitochondria genes.
#' @param filter.features.without.intervals Logical scalar whether to remove
#' features from the \code{h5.files} that do not have an interval specified.
#' Often these are mitochondrial genes.
#'
#' @return A \linkS4class{SingleCellExperiment}
#' @return A \linkS4class{SingleCellExperiment}.
#'
#' @author Natalie Fox
#' @importFrom BiocGenerics which
@@ -26,6 +27,7 @@
#' @importFrom SummarizedExperiment SummarizedExperiment cbind rowData
#' rowRanges<- rowData<-
#' @importFrom utils object.size read.csv
#'
#' @export
createMultiomeRNASCE <- function(h5.files,
sample.names = NULL,
9 changes: 6 additions & 3 deletions R/extractGenomeRefFilenameFromFragmentFile.R
@@ -1,10 +1,13 @@
#' Retrieves genome reference file name
#'
#' Helper function to extract genome reference file name from fragment file header.
#' Helper function to extract genome reference file name from fragment file
#' header.
#'
#' @param fragment.file String specifying fragment file name
#' @return Named string vector
#' @param fragment.file String specifying fragment file name.
#' @return Character vector containing the file path to the genome reference
#' file.
#' @author Natalie Fox
#'
#' @examples
#' # Mock a fragment file
#' f <- tempfile(fileext=".tsv.gz")
50 changes: 33 additions & 17 deletions R/filterDuplicateFeatures.R
@@ -4,15 +4,17 @@
#' criteria, such as the feature with the highest values. Other duplicate
#' features are removed.
#'
#' @param se \linkS4class{SummarizedExperiment}
#' @param mcol.name String specifying the colname for the experiment rowData
#' @param se \linkS4class{SummarizedExperiment}.
#' @param mcol.name String specifying the column name containing
#' feature IDs in the experiment's rowData.
#' @param summary.stat Function to summarize each feature (row) of the
#' experiment
#' experiment.
#' @param selection.metric Function to select the row to keep when there are
#' duplicate rows with the same mcol.name
#' duplicate rows with the same \code{mcol.name}.
#'
#' @importFrom SummarizedExperiment mcols assay
#' @return A \linkS4class{SummarizedExperiment} with duplicate features resolved.
#' @return A \linkS4class{SummarizedExperiment} with duplicate features
#' resolved.
#' @examples
#' library(SummarizedExperiment)
#'
@@ -26,22 +28,36 @@
#'
#' @export
filterDuplicateFeatures <- function(se,
mcol.name = 'name',
mcol.name = "name",
summary.stat = sum,
selection.metric = max) {
duplicate.values <- names(which(table(mcols(se)[, mcol.name]) > 1))
if (length(duplicate.values) == 0) {
ids <- mcols(se)[[mcol.name]]
dup_flag <- duplicated(ids) | duplicated(ids, fromLast = TRUE)

# if no duplicates, return
if (!any(dup_flag)) {
return(se)
}
non.duplicate.rows <- which(!mcols(se)[, mcol.name] %in% duplicate.values)
duplicate.rows <- which(mcols(se)[, mcol.name] %in% duplicate.values)
selected.duplicate.rows <- sapply(duplicate.values, function(i) {
duplicate.rows <- which(mcols(se)[, mcol.name] %in% i)
row.summary.stats <- apply(assay(se)[duplicate.rows, ], 1, summary.stat)
return(duplicate.rows[which(row.summary.stats == selection.metric(row.summary.stats))[1]])
})

se <- se[sort(c(non.duplicate.rows, selected.duplicate.rows)), ]
# precompute summary statistic for all rows
row_summary <- apply(assay(se), 1, summary.stat)

dup_indices <- which(dup_flag)
dup_ids <- ids[dup_flag]

# Split duplicate indices by ID, pick one row per group
idx_list <- split(dup_indices, dup_ids)

selected_dup_rows <- vapply(
idx_list,
FUN.VALUE = integer(1L),
FUN = function(ix) {
vals <- row_summary[ix]
ix[which(vals == selection.metric(vals))[1]]
}
)

return(se)
non_dup_rows <- which(!dup_flag)
keep <- sort(c(non_dup_rows, selected_dup_rows))
se[keep, ]
}
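
The rewritten selection logic in `filterDuplicateFeatures` can be illustrated on plain vectors. The following is a minimal sketch rather than the package's code: the `ids` and `counts` toy data are hypothetical, and `sum` and `max` stand in for the `summary.stat` and `selection.metric` defaults. It flags every row whose ID is duplicated, computes one summary value per row up front, then keeps a single row per duplicated ID.

```r
# Hypothetical toy data (not from the PR): five features, two cells.
ids <- c("geneA", "geneB", "geneA", "geneC", "geneB")
counts <- matrix(
    c(1, 2,
      5, 5,
      3, 4,
      0, 1,
      2, 2),
    ncol = 2, byrow = TRUE
)

# Flag every row whose ID occurs more than once (first and later occurrences).
dup_flag <- duplicated(ids) | duplicated(ids, fromLast = TRUE)

# One summary value per row, computed once for the whole matrix.
row_summary <- apply(counts, 1, sum)

# Group the duplicated row indices by ID and pick one row per group:
# the first row whose summary equals the group's maximum.
idx_list <- split(which(dup_flag), ids[dup_flag])
selected <- vapply(
    idx_list,
    FUN.VALUE = integer(1L),
    FUN = function(ix) {
        vals <- row_summary[ix]
        ix[which(vals == max(vals))[1]]
    }
)

# Keep the non-duplicated rows plus the selected duplicates, in original order.
keep <- sort(c(which(!dup_flag), selected))
kept_ids <- ids[keep]
# kept_ids: "geneB" "geneA" "geneC"
```

Here row 3 wins for `geneA` (row sum 7 versus 3) and row 2 wins for `geneB` (10 versus 4). Computing `row_summary` once avoids re-subsetting the assay for each duplicated ID, which appears to be the motivation for the change.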
37 changes: 19 additions & 18 deletions R/getExpListFromFragments.R
@@ -2,21 +2,21 @@
#'
#' Create a list of single cell experiments from fragment files
#'
#' @param fragment.files Vector of strings specifying fragment files. Vector
#' names need to be sample names.
#' @param output.dir String containing the directory where files should be
#' output while creating the \linkS4class{MultiAssayExperiment}.
#' @param gene.grs Genomic Ranges specifying gene coordinates for creating the
#' gene score matrix. If NA, the gene accessibility matrix will not be
#' created.
#' @param barcodes.list A List with samples as names and the values a vector of
#' barcodes for that sample. If NULL, all barcodes from the fragment file will
#' be used.
#' @param fragment.files Named character vector specifying fragment files.
#' Vector names need to be sample names.
#' @param output.dir String specifying the output directory where files are
#' written while creating the \linkS4class{MultiAssayExperiment}.
#' @param gene.grs \linkS4class{GRanges} specifying gene coordinates for
#' creating the gene score matrix. If \code{NULL}, the gene accessibility
#' matrix will not be created.
#' @param barcodes.list Named list with sample names as names and, as values,
#' a character vector of barcodes to keep for that sample. If \code{NULL},
#' all barcodes from the fragment file will be used.
#' @param BPPARAM A \linkS4class{BiocParallelParam} object indicating how matrix
#' creation should be parallelized.
#' @inheritParams saveTileMatrix
#'
#' @return A list of experiments
#' @return A list of experiments.
#'
#' @author Natalie Fox
#' @importFrom SingleCellExperiment mainExpName<-
@@ -160,13 +160,14 @@ getExpListFromFragments <- function(fragment.files,
#' Create a SingleCellExperiment from a list of delayed matrices using
#' AmalgamatedArray.
#'
#' @param h5.res.list A list containing delayed matrices with HDF5 backends that
#' will be combined using AmalgamatedArray into a SingleCellExperiment. List
#' item names should be the sample name for the delayed matrix.
#' @param grs GRange object to be used for the rowRanges of the resulting
#' SingleCellExperiment
#' @param h5.res.list List containing delayed matrices with HDF5 backends that
#' will be combined using \linkS4class{AmalgamatedArray} into a
#' \linkS4class{SingleCellExperiment}. List item names should be the sample
#' name for the delayed matrix.
#' @param grs \linkS4class{GRanges} object to be used for the rowRanges of the
#' resulting \linkS4class{SingleCellExperiment}.
#'
#' @return A SingleCellExperiment
#' @return A SingleCellExperiment.
#'
#' @author Natalie Fox
#' @importFrom alabaster.matrix AmalgamatedArray
@@ -183,7 +184,7 @@ getExpListFromFragments <- function(fragment.files,
#' grs <- GenomicRanges::GRanges("chr1", IRanges::IRanges(1:5, width=1))
#'
#' sce <- getSCEFromH5List(h5.list, grs)
#'
#'
getSCEFromH5List <- function(h5.res.list, grs) {
# Combine the per-sample results into one matrix
if (length(h5.res.list) == 1) {
2 changes: 1 addition & 1 deletion R/mockFragmentFile.R
@@ -6,7 +6,7 @@
#' @param seq.lengths Named integer vector containing the lengths of the
#' reference sequences used for alignment. Vector names should correspond to
#' the names of the sequences.
#' @param num.fragments Integer scalar, the average number of fragments per
#' @param num.fragments Integer scalar specifying the average number of fragments per
#' cell.
#' @param cell.names Character vector containing the cell names. The length of
#' this vector is used as the total number of cells.