Reproducing data in squidpy package
Source:vignettes/B_reproduceSquidpyData.Rmd
B_reproduceSquidpyData.Rmd
Introduction
This vignette will demonstrate to the user how to reproduce the MERFISH data that was used in the squidpy manuscript. The authors of squidpy documented their work via a Jupyter notebook that we then translated into R code. The steps below use purely R functions / objects to obtain the MERFISH data object that is present in the squidpy package.
Data preperation
First, users will need to download the raw data which we found on Dryad. We download the data and then do some data manipulation. The steps are outlined below.
Download data
library(rdryad)
dat_path <- dryad_download("10.5061/dryad.8t8s248")
counts <- read.csv(dat_path[[1]])
Subset data
We subset the full data set into metadata and counts, then retrieve the genes.
metadata <- counts[,1:9]
counts_g <- counts[,10:170]
genes <- colnames(counts_g)
Creating the csr matrix
First we replace all NA with 0 then create a csr matrix, dropping all 0 elements.
Creating a SingleCellExperiment object
In the Jupyter notebook the authors choose to create an AnnData object. We at Bioconductor prefer to represent the data as a SummarizedExperiment or SingleCellExperiment object. The primary difference being the composition of the data. We prefer to have the data represented with the genes as rows and the samples as columns. Then we add the spatial data as a reduced dimension.
library(SingleCellExperiment)
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:Matrix':
##
## expand, unname
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
sce <- SingleCellExperiment(
assays = list(counts = t(counts_g_sparse)),
colData = metadata
)
spatial <- colData(sce)[,c("Centroid_X", "Centroid_Y")]
reducedDim(sce, "spatial") <- spatial
Notebook continued
From here we try to follow the Jupyter notebook exactly using our sce object instead of their anndata object. Most is just data transformation. The end result should be the same as what is included in the squpidpy package.
sce$Bregma = sce$Bregma * 100
unique(sce$Animal_ID)
## [1] 1 2 3 4 5 6 7 8 9 10 11 16 17 18 19 12 13 14 15 20 21 22 23 24 34
## [26] 35 36 31 32 33 25 26 27 28 29 30
## [1] -29 -24 -19 -14 -9 -4 1 6 11 16 21 26
sce_list <- sce_filt[,FALSE]
spatial_list <- DataFrame("Centroid_X" = NA, "Centroid_Y" = NA)[numeric(0),]
for (i in sort(unique(sce_filt$Bregma))) {
s_filt <- sce_filt[,sce_filt$Bregma == i]
sce_list <- cbind(sce_list, s_filt)
spatial_filt <- reducedDim(s_filt)
spatial_filt <- DataFrame(apply(spatial_filt, 2, function(x) (x - min(x))/(max(x)-min(x))))
spatial_list <- rbind(spatial_list, spatial_filt)
}
sce_total <- sce_list
spatial_total <- spatial_list
spatial3d <- cbind(spatial_total, sce_total$Bregma)
reducedDim(sce_total, "spatial3d") <- spatial3d
colData(sce)
## DataFrame with 1027848 rows and 9 columns
## Cell_ID Animal_ID Animal_sex Behavior Bregma
## <character> <integer> <character> <character> <numeric>
## 1 6749ccb4-2ed1-4029-9.. 1 Female Naive 26
## 2 6cac74bd-4ea7-4701-8.. 1 Female Naive 26
## 3 9f29bd57-16a5-4b26-b.. 1 Female Naive 26
## 4 d7eb4e0b-276e-47e3-a.. 1 Female Naive 26
## 5 54434f3a-eba9-4aec-a.. 1 Female Naive 26
## ... ... ... ... ... ...
## 1027844 eaaf93ba-75b1-40cd-a.. 30 Male Mating 11
## 1027845 bb9caf59-d960-452e-a.. 30 Male Mating 11
## 1027846 2f45d61d-3a80-470b-8.. 30 Male Mating 11
## 1027847 180ae0ff-9817-48b9-8.. 30 Male Mating 11
## 1027848 83463d3a-29c5-40c3-b.. 30 Male Mating 11
## Centroid_X Centroid_Y Cell_class Neuron_cluster_ID
## <numeric> <numeric> <character> <character>
## 1 -3211.56 2608.54 Astrocyte
## 2 -3207.92 2621.80 Inhibitory I-5
## 3 -3209.58 2633.15 Inhibitory I-6
## 4 -3203.85 2756.05 Inhibitory I-5
## 5 -3202.68 2608.80 Inhibitory I-9
## ... ... ... ... ...
## 1027844 2732.44 -2322.58 Ambiguous
## 1027845 2732.67 -2120.23 Ambiguous
## 1027846 2826.25 -2308.95 Ambiguous
## 1027847 2900.96 -2174.60 Inhibitory I-2
## 1027848 2817.02 -1716.24 Ambiguous