Reproducing data in squidpy package • SquidpyR

Introduction

This vignette will demonstrate to the user how to reproduce the MERFISH data that was used in the squidpy manuscript. The authors of squidpy documented their work via a Jupyter notebook that we then translated into R code. The steps below use purely R functions / objects to obtain the MERFISH data object that is present in the squidpy package.

Data preperation

First, users will need to download the raw data which we found on Dryad. We download the data and then do some data manipulation. The steps are outlined below.

Download data

library(rdryad)
dat_path <- dryad_download("10.5061/dryad.8t8s248")
counts <- read.csv(dat_path[[1]])

Subset data

We subset the full data set into metadata and counts, then retrieve the genes.

metadata <- counts[,1:9]
counts_g <- counts[,10:170]

genes <- colnames(counts_g)

Creating the csr matrix

First we replace all NA with 0 then create a csr matrix, dropping all 0 elements.

counts_g <- replace(counts_g, is.na(counts_g), 0)

library(Matrix)
mat <- data.matrix(counts_g)
counts_g_sparse <- as(drop0(mat), "RsparseMatrix")

Creating a SingleCellExperiment object

In the Jupyter notebook the authors choose to create an AnnData object. We at Bioconductor prefer to represent the data as a SummarizedExperiment or SingleCellExperiment object. The primary difference being the composition of the data. We prefer to have the data represented with the genes as rows and the samples as columns. Then we add the spatial data as a reduced dimension.

library(SingleCellExperiment)

## Loading required package: SummarizedExperiment

## Loading required package: MatrixGenerics

## Loading required package: matrixStats

## 
## Attaching package: 'MatrixGenerics'

## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars

## Loading required package: GenomicRanges

## Loading required package: stats4

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which.max, which.min

## Loading required package: S4Vectors

## 
## Attaching package: 'S4Vectors'

## The following objects are masked from 'package:Matrix':
## 
##     expand, unname

## The following object is masked from 'package:utils':
## 
##     findMatches

## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname

## Loading required package: IRanges

## Loading required package: GenomeInfoDb

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## 
## Attaching package: 'Biobase'

## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians

## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians

sce <- SingleCellExperiment(
    assays = list(counts = t(counts_g_sparse)),
    colData = metadata
)

spatial <- colData(sce)[,c("Centroid_X", "Centroid_Y")]
reducedDim(sce, "spatial") <- spatial

Notebook continued

From here we try to follow the Jupyter notebook exactly using our sce object instead of their anndata object. Most is just data transformation. The end result should be the same as what is included in the squpidpy package.

sce$Bregma = sce$Bregma * 100

unique(sce$Animal_ID)

##  [1]  1  2  3  4  5  6  7  8  9 10 11 16 17 18 19 12 13 14 15 20 21 22 23 24 34
## [26] 35 36 31 32 33 25 26 27 28 29 30

sce_filt <- sce[,sce$Animal_ID == 1]
sort(unique(sce_filt$Bregma))

##  [1] -29 -24 -19 -14  -9  -4   1   6  11  16  21  26

sce_list <- sce_filt[,FALSE]
spatial_list <- DataFrame("Centroid_X" = NA, "Centroid_Y" = NA)[numeric(0),]
for (i in sort(unique(sce_filt$Bregma))) {
    s_filt <- sce_filt[,sce_filt$Bregma == i]
    sce_list <- cbind(sce_list, s_filt)
    spatial_filt <- reducedDim(s_filt)
    spatial_filt <- DataFrame(apply(spatial_filt, 2, function(x) (x - min(x))/(max(x)-min(x))))
    spatial_list <- rbind(spatial_list, spatial_filt)
}

sce_total <- sce_list
spatial_total <- spatial_list

spatial3d <- cbind(spatial_total, sce_total$Bregma)
reducedDim(sce_total, "spatial3d") <- spatial3d

colData(sce)

## DataFrame with 1027848 rows and 9 columns
##                        Cell_ID Animal_ID  Animal_sex    Behavior    Bregma
##                    <character> <integer> <character> <character> <numeric>
## 1       6749ccb4-2ed1-4029-9..         1      Female       Naive        26
## 2       6cac74bd-4ea7-4701-8..         1      Female       Naive        26
## 3       9f29bd57-16a5-4b26-b..         1      Female       Naive        26
## 4       d7eb4e0b-276e-47e3-a..         1      Female       Naive        26
## 5       54434f3a-eba9-4aec-a..         1      Female       Naive        26
## ...                        ...       ...         ...         ...       ...
## 1027844 eaaf93ba-75b1-40cd-a..        30        Male      Mating        11
## 1027845 bb9caf59-d960-452e-a..        30        Male      Mating        11
## 1027846 2f45d61d-3a80-470b-8..        30        Male      Mating        11
## 1027847 180ae0ff-9817-48b9-8..        30        Male      Mating        11
## 1027848 83463d3a-29c5-40c3-b..        30        Male      Mating        11
##         Centroid_X Centroid_Y  Cell_class Neuron_cluster_ID
##          <numeric>  <numeric> <character>       <character>
## 1         -3211.56    2608.54   Astrocyte                  
## 2         -3207.92    2621.80  Inhibitory               I-5
## 3         -3209.58    2633.15  Inhibitory               I-6
## 4         -3203.85    2756.05  Inhibitory               I-5
## 5         -3202.68    2608.80  Inhibitory               I-9
## ...            ...        ...         ...               ...
## 1027844    2732.44   -2322.58   Ambiguous                  
## 1027845    2732.67   -2120.23   Ambiguous                  
## 1027846    2826.25   -2308.95   Ambiguous                  
## 1027847    2900.96   -2174.60  Inhibitory               I-2
## 1027848    2817.02   -1716.24   Ambiguous