Spaniel

Rachel Queen

2019-08-07

About Spaniel

Spaniel is an R package designed to visualise results of Spatial Transcriptomics experiments. To install and load Spaniel:

if (!require("devtools")) 
install.packages("devtools")


if (!require("Spaniel")) 
devtools::install_github(repo = "RachelQueen1/Spaniel", 
build_opts = c("--no-resave-data", "--no-manual"), 
build_vignettes = TRUE)

library(Spaniel)
library(Seurat)

Load data

The data used in this vignette was published in Science (AAAS) under the title “Visualization and analysis of gene expression in tissue sections by spatial transcriptomics” on July 1st 2016. A subset of the data that is used in this vignette is included in the package.

To download the full dataset vist:

www.spatialtranscriptomicsresearch.org/datasets/doi-10-1126science-aaf2403/

Expression Matrix

Spaniel requires a data frame of counts where the genes equate to the rows and the columns equate to a Spatial Transcriptomics spot/barcode.

### read in test data
counts <- readRDS(file.path(system.file(package = "Spaniel"), 
                            "extdata/counts.rds"))

The barcodes are named in the colnames of the data frame:

colnames(counts)[1:10]
##  [1] "ACAAACCTACTCCGCTGT" "ACAAAGAGATATAGCCCT" "ACAACTATGGGTTGGCGG"
##  [4] "ACAACTTGTACGGACCCT" "ACAAGAAGCCTGTCCGCG" "ACAAGAATTGACAAGATC"
##  [7] "ACAAGCCGCGCTGTTACG" "ACACAGATCCTGTTCTGA" "ACACAGCGTTGTGGAATC"
## [10] "ACACTAAATGAGCCCTAC"

The genes are named in rownames of the data frame

rownames(counts)[1:10]
##  [1] "Tulp4"   "Arhgap5" "Eef1a1"  "Map1a"   "Gsk3b"   "Qk"      "Tmod2"  
##  [8] "Glul"    "Gpm6b"   "Scd2"

Barcode Coordinates

Spaniel also requires the coordinates of the barcodes. In this example we will load the undjusted coordinates provided by spatial transcriptomics but adjusted coordinates can also be used. The barcodes need to be stored in a tab delimited text file. The first column of the file should contain the barcodes, the second coumn the x coordinate of the barcode and the third column the y coordinate.

barcodesFile <- file.path(system.file(package = "Spaniel"), 
                            "1000L2_barcodes.txt")

It is not necessary to load this file in at this point but for demonstration purposes we will load it now and look at the first few lines of the file.

barcodes <- read.csv(barcodesFile, sep = "\t", header = FALSE)
head(barcodes)
##                   V1 V2 V3
## 1 GTCCGATATGATTGCCGC 32  2
## 2 ATGAGCCGGGTTCATCTT 31  2
## 3 TGAGGCACTCTGTTGGGA 30  2
## 4 ATGATTAGTCGCCATTCG 29  2
## 5 ACTTGAGGGTAGATGTTT 28  2
## 6 ATGGCCAATACTGTTATC 27  2

Create a S4 object

Spaniel includes two functions to create S4 objects containing the counts and barcode information. These are the Seurat object: https://satijalab.org/seurat/ and a SingleCellExperiment object: https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html

Creating a Seurat object

The readSeurat() function can be used to create a Seurat object. The count data is stored in the counts slot of the assay slot of the object, the barcodes are stored in the meta.data slot and the ProjectName and SectionNumber arguments can be used to add information about the Sample and position on slide to the project.name slot of the Seurat object.

SeuratObj <- readSeurat(counts,
                        barcodesFile, 
                        ProjectName = "TestProj", 
                        SectionNumber = 1
)

The barcodes are stored in the metadata:

head(SeuratObj[[]])
##                    orig.ident nCount_RNA nFeature_RNA               spot
## ACAAACCTACTCCGCTGT TestProj1_      68090          399 ACAAACCTACTCCGCTGT
## ACAAAGAGATATAGCCCT TestProj1_      32289          302 ACAAAGAGATATAGCCCT
## ACAACTATGGGTTGGCGG TestProj1_     110044          442 ACAACTATGGGTTGGCGG
## ACAACTTGTACGGACCCT TestProj1_       7401          159 ACAACTTGTACGGACCCT
## ACAAGAAGCCTGTCCGCG TestProj1_      15426          193 ACAAGAAGCCTGTCCGCG
## ACAAGAATTGACAAGATC TestProj1_        354           16 ACAAGAATTGACAAGATC
##                     x  y
## ACAAACCTACTCCGCTGT  5 14
## ACAAAGAGATATAGCCCT 26  9
## ACAACTATGGGTTGGCGG 16 16
## ACAACTTGTACGGACCCT  2 28
## ACAAGAAGCCTGTCCGCG  9 29
## ACAAGAATTGACAAGATC  3  6

Corner of the count data:

GetAssayData(SeuratObj, "counts")[1:10, 1:5]
## 10 x 5 sparse Matrix of class "dgCMatrix"
##         ACAAACCTACTCCGCTGT ACAAAGAGATATAGCCCT ACAACTATGGGTTGGCGG
## Tulp4                  166                  .                 86
## Arhgap5                 79                108                150
## Eef1a1                  59                  .                195
## Map1a                    .                  .                 23
## Gsk3b                   21                  4                186
## Qk                      31                 94                368
## Tmod2                   38                  6                177
## Glul                   100                 81                852
## Gpm6b                  167                189                685
## Scd2                   297                 78                386
##         ACAACTTGTACGGACCCT ACAAGAAGCCTGTCCGCG
## Tulp4                    .                  .
## Arhgap5                  .                  1
## Eef1a1                   4                  2
## Map1a                    .                  .
## Gsk3b                    .                  .
## Qk                       .                133
## Tmod2                    .                  .
## Glul                     1                  .
## Gpm6b                   24                169
## Scd2                     .                  5

Project name

Project(SeuratObj)
## [1] "TestProj1_"

Creating a SingleCellExperiment object

Alternatively, the readSCE function can be used to create a SingleCellExperiment object. This time the ProjectName and SectionNumber are added as a column of the colData of the object. The number of genes and counts per spot are calculated by the calculateQCMetrics() from the scater bioconductor package.

sce <- readSCE(Counts = counts,
                BarcodeFile = barcodesFile, 
                ProjectName = "TestProj", 
                SectionNumber = 1)

Barcodes are stored in the colData

head(colData(sce)[1:5,1:5])
## DataFrame with 5 rows and 5 columns
##                                  spot         x         y     project
##                              <factor> <integer> <integer> <character>
## ACAAACCTACTCCGCTGT ACAAACCTACTCCGCTGT         5        14  TestProj1_
## ACAAAGAGATATAGCCCT ACAAAGAGATATAGCCCT        26         9  TestProj1_
## ACAACTATGGGTTGGCGG ACAACTATGGGTTGGCGG        16        16  TestProj1_
## ACAACTTGTACGGACCCT ACAACTTGTACGGACCCT         2        28  TestProj1_
## ACAAGAAGCCTGTCCGCG ACAAGAAGCCTGTCCGCG         9        29  TestProj1_
##                    is_cell_control
##                          <logical>
## ACAAACCTACTCCGCTGT           FALSE
## ACAAAGAGATATAGCCCT           FALSE
## ACAACTATGGGTTGGCGG           FALSE
## ACAACTTGTACGGACCCT           FALSE
## ACAAGAAGCCTGTCCGCG           FALSE

Corner of the count data:

counts(sce)[1:10, 1:5]
##         ACAAACCTACTCCGCTGT ACAAAGAGATATAGCCCT ACAACTATGGGTTGGCGG
## Tulp4                  166                  0                 86
## Arhgap5                 79                108                150
## Eef1a1                  59                  0                195
## Map1a                    0                  0                 23
## Gsk3b                   21                  4                186
## Qk                      31                 94                368
## Tmod2                   38                  6                177
## Glul                   100                 81                852
## Gpm6b                  167                189                685
## Scd2                   297                 78                386
##         ACAACTTGTACGGACCCT ACAAGAAGCCTGTCCGCG
## Tulp4                    0                  0
## Arhgap5                  0                  1
## Eef1a1                   4                  2
## Map1a                    0                  0
## Gsk3b                    0                  0
## Qk                       0                133
## Tmod2                    0                  0
## Glul                     1                  0
## Gpm6b                   24                169
## Scd2                     0                  5

Project name:

colData(sce)$project[1]
## [1] "TestProj1_"

Load image

The next step is to load the histological image into R. The image example has been scaled down in size, using photoshop, to increase the plotting speeds. The image is cropped to around te edges of the spots.

### Load histological image into R

imgFile <- file.path(system.file(package = "Spaniel"), 
                        "HE_Rep1_resized.jpg")

image <- parseImage(imgFile)

Spatial Transcriptomics using Seurat

The rest of this vignette will focus on using a Seurat workflow.

Quality Control

Assessing the number of genes and number of counts per spot is a useful quality control step. Spaniel allows QC metrics to be viewed on top of the histological image so that any quality issues can be pinpointed. Spots within the tissue region which have a low number of genes or counts may be due to experimental problems which should be addressed. Conversely spots which lie outside of the tissue and have a high number of counts or large number of genes may indicate that there is background contamination.

Visualisation

The plotting function allows the use of a binary filter to visualise which spots pass filtering thresholds. We create a filter to show spots which have greater than 280 genes and a minimum of 67500.

NOTE: The parameters are set for subset of counts used in this data set and will be lower the a dataset where the full count matrix is used.
The filter thresholds will be experiment specific and should be adjusted as necessary.

minGenes <- 280
minUMI <- 67500

filter <- SeuratObj$nCount_RNA > minUMI &
            SeuratObj$nFeature_RNA > minGenes 

ST_plot(Object = SeuratObj, 
        Grob = image, 
        PlotType = "NoGenes", 
        ShowFilter = filter)

Filtering

Spots which don’t pass QC thresholds can then be filtered from the data:

SeuratFiltered <- subset(x = SeuratObj, subset = nFeature_RNA > minGenes &
                                nCount_RNA > minUMI)


ST_plot(Object = SeuratFiltered, 
        Grob = image, 
        PlotType = "NoGenes")