Single-cell RNA sequencing shows the immunosuppressive landscape and tumor heterogeneity of HBV-associated hepatocellular carcinoma是2021年6月刚发表在NC上的文章,主要是利用单细胞转录组分析HCC患者的肿瘤异质性和免疫相关内容。本文主要复现这篇文章的生物信息分析方面的结果,包括上游的比对和定量,下游的异质性和细胞互作等内容。
The upstream analyses of Single cell RNA-seq data¶
Data collection¶
The sequence data for 8 liver carcinoma was deposited in NCBI with accession code of SRP318499. We download the bam files of 8 samples using ascp software. The size of 8 samples was listed below:
du -sh rawData/*bam
## 56G rawData/095_possorted_genome_bam.bam
## 23G rawData/104_possorted_genome_bam.bam
## 22G rawData/106_possorted_genome_bam.bam
## 51G rawData/114_possorted_genome_bam.bam
## 45G rawData/119_possorted_genome_bam.bam
## 22G rawData/713_possorted_genome_bam.bam
## 27G rawData/725_possorted_genome_bam.bam
## 53G rawData/740_possorted_genome_bam.bam
Gene expression quantification¶
We then convert bam files to fastq format, count gene expression for each samples, and aggregate 8 count matrices to one matrix. Take sample with 119 id for example:
# 1. the barcoded BAM files are covered to FASTQ files with the 10x Genomics bamtofastq tool
bamtofastq --nthreads=8 119_possorted_genome_bam.bam 119
# 2. perform read alignment, UMI counting,
cd 119
cellranger count --id=119 --transcriptome=/media/zhangfeng/myData/reference/refdata-gex-GRCh38-2020-A --fastqs=trial5d_count_new_0_1_HG3WLDMXX --chemistry=SC3Pv2 # note: the default parameter of chemistry would throw error, the SC3Pv2 should be defined
# 3. use the cellranger aggr pipeline to aggregate outputs from multiple runs of cellranger count, normalize runs to the same effective sequencing depth
cellranger aggr --id=aggr --csv=libraries.csv
We can review the summary from the
web_summary.html.
The aggr analysis detected a error : Low Post-Normalization Read Depth
of 19.8% , which means that there may be large differences in sequencing
depth across the input libraries.
So we have to quality control for each samples and then combine them
together.
Quality control¶
Firstly, I check the quality of features, counts, and the percentage of mitochondrial genes
library(Seurat)
load("cluster/2_qc/rawSeurats.RData")
dim(rawSeurats);
## [1] 26428 41630
table(rawSeurats@active.ident)
##
## 095 104 106 114 119 713 725 740
## 8388 540 757 11855 7535 1368 2464 8723
VlnPlot(rawSeurats,features = c("nFeature_RNA"))

VlnPlot(rawSeurats,features = c("nCount_RNA"))

VlnPlot(rawSeurats,features = c("percent.mt"))
I perform the qulity control analysis as proposed by Current best practices in single‐cell RNA‐seq analysis: a tutorial, which was published by Luecken at 2019. The distributions of these QC covariates are examined for outlier peaks that are filtered out by thresholding. After quality control, the features, counts, and the percentage of mitochondrial genes are checked again:
load("cluster/2_qc/cleanSeurats.RData")
dim(cleanSeurats);
## [1] 26428 17371
table(cleanSeurats@active.ident)
##
## 095 104 106 114 119 713 725 740
## 3719 406 435 5721 3030 894 1439 1727
VlnPlot(cleanSeurats,features = c("nFeature_RNA"))

VlnPlot(cleanSeurats,features = c("nCount_RNA"))

VlnPlot(cleanSeurats,features = c("percent.mt"))

The plots show that the outlier value are deleted.
Cluster and annotation¶
The Seurat package then is used to preform PCA, reduction, cluster, UMAP and TSNE analysis
1. PCA¶
I conduct the PCA analysis for all cells :
load("cluster/3_cluster/cleanSeurats_standard.RData")
DimPlot(object = cleanSeurats, reduction = "pca",label =T)

DimPlot(object = cleanSeurats, reduction = "pca",label =T,group.by = "orig.ident")

The PCA plots show that maybe batch effect exist. The CCA method is used to adjust batch effect. I then perform the CCA analysis to adjust the batch effect:
load("cluster/3_cluster/cleanSeurats_adjusted.RData")
DimPlot(object = cleanAdjustSeurats, reduction = "tsne",label =T,group.by = "orig.ident")

DimPlot(object = cleanAdjustSeurats, reduction = "umap",label =T,group.by = "orig.ident")

Due to the heterogeneity patients with liver cancer, the tumor cell across cancer samples should be independent each other. However, these cells are mixed. I think that the batch effect is over-estimated. Finally, I decide to run standard workflow as Seurat proposed. The step of batch effect adjustion analysis is skipped.
2. Reduction and TSNE analysis¶
The UMAP and TSNE plot are drew below:
load("cluster/3_cluster/cleanSeurats_standard.RData")
DimPlot(object = cleanSeurats, reduction = "tsne",label =T)

DimPlot(object = cleanSeurats, reduction = "tsne",label =T,group.by = "orig.ident")

The patients with tumor cell are split, which is coincident with our common sense. However, in comparision with TSNE, the UMAP is better, which can clearly show the difference between case identify, and infiltrating immnue cells.
3. UMAP analysis¶
DimPlot(object = cleanSeurats, reduction = "umap",label =T)

DimPlot(object = cleanSeurats, reduction = "umap",label =T,group.by = "orig.ident")

Taken together, we used UMAP to show the cluster of different cell types. The SingleR and markers are combined to define different cell types.
load("cluster/3_cluster/cleanSeurats_anno.RData")
DimPlot(object = cleanSeurats, reduction = "umap",label =T,group.by = "celltype_fine")

The downstream analyses of Single cell RNA-seq data¶
Cell type heterogeneity¶
We find that the cells from the same case are tended to cluster together:
DimPlot(object = cleanSeurats, reduction = "umap",label =T,group.by = "celltype_fine")

However, different HCC case have different proportion of immune cells. The proportion of different cells in 8 HCC cases are calculated :
library(data.table)
library(ggplot2)
library(ggpubr)
library(stringr)
load("cluster/3_cluster/cleanSeurats_anno.RData")
############### T cells and macrophages
# cell proportion in cases
anno = cleanSeurats@meta.data %>% setDT()
cellPro = anno[,.(totalNum=.N,
Non_M2=sum(celltype_fine=="Macrophages")/.N,
M2=sum(celltype_fine=="M2-macrophages")/.N,
CD8=sum(celltype_fine=="CD8 cells")/.N,
CD4=sum(celltype_fine=="CD4 cells")/.N,
B_cells=sum(celltype_fine=="B cells")/.N,
NK=sum(celltype_fine=="NK cells")/.N,
Tumor=sum(str_detect(celltype_fine,"^#"))/.N,
Other=sum(celltype_fine%in%c("Endothelial cells","Treg cells"))/.N),by="orig.ident"]
rowSums(cellPro[,-c(1:2)]) # equal to 1
## [1] 1 1 1 1 1 1 1 1
y=as.matrix(t(cellPro[,-c(1:2)]))
barplot(y,xlim=c(0, ncol(y) + 4),
col=c("red","black","blue","yellow","gray","green","white","purple"),
names.arg=cellPro$orig.ident,
legend.text=TRUE,
args.legend=list(
x=ncol(y) + 4,
y=max(colSums(y)),
bty ="n"
))

T cells and macrophages¶
The correlation between proportion of T cells and macrophages in cases are calculated:
# correlation
ggplot(cellPro, aes(x = M2, y = CD8)) +
geom_point() +
stat_smooth(method = "lm", col = "blue")+
stat_cor(label.x = 0, label.y = -0.2) +
labs(title="scRNA-seq cohort", x="M2 macrophages proportion", y="CD8T proportion")+
theme(plot.title = element_text(hjust = 0.5))

To compared with bulk RNA-seq , the immune cell deconvolution analysis is performed, and the correaltion relationship is represent by linear model :
# tcga immune cell
lihc = read.csv("cluster/6_prognosis/TCGA_LIHC_TME_results.csv",head=T)
ggplot(lihc, aes(x = Macrophages.M2, y = T.cells.CD8)) +
geom_point() +
stat_smooth(method = "lm", col = "blue")+
stat_cor(label.x = 0.4, label.y = 0.4)+
labs(title="TCGA LIHC cohort", x="M2 macrophages proportion", y="CD8T proportion")+
theme(plot.title = element_text(hjust = 0.5))

We also detect inverse correlation between the proportions of infiltrating T cells and macrophages.
Immunosuppressive marker expression of TAMs¶
We examined the expression of reported immunosuppressive genes in TAMs:
load("cluster/3_cluster/cleanSeurats_anno.RData")
FeaturePlot(cleanSeurats, features = c("LAIR1","HAVCR2","LGALS9","VSIR"))

DimPlot(object = cleanSeurats, label =T,group.by = "celltype_fine")

These markers are enriched in TAM cells. In addition, the pattern of CD163 (M2 macrophages marker) expression and LAIR1 is similar:
cellKeep = colnames(cleanSeurats)[cleanSeurats@meta.data$celltype_main=="Macrophages"]
VlnPlot(subset(cleanSeurats,cells=cellKeep), features = c("CD163", "LAIR1", "CD68","HAVCR2","LGALS9","VSIR"))

These suggest the immunosuppressive function of TAMs might be exerted via LAIR1.
In addition, the high expression of LAIR1 and HAVCR2 is significantly associated with poorer overall or disease-free survival of HCC patients.
The overall survival for LAIR1
,
and for HAVCR2
.
The disease-free survival for LAIR1
, and for HAVCR2
.
Cell trajectory¶
library(monocle3)
library(SeuratWrappers)
load("cluster/3_cluster/cleanSeurats_anno.RData")
get_earliest_principal_node <- function(cds, cluster="1"){
cell_ids <- which(colData(cds)[, "seurat_clusters"] == cluster)
closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
root_pr_nodes <- igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names(which.max(table(closest_vertex[cell_ids,]))))]
root_pr_nodes
}
cds <- as.cell_data_set(cleanSeurats)
cds <- estimate_size_factors(cds)
cds@rowRanges@elementMetadata@listData[["gene_short_name"]] <- rownames(cleanSeurats[["RNA"]])
We do the single cell trajectory for CD8 population, and observed that gradual transition of CD8 T cells towards the subpopulations with exhaustion status was indicated by the upregulation of PDCD1 (PD-1) and TIGIT (T cell immunoreceptor with Ig and ITIM domains).
cds_subset <- cds[, colData(cds)$seurat_clusters%in% c(10,16)]
cds_subset <- cluster_cells(cds_subset)
cds_subset <- learn_graph(cds_subset)
cds_subset <- order_cells(cds_subset,root_pr_nodes=get_earliest_principal_node(cds_subset,cluster=10))
# for CD8
plot_cells(cds_subset,
label_cell_groups=T,
color_cells_by = "pseudotime",
show_trajectory_graph=F)

plot_cells(cds_subset,
label_cell_groups=T,
genes=c("PDCD1","TIGIT"),
show_trajectory_graph=F)

We find that exhaustion status was indicated by the downregulation of FCGR3A (activation marker) and upregulation of KLRC1 (exhaustion marker).
# for NK cells
cds_subset <- cds[, colData(cds)$seurat_clusters%in% c(15)]
cds_subset <- cluster_cells(cds_subset)
cds_subset <- learn_graph(cds_subset)
cds_subset <- order_cells(cds_subset,root_pr_nodes=get_earliest_principal_node(cds_subset,cluster=15))
plot_cells(cds_subset,
label_cell_groups=T,
color_cells_by = "pseudotime",
show_trajectory_graph=F)

plot_cells(cds_subset,
label_cell_groups=T,
genes=c("FCGR3A","KLRC1"),
show_trajectory_graph=F)

We find the transition toward a more immunosuppressive state for Treg cells, as indicated by the enriched PDCD1 expression.
cds_subset <- cds[, colData(cds)$seurat_clusters%in% c(5,29)]
cds_subset <- cluster_cells(cds_subset)
cds_subset <- learn_graph(cds_subset)
cds_subset <- order_cells(cds_subset,root_pr_nodes=get_earliest_principal_node(cds_subset,cluster=5))
plot_cells(cds_subset,
label_cell_groups=T,
color_cells_by = "pseudotime",
show_trajectory_graph=F)

plot_cells(cds_subset,
label_cell_groups=T,
genes=c("PDCD1"),
show_trajectory_graph=F)

For macrophages, a dynamic transition towards more immunosuppressive M2 macrophages featured by CD163 and LAIR1 expression.
# for TAMs
cds_subset <- cds[, colData(cds)$seurat_clusters%in% c(3,9,14,18,21)]
cds_subset <- cluster_cells(cds_subset)
cds_subset <- learn_graph(cds_subset)
cds_subset <- order_cells(cds_subset,root_pr_nodes=get_earliest_principal_node(cds_subset,cluster=9))
plot_cells(cds_subset,
label_cell_groups=T,
color_cells_by = "pseudotime",
show_trajectory_graph=F)

plot_cells(cds_subset,
label_cell_groups=T,
genes=c("CD163","LAIR1"),
show_trajectory_graph=F)

Regarding the CD4 T cells, they were mainly type 1 (Th1) and type 2T helper (Th2) cells, as indicated by their respective enrichment in the expression of STAT4 (cell cluster 11) and GATA3 (cell cluster 12).
cds_subset <- cds[, colData(cds)$seurat_clusters%in% c(1,2,6,17)]
cds_subset <- cluster_cells(cds_subset)
cds_subset <- learn_graph(cds_subset)
cds_subset <- order_cells(cds_subset,root_pr_nodes=get_earliest_principal_node(cds_subset,cluster=2))
plot_cells(cds_subset,
label_cell_groups=T,
color_cells_by = "pseudotime",
show_trajectory_graph=F)

plot_cells(cds_subset,
label_cell_groups=T,
genes=c("STAT4","GATA3"),
show_trajectory_graph=F)

Immune checkpoints¶
Immune checkpoints function as “brakes” on T cell immune responses resulting in the weakening of T cell attack and immune tolerance or escape. We examined the cell-cell interaction status in different cell, and the co-stimulatory and co-inhibitory checkpoints in shaping the immunosuppressive andscape in HCC are estimated :
mypvals <- read.delim("cluster/7_interaction/out/pvalues.txt", check.names = FALSE)
mymeans <- read.delim("cluster/7_interaction/out/means.txt", check.names = FALSE)
costimulatory <- grep("ICOS|ICOSLG|TNFRSF4|TNFRSF8|TNFRSF9|TNFRSF18|CD40LG|CD40|CD27|CD70|CD28|CD80|CD86|CD226|PVR|NECTIN1|NECTIN2|NECTIN3|NECTIN4|TNFSF14|TNFRSF14",
mymeans$interacting_pair,value = T)
coinhibitory <- grep("PDCD1|CD274|PDCD1LG2|HAVCR2|LGALS9|LAIR1|PTPN6|PTPN11|CTLA4|CD80|CD86|TIGIT|PVR|NECTIN1|NECTIN2|NECTIN3|NECTIN4|CD160|BTLA|TNFRSF14",
mymeans$interacting_pair,value = T)
# plot for co-stimulatory
meansdf = mymeans %>% dplyr::filter(interacting_pair %in% costimulatory) %>%
dplyr::select(c("interacting_pair","CD4 cells|Tumor","CD8 cells|Tumor","Treg cells|Tumor","NK cells|Tumor",
"CD4 cells|Macrophages","CD8 cells|Macrophages","Treg cells|Macrophages","NK cells|Macrophages")) %>%
reshape2::melt()
colnames(meansdf)<- c("interacting_pair","CC","means")
meansdf$joinlab <- paste0(meansdf$interacting_pair,"_",meansdf$CC)
pvalsdf = mypvals %>% dplyr::filter(interacting_pair %in% costimulatory)%>%
dplyr::select(c("interacting_pair","CD4 cells|Tumor","CD8 cells|Tumor","Treg cells|Tumor","NK cells|Tumor",
"CD4 cells|Macrophages","CD8 cells|Macrophages","Treg cells|Macrophages","NK cells|Macrophages"))%>%
reshape2::melt()
colnames(pvalsdf)<- c("interacting_pair","CC","pvals")
pvalsdf$joinlab <- paste0(pvalsdf$interacting_pair,"_",pvalsdf$CC)
ccInter <- merge(pvalsdf,meansdf,by = "joinlab")
summary(ccInter$means)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0160 0.0765 0.1538 0.2085 1.1570
ccInter %>% dplyr::filter(means > 0.1) %>%
ggplot(aes(CC.x,interacting_pair.x) )+
geom_point(aes(size=means,colour= -log10(pvals+0.0001))) +
scale_color_continuous(name=c("-log10(P-value)"),low = "black", high = "red")+
theme_bw()+ labs(x="",y="")+
theme(axis.text.x = element_text(angle = -45,hjust = -0.1,vjust = 0.8))

# plot for co-inhibitory
meansdf = mymeans %>% dplyr::filter(interacting_pair %in% coinhibitory) %>%
dplyr::select(c("interacting_pair","CD4 cells|Tumor","CD8 cells|Tumor","Treg cells|Tumor","NK cells|Tumor",
"CD4 cells|Macrophages","CD8 cells|Macrophages","Treg cells|Macrophages","NK cells|Macrophages")) %>%
reshape2::melt()
colnames(meansdf)<- c("interacting_pair","CC","means")
meansdf$joinlab <- paste0(meansdf$interacting_pair,"_",meansdf$CC)
pvalsdf = mypvals %>% dplyr::filter(interacting_pair %in% coinhibitory)%>%
dplyr::select(c("interacting_pair","CD4 cells|Tumor","CD8 cells|Tumor","Treg cells|Tumor","NK cells|Tumor",
"CD4 cells|Macrophages","CD8 cells|Macrophages","Treg cells|Macrophages","NK cells|Macrophages"))%>%
reshape2::melt()
colnames(pvalsdf)<- c("interacting_pair","CC","pvals")
pvalsdf$joinlab <- paste0(pvalsdf$interacting_pair,"_",pvalsdf$CC)
ccInter <- merge(pvalsdf,meansdf,by = "joinlab")
summary(ccInter$means)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.02775 0.10650 0.17020 0.23875 1.15700
ccInter%>% dplyr::filter(means > 0.1) %>%
ggplot(aes(CC.x,interacting_pair.x) )+
geom_point(aes(size=means,colour= -log10(pvals+0.0001))) +
scale_color_continuous(name=c("-log10(P-value)"),low = "black", high = "red")+
theme_bw()+ labs(x="",y="")+
theme(axis.text.x = element_text(angle = -45,hjust = -0.1,vjust = 0.8))

We identified a prominent co-inhibitory signal via the TIGIT-NECTIN2 axis in T cells and antigen-presenting cells (APCs, i.e. macrophages and tumor cells):
FeaturePlot(cleanSeurats, features = c("TIGIT","NECTIN2"))

Cell interactions¶
We evaluated the degree of cell-cell communication according to different ligand-receptor relationships. The tumor cells contributing ligands, tumor cells contributing receptors, TAMs contributing ligands and TAMs contributing receptors are presented as :
# plot for tumor ligand
meansdf = mymeans %>% dplyr::select("interacting_pair",starts_with("Tumor"))
meansdf = meansdf[!duplicated(meansdf$interacting_pair),]
row.names(meansdf) = meansdf$interacting_pair
meansdf = as.matrix(meansdf[,-1])
pvalsdf = mypvals[!duplicated(mypvals$interacting_pair),]
row.names(pvalsdf) = pvalsdf$interacting_pair
pvalsdf = pvalsdf[row.names(meansdf),colnames(meansdf)]
meansdf = meansdf[apply(pvalsdf,1,function(x) { sum(x<0.05)>3}),]
pheatmap::pheatmap(meansdf)

# plot for tumor receptor
meansdf = mymeans %>% dplyr::select("interacting_pair",ends_with("Tumor"))
#meansdf = meansdf[!str_detect(meansdf$interacting_pair,pattern=" "),]
meansdf = meansdf[!duplicated(meansdf$interacting_pair),]
row.names(meansdf) = meansdf$interacting_pair
meansdf = as.matrix(meansdf[,-1])
pvalsdf = mypvals[!duplicated(mypvals$interacting_pair),]
row.names(pvalsdf) = pvalsdf$interacting_pair
pvalsdf = pvalsdf[row.names(meansdf),colnames(meansdf)]
meansdf = meansdf[apply(pvalsdf,1,function(x) { sum(x<0.05)>3}),]
pheatmap::pheatmap(meansdf)

#plot for TAM ligand
meansdf = mymeans %>% dplyr::select("interacting_pair",starts_with("Macrophages"))
#meansdf = meansdf[!str_detect(meansdf$interacting_pair,pattern=" "),]
meansdf = meansdf[!duplicated(meansdf$interacting_pair),]
row.names(meansdf) = meansdf$interacting_pair
meansdf = as.matrix(meansdf[,-1])
pvalsdf = mypvals[!duplicated(mypvals$interacting_pair),]
row.names(pvalsdf) = pvalsdf$interacting_pair
pvalsdf = pvalsdf[row.names(meansdf),colnames(meansdf)]
meansdf = meansdf[apply(pvalsdf,1,function(x) { sum(x<0.05)>3}),]
pheatmap::pheatmap(meansdf)

# plot for TAM receptor
meansdf = mymeans %>% dplyr::select("interacting_pair",ends_with("Macrophages"))
#meansdf = meansdf[!str_detect(meansdf$interacting_pair,pattern=" "),]
meansdf = meansdf[!duplicated(meansdf$interacting_pair),]
row.names(meansdf) = meansdf$interacting_pair
meansdf = as.matrix(meansdf[,-1])
pvalsdf = mypvals[!duplicated(mypvals$interacting_pair),]
row.names(pvalsdf) = pvalsdf$interacting_pair
pvalsdf = pvalsdf[row.names(meansdf),colnames(meansdf)]
meansdf = meansdf[apply(pvalsdf,1,function(x) { sum(x<0.05)>3}),]
pheatmap::pheatmap(meansdf)

In general, HCC tumor cells frequently interacted with immune cells.
Subclonal heterogeneity¶
The unsupervised hierarchical clustering of LCSC markers is shown as
_cluster" />.
All cells are split into 7 clusters, one of which including 6 cells can be ignored:
_umap" />
We find that the modest correlation between case identity and LCSC marker group status.
To further explore the heterogeneity landscape of HCC tumor cell, the CNV status is inferred using infercnv package.
The results showed that there are 9 major group of HCC tumor cells.
These cells from 9 major group are clustered together according to their case identify.
Idea¶
- to research molecular mechanism of key gene based on conditional knockout mice, regrading their basic structure and function, novelty, and unique expressed in certain cell.
- using new technology to sequence the HCC sample, such as spatial single RNA-seq technology.
Comments
欢迎留言,使用 GitHub 账号登录即可评论。