  • Article

scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning

Abstract

Single-cell multiomics data continue to grow at an unprecedented pace. Although several methods have shown promising results in integrating multiple data modalities from the same tissue, the complexity and scale of the data compositions present in cell atlases still pose a challenge. Here we present scJoint, a transfer learning method to integrate atlas-scale, heterogeneous collections of scRNA-seq and scATAC-seq data. scJoint leverages information from annotated scRNA-seq data in a semisupervised framework and trains a neural network simultaneously on labeled and unlabeled data, enabling label transfer and joint visualization within a single integrative framework. Using atlas data as well as multimodal datasets generated with ASAP-seq and CITE-seq, we demonstrate that scJoint is computationally efficient and consistently achieves substantially higher cell-type label accuracy than existing methods while providing meaningful joint visualizations. Thus, scJoint overcomes the heterogeneity of different data modalities to enable a more comprehensive understanding of cellular phenotypes.
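The abstract describes transferring cell-type labels from annotated scRNA-seq cells to unlabeled scATAC-seq cells once both modalities share a joint representation. A minimal sketch of one common way to perform that final step, k-nearest-neighbour majority voting by cosine similarity in a shared embedding, is shown below, together with a simple label-accuracy score. It assumes the embeddings have already been produced by some trained encoder; the function names, the value of k, the tie-breaking rule and the toy vectors are hypothetical and chosen for illustration only. This is not presented as scJoint's own code, which is available in the authors' repository.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def knn_label_transfer(ref_emb, ref_labels, query_emb, k=3):
    """Give each query cell the majority label of its k most similar reference cells.

    Assumption: ref_emb (labeled RNA cells) and query_emb (unlabeled ATAC cells)
    already live in one shared embedding space. Ties in the vote are broken in
    favour of the label held by the nearest of the tied neighbours.
    """
    predictions = []
    for q in query_emb:
        # Rank reference cells by decreasing cosine similarity to this query cell.
        ranked = sorted(range(len(ref_emb)),
                        key=lambda i: cosine(q, ref_emb[i]), reverse=True)
        top = ranked[:k]
        votes = Counter(ref_labels[i] for i in top)
        best = max(votes.values())
        tied = {label for label, count in votes.items() if count == best}
        # Walk neighbours from nearest to farthest; the first tied label wins.
        predictions.append(next(ref_labels[i] for i in top if ref_labels[i] in tied))
    return predictions


def label_accuracy(predicted, truth):
    """Fraction of cells whose transferred label matches a held-out annotation."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Toy example with hypothetical 2-D embeddings.
rna_emb = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
rna_labels = ["T cell", "T cell", "B cell", "B cell"]
atac_emb = [[0.95, 0.15], [0.15, 0.95]]
pred = knn_label_transfer(rna_emb, rna_labels, atac_emb, k=3)
print(pred)                                        # ['T cell', 'B cell']
print(label_accuracy(pred, ["T cell", "B cell"]))  # 1.0
```

Cosine similarity is used here because it ignores the overall scale of each embedding vector; Euclidean distance is an equally plausible choice, and the right k depends on how many reference cells each cell type has.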

Fig. 1: Overview of scJoint.
Fig. 2: Analysis of mouse cell atlas subset data containing 19 overlapping cell types from RNA and ATAC.
Fig. 3: Analysis of mouse cell atlas full data.
Fig. 4: Integration of multimodal PBMC data across biological conditions: with (stimulation) or without (control) T cell activation.
Fig. 5: Analysis of paired gene expression and chromatin accessibility data from SNARE-seq.

Data availability

All single-cell datasets used in this paper are publicly available.

  • Mouse atlas data. The scRNA-seq dataset was downloaded from Tabula Muris (ref. 5; https://tabula-muris.ds.czbiohub.org/). The sci-ATAC-seq dataset of Cusanovich et al. (ref. 26) was downloaded from https://atlas.gs.washington.edu/mouse-atac/.
  • Human fetal atlas data. The scRNA-seq dataset from Cao et al. (ref. 27) was downloaded from GSE156793. The scATAC-seq dataset from Domcke et al. (ref. 28) was downloaded from GSE149683.
  • SNARE-seq data. The SNARE-seq dataset of adult mouse cerebral cortex (ref. 14) was downloaded from GSE126074.
  • Multimodal PBMC data. The ASAP-seq and CITE-seq datasets from Mimitou et al. (ref. 34) were obtained from GSE156478.
  • Human hematopoiesis data. The scRNA-seq and scATAC-seq datasets from Granja et al. (ref. 40) were downloaded from https://github.com/GreenleafLab/MPAL-Single-Cell-2019.

Code availability

scJoint was implemented using PyTorch (v.1.0.0) with code available at https://github.com/SydneyBioX/scJoint.

References

  1. Stuart, T. & Satija, R. Integrative single-cell analysis. Nat. Rev. Genet. 20, 257–272 (2019).

  2. Berger, S. L. The complex language of chromatin regulation during transcription. Nature 447, 407–412 (2007).

  3. Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. Nat. Rev. Genet. 20, 207–220 (2019).

  4. Pott, S. & Lieb, J. D. Single-cell ATAC-seq: strength in numbers. Genome Biol. 16, 172 (2015).

  5. Schaum, N. et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris: the Tabula Muris Consortium. Nature 562, 367 (2018).

  6. Regev, A. et al. Science Forum: the Human Cell Atlas. eLife 6, e27041 (2017).

  7. Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).

  8. Wang, J. et al. Data denoising with transfer learning in single-cell transcriptomics. Nat. Methods 16, 875–878 (2019).

  9. Lin, Y. et al. scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets. Proc. Natl Acad. Sci. USA 116, 9775–9784 (2019).

  10. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).

  11. Wang, T. et al. BERMUDA: a novel deep transfer learning method for single-cell RNA sequencing batch correction reveals hidden high-resolution cellular subtypes. Genome Biol. 20, 165 (2019).

  12. Amodio, M. et al. Exploring single-cell data with deep multitasking neural networks. Nat. Methods 16, 1139–1145 (2019).

  13. Xiong, L. et al. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat. Commun. 10, 4576 (2019).

  14. Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).

  15. Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).

  16. Jin, S., Zhang, L. & Nie, Q. scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles. Genome Biol. 21, 25 (2020).

  17. Argelaguet, R. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 21, 111 (2020).

  18. Welch, J. D., Hartemink, A. J. & Prins, J. F. MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics. Genome Biol. 18, 138 (2017).

  19. Amodio, M. & Krishnaswamy, S. MAGAN: aligning biological manifolds. In Proc. 35th International Conference on Machine Learning (eds. Dy, J. & Krause, A.) 215–223 (PMLR, 2018).

  20. Liu, J., Huang, Y., Vert, J.-P. & Noble, W. S. Jointly embedding multiple single-cell omics measurements. Algorithms Bioinform. 143, 10 (2019).

  21. Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).

  22. Duren, Z. et al. Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations. Proc. Natl Acad. Sci. USA 115, 7723–7728 (2018).

  23. Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).

  24. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).

  25. Dai Yang, K. et al. Multi-domain translation between single-cell imaging and sequencing data using autoencoders. Nat. Commun. 12, 31 (2021).

  26. Cusanovich, D. A. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell 174, 1309–1324 (2018).

  27. Cao, J. et al. A human cell atlas of fetal gene expression. Science 370, eaba7721 (2020).

  28. Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).

  29. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  30. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at arXiv https://arxiv.org/abs/1802.03426 (2018).

  31. Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858–871 (2018).

  32. Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).

  33. Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865 (2017).

  34. Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat. Biotechnol. 39, 1246–1258 (2021).

  35. Kim, H. J., Lin, Y., Geddes, T. A., Yang, J. Y. H. & Yang, P. CiteFuse enables multi-modal analysis of CITE-seq data. Bioinformatics 36, 4137–4143 (2020).

  36. Godfrey, D. I., MacDonald, H. R., Kronenberg, M., Smyth, M. J. & Van Kaer, L. NKT cells: what’s in a name? Nat. Rev. Immunol. 4, 231–237 (2004).

  37. Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).

  38. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 (2021).

  39. Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).

  40. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).

  41. Wu, K. E., Yost, K. E., Chang, H. Y. & Zou, J. BABEL enables cross-modality translation between multiomic profiles at single-cell resolution. Proc. Natl Acad. Sci. USA 118, e2023070118 (2021).

  42. Maecker, H. T., McCoy, J. P. & Nussenblatt, R. Standardizing immunophenotyping for the Human Immunology Project. Nat. Rev. Immunol. 12, 191–200 (2012).

  43. Qiu, P. Embracing the dropouts in single-cell RNA-seq analysis. Nat. Commun. 11, 1169 (2020).

  44. Jiang, R., Sun, T., Song, D. & Li, J. J. Zeros in scRNA-seq data: good or bad? How to embrace or tackle zeros in scRNA-seq data analysis? Preprint at bioRxiv (2020).

Acknowledgements

We gratefully acknowledge the following funding sources: Research Training Program Tuition Fee Offset and Stipend Scholarship and Chen Family Research Scholarship to Y.L.; Australian Research Council Discovery Project grant (DP170100654) and AIR@innoHK program of the Innovation and Technology Commission of Hong Kong to J.Y.H.Y.; Australian Research Council DECRA Fellowship (DE180101252) to Y.X.R.W.; NIH grants R01 HG010359 and P50 HG007735 to W.H.W.

Author information

Contributions

T.-Y.W., W.H.W. and Y.X.R.W. conceived and designed this project; Y.L., T.-Y.W. and S.W. performed data preprocessing, model development and evaluation of results; J.Y.H.Y., W.H.W. and Y.X.R.W. supervised the execution; Y.L., J.Y.H.Y., W.H.W. and Y.X.R.W. wrote the manuscript. All authors read and approved the manuscript.

Corresponding authors

Correspondence to Wing H. Wong or Y. X. Rachel Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Nature Biotechnology thanks Jingshu Wang, Nancy Zhang and Qing Nie for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–25, Tables 1 and 2 and Note.

Reporting Summary.

About this article

Cite this article

Lin, Y., Wu, TY., Wan, S. et al. scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning. Nat Biotechnol 40, 703–710 (2022). https://doi.org/10.1038/s41587-021-01161-6

  • DOI: https://doi.org/10.1038/s41587-021-01161-6
