Biomedical Data Science Shared Resource

Resources

Internal Datasets

As part of our commitment to fostering collaborative research, we have curated a rich collection of internal datasets. These datasets are generated and maintained by our own research community, ensuring their quality, reliability and relevance to various biomedical domains.

Molecular Twin Data Commons
Tempus Lens
OncoBiobank

External Datasets

Recognizing the importance of interdisciplinary research and the need for comprehensive data coverage, we also provide access to a wide array of external datasets. These datasets are sourced from reputable institutions, research organizations and public databases, spanning diverse fields such as genomics, proteomics, medical imaging, clinical trials and more.

All of Us
- The All of Us Research Program is part of an effort by the National Institutes of Health to advance individualized healthcare by enrolling 1 million+ participants to contribute their health data over many years. If you need instructions on accessing the data, email: groupresearchdatascience@cshs.org
TCGA
Other highly accessed shared databases

Workstations

We currently have two high-powered Dell Precision workstations for computational analysis and software compilation, and one Lambda Vector workstation for training machine-learning and deep-learning models, data mining and prediction. Their specifications are as follows:

Dell Precision

Dual 24-core CPUs (48 cores total, up to 96 threads)
Triple quad graphic card
74TB storage
1TB memory
Running Linux (Ubuntu 20.04)

Lambda Vector

AMD Threadripper Pro 3995WX: 64 cores
4x RTX A6000 GPU
100TB storage
1TB memory
Running Linux (Ubuntu 20.04)

In addition, we have high-performance personal laptops with 1TB storage and 32GB memory, running Windows 11 Pro.

Storage and Backup

The core uses its high-performance workstations for storage and will retain all your data and results for one month.
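Because data and results are kept for a limited time, it can help to check which of your own local copies are approaching that age so you can archive them elsewhere. The following is a minimal sketch, not a tool provided by the core. It assumes that "one month" means 30 days and that file age is judged by modification time; the function name `files_past_retention` and the `results` directory are hypothetical.

```python
import time
from pathlib import Path

# Assumption: the one-month retention window is read here as 30 days.
RETENTION_DAYS = 30


def files_past_retention(root, days=RETENTION_DAYS, now=None):
    """Return files under `root` last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60  # cutoff as a Unix timestamp
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Hypothetical directory; substitute the folder holding your results.
    for path in files_past_retention("results"):
        print(path)
```

Passing a smaller `days` value (for example, 21) gives an earlier warning, leaving time to copy files before the window closes.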

Cloud Computing

We currently have one Amazon Web Services (AWS) instance for web-based needs, and we are expanding our cloud computing resources.

Code and Data Share

Our code is version-controlled on GitHub. All repositories are kept private until publication requirements (or other reasons) call for their release, but they can be shared with collaborators. Results (reports and figures) are sent via email by default, but we are happy to work with whatever file exchange you prefer. For larger file transfers we use Box, but again, we can work with whatever is most convenient for you.

Have Questions or Need Help?

Contact us if you have questions or would like to learn more about the Biomedical Data Science Shared Resource at Cedars-Sinai.

8687 Melrose Ave., Suite G-566
Los Angeles, CA 90069
