Resources
Internal Datasets
As part of our commitment to fostering collaborative research, we have curated a rich collection of internal datasets. These datasets are generated and maintained by our own research community, ensuring their quality, reliability and relevance to various biomedical domains.
- Molecular Twin Data Commons
- Tempus Lens
- OncoBiobank
External Datasets
Recognizing the importance of interdisciplinary research and the need for comprehensive data coverage, we also provide access to a wide array of external datasets. These datasets are sourced from reputable institutions, research organizations and public databases, spanning diverse fields such as genomics, proteomics, medical imaging, clinical trials and more.
- All of Us
- The All of Us Research Program is part of an effort by the National Institutes of Health to advance individualized healthcare by enrolling 1 million+ participants to contribute their health data over many years. If you need instructions on accessing the data, email: groupresearchdatascience@cshs.org
- TCGA
- Other highly accessed shared databases
Workstations
We currently have two high-powered Dell Precision workstations for computational analysis and software compilation, and one Lambda Vector workstation for training machine-learning and deep-learning models, data mining and prediction. Their specifications are as follows:
Dell Precision
- Dual 24-core CPUs (48 cores total, up to 96 threads)
- Triple quad graphic card
- 74TB storage
- 1TB memory
- Running Linux (Ubuntu 20.04)
Lambda Vector
- AMD Threadripper Pro 3995WX: 64 cores
- 4x RTX A6000 GPU
- 100TB storage
- 1TB memory
- Running Linux (Ubuntu 20.04)
In addition, we have high-performance personal laptops with 1TB storage and 32GB memory, running Windows 11 Pro.
Storage and Backup
The core uses the high-performance workstations for storage and will retain all your data and results for one month.
Cloud Computing
We currently have one Amazon Web Services (AWS) instance for web-based needs, and we are in the process of expanding our cloud computing resources.
Code and Data Share
Our code is version-controlled using GitHub. All repositories are kept private until publication requires their release (or until they are opened for other reasons), but they can be shared with collaborators. Results (reports and figures) are sent via email by default, but we are happy to work with whatever file exchange you prefer. For larger file transfers we use Box, but again, we’re able to work with whatever is most convenient for you.
Have Questions or Need Help?
Contact us if you have questions or would like to learn more about the Biomedical Data Science Shared Resource at Cedars-Sinai.
8687 Melrose Ave., Suite G-566
Los Angeles, CA 90069