Everything about data, including open-source healthcare datasets and more, in one location.
MESSIDOR: Methods to Evaluate Segmentation and Indexing Techniques in the field of Retinal Ophthalmology
A dataset of 1,200 retinal images with annotations.
RIGA Dataset: Retinal fundus images for glaucoma analysis
A de-identified dataset of retinal fundus images for glaucoma analysis (RIGA), derived from three sources, with 750 original images and 4,500 manually marked images.
High-Resolution Fundus (HRF) Image Database
The public database contains 15 images of healthy patients, 15 images of patients with diabetic retinopathy and 15 images of glaucomatous patients.
DR HAGIS: Diabetic Retinopathy, Hypertension, Age-related macular degeneration and Glaucoma ImageS
39 images for the development of vessel extraction algorithms suitable for retinal screening programmes.
NSRR Datasets: National Sleep Research Resource
Polysomnography datasets from the NSRR for sleep studies. A large collection of de-identified physiological signals, well suited to ML development.
NLST Datasets: National Lung Screening Trial (National Cancer Institute)
Datasets from the National Cancer Institute covering approximately 54,000 participants. They include data on participant characteristics, screening exam results, diagnostic procedures, lung cancer, and mortality. Images from over 75,000 CT screening exams are available, and over 1,200 pathology images from a subset of NLST lung cancer patients (~500 of over 2,000 patients) may be viewed.
The HAM10000 dataset
A large collection of more than 10,000 multi-source dermatoscopic images of common pigmented skin lesions.
Associated publication:
The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions
Philipp Tschandl, Cliff Rosendahl & Harald Kittler
UCI Machine Learning Repository
This open-source repository hosts more than 400 datasets, including 100+ healthcare datasets and many non-healthcare ones, in a searchable, categorized format.
Centers for Medicare & Medicaid Services (CMS) datasets and ResDAC
CMS provides US Medicare and Medicaid datasets.
ResDAC (the Research Data Assistance Center) provides free support to users of CMS datasets.
Centers for Disease Control and Prevention (CDC) Datasets
Datasets from the CDC, useful for incidence, prevalence, and mortality data on various disorders from across the US.
Healthcare Cost and Utilization Project (HCUP) datasets
The Agency for Healthcare Research and Quality's HCUP datasets are used to identify, track, and analyze US national trends in health care utilization, access, charges, quality, and outcomes.
The UK government's National Health Service (NHS) datasets. NHS Choices datasets are useful for NLP and sentiment analysis covering both GPs and hospitals.
OASIS Brain MRI dataset
Brain MRI datasets from the Open Access Series of Imaging Studies (OASIS).
A free and open platform for sharing MRI, MEG, EEG, iEEG, and ECoG data with over 200 datasets.
National Cancer Institute (NCI) SEER datasets
Cancer epidemiology data available through NCI's Surveillance, Epidemiology, and End Results (SEER) Program.
Broad Institute's Cancer Program datasets
Cancer and genomics datasets.
The largest publicly available dataset of de-identified chest X-rays: 370,000+ chest X-rays with 14 labels in the first version.
Version 2.0 adds radiologist text reports and additional data.
A dataset of 14,000+ anonymized, radiologist-labeled musculoskeletal X-ray studies from 12,000+ patients, released by the Stanford ML Group.
Read their article: https://arxiv.org/abs/1712.06957
An anonymized dataset of 1,500+ knee MRIs from NYU.
NLTK: Natural Language Toolkit
A one-stop resource for learning natural language processing and more.
Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc.
Data science article collection
An excellent collection of articles on data science.
eICU Collaborative Research Database
A de-identified database of more than 130,000 patients from 208 US hospitals, available for collaborative research. It includes vital signs, clinical notes, APACHE scores, diagnoses, treatment information, and more.
Google Dataset Search
Google's search engine dedicated to finding datasets.
NIH CXR14 dataset
Over 100,000 anonymized chest x-ray images and their corresponding data from more than 30,000 patients, including many with advanced lung disease.
NIH Deep Lesion
An NIH dataset of 32,000 CT images with annotated lesions from 4,400 unique patients.
Blue Button 2.0
A CMS initiative to democratize research and development using beneficiary data, with data on more than 70 million patients available.
National Institutes of Health
NIH's strategic plan for data science in healthcare: a must-read for anyone using healthcare data for research and innovation.
NIH Clinical Center
The largest open-source chest X-ray dataset, available through the NIH Clinical Center. The accompanying article links to the data, which is also available through GitHub and Kaggle.
Thanks to Andrew L. Beam and the many other contributors to the GitHub page, which can also be reached through the BRAINX COMMUNITY on LinkedIn.
Kaggle is a good source of de-identified healthcare datasets.
An excellent dataset for text-based and waveform-based projects, used in research worldwide.
A biomedical data search engine that finds datasets across registries.
A platform for biomedical research where you can store, share, or find data.
Detailed data repositories for biomedical research.