Marinka Zitnik

Fusing bits and DNA

BioSNAP Datasets: Stanford Biomedical Network Dataset Collection

We are announcing a repository of biomedical network datasets, BioSNAP Datasets: Stanford Biomedical Network Dataset Collection!

BioSNAP aims to bring biological and medical datasets closer to computer scientists who develop exciting new algorithms. Computer scientists typically have no background in bioinformatics or biostatistics, so they often find it very difficult to obtain and construct high-quality biomedical datasets. As a result, biomedical datasets are rarely used in machine learning (ML) algorithm development and benchmarking, even though biomedicine is one of the most exciting domains for ML, with a unique set of challenges, hard and important problems, and huge potential impact. BioSNAP aims to close this gap by providing a number of ready-to-use network datasets.

BioSNAP contains many large biomedical networks that are ready to use for method development, algorithm evaluation, benchmarking, and network science analyses. In this first release, BioSNAP has a few tens of network datasets that describe a dozen different entity types (e.g., genes, proteins, cells, drugs, diseases, side effects, tissues). These datasets can be used for standard prediction tasks (node clustering, link prediction, node classification) as well as relatively new tasks (graph-level classification, multi-relational link prediction, higher-order association prediction). Many datasets contain weighted networks and can be used to define multi-layer or heterogeneous graphs with attributes.
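To make the link prediction use case concrete, here is a minimal sketch of how a network dataset might be prepared for it. It assumes a tab-separated edge list with one edge per line and `#` comment lines; that layout, the made-up drug and gene IDs in `SAMPLE`, and the helper names `read_edges`, `split_edges`, and `build_adjacency` are illustrative assumptions, not a description of the actual BioSNAP file formats.

```python
import csv
import io
import random
from collections import defaultdict

# Hypothetical sample in a tab-separated edge-list layout; the IDs are made up.
SAMPLE = """# drug\tgene
D1\tG1
D1\tG2
D2\tG2
D2\tG3
D3\tG1
D3\tG3
"""


def read_edges(handle):
    """Parse a tab-separated edge list, skipping comment and blank lines."""
    edges = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        edges.append((row[0], row[1]))
    return edges


def split_edges(edges, test_fraction=0.2, seed=0):
    """Shuffle edges and hold out a fraction as positive test examples."""
    rng = random.Random(seed)
    shuffled = edges[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def build_adjacency(edges):
    """Build an undirected adjacency map from the training edges only."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


edges = read_edges(io.StringIO(SAMPLE))
train, test = split_edges(edges)
adjacency = build_adjacency(train)
```

A link prediction method would then be fit on `adjacency` and scored on how well it ranks the held-out pairs in `test` against sampled non-edges. For a real file, replace `io.StringIO(SAMPLE)` with an opened file handle.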

I look forward to seeing more biomedical network data considered in machine learning and data science research.