New Proteogenomics Tool for Cancer Research

Image: Graphic showing workflows for the proteogenomics Galaxy tool.

Proteogenomics is a new tool for cancer research. It incorporates genomic-, transcriptomic-, and mass spectrometry (MS)-based proteomics data to verify the expression of protein sequence variants that may be associated with cancer. Proteogenomics is computationally intensive and requires integrating software from different -omics fields. It also requires hardware infrastructure that can handle the large memory and computational demands of very large datasets.

A group of MSI researchers and staff have developed a new resource, based on the Galaxy bioinformatics platform, that addresses the needs of proteogenomics research. Galaxy is an open-source, web-accessible, integrated informatics solution for data-intensive research in biological and medical science. In a project called Galaxy for Proteomics (Galaxy-P), a University of Minnesota team extended the set of Galaxy tools and datatypes to perform proteomics analysis of data generated by mass spectrometry. The new tool for proteogenomics is a further extension of Galaxy-P. It provides workflows and training for two critical aspects of proteogenomics: customized protein sequence database generation, and sequence database searching followed by variant confirmation. Training is available on the public Galaxy instance.
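To give a rough sense of what "variant confirmation" means, the toy Python sketch below digests a reference protein and a variant protein in silico, then reports the peptides that occur only in the variant. It is not the Galaxy-P workflow and uses none of its tools. The function names, the simplified trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages), the minimum peptide length, and the two sequences are all assumptions made up for illustration.

```python
def tryptic_digest(protein, min_len=6):
    """Split a protein into peptides with a simplified trypsin rule.

    Assumption: cleave after K or R unless the next residue is P,
    with no missed cleavages. Peptides shorter than min_len are dropped.
    """
    peptides, start = set(), 0
    for i, residue in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (residue in "KR" and protein[i + 1] != "P"):
            peptide = protein[start:i + 1]
            if len(peptide) >= min_len:
                peptides.add(peptide)
            start = i + 1
    return peptides


def variant_specific_peptides(reference_proteins, variant_proteins, min_len=6):
    """Return, per variant protein, the peptides absent from every reference protein."""
    reference_peptides = set()
    for sequence in reference_proteins.values():
        reference_peptides |= tryptic_digest(sequence, min_len)

    hits = {}
    for name, sequence in variant_proteins.items():
        unique = tryptic_digest(sequence, min_len) - reference_peptides
        if unique:
            hits[name] = sorted(unique)
    return hits


# Made-up toy sequences: the variant differs from the reference by one residue.
reference = {"toy_reference": "MAGTKLVVVGAGGVGKSALTIQR"}
variant = {"toy_variant": "MAGTKLVVVGADGVGKSALTIQR"}

print(variant_specific_peptides(reference, variant))
```

In a real analysis, only a variant-specific peptide like the one reported here could confirm that the variant protein is expressed, and it would still need to be matched to MS/MS spectra and checked for quality. Peptides shared with the reference say nothing about the variant.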

A paper about the new proteogenomics tool was published in 2017 in the journal Cancer Research: MC Chambers*, PD Jagtap, JE Johnson**, T McGowan**, P Kumar, G Onsongo, CR Guerrero*, H Barsnes, M Vaudel, L Martens, B Gruning, IR Cooke, M Heydarian, KL Reddy, and TJ Griffin. 2017. An Accessible Proteogenomics Informatics Resource for Cancer Researchers. Cancer Research 77(21). DOI: 10.1158/0008-5472.CAN-17-0331.

MSI PIs are in bold above:

* MSI users in the Griffin group

  • Matt Chambers
  • Praveen Kumar
  • Candace Guerrero
  • Bjoern Gruening
  • Ira Cooke
  • Mohammad Heydarian

** MSI Staff, RIS group

An article about Galaxy-P appeared on the MSI website in November 2015: Expanding the Galaxy Platform.

Image description: Overview of the proteogenomic informatics resource. The main steps are shown comprising the two core workflows making up this resource: (i) customized protein sequence database generation from RNA-Seq data; and (ii) sequence database searching using MS/MS data and the customized protein database, followed by variant peptide confirmation. Image and description, MC Chambers et al., Cancer Res 77(21). DOI: 10.1158/0008-5472.CAN-17-0331. Image and description © 2017 American Association for Cancer Research.


posted on January 31, 2018
