A website used by more than 20,000 scientists to analyze their genomic data is getting an upgrade, thanks to a four-year, nearly $870,000 National Institutes of Health grant.
Professor Xijin Ge of SDSU’s Department of Mathematics and Statistics developed the integrated differential expression and pathway tool, known as iDEP, to help researchers decipher their RNA sequence data. The free website, http://bioinformatics.sdstate.edu/idep, has been available since 2017.
“I specialize in bioinformatics, a relatively new field focused on the analysis of genomics data,” explained Ge. “Biologists today are flooded with data. Lack of access to bioinformaticians is a critical barrier for many researchers, especially those in institutions with limited resources. Our goal is to empower biologists to analyze their own data, interactively and reproducibly.”
The iDEP site allows researchers to visualize changes in gene expression, measured as the abundance of the RNAs that code for proteins, across experimental conditions. These comparisons can tell a scientist, for instance, how plant tissues respond to drought.
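For readers curious what such a comparison looks like in practice, here is a minimal sketch in Python. It is not iDEP's code or method (iDEP is built in R and uses far more rigorous statistics); the gene names, expression values, pseudocount, and threshold are all made up for illustration. It computes a log2 fold change between a control and a drought condition and labels each gene by the direction of change.

```python
import math
from statistics import mean

# Hypothetical normalized expression values; these numbers are invented
# for illustration and do not come from any real experiment.
expression = {
    "geneA": {"control": [100.0, 110.0, 90.0], "drought": [400.0, 420.0, 380.0]},
    "geneB": {"control": [50.0, 55.0, 45.0], "drought": [52.0, 48.0, 50.0]},
    "geneC": {"control": [200.0, 210.0, 190.0], "drought": [50.0, 55.0, 45.0]},
}


def log2_fold_change(control, treated, pseudocount=1.0):
    """Log2 ratio of mean treated to mean control expression.

    The pseudocount (an assumed value) avoids division by zero for
    genes with no detected expression.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def classify(lfc, threshold=1.0):
    """Label a gene by direction of change; the threshold is an assumption
    (1.0 on the log2 scale corresponds to a twofold change)."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


results = {
    gene: classify(log2_fold_change(values["control"], values["drought"]))
    for gene, values in expression.items()
}

for gene, label in results.items():
    print(gene, label)
```

A real analysis would also test whether each change is statistically significant across replicates, which this sketch leaves out.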
How iDEP began
Initially, Ge developed iDEP for SDSU researchers who wanted to analyze RNA sequence data from soybeans. To fulfill their request, he wrote code to do statistical analyses and produce visual interpretations of the gene expression data.
A few months later, he got a similar request from SDSU researchers working with mice. “I tweaked the code a little bit and gave the results to the second group of researchers,” he said. “I realized that the same code base could benefit many others. That was the starting point for this project.”
Ge used Shiny, a software package for the R statistical programming language, to build the interactive webpages that deliver the data analytics. He then worked with the Office of Research Computing to transfer the code and host iDEP on the university server, using the high-performance computing facilities.
“We made it powerful yet easy to use,” Ge said. “The site offers not just analytics but also pre-compiled annotation data, which is essential for the interpretation of genomic data.” Through the web interface, researchers can access annotation data for more than 2,500 species of organisms.
“Scientists can get more results per click than at any other website,” Ge said, noting researchers can generate 30 to 40 different graphs to visualize gene interactions, be they from bacteria, plants, animals, or human tissues.
The SDSU team will make substantial improvements to the website through the NIH R01 grant, which allows the researchers to apply for a continuation award during the project’s final year. “We will convert a prototype into a mature bioinformatics tool,” Ge said.
He will hire a software developer to help rewrite the code, and at least two graduate students will work on the project. “In addition to user suggestions, we have many ideas, including local installation and automatic reports,” Ge said.
Ge and his team will also increase the number of RNA sequence databases to which researchers can compare their results. A postdoctoral research associate will maintain the databases and provide user outreach and support.
“It will be like running a small business, to some extent—and we want to use the website ourselves to do some research too,” Ge concluded.