Halting Data Overload: Scalable Storage for Research Data Management | Information Technology | University of Pittsburgh

Vincent Concel has one foot in the technical world and the other in the world of clinical research. As the director of research technology services for the School of Medicine’s Department of Pediatrics, he supports the cutting-edge technology that is the backbone of modern medical research. As his department generated ever-larger data files, he found himself managing them on a hodgepodge of storage devices. Finding the right storage solution was critical to supporting the department’s groundbreaking research.

Data Creep Becomes a Data Flood

As the modern workplace becomes more reliant on technology, data creep is inevitable. For business units, spreadsheets, documents, presentations, graphics, emails, and webpages seem to spontaneously multiply.

But big data from research is a whole different universe. It’s not data creep; it’s a data flood. As the technology that supports research advances, the amount of data produced grows exponentially. Concel is very familiar with this phenomenon. “New equipment allows us to do amazing work, but it also generates data at a rate that is incredibly challenging to accommodate,” he says. New gene sequencers can produce 500 GB of data in a single two-hour run, while a live-cell analysis system can exceed its 20 TB of on-board storage in just a couple of weeks.

As Pediatrics consolidates its systems, ingesting data from many existing sources along the way, Concel expects the data load to explode from its current 100 TB to a petabyte (PB) in just the next six months. He anticipates 10-15% growth per year after that.
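To put those growth figures in perspective, here is a rough sketch of the projection in Python. It assumes the 10-15% annual growth compounds year over year, that 1 PB equals 1,000 TB, and a five-year horizon; none of these assumptions is spelled out in Concel's estimate, and the function name is purely illustrative.

```python
def project_storage(start_tb: float, annual_growth: float, years: int) -> list[float]:
    """Return projected storage in TB at the end of each year, compounding annually."""
    sizes = []
    current = start_tb
    for _ in range(years):
        current *= 1 + annual_growth  # assumption: growth compounds once per year
        sizes.append(round(current, 1))
    return sizes


START_TB = 1000.0  # 1 PB after consolidation, assuming 1 PB = 1,000 TB
YEARS = 5          # assumption: illustrative planning horizon

low = project_storage(START_TB, 0.10, YEARS)   # 10% per year
high = project_storage(START_TB, 0.15, YEARS)  # 15% per year

for year, (low_tb, high_tb) in enumerate(zip(low, high), start=1):
    print(f"Year {year}: {low_tb:,.0f} to {high_tb:,.0f} TB")
```

Under these assumptions, the data load would reach roughly 1.6 to 2.0 PB five years after consolidation.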

The Limits of a Swiss Army Knife Approach

Initially, Pediatrics responded to data growth by purchasing more storage devices: file servers, external drives, network-attached storage, cloud storage. Eventually, Concel found himself managing nine on-premises devices, with some faculty procuring their own devices or cloud services. “I had no clean method for securing data or tracking what was filling up and what was reaching end of life. There was no centralized control and security. I was babysitting this discombobulated, patchwork system,” Concel recalls.

Not only was the situation inefficient and labor-intensive, it was also becoming extremely expensive. Concel estimated a $1.6 million price tag for a 2 PB data consolidation and backup solution with an anticipated lifespan of five years. That figure didn’t include the cost of hosting at Pitt’s Network Operations Center (NOC) or server maintenance. So he reached out to Pitt IT’s John Bell to talk about better options. Bell had been evaluating Isilon from Dell EMC and thought it might be an ideal solution.
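As a back-of-the-envelope illustration, the up-front estimate can be spread over its capacity and lifespan to get an effective monthly rate. This sketch assumes 1 PB equals 1,000 TB and that the full 2 PB is in use for the entire five years; the function name is hypothetical, and hosting and maintenance are left out because the article gives no figures for them.

```python
UPFRONT_COST_USD = 1_600_000  # estimate quoted by Concel
CAPACITY_TB = 2 * 1000        # 2 PB, assuming 1 PB = 1,000 TB
LIFESPAN_MONTHS = 5 * 12      # anticipated five-year lifespan


def amortized_cost_per_tb_month(cost_usd: float, capacity_tb: float, months: int) -> float:
    """Spread a one-time purchase evenly across full capacity and lifespan."""
    return cost_usd / (capacity_tb * months)


rate = amortized_cost_per_tb_month(UPFRONT_COST_USD, CAPACITY_TB, LIFESPAN_MONTHS)
print(f"${rate:.2f} per TB per month")  # about $13.33 under these assumptions
```

This is a best case for the purchased hardware. Any capacity that sits unused, plus the excluded hosting and maintenance costs, would push the effective rate higher.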

Isilon: The Right Tool for the Job

Isilon is a scalable enterprise storage platform from Dell EMC for high-volume storage, backup, and archiving of data. Isilon forms an off-site cluster of storage servers that can accommodate both active data sets and long-term storage. Pricing is based on the actual amount of data stored at any given time and includes NOC hosting and all maintenance services, resulting in a predictable monthly cost.

Concel was immediately struck by Isilon’s affordability. “When I saw the price, I thought, ‘Why didn’t we jump on this the minute it came on board?’ It provides pay-as-you-go bulk pricing for storage, hosting, and maintenance, which makes it much cheaper in total.” Rather than paying $1.6 million up front for anticipated capacity, Pediatrics was able to get started with just a $32,000 investment.

Beyond the cost and unlimited capacity, Isilon also provides the features that Concel requires. It enables centralized control of users and access, without having to deal with server maintenance tasks. Given the critical nature of the data, Isilon’s redundancy, self-service restores, and other fail-safe security features are a must. And of course, the reliable, multi-gigabit connection ensures excellent performance for large-scale data access.

Feeling the Impact

Once Concel decided to move forward with Isilon, it took only a couple of weeks to fully migrate. After performing the final sync on a Friday, he came in the following Monday and nine servers had become one. The simplified solution has consolidated and streamlined storage, improved performance, and drastically reduced overhead. As for all those old devices, he repurposed the higher-capacity ones, but most were at (or past) their end of life and have been retired.

Having a single, centrally managed storage solution has freed Concel’s team from the time they spent patching servers, upgrading hardware, managing firewalls, and more. “My staff and I now spend most of our time helping people do their research, instead of dealing with tech glitches. I’m so glad I don’t have to deal with that anymore.”

Because Isilon is available to anyone at Pitt, Concel has found that collaborating with other departments is easier than ever. Rather than sending data via the internet, he can just add a researcher to a group and they can connect to the mapped drives. He is now talking with IT admins from other health sciences departments about how they might benefit from this technology.

Concel encourages IT managers to work with Pitt IT to find solutions to their tech challenges. “It is through collaboration with Pitt IT that we can find these solutions. Don’t be afraid to just ask. The solution can benefit you, and then roll out to help the rest of the University.”

-- By Karen Beaudway, Pitt IT Blogger