
Victoria Cao

Bioinformatics research generates large volumes of data, so it has become increasingly necessary to find an effective way to move, organize, and analyze them. Managing workflows that run from sequencers to HPC centers to results can be complex, and large data sets are challenging to distribute and analyze. Metadata must also be maintained, both for reproducibility and for data lifecycle management. To automate metadata collection, query data efficiently, and customize data storage hierarchies, we are experimenting with integrating OSIRIS's Amazon S3-compatible services with an iRODS (Integrated Rule-Oriented Data System) middleware server and making these resources available to MSU's supercomputing center. With this multi-layered software architecture, a user can readily access all of the files they need from any server-side storage system and tag those files with metadata based on file type. Consolidating multiple services in this way has the potential to make data workflows much more space- and time-efficient, leading to more productive outcomes in computational science.
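As a rough illustration of tagging files with metadata based on file type, the sketch below maps a file's suffix to a list of attribute/value pairs. The rule table, the attribute names, and the `metadata_for` helper are all hypothetical and are not taken from the project; in a real deployment the resulting pairs would presumably be attached to the corresponding data object by an iRODS rule or client, which is not shown here.

```python
from pathlib import PurePosixPath

# Hypothetical rule table: file suffix -> metadata (attribute, value) pairs.
# The suffixes and attribute names are illustrative assumptions only.
METADATA_RULES = {
    ".fastq": [("data_type", "raw_reads"), ("format", "FASTQ")],
    ".bam": [("data_type", "alignment"), ("format", "BAM")],
    ".vcf": [("data_type", "variants"), ("format", "VCF")],
}

# Suffixes treated as a compression wrapper around the real file type.
COMPRESSION_SUFFIXES = {".gz", ".bz2"}


def metadata_for(path):
    """Return the (attribute, value) pairs implied by a file's suffix."""
    suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
    tags = []
    # Record and strip a trailing compression suffix such as ".gz".
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        tags.append(("compression", suffixes[-1].lstrip(".")))
        suffixes = suffixes[:-1]
    # Look up the remaining suffix; unknown types get no type tags.
    if suffixes:
        tags.extend(METADATA_RULES.get(suffixes[-1], []))
    return tags


if __name__ == "__main__":
    for name in ["runs/sample1.fastq.gz", "aligned/sample1.bam", "notes.txt"]:
        print(name, metadata_for(name))
```

Keeping the rules in a plain table means new file types can be supported without changing the lookup logic.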