Chiminey: connecting scientists to HPC, cloud and big data

Yusuf, I, Thomas, I, Spichkova, M and Schmidt, H 2017, 'Chiminey: connecting scientists to HPC, cloud and big data', Big Data Research, vol. 8, pp. 39-49.

Document type: Journal Article
Collection: Journal Articles

Title Chiminey: connecting scientists to HPC, cloud and big data
Author(s) Yusuf, I; Thomas, I; Spichkova, M; Schmidt, H
Year 2017
Journal name Big Data Research
Volume number 8
Start page 39
End page 49
Total pages 11
Publisher Elsevier
Abstract Scientific experiments increasingly include data, software, computational and simulation elements, and are often embarrassingly parallel, long-running and data-intensive. Frequently, such experiments are run in a cloud environment or on high-end clusters and supercomputers. Many disciplines in science and engineering (outside computer science) find the requisite computational skills attractive on the one hand but a distraction from their science domain on the other. We developed Chiminey under the direction of quantum physicists and molecular biologists to ease the steep learning curve in data management and software platforms that the complex computational target systems require. Chiminey is a smart connector that mediates the running of specialist algorithms developed for workstations with moderately large data sets and relatively small computational grunt. This connector allows domain scientists to choose the target platform and then manages it automatically; it accepts all the parameters necessary to run many instances of their program, regardless of whether it runs on a peak supercomputer, a commercial cloud like Amazon EC2 or (in Australia) the national federated university cloud system NeCTAR. Chiminey negotiates with target system schedulers, dashboards and databases and provides an easy-to-use dashboard interface to the running jobs, regardless of the specific target platform. The smart connector encapsulates and virtualises a number of further aspects that the domain scientists directing our effort found necessary or desirable. In this article we present Chiminey and guide the reader through a hands-on tutorial of this open-source platform. The only requirement is that the reader has access to one of the supported cloud or cluster platforms; very likely there is a matching one.
The tutorial stages range in difficulty from requiring little or no technical background through to advanced sections, such as programming your own domain-specific extension on top of the Chiminey application programming interfaces. The exercises we demonstrate include: installing the Docker deployment environment and the Chiminey system; registering resources for file stores, Hadoop MapReduce and cloud virtual machines; activating the hrmclite and wordcount smart connectors (two demonstrators); running a smart connector and investigating the resulting output files; and building a new smart connector. We also briefly discuss where to find more detailed information on contributing to the Chiminey open-source code base, and what contributing involves.
Subject Software Engineering
Keyword(s) Big data; High performance computing; Parallel processing
DOI - identifier 10.1016/j.bdr.2017.01.004
Copyright notice © 2017 Published by Elsevier Inc.
ISSN 2214-5796
Citation counts: Thomson Reuters Web of Science: cited 4 times; Scopus: cited 0 times
Access Statistics: 124 Abstract Views
Created: Wed, 07 Jun 2017, 08:28:00 EST by Catalyst Administrator