Scalable Distributed Infrastructure for Data Intensive Science
Author/Presenter
Event Type: Workshop
Applications
Government Strategies, Programs, and Funding
HPC Center Planning and Operations
Time: Monday, November 13th, 9:45am - 10am
Location: 708
Description: The rise of big data science has created new demands on modern computer systems. While floating-point performance has driven computer architecture and system design for the past few decades, there is renewed interest in the speed at which data can be ingested and processed. Early exemplars such as Gordon, the NSF-funded system at the San Diego Supercomputer Center, shifted the focus from pure floating-point performance to memory and I/O rates.
At the University of Queensland we have continued this trend with the design of FlashLite, a parallel cluster equipped with large amounts of main memory, flash disk, and a distributed shared memory system (ScaleMP’s vSMP). This allows applications to place data “close” to the processor, enhancing processing speeds. Further, we have built a geographically distributed multi-tier hierarchical data fabric called MeDiCI, which provides an abstraction of very large data stores across the metropolitan area. MeDiCI leverages industry solutions such as IBM’s Spectrum Scale and SGI’s DMF platforms.
Caching underpins both FlashLite and MeDiCI. In this talk I will describe the design decisions and present some early application studies that benefit from the approach. I will also highlight some of the challenges that must be solved before this approach can become mainstream.