A fault tolerant parallel computing architecture for remote sensing satellites

Lim, S 2009, A fault tolerant parallel computing architecture for remote sensing satellites, Doctor of Philosophy (PhD), Computer Science and Information Technology, RMIT University.


Document type: Thesis
Collection: Theses

Attached Files
Name Description MIMEType Size
Lim.pdf Thesis application/pdf 7.40MB
Title A fault tolerant parallel computing architecture for remote sensing satellites
Author(s) Lim, S
Year 2009
Abstract This thesis is concerned with the design concepts of a fault tolerant, high performance parallel computing payload for remote-sensing missions. Current small satellite missions generally do not have high computational power onboard due to limitations of power, space, volume or budget. This thesis investigates a cost-effective way of designing space computing architectures that remain reliable despite the use of commercial off-the-shelf (COTS) components.

The COTS-enabling technology from this work has achieved a high reliability figure for the Parallel Processing Unit (PPU) computing payload, which is designed using commercial-grade processors, Field Programmable Gate Arrays (FPGAs), memory chips and serial flash chips. The efficient use of resources in the PPU makes it a valuable high performance computing resource for small satellite missions. The PPU’s computational power will enable a new class of space applications for such missions.

The computing payload proposed in this thesis is a parallel cluster of COTS processing nodes, interconnected using network elements that are based on COTS FPGAs. Part of this research work has been adopted for use in the Parallel Processing Unit (PPU), a secondary payload onboard the XSat micro-satellite. XSat is built by the Centre of Research for Satellite Technologies (CREST) and is scheduled for launch in 2011. The satellite centre is located at Nanyang Technological University, Singapore. The author is a full-time project member of this centre, in charge of the PPU payload development.

The computing payload uses parallelism of COTS processor nodes to achieve high computing performance, and fault tolerant schemes to maintain reliability. This thesis focuses on the provision of highly fault tolerant and reconfigurable networks that enable reliable communication not only among parallel processors, but also with memory chips and external interfaces. The provision of multiple communication schemes, consisting of an inter-cluster ring network and a mesh processor array, both of which are fault tolerant and reconfigurable, has given the payload a high probability of survival in the harsh space radiation environment. This is coupled with autonomous processor fault detection and recovery schemes.

The PPU computing payload is also highly adaptive to changing reliability and computation needs, allowing a trade-off between the two at mission runtime. The PPU adopts industry standards for part reliability computation and system reliability modelling. The PPU’s system reliability figure is a valuable check that the extent of fault tolerance is sufficient yet not over-catered. Over-catering of fault tolerant paths wastes valuable and expensive resources onboard the satellite.
Degree Doctor of Philosophy (PhD)
Institution RMIT University
School, Department or Centre Computer Science and Information Technology
Keyword(s) Fault tolerant network
parallel computing
satellite onboard processing
reconfigurable mesh array
mesh remapping heuristics
reconfigurable network
reliability modelling
COTS computing
autonomous fault detection and recovery
Access Statistics: 668 Abstract Views, 1558 File Downloads
Created: Fri, 09 Sep 2011, 15:03:53 EST by Guy Aron