
AggMon: Scalable Hierarchical Cluster Monitoring

In: Sustained Simulation Performance 2012

Authors

  • Erich Focht (NEC HPC Europe)
  • Andreas Jeutter (NEC HPC Europe)

Abstract

Monitoring and supervising the huge number of compute nodes in a typical HPC cluster is an expensive task: it occupies network bandwidth and CPU power that would be better spent on application needs. In this paper, we describe a monitoring framework that efficiently supervises thousands of compute nodes in an HPC cluster. Within this framework, the compute nodes are organized in groups. Groups contain other groups and form a tree-like hierarchical graph, and communication paths run strictly along the edges of this graph. To decouple the components in the network, a publish/subscribe messaging system based on AMQP has been chosen. Monitoring data is stored in a distributed time-series database located on dedicated nodes in the tree. For database queries and other administrative tasks, a synchronous RPC channel that is completely independent of the hierarchy has been implemented. A browser-based front-end for presenting the data to the user is currently in development.
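The abstract describes monitoring data that travels strictly along the edges of a group tree, through a publish/subscribe layer, until it reaches a node hosting the time-series database. The following is a minimal in-process sketch of that idea under stated assumptions, not AggMon's actual code: the `Broker` class is a hypothetical stand-in for an AMQP broker, and the `group.<name>` topic scheme, the `Group` class and the `has_store` flag are illustrative names chosen here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Optional


class Broker:
    """Minimal in-process topic pub/sub; a hypothetical stand-in for an AMQP broker."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, msg: dict) -> None:
        # Publishers know only the topic name, not who consumes it.
        for callback in self._subs[topic]:
            callback(msg)


class Group:
    """A node in the (hypothetical) group tree; only some groups host a store."""

    def __init__(self, name: str, broker: Broker,
                 parent: Optional["Group"] = None, has_store: bool = False) -> None:
        self.name = name
        self.broker = broker
        self.parent = parent
        # A plain list stands in for a time-series database shard.
        self.store: Optional[List[dict]] = [] if has_store else None
        # Each group listens on a topic named after itself.
        broker.subscribe(self.topic, self.on_message)

    @property
    def topic(self) -> str:
        return "group." + self.name

    def on_message(self, msg: dict) -> None:
        if self.store is not None:
            # Dedicated store node: keep the sample here.
            self.store.append(msg)
        elif self.parent is not None:
            # Forward strictly along the tree edge to the parent group.
            self.broker.publish(self.parent.topic, msg)


broker = Broker()
root = Group("root", broker, has_store=True)
rack1 = Group("rack1", broker, parent=root)                  # no store: forwards upward
rack2 = Group("rack2", broker, parent=root, has_store=True)  # stores locally

# Compute nodes report a metric to their own group's topic.
broker.publish(rack1.topic, {"host": "n001", "metric": "load", "value": 0.7})
broker.publish(rack2.topic, {"host": "n101", "metric": "load", "value": 1.2})

print(len(root.store), len(rack2.store))  # 1 1
```

Because a publisher addresses only a topic, a group never needs to know which process consumes its messages, which is the kind of decoupling the abstract attributes to AMQP. A real deployment would replace `Broker` with an AMQP client and run each group in a separate process.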

Suggested Citation

  • Erich Focht & Andreas Jeutter, 2013. "AggMon: Scalable Hierarchical Cluster Monitoring," Springer Books, in: Michael M. Resch & Xin Wang & Wolfgang Bez & Erich Focht & Hiroaki Kobayashi (ed.), Sustained Simulation Performance 2012, edition 127, pages 51-64, Springer.
  • Handle: RePEc:spr:sprchp:978-3-642-32454-3_5
    DOI: 10.1007/978-3-642-32454-3_5


