Atlas T3

SUPERCOMPUTER MONITORING PORTALS

LIEBERT-CRV WEB PORTAL

 

A web portal has been set up to monitor the status of the Liebert CRV that cools the Supercomputer. The Liebert CRV is a 42U self-contained, multi-option, precision data center cooling system that offers temperature and humidity control. It integrates within a row of 42U cluster racks, providing cooling close to the server heat source for efficient and effective data center management. The horizontal airflow is suitable for raised or non-raised floors. The Liebert CRV uses environmentally friendly R410A refrigerant and is available in different versions, depending on data center design requirements.

The Liebert CRV pulls air from the hot aisle, filters and conditions it, and then delivers cool air to the front of the equipment racks. Each cluster rack has two temperature sensors mounted on the door panel to monitor the rack temperature.

The system features Liebert iCOM controls and a digital scroll compressor, resulting in high reliability and optimized system performance. The Digital Scroll Compressor uses the latest control technology to deliver precise operation and significantly higher energy efficiency than other compressor technologies. In addition to the advantage of the dependable scroll design, Digital Scroll technology provides infinitely variable capacity modulation, so the output can precisely match the changing cooling demands of the room.

 

Features

 

● 20 or 35kW air-cooled and water-glycol-cooled systems

● 40kW chilled water cooled systems

● Precision cooling and humidity control

● Installs within the row of racks

● Features variable speed EC plug fans and variable capacity digital scroll compressor

● Operates efficiently with Liebert iCOM controls

 

To learn more about Liebert, click here.

PCM WEB PORTAL

 

For the BU Supercomputer, we have implemented PCM (Platform Cluster Manager) to monitor and manage the cluster on a daily basis. Built from open source components, PCM includes all the tools required to deploy, run, and manage clusters. Using PCM, we can perform operations that are simply not possible with a less advanced cluster management solution.

 

For details on the latest documentation updates, visit: http://my.platform.com/products/platform-cm

For a complete list of the capabilities of Platform Cluster Manager, visit: http://www.platform.com

For information about PCM on our PCM Web portal, click here.

CACTI WEB PORTAL

We have implemented the Cacti software to monitor and graph various parameters of the Supercomputer. Cacti is a web-based graphing tool designed as a frontend to RRDtool's data storage and graphing functionality. Cacti allows a user to poll services at predetermined intervals and graph the resulting data. It is generally used to graph time-series data for metrics such as CPU load and network bandwidth utilization. We also use it to monitor network traffic by polling the supercomputer's network switch interface via SNMP (Simple Network Management Protocol). Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box.
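As a rough illustration of the arithmetic behind interval polling, the sketch below turns two successive readings of an SNMP interface octet counter into a bandwidth figure. The function name, the 300-second interval, and the sample counter values are assumptions made for this example; they are not taken from our Cacti configuration, and the code does not perform any SNMP queries itself.

```python
def octets_to_bps(prev_octets, curr_octets, interval_s, counter_bits=32):
    """Convert two octet-counter samples into bits per second.

    SNMP interface counters only count upward and wrap to zero at their
    maximum, so the difference is taken modulo the counter size. This
    assumes at most one wrap happened between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 2 ** counter_bits
    delta = (curr_octets - prev_octets) % modulus
    return delta * 8 / interval_s


# Hypothetical samples taken 300 seconds apart.
rate = octets_to_bps(1_000_000, 4_750_000, 300)
print(f"{rate:.0f} bit/s")
```

Taking the difference modulo the counter size keeps the rate correct across a single counter wrap, which matters on busy links where a 32-bit counter can roll over between polls.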
 

For more information on Cacti, visit: http://www.cacti.net

NAGIOS WEB PORTAL

We have also implemented the Nagios (a system and network monitoring application) web portal. Nagios watches hosts and services that we specify, and alerts us when things go bad and when they get better. Nagios was originally designed to run under Linux.
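Nagios decides whether a service is healthy by running small check programs (plugins) and reading their exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The sketch below shows that convention for a rack temperature check. The thresholds, the RACK_TEMP label, and the function names are hypothetical, and the notification rule is deliberately simplified compared with what Nagios actually does.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(temp_c, warn_c=27.0, crit_c=32.0):
    """Map a temperature reading to a plugin state (thresholds are hypothetical)."""
    if temp_c is None:
        return UNKNOWN  # no reading available
    if temp_c >= crit_c:
        return CRITICAL
    if temp_c >= warn_c:
        return WARNING
    return OK


def should_notify(previous_state, current_state):
    """Simplified rule: alert on any change, both problems and recoveries."""
    return previous_state != current_state


def run_check(temp_c):
    """Return (exit_code, status_line) as a plugin would report them."""
    state = classify(temp_c)
    return state, f"RACK_TEMP {LABELS[state]} - reading={temp_c}"


code, line = run_check(29.5)  # hypothetical sensor reading
print(code, line)
```

A real plugin would print the status line and then call sys.exit(code) so that Nagios can read the state from the process exit status.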

For more information about Nagios, click here.

NETVIBES DASHBOARD

 

We have set up an OSG (Open Science Grid) dashboard portal for the ATLAS-Tier3 Supercomputer using Netvibes. This dashboard automatically updates every time it is opened. The Bellarmine University ATLAS-Tier3 Supercomputer OSG Dashboard Portal contains the following information from OSG: (i) Status Map, (ii) Current RSV Status, (iii) BDII (Berkeley Database Information Index) Information, which consists of six parameters (including Running Jobs), (iv) Resource Group Summary (Services, RSV Status, FQDN, Supported VOs, VO Ownership, and WLCG Information), and (v) Availability Metrics (Service Availability and Reliability).

 

The OSG Resource and Service Validation (RSV) software provides a scalable and easy-to-maintain resource and service monitoring infrastructure for an OSG site administrator. The RSV Client allows OSG site administrators to run tests against their CEs (Compute Elements) and SEs (Storage Elements). It provides a set of metrics to test the resource, Condor-Cron for scheduling, and a Gratia infrastructure for collecting and storing the results. The RSV Client runs the metrics at scheduled intervals and produces a web page of local RSV results for the site administrator to view.

 

The Berkeley Database Information Index (BDII) consists of a standard LDAP (Lightweight Directory Access Protocol) database which is updated by an external process. The update process obtains LDIF (LDAP Data Interchange Format) from a number of sources and merges them. It then compares this to the contents of the database and creates an LDIF file of the differences. This is then used to update the database.
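The merge-and-compare cycle described above can be sketched as a dictionary diff. In this simplified model each entry is a mapping from a distinguished name (DN) to its attributes; the code does not parse real LDIF or talk to an LDAP server. The function names, the sample DNs, and the rule that later sources override earlier ones are assumptions made for illustration.

```python
def merge_sources(sources):
    """Merge entries from several sources; later sources override earlier ones."""
    merged = {}
    for entries in sources:
        for dn, attrs in entries.items():
            merged.setdefault(dn, {}).update(attrs)
    return merged


def diff_entries(current, desired):
    """Return the changes needed to turn `current` into `desired`."""
    to_add = {dn: attrs for dn, attrs in desired.items() if dn not in current}
    to_delete = sorted(dn for dn in current if dn not in desired)
    to_modify = {
        dn: attrs
        for dn, attrs in desired.items()
        if dn in current and current[dn] != attrs
    }
    return to_add, to_modify, to_delete


# Hypothetical data: the database content and two freshly collected sources.
database = {
    "cn=ce1,o=grid": {"RunningJobs": "10"},
    "cn=old,o=grid": {"RunningJobs": "0"},
}
sources = [
    {"cn=ce1,o=grid": {"RunningJobs": "12"}},
    {"cn=se1,o=grid": {"Status": "Production"}},
]

to_add, to_modify, to_delete = diff_entries(database, merge_sources(sources))
print(to_add, to_modify, to_delete)
```

Applying only the differences, rather than reloading every entry, keeps each update small when most of the published information has not changed.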

 

To view the Status of Bellarmine University's ATLAS Tier3 Supercomputing Site on OSG, click the OSG Status Map.

To view the Status of Bellarmine University's ATLAS Tier3 Supercomputing Site on WLCG, click the WLCG Status Map.

To view Bellarmine University's ATLAS Tier3 Supercomputing Site on Google Earth, click here.

To view Bellarmine University's ATLAS Tier3 Supercomputing Site info on WLCG GStat-2.0 Portal, click here.

To view Bellarmine University's ATLAS Tier3 Supercomputing Site info on the EGI (European Grid Initiative) Accounting Portal, click here.

 

To view the OSG RSV Status-Main page of Bellarmine University's ATLAS Tier3 Supercomputing Site, click here.

To view OIM/MyOSG Gratia Information about Bellarmine University's ATLAS Tier3 Supercomputing Site, click here.

To view OSG CMon-BDII Server Status Information about Bellarmine University's ATLAS Tier3 Supercomputing Site, click here.

To view the OSG-ReSS Central Services History about Bellarmine University's ATLAS Tier3 Supercomputing Site, click here.

To view OSG BDII Information Browser for Bellarmine University's ATLAS Tier3 Supercomputing Site, click here.

 

To view the Availability History from the MyOSG Web portal, click here.

To view the Cumulative CPU Hours (Gratia Accounting) clocked by the ATLAS Tier3 Supercomputer since 6/1/2011, click here.

To view the Number of Jobs Running per Day (Gratia Accounting) by the ATLAS Tier3 Supercomputer since 6/1/2011, click here.

To view ATLAS VO (Virtual Organization) OSG Webpage, click here.

 

More information about OSG can be found here.

More information about WLCG can be found here.

More information about WLCG GStat-2.0 can be found here.

More information about WLCG Monitoring can be found here.

More information about EGI can be found here.

More information about EGEE (Enabling Grids for E-SciencE) can be found here.

MONALISA WEB PORTAL: (OPEN SCIENCE GRID MONITORING WITH MONALISA)

 

MonALISA, which stands for Monitoring Agents using a Large Integrated Services Architecture, has been developed over the last four years by Caltech and its partners with the support of the U.S. CMS software and computing program. The framework is based on a Dynamic Distributed Service Architecture and is able to provide complete monitoring, control, and global optimization services for complex systems. The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing, agent-based subsystems which are registered as dynamic services and are able to collaborate and cooperate in performing a wide range of information gathering and processing tasks. These agents can analyze and process the information in a distributed way to provide optimization decisions in large-scale distributed applications. An agent-based architecture provides the ability to invest the system with increasing degrees of intelligence, to reduce complexity, and to make global systems manageable in real time. The scalability of the system derives from the use of a multi-threaded execution engine to host a variety of loosely coupled, self-describing dynamic services or agents, and from the ability of each service to register itself and then be discovered and used by any other services or clients that require such information. The system is designed to easily integrate existing monitoring tools and procedures and to provide this information in a dynamic, customized, self-describing way to any other services or clients.

The MonALISA framework is a fully distributed service system with no single point of failure and it provides:


(i) Distributed registration and discovery for services and applications;

(ii) Monitoring of all aspects of complex systems: system information for computer nodes and clusters; network information (traffic, flows, connectivity, topology) for WAN and LAN; performance of applications, jobs, or services; end-user systems; and end-to-end performance measurements;

(iii) Interaction with any other services to provide customized information in near real time, based on monitoring information;

(iv) Secure, remote administration for services and applications;

(v) Agents to supervise applications, to restart or reconfigure them, and to notify other services when certain conditions are detected;

(vi) An agent system that can be used to develop higher-level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks;

(vii) Graphical user interfaces to visualize complex information; and

(viii) Global monitoring repositories for distributed Virtual Organizations.

 

To view Total Jobs Running on each OSG Site, click here.

To view Computing Farm Statistics for each OSG Site, click here.

To view the Status of OSG Jobs Running on Bellarmine University's ATLAS Tier3 Supercomputer on MonALISA, click here.

To view Bellarmine University's ATLAS Tier3 Supercomputer on the OSG Service Status Portal on MonALISA, click here.

To view Bellarmine University's ATLAS Tier3 Supercomputer Node Utilization on MonALISA, click here.

For more information about MonALISA, click here.