Silicon detector monitoring is a multi-host system. Its
purpose is to collect data describing the current status of
the hardware, and to generate and send alarm messages to the
Significant Event System (SES).
The monitoring system consists of the following applications:
What to do when
If you need to stop monitoring:
- log in to 'd0ol28' as 'd0run'
- cd '/home/d0run/onlineMonitoring'
- source 'STOP_MONITORING.tcsh'
- check that no processes are left: 'ps -aux | grep SMTDataStore'
- if you see any 'SMTDataStore' process, kill it: kill -9 'process_number'
If you need to start monitoring:
- log in to 'd0ol28' as 'd0run'
- cd '/home/d0run/onlineMonitoring'
- setup d0online
- check that no processes are left: 'ps -aux | grep SMTDataStore'
- if you see any 'SMTDataStore' process, kill it: kill -9 'process_number'
- restart monitoring: source START_MONITORING.tcsh
- check that you can access 'http://www-d0ol.fnal.gov/smtMonitoring/'
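The leftover-process check in the two procedures above can be sketched in a few lines. This is only an illustration, assuming 'ps -aux' style output with the PID in the second column. The function name 'find_leftover_pids' and the sample process listing are made up for the example and are not part of the monitoring software.

```python
def find_leftover_pids(ps_output, name="SMTDataStore"):
    """Return PIDs of processes whose command line mentions `name`.

    Assumes `ps -aux` style output: one header line, then rows in which
    the second whitespace-separated column is the PID and the eleventh
    column onwards is the command line.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        # Skip the grep process itself, which also matches the name.
        if name in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids


# Hypothetical listing, for illustration only.
sample = """USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
d0run 4121 0.3 1.2 52340 9120 ? S 10:02 0:07 ./SMTDataStore
d0run 4377 0.0 0.0 1740 580 pts/1 S 10:15 0:00 grep SMTDataStore
d0run 4400 0.0 0.1 2100 900 pts/1 R 10:15 0:00 ps -aux
"""
print(find_leftover_pids(sample))
```

Every PID returned this way is a candidate for 'kill -9'. The 'grep' line is filtered out because it always matches its own search pattern.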
If a new message has been sent to the Significant Event
System:
- confirm that examine reports the same problem
- get more information about the problematic HDI/chip on 'http://www-d0ol.fnal.gov/smtMonitoring/':
  - using the GUI on the left hand side, select the interesting part of the detector (crate/VRB/HDI)
  - using the GUI on the right hand side, select the information you need (signal/hit number/occupancy)
  - look for problems: no events, signal 'zero', high occupancy
If you think that monitoring does not send information about a faulty system to the Significant Event System:
- On the page http://www-d0ol.fnal.gov/smtMonitoring/ check 'IOC DATA UPDATE TIME' and 'SES UPDATE TIME'. Both times should be close to the current time. The first one tells you when the last data transfer from the IOC was done, whereas the second one shows when the last message was sent to the Significant Event System.
- If data is not collected from an IOC and you are not an expert: PAUSE the run, reboot the PowerPC, and start monitoring for that IOC from the SMT GUI.
- If data is not collected from an IOC and you are an EXPERT: log in to that IOC, look for suspended processes (VxWorks command 'i'), and look into the log file /home/d0run/onlineMonitoring/IOC_d0olsmtXX_ConnectionLog.txt.
- If messages are not sent to the Significant Event System and you are not an expert: check whether SES runs with no problems, e.g. whether other applications can send data to SES. If only monitoring cannot talk to SES, restart monitoring.
- If messages are not sent to the Significant Event System and you are an EXPERT: check the log files /home/d0run/onlineMonitoring/SESLog.txt and /home/d0run/onlineMonitoring/IOC_d0olsmtXX_ConnectionLog.txt. Information is sent to SES only if there is new data coming from an IOC. If IOC connections are broken, SES messages are suspended.
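The decision made from the two update times can be written down as a small sketch. The timestamps are assumed to be already parsed into 'datetime' objects (the format shown on the web page is not reproduced here), and the 5 minute tolerance is an assumption chosen for illustration, not a value used by the monitoring software. The names 'is_stale' and 'diagnose' are hypothetical.

```python
from datetime import datetime, timedelta


def is_stale(last_update, now, max_age):
    """True if `last_update` is further in the past than `max_age`."""
    return now - last_update > max_age


def diagnose(ioc_update, ses_update, now, max_age=timedelta(minutes=5)):
    """Tell which part of the chain looks stuck.

    Returns "ioc" if IOC data is old (SES messages are then suspended
    as well), "ses" if data arrives but SES has not been updated, and
    "ok" otherwise.
    """
    if is_stale(ioc_update, now, max_age):
        return "ioc"
    if is_stale(ses_update, now, max_age):
        return "ses"
    return "ok"


now = datetime(2003, 1, 15, 12, 0, 0)  # arbitrary example time
print(diagnose(now - timedelta(minutes=1), now - timedelta(minutes=2), now))
print(diagnose(now - timedelta(hours=1), now - timedelta(hours=1), now))
```

The IOC time is checked first because SES messages depend on fresh IOC data, so an old SES time is only meaningful when the IOC time is current.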
What experts should know
The Linux box application consists of the following threads:
The 'OCIConnection' thread waits for incoming connection
requests from the front-end processors.
As soon as these connection requests appear they are
accepted, and every 1 minute (currently set) a data request is sent to the front-end
processors by the OCIConnection thread. Received data is stored in a previously
created data structure. Connections are socket based.
The 'shmServer' thread serves data to Java servlet requests.
The connection is socket based. A new socket
connection is opened for each new data request. The connection
is closed as soon as the requested data is served.
Two files are written to disk for each data request:
an html file and a php file. The html file
states how many separate images
are in the php file. The php file contains the actual
data, and histograms are created from it
on the fly in the WWW browser.
After a data request has been successfully completed, the names
of the existing html files, including the newly created
one, are sent to the Java servlet client. The Java servlet displays
them in the browser.
The two threads 'shmServer' and 'OCIConnection' share
the same data structures. They are synchronized
using a set of semaphores. There is one semaphore created
to synchronize access to the data of each SMT crate.
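The one-semaphore-per-crate scheme can be illustrated with a short sketch. It is written in Python for brevity and is not the actual implementation. The class name 'CrateStore', the crate names and the payload are hypothetical.

```python
import threading


class CrateStore:
    """Per-crate data guarded by one binary semaphore per crate."""

    def __init__(self, crate_ids):
        # One semaphore per crate, so access to different crates
        # never blocks each other.
        self._sems = {c: threading.Semaphore(1) for c in crate_ids}
        self._data = {c: None for c in crate_ids}

    def write(self, crate, payload):
        # Called by the thread that receives data from the front end.
        with self._sems[crate]:
            self._data[crate] = payload

    def read(self, crate):
        # Called by the thread that serves data to the servlet.
        with self._sems[crate]:
            return self._data[crate]


store = CrateStore(["crate0", "crate1"])  # hypothetical crate names
writer = threading.Thread(target=store.write, args=("crate0", {"hits": 12}))
writer.start()
writer.join()
print(store.read("crate0"))
```

With a separate semaphore for each crate, a reader waiting on one crate does not delay updates to the others, which a single global lock would do.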
The 'SESConnection' thread checks hardware data every
10,000 events (currently set) and generates
alarms if occupancy exceeds 25% (currently set) or if
an HDI or a chip is dead (the signal from all the
strips is zero). Messages are generated based on
the content of the same data structure that is used by
the 'OCIConnection' thread. A semaphore is
used to ensure that the data structure is not overwritten while the check is
being done.
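The two alarm conditions can be expressed as a small sketch. The 25% limit and the "all strips zero" rule come from the description above. The function name 'check_unit', the alarm strings and the input layout (occupancy given as a fraction, signals as a list with one entry per strip) are assumptions made for the example.

```python
OCCUPANCY_LIMIT = 0.25  # 25%, the currently set threshold


def check_unit(occupancy, strip_signals, limit=OCCUPANCY_LIMIT):
    """Return the list of alarms for one HDI or chip.

    `occupancy` is a fraction between 0 and 1; `strip_signals` holds
    the signal of every strip of the unit.
    """
    alarms = []
    if occupancy > limit:
        alarms.append("high occupancy")
    # A unit is considered dead when every strip reports zero signal.
    if strip_signals and all(s == 0 for s in strip_signals):
        alarms.append("dead")
    return alarms


print(check_unit(0.30, [5, 7, 6]))
print(check_unit(0.00, [0, 0, 0]))
print(check_unit(0.10, [5, 0, 6]))
```

An occupancy exactly at the limit does not raise an alarm here, because the text says the alarm is generated when occupancy exceeds the threshold.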
Java servlets (www-d0ol/smtMonitoring) are served by the WWW server and are used to display the data histograms (signal, hit count, occupancy).
Using the GUI shown on the left hand side of the WWW
page, you can select interesting parts of the detector. Information
can be obtained only for those HDIs that are NOT marked as 'disabled'.
Below the tree-like menu of existing SMT hardware
there is an 'IOC DATA UPDATE TIME' list, which tells you when
data was last collected from a particular IOC. Based on that time
one can easily
figure out whether the database contains up-to-date data for the
current run.
There is an 'SES UPDATE TIME' list for all IOCs below
the already mentioned 'IOC DATA UPDATE TIME'
list. Dates and times in that list tell you when the last
hardware check was done and when the last
message was sent to the Significant Event System.
The GUI on the right hand side of the WWW page lets you obtain
information about SMT hardware in text and
graphical formats.
Meaning of selection criteria for text-like output:
To display data in the form of histograms, one needs to fill in the
table below 'SMT MONITORING GRAPHICS'.
Meaning of selection criteria for graphics-type output:
Front-end processor server: this is
a part of the SMT front-end processor code. As soon as 'SMT Monitoring'
is turned ON in the CREATOR window, the SDAQ Supervisor passes
all the run commands from COOR to the front-end processors. At the beginning of
each run the commands INIT and START are sent to the SMT front-end
processor. When the INIT command arrives, the initialization procedure
is called. When the START command arrives, data collection starts.
As soon as initialization is completed, the monitoring server
tries to connect to the Linux box application
(IOCConnection thread). At that point the IOCConnection
process takes over and asks the server for data
every 1 minute (currently set). The connection is socket
based.
As soon as the run is stopped, the STOP command is passed via
the SDAQ Supervisor to the front end and the monitoring
server breaks the connection.
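The command handling described above amounts to a small state machine. The sketch below is only a model of that description, written in Python rather than in the front-end code's own language. The class name 'MonitoringServer' and the state names are hypothetical, and the real server may handle more commands and error cases.

```python
class MonitoringServer:
    """Toy model of the front-end monitoring server's run commands."""

    def __init__(self):
        self.state = "idle"
        self.connected = False

    def handle(self, command):
        if command == "INIT":
            # Initialization, followed by the connection attempt
            # to the Linux box application.
            self.state = "initialized"
            self.connected = True
        elif command == "START" and self.state == "initialized":
            self.state = "collecting"
        elif command == "STOP":
            # End of run: the server breaks the connection.
            self.state = "idle"
            self.connected = False
        return self.state


server = MonitoringServer()
for cmd in ("INIT", "START", "STOP"):
    print(cmd, "->", server.handle(cmd), "connected:", server.connected)
```

In this model a START that arrives without a preceding INIT is ignored, which is an assumption. The text only describes the normal INIT, START, STOP sequence.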
Hardware data for monitoring purposes can be collected from:
Current setup.
All the software is stored in the CVS repository in the
following directories:
The SMT online monitoring application runs on the D0 online host
'd0ol28.fnal.gov' under the 'd0run' account.
The application is located in the directory '~d0run/onlineMonitoring'.
Since graphs are created using PHP software, the PHP library
needs to be supported by the WWW server.
The version currently installed is 'php-4.1.2'.
For graph creation an additional package, 'jpgraph-1.6.1'
(written in PHP), was used. It is installed on a disk
accessible from the online cluster in the directory '/d0usr/products/jpgraph'.
There are links created in the application to the above
mentioned location.
The SMT online monitoring application should be started/stopped
using the official d0 script start_daq/stop_daq.
Using that script ensures that only one instance of the
application runs at a time.
There are several log files created in the directory '~d0run/onlineMonitoring/'.
All 'html' and 'php' files that contain created graphs are in the directory '/projects/smtMonitoring/jpgraph_cache'.
Files older than 1 month are purged from both directories
'/projects/smtMonitoring/log/' and '/projects/smtMonitoring/jpgraph_cache'
by an automatic cron job.
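The selection rule used by such a purge can be sketched as follows. This is not the actual cron job, whose contents are not shown here. The function name 'files_older_than' is hypothetical and "1 month" is approximated as 30 days. The sketch only lists candidate files and deletes nothing.

```python
import os
import time


def files_older_than(directory, max_age_days=30, now=None):
    """Return sorted paths of regular files in `directory` whose
    modification time is more than `max_age_days` days in the past."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            old.append(entry.path)
    return sorted(old)
```

Calling it on each of the two directories named above would give the files a purge would remove. Keeping the listing separate from the deletion makes it easy to inspect the result first.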
Compilation and Linking.
In order to compile the C++ code ('onl_smtcalib/ssdaq/smtStore/monitoring')
one needs to 'setup D0RunII p13.10.00' or a later, backward-compatible release.
In order to compile the Java servlets ('onl_smtcalib/ssdaq/smtStore/servlet')
one needs to
'setup tomcat' and 'setup java v1_3_1_02'. Compilation
is done automatically by the 'makefile'.
Compiled 'class' files are copied to 'www/WEB-INF/classes/'
and the 'jar' library is copied to 'www/WEB-INF/lib/'. After successful compilation
one needs to copy the 'class' and 'jar' files to
the well-known locations of the 'tomcat' server. Currently those
are:
'/projects/elog/jakarta-tomcat-3.2.1/webapps/smtMonitoring/WEB-INF/classes/'
and
'/projects/elog/jakarta-tomcat-3.2.1/webapps/smtMonitoring/WEB-INF/lib/'
In order to compile the front-end part, change directory to
'onl_smtcalib/ssdaq/smtStore/frondent'
and use the existing makefile, which will do the whole job
for you.