E-mail exchange about datalogger monitoring:
------------------------------------------------

From Heidi:

Stu,

can the collector/router/datalogger tell us

1) number of events written to disk per stream for a run
   (Use this to cross check with event catalog)
2) number of events sent to the data logger by each L3 node for a run (by stream or trigger if possible)
   (Use this to cross check with internal L3 stats and datalogger output count.)

It would be good if we could put these numbers into the runs db.

Heidi

--------------------------------------------------------

From Stu:

Hi Heidi,

Jerry Guglielmo is definitely the expert here, so I'll cc: him to get the real answer...

The Datalogger should certainly know the number of events written to disk per stream. I believe this information is already in some catalog.

By the time the events arrive at the Datalogger, they've already been "collected" from the various L3 nodes by the Collector. I don't believe there's any header info identifying the source L3 node, so the Datalogger won't have that info (the Collector and Datalogger work only with header info - they don't/can't dig into the event structure to get anything else). The Collector would know the source L3 node, but it knows very little about runs.

I'll leave it to Jerry to propose a solution for this request.

Stu

-------------------------------------------------------

From Jerry:

Hi,

The datalogger does not know about the L3 nodes so it can't help there. The event catalogs stored in the database come from the datalogger and thus a sum of all partitions for a given stream is the number of events written to disk.

Question 1) The datalogger cannot really be used to cross-check the event catalogs because the datalogger is the source of the event catalogs. Thus there is no independent source to do a cross check with.

Question 2) Since the datalogger is not aware of L3, there is no way for it to discriminate based on L3 nodes.
As Stu said, that information is buried in the data and the datalogger only looks at the ITC header.

Almost all information available to the datalogger is placed in the event catalogs, which in turn are passed on to the luminosity server. Since presumably the luminosity server knows the begin and end lum blocks for a run, it can track whether it has received all partitions for a run. But doesn't L3 report the information by lum block anyway? If so, then one really wants to check lum block by lum block whether the datalogger and L3 agree, so the concept of Run becomes artificial.

Are you trying to track things in near real time or is this for later processing? If not near real time, then one needs only to flag lum blocks where the count is different (this better be unusual) and then go look at the events in that lum block and see which L3 node they came from to see who comes up short.

Any time the collector receives an event that cannot be routed, an alarm is sent to the SES. Also, any time an event cannot be written by the datalogger, an alarm is sent to the SES. We do see the former happen, but Gustaaf says the rate is too low to worry about now (events being sent from L3 long past the end of run).

What are you really trying to accomplish and on what timescale? Apart from the L3 binning I don't really see anything that isn't currently available to the luminosity server that you are requesting. What is the motivation to do this on a run basis when luminosity block seems more natural?

------------------------------------------------------

From Heidi:

Isn't it true that if a SAM connection fails, the event catalog and datalogger can get out of sync?

-------------------------------------------------------

From Jerry:

Hi,

If I understand what you are saying then the answer is no. The datalogger asynchronously writes a catalog file which is later stored into SAM once the raw file has been transferred.
The process that stores the catalog to the database verifies the store by comparing the values of 3 events in the database to the local information. If a connection fails, or another problem occurs, that process tries repeatedly to store the catalog. We monitor and would notice a catalog that never made it into the database.

Independently, another process copies the catalog to a local cache directory for use by the luminosity server. The luminosity server has access to all the information before it reaches the SAM database.

By the way, we also have a recovery program that can recreate the event catalog based on the raw file, so even under unusual circumstances we know what the datalogger actually wrote and can sync the SAM db if necessary.

----------------------------------------------------
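
Illustration of the lum-block-by-lum-block check suggested in Jerry's first reply: a minimal Python sketch, assuming the per-lum-block event counts from the datalogger catalogs and from L3 are already available as plain dictionaries. The function name, the data layout and the numbers are hypothetical; nothing here is taken from the actual datalogger, L3 or luminosity server software.

```python
from typing import Dict, List, Tuple


def mismatched_lum_blocks(
    datalogger_counts: Dict[int, int],
    l3_counts: Dict[int, int],
) -> List[Tuple[int, int, int]]:
    """Return (lum_block, datalogger_count, l3_count) for each block whose counts differ.

    A lum block missing from one side is treated as a count of 0 on that side.
    """
    blocks = sorted(set(datalogger_counts) | set(l3_counts))
    mismatches = []
    for block in blocks:
        written = datalogger_counts.get(block, 0)  # events the datalogger wrote
        sent = l3_counts.get(block, 0)             # events L3 reports sending
        if written != sent:
            mismatches.append((block, written, sent))
    return mismatches


# Made-up counts, purely for illustration.
datalogger = {101: 250, 102: 248, 103: 251}
l3 = {101: 250, 102: 250, 103: 251, 104: 3}

flagged = mismatched_lum_blocks(datalogger, l3)
for block, written, sent in flagged:
    print(f"lum block {block}: datalogger wrote {written}, L3 sent {sent}")
```

Only the flagged lum blocks would then need to be inspected event by event to see which L3 node comes up short.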