DESIGN STUDIES FOR THE CLUSTER ALGORITHM IN THE SILICON TRIGGER CARD OF STT L2 STR, 15-FEBRUARY-1999 rev. 3-March-1999 ______________________________________________________________________________ ABSTRACT In a previous note (D0NOTE 3599) we investigated about the format and rate of the data in input to the STC (Silicon Trigger Card) of STT L2. This information was essential for beginning studies about designing a Clustering Algorithm for the FPGA of the STC. In the present study a model is described of accessing the data from the SMT front end into STC, calculating cluster centroids 'on fly' into the FPGA units of STC, and buffering centroid information to the Road Cards and to the Track Fitting Cards of the STC . Its feasibility, especially the time constraints, has taken into account the results obtained from preliminary tests on FPGA's prototypes. _____________________________________________________________________________ 1 - DATA FLOW There should be one Cluster Finder unit (FPGA) for each optical fiber cable coming from the SEQUENCER ( and from VTM ) into the STC (Silicon Trigger Card) of STT. There are four optical cables per each SEQUENCER. - For more details, see D0NOTE 3599, Ref.1 -. Each optical fiber sends the data from up to 2 HDIs ( from 3 to 9 SVX chips for each HDI ), that are read out at a rate of two 8bit words ( channel address and channel content) every 37.6 ns. The Cluster Finder FPGA must so be able to calculate clusters at a cycle of 37.6 ns - on each optical fiber -, since every 37.6 ns a new pair of channel address and channel content words comes as input to FPGA. 2 - FPGA's PROTOTYPE TESTS Tests have been made (Ref.2 and 3) that a prototype FPGA has been able to perform all the operations required by clustering ( including the time con suming division through a look-up table and possibly by use of pipelining ) in the required time interval of two clocks (18.8 ns each clock). 3 - OCCUPANCY The present time constraint of 37.6 ns should then be partially 'released' by occupancy considerations : to each Cluster Finder FPGA there are up to 2 HDIs connected, and the average expected occupancy calculated from simu lation happens to be about 5% [ see Jim Linnemann, estimate 2%, including noise. The incidence of two contiguous clusters is so extremely low. Tracks crossing a wide angle or delta rays could also be the case of 'multiple clusters' or clusters of size bigger than normal. A normal size for a cluster (from MC data and Test Beam data, is 2.5 to 3 strips ). More details about occupancy are given in sect. 9... 4 - WHAT KIND OF CLUSTERS ?....... Here is a simple sketch of clusters configurations that we will be required to deal: 2 STRIPS SIZE CLUSTER : - - _ _ <-------- 2 ADC counts _ _ above pedestal _ _ _ _ _ _ _ _ <______ pedestal ---------------------- 4 STRIPS SIZE CLUSTER : - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ------------------------- -------------------------- MORE THAN 4 STRIPS CLUSTERS : - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ----------------------------------- ---------------------------------- Special cases are spurious signals ( isolated channels with ADC count above a given threshold or noise ) , or very wide clusters ( from delta rays or tracks crossing at wide angle ). 5 - DESIGN FOR THE CLUSTERING ALGORITHM The considerations, results and constraints described in sect. I , II and III, became compatible with the following sequence of logic steps that a Cluster processor should include: * step 0 - read and save locally the EVENT NUMBER information (it should be provided to STT anyway from TCC or similar, as well as to all the Trigger components) read and save locally the SEQUENCER Header information: HDI NUMBER * step 1 reset event counters ( like PED (Pedestals count), SEED, ID1 , ADC1 etc, see below) ________ > loop over data readout ** if channel address = C0C0 , then it is end of event, set EOE = 1. Otherways, EOE = 0. This info is sent by the SEQUENCER Trailer. * step 2 - read channel address (ID1). Save address as SEED for the cluster rename channel address as 1 ( 1 gives the relative position of strip inside cluster) * step 3 - read channel content. If < pedestal, increment PED, goto step 1. If > Pedestal, save channel content in ADC1 * step 4 - for the NOMINATOR, create element of SUM, N_1 , i.e. MULTIPLY N_1 = ID1 * ADC1 SUM NOMINATOR N = N_1 * step 5 - for the DENOMINATOR, create element of SUM, D_1 i.e. D_1 = ADC1 SUM DENOMINATOR D = D_1 * step 6 - DIVIDE N / D * step 7 save result of division in temporary Buffer 'CENTROID' = N / D ----------- > next data readout step 2 - read channel address. step 2' - check that channel address is adjacent to SEED . If not adjacent, set flag END_OF_CLUSTER. Goto End_of_Cluster If adjacent to seed, set ID2=2 step 3 - read channel content. If < Pedestal, increment PED, goto End_of_Cluster If > Pedestal, save channel content in ADC2 step 4 - for the NOMINATOR,create element of SUM, N_2 : i.e. MULTIPLY N2 = ID2 * ADC2 SUM NOMINATOR N = N_1 + N_2 step 5 - for the DENOMINATOR, create element of SUM, D_2 : SUM DENOMINATOR D = D_1 + D_2 step 6 - DIVIDE N / D step 7 - save result of division in temporary buffer 'CENTROID' = N / D __________________ > next data readout ............................ End_of_Cluster go here if cahnnel address is not 'adjacent' to SEED means that the 'old' cluster has ended, so it is time to step 8 - convert CENTROID 'relative' position ( from 1 to 4 is the relative position of the centroid in the cluster) into channel address CENTROID ADDRESS step 9 - SAVE CENTROID Information: * CENTROID NUMBER in EVENT * CENTROID SIZE * CENTROID ADDRESS = N / D * SEED * send EOE (End_of_Event) if EOE = 1 step 10 - SEND CENTROID Information to ROAD Card Reset local buffers- back to step 1 - End_of_Event If EOE = 1, then same as for End_of_Cluster, plus send END_OF_EVENT Info: PED = Pedestals count NOISE = Noisy channels count NCLUSTERS = Number of attempted Clusters NCENTROIDS = Number of found Clusters .... etc 5.1 - REMARKS ABOUT DIVISION. Tests done at Fermilab (M. Johnson and B. Angstadt, Ref.2) reveal that DIVISION is very time consuming. Timing simulations, that give typical times for Addition and Multiplication of respectively 49 nsec and 30 nsec , give for Division a time of 110 Nsec up to 480 nsec or 519 nsec (to come out with the last fractional bit). Wherever the Addition and Mul- tiplication would satisfy the design requirements if pipelined, the Division it way too slow. Further timing simulations ( Henry Piekarz, Ref. 3) that use Division and consider the 6 MSB in both Nominator and Denominator, end up to use about 3 clock intervals ( 112 nsec ) to make the calculation. Alternative to Division has been investigated ( Ref.2 ), and a solu tion could be the use of a LUT ( Look-Up-Table ) that, in an Altera 10K50 FPGA chip, had a capability of 20.480 bits available and would allow to account for the full range of ADC values for the SVX chan nels ( 8bit values in the Denominator sum ) with 80% RAM occupancy. Further considerations ( Uli Heintz, Ref. 4 ) show that a resolution of 1/4 strip pitch gives 10 microns precision that;s more that satis factory to our needs. Given a max cluster size of 5 strips, this means that there are 20 (?) possible values the centroid could assume. So the size of the LUT needed is effectively much reduced. 6 - GENERAL ALGORITHM The position of the center of gravity of a cluster as calculated in sect. V does satisfy the time constraints and the data flow configuration typical of STT . A more general expression for what calculated above is described by : sum_i ( ADC_i * CHID_i ) CENTROID = ___________________________ i = 1,n (6.1) sum_i ADC_i It should be remarked that in the procedure of section V There is no constraint on the maximum number of strips n a cluster may have: the algorithm keeps updating its result for each new incoming pair of data, until the last pair belonging to that cluster is accessed. This method of calculating the cluster centroid "on-the-fly" provides the same result as the general formula (6.1) in the present section. 7 - GENERAL CONSIDERATIONS. The sequence described in sect. IV should be general enough to account for all the possible configurations of clusters. Special kind of clusters are: 1- ONE STRIP size : it could be a spurious. In that case his ADC value could be used to identify it as noise, and count it down, the same way Pedestal channels are counted down 2 - TWO to FOUR strips size : normal size for cluster coming from one track at normal angle of incidence 3 - MORE than FOUR strips size : one track at high angle or delta ray or more than one track at normal angle. In case 3, all that could be done at this stage is to provide the track fitting stage with the info that more than one track " is possible". The cluster size information ( > 4 strips) would provide that information. In shch cases, that centroid position is anyways mismatched. For delta rays, the size of the cluster tends to be even bigger. It would be wise to set a max cluster size to cut delta rays. Unless we decide to use a much more refined algorithm,clusters of size bigger than 4 strips will produce anyway the wrong value for the cen troid position. 8 - Further remarks - A - For how the hardware readout is configured, there should not be any case of 'crossing' clusters, i.e. sequence of strips that belong to two different chips, or two different sequencers, or two different crates. The probability that the same track crosses two different PHYSICAL sectors is very low (from MonteCarlo results), and has been disregarded in the design. The probablilty that the same track belong to two different HARDWARE units is ( as far as I know) inhibited. B - There are several tasks that have been assigned to STC and here we briefly summarize them: 1 - Process data from all barrel sections of an azimuthal sector in one crate 2 - accept Glink input from SMT portcards 3 - correct SMT for pedestals, gains, mask out bad channels 4 - find hit clusters in SMT data 5 - convert cluster positions to hit coordinates, allowing for alignment corrections 6 - buffer cluster and road information for transmission to L3 Out of the tasks listed above, some may be impossible to be done because of time constraints, and we want just mention that may be task 3 belongs to that category. Only real tests with a prototype FPGA and simulated events will give the needed answer. Task 5 has been under discussion recently, and it may well be that it will be done at level of TFC (Track Fit Card ), where it could be done through a look-up table and on-the-fly. More in general, it should not be forgotten to pay attention since now to the quantities that we would like to use for monitoring ( and debugging) purposes. For example some special characteristics of clusters could be used to dire ctly give a measure of performance and efficiency of STT or to detect problems and diagnostic the sources. 9 - Note on typical occupancy From Meena's note : MULTIPLICITY of clusters per 30 degrees sector per layer : (ex. layer 3, inner barrel) AVG RMS -------------------------------------------------------------- 8 interactions-minimum bias events- 4.6 3.7 1 interaction " " " 1.0 1.5 Zto bb + 2int. 2.7 2.5 Zto bb + 5int. 4.6 3.1 MULTIPLICITY of strips per cluster : 2.5 to 3 10 - Note on HARDWARE segmentation ONE CRATE ( 60 deg sector ) = 12 SEQUENCERS ONE SEQUENCER = 8 HDIs ONE HDI = from 3 to 9 SVX chips ONE chip = 128 channels = 64 strips = 1/4 barrel layer --------------------------------------------------------- ONE layer (barrel) = 256 (50microns) strips ONE strip = 2 channels ( ONE Address ; ONE ADC Count ) 11 - REFERENCES Ref. 1 - "The input data flow to L2STT Trigger Card from SMT Front Ends" - S.Tentindo Repond and B.Connolly, D0Note 3599 , 22-Jan-1999 Ref. 2 - "D0 Central Hardware Trigger Preliminary Implementation Studies of Weighted Averaging of Clusters"- R.Angstadt and M.Johnson, D0Note 3168 , Jan-24-1997 Ref. 3 - "A preliminary consideration of hit finders for the D0 Impact Parameter Trigger"- Henryk Piekarz, D0Note 3209 , Mar-21-1997 Ref. 4 - "Cluster Centroid algorithm for STT" - Uli Heintz, D0Note 3421 , 1997