Hi,

today Andrew, Parag, and I discussed how to use the storage available to us. We have put together a few pros and cons.

At the meeting tomorrow we should discuss:
- the pros and cons of the configurations
- which metrics we want to analyze after the "drill" (e.g. data throughput, job idleness, ease of operations, ...)

PROS and CONS of different storage configurations.

Several institutions have responded to our request for storage space, each offering at least 1 TB of disk space. It turns out that we can have 4 TB of disk at fnal, instead of the 2 TB considered by design. This means that we have almost twice the amount of space initially requested (7-8 TB instead of 4 TB). In the end, we might not need to use all the space at our disposal.

We considered some pros and cons of 2 main storage configurations:
==================================================================

(1) input data is cached at fnal only, and remote space is used only for storing binary files (required space: a few GB); this is the configuration for which Andrew has written an initial deployment plan;

(2) data is cached at fnal and at all remote locations (as are the binary files).

Note that in both (1) and (2) we plan to store output at fnal only, which is most efficient if we run the merging jobs at fnal (e.g. on the CMS farm).

In the end, we should also consider some mixed configuration, e.g. data cached at fnal and OU, binary files cached everywhere. We'll need to see how well we do with our "drill" in order to decide.

(1)
CONS:
- it does not allow for more complex/more efficient operational models. Pre-staging data does not increase efficiency: the observation is that after some time the data from tape is available at the fnal sam caches, and the limiting factor in transferring the data to the clusters is the data queues (rather than the tape speed)
PRO:
- easy to maintain: the configuration is simple, so failures are easy to track down
- not a lot of extra data traffic: before the station knows where consumption happens, it places files randomly at the caches; in this case, the caches are both at fnal
- simple operational model: submit job, check job (no pre-staging)

(2)
CONS:
- more complex deployment: in case of failure, tracking down the problem is more difficult
- requires maintenance of the stagers at the remote sites
- more complex operational model: to use these caches efficiently we should consider pre-staging data
- more data traffic: before jobs start running, the station places the files randomly at the caches; once the jobs run, i.e. once the location of the consumer is known, files must be transferred from the random cache to the consumer cache, which generates more data traffic
PRO:
- allows for multiple distributed data queues and pre-staging strategies, in case we need to optimize data throughput or job idleness
- once deployed, the infrastructure can be used for MC as well (e.g. to store min-bias files)
- gives the remote sites more sense of participation in the activity
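
To get a feeling for the "more data traffic" point under (2), here is a rough back-of-the-envelope sketch. It is not a model of the actual station: it assumes files are placed uniformly at random over the caches, and the site names other than fnal and OU are made up for illustration. It only estimates what fraction of files would first land at a cache away from the consumer and so need an extra cache-to-cache transfer.

```python
import random


def expected_extra_transfer_fraction(cache_sites, consumer_site):
    """Fraction of files expected to land at a cache that is NOT at the
    consumer site, assuming uniform random placement (an assumption)."""
    local = cache_sites.count(consumer_site)
    return 1 - local / len(cache_sites)


def simulated_extra_transfer_fraction(n_files, cache_sites, consumer_site, seed=0):
    """Same quantity, estimated by placing n_files at random caches."""
    rng = random.Random(seed)
    moved = sum(
        1 for _ in range(n_files) if rng.choice(cache_sites) != consumer_site
    )
    return moved / n_files


# Hypothetical layout for configuration (2): one cache per site.
# "siteC" and "siteD" are placeholder names, not real sites.
config2_caches = ["fnal", "OU", "siteC", "siteD"]

if __name__ == "__main__":
    exact = expected_extra_transfer_fraction(config2_caches, "OU")
    approx = simulated_extra_transfer_fraction(10_000, config2_caches, "OU")
    print(f"expected fraction needing an extra transfer: {exact:.2f}")
    print(f"simulated fraction (10000 files):            {approx:.2f}")
```

With one cache at each of N sites and uniform placement, the fraction is 1 - 1/N, so it grows as more sites host a cache. Pre-staging to the consumer cache is what would bring it down, which is why (2) implies the more complex operational model.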