Alan Jonckheere 20 Oct 1995

D0Library Organization, Strengths and Weaknesses

What it is:

-Repository

D0Library is several different things to different people. At its simplest, and in the view that most people see, it is a code repository:

1) of the "official" versions of D0's executables and utilities (D0X, LIBTest, D0News, D0Eve...);
2) of many of the official data files used by executing programs, for example STP, RCP and PBD files;
3) of official object libraries: most links use option files and libraries stored in D0Library;
4) finally, of the official versions of source code. This is usually the place where people get versions of the code that they need to modify.

It is a distributed repository. There is only one "source" or master library, but copies of it exist throughout the collaboration. There is normally one copy on each "cluster" used by collaboration members. These clusters are mostly VAX clusters, but increasingly are DEC Alphas and various flavors of Unix. It has not yet been ported to NT or any other PC operating system.

The D0Library consists of some 135 different "products", each of which is stored in its own directory tree. Each product has a defined function, ranging from pure archival (D0$DOCS) to the code needed to create top level executables (D0$RECO, D0$D0GEANT, D0$EVEDT, D0$XFRAME). Most are utility libraries at various levels: detector based (D0$CALOR_UTIL), algorithm based (D0$PHYSICS_UTIL) or general (UTIL, GENERAL). A few (D0$STP) contain mostly data files used by other libraries/programs. Each of these directories contains the executables, object code libraries, documentation, command scripts, data files etc. needed to perform its function (plus, usually, a lot of junk).

Most are used both online (on D0HS) and offline, but some are purely online. The purely online ones go to a small number of machines.
-Archival and code management system

At another level D0Library is an archive of all past developments of the D0 software and a code management system. All code is "released" to the repository through a code management archive system, DEC's CMS. The D0Library CMS libraries, one per product, contain all of the source files needed to build the object libraries, plus command files to build any executables and data files needed. They also contain one or more special files, written in a makefile-like language designed by Harrison Prosper, that contain all of the information needed to build the product from its sources. These files, plus the structure of the CMS library itself, determine what is created when a library is "released" and where the various files are stored within the directory tree for that product.

CMS actually stores the differences between the original file and all subsequent revisions of that file. Each revision is assigned a sequential "generation" number. It is possible to extract any given generation of any file (element). Another very important feature of CMS is that a history of each revision is kept, including a comment and the name of the user who inserted the revision into the library.

Two very important features of the current code management system are its two independent ways of grouping files:

1) GROUPS are collections of files. Groups can be managed independently, making management much easier. A given file can be in more than one group, and groups can be within other groups. This makes it possible to operate on files in a number of ways.
2) CLASSES are an even more important grouping. They provide the ability to define collections of specific instances of each file. Usually, a class is defined to be, for example, the latest version of each file within a particular group or groups. We use the class to tag a "release": the collection of all code that is released into D0Library at any one time from this library.
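The relationship between generations, groups and classes described above can be illustrated with a small toy model. This is only a sketch and is not CMS or its command interface: the names `Element`, `ToyLibrary`, `tag_class` and `fetch_class`, and the element and class names in the example, are all hypothetical. For simplicity the model stores the full text of every generation, whereas CMS stores differences, and it assumes groups never contain each other in a cycle.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One file in the library, kept as an ordered list of generations."""
    name: str
    generations: list = field(default_factory=list)  # (text, user, remark)

    def insert(self, text, user, remark):
        """Store a new generation; return its sequential number (1-based)."""
        self.generations.append((text, user, remark))
        return len(self.generations)

    def fetch(self, generation=None):
        """Return the text of one generation (the latest if none is given)."""
        if generation is None:
            generation = len(self.generations)
        return self.generations[generation - 1][0]


class ToyLibrary:
    """Hypothetical stand-in for one product's library."""

    def __init__(self):
        self.elements = {}  # element name -> Element
        self.groups = {}    # group name -> set of element and/or group names
        self.classes = {}   # class name -> {element name: generation number}

    def insert(self, name, text, user, remark):
        element = self.elements.setdefault(name, Element(name))
        return element.insert(text, user, remark)

    def add_to_group(self, group, member):
        self.groups.setdefault(group, set()).add(member)

    def expand(self, group):
        """Resolve a group, including nested groups, to element names."""
        names = set()
        for member in self.groups.get(group, ()):
            if member in self.groups:
                names |= self.expand(member)
            else:
                names.add(member)
        return names

    def tag_class(self, class_name, group):
        """Freeze the current latest generation of every element in a group."""
        self.classes[class_name] = {
            name: len(self.elements[name].generations)
            for name in self.expand(group)
        }

    def fetch_class(self, class_name):
        """Return the text of every element exactly as tagged by the class."""
        return {
            name: self.elements[name].fetch(generation)
            for name, generation in self.classes[class_name].items()
        }


if __name__ == "__main__":
    lib = ToyLibrary()
    lib.insert("a.for", "first version", "user1", "initial")
    lib.insert("b.for", "first version", "user2", "initial")
    lib.add_to_group("SOURCE", "a.for")
    lib.add_to_group("SOURCE", "b.for")
    lib.tag_class("V1.00", "SOURCE")                     # tag a "release"
    lib.insert("a.for", "second version", "user1", "bug fix")
    print(sorted(lib.fetch_class("V1.00").items()))      # still generation 1
    print(lib.elements["a.for"].fetch())                 # latest generation
```

The point of the sketch is that a class records a generation number per element, so later insertions do not change what the class refers to. That is what makes a tagged release reproducible.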
In principle, it is possible to reconstruct the D0Library as it was at any point in the past. This, in principle, allows us to rebuild *exactly* an executable that was used at that time.

Releases:

There are actually *seven* different types of code releases in the D0 code management system, far too many.

1) Alpha releases aren't really releases, in the sense that they are, or contain, private code. They may or may not use various D0 code management tools (BETA_UTIL tools).
2) Beta releases are shared code. They allow a reasonably large group in close contact to develop code jointly without too large a probability of overwriting each other's changes.
3) Gamma releases are *identical* to Test releases below. The code is extracted from the CMS libraries with the same procedures and all the same processing is done as for Test releases. The only differences are that a Gamma release can be done by anyone with read access to the CMS library and that its output can go to an arbitrary location on an arbitrary machine.
4) Test releases are updates to a library in the TEST area within D0Library. Only code that has changed since the last official release (see below) is fetched from CMS, but all the rest of the processing is done. There are normally many test releases done for each official release.
5) Official releases are "major" versions of a product. All released elements are fetched, OLBs are rebuilt from scratch and all processing is done.
6) Production releases are a method of freezing the D0Library in order to give the managers of production executables a solid base from which to develop working production executables. This is necessary since the D0Library itself is so large and changes so rapidly that debugging a complex executable and controlling its contents is nearly impossible otherwise. Strictly speaking this is outside of D0Library itself. It is controlled and managed by the "czar" of the particular production area.
7) Production Pass releases are a method of inserting modifications, both debug and functional, into a production release.

Each of these release types is mimicked on the Unix machines. A release to one of the VAX systems currently triggers a "pull" from the Unix machines.

Strengths of the current system:

D0Library is a working repository and archive of D0 software. It works, it's robust, and we *can* back up to old versions. We *can* make the required updates to all machines of interest to us: VAX, AXP and UNIX with its various flavors. We have the audit trail needed to tell who did what and when, from which we can find out why.

Each library is controlled by a single "czar", or at most a few. Each czar has absolute control and should have complete knowledge of what's in his or her library. This, in principle, should guarantee that only correct code is released into general use.

At one time, there was a clear hierarchy of libraries:

1) top level libraries that contained only the "frame" or top level controlling routines in a single EXE (D0RECO), or contained an entire standalone program (D0GEANT, XFRAME)
2) Physics Algorithm Utility libraries (PHYSICS_UTIL)
3) Detector Utility libraries (MUON_UTIL)
4) general Utility libraries (GENERAL and UTIL)

Libraries were only allowed to reference routines in lower level libraries, none at the same or higher levels.

Weaknesses of the current system:

The biggest weaknesses in the current system are:

1) It's slow and fairly labor intensive to make and check releases. It's difficult to make corrections quickly.
2) The audit trail is difficult to follow, especially for routines that go through several of the types of releases.
3) There are too many different kinds of releases. This problem has been caused by the previous two, especially the slowness of it all.
4) Alpha areas are a *bad* idea. Private code is untestable, not to mention hard to find when you need it.
But we will always, must always, have it. Code couldn't be developed by individuals otherwise. The problem occurs when code never leaves the private areas. We need a system where it's so easy and safe to put code into an official area that it will usually be done.
5) The OFFICIAL version has never worked. The original concept was that it would be the version that was guaranteed to work. But there is so much coupling between libraries that linking with the official versions of everything has *never* produced a working EXE. Get rid of it, but replace it with what? We often need at least two versions of a library. Perhaps replace the OFFICIAL version with an OLD version, which would just be the previous "Test" version. Or maybe use three, OLD, CURRENT and TEST, where by default TEST->CURRENT->OLD after, say, 2 weeks unless a problem is found and TEST is redone. Test really would be a TEST version.
6) Beta and Gamma (Beta from a semi-private CMS library, Gamma from the official one) were invented mostly to decrease the turnaround for releases. The problem occurs when code stays in Beta or Gamma for too long. You run the risk of parallel developments. Can we meld these with TEST and, perhaps, allow the library czars to actually do the releases, at least locally?
7) Production releases are *really* painful. Just ask Qizhong. It's clear that the procedure/scripts could use a lot of improvement, but the problem goes deeper. How do we make sure that code is tested before it goes into production without taking forever? The real problem is that there is never a real boundary in code development when all development is "done". Someone always has some new idea or some little "fix" they want to do.
8) The library itself is large (800 Mbytes in the VAX version, twice that on most other machines) and complicated. There are too many different libraries and they are not (now) well defined.
There is far too much inter-dependency between libraries. At one time each library (product) *was* well defined, but over the years their functions have become muddled. Since there are so many libraries, and their functions aren't well defined, it is hard for anyone to know or to guess where a given routine should go. Even the Czar often has trouble determining if a given routine should be in his or her library. The current definitions of libraries need to be reviewed. Can we merge libraries or do we need to split them up? Can the library definitions be made more modular? With large libraries it is impossible for a single person to track changes/contents, and they are harder to release. Small ones make linking etc. a pain and require frequent file moves between libraries, which is *not* easy to do.
9) There is no tight coupling between the various flavors: VAX, AXP, UNIX. Since each flavor has different compilers, different errors are flagged on each. But there is no easy way to feed these back into the archive. In fact, errors on UNIX are often ignored.
10) There is no formal mechanism for code review for most releases. This means that it's not unusual for absolute junk to get into the library. NOTE: formal review is diametrically opposed to quick release.

Features of "The Next Generation Library"

We are assuming that the emphasis of our code development will be moving away from strictly VAX based development to a more egalitarian development environment. In the short term this would be a switch to UNIX environments (note the plural) as the emphasis.

0) I see no major problem with the form of the D0Library as a repository, i.e. separate "products", each with a Czar who controls its contents and releases. It needs reorganization as to what products are maintained and what goes into each of them. Each "product" or sub-product (a la D0Geant) should have a fairly small, close knit group of developers.
1) The Archive can be on any machine (VAX, UNIX, NT...)
but we need the equivalent of CMS on that machine, with most of the features of CMS, including its Groups, Classes and ability to create text output with element name substitution, so we can create the release command scripts. This could stay on VMS for the foreseeable future, until something better comes along. So I see no problem with the archive features of the library.
2) We need an easy way to update the archive remotely. You shouldn't have to log in to the archive machine to update your library, but maybe you want that as part of your control of the archive. RShell could perhaps be used. If we have machine dependent blocks (D0Flavor), this update program needs to check the blocks and, if necessary, set them to the default flavor.
3) We need to have a way to automatically compile code on several different platforms *before* it goes into the archive.
4) We need some sort of code review/verification for at least *some* of the release types.
5) We need some way of making rapid updates/corrections to at least some of the release types (czar actually does them?)
6) We need to rethink the various kinds of releases we need in light of the above. I'd recommend something like:
   a) ALPHA (private), have to have this. Unreviewed.
   b) TEST/CURRENT/OLD, where TEST might be redone frequently, but would *automatically* become CURRENT after a *short* time. CURRENT would then become OLD. TEST would be done by the Czar and would replace both BETA and GAMMA. CURRENT would replace (today's) TEST and OLD would replace OFFICIAL. Perhaps TEST would only exist at some small subset of sites, but these should include at least one of each flavor of operating system. CURRENT/OLD would go everywhere (via AFS?). This might be done by the Czar. Reviewed? CURRENT and OFFICIAL *should* be reviewed, but one thing I think we *really* need is that TEST is temporary. Otherwise CURRENT and OFFICIAL become useless. Everyone would use TEST just like they do now.
   c) PRODUCTION and PRODUCTION PASS would remain, but hopefully could be improved. I don't know how though.
7) Distribution needs improvement.
   a) It's been suggested that we use AFS, a distributed file system, to do this. With AFS, code is cached locally only when it's needed. Thus the distribution would be automatic and on demand only. I have no idea how to guarantee that the code will work though. This is easy to do for the source code. But the object libraries and executables have so many dependencies on the particular operating environment that I don't see how it'd work.
   b) One of the major problems we've had lately is changes in the various operating system environments: the run time libraries, versions of compilers, versions and types of network transport systems, versions and types of graphics systems... These mean that an executable linked on one machine won't run on other systems, even of the same "flavor". How do we handle this?
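The TEST/CURRENT/OLD rotation recommended in item 6 b) above can be made concrete with a short sketch. It is only an illustration under stated assumptions: the names `Product`, `release_test` and `promote_if_due` are hypothetical, the hold period is taken to be the "say 2 weeks" mentioned earlier, and redoing TEST is assumed to restart the clock. No existing D0 release tool is implied.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed hold period before TEST is promoted ("say 2 weeks" in the text).
HOLD = timedelta(days=14)


@dataclass
class Product:
    """Hypothetical record of the three versions kept for one product."""
    name: str
    test: Optional[str] = None
    test_date: Optional[date] = None
    current: Optional[str] = None
    old: Optional[str] = None

    def release_test(self, version: str, today: date) -> None:
        """Make (or redo) a TEST release; assumed to restart the hold clock."""
        self.test = version
        self.test_date = today

    def promote_if_due(self, today: date) -> bool:
        """Rotate TEST -> CURRENT -> OLD once TEST has survived the hold."""
        if self.test is None or today - self.test_date < HOLD:
            return False
        self.old = self.current      # the previous CURRENT becomes OLD
        self.current = self.test     # TEST becomes CURRENT automatically
        self.test = None             # TEST is temporary: it does not linger
        self.test_date = None
        return True


if __name__ == "__main__":
    product = Product("EXAMPLE_UTIL", current="v1")
    product.release_test("v2", date(1995, 10, 1))
    product.release_test("v2a", date(1995, 10, 5))     # problem found, redone
    print(product.promote_if_due(date(1995, 10, 16)))  # hold not yet over
    print(product.promote_if_due(date(1995, 10, 19)))  # 14 days after redo
    print(product.current, product.old, product.test)
```

Clearing TEST on promotion is the design choice that matters here. It enforces the requirement that TEST be temporary, so that users cannot simply keep linking against TEST as they do now.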