Comments on Languages and Data Structures

1. Criteria for data structure and language

suggested grading:
    0 = unacceptable
    1 =
    2 =
    3 = does essential things
    4 =
    5 = best we can imagine

Features
========
User Defined TYPES
runtime INSTANCE creation
RELATIONSHIPS among instances
support of PATH idea in some form
PORTABLE I/O
SELF-DESCRIBING I/O
EVOLVING versions of types
ONLINE support

Quality
=======
PERFORMANCE
DDL (Sole source for TYPES info)
READABLE
EFFORT to code, recode, train
Error PREVENTION (Writeability)
LOCALITY of errors, code
DEBUGging support

How
===
LANGUAGE(S)
MULTIple language support
PREPROCESSOR?
PLATFORMS
VENDOR Support
DEVELOPMENT tools
DOC


F77+ZEBRA

Features
========
TYPES               3
INSTANCE            3
RELATIONSHIPS       3
PATH                3
PORTABLE I/O        4
SELF-DESCRIBING     2
EVOLVING            3
ONLINE              3

Quality
=======
PERFORMANCE         5
DDL                 2   (using ZBANK)
READABLE            1   (without GTxxx)
EFFORT              2
PREVENTION          1   deallocation at end of event by system
LOCALITY            1
DEBUG               2

How
===
LANGUAGE(S)         F77
MultiLanguage       0
PREPROCESSOR?       D0FLAVOR
PLATFORMS           VMS, ELN, IBM, SGI
VENDOR Support      4
DEVELOPMENT tools   3   (with DBANK, EVEDT, D0X; would need to rebuild some)
DOC                 4   (of ZEBRA; not searchable)

note: MZ grades much worse than FZ grades:
      memory manager details more unpleasant than the I/O package


F77+DSPACK

DSPACK = NA49: routines to hide MZ, use FZ I/O

Features
========
TYPES               3   dynamic definition
INSTANCE            3
RELATIONSHIPS       2?  not clear how pointers done
PATH                4   define object of same type, new name
PORTABLE I/O        4   uses FZ
SELF-DESCRIBING     4   full definitions in data
EVOLVING            3   self-description helps, but forces to low-level
                        access routines?
ONLINE              2?  don't know enough

Quality
=======
PERFORMANCE         4   all access via routines
DDL                 3   none provided, but fully feasible;
                        generate .h files => recompilation massively?
                        print dump built in
READABLE            3   names yes; but all via calls
EFFORT              3   guess system not very hard to learn?
PREVENTION          3   pointers as protected as if in link area
                        named access; access routines evade type checks
LOCALITY            3?
                        depends on how used
DEBUG               3?  can have .h files, but objects are really
                        defined independently

How
===
LANGUAGE(S)         F77,C,C++ under way?
MultiLanguage       3
PREPROCESSOR?       ?   for C versions
PLATFORMS           SUN, HP
VENDOR Support      2?  NA49
DEVELOPMENT tools   2   memory browser already there; need DBANK equiv
DOC                 3   not searchable


Farfalla + C++

basically implementation of TREE class + I/O
idea is to create DSTs by filling C++ trees with FZ data from
reconstruction program

Features
========
TYPES               5
INSTANCE            4
RELATIONSHIPS       3?  not clear how pointers done
PATH                4   define object of same type, new name
PORTABLE I/O        4?  generic Unix libg++, uses NFS utilities
SELF-DESCRIBING     4   full definitions in data, but usable only with
                        .h files
EVOLVING            3   self-description helps, but forces to low-level
                        access routines?
ONLINE              2?  don't know enough

Quality
=======
PERFORMANCE         4
DDL                 3   .h file is DDL
                        generate .h files => recompilation massively?
                        no print dump built in
                        could be generating more automatically?
READABLE            4   new language; object names 4 letters?
                        (Zebra somewhere?)
EFFORT              2   new language
PREVENTION          3?  pointers can dangle in any language;
                        C++ (OR F90) open up new opportunities
                        (memory leaks)
LOCALITY            3?  depends on how used
DEBUG               3?  can have .h files, but objects are really
                        defined independently (versioning again?)

How
===
LANGUAGE(S)         C++
MultiLanguage       2   (-> 3 or better with IDL?)
PREPROCESSOR?       no
PLATFORMS           Unix (g++)
VENDOR Support      3?  Babar looks to be adopting
DEVELOPMENT tools   2?  I/O looks to be by hand
DOC                 3   not searchable, but nice intro to C++ ideas


DETAILS of criteria

Features
========
User Defined TYPES
    type definition at run or compile time? (flex vs checking)
    names for all types, elements (reasonable lengths)
    effort to build up to structures we need from supplied types
runtime INSTANCE creation
    dynamic allocation (how much oversight needed)
    user or automatic garbage collection?
RELATIONSHIPS among instances
    reference links act as pointers in ZEBRA
    structural links sometimes used effectively as pointers
    see also PATH
support of PATH idea:
    same base type, different source; easy comparison
    (attributes associated with a data type)
    logically, now have PATH.BANK.chain_member
    handling of attributes:
        set/reset
        depth-1 stack?
        specify attribute directly in access?
        get full list of objects, then pick out attribute?
        give different name to objects of same type:
            geant_electron and reco_electron both of type electron?
PORTABLE I/O
    how much effort to port to a new machine
    complexity of code to read event
    complexity of code to write event
    control of which objects to write
    handling of missing objects on reading event
    really need to read/write whole event?
    connection to mass storage
SELF-DESCRIBING I/O (header or embedded or external files?)
    names of objects (banks)
    data types for all elements
    names of elements
    meaning of names (comments)
    preserve relationships (links) among data objects
EVOLVING versions of types
    reading of data where object has grown more complex
    discipline: only add to end of existing data structure
    marking as "element not known?"
    had better not involve recompilation
ONLINE support
    memory mapping: raw data in fixed location in Level 3 trigger software?
    network access?
    replacement for examine's global shared common?

Quality
=======
PERFORMANCE
    I/O
    overhead in data access
DDL (Sole source for TYPES info; demand more than ZEB + ZBANK!!)
    a single source file for a data structure which can generate code for:
        header files for compiler(s)
        access (xx = NAME.ELE or GTBANK('NAME','ELE') not Q(LNAME+7))
        print dump: (Ascii object persistence)
            Output
            Input
        debugger / memory browser
        file browser
        instance comparison utilities
        versioning as definition evolves
        place in ntuple? cut on in ntuple? plot?
    Not really clear how this coexists with generation/use of header
    files in face of evolving versions?
    cross-check?
    exception handling?
READABLE
EFFORT to code, recode, train
Error PREVENTION (Writeability)
    strong typing (prevent improper routine calls)
    detection or prevention of errors:
        dangling pointers (pointer type for each object type may help?)
        overwrites due to above, or undefined pointers
        memory leaks (never deallocating memory)
        any others peculiar to system?
LOCALITY of errors, code
    how much code could have caused this problem?
    how much code SHOULD have handled this data?
DEBUGging support
    good debugger, knowing about user data types
    help for same list as error prevention?

How
===
LANGUAGE(S)
Multiple Languages support
PREPROCESSOR?
PLATFORMS
    machines, operating systems
VENDOR Support
    availability
    cost
    user base
    stability of vendor
    stability of product
    bug fixes prompt?
    effort/likelihood of adding new platform
DEVELOPMENT tools
    debugger, ddl: see above
    editor support of languages, DDL
    compiler quality (and interaction with system used)
    code generation from design phase
    benchmark: effort to add a new data type (bank) and debug
DOC
    comprehensible user and reference manual?
    interactive?
    searchable?
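
Illustration of the DDL criterion (sketch only)

The DDL item under Quality asks for a single source file per data structure from which header files, named access and a print dump can all be generated. A minimal sketch of that idea, using the C preprocessor "X-macro" technique in C++, might look like the following. This is not how ZEBRA, DSPACK or Farfalla do it; the bank name Electron, its elements (energy, eta, ntrack), and the functions dump, describe, get and usage_example are all hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical single-source definition of one "bank":
// element type, element name, comment giving the meaning of the name.
#define ELECTRON_ELEMENTS(X)                          \
    X(float, energy, "calorimeter energy")            \
    X(float, eta,    "pseudorapidity")                \
    X(int,   ntrack, "number of matched tracks")

// 1. Header-file view: a struct with named, typed, zero-initialized elements.
struct Electron {
#define X(type, name, comment) type name{};
    ELECTRON_ELEMENTS(X)
#undef X
};

// 2. Print dump: one "name = value" line per element, in definition order.
std::string dump(const Electron& e) {
    std::ostringstream out;
#define X(type, name, comment) out << #name << " = " << e.name << "\n";
    ELECTRON_ELEMENTS(X)
#undef X
    return out.str();
}

// 3. Self-description: element types and names as text, e.g. for a file header.
std::string describe() {
    std::string s;
#define X(type, name, comment) s += std::string(#type) + " " + #name + ";";
    ELECTRON_ELEMENTS(X)
#undef X
    return s;
}

// 4. Access by element name at run time (in the spirit of
//    GTBANK('NAME','ELE')); returns false for an unknown element.
bool get(const Electron& e, const std::string& element, double& value) {
#define X(type, name, comment) \
    if (element == #name) { value = e.name; return true; }
    ELECTRON_ELEMENTS(X)
#undef X
    return false;
}

// Usage example: compile-time access (e.ntrack) and run-time named access
// come from the same definition, so they cannot drift apart.
void usage_example() {
    Electron e;
    e.ntrack = 2;
    double value = 0.0;
    assert(get(e, "ntrack", value) && value == 2.0);
    assert(!get(e, "phi", value));   // "phi" is not defined for this bank
}
```

Adding an element means adding one line to ELECTRON_ELEMENTS; the struct, dump, description and named access all follow after recompilation. The sketch does not address the open question above of how generated header files coexist with evolving versions of a type.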