Persistence
D. Petravick
20-July-1995

Desiderata:
===========

1.1 Ability to dump my "structs" to disk and restore them from disk
    in a way that is "transparent" or "convenient".

1.2 "Pointers" are "preserved" in some meaningful way in this process,
    even if I have a "loop" of data structures, like the one pictured
    below.

        typedef struct dll {
            struct dll *flink;
            struct dll *blink;
            . . .
        } DLL;

        mylist ------> DLL -------\
                      ^            \
                     /              v
                   DLL             DLL
                     ^             /
                      \           /
                       \-- DLL <-/

1.3 The data file can be accessed from another machine, which might
    have a different byte-ordering, struct padding, etc.

1.4 The data file can be accessed in some "transparent" way by another
    program that has some or all of the structs the first program
    wrote to the file.

1.5 A data file can be browsed by a program having _none_ of the
    structs compiled into it. However, such a program may have to
    "work hard" to figure out what the data means.

1.6 Some sort of automated tool either scans my source code, or
    generates source code from some struct description language.
    The result is a "meta-struct" system: a system of structs which
    describe the other structs used by the persistence system.

Side effects to think about:
============================

2.1 I may not be able to handle an arbitrary C type. To see this,
    consider:

        struct ... { char *dog; };

    How can I tell if dog is a valid pointer or not? If it is valid,
    how large is the data area it points to?

    Exercise: consider enums. What sort of controls would you have to
    put on enums in order to make them work in a persistence scheme?

2.2 Programmer's types are very low-level things. What should the
    persistence scheme do if a program is trying to reconstitute a
    struct from an "old" data file?

2.3 No scheme that DLP knows of "just looks like paging data in".
    Things are brought into memory via procedure calls. SDSS has
    "sound" ideas for ANSI C and C++, and distinct, though imperfect,
    implementations for each language.
2.4 Would a perfect persistence scheme really solve your problems?
    Or do you need to interchange data with some system outside your
    persistence-oriented data system, using a "conventional" file
    format?

2.5 You may end up writing out "baggage data". For example,
    programmers may use a double because some intermediate
    computation demands it, while a float would do just fine in the
    file.

2.6 You are tying your data model to things that need to change
    quickly. Ask yourself: Am I going to live with whatever
    programming mistakes I have made with my structs for the duration
    of the experiment?

ANSI C Implementation (R. Lupton at cl.)
========================================

3.1 SDSS has a program in its shiva tool-kit, make_do, that scans C
    structs and generates C code, which you normally link into all
    programs seeking "transparent" access to persistent data.

3.2 When the program runs, the generated code is called and produces
    a "name" (a small number) for each known type. Note that there is
    no fixed map of structs to numbers in the survey. The code uses
    the ANSI C offsetof() macro and sizeof operator to figure out how
    this compiler happens to lay out each struct in memory. The code
    generates a global list of known types (the type meta-system).

3.3 We have the convention that a pointer is good unless it is null.

3.4 We have a memory management system layered on top of malloc that
    tells us, given a pointer, just where the data for the pointer
    begins and ends. (It is not clear that we use this today.)

3.5 Given a pointer, you can write to a file everything it points to.
    Given a file that has some data in it, you can give the system
    another pointer, and everything it points to will be written to
    the file as well. Duplicate data are resolved, so the system had
    better not have "changed state" in between.

3.6 A reader can "transparently" restore any saved pointer to memory.
    Data are restored with properly "swizzled" pointers.

3.7 A dump file contains meta-structs, data, and book-keeping for
    circular data.
3.8 We use the C meta-struct system extensively from TCL to manage
    our data. For example, we can write a TCL loop to traverse a
    linked list of data, pick out some values and plot them up.

Example Histogram and AF package (C. Stoughton)
===============================================

shiva> ...
in tclHgFromAf: tempMin, tempMax -0.500000 5.500000
h4
shiva> exprPrint h4
id           0
minimum      -0.5
maximum      5.5
name         test
xLabel       test
yLabel       no. entries
nbin         20
contents     0x10114880
error        0x10119360
binPosition  0x10103380
underflow    0
overflow     0
entries      100
wsum         100
sum          12.7000004053
sum2         77.4100043082
shiva> exprPrint h4.contents<2>
0
shiva> loop i 1 10 { echo [exprPrint h4.contents<$i>] }
96
0
0
0
0
0
0
0
0
shiva>

Dump file features in SHIVA
===========================

shiva> help dump

dumpPrint
    Print the contents of the list returned by dumpRead.
    USAGE: dumpPrint array

dumpRead
    Read dump file and return a list of handles.
    USAGE: dumpRead file

dumpClose
    USAGE: dumpClose
    Close a disk dump file

dumpDateDel
    USAGE: dumpDateDel file
    Overwrite the date string in a dump file with Xs. You may not
    have a dump file open when you use this verb

dumpDateGet
    USAGE: dumpDateGet
    Return the date string from a dump file

dumpHandleRead
    USAGE: dumpHandleRead [shallow]
    read the next item from the current dump, returning the handle

dumpHandleWrite
    USAGE: dumpHandleWrite handle
    Write an object described by to the current dump

dumpOpen
    USAGE: dumpOpen file mode
    Open a disk dump file file for read or write. Mode may be r, w,
    or a for read, write, or append.
    If you capitalise the mode, don't perform usual cleanup on close

dumpPtrsResolve
    USAGE: dumpPtrsResolve
    Resolve pointer ids, converting them to pointers where possible

dumpReopen
    USAGE: dumpReopen file mode
    Reopen a disk dump file file for read or write. This is the same
    as dumpOpen, but doesn't initialise data structures

dumpTypeGet
    USAGE: dumpTypeGet
    Return the type of the next thing in a dump file

dumpValueWrite
    USAGE: dumpValueWrite
    Write something, given as a and a , to the current dump

shiva>
shiva> regNew 100 100
h5
shiva> dumpOpen dog w
shiva> dumpHandleWrite h5
shiva> dumpClose
shiva> dumpOpen dog r
shiva> dumpHandleRead
h6
shiva> exprPrint h6
name       h5
nrow       100
ncol       100
type       (enum) TYPE_U16
rows       0x10135700
rows_u8    0x0
rows_s8    0x0
rows_u16   0x10135700
rows_s16   0x0
rows_u32   0x0
rows_s32   0x0
rows_fl32  0x0
mask       0x0
row0       0
col0       0
hdr        0x0
prvt       0x1012a340
shiva> exprPrint h5
name       h5
nrow       100
ncol       100
type       (enum) TYPE_U16
rows       0x1012d4a0
rows_u8    0x0
rows_s8    0x0
rows_u16   0x1012d4a0
rows_s16   0x0
rows_u32   0x0
rows_s32   0x0
rows_fl32  0x0
mask       0x0
row0       0
col0       0
hdr        0x0
prvt       0x10124a90
shiva>

C++ Objectivity/DB/C++ persistence
==================================

4.1 Persistence can be had, at a price, by purchasing an OODB. This
    is expensive, and may limit the kinds of platforms you can run
    your data system on. Purchased software taints collaborative
    software development. This is a serious limitation.

4.2 There are "toolkits" which offer "persistence". I do not
    understand them. I would look at the Rogue Wave class libraries
    to start. Gene Oleynik has used them.

4.3 OODBs work by building meta-classes, either by scanning your
    source or by scanning an OODL which generates your C++ classes.

4.4 Vendors have thought out the "char *" problem above, and just
    will not save that sort of stuff.

4.5 The operator overloading feature of C++ allows the act of pointer
    de-referencing to bring objects from the data base into memory.
    A "smart pointer" type is created. It contains a real memory
    pointer and a "name" for the object, allowing it to be looked up
    in the data base. If the memory pointer is null, code is called
    to bring the object into memory.

4.6 Therefore, these beasts are also involved in memory management.
    Memory management failures can occur at "any time".

4.7 C++ allows you to take the address of a persistent object that
    has been moved into memory. However, the memory manager needs to
    re-use memory, so such an address can go stale. Therefore, you
    still need engineering rules (i.e. don't do that!) which will
    confuse people.

Summary:
========

5.1 It is possible to invent a "persistence" scheme for languages
    other than C++.

5.2 The concept is not limited to OO languages.

5.3 Issues:
    - Proprietary solution?
    - Heaviness (takes time to scan all that source)
    - Appropriateness (pits programmers against data managers?)
    - Architecture issues -- do you need equivalence with the
      "tabular world"?
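
Illustration (not the shiva code):

The ideas in 1.2, 3.2, 3.5 and 3.6 can be sketched in a few lines of
C. What follows is only a sketch under stated assumptions, not how
make_do or the shiva dump code actually works. The names FIELD_DESC,
dll_fields, dll_dump, dll_restore and dll_roundtrip_demo, the "value"
payload, and the text file layout are all hypothetical. It writes
small integer ids in place of pointers and turns the ids back into
pointers ("swizzling") on read, so the loop of DLLs survives the trip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* The node from item 1.2, with a hypothetical payload for the demo. */
typedef struct dll {
    struct dll *flink;
    struct dll *blink;
    int value;
} DLL;

/* Hypothetical meta-struct: one entry per field, built with
   offsetof() and sizeof in the spirit of item 3.2. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;
    int is_pointer;
} FIELD_DESC;

const FIELD_DESC dll_fields[] = {
    { "flink", offsetof(DLL, flink), sizeof(DLL *), 1 },
    { "blink", offsetof(DLL, blink), sizeof(DLL *), 1 },
    { "value", offsetof(DLL, value), sizeof(int),   0 },
};

/* 1-based id of p among the nodes seen so far; 0 means null/unknown. */
static long id_of(DLL **seen, long n, const DLL *p)
{
    for (long i = 0; i < n; i++)
        if (seen[i] == p) return i + 1;
    return 0;
}

/* Write every node reachable from head via flink, each node once.
   Pointers are written as ids. Returns node count, or -1 on error. */
long dll_dump(FILE *fp, DLL *head)
{
    long n = 0, cap = 0;
    DLL **seen = NULL;
    for (DLL *p = head; p != NULL && id_of(seen, n, p) == 0; p = p->flink) {
        if (n == cap) {
            cap = cap ? 2 * cap : 8;
            DLL **tmp = realloc(seen, (size_t)cap * sizeof *tmp);
            if (tmp == NULL) { free(seen); return -1; }
            seen = tmp;
        }
        seen[n++] = p;
    }
    fprintf(fp, "%ld\n", n);
    for (long i = 0; i < n; i++)
        fprintf(fp, "%ld %ld %d\n", id_of(seen, n, seen[i]->flink),
                id_of(seen, n, seen[i]->blink), seen[i]->value);
    free(seen);
    return n;
}

/* Read a dump and swizzle ids back into pointers. All nodes live in
   one calloc'd array, so the caller releases them with free(head). */
DLL *dll_restore(FILE *fp)
{
    long n;
    if (fscanf(fp, "%ld", &n) != 1 || n <= 0) return NULL;
    DLL *nodes = calloc((size_t)n, sizeof *nodes);
    if (nodes == NULL) return NULL;
    for (long i = 0; i < n; i++) {
        long f, b;
        if (fscanf(fp, "%ld %ld %d", &f, &b, &nodes[i].value) != 3 ||
            f < 0 || f > n || b < 0 || b > n) {
            free(nodes);
            return NULL;
        }
        nodes[i].flink = f ? &nodes[f - 1] : NULL;
        nodes[i].blink = b ? &nodes[b - 1] : NULL;
    }
    return nodes;
}

/* Build a 3-node ring, dump it, restore it, and check that the loop
   is intact. Returns 1 on success, 0 otherwise. */
int dll_roundtrip_demo(void)
{
    DLL a, b, c;
    a.flink = &b; b.flink = &c; c.flink = &a;
    a.blink = &c; b.blink = &a; c.blink = &b;
    a.value = 10; b.value = 20; c.value = 30;

    FILE *fp = tmpfile();
    if (fp == NULL) return 0;
    long n = dll_dump(fp, &a);
    rewind(fp);
    DLL *r = dll_restore(fp);
    fclose(fp);
    if (n != 3 || r == NULL) { free(r); return 0; }
    int ok = r->value == 10 && r->flink->value == 20 &&
             r->flink->flink->value == 30 &&
             r->flink->flink->flink == r &&      /* loop is closed */
             r->blink->value == 30 && r->blink->flink == r;
    free(r);
    return ok;
}
```

Calling dll_roundtrip_demo() from a main() should return 1 if the
ring comes back with the same values and the same shape. Because the
file holds ids and decimal text rather than raw addresses and raw
bytes, this layout avoids the byte-order and padding worries of 1.3,
at the cost of a hand-written dump routine per struct. A generator in
the spirit of 3.1 could presumably drive the same logic from a table
like dll_fields instead. Note that the sketch only follows flink; a
blink that points outside the flink chain is written as id 0.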