|
TASK |
Importance |
Assignee |
Comments |
|
1. Enable 3-tier architecture for job submission and execution |
Critical |
GG/PM |
The Condor team is under strong pressure |
|
1.1. Receive implementation of new Condor and test it |
|
GG/PM |
|
|
1.2. Understand the implications. At a minimum: 1.2.1. Erase the interim tar file (was: the submission client should clean up); 1.2.2. Separate the true client from the queuing system; 1.2.3. Make the client part of the CDF distribution |
Important |
GG/PM |
|
|
2. Proper integration with the experiments’ local environments |
Critical |
|
|
|
2.1. Develop new job managers: 2.1.1. SAM; 2.1.2. CDF CAF; 2.1.3. D0 MC |
Important |
AB |
|
|
2.2. Develop generic job monitoring: 2.2.1. Create the “jobs” branches in the information tree; 2.2.2. Design the indirection mechanism for job details retrieval; 2.2.3. Implement providers that talk to the local DB and possibly GRAM log files and other info sources; 2.2.4. Develop one (at most two) new web pages in PHP |
Critical |
IT/VM |
|
|
2.3. Deploy a local XML database: 2.3.1. Understand the interface to manage JID mappings; 2.3.2. Implement with Xindice or another XML DB; 2.3.3. Make this DB server part of the JIM distribution |
Important |
IT/AB/VM |
Has far-reaching benefits |
|
2.4. D0 MC specific issues: 2.4.1. Finalize samg client; 2.4.2. Include new LDAP schema attributes into JIM |
Critical |
IT/RW |
|
|
2.5. CDF CAF specific issues: 2.5.1. Better authentication; 2.5.2. Interface JIM with FBSNG monitoring |
Critical |
IT/FW |
|
|
3. Better site description, configuration and advertisement |
Critical |
|
Provision for interoperability with any Grid |
|
3.1. Move site configuration schema out of jim_info_providers and base it on the meta-configurator |
|
GG |
|
|
3.2. Move from gatekeeper-centric view to site-centric view. Properly abstract from the globus gatekeeper |
|
GG/IT/AB |
|
|
4. Internal solidification and bug fixes |
Important |
|
This is how a prototype becomes a V1 ;-) |
|
4.1. New samg client, extensible and flexible |
Critical |
|
DONE |
|
4.2. Repackage the JIM suite with new components such as server_run |
|
GG/PM |
|
|
4.3. Better installation scripts |
|
GG/TR |
|
|
4.4. Randomize jobs among sites that have zero cached files (Was: write a ranking algorithm that implements load balancing) |
Desirable |
AB |
|
|
4.5. Release a new version of globus_rm_server with the fixed job-manager |
Desirable |
GG/RW |
Old version has problems with rapid-fire submissions (needs clarification). New version is incompatible with AFS |
|
4.6. Increase coherence in the attribute names used to advertise resources and describe jobs |
Desirable |
GG/IT |
|
|
4.7. Convert tailoring scripts of all JIM products to use the meta-configurator |
Important |
GG/PM/TR |
|
|
4.8. sam_gsi_config does not install properly if the products area is NFS-shared: fix this. |
Desirable |
|
DONE |
|
4.9. Release the CLI (in samg) for job and resource status inquiries |
Desirable |
VM |
|
|
4.10. Better credential management in samg client (was: before preparing the input sandbox the client should check the proxy): 4.10.1. Check the available proxy; 4.10.2. Suggest to the user the option of making a proxy out of a Kerberos ticket; 4.10.3. Make the subject (and the CA) an explicit attribute of the job |
Important |
GG/VM |
|
|
4.11. Credential management at match time: 4.11.1. Publish trusted CAs in the classAds of the resource and modify Requirements so as to include some match between user and resource CAs |
Desirable |
GG |
|
|
4.12. Publish user authorization information at the cluster/gatekeeper level, including CAs |
Desirable |
VM |
|
|
4.13. Solve the Condor problem of using a resource only once in a cycle by using match counters (available in the new Condor release) |
Important |
PM |
Needs experimentation |
|
4.14. Close the monitoring gap for when the job’s sandbox is transferring (Was: the monitor should have a state to indicate that the gahp server is working) |
Desirable |
--- |
|
|
4.15. New advertisement framework with dynamic information providers |
Desirable |
GG |
The site config file is re-read every time. We still want some information produced dynamically |
|
4.16. Show available resources on the Web |
Desirable |
--- |
Needs Web services framework |
|
4.17. Better error reporting for crashing/disappearing jobs |
Desirable |
PM |
Needs itemization based on new tests |
|
4.18. Verify the “liveliness” of the advertised stations |
Obscure |
--- |
How often does this become an issue? How does it mesh with the dynamic provider framework? What is the right action to take? |
|
4.19. Show data files associated with a job |
Obscure |
--- |
Needs a pointer to SAM as a Web service
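
Item 4.4 (randomize jobs among sites that have zero cached files) can be illustrated with a minimal sketch. It assumes that ranking is based on the number of the job's input files already cached at each site, and that when every candidate site has zero cached files the choice should be uniformly random rather than always falling on the first site. The function name `pick_site` and the `cached_files` mapping are hypothetical and are not part of JIM or Condor.

```python
import random


def pick_site(cached_files, rng=random):
    """Pick a site for a job.

    cached_files: hypothetical mapping {site_name: number of the job's
    input files already cached at that site}.
    rng: any object with a choice() method (defaults to the random module).
    """
    if not cached_files:
        raise ValueError("no candidate sites")

    # Highest number of cached files among all candidate sites.
    best = max(cached_files.values())

    # Every site tied at the best count. When best == 0 this is the full
    # set of sites, so jobs are spread randomly instead of piling up on one.
    candidates = sorted(site for site, n in cached_files.items() if n == best)
    return rng.choice(candidates)
```

A site that holds cached files is always preferred; the random draw only decides between sites that are otherwise equally ranked.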