aCT todo

Failed jobs:

  • log.tgz upload in the GM -> failure during upload -> check that the upload succeeded
  • error codes for failed jobs (not in athena)

Rerun:

  • proper resuming; something is wrong in the logic
  • maximum retries
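
A minimal sketch of how a retry cap could work, assuming each job record carries an attempt counter. The limit value and the state names "torerun" and "failed" are hypothetical and not taken from aCT:

```python
MAX_RETRIES = 3  # hypothetical limit; this TODO does not fix a value


def next_state(attempts, max_retries=MAX_RETRIES):
    """Return the state a failed job should move to.

    attempts is the number of times the job has already been run.
    The state names are placeholders, not actual aCT states.
    """
    if attempts < max_retries:
        return "torerun"
    return "failed"
```

A job that has used up its attempts is then left as failed instead of being rerun indefinitely.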

Resubmit:

  • lrms error -> rerun job

Performance:

  • more threads
  • merge db calls within commits
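
One way to merge DB calls is to group many updates into a single transaction. The sketch below uses the standard sqlite3 module and a hypothetical jobs table; the real aCTDB schema and backend may differ:

```python
import sqlite3


def update_states(conn, updates):
    """Apply many (state, job_id) updates with a single commit."""
    # Used as a context manager, the connection commits once on success
    # and rolls back if an exception is raised.
    with conn:
        conn.executemany("UPDATE jobs SET state = ? WHERE id = ?", updates)


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO jobs (id, state) VALUES (?, ?)",
    [(1, "sent"), (2, "sent"), (3, "sent")],
)
conn.commit()

update_states(conn, [("running", 1), ("done", 2)])
states = [row[0] for row in conn.execute("SELECT state FROM jobs ORDER BY id")]
```

Committing once per batch rather than once per job reduces the number of writes to disk.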

Various:

  • check HTTP connections
  • socket timeouts
  • return values of aCTPanda functions
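
For the socket timeouts, a possible starting point is a process-wide default from the standard socket module. The value below is a hypothetical choice:

```python
import socket

TIMEOUT_SECONDS = 60.0  # hypothetical value, to be tuned

# Sockets created after this call, and not given an explicit timeout,
# raise a timeout error instead of blocking forever.
socket.setdefaulttimeout(TIMEOUT_SECONDS)
```

Because HTTP libraries built on the socket module inherit this default, a hung connection cannot block a thread indefinitely.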

Broker:

  • active job counts, stats per cluster in aCTDB, aCTBroker
  • limit jobs in waiting (preparing, accepted, finishing, queued)
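
The waiting limit could be sketched as below, assuming the broker can get a list of job states per cluster. The function names and the data layout are hypothetical; only the four waiting states come from the list above:

```python
WAITING_STATES = {"preparing", "accepted", "finishing", "queued"}


def waiting_count(states):
    """Count the jobs that are in one of the waiting states."""
    return sum(1 for state in states if state in WAITING_STATES)


def clusters_with_room(jobs_by_cluster, max_waiting):
    """Return the clusters (sorted by name) below the waiting limit.

    jobs_by_cluster maps a cluster name to the list of its job states.
    """
    return sorted(
        cluster
        for cluster, states in jobs_by_cluster.items()
        if waiting_count(states) < max_waiting
    )
```

The broker would then submit new jobs only to the clusters this returns.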

Statistics and web reports:

  • job history in the DB
  • some mod_python to present the stats

Utilities:

  • Job checking
  • Killing, suspending tasks
  • requeue long-waiting jobs

Remove obsolete code

-- AndrejFilipcic - 18 Dec 2008

Topic revision: 12 Apr 2010, AndrejFilipcic