aCT todo
Failed jobs:
log.tgz upload in the grid manager (gm) -> upload failed -> check that the upload succeeded
error codes for failed job (not in athena)
Rerun:
proper resuming; something is wrong in the logic
maximum retries
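A minimal sketch of how a retry cap for reruns could work. The job dictionary, its `attempts` counter, the state names `torerun` and `failed`, and the default limit of 3 are all assumptions for illustration, not part of the existing aCT code.

```python
def next_state(job, max_retries=3):
    """Decide what happens to a job that has just failed.

    `job` is a dict with an 'attempts' counter (hypothetical schema).
    Returns 'torerun' while retries remain, otherwise 'failed'.
    """
    if job["attempts"] < max_retries:
        job["attempts"] += 1  # count this rerun against the cap
        return "torerun"
    return "failed"  # cap reached: give up instead of looping forever
```

Keeping the counter on the job record itself means the cap survives a restart of the service, provided the record is persisted.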
Resubmit:
lrms error -> rerun job
Performance:
more threads
merge db calls within commits
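Merging database calls into one commit might look like the sketch below. It uses `sqlite3` only because it ships with Python; the `jobs` table and its `id` and `state` columns are hypothetical, and the actual aCTDB backend and schema may differ.

```python
import sqlite3

def update_states(conn, updates):
    """Apply many (state, job_id) updates inside a single transaction.

    `updates` is an iterable of (new_state, job_id) tuples.
    """
    # The connection context manager commits once on success
    # and rolls back if any statement raises.
    with conn:
        conn.executemany("UPDATE jobs SET state = ? WHERE id = ?", updates)
```

One commit per batch instead of one per job reduces the number of disk syncs, which is usually where the time goes.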
Various:
check http connections
socket timeouts
return values of aCTPanda funcs
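For the HTTP connection and socket timeout items, a sketch under stated assumptions: it uses the Python 3 standard library (`http.client`), the 30 second default is arbitrary, and the function names are hypothetical rather than existing aCTPanda functions. `http.client.HTTPSConnection` accepts the same `timeout` parameter.

```python
import http.client

def open_connection(host, timeout=30.0):
    """Create a connection object with an explicit socket timeout.

    No network traffic happens here; the socket is opened on the
    first request.
    """
    return http.client.HTTPConnection(host, timeout=timeout)

def fetch_status(conn, path="/"):
    """Return the HTTP status code, or None on a network error or timeout."""
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return None
    finally:
        conn.close()
```

Returning `None` on failure gives callers one explicit value to check, which also addresses the point about return values.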
Broker:
active job counts, stats per cluster in aCTDB, aCTBroker
limit jobs in waiting (preparing, accepted, finishing, queued)
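The limit on waiting jobs could be computed per cluster as in this sketch. The state names come from the list above; the `state_counts` mapping and the `max_waiting` parameter are assumptions about how the per-cluster counts would be stored.

```python
WAITING_STATES = ("preparing", "accepted", "finishing", "queued")

def submit_quota(state_counts, max_waiting):
    """Return how many new jobs may be submitted to one cluster.

    `state_counts` maps a job state to the number of jobs in it.
    States missing from the mapping count as zero.
    """
    waiting = sum(state_counts.get(state, 0) for state in WAITING_STATES)
    return max(0, max_waiting - waiting)
```

For example, with 2 preparing and 5 queued jobs and a limit of 10, the broker could submit 3 more; running jobs do not count against the limit.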
Statistics and web reports:
in db, job history
some mod_python to present the stats
Utilities:
Job checking
Killing, suspending tasks
requeue long-waiting jobs
Remove obsolete code
-- AndrejFilipcic - 18 Dec 2008
Topic revision: r2 - 12 Apr 2010, AndrejFilipcic