Mail of Wed, 1 Mar 2006 18:52:00
Some good news! With some helpful suggestions from Giovanna Lehmann, I've tracked down the source of the CC hang on shutdown. If you remember, the CC would hang on shutdown if you had run any scans/tests, but not otherwise, and not always 100% repeatably. The problem turns out to have nothing to do with Giovanna's CLIPS server (which we were using in the right way); instead it comes from our memory allocation for the CORBA IPCServants for ScanRequest, TestRequest and SequenceRequest. Unfortunately there is no single one-line fix that fixes everything. Instead it is necessary to make a small change (stack/static -> heap storage of the servant) in each test script, each sequence script, etc. So far I have only done this for RxThresholdBasedOnConfigRegisterTest (not yet in CVS), but it should be trivial to implement the fix for the other tests too.

The key thing is: IF YOU SEE THIS HAPPENING AGAIN (say in a week's time, after you think I have updated all the test scripts), then in your error report NOTE WHICH TESTS/SCANS/SEQUENCES you ran, and also send me the STDERR (and stdout) of the CalibrationController, i.e. the log in /tmp/PartitionName/UserName/CalibrationController* ... From that information it should be possible for me to figure out which fixes I have missed.

Chris
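A minimal sketch of the failure mode, assuming a hypothetical MockServant and a toy activeServants() table in place of the real ORB and IPC classes: an object with automatic (stack) or static storage has its destructor run when its storage ends, whether or not it has been deactivated, while an object created with new and never deleted is never destroyed behind the ORB's back. A function-local static is only constructed the first time its function is called, and is destroyed at program exit; that would be consistent with the hang appearing at shutdown, and only after a scan or test had been run.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the ORB's table of activated servants (NOT the
// omniORB or IPC API). Deliberately leaked so that it outlives every servant.
std::set<const void*>& activeServants() {
    static std::set<const void*>* table = new std::set<const void*>();
    return *table;
}

// Counts "servant deleted while still activated" events, a toy analogue of
// the omniORB error quoted in the log.
int destroyedWhileActive = 0;

struct MockServant {
    std::string name;
    explicit MockServant(std::string n) : name(std::move(n)) {}
    void activate() { activeServants().insert(this); }
    void deactivate() {
        assert(activeServants().count(this) == 1);  // must have been activated
        activeServants().erase(this);
    }
    ~MockServant() {
        if (activeServants().count(this) != 0) {
            ++destroyedWhileActive;
            std::fprintf(stderr, "ERROR -- servant %s deleted while still active\n",
                         name.c_str());
        }
    }
};

// Automatic storage, like the old RunController.cpp code: the destructor runs
// at the closing brace although the servant was never deactivated. A
// function-local static, as in the old script headers, behaves the same way,
// except that its destructor runs at program exit.
void scopedServant() {
    MockServant s("ScanRequest");
    s.activate();
}

// Heap storage, like the patched code: created with new and never deleted,
// so no destructor runs while the servant is still registered.
MockServant* heapServant() {
    MockServant* s = new MockServant("TestRequest");
    s->activate();
    return s;
}
```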
Changes I made relating to the above mail
The files I changed to make the CC no longer hang on shutdown, provided that the ONLY TEST you ran was an RxThresholdBasedOnConfigRegister test, are those whose diffs are shown below. See the comments at the foot of the diffs.
[pcff] /usera/sctrod/testing_4_3_RC5/SctRodDaq/CalibrationController > cvs diff
cvs diff: Diffing .
cvs diff: Diffing UnitTest
cvs diff: Diffing data
cvs diff: Diffing jsrc
cvs diff: Diffing src
Index: src/RunController.cpp
===================================================================
RCS file: /afs/cern.ch/user/b/bgallop/cvsroot/CalibrationController/src/RunController.cpp,v
retrieving revision 1.22.2.7
diff -r1.22.2.7 RunController.cpp
139c139
< RC::Controller controller(p,user,str(name),str(parentName),str(segName),str(rules),config.get(),str(server),verbosity);
---
> RC::Controller * controller = new RC::Controller(p,user,str(name),str(parentName),str(segName),str(rules),config.get(),str(server),verbosity);
147c147
< controller.run();
---
> controller->run();
150c150
< return controller.exitStatus();
---
> return controller->exitStatus();
cvs diff: Diffing src/ConfigUpdater
cvs diff: Diffing src/IS
cvs diff: Diffing src/Serialization
cvs diff: Diffing src/ipc
cvs diff: Diffing src/scripts
Index: src/scripts/DefaultScan.h
===================================================================
RCS file: /afs/cern.ch/user/b/bgallop/cvsroot/CalibrationController/src/scripts/DefaultScan.h,v
retrieving revision 1.3
diff -r1.3 DefaultScan.h
22c22
< static DefaultScan request;
---
> static DefaultScan * request = new DefaultScan();
26c26
< request.setScanNice(s);
---
> request->setScanNice(s);
28c28
< return request;
---
> return *request;
Index: src/scripts/DefaultSequence.h
===================================================================
RCS file: /afs/cern.ch/user/b/bgallop/cvsroot/CalibrationController/src/scripts/DefaultSequence.h,v
retrieving revision 1.6.10.1
diff -r1.6.10.1 DefaultSequence.h
25,27c25,27
< static DefaultSequence s;
< s.request=Sct_CalibrationController::TestRequest::_duplicate(r);
< return s._this();
---
> static DefaultSequence * s = new DefaultSequence();
> s->request=Sct_CalibrationController::TestRequest::_duplicate(r);
> return s->_this();
Index: src/scripts/RawScan.h
===================================================================
RCS file: /afs/cern.ch/user/b/bgallop/cvsroot/CalibrationController/src/scripts/RawScan.h,v
retrieving revision 1.11
diff -r1.11 RawScan.h
22,23c22,23
< static RawScan request;
< static const long initialWidth = request.widthCorba();
---
> static RawScan * request = new RawScan();
> static const long initialWidth = request->widthCorba();
28c28
< request.setWidthCorba(initialWidth);
---
> request->setWidthCorba(initialWidth);
34c34
< request.setScanNice(s);
---
> request->setScanNice(s);
37c37
< request.setClockByTwoNice(false); /// Important - set not in clock/2 mode by default!
---
> request->setClockByTwoNice(false); /// Important - set not in clock/2 mode by default!
51c51
< return request;
---
> return *request;
Index: src/scripts/RxThresholdBasedOnConfigRegisterTest.h
===================================================================
RCS file: /afs/cern.ch/user/b/bgallop/cvsroot/CalibrationController/src/scripts/RxThresholdBasedOnConfigRegisterTest.h,v
retrieving revision 1.7.4.1
diff -r1.7.4.1 RxThresholdBasedOnConfigRegisterTest.h
34,37c34,37
< static RxThresholdBasedOnConfigRegisterTest rt;
< rt.setFitAlgorithm("NONE");
< rt.setAnalysisAlgorithm("RxThresholdBasedOnConfigRegisterTest");
< return rt._this();
---
> static RxThresholdBasedOnConfigRegisterTest * rt = new RxThresholdBasedOnConfigRegisterTest();
> rt->setFitAlgorithm("NONE");
> rt->setAnalysisAlgorithm("RxThresholdBasedOnConfigRegisterTest");
> return rt->_this();
cvs diff: Diffing test
[pcff] /usera/sctrod/testing_4_3_RC5/SctRodDaq/CalibrationController >
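The patched script headers above all follow one pattern: the function-local static object becomes a static pointer initialised with new on the first call, and the object is never deleted. A minimal sketch of that pattern, modelled on the RawScan.h hunk; MockScan, width()/setWidth() and defaultScan() are hypothetical stand-ins for RawScan, widthCorba()/setWidthCorba() and the real factory function (whose name the diff does not show), and resetting the width on every call is read from the diff context rather than confirmed.

```cpp
#include <cassert>

// Hypothetical stand-in for a request class such as RawScan (not the real class).
class MockScan {
public:
    long width() const { return width_; }
    void setWidth(long w) { width_ = w; }
private:
    long width_ = 16;  // arbitrary illustrative default
};

// Same shape as the patched RawScan.h: only the pointer is static. The object
// is created with new on the first call and never deleted, so no MockScan
// destructor runs at program exit.
MockScan& defaultScan() {
    static MockScan* request = new MockScan();
    static const long initialWidth = request->width();  // captured once, on the first call
    request->setWidth(initialWidth);                    // undo changes made by earlier callers
    return *request;
}
```

Because only the pointer has static storage, the exit-time destructor that runs is the pointer's trivial one; the MockScan object itself is deliberately leaked, which keeps it out of the shutdown sequence.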
Of the above files, the changes to RunController.cpp were probably not necessary and were part of an experiment.
Also, the changes to src/scripts/DefaultScan.h turned out to have no effect on the hanging FOR THIS PARTICULAR TEST; however, they will be committed to CVS, as I suspect they are needed in the long term by some other test/scan/sequence. CGL
All the other changes had a material effect on the number of ORB errors reported in CalibrationController's STDERR output (each one removed one error). Before these changes were made, an RxThresholdBasedOnConfigRegisterTest would generate these messages on shutdown:
Sequence::Sequence():: adding module name20220170100028
Sequence::Sequence():: adding module name20220170100056
Sequence::Sequence():: adding module name20220330200595
Sequence::Sequence():: adding module name20220380200179
OWLSemaphore::wait was interrupted. Waiting will be resumed.
1/3/06 15:55:49 ERROR [IPCObjectBase::~IPCObjectBase] The IPCObjectBase destructor is called, but the corresponding servant is still active.
To fix the problem you have to respect the following rules:
1. Always allocate your IPC based objects in the heap using the new operator
2. Never call the delete operator for them - use the _destroy() method instead
omniORB: ERROR -- A servant has been deleted that is still activated.
id: root/Sct_CalibrationController/ScanRequest<0x81f6100> (active)
1/3/06 15:55:49 ERROR [IPCObjectBase::~IPCObjectBase] The IPCObjectBase destructor is called, but the corresponding servant is still active.
To fix the problem you have to respect the following rules:
1. Always allocate your IPC based objects in the heap using the new operator
2. Never call the delete operator for them - use the _destroy() method instead
omniORB: ERROR -- A servant has been deleted that is still activated.
id: root/Sct_CalibrationController/SequenceRequest<0x81f5de0> (active)
1/3/06 15:55:49 ERROR [IPCObjectBase::~IPCObjectBase] The IPCObjectBase destructor is called, but the corresponding servant is still active.
To fix the problem you have to respect the following rules:
1. Always allocate your IPC based objects in the heap using the new operator
2. Never call the delete operator for them - use the _destroy() method instead
omniORB: ERROR -- A servant has been deleted that is still activated.
id: root/Sct_CalibrationController/TestRequest<0x81f67e0> (active)
omniORB: Assertion failed. This indicates a bug in the application using omniORB, or maybe in omniORB itself.
file: ../src/lib/omniORB/orbcore/omniServant.cc
line: 219
info: activation_found
(NB: the four lines

Sequence::Sequence():: adding module name20220170100028
Sequence::Sequence():: adding module name20220170100056
Sequence::Sequence():: adding module name20220330200595
Sequence::Sequence():: adding module name20220380200179

are not errors, though they are sent to stderr. They are produced BEFORE the TDAQ tries to shut the process down. CGL)
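The two rules quoted in the IPCObjectBase error message (allocate with new; never delete, call _destroy() instead) can be enforced by the compiler. A minimal sketch, assuming a hypothetical HeapOnlyServant base class and a toy activeObjects() registry rather than the real IPC/omniORB classes; destroy() here only mirrors the role of the _destroy() method the message names, and is not its implementation.

```cpp
#include <cassert>
#include <set>

// Hypothetical registry standing in for the ORB's active object map.
std::set<const void*>& activeObjects() {
    static std::set<const void*>* table = new std::set<const void*>();
    return *table;
}

// Hypothetical base class enforcing both rules: because the destructor is
// protected, stack objects, static objects and an outside `delete` all fail
// to compile, so destroy() is the only way to dispose of an instance.
class HeapOnlyServant {
public:
    HeapOnlyServant() { activeObjects().insert(this); }  // toy "activation"
    HeapOnlyServant(const HeapOnlyServant&) = delete;
    HeapOnlyServant& operator=(const HeapOnlyServant&) = delete;

    // Deactivate first, then free, so the destructor never sees an active servant.
    void destroy() {
        activeObjects().erase(this);
        delete this;
    }

protected:
    virtual ~HeapOnlyServant() { assert(activeObjects().count(this) == 0); }
};

// Derived classes must keep their destructors protected too; an implicitly
// declared destructor would be public and re-enable stack/static instances.
class MockTestRequest : public HeapOnlyServant {
protected:
    ~MockTestRequest() override = default;
};
```

For a class built this way, the old `static RxThresholdBasedOnConfigRegisterTest rt;` style would be rejected at compile time, and `new MockTestRequest()` followed by `destroy()` is the only lifetime the type allows.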