
HangingCalibrationController

Mail of Wed, 1 Mar 2006 18:52:00

Some Good News !

With some helpful suggestions from Giovanna Lehmann, I've tracked 
down the source of the problem for the CC hang on shutdown.

If you remember, the CC would hang on shutdown if you had run any 
scans/tests, but not otherwise, and not always 100% repeatably.

Anyway, the problem turns out to be nothing to do with Giovanna's 
CLIPS server (which we were using in the right way) and was instead 
to do with our memory allocation for the CORBA IPCServants for 
ScanRequest, TestRequest and SequenceRequest.

Unfortunately there is not a single one-line fix that fixes up 
everything.  Instead it is necessary to make a small change
(stack -> heap storage of the servant) in each test script and each 
sequence script etc.

So far I have only done this for
RxThresholdBasedOnConfigRegisterTest (not yet in CVS), but it should
be trivial to implement the fix for the other tests too.

The key thing is IF YOU SEE THIS HAPPENING AGAIN (say in a week's time 
after you think I have updated all the test scripts) then in your 
error report NOTE WHICH TESTS/SCANS/SEQUENCES you ran, and also send 
me the STDERR (and stdout) of the CalibrationController -- the log in 
/tmp/PartitionName/UserName/CalibrationController* ...

From that information it should be possible for me to figure out 
which fixes I have missed out.

Chris

Changes I made relating to the above mail

The diffs below show the files I changed so that the CC no longer hangs on shutdown, provided that the ONLY TEST you ran was an RxThresholdBasedOnConfigRegister test. See the comments at the foot of the diffs.

[pcff] /usera/sctrod/testing_4_3_RC5/SctRodDaq/CalibrationController > cvs diff
cvs diff: Diffing .
cvs diff: Diffing UnitTest
cvs diff: Diffing data
cvs diff: Diffing jsrc
cvs diff: Diffing src
Index: src/RunController.cpp
===================================================================
RCS file: /afs/cern.ch/user/b/bgallop/cvsroot/CalibrationController/src/RunController.cpp,v
retrieving revision 1.22.2.7
diff -r1.22.2.7 RunController.cpp
139c139
<    RC::Controller   controller(p,user,str(name),str(parentName),str(segName),str(rules),config.get(),str(server),verbosity);
---
>    RC::Controller *  controller = new RC::Controller(p,user,str(name),str(parentName),str(segName),str(rules),config.get(),str(server),verbosity);
147c147
<     controller.run();
---
>     controller->run();
150c150
<     return controller.exitStatus();
---
>     return controller->exitStatus();
cvs diff: Diffing src/ConfigUpdater
cvs diff: Diffing src/IS
cvs diff: Diffing src/Serialization
cvs diff: Diffing src/ipc
cvs diff: Diffing src/scripts
Index: src/scripts/DefaultScan.h
===================================================================
RCS file: /afs/cern.ch/user/b/bgallop/cvsroot/CalibrationController/src/scripts/DefaultScan.h,v
retrieving revision 1.3
diff -r1.3 DefaultScan.h
22c22
<     static DefaultScan request;
---
>     static DefaultScan * request = new DefaultScan();
26c26
<     request.setScanNice(s);
---
>     request->setScanNice(s);
28c28
<     return request;
---
>     return *request;
Index: src/scripts/DefaultSequence.h
===================================================================
RCS file: /afs/cern.ch/user/b/bgallop/cvsroot/CalibrationController/src/scripts/DefaultSequence.h,v
retrieving revision 1.6.10.1
diff -r1.6.10.1 DefaultSequence.h
25,27c25,27
<         static DefaultSequence s;
<       s.request=Sct_CalibrationController::TestRequest::_duplicate(r);
<       return s._this();
---
>         static DefaultSequence * s = new DefaultSequence();
>       s->request=Sct_CalibrationController::TestRequest::_duplicate(r);
>       return s->_this();
Index: src/scripts/RawScan.h
===================================================================
RCS file: /afs/cern.ch/user/b/bgallop/cvsroot/CalibrationController/src/scripts/RawScan.h,v
retrieving revision 1.11
diff -r1.11 RawScan.h
22,23c22,23
<     static RawScan request;
<     static const long initialWidth = request.widthCorba();
---
>     static RawScan * request = new RawScan();
>     static const long initialWidth = request->widthCorba();
28c28
<       request.setWidthCorba(initialWidth);
---
>       request->setWidthCorba(initialWidth);
34c34
<     request.setScanNice(s);
---
>     request->setScanNice(s);
37c37
<     request.setClockByTwoNice(false); /// Important - set not in clock/2 mode by default!
---
>     request->setClockByTwoNice(false); /// Important - set not in clock/2 mode by default!
51c51
<     return request;
---
>     return *request;
Index: src/scripts/RxThresholdBasedOnConfigRegisterTest.h
===================================================================
RCS file: /afs/cern.ch/user/b/bgallop/cvsroot/CalibrationController/src/scripts/RxThresholdBasedOnConfigRegisterTest.h,v
retrieving revision 1.7.4.1
diff -r1.7.4.1 RxThresholdBasedOnConfigRegisterTest.h
34,37c34,37
<       static RxThresholdBasedOnConfigRegisterTest rt;
<       rt.setFitAlgorithm("NONE");
<       rt.setAnalysisAlgorithm("RxThresholdBasedOnConfigRegisterTest");
<       return rt._this();
---
>       static RxThresholdBasedOnConfigRegisterTest * rt = new RxThresholdBasedOnConfigRegisterTest();
>       rt->setFitAlgorithm("NONE");
>       rt->setAnalysisAlgorithm("RxThresholdBasedOnConfigRegisterTest");
>       return rt->_this();
cvs diff: Diffing test
[pcff] /usera/sctrod/testing_4_3_RC5/SctRodDaq/CalibrationController > 

Of the above files, the changes to RunController.cpp were probably not necessary and were part of an experiment.

Also, the changes to src/scripts/DefaultScan.h turned out to have no effect on the hang FOR THIS PARTICULAR TEST; however, they will be committed to CVS, as I suspect they will be needed in the long term for another test/scan/sequence. CGL

All the other changes had a material effect on the number of ORB errors reported in CalibrationController's STDERR output (each change removed one error). Prior to making the above changes, an RxThresholdBasedOnConfigRegisterTest would generate these messages on shutdown:

Sequence::Sequence():: adding module name20220170100028
Sequence::Sequence():: adding module name20220170100056
Sequence::Sequence():: adding module name20220330200595
Sequence::Sequence():: adding module name20220380200179
OWLSemaphore::wait was interrupted. Waiting will be resumed.
1/3/06 15:55:49 ERROR [IPCObjectBase::~IPCObjectBase] 
 The IPCObjectBase destructor is called, but the corresponding servant is still active.
 To fix the problem you have to respect the following rules:
    1. Always allocate your IPC based objects in the heap using the new operator
    2. Never call the delete operator for them - use the _destroy() method instead
omniORB: ERROR -- A servant has been deleted that is still activated.
      id: root/Sct_CalibrationController/ScanRequest<0x81f6100> (active)
1/3/06 15:55:49 ERROR [IPCObjectBase::~IPCObjectBase] 
 The IPCObjectBase destructor is called, but the corresponding servant is still active.
 To fix the problem you have to respect the following rules:
    1. Always allocate your IPC based objects in the heap using the new operator
    2. Never call the delete operator for them - use the _destroy() method instead
omniORB: ERROR -- A servant has been deleted that is still activated.
      id: root/Sct_CalibrationController/SequenceRequest<0x81f5de0> (active)
1/3/06 15:55:49 ERROR [IPCObjectBase::~IPCObjectBase] 
 The IPCObjectBase destructor is called, but the corresponding servant is still active.
 To fix the problem you have to respect the following rules:
    1. Always allocate your IPC based objects in the heap using the new operator
    2. Never call the delete operator for them - use the _destroy() method instead
omniORB: ERROR -- A servant has been deleted that is still activated.
      id: root/Sct_CalibrationController/TestRequest<0x81f67e0> (active)
omniORB: Assertion failed.  This indicates a bug in the application using
omniORB, or maybe in omniORB itself.
 file: ../src/lib/omniORB/orbcore/omniServant.cc
 line: 219
 info: activation_found

(NB: the four lines

Sequence::Sequence():: adding module name20220170100028
Sequence::Sequence():: adding module name20220170100056
Sequence::Sequence():: adding module name20220330200595
Sequence::Sequence():: adding module name20220380200179

are not errors, though they are sent to stderr. They are produced BEFORE the TDAQ tries to shut the process down. CGL)