SctRodDaq Wiki: SynchTriggerNoiseDebugging

Alan's findings as of 12/7/05

The TIM needs to be set up as described at http://www.hep.ucl.ac.uk/atlas/sct/tim/. The event counts were right only intermittently on some slots, which hinted at a mismatch between the trigger (synched to all slots) and the trigger counter (bussed). In at least one TIM 3B, the switch logic was upside down.

Observations (before fixing the TIM with upside-down switch logic)

With 2 modules, 50k trigs/bin and 101 bins, HV at 20 V (i.e. relatively high occupancy), scan.tim=1, scan.nth=15, single L1A, Qth=0.8fC:
- 100 kHz triggers, 0.01 Hz reset - Errors and Overflow and then no histogramming
- 100 kHz triggers, 1 Hz reset - ran OK.
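
For orientation, the arithmetic behind these settings can be sketched as below. This is only a back-of-envelope helper: the names (ScanTiming, scanTiming) are hypothetical and not part of SctRodDaq, and it assumes triggers are sent at a constant rate with no gaps between bins.

```cpp
#include <cassert>

// Hypothetical helper, not part of SctRodDaq: rough timing for a TIM-driven
// scan, assuming a constant trigger rate and no dead time between bins.
struct ScanTiming {
    double secondsPerBin;     // time to send one bin's worth of triggers
    double secondsTotal;      // time to send all bins
    double triggersPerReset;  // triggers sent between consecutive resets
    double resetsPerBin;      // average number of resets falling in one bin
};

inline ScanTiming scanTiming(double trigHz, double resetHz,
                             double trigsPerBin, int nBins) {
    assert(trigHz > 0 && resetHz > 0 && trigsPerBin > 0 && nBins > 0);
    ScanTiming t;
    t.secondsPerBin    = trigsPerBin / trigHz;
    t.secondsTotal     = t.secondsPerBin * nBins;
    t.triggersPerReset = trigHz / resetHz;
    t.resetsPerBin     = t.secondsPerBin * resetHz;
    return t;
}
```

With 100 kHz triggers, 50k trigs/bin and 101 bins this gives 0.5 s per bin and about 50 s per scan. At 0.01 Hz the resets are 100 s apart (10^7 triggers between resets), so a whole scan can pass without one; at 1 Hz there is a reset every 10^5 triggers, i.e. one every two bins.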

At the end of a working scan as above, but with 150V HV and 1 Hz reset we get:

******************* Post triggers status 2 ******************
Reading from dsp 0
TrapStatus: 
Event word count: 44141 IFrame tail: 26 XFrame tail: 32
 IFrame head: 26 XFrame head: 32
Trap Command/Status: Trailer Header ISR active 
"Error count": 0
"Event count": 218
Bin: 40 Cal: 0
Num events: 0
Histogram status: 0x0 
Bin err: 0 proc time: 15
Events recvd: 136666
Checking TIM scan complete

Having fixed the TIM problem, a software problem becomes obvious: distSlave was not set to 0 for this scan in Release4 upwards :( With this fixed it also runs OK from the GUI.

Also don't forget the 16-bit-ness of the TIM register. Either check for this or do a work-around.
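
One way to do the check or the work-around is sketched below. It assumes the register holds an unsigned 16-bit value (maximum 65535), as the note above implies; all function names are hypothetical and nothing here talks to the hardware. fitsIn16Bits is the check, truncatedTo16Bits shows what an unchecked write would store, and splitFor16Bits is one possible work-around that breaks a large value into pieces that each fit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Largest value an unsigned 16-bit register can hold.
constexpr std::uint32_t kMax16 = 0xFFFF;

// Check: true if the value can be written to a 16-bit register unchanged.
constexpr bool fitsIn16Bits(std::uint32_t value) {
    return value <= kMax16;
}

// What a 16-bit register would hold if the value were written without a
// check (the upper bits are silently dropped).
constexpr std::uint16_t truncatedTo16Bits(std::uint32_t value) {
    return static_cast<std::uint16_t>(value & kMax16);
}

// Work-around: split a large count into chunks that each fit in 16 bits.
// The chunks sum to the original total.
inline std::vector<std::uint16_t> splitFor16Bits(std::uint32_t total) {
    std::vector<std::uint16_t> chunks;
    while (total > kMax16) {
        chunks.push_back(static_cast<std::uint16_t>(kMax16));
        total -= kMax16;
    }
    assert(total <= kMax16);
    if (total > 0) {
        chunks.push_back(static_cast<std::uint16_t>(total));
    }
    return chunks;
}
```

For example, a value of 50000 fits, but 100000 does not: written unchecked it would be stored as 100000 - 65536 = 34464, whereas splitFor16Bits(100000) yields the two chunks 65535 and 34465.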

Bruce (and Alan) debugging synch triggers 29/6/2005

Debugging of SynchTriggerNoise test

Problems identified:

The events contain BC and L1 errors
EFB error mask bits can be set so that they are not reported as errors
Events containing BC and L1 errors are not histogrammed.
1. Got distracted far too long by this...
2. It would be nice if the DSP code applied the appropriate mask when looking for headers
The Rod Buffer seems to fill up so that after a certain number of events we get a nearly-full error. This happens even if events are sent at 1kHz.
The L1 and BC counters from the modules seem to do pretty-much what we expect.
The L1 counter from the ROD seems to be stuck at 9.
The BC counter on the ROD does not correspond to that on the modules.
1. It sort of looked like the correct increment for the L1 counter (sometimes out by 1?)
The ECR counter (ROD) seems to increment by 2 rather than 1 between bins.
The triggers-sent-per-bin-counter on the ROD seems to be offset by 1 bin from the resets of the module L1 and BC counters.
Looking at the bin divisions expected from the ECR counter, this doesn't correspond with the output histogram
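
Checks of this kind (a counter that is stuck, or that steps by 2 instead of 1) can be automated once the counter values have been pulled out of the event dumps. The sketch below is a hypothetical helper, not part of the DSP or SctRodDaq code. The counter width and expected step are parameters because they differ between counters, and the values in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: given successive values of an n-bit counter, return
// the indices i where values[i] is not values[i-1] + expectedStep, taken
// modulo 2^bits so that normal wrap-around is not flagged.
// A stuck counter shows up as a mismatch at every index from 1 onwards.
inline std::vector<std::size_t> counterMismatches(
        const std::vector<std::uint32_t>& values,
        unsigned bits, std::uint32_t expectedStep) {
    assert(bits >= 1 && bits <= 32);
    const std::uint32_t mask =
        (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    std::vector<std::size_t> bad;
    for (std::size_t i = 1; i < values.size(); ++i) {
        // Unsigned subtraction wraps, so masking gives the step mod 2^bits.
        const std::uint32_t step = (values[i] - values[i - 1]) & mask;
        if (step != (expectedStep & mask)) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

For a 4-bit counter with expected step 1, the sequence 14, 15, 0, 1 gives no mismatches, a stuck sequence 9, 9, 9 is flagged at indices 1 and 2, and a sequence stepping by 2 (0, 2, 4) is also flagged at indices 1 and 2.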

debugging tools used:

.X synchNoise.cxx
- With small numbers of triggers and bins
- Aim to fit all events in the 32 frames of the event buffer
- Also useful to know what the mask pattern is:
  - tapi.modifyABCDVar(14, 1.0);
  - tapi.modifyABCDVar( 9, 120.0); // 8 channels per chip
tapi.scanEvents
- Summarises event buffers on all slaves
tapi.decodeEvent
- Decodes one event from a buffer
.L Functs.cxx
set debug option("scan_error_trap_all")
- Switch to error event mode (histogramming probably won't work)
- tapi.decodeEvent needs telling it's an error event.
- eg tapi.decodeEvent(0, 0, 0, 0, 3, 0, 1); // Where 3 is the index into the buffer
print_calib()
- There's probably something useful somewhere about what state the formatters get into that causes "ALMOST_FULL" (ie why the FIFOs don't empty)

Further investigations:

Forgot to try setting nth = 1 and a low trigger rate
- This would help when looking at L1 and BC counters in the error event dumps

Dual Triggers

Correct behaviour and output for six modules in
- groups 1,2,3,4
- groups 0,1,4,5
[ Which configuration created errors? ]
Tested both scanning, and one or other scanning.
N.B. configure2 forces use of SP_BOTH
- also set trigger sequence 2
  - if only one then the DSPs might interpret things differently
If USE_DUAL_PORTS is #defined to 1, however, the normal nmask seems to generate BC errors, even with only groups 0->3.
- But it did get started, didn't fail till part way through bin 3
It's possible we tested the wrong thing, should try
- groups 0, 1, 6, 7

Off-ROD redundancy

Fixed in principle for configuration setting to all modules, and modifyABCDVarROD all modules
AJB to merge to HEAD
AJB to do for single modules
needs to be tested

firmware etc

Started TIM
status
 Serial Number: 18 Version: 9
 L1ID: ffffff BCID: 6000 status: aa80
Setting TIM frequencies: 100kHz 0.01Hz
trigPower 2
trigBase 10
trigNibble 6 (6, 0)
rstPower -2
rstBase 0.1
rstNibble 31 (7, 3)
TIM initialise successful
Found configuration for 1 rods
Load configuration for rod
Initialise ROD 0 at 0x6000000
Constructed
Doing full ROD reset...
 ... done
Initialised
Started ROD
Slot: 6Serial Number:589Number of slave DSPs: 4
Status registers[0-2]: 2201 0 0 
Command registers[0-1]: 0 0 Primitive state: 0 Text State: 0
Got some text: crate: 0 rod slot: 6
Text INFO : 540
[MDSP: rodConfiguration.c,   735]::
Configuring ROD: initialization mode.

[MDSP: rodConfiguration.c,   207]::
SDSP #0 clock set to 160 MHz.

[MDSP: rodConfiguration.c,   207]::
SDSP #1 clock set to 160 MHz.

[MDSP: rodConfiguration.c,   207]::
SDSP #2 clock set to 160 MHz.

[MDSP: rodConfiguration.c,   207]::
SDSP #3 clock set to 160 MHz.

[MDSP: utilities.c,   614]::
 SCT DSP code version 1.1.1 Oct 15 2004
 Formatter code version SCT e 20  40 MHz
 EFB code version e 2b
 Router code version e 21
 Controller code version e 1c

 
ROD 0 initialised
Initialise BOC 0
Production BOC - Revision C: status
 Module Type: 46 Serial Number: 42
 Hardware Version: 1 Firmware Version: 5
 Manufacturer: cb Status Register: e

Slave code from /work/ppatlas1/config/DspImages/SlaveDspCode.051104_dualFixed