General Setup
Typically, all processes in a normal SctRodDaq configuration are CPU bound, so if you want the system to go faster, buy faster (or more) CPUs! In multiple-machine configurations, some components can be limited by disk access speed, but this is a small effect for a whole barrel.
A normal setup would have one, possibly slow, machine for general processes such as the SctGUI, CalibrationController and so on; one machine for fitting (this should also be the NFS server); and one machine for archiving/analysis.
Note that it is currently not worth having more than four computers, as the software cannot be split up further (perhaps this should be reconsidered in the future).
Typical speeds
These are all given for 3 GHz Pentium 4s where appropriate - feel free to update as things change!
- Fitting a threshold scan takes about 0.3s/module
- Archiving raw data takes about 0.3-0.4s/module
- Analysis is very fast for all tests except the response curve which takes about 0.5-0.6s/module
- Data readout to the SBC from the RODs takes about 0.8s/module for a large (e.g. NMask) scan, dropping to 0.3-0.4s/module for a typical threshold scan.
- A scan takes about 6 minutes for a fully populated ROD including module setup time (which is about 3 minutes).
- If there is no fitting to be done for a module, the FittingService still reads the data in; this should take less than 0.02s/module.
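The figures above can be combined into a rough per-module budget. A back-of-envelope sketch (the per-module times are taken from this page; the module count is purely illustrative, not a fixed property of a ROD):

```python
# Rough per-module time budget from the figures quoted above.
# N_MODULES is a hypothetical module count for illustration only.
N_MODULES = 48

readout_s = 0.35   # s/module, typical threshold scan readout (0.3-0.4)
fit_s     = 0.30   # s/module, threshold scan fit
archive_s = 0.35   # s/module, archiving raw data (0.3-0.4)

per_module = readout_s + fit_s + archive_s
total = per_module * N_MODULES

print(f"per module: {per_module:.2f} s")
print(f"{N_MODULES} modules: {total:.1f} s (excluding module setup time)")
```

Note that this ignores the roughly 3 minutes of module setup time per scan, and that fitting, archiving and readout run as separate processes, so on a multi-machine setup they overlap rather than strictly adding.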
Ideas for improving system performance
- Switch from FIFO buffers to FILO buffers
- This should be a simple one-line change. The only place it might cause problems is in the AnalysisService (as multiple scans are composed into a test there), but it should not make any assumptions about the order of arrival (if it does, that is a bug).
- This would give a 10-20% performance gain.
- It might be confusing for the user as data would appear in a slightly odd order.
- This works by processing the data already cached in RAM first.
- If analogue tests are slow, then try switching to the analytic fit algorithm. This is considerably faster (factor of 3 or more) than the normal fit algorithm, but the results are not quite so good (see my thesis).
- Make sure the software has been compiled with optimisations on (this is the case for the binary releases).
- Stop archiving (some) raw data.
- This would be a relatively simple change and would have a large impact on performance -- the ArchivingService is the slowest of all the services.
- It would be possible to ensure that the binary files are not deleted by the GUI, but to run a cron job overnight that archives them and deletes them then.
- The ArchivingService accounts for about 20-30% of the processing time in the Oxford system.
- This would also have the advantage of ensuring that the FitScanResult and TestResult data are saved by the end of a Test (approximately!), so there would be no need to remember to wait for the ArchivingService to finish before shutting everything down.
- Which part of the ArchivingService is slow?
- If it's gzipping to level 9, then perhaps a temporary store at level 1 would get the data to disk quickly.
- Otherwise, probably not worth optimising too much, as new system in future.
- Buy more computers - potentially quicker and cheaper than making large changes to the software
- Be aware that using the GUI to view data during a scan will have a real impact on performance.
- We could consider using the Intel compiler - this has given 20-30% performance improvements in other applications. (Perhaps investigating the GCC options for emitting SSE/SSE2 instructions would be useful too).
- Longer term, remove ROOT (see SctRodDaqRelease4), perform more optimisation.
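The FIFO-to-FILO idea above amounts to processing the most recently queued scan first, since it is the one most likely to still be cached in RAM. A minimal sketch (the names here are illustrative, not the actual SctRodDaq API):

```python
from collections import deque

# Sketch of the proposed FIFO -> FILO (LIFO) buffer switch.
# push/pop_fifo/pop_filo are hypothetical names for illustration.
buffer = deque()

def push(item):
    buffer.append(item)

def pop_fifo():
    # Oldest first: this data may already have been paged out to disk.
    return buffer.popleft()

def pop_filo():
    # Newest first: this data is most likely still cached in RAM.
    return buffer.pop()

for scan in ["scan1", "scan2", "scan3"]:
    push(scan)

print(pop_filo())  # -> scan3, the most recently arrived scan
```

This illustrates why the change is a one-liner (pop from the other end of the queue), and also why the user-visible ordering becomes slightly odd: results no longer appear in arrival order.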
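On the gzip point above: level 1 compresses considerably faster than level 9 at some cost in output size, which is why a level-1 temporary store could get data to disk quickly. A self-contained illustration with dummy data (not the actual archive format):

```python
import gzip

# Illustrative comparison of gzip compression levels on dummy,
# highly repetitive data (not real SctRodDaq archive data).
data = b"example raw scan data " * 50000

fast = gzip.compress(data, compresslevel=1)  # quick, larger output
best = gzip.compress(data, compresslevel=9)  # slow, smallest output

print(f"level 1: {len(fast)} bytes, level 9: {len(best)} bytes")
```

Both outputs are valid gzip streams, so files written quickly at level 1 could later be recompressed at level 9 by an overnight cron job, as suggested above.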
Computer Specifications
Some hints on buying computers for the DAQ.
- CPU performance dominates
- So, at the expense of other things, buy more and faster CPUs.
- 3 GHz would seem to give a good price/performance ratio currently, but faster CPUs are better of course!
- Cache size is also important - 256 KB is too small, go for at least 512 KB (there doesn't seem to be much improvement with larger sizes).
- RAM is useful, but most processes are economical with their RAM usage. The main effect is to determine when data has to start being read back from disk. This effect can be mitigated by making the switch from FIFO to FILO buffers.
- I am unsure if there is any advantage in switching to faster RAM. I suspect not with 3.2 GHz and slower CPUs.
- Hard drives can be important, but they do not dominate.
So, in summary, if money is tight, buy the fastest CPU - just get standard everything else.