Tutorials

The ATLAS Physics Analysis Workbook - Release 12

This tutorial was held during the Latin American Software Workshop which took place in March 2008. The main focus of the tutorial is to introduce the PhysicsAnalysisTools needed to produce and analyze DPD. We will use Athena to produce D2/3PD and analyze them both interactively in Athena and in ROOT via AthenaROOTAccess. For this tutorial we will use TopPhysDPDMaker and TopPhysTools as example working packages.

This tutorial was held during the BNL FDR Jamboree which took place in March 2008. Its focus and materials are the same as the tutorial above: producing and analyzing DPD with PhysicsAnalysisTools, again using TopPhysDPDMaker and TopPhysTools as example working packages.

Trigger

The main resource appears to be the page explaining Trigger Configuration from the point of view of ATHENA. It notes that you can find the available trigger menu settings as follows.
The transform argument to set the trigger configuration is triggerConfig
Options include: DEFAULT NONE OFF full full_no_Bphysics full_no_prescale lumi1E31
For the complete list see the JobProperty TriggerFlags.TriggerMenuSetup (LXR, Doxygen) or:

athena -i
from TriggerJobOpts.TriggerFlags import TriggerFlags
print TriggerFlags.triggerMenuSetup.allowedValues
which at present gives
[ 'default', 'full', 'full_no_prescale', 
  'full_no_Bphysics', 'full_no_Bphysics_no_prescale', 
  'lumi1E31', 'lumi1E31_no_prescale', 'lumi1E31_no_Bphysics', 
  'lumi1E31_no_Bphysics_no_prescale', 'lumi1E32', 
  'lumi1E32_no_prescale', 'lumi1E32_no_Bphysics', 
  'lumi1E32_no_Bphysics_no_prescale']
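As a sanity check outside Athena, the allowed values printed above can be used to validate a menu name before passing it to a transform. This is a plain-Python sketch: the list is copied by hand from the output above (in a real Athena session you would read TriggerFlags.triggerMenuSetup.allowedValues instead), and the helper function is made up for illustration.

```python
# Allowed menu setups, copied from the TriggerFlags output above.
ALLOWED_MENUS = [
    'default', 'full', 'full_no_prescale',
    'full_no_Bphysics', 'full_no_Bphysics_no_prescale',
    'lumi1E31', 'lumi1E31_no_prescale', 'lumi1E31_no_Bphysics',
    'lumi1E31_no_Bphysics_no_prescale', 'lumi1E32',
    'lumi1E32_no_prescale', 'lumi1E32_no_Bphysics',
    'lumi1E32_no_Bphysics_no_prescale',
]

def check_menu(name):
    """Return name unchanged if it is a known menu setup, else raise."""
    if name not in ALLOWED_MENUS:
        raise ValueError('unknown trigger menu: %r' % name)
    return name
```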

Note:

A different page has another angle: RunTrigger (https://twiki.cern.ch/twiki/bin/view/Atlas/RunTrigger) suggests:

TriggerFlags.triggerMenuSetup = 'default'                # all chains defined for the 10^31, 10^32 and 10^33 menus, without prescale factors
TriggerFlags.triggerMenuSetup = 'lumi1E31_no_prescale'   # 10^31 menu without prescale factors applied
TriggerFlags.triggerMenuSetup = 'lumi1E31'               # 10^31 menu with prescale factors applied
TriggerFlags.triggerMenuSetup = 'lumi1E32_no_prescale'   # 10^32 menu without prescale factors applied
TriggerFlags.triggerMenuSetup = 'lumi1E32'               # 10^32 menu with prescale factors applied


Job Transforms

What is a transformation? A transformation is a shell script that takes a few input parameters and executes athena.py to perform a task. Transformations are normally used to run jobs on the grid, but can also be very handy for running some production. ATLAS is now using python as its scripting language, so transformations will shortly become python scripts. (This text is taken from RomeProductionTransformations.)

Some job transforms can be found here, in files like csc_buildTAG_trf.py. When you run them you see things like:
bash-3.00$ csc_buildTAG_trf.py --help
JobTransform csc_buildTAG version RecJobTransforms-00-05-89
Make TAG's from AOD's using RecExCommon. If outputAODFile argument is not NONE,
then also a merged AOD file is created
usage: csc_buildTAG_trf.py [options]            [jobconfig]
       Arguments can be passed in order (positional) or as name=value (named). Arguments in [] are optional.
       Option -h,--help: get detailed help
Options:
   -h,--help                        Print detailed help
   -l ,--loglevel=    Output message level. Possible values: ['INFO', 'ALL', 'VERBOSE', 'WARNING', 'ERROR', 'DEBUG', 'FATAL']
   -t,--test                        Run in test mode: skip some checks
   -a ,--athenaopts=  Options to be passed on to athena
Arguments:
    1 inputAODFile (list) # Input file that contains AOD's
    2 outputTAGFile (str) # Output file that contains TAG's
    3 outputAODFile (str) # Output file that contains AOD's
    4 outputHPTVFile1 (str) # Output file that contains ntuples.
    5 outputHPTVFile2 (str) # Output file that contains ntuples.
    6 outputHPTVFile3 (str) # Output file that contains ntuples.
    7 outputHPTVFile4 (str) # Output file that contains ntuples.
    8 geometryVersion (str) # Geometry Version
    9 maxEvents (int) # Maximum number of events to process
   10 triggerConfig (str) # Configuration string to use for TrigT1 and HLT. Set to 'NONE' to switch off trigger,    and set to 'DEFAULT' to use the default of the used release.
   11 AODCorrection (bool) # True/False: Apply correction for 1mm range cut problem.
  [12 jobConfig] (list) default='NONE' # Joboptions file with user settings, in particular the configuration settings. Default packages: .,RecJobTransforms,PyJobTransforms
I have tried to run the above with as few correct arguments as possible. At present I get furthest with
csc_buildTAG_trf.py /atlas/lester/DATA/FDR/user.AyanaTamuHolloway.streamtest.004881.inclEle.recon.AOD.v13003003_V2.AOD._00001.pool.root outputtagfile.lester outputaodfile.lester outputhptvfile1.lester outputhptvfile2.lester outputhptvfile3.lester outputhptvfile4.lester geometryversion.lester 14215469 full False jobconfig.lester >&  MOO
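Since the arguments are positional, it is easy to get the order wrong. A small hypothetical helper (not part of the transform itself; the function name is made up) that assembles the command line in the order given by the --help output above might look like:

```python
def build_tag_command(input_aod, output_tag, output_aod, hptv_files,
                      geometry, max_events, trigger_config,
                      aod_correction, job_config='NONE'):
    """Assemble the csc_buildTAG_trf.py positional argument list.

    hptv_files must contain exactly the four HPTV ntuple file names,
    matching arguments 4-7 in the --help output above.
    """
    assert len(hptv_files) == 4
    args = [input_aod, output_tag, output_aod] + list(hptv_files)
    args += [geometry, str(max_events), trigger_config,
             str(aod_correction), job_config]
    return ['csc_buildTAG_trf.py'] + args
```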

TAG

Main TAG Twiki page

Here is the *DEVELOPMENT* TAG event selector AOD skimmer web-tool that creates files for you. The authors call it the "ELSSI: the ATLAS Event Level Selection Service Interface." ... and here is the *PRODUCTION* version of the same thing.

Here are some TAG tips from the TopFdrTag page.

TAG examples with Streams and GANGA

Ganga

Currently aliased to
/atlas/lester/usr/local/install/4.4.7/bin/ganga
Simple local hello world example (runs on eg pcfk):
j=Job()
j.name='Local Hello World Test of Lester'
j.submit()
Simple LCG hello world job example:
j=Job()
j.backend=LCG()
j.name='LCG Hello World Test of Lester'
j.submit()
Example job to data

Finding datasets and compute elements

d = DQ2Dataset()
d.list_datasets(name='fdr08*')
d.list_datasets(name='*0003077*')
d.list_locations('fdr08_run1.0003077.StreamMuon.merge.AOD.o1_r12_t1')
Also
j.inputdata.dataset
j.inputdata.list_locations()
j.inputdata.list_locations_num_files()
j.inputdata.list_locations_ce()

DQ2 outside Ganga

(Copied from ~lester/LESTERHOME/notes/grid-setup-cam.sh originally based on Atlas TWIKI UsingDQ2.) In the past I have used:
source /afs/cern.ch/project/gd/LCG-share/sl4/etc/profile.d/grid_env.sh
voms-proxy-init -voms atlas
source /afs/usatlas.bnl.gov/Grid/Don-Quijote/dq2_user_client/setup.sh.CERN
The Atlas TWIKI UsingDQ2 page however now recommends
source /afs/cern.ch/project/gd/LCG-share/current/external/etc/profile.d/grid-env.sh
voms-proxy-init -voms atlas
source /afs/cern.ch/atlas/offline/external/GRID/ddm/endusers/setup.sh.CERN
though presumably one can replace ".CERN" with ".any" also?

13.0.40 setup

source /atlas/lester/13.0.40/setup.sh
source /atlas/lester/13.0.40/hack/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt/setup.sh
cd  /atlas/lester/13.0.40/hack/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt

This area was largely prepared for FullDressRehearsal running as follows:

cmt co -r UserAnalysis-00-10-12 PhysicsAnalysis/AnalysisCommon/UserAnalysis
cd PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt
source setup.sh
cp /r16/atlas/cowden/13.0.40/PhysicsAnalysis/AnalysisCommon/UserAnalysis/src/AnalysisSkeleton.cxx ../src
cp /r16/atlas/cowden/13.0.40/PhysicsAnalysis/AnalysisCommon/UserAnalysis/UserAnalysis/AnalysisSkeleton.h ../UserAnalysis
cp /r16/atlas/cowden/13.0.40/PhysicsAnalysis/AnalysisCommon/UserAnalysis/run/AnalysisSkeleton_topOptions.py ../run
Which resulted in these diffs and this for the job options resulting in this.

Then build, etc. I used a Ganga job script (saved here) created thus:

j=Job()
j.application=Athena()
j.application.prepare()
j.application.option_file='/var/pcfl/atlas/lester/13.0.40/hack/PhysicsAnalysis/AnalysisCommon/UserAnalysis/run/AnalysisSkeleton_topOptions.py'
j.application.max_events='1500'
j.inputdata=DQ2Dataset()
j.inputdata.type='DQ2_LOCAL'
j.inputdata.dataset='fdr08_run1.0003077.StreamMuon.merge.AOD.o1_r12_t1'

j.backend=LCG()
j.backend.CE='serv03.hep.phy.cam.ac.uk:2119/jobmanager-lcgcondor-atlas' #  NOT USING JOB-TO-DATA !

j.outputdata=ATLASOutputDataset()
j.outputdata.outputdata=['AnalysisSkeleton.aan.root']  # This line must agree exactly with the relevant line in the AnalysisSkeleton_topOptions.py file: ServiceMgr.THistSvc.Output = [ "AANT DATAFILE='AnalysisSkeleton.aan.root' OPT='RECREATE'" ]. j.outputdata.outputdata tells ganga what to grab, and it won't be able to grab a file with a different name than we tell it. The real name of the written file is specified in the jobOptions, and ganga can't control that.
j.submit()
Note that job.outputsandbox and job.outputdata are not the same thing. Retrieve outputdata (eg stuff on castor) with:
j.outputdata               # to view what is there
j.outputdata.retrieve()    # to actually bring it home from castor
which plonked
/usera/lester/gangadir/workspace/Local/37/output/AnalysisSkeleton.aan.root
onto my local filesystem.

Job to data version of the above

j=Job()
j.application=Athena()
j.application.prepare()
j.application.option_file='/var/pcfl/atlas/lester/13.0.40/hack/PhysicsAnalysis/AnalysisCommon/UserAnalysis/run/AnalysisSkeleton_topOptions.py'
j.application.max_events='1500'
j.inputdata=DQ2Dataset()
j.inputdata.type='DQ2_LOCAL'
j.inputdata.dataset='fdr08_run1.0003077.StreamMuon.merge.AOD.o1_r12_t1'

j.inputdata.match_ce_all=True        # USE JOB-TO-DATA !
j.backend=LCG()

j.outputdata=ATLASOutputDataset()
j.outputdata.outputdata=['NewAnalysisSkeleton.aan.root']  # Note that this line must agree exactly with the relevant lines in the AnalysisSkeleton_topOptions.py
j.submit()
See also job-to-data.html.

How do Ganga and Athena negotiate specification of input collections?

The top job options file might contain a section looking like this:

# The AOD input file
ServiceMgr.EventSelector.InputCollections = [ "AOD.pool.root" ]
which looks, at face value, like it specifies the input collection. Yet when you use Ganga you tell Ganga what input collection you want to use, and it even goes off and finds the CE nearest to that input collection etc. So when athena is fired up, how does it know to look for the data you actually want, and not the mysterious "AOD.pool.root" from above?

The answer seems to be that Ganga creates a file

athena-lcg.sh
which it puts in
_input_sandbox_*.tgz
and which has a section
cat - >input.py <<EOF
[...]
            print 'Input: %s' % name
            ic.append('%s' % name)
    EventSelector.InputCollections = ic
    if os.environ.has_key('ATHENA_MAX_EVENTS'):
        theApp.EvtMax = int(os.environ['ATHENA_MAX_EVENTS'])
    else:
        theApp.EvtMax = -1
EOF

    # Configuration error
    else
        # WRAPLCG_WNCHECK_UNSPEC
        retcode=410100
    fi

else
# no input_files
cat - >input.py <<EOF
[...]
which creates a file at job run-time called input.py from a text file input_files which ganga created earlier and which is also bundled in the input sandbox. If input_files is non-empty, the input.py file appears to contain a line defining the input collection:
EventSelector.InputCollections = ic
which somehow makes its way into the top-options file, and presumably replaces the default value we want to get rid of.
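The generation of input.py from input_files can be sketched in a few lines of Python. This is a guess at the behaviour based on the heredoc fragment above, not Ganga's actual code, and the function name is made up:

```python
def make_input_py(input_files):
    """Emit python text that overrides EventSelector.InputCollections
    with the staged-in files, skipping empty names (a sketch of what
    the athena-lcg.sh heredoc above appears to do)."""
    lines = ['ic = []']
    for name in input_files:
        if len(name) > 0:
            lines.append("ic.append('%s')" % name)
    lines.append('EventSelector.InputCollections = ic')
    return '\n'.join(lines) + '\n'
```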

What runs in what order in a ganga lcg grid job?

Ganga creates a JDL file called __jdlfile__ which demands that a (python) script called __jobscript_49__ be run, in the presence of two other files in the input sandbox:
"/usera/lester/gangadir/workspace/Local/49/input/_input_sandbox_49.tgz",
"/usera/lester/gangadir/workspace/Local/49/input/_input_sandbox_49_master.tgz"
The main things that this file does are:
(1) Obtain and untar the input sandboxes.
(2) Run ./athena-lcg.sh in the current working directory after making it executable; the relevant line is
	execSyscmdEnhanced('%s %s' % (appexec,appargs))
which basically amounts to
	exec ./athena-lcg.sh ''
(3) Pack up output into sandboxes and upload to storage elements where necessary.
Note that athena-lcg.sh is the bash script which
(1) sets up PATHS, LD_LIBS, GCCworkarounds and other ENVIRONMENTS in preparation for gcc, athena etc.
(2) compiles your software if you asked for this.
(3) stages in input files using dq2.
(4) creates the input.py file which specifies the InputCollections for athena to use, appropriate to the materials just staged in. input.py also gains, if necessary, lines like theApp.EvtMax = 12345.
(5) echoes "Running Athena ..."
(6) cats input.py so that we can see what it will use
(7) starts athena with:

	athena.py $ATHENA_OPTIONS input.py

    where $ATHENA_OPTIONS is a variable set from the content of a file called athena_options that Ganga put in the input sandbox. In practice this contains the name of your top-level job options file, so athena is effectively started with:

	athena.py YourTopOptions.py input.py

    which explains how the input collection is successfully overwritten by Ganga.
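The override mechanism can be demonstrated in a few lines of plain Python: athena executes its job-options files in order, so a later assignment simply replaces an earlier one. Here EventSelector is a stand-in object, not the real Athena service, and the file names are invented:

```python
class EventSelector:
    """Stand-in for the Athena EventSelector service."""
    InputCollections = []

# Contents of the two options files, in the order athena.py sees them:
#   athena.py YourTopOptions.py input.py
top_options = 'EventSelector.InputCollections = ["AOD.pool.root"]'
input_py = ('EventSelector.InputCollections = '
            '["staged_file_1.root", "staged_file_2.root"]')

for options_text in (top_options, input_py):
    exec(options_text)

# The assignment in the later file wins:
print(EventSelector.InputCollections)
# -> ['staged_file_1.root', 'staged_file_2.root']
```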