Berkeley Nuclear Data Software
This page is meant to be a collaborative accumulation of knowledge to ease the process of getting started with software in this repo, and generally in using computers to do science.
If you are getting started with us and feel like there are things missing from this document that the next person getting started would find useful, please add them!
Getting started using computers to do science is not an easy task. Google is your best friend. The people around you can also be great resources, as they have likely encountered the issue you are having. If you find yourself stuck on something for a long time, reach out; there is support.
Although one could talk about a quiet space to think here, this refers to a functional virtual workspace that you can compute in. This can be accomplished in any modern operating system, but generally includes:
a decent text editor
Software at its base is text, and you will need to interact with it via a text editor. Notepad could be used to edit code, but it will be painful.
Some features I might consider requirements
Some features I find highly productivity enhancing
a command line interface
Most of the code you interact with will require use of the command line. Even if someone went through the effort to produce a GUI, you will likely need to launch it from the command line.
a package manager for sourcing commonly required software
Software will have dependencies. This means it will rely on code that someone else wrote and published. A package manager will allow you to fetch the software you need in a packaged format appropriate for the OS you are using.
We have people functionally computing in several workspaces, and this will likely expand over time. Below are common ones people are currently using, and what the components discussed above refer to specifically in each.
All of our acquisition computers and several students are running native Linux installs. I have over time gravitated to Fedora's KDE spin, and most of the acquisition machines are also running this. The newest version runs out of the box on all the machines I've thrown it at, and it has built-in support for NVIDIA GPUs in the extended repos.
Setup notes to go from a basic Fedora install, to ready to build and run software can be found here
Reflecting on our previous list
Text Editor : Sublime Text
from the command line

    subl .

will start an instance of Sublime Text with its file browser pointing to the current directory
Package Manager : dnf
syntax looks like

    sudo dnf install <package-name>
Any other modern Linux OS should be fine as long as it supports C++14 or greater.
Windows now has the Windows Subsystem for Linux (WSL), which for most purposes looks like running Linux natively. There are two versions, WSL1 and WSL2. The way WSL1 is implemented has slow file access, but it is generically functional. I have not personally worked with WSL2; its file access speed has been fixed, but we have had a couple of people struggle to get graphical programs working correctly.
Other issues we've seen involve line endings: files saved on Windows have different line endings, so if a file was downloaded and saved in the Windows OS and then moved into the WSL space, it can cause issues.
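If you hit this, the carriage returns can be stripped with standard tools. A sketch (the filename is made up; the dos2unix utility, if installed, does the same thing):

```shell
# a file saved from Windows: lines end in CRLF ("\r\n")
printf 'echo hello\r\n' > script.sh

# sed can strip the trailing carriage returns in place
sed -i 's/\r$//' script.sh

# the file now has plain Unix ("\n") line endings
cat script.sh
```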
Reflecting on our previous list
Text Editor : Sublime Text (recommend editing in Linux space to avoid the above issue)
from the command line

    subl .

will start an instance of Sublime Text with its file browser pointing to the current directory
Package Manager : aptitude
syntax looks like

    sudo aptitude install <package-name>
The software in the repo should work on OSX, and it has been built and tested there, but I am not currently working in this space. If you get to a good working space on OSX, please update this section to look like the ones above.
Reflecting on our previous list
Text Editor : Sublime Text (Xcode can work, but it's pretty heavy)
from the command line

    subl .

will start an instance of Sublime Text with its file browser pointing to the current directory
The university provides students with access to a copy of VMware, which will run on all of the previously listed operating systems.
I put together a virtual machine with a functional work environment that can be downloaded and used. It takes up twenty GB and will be slightly slower than bare-metal Linux.
It probably requires the least effort, but is likely the most computer resource intensive.
Let's chat if you think this is the best way forward.
I have very little to add here that Google won't help you with almost immediately.
If you've never used a terminal before, this introduction has a good description of what you need to get around and deal with normal computing from the command line.
One of the really powerful things in the command line is the ability to open a terminal on another computer. This is done through a program called secure shell (ssh). Check out this discussion for commands and usage.
If you've never encountered version control systems maybe start here
The most common thing I hear when getting new people working is "I didn't want to break it." Version control allows us to not worry about that. You won't break it. Even if you try some really sketchy stuff, the fact that we are using a distributed version control system means that there is a complete copy of the software and its entire development history on many computers.
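As a toy sketch of why it is hard to break things (the repository and filenames here are invented for the example), git can always restore a committed file from history:

```shell
# set up a toy repository with one committed file
git init demo && cd demo
git config user.email "you@example.com" && git config user.name "You"
echo "important analysis" > analysis.txt
git add analysis.txt
git commit -m "first commit"

# wreck the file...
echo "oops" > analysis.txt

# ...and recover the committed version from the repository history
git checkout -- analysis.txt
cat analysis.txt
```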
We use git with Bitbucket, and they have a nice tutorial section. Much of the language is pretty focused towards formal software development, but it includes examples of how to do normal tasks.
Some best practices
Working in a compiled language means that a program (a compiler) needs to be run on the code that has been written, to turn it into machine language that the processor of your computer understands. This is generally done on the computer the code will be run on.
The simplest example of compiling a program follows.
Below is hello world in C++. Copy it and save it in a text file named hello.cpp

    #include <iostream>

    int main() {
        std::cout << "hello world" << std::endl;
        return 0;
    }
navigate to where you saved the text file and run

    g++ hello.cpp
This creates a file called a.out by running your C++ compiler on the file, producing a machine language representation of the code that is executable.
You can run it using

    ./a.out

and it should give you hello world.
What if you wanted to call the program something else? Compilers can take many options, and a.out is just the default output name when compiling an executable. If instead we pass the -o flag to the compiler with a name, such as

    g++ -o hello.exe hello.cpp

we instead get hello.exe as the resulting executable file.
The compiler can generate another type of machine language file called a library. Libraries are not intended to be executables themselves, but contain instructions for the processor that will be used by an executable.
When the executable is compiled, it gets a reference to where the library lives and what code is being used in it. This is referred to as linking. Executables will, in general, be linked against a large number of libraries.
Most of the code in this repo is compiled into libraries that are then shared by any of the executables.
This is where build systems come in. Build systems manage the large number of compilation options that result from compiling many files into libraries, compiling executables, and linking them all together.
For the most part this stuff will be maintained and working, but if you're interested, an intro to the build system used here can be found here
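For a flavor of what that looks like, here is a minimal, hypothetical CMakeLists.txt (the target and source names are invented) that compiles sources into a shared library and links an executable against it:

```cmake
cmake_minimum_required(VERSION 3.10)
project(Example CXX)

# compile all targets against the C++14 standard
set(CMAKE_CXX_STANDARD 14)

# compile the analysis sources into a shared library
add_library(Analysis SHARED Histogram.cpp Fitter.cpp)

# compile an executable and link it against that library
add_executable(RunAnalysis main.cpp)
target_link_libraries(RunAnalysis Analysis)
```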
Our code base is largely in C++ and is compiled with the CMake build system. It has been primarily written in the C++11 standard, but occasionally uses elements of the C++14 standard. C++ is one of the most used languages in the world, and as such has a remarkable collection of resources online. If you are trying to do something and don't know where to start, a web search will likely get you most of the way there.

This code base is built using object oriented design. There are many resources to get started. A nice, domain-relevant, thorough take can be found here. If you have never encountered object oriented design, or have not experienced it through the lens of C++, it is worth going through those slides. It's a pretty deep dive and will take some time. A shorter, generic take on just object oriented concepts can be found here.
Our code base has a few dependencies; some are common, some are unique to our instruments. A couple of the bigger ones are detailed in the following sections.
CERN has produced an extremely large code base for doing particle and nuclear physics analysis called ROOT. It is worth remembering that at its base it is an object oriented C++ library. It also has a large user base, so web searches can yield results when trying to solve problems. There is also a reasonably active forum; if a web search doesn't provide results, you can ask questions there.
Along with the analysis classes, ROOT has developed a real-time interpreter for C++ (CINT in older versions, cling in ROOT 6), which is the program that runs when you type root. The interpreter can be useful for debugging as well as doing analysis, and accepts valid C++ for most cases.
We use ROOT in several contexts. We make extensive use of their histogram classes; they are well documented, reasonably fast, include visualization tools for multidimensional histograms, and have proved quite reliable. I haven't seen a better histogram package in any language. We commonly use ROOT TTrees for file input and output. You will also find use of their minimization package, Minuit, for parameter estimation. This robust minimization package is also tightly bound to their histogram and graph classes when you need to fit a function.
GEANT4 is used for the Monte Carlo simulations contained in our library. This section can be expanded on...