Berkeley Nuclear Data Software
This page is meant to be a collaborative accumulation of knowledge to ease the process of getting started with software in this repo, and generally in using computers to do science.
If you are getting started with us and feel like there are things missing from this document that the next person getting started would find useful, please add them!
Getting started using computers to do science is not an easy task. Google is your best friend. The people around you can also be great resources, as they have likely encountered the issue you are having. If you find yourself stuck on something for a long time, reach out; there is support.
Although one could talk about a quiet space to think here, this refers to a functional virtual workspace that you can compute in. This can be accomplished in any modern operating system, but generally includes:
a decent text editor
Software at its base is text, and you will need to interact with it via a text editor. Notepad could be used to edit code, but it will be painful.
Some features I might consider requirements
Some features I find highly productivity enhancing
a command line interface
Most of the code you interact with will require use of the command line. Even if someone went through the effort to produce a GUI, you will likely need to launch it from the command line.
a package manager for sourcing commonly required software
Software will have dependencies. This means it will rely on code that someone else wrote and published. A package manager will allow you to fetch the software you need in a packaged format appropriate for the OS you are using.
We have people functionally computing in several workspaces, and this will likely expand over time. Below are common ones people are currently using, and what the components discussed above refer to specifically in each.
All of our acquisition computers and several students are running native Linux installs. I have over time gravitated to Fedora's KDE spin, and most of the acquisition machines are also running this. The newest version runs out of the box on all the machines I've thrown it at, and it has built-in support for NVIDIA GPUs in the extended repos.
Setup notes to go from a basic Fedora install, to ready to build and run software can be found here
Reflecting on our previous list
Text Editor : Sublime Text
from the command line

    subl .

will start an instance of Sublime Text with its file browser pointing to the current directory
Package Manager : dnf
syntax looks like

    sudo dnf install <package-name>
Any other modern Linux OS should be fine as long as it supports C++14 or greater.
Windows now has the Windows Subsystem for Linux (WSL), which for most purposes looks like running Linux natively. There are two versions, WSL1 and WSL2. The way WSL1 is implemented has slow file access, but it is generically functional. I have not personally worked with WSL2; its file access speed has been fixed, but we have had a couple of people struggle to get graphical programs working correctly.
Other issues we've seen involve line endings: files saved on Windows have different line endings, so if a file was downloaded and saved in the Windows OS and then moved into the WSL space, it can cause issues.
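If you hit this, the carriage returns can be stripped with standard tools. A sketch (the filename is made up; the dos2unix utility, if installed, does the same thing):

```shell
# a file saved from Windows: lines end in CRLF ("\r\n")
printf 'echo hello\r\n' > script.sh

# sed can strip the trailing carriage returns in place
sed -i 's/\r$//' script.sh

# the file now has plain Unix ("\n") line endings
cat script.sh
```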
Reflecting on our previous list
Text Editor : Sublime Text (recommend editing in Linux space to avoid the above issue)
from the command line

    subl .

will start an instance of Sublime Text with its file browser pointing to the current directory
Package Manager : aptitude
syntax looks like

    sudo aptitude install <package-name>
The software in the repo should work on OSX, and it has been built and tested there, but I am not currently working in this space. If you get to a good working space on OSX, please update this section to look like the ones above.
Reflecting on our previous list
Text Editor : Sublime Text (Xcode can work, but it's pretty heavy)
from the command line

    subl .

will start an instance of Sublime Text with its file browser pointing to the current directory
The university provides students with access to a copy of VMware, which will run on all of the previously listed operating systems.
I put together a virtual machine with a functional work environment that can be downloaded and used. It takes up twenty GB and will be slightly slower than bare-metal Linux.
It probably requires the least effort, but is likely the most computer resource intensive.
Let's chat if you think this is the best way forward.
I have very little to add here that Google won't help you with almost immediately.
If you've never used a terminal before, this introduction has a good description of what you need to get around and deal with normal computing from the command line.
One of the really powerful things in the command line is the ability to open a terminal on another computer. This is done through a program called secure shell (ssh). Check out this discussion for commands and usage.
If you've never encountered version control systems maybe start here
The most common thing I hear when getting new people working is "I didn't want to break it." Version control allows us to not worry about that. You won't break it. Even if you try some really sketchy stuff, the fact that we are using a distributed version control system means that there is a complete copy of the software and its entire development history on many computers.
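As a toy sketch of why it is hard to break things (the repository and filenames here are invented for the example), git can always restore a committed file from history:

```shell
# set up a toy repository with one committed file
git init demo && cd demo
git config user.email "you@example.com" && git config user.name "You"
echo "important analysis" > analysis.txt
git add analysis.txt
git commit -m "first commit"

# wreck the file...
echo "oops" > analysis.txt

# ...and recover the committed version from the repository history
git checkout -- analysis.txt
cat analysis.txt
```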
We use git with Bitbucket, and they have a nice tutorial section. Much of the language is pretty focused towards formal software development, but it includes examples of how to do normal tasks.
Some best practices
Working in a compiled language means that a program (a compiler) needs to be run on the code that has been written, to turn it into machine language that the processor of your computer understands. This is generally done on the computer the code will be run on.
The simplest example of compiling a program follows.
Below is hello world in C++. Copy it and save it in a text file named hello.cpp

    #include <iostream>

    int main() {
        std::cout << "hello world" << std::endl;
        return 0;
    }
navigate to where you saved the text file and run

    g++ hello.cpp
This creates a file called a.out by running your C++ compiler on the file, producing a machine language representation of the code that is executable.
You can run it using

    ./a.out

and it should give you hello world.
What if you wanted to call the program something else? Compilers can take many options, and a.out is just the default output name when compiling an executable. If instead we pass the -o flag to the compiler with a name, such as

    g++ -o hello.exe hello.cpp

we instead get hello.exe as the resulting executable file.
The compiler can generate another type of machine language file called a library. Libraries are not intended to be executables themselves, but contain instructions for the processor that will be used by an executable.
When the executable is compiled, it gets a reference to where the library lives and what code is being used in it. This is referred to as linking. Executables will, in general, be linked against a large number of libraries.
Most of the code in this repo is compiled into libraries that are then shared by any of the executables.
This is where build systems come in. Build systems manage the large number of compilation options that result from compiling many files into libraries, compiling executables, and linking them all together.
For the most part this stuff will be maintained and working, but if you're interested, an intro to the build system used here can be found here
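For a flavor of what that looks like, here is a minimal, hypothetical CMakeLists.txt (the target and source names are invented) that compiles sources into a shared library and links an executable against it:

```cmake
cmake_minimum_required(VERSION 3.10)
project(Example CXX)

# compile all targets against the C++14 standard
set(CMAKE_CXX_STANDARD 14)

# compile the analysis sources into a shared library
add_library(Analysis SHARED Histogram.cpp Fitter.cpp)

# compile an executable and link it against that library
add_executable(RunAnalysis main.cpp)
target_link_libraries(RunAnalysis Analysis)
```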
Our code base is largely in C++ and is compiled with the CMake build system. It has been primarily written in the C++11 standard, but occasionally uses elements of the C++14 standard. C++ is one of the most used languages in the world, and as such has a remarkable collection of resources online. If you are trying to do something and don't know where to start, a web search will likely get you most of the way there.

This code base is built using object oriented design. There are many resources to get started. A nice, domain-relevant, thorough take can be found here. If you have never encountered object oriented design, or have not experienced it through the lens of C++, it is worth going through those slides. It's a pretty deep dive and will take some time. A shorter, generic take on just object oriented concepts can be found here.
Our code base has a few dependencies; some are common, some are unique to our instruments. A couple of the bigger ones are detailed in the following sections.
CERN has produced an extremely large code base for doing particle and nuclear physics analysis called ROOT. It is worth remembering that at its base it is an object oriented C++ library. It also has a large user base, so web searches can yield results when trying to solve problems. There is also a reasonably active forum; if a web search doesn't provide results, you can ask questions there.
Along with the analysis classes, ROOT has developed a real-time interpreter for C++ (CINT in older versions, cling in ROOT 6), which is the program that runs when you type root. The interpreter can be useful for debugging as well as doing analysis, and accepts valid C++ for most cases.
We use ROOT in several contexts. We make extensive use of their histogram classes; they are well documented, reasonably fast, include visualization tools for multidimensional histograms, and have proved quite reliable. I haven't seen a better histogram package in any language. We commonly use ROOT TTrees for file input and output. You will also find use of their minimization package, Minuit, for parameter estimation. This robust minimization package is also tightly bound to their histogram and graph classes when you need to fit a function.
GEANT4 is used for the Monte Carlo simulations contained in our library. This section can be expanded on...