{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# O2AT tutorial: Neural Network regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For any questions to this tutorial or further help, just write to me (Mattermost, Email: christian.sonnabend@cern.ch, etc.)
\n",
"\n",
"**Instructions**\n",
"\n",
"- The notebook is divided into individual sections. Please go through them and read carefully. Our goal is to train, execute and export a simple feed-forward neural network, as well as applying it in $\\text{O}^2$. Finally we will apply it on some real (downsampled and carefully selected) data of the LHC22q_apass2 dataset!
\n",
"- You can execute each code cell by pressing 'Shift + Enter' or the ▷ button on the side of each cell
\n",
"- If you are running this locally, please follow the instructions in the README file of the git repository (one directory above: o2at-2/machineLearning) and execute the data-import in section 0)
\n",
"\n",
"We will start by importing some modules we will need"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Some modules we will need\n",
"\n",
"import os\n",
"import sys\n",
"\n",
"import numpy as np\n",
"import scipy as sc\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib.colors as colors\n",
"from matplotlib.colors import ListedColormap\n",
"\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"from torch.utils.data import TensorDataset, DataLoader\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"import onnx\n",
"import onnxruntime as ort\n",
"\n",
"### Global figure size\n",
"matplotlib.rcParams['figure.figsize'] = (12,7)\n",
"\n",
"### A global setting for a datapath that will be used later in section '4) Application on real data - TPC PID calibration'\n",
"datapath = \"/eos/user/a/alicesk/O2AT_ML/data/NeuralNetworkRegression\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0) Running locally"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **!!! DON'T EXECUTE THIS IN THE SWAN ENVIRONMENT - SKIP DIRECTLY TO PART 1) Preamble !!!** ( please :-) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case you want to run these notebooks locally (outside of the prepared SWAN environments), you will have to download the corresponding data. To do that, please execute the following block:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"\n",
"### For a dataset of LHC22q_apass2 downsampled to 1 milliion points\n",
"os.makedirs(\"./LHC22q\")\n",
"!curl -L https://cernbox.cern.ch/s/UFfXvPTRMq2SRDh/download --output ./LHC22q/data.root\n",
"!curl -L https://cernbox.cern.ch/s/fBQZ7kxH1sx5Pus/download --output ./LHC22q/net.onnx\n",
"\n",
"### The datapath variable will be overwritten if the data is downloaded to a local folder \n",
"datapath = \"./LHC22q\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Furthermore, you will need some python modules (numpy, scipy, matplotlib, onnx, onnxruntime, scikit-learn, uproot). These can be installed easily with a pip commmand (pip install numpy scipy matplotlib onnx onnxruntime scikit-learn uproot), however they should also come with the installation of other packages needed for the BDT tutorial, so you might not need to install them separately.
\n",
"- Depending on your system and needs, please follow the instruction online to install PyTorch: https://pytorch.org/
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1) Preamble"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Neural networks are powerful regression tools that have gained tremendous popularity in the recent years. While it is typically presented that ML can be divided into classification and regression, it is only part of the magic that goes on behind the curtain. Externally, a neural network can be thought of as a funcion that maps from an input space of basically arbitrary dimensions to an output space. While input and output are clearly defined by the user, the internal workings of neural networks can only be understood in rare cases (basically only when the network is structured in a certain way, e.g. Graph Neural Networks (GNN)). Understanding the internal parameters (weights and biases) of a neural network is like trying to track an N-particle simulation: You won't succeed just doing it by hand. You need algorithms!
\n",
"\n",
"The full neural network is complex and inter-linked, however it's constituents are not! A single neuron is typically just a simple function (ReLU, Tanh, or logisitc sigmoid) that takes the input from the previous layer (X), multiplies it with an internal weight matrix (W), adds a bias term (B) evaluates the function and passes it on to the next layer.\n",
"\n",
"$\\begin{equation}\n",
"Y = \\sigma(W\\cdot X + B) \\in \\mathbb{R}\n",
"\\end{equation}$\n",
"\n",
"Since every node is constructed in such a simple way, it is easy to calculate gradients. This will then also be our fundamental tool for improving the network: Update the weights using the gradients such that the network approximates our task as good as possible.
\n",
"\n",
"All of this is handeled internally within our framework (pytorch). We just have to define WHAT we want the network to learn and HOW."
]
},
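{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell is a small illustrative sketch (not part of the tutorial pipeline): it evaluates a single tanh neuron $Y = \\sigma(W\\cdot X + B)$ by hand with numpy and cross-checks the result against an equivalent PyTorch nn.Linear layer. The input, weights and bias are arbitrary example values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Minimal sketch of a single neuron: Y = sigma(W*X + B), here with a tanh activation\n",
"X = np.array([0.2, -1.0, 0.5])   # input from the previous layer (3 values)\n",
"W = np.array([0.4, -0.3, 0.8])   # weights of this neuron\n",
"B = 0.1                          # bias term\n",
"\n",
"Y_manual = np.tanh(W @ X + B)\n",
"\n",
"### The same computation with a PyTorch layer\n",
"lin = nn.Linear(3, 1)\n",
"with torch.no_grad():\n",
"    lin.weight.copy_(torch.tensor(W).float().reshape(1, 3))\n",
"    lin.bias.copy_(torch.tensor([B]).float())\n",
"Y_torch = torch.tanh(lin(torch.tensor(X).float())).item()\n",
"\n",
"print(\"Manual neuron output: \", Y_manual)\n",
"print(\"PyTorch neuron output:\", Y_torch)"
]
},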
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2) Basics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$\\newcommand{\\pluseq}{\\mathrel{+}=}$\n",
"\n",
"As explained in the talk on monday it is best to start with a bit of human intuition about how one would go about the topic of regression. Imagine you have a function which can take any arbitrary shape, but you have to adjust this function such that it best describes your data-points.
\n",
"\n",
"**What does it mean to best describe your datapoints?**\n",
"\n",
"Well, for regression there is a typical measure when you have a set of datapoints (X) with target values (Y), each associated with an uncertainty ($\\sigma$) and a model prediction ($\\hat{Y}$) for each datapoint (i) in your dataset\n",
"\n",
"$\n",
"\\begin{equation}\n",
"\\chi^2 = \\sum_{i=1}^{n}\\bigg(\\frac{Y_i - \\hat{Y}_i}{\\sigma_i}\\bigg)^2\n",
"\\end{equation}\n",
"$\n",
"\n",
"**But what if we don't have any errors on our datapoints and we just want to fit some arbitrary data?**\n",
"\n",
"For this case we imagine that the error on every datapoint is the same. To avoid prefactors, we choose that $\\sigma_i = 1$. We arrive at the sum of squared errors (SSE)\n",
"\n",
"$\n",
"\\begin{equation}\n",
"\\text{SSE} = \\sum_{i=1}^{n}({Y_i - \\hat{Y}_i})^2\n",
"\\end{equation}\n",
"$\n",
"\n",
"To keep numbers smaller (especially for larger datasets), it is useful to calcualte the average error per datapoint. Hence the measure for approximating a regression task, the so called loss function, that is typically used is the mean squared error (MSE)\n",
"\n",
"$\n",
"\\begin{equation}\n",
"\\text{MSE} = \\frac{1}{n}\\sum_{i=1}^{n}({Y_i - \\hat{Y}_i})^2\n",
"\\end{equation}\n",
"$\n",
"\n",
"This is a typical score used for regression and in general also for model evaluation. Using the loss, the weights can be adjusted by an algorithm called backpropagation. This algorithm internally performs a gradient descent using certain update rules (so called optimizers, e.g. Stochastic Gradient Descent (SGD), Adagrad, Adam, etc.) to update the internal parameters $\\theta$ of our network, containing all the weights and biases of each neuron. For this the gradient is calculated from the loss function to the last (output) layer of the neural network. For a neuron in the last layer L, we write $\\hat{Y} = X_L = \\sigma_L(Z_L) = \\sigma_L(X_{L-1} \\cdot W_L)$, where the bias-term is just appended to the weight matrix W and a row of ones is appended to the input X. Hence, we get\n",
"\n",
"$\n",
"\\begin{equation}\n",
"\\delta_L = \\frac{\\partial\\text{Loss}}{\\partial Z_L}\n",
"\\end{equation}\n",
"$\n",
"\n",
"From this we can calcualte the gradient w.r.t. the weights $W_L$ of the last layer as\n",
"\n",
"$\n",
"\\begin{equation}\n",
"\\frac{\\partial\\text{Loss}}{\\partial W_L} = \\frac{\\partial\\text{Loss}}{\\partial Z_L}\\cdot\\frac{\\partial Z_L}{\\partial W_L} = X_{L-1}^T \\cdot\\delta_L \n",
"\\end{equation}\n",
"$\n",
"\n",
"By the chain rule, this can be extended to layer l-1 as\n",
"\n",
"$\n",
"\\begin{equation}\n",
"\\delta_{L-1} = \\frac{\\partial\\text{Loss}}{\\partial Z_L}\\cdot\\frac{\\partial Z_L}{\\partial X_L}\\cdot\\frac{\\partial X_L}{\\partial Z_L} = \\delta_L\\cdot W_L^T\\cdot\\text{diag}(\\sigma'_{l-1}(Z_{L-1}))\n",
"\\end{equation}\n",
"$\n",
"\n",
"We have a recursion formula! Therefore we can update the weights at each step (what a \"step\" is will be defined momentarily) using gradient descent (or any other, more complex variant of it -> optimizers) by doing a full forward pass, saving $X_l$ for every layer $l$ and then a backward pass through the network calculating\n",
"\n",
"$\n",
"\\begin{align}\n",
"\\Delta W_l &\\pluseq Z_{L-1}^T\\cdot\\delta_l\\\\\n",
"\\delta_{l-1} &= \\delta_{l}\\cdot (W_l^{(t-1)})^T\\cdot\\text{diag}(\\sigma'_{l-1}(Z_{l-1}))\n",
"\\end{align}\n",
"$\n",
"\n",
"which leads to the final update of weights as\n",
"\n",
"$\n",
"\\begin{equation}\n",
"W_l^{t} = W_l^{t-1} + \\tau\\cdot\\Delta W_l\n",
"\\end{equation}\n",
"$\n",
"\n",
"where $\\tau$ is the learning rate. Any other preffered method of optimization for updating the weights can be used here. (Please let me know if you find any inconsitencies in the formulas above, I'm also just a human 😉)
\n",
"\n",
"The neural network training is usually performed in epochs: Your data is divided into a training and test dataset (typically 80/20 division) which is passed through the network fully once per iteration. This means that the network loops many times over your dataset until you are satisfied with its performance, typically measured by the loss score. However what might happen is that the full dataset covers a vast area of your phase-space and hence calculating the gradient once for the full dataset might not be very representative and might not help the network to learn well. For this, it is highly recommended to divide your training data into (mini-)batches. This means that per epoch the network still loops once over the full data but the weights are updated $n$ times if you divide your training dataset into $n$ (equally sized) batches. This is what was previously meant with \"steps\": Gradients are calculated for each batch and passed backwards through the network.\n",
"\n",
"Now that you know the basic terminologies, we can continue to some real data!"
]
},
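{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell is a purely illustrative numerical sketch of the formulas above, using made-up toy numbers: it computes the MSE by hand and with nn.MSELoss, reproduces the gradient of the loss for a single linear neuron following the $X^T\\cdot\\delta$ rule, compares it with PyTorch's autograd (backpropagation) and finally performs one plain gradient-descent update with learning rate $\\tau$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Illustrative sketch: MSE loss, gradients and one gradient-descent step for a single linear neuron\n",
"\n",
"# Toy data: 5 datapoints with targets (made-up numbers)\n",
"x = torch.tensor([[0.0], [0.5], [1.0], [1.5], [2.0]])\n",
"y = torch.tensor([[0.1], [0.9], [2.1], [2.9], [4.2]])\n",
"\n",
"# A single linear neuron: y_hat = x*w + b\n",
"w = torch.tensor([[1.5]], requires_grad=True)\n",
"b = torch.tensor([0.2], requires_grad=True)\n",
"y_hat = x @ w + b\n",
"\n",
"# MSE by hand and with the PyTorch loss\n",
"mse_manual = ((y - y_hat)**2).mean()\n",
"mse_torch = nn.MSELoss()(y_hat, y)\n",
"print(\"MSE by hand:\", mse_manual.item(), \" / nn.MSELoss:\", mse_torch.item())\n",
"\n",
"# Gradients from the formulas above: delta = dLoss/dZ = 2*(y_hat - y)/n for the MSE\n",
"n = len(x)\n",
"delta = 2.0 * (y_hat - y) / n\n",
"grad_w_manual = x.T @ delta    # corresponds to X^T . delta\n",
"grad_b_manual = delta.sum()\n",
"\n",
"# Gradients from autograd (backpropagation)\n",
"mse_torch.backward()\n",
"print(\"dLoss/dw by hand:\", grad_w_manual.item(), \" / autograd:\", w.grad.item())\n",
"print(\"dLoss/db by hand:\", grad_b_manual.item(), \" / autograd:\", b.grad.item())\n",
"\n",
"# One plain gradient-descent update with learning rate tau\n",
"tau = 0.1\n",
"with torch.no_grad():\n",
"    w -= tau * w.grad\n",
"    b -= tau * b.grad\n",
"print(\"Updated w, b:\", w.item(), b.item())"
]
},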
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3) Regression in 1D"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start with the most simple example: Fitting some 1D data. We will start from an analytic function that we define, such that we can see how our network performs in the end"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define our test function\n",
"def test_function(x):\n",
" return (np.sin(0.3*x)**2 + 0.7*np.cos(2*x+np.pi/2.) + np.sin(0.3*x))/3.\n",
"\n",
"# Adding some jitter to the evaluation\n",
"def jittered_function(range=[10,20], num_points = 1000, fluc=1, func=test_function):\n",
" axis = np.linspace(*range, num_points)\n",
" jitter = np.random.uniform(-fluc,fluc, (2,num_points))\n",
" return axis, func(axis + jitter[0]) + jitter[1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_reg1d = jittered_function([-2,2], 10000, 0.05)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plotting the data and the function\n",
"\n",
"plt.scatter(*data_reg1d, s=1, label='datapoints with jitter')\n",
"plt.plot(data_reg1d[0], test_function(data_reg1d[0]), c='r', label='Reg. function')\n",
"plt.legend()\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example might be oversimplified, but it illustrates nicely how a neural network regression works. In real data you might have to deal with larger outliers or higher fluctuations in different regions of the input phase space. One other aspect to consider is that in practice there is no one-fits-all solution to every problem. Your network has to be tailored to your problem. The universal approximation theroem for neural networks states that a one-layer neural network $N(x)$ with a sufficient number of neurons in the first layer can fit any function $f(x)$ to arbitrary precision\n",
"\n",
"$\n",
"\\begin{equation}\n",
"\\sup_{x\\in K\\subseteq\\mathbb{R}^n}\\parallel N(x) - f(x)\\parallel\\leq\\epsilon.\n",
"\\end{equation}\n",
"$\n",
"\n",
"In practice it is however a well-known fact that deep neural networks (more layers) can fit functions much better than shallow or even one-layer neural networks with a significantly lower number of total neurons. We continue with constructing the neural network."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class Net(nn.Module):\n",
" \n",
" def __init__(self):\n",
" super(Net, self).__init__()\n",
" \n",
" ### network layers\n",
" self.fc1 = nn.Linear(1, 16)\n",
" self.fc2 = nn.Linear(16, 16)\n",
" self.fc3 = nn.Linear(16, 16)\n",
" self.fc4 = nn.Linear(16, 16)\n",
" self.fc5 = nn.Linear(16, 1)\n",
" \n",
" ### activation functions\n",
" self.relu = nn.ReLU()\n",
" self.tanh = nn.Tanh()\n",
"\n",
" def forward(self, x):\n",
" \n",
" ### network structure\n",
" out = self.tanh(self.fc1(x))\n",
" out = self.tanh(self.fc2(out))\n",
" out = self.tanh(self.fc3(out))\n",
" out = self.tanh(self.fc4(out))\n",
" out = self.fc5(out)\n",
" \n",
" return out\n",
"\n",
"\n",
"class Trainer:\n",
" def __init__(self, model, train_loader, val_loader=None, lr=0.001):\n",
" self.model = model\n",
" self.train_loader = train_loader\n",
" self.val_loader = val_loader\n",
" self.lr = lr\n",
" self.optimizer = optim.Adam(self.model.parameters(), lr=self.lr)\n",
" self.criterion = nn.MSELoss()\n",
"\n",
" def train(self, epochs):\n",
" for epoch in range(epochs):\n",
" train_loss = 0.0\n",
" val_loss = 0.0\n",
" for i, (inputs, targets) in enumerate(self.train_loader):\n",
" self.optimizer.zero_grad()\n",
" outputs = self.model(inputs)\n",
" loss = self.criterion(outputs, targets)\n",
" loss.backward()\n",
" self.optimizer.step()\n",
" train_loss += loss.item()\n",
" train_loss /= len(self.train_loader)\n",
" if self.val_loader is not None:\n",
" with torch.no_grad():\n",
" for inputs, targets in self.val_loader:\n",
" outputs = self.model(inputs)\n",
" loss = self.criterion(outputs, targets)\n",
" val_loss += loss.item()\n",
" val_loss /= len(self.val_loader)\n",
" print(\"Epoch: {}, Training Loss: {:.4f}, Validation Loss: {:.4f}\".format(epoch+1, train_loss, val_loss))\n",
" else:\n",
" print(\"Epoch: {}, Training Loss: {:.4f}\".format(epoch+1, train_loss))"
]
},
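{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small aside on the depth-vs-width remark from above, the next cell (illustration only) counts the trainable parameters of the deep Net we just defined and, for comparison, of a single-hidden-layer tanh network with the same total number of hidden neurons (4 x 16 = 64). Feel free to plug the shallow network into the Trainer later and compare the achievable loss."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Count the trainable parameters (all weights and biases) of the deep network defined above\n",
"tmp_net = Net()\n",
"n_params_deep = sum(p.numel() for p in tmp_net.parameters() if p.requires_grad)\n",
"print(\"Trainable parameters of the deep Net:       \", n_params_deep)\n",
"\n",
"### For comparison: a single-hidden-layer tanh network with the same total number of hidden neurons\n",
"shallow = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))\n",
"n_params_shallow = sum(p.numel() for p in shallow.parameters() if p.requires_grad)\n",
"print(\"Trainable parameters of the shallow network:\", n_params_shallow)"
]
},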
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Generate some sample data and instanciating the network\n",
"x_train, x_test, y_train, y_test = train_test_split(data_reg1d[0], data_reg1d[1], test_size=0.2, shuffle=True)\n",
"x_train, x_test, y_train, y_test = torch.tensor(x_train.reshape(-1,1)).float(), torch.tensor(x_test.reshape(-1,1)).float(), torch.tensor(y_train.reshape(-1,1)).float(), torch.tensor(y_test.reshape(-1,1)).float()\n",
"net = Net()\n",
"\n",
"train_dataset = TensorDataset(torch.Tensor(x_train), torch.Tensor(y_train))\n",
"train_loader = DataLoader(train_dataset, batch_size=50, shuffle=True)\n",
"val_dataset = TensorDataset(torch.Tensor(x_test), torch.Tensor(y_test))\n",
"val_loader = DataLoader(val_dataset, batch_size=10000)\n",
"\n",
"# Train the neural network\n",
"\n",
"trainer = Trainer(model=net, train_loader=train_loader, val_loader=val_loader, lr=0.003)\n",
"trainer.train(epochs=20)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Keep in mind that this neural network is very simple and no further optimizations have been performed (no data-scaling, batch-size adaptation, learning rate scheduling, structure optimization or proper weight initialization). This will limit it's overall performance but further optimizations go beyond the scope of this tutorial. Try to achieve a loss of 0.0011 or better, if the network performs with a higher loss score, try to retrain or change the learning rate."
]
},
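{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to experiment further, the optional cell below sketches two of the missing optimizations mentioned above (purely illustrative, it does not retrain the network): standardizing the inputs to zero mean and unit variance, and attaching a learning-rate scheduler to an Adam optimizer. In a real training, the scheduler would be stepped once per epoch inside Trainer.train()."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Optional sketch (not needed for the rest of the tutorial)\n",
"\n",
"# 1) Input standardization: shift and scale the inputs to zero mean and unit variance before training.\n",
"#    Remember to apply the same transformation at inference time!\n",
"x_mean, x_std = x_train.mean(), x_train.std()\n",
"x_train_scaled = (x_train - x_mean) / x_std\n",
"print(\"Scaled inputs: mean = {:.3f}, std = {:.3f}\".format(x_train_scaled.mean().item(), x_train_scaled.std().item()))\n",
"\n",
"# 2) Learning-rate scheduling: e.g. halve the learning rate every 5 epochs.\n",
"#    The scheduler would be stepped once per epoch inside Trainer.train().\n",
"example_optimizer = optim.Adam(Net().parameters(), lr=0.003)\n",
"scheduler = optim.lr_scheduler.StepLR(example_optimizer, step_size=5, gamma=0.5)\n",
"print(\"Initial learning rate:\", scheduler.get_last_lr())"
]
},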
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plotting the data and the function\n",
"\n",
"plt.scatter(*data_reg1d, s=1, label='datapoints with jitter')\n",
"plt.plot(data_reg1d[0], test_function(data_reg1d[0]), c='r', label='Reg. function')\n",
"plt.plot(data_reg1d[0], net(torch.tensor(data_reg1d[0].reshape(-1,1)).float()).detach().numpy(), c='orange', label='Network')\n",
"plt.legend()\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"**Exercise**\n",
"\n",
"Change the some of the activation functions from ***tanh*** to ***ReLU*** (the Recitfying Linear Unit) in the network structure (see ***forward()*** function) and see how it performs! Which major difference do you observe?\n",
"\n",
"Tipp: You can also execute the cell below to see the extrapolation behaviour if it is not immediately obvious to you.\n",
"***"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You see, our network does its job. Let's see how it performs for extrapolation..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"extrapolation = jittered_function([-20,20], 10000, 0.05)\n",
"\n",
"# Plotting the data and the function\n",
"\n",
"plt.scatter(*extrapolation, s=1, label='datapoints with jitter')\n",
"plt.plot(extrapolation[0], test_function(extrapolation[0]), c='r', label='Reg. function')\n",
"plt.plot(extrapolation[0], net(torch.tensor(extrapolation[0].reshape(-1,1)).float()).detach().numpy(), c='orange', label='Network')\n",
"plt.legend()\n",
"plt.grid()\n",
"plt.show()"
]
},
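{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put a number on what the plot shows, the following cell compares the mean squared error of the network with respect to the true function inside the training range $[-2, 2]$ and outside of it (the exact values will depend on your training run)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Quantify interpolation vs. extrapolation: MSE of the network w.r.t. the true function\n",
"x_axis = extrapolation[0]\n",
"y_true = test_function(x_axis)\n",
"y_net = net(torch.tensor(x_axis.reshape(-1,1)).float()).detach().numpy().flatten()\n",
"\n",
"inside = np.abs(x_axis) <= 2    # the network was trained on [-2, 2]\n",
"print(\"MSE inside the training range [-2, 2]:\", np.mean((y_true[inside] - y_net[inside])**2))\n",
"print(\"MSE outside the training range:       \", np.mean((y_true[~inside] - y_net[~inside])**2))"
]
},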
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Important lesson learned**: Networks are really good local regressors, but should not be trusted blindly for extrapolation!
\n",
"\n",
"With this in mind, let's export the network to a pytorch compatible format and an ONNX format for application on the grid"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Pytorch\n",
"torch.save(net, \"./network/test_net.pt\")\n",
"\n",
"### ONNX\n",
"torch.onnx.export(net, # model being run\n",
" x_train[:1], # model example input (or a tuple for multiple inputs)\n",
" \"./network/test_net.onnx\", # where to save the model (can be a file or file-like object)\n",
" export_params=True, # store the trained parameter weights inside the model file\n",
" opset_version=14, # the ONNX version to export the model to: https://onnxruntime.ai/docs/reference/compatibility.html\n",
" do_constant_folding=True, # whether to execute constant folding for optimization\n",
" input_names=['input'], # the model's input names\n",
" output_names=['output'], # the model's output names\n",
" dynamic_axes={'input': {0: 'batch_size'}, # variable length axes\n",
" 'output': {0: 'batch_size'}})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"ONNX provides very nice functionalities, such as a checker to see whether your model has been exported correctly"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Define a simple function to check the ONNX model\n",
"def check_onnx(path=\"./network/test_net.onnx\"):\n",
" try:\n",
" onnx_model = onnx.load(path)\n",
" onnx.checker.check_model(onnx_model)\n",
" print(\"ONNX checker: Success!\")\n",
" except:\n",
" print(\"Failure in ONNX checker!\")\n",
"\n",
"### ONNX checker\n",
"check_onnx(path=\"./network/test_net.onnx\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So far we have: Trained a neural network, plotted it against data and seen how to interface with the ONNX framework. It's time to apply your network now in O2Physics to see that it produces the same results as here (and how to actually apply it!). For this we will use the task **O2Physics/Tutorials/ML/applyOnnxModel.cxx**.
\n",
"Currently the task is hardcoded. It only operates on predefined data, however it is of course possible to apply the ONNX model on any other data that is compatible with the trained network. This could be for example track properties like track.tpcInnerParam(), track.P() or track.Tgl() or any other float / int / string / ... that you would like to evaluate your model with.\n",
"\n",
"First we execute our model for some values here in the Jupyter Notebook"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Let's create some datapoints\n",
"\n",
"datapoints = torch.tensor([[0.5],[np.pi],[0.],[0.001],[-3.]])\n",
"\n",
"output = net(datapoints.float()).detach().numpy()\n",
"\n",
"print(\"Input values:\\n\", np.round(datapoints.flatten().detach().numpy(),6))\n",
"print(\"\\nNetwork values:\\n\", np.round(output.flatten(),6)) # Increase '6' in order to get higher precision"
]
},
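{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before switching to O2Physics, we can already cross-check the exported ONNX file here in the notebook: the next cell (a quick sanity check, assuming the export cell above was executed) evaluates ./network/test_net.onnx with onnxruntime on the same datapoints and compares the result with the PyTorch output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Cross-check: run the exported ONNX model with onnxruntime and compare with the PyTorch output above\n",
"session = ort.InferenceSession(\"./network/test_net.onnx\")\n",
"onnx_output = session.run([\"output\"], {\"input\": datapoints.numpy().astype(np.float32)})[0]\n",
"\n",
"print(\"PyTorch values:\\n\", np.round(output.flatten(), 6))\n",
"print(\"\\nONNX runtime values:\\n\", np.round(onnx_output.flatten(), 6))"
]
},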
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have some reference values, let's execute the O2Physics example task. For this, you can execute the O2Physics task (O2Physics/Tutorials/ML/applyOnnxModel.cxx) as\n",
"\n",
"
\n",
"