Analyze Process Data

In this tutorial, you will generate an interactive control chart from data extracted from a dozen batch records. It will take less than five minutes.

How to use this tutorial

There are three ways to follow along with this CPV tutorial (from beginner to advanced).

  1. Run the code in the cloud. Without installing anything locally, you’ll be able to both change and run the code. Open the Colab notebook.

  2. Download the code as a Jupyter notebook: analyze-process-data.ipynb. First-time users of Jupyter notebooks should follow the Getting Started instructions first.

  3. Copy and paste the code snippets below into your own Python development environment.

Install the fathomdata library

From a terminal:

$ pip install fathomdata

Confirm the installation was successful by importing the library. We name the library fd on import by convention.

import fathomdata as fd

Users who are new to Python can find more detailed instructions at Getting Started.
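
If the import fails, it can help to check whether pip actually installed the package into the environment you are running. The sketch below uses only the Python standard library (`importlib.metadata`); the helper name `installed_version` is made up for this example and is not part of fathomdata.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("fathomdata")
print(f"fathomdata {version}" if version else "fathomdata is not installed")
```

A result of "not installed" usually means pip ran against a different Python environment than the one the notebook is using.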

Create a control chart

Import the Bokeh plotting functions and call output_notebook() so that charts render inline in the notebook.

from bokeh.plotting import figure, output_notebook, show

output_notebook()

Load sample data to use for this analysis. You can also use your own process data, provided it is a pandas.DataFrame with columns similar to those in the sample data below.

# Load the sample dataset and preview the first five rows
df = fd.load_dataset("titer")
df.head()
      titer batch_number  batch_date units deviations
0  4.625448       2021-0  2019-12-30   g/L        NaN
1  5.533672       2021-1  2020-01-26   g/L        NaN
2  5.763685       2021-2  2020-02-22   g/L        NaN
3  5.710262       2021-3  2020-03-20   g/L        NaN
4  3.635831       2021-4  2020-04-16   g/L        NaN

In our example, the titer dataset was collected over a series of batches for a monoclonal antibody manufacturing campaign.
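
If you want to start from your own data instead, one way to mirror the sample's shape is sketched below. The column names are taken from the df.head() output above; every value is a hypothetical placeholder, and the column types are an assumption because the tutorial does not state what fathomdata requires.

```python
import pandas as pd

# HYPOTHETICAL values: placeholders that only illustrate the expected shape.
my_df = pd.DataFrame(
    {
        "titer": [5.1, 4.8, 5.4],
        "batch_number": ["A-001", "A-002", "A-003"],
        "batch_date": pd.to_datetime(["2022-01-10", "2022-02-07", "2022-03-07"]),
        "units": ["g/L"] * 3,
        "deviations": [None, None, None],  # no deviations recorded
    }
)

# Check that the columns match the ones shown in the sample dataset.
expected = ["titer", "batch_number", "batch_date", "units", "deviations"]
missing = [col for col in expected if col not in my_df.columns]
print("missing columns:", missing)
print(my_df.dtypes)

# Assumption: a frame shaped like this can be passed in place of the sample.
# fig = fd.control_chart(my_df)
```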

Pass the DataFrame to fd.control_chart to build the chart, then display it with show.

# Build the control chart and render it in the notebook
fig = fd.control_chart(df)
show(fig)

Good news — this process is in control!

When the control chart is created, it automatically runs common statistical process control tests. Try hovering over each data point to see additional metadata about the parameter.
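
This tutorial does not list the tests that fathomdata runs. As a rough illustration of the most common one, the sketch below computes a center line and 3-sigma limits for the five titer values shown in the df.head() output, then checks whether any value falls outside them. The function name `control_limits` is made up for this example, and using the sample standard deviation as sigma is an assumption; fathomdata may estimate sigma differently.

```python
from statistics import mean, stdev

# Titer values (g/L) copied from the df.head() output above.
titers = [4.625448, 5.533672, 5.763685, 5.710262, 3.635831]


def control_limits(values, n_sigma=3.0):
    """Return (lower, center, upper) from the sample mean and standard deviation.

    Assumption: sigma is the plain sample standard deviation.
    """
    center = mean(values)
    sigma = stdev(values)
    return center - n_sigma * sigma, center, center + n_sigma * sigma


lower, center, upper = control_limits(titers)

# A point is flagged when it lies outside the [lower, upper] band.
out_of_control = [v for v in titers if not lower <= v <= upper]

print(f"center={center:.3f}, limits=({lower:.3f}, {upper:.3f})")
print("points outside limits:", out_of_control)
```

With only five points the limits are wide, so none of these values is flagged.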

Change the control limits

Control charts need to reflect real-world context.

In the dataset above, a process change was implemented starting with batch 5. We want to know whether this change affected our control of this critical process parameter. To test this, we explicitly set the calibration set so that the control limits are computed only from the first five batches (index 0 through 4), which were produced before the change.

fig = fd.control_chart(df, calibration_set=df.index < 5)
show(fig)

We find that one of the batches is more than 3 sigma away from the center line of the calibration set. Hovering over that point, we see that it is associated with a known batch deviation, which may explain the outlier. This point might be early evidence that the process change is not yet well controlled, or it may simply be a result of the deviation.
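
To make the idea of a calibration set concrete, here is a minimal plain-Python sketch of the same logic: compute limits from the first five batches only, then test every batch against them. The first five values come from the df.head() output earlier in this tutorial; the later values are hypothetical, chosen so that one of them lands outside the limits. The sample standard deviation is again an assumption about how sigma is estimated.

```python
from statistics import mean, stdev

# First five values are from the df.head() output; the remaining four are
# HYPOTHETICAL and exist only to illustrate the flagging logic.
titers = [4.625448, 5.533672, 5.763685, 5.710262, 3.635831,
          5.2, 4.9, 8.1, 5.6]

# Plain-Python analogue of the boolean mask `df.index < 5`.
calibration_mask = [i < 5 for i in range(len(titers))]
calibration = [v for v, keep in zip(titers, calibration_mask) if keep]

# Limits come from the calibration batches only.
center = mean(calibration)
sigma = stdev(calibration)
lower, upper = center - 3 * sigma, center + 3 * sigma

# Every batch, including those after the process change, is tested.
flagged = [i for i, v in enumerate(titers) if not lower <= v <= upper]

print(f"limits from calibration set: ({lower:.3f}, {upper:.3f})")
print("batch indexes outside limits:", flagged)
```

Because the later batches do not contribute to the limits, a shift introduced by the process change cannot widen the band that is used to judge it.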

Tip

In the life sciences, getting access to digital data is often the hard part. To see how to digitize a batch record and extract process data, check out the Batch Record Digitization tutorial.