Analyze Process Data

In this tutorial, you will generate an interactive control chart from data extracted from a dozen batch records. It will take less than five minutes.

How to use this tutorial

There are three ways to follow along with this continued process verification (CPV) tutorial, listed from beginner to advanced.

Run the code in the cloud. Without installing anything locally, you’ll be able to both change and run the code. Open the Colab notebook.
Download the code as a Jupyter notebook: analyze-process-data.ipynb. First-time users of Jupyter notebooks should follow the Getting Started instructions first.
Copy and paste the code snippets below into your own Python development environment.

Install the fathomdata library

From a terminal:

$ pip install fathomdata

Confirm the installation was successful by importing the library. We name the library fd on import by convention.

import fathomdata as fd

Users who are new to Python can find more detailed instructions at Getting Started.

Create a control chart

Import the Bokeh plotting functions and call output_notebook() so that charts render inline in the notebook.

from bokeh.plotting import figure, output_notebook, show

output_notebook()

Loading BokehJS ...

Load the sample data for this analysis. Your own process data will also work, provided it is a pandas.DataFrame with columns similar to those in the sample data below.

# Load the sample titer dataset and preview the first rows
df = fd.load_dataset("titer")
df.head()

	titer	batch_number	batch_date	units	deviations
0	4.625448	2021-0	2019-12-30	g/L	NaN
1	5.533672	2021-1	2020-01-26	g/L	NaN
2	5.763685	2021-2	2020-02-22	g/L	NaN
3	5.710262	2021-3	2020-03-20	g/L	NaN
4	3.635831	2021-4	2020-04-16	g/L	NaN

In our example, the titer dataset was collected over a series of batches for a monoclonal antibody manufacturing campaign.
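If you want to try your own data, the following is a minimal sketch of a DataFrame that mirrors the column layout of the sample dataset. The values, batch identifiers, and the name my_df are hypothetical, and the exact columns and dtypes that fathomdata requires are an assumption based only on the preview above, so treat this as a starting point rather than a specification.

```python
import pandas as pd

# Column names copied from the sample dataset preview above.
expected_columns = ["titer", "batch_number", "batch_date", "units", "deviations"]

# Hypothetical measurements; replace these with your own process data.
my_df = pd.DataFrame(
    {
        "titer": [4.8, 5.1, 5.3],
        "batch_number": ["B-001", "B-002", "B-003"],
        "batch_date": pd.to_datetime(["2022-01-10", "2022-02-07", "2022-03-07"]),
        "units": ["g/L"] * 3,
        "deviations": [None, None, None],  # no recorded deviations
    }
)

# Quick sanity check that no expected column is missing.
missing = set(expected_columns) - set(my_df.columns)
print("missing columns:", sorted(missing))
```

An empty list of missing columns only confirms that the layout matches the preview; it does not guarantee that the library will accept the data.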

Create a control chart.

# Build the control chart and display it in the notebook
fig = fd.control_chart(df)
show(fig)

Good news: this process is in control!

When the control chart is created, it automatically runs common statistical process control tests. Try hovering over each data point to see additional metadata about the parameter.
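To make the idea of a statistical process control test concrete, here is a small sketch of one widely used check: flagging any point that falls more than three standard deviations from the center line. It uses only the five titer values shown in the df.head() output above and the sample standard deviation as the sigma estimate. Both the choice of estimator and the choice of test are assumptions for illustration; the tests and limit calculations that fathomdata actually applies may differ.

```python
import statistics

# First five titer values (g/L), copied from the df.head() output above.
titer = [4.625448, 5.533672, 5.763685, 5.710262, 3.635831]

center = statistics.mean(titer)
sigma = statistics.stdev(titer)  # sample standard deviation (an assumption)

# Upper and lower control limits at three sigma from the center line.
ucl = center + 3 * sigma
lcl = center - 3 * sigma

# Collect every point that falls outside the control limits.
out_of_control = [x for x in titer if x > ucl or x < lcl]

print(f"center={center:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}")
print("points beyond 3 sigma:", out_of_control)
```

Many individuals charts estimate sigma from the average moving range instead of the sample standard deviation, which generally gives different limits.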

Change the control limits

Control charts need to reflect real-world context.

In the dataset above, a process change was implemented starting with batch 5. We want to know whether this change affected our control of this critical process parameter. To test this, we explicitly set the calibration set so that the control limits are computed only from the first five batches (index 0 through 4).

fig = fd.control_chart(df, calibration_set=df.index < 5)
show(fig)

We find that one batch lies more than 3 sigma from the center line computed from the calibration set. Hovering over that point shows that it is associated with a known batch deviation, which may explain the outlier. This excursion might be early evidence that the process change is not yet well controlled, or it may simply be a consequence of the deviation.
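The effect of a calibration set can be sketched with plain pandas: compute the center line and sigma from the calibration rows only, then judge every batch against those limits. The first five values below come from the df.head() output above. The last three are made-up placeholders standing in for batches after the process change, and the sample standard deviation is again an assumed sigma estimate, so this shows the idea rather than how fd.control_chart is implemented.

```python
import pandas as pd

# The first five values are from the df.head() output above. The last three
# are made-up placeholders standing in for batches after the process change.
titer = pd.Series(
    [4.625448, 5.533672, 5.763685, 5.710262, 3.635831, 5.2, 9.5, 4.9],
    name="titer",
)

# Boolean mask selecting the calibration batches (index 0 through 4).
calibration = titer.index < 5

# Limits are derived from the calibration batches only.
center = titer[calibration].mean()
sigma = titer[calibration].std()  # sample standard deviation (an assumption)

# Distance of every batch from the center line, in units of sigma.
z = (titer - center) / sigma
flagged = titer[z.abs() > 3]
print(flagged)
```

Because the limits come only from the batches before the change, a later batch that drifts away from the earlier process behavior stands out instead of widening the limits.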

Tip

In the life sciences, getting access to digital data is often the hard part. To see how to digitize a batch record and extract process data, check out the Batch Record Digitization tutorial.