# Digitize a Batch Record
Critical process data shouldn’t stay trapped in PDFs. But PDFs are messy (like the documents below). They have a mix of handwriting and text. They can be skewed and blurry.

Digitizing batch records lets you start analyzing the data from a batch record PDF in less than five minutes.

In [None]:
# If you do not already have fathomdata installed in your environment, uncomment and run the next line (keep the exclamation mark)
#!pip install fathomdata

In [None]:
import fathomdata as fd

# Add Your API Key

You'll need a temporary API key to use this tutorial. If you don't have one already, head [here](https://docs.fathom.one/batch-record-digitization.html#get-an-api-key). Then, set your API key using the code below.


In [None]:
fd.set_api_key('your-api-key-goes-here')

### Tip 

We prefer to keep our API key in an environment variable to keep it safe and to prevent us from accidentally checking it into source control. Here's what your code might look like:

    import os
    os.environ["FATHOM_API_KEY"] = "your-api-key-here"

You can also add the environment variable to your virtual environment. Edit the activate file of your virtual environment (`venv/bin/activate`) and add this line to the end:

    export FATHOM_API_KEY=your-api-key-here

Then, when you need to set the API key within your code, you can do something like this:

    import os
    fd.set_api_key(os.environ["FATHOM_API_KEY"])
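
If the variable is missing, `os.environ["FATHOM_API_KEY"]` raises a bare `KeyError`. A minimal sketch of a friendlier lookup follows; `load_api_key` is a hypothetical helper written for this tutorial, not part of fathomdata.

```python
import os


def load_api_key(var_name: str = "FATHOM_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper for this tutorial; it is not part of fathomdata.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or virtual "
            "environment before running this notebook."
        )
    return key


# Usage (assumes `import fathomdata as fd` ran earlier in the notebook):
# fd.set_api_key(load_api_key())
```

Failing early with a clear message makes a missing key easier to spot than an authentication error later in the notebook.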

# Get a Sample Batch Record

To download a sample batch record, [click here](https://github.com/fathom-data/fathom-sample-data/raw/main/batch3.pdf), or use the code below to download it programmatically (this is the preferred option if you are running this on Colab).

You can view the batch record source at this link: [See PDF on GitHub](https://github.com/fathom-data/fathom-sample-data/blob/main/batch3.pdf)

In [None]:
with open("batch3.pdf", "wb") as f:
    pdf = fd.get_sample_batch_record("batch3")
    f.write(pdf)

Take a moment to look at the example batch record PDF. If you didn’t change the path above, the PDF will be saved in your current working directory.

This batch record contains many different types of data from raw material sources to process metrics. There is a mix of handwritten and typed text and the formatting varies throughout the record. What a mess! 

*Try to find the text in the raw PDF bytes above. What happens?*
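
One way to run that experiment is a plain byte search, sketched below. `find_text` is a hypothetical helper and the search phrase is arbitrary. Scanned pages are typically stored as compressed images, so the search will likely come back empty even for words you can read on the page.

```python
from pathlib import Path


def find_text(pdf_bytes: bytes, phrase: str) -> int:
    """Return the byte offset of `phrase` in the raw PDF bytes, or -1 if absent."""
    return pdf_bytes.find(phrase.encode("latin-1"))


pdf_path = Path("batch3.pdf")  # path used in the download cell above
if pdf_path.exists():
    raw = pdf_path.read_bytes()
    print(f"{len(raw)} bytes, starts with {raw[:8]!r}")
    # "Batch" is an arbitrary search phrase chosen for illustration.
    print("offset of 'Batch':", find_text(raw, "Batch"))
```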

For this tutorial, we’ll focus on extracting and cleaning any type of data stored in a table (but this is just the start!).

# Ingest the Batch Record


In [None]:
new_doc_id = fd.ingest_document("batch3.pdf") #update path to download location if you changed it
print(f"Ingested document with ID {new_doc_id}")

That’s it! Check that the ingest was successful by listing the available records. This might take a moment.

In [None]:
df = fd.available_documents()
df.head()

If the df syntax looks familiar, that’s because fathomdata is built on top of [pandas](https://pandas.pydata.org/). You can interact with this record dataframe using all the pandas [slicing and indexing tools](https://pandas.pydata.org/docs/user_guide/indexing.html).
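
As a rough illustration of those tools, the sketch below uses a stand-in dataframe. The column names and values are hypothetical placeholders; the columns actually returned by `fd.available_documents()` may differ.

```python
import pandas as pd

# Stand-in for the dataframe returned by fd.available_documents().
# Column names and values are hypothetical placeholders.
records = pd.DataFrame(
    {
        "DocumentId": ["doc-001", "doc-002", "doc-003"],
        "Filename": ["batch3.pdf", "batch4.pdf", "batch5.pdf"],
    }
)

first_row = records.iloc[0]  # positional row access
filenames = records["Filename"]  # select a single column
batch4 = records.loc[records["Filename"] == "batch4.pdf"]  # boolean mask

print(batch4)
```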


# Digitize a second sample batch record

Download [batch4.pdf](https://github.com/fathom-data/fathom-sample-data/raw/main/batch4.pdf) or reuse the code above. 

In each case, replace `batch3` with `batch4`. Make sure you get them all: there are three instances to replace. Then run the code a second time. When you are successful, both records will be listed in the available documents dataframe.

In [None]:
fd.available_documents()

# Use the Digitized Data

The extracted data is also returned in a pandas dataframe so it’s quickly available for custom analysis.

In [None]:
doc = fd.get_document(new_doc_id)
materials = doc.get_materials_df()
materials.head()
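
Extracted values often need light cleaning before analysis. The sketch below shows one possible approach on a stand-in dataframe. The `Material` and `Quantity` columns and their values are hypothetical, not the actual output of `doc.get_materials_df()`.

```python
import pandas as pd

# Stand-in for doc.get_materials_df(); columns and values are hypothetical.
materials_demo = pd.DataFrame(
    {
        "Material": ["Buffer A", "Buffer B", "Resin"],
        "Quantity": ["12.5", "7", "n/a"],
    }
)

# Convert text to numbers; entries that cannot be parsed become NaN.
materials_demo["Quantity"] = pd.to_numeric(
    materials_demo["Quantity"], errors="coerce"
)

# Keep rows with a usable quantity and summarize them.
valid = materials_demo.dropna(subset=["Quantity"])
print(valid["Quantity"].describe())
```

Using `errors="coerce"` keeps unreadable entries visible as `NaN` instead of stopping the conversion, so you can decide how to handle them.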

Next you can try some [statistical process control analytics](https://docs.fathom.one/process-validation.html) using this data.