Monitor PDF for changes

Available in the Flexi and Enterprise plans

How to track a PDF file for changes?

Distill can monitor a PDF document for changes, provided that the document is accessible via a public URL. To use this feature, use the web app at https://monitor.distill.io. The browser extension does not support it.

If the PDF is linked from a webpage, click the link to open the PDF and get its URL. Once you have the URL, follow the steps below to add a PDF monitor:

  1. Access the web app’s Watchlist located at https://monitor.distill.io.

  2. Click on the “Add Monitor” button and select “PDF” from the list.

  3. Enter the URL of the PDF file on the Source page and click on the “Save” button.

  4. The Options page will open. Here you can configure settings such as the check interval and the actions to take when a change occurs. Save once you are done.

Your PDF monitor is now set up. You can check its contents by clicking on the text preview of the monitor. If a change is found, you will be notified via the actions you set on the Options page.

You can further compare two versions of the PDF monitor by clicking on the “Explore diff” option under the change history.

In some cases, the PDF you want to monitor is linked from a webpage and the link itself changes frequently. In such cases, you will need to create the following two monitors:

  1. Add a PDF monitor using the PDF’s URL.
  2. Add a webpage monitor for the page that contains the PDF link, and monitor the link’s “href” attribute to watch the PDF’s URL for changes.

When the second monitor alerts you that the link has changed, you will need to manually update the URL of the first monitor (the PDF monitor).
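
To make the idea behind the second monitor concrete, here is a minimal sketch of reading the “href” attribute of PDF links from a page’s HTML, using only Python’s standard library. It is an illustration, not how Distill works internally; the names `PdfLinkFinder` and `find_pdf_links` and the sample HTML in the usage note are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkFinder(HTMLParser):
    """Collects the absolute URL of every <a> tag whose href ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when testing the file extension.
        if href and href.split("?")[0].lower().endswith(".pdf"):
            # Resolve relative links against the page URL.
            self.pdf_links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return the PDF links found in `html`, in document order."""
    finder = PdfLinkFinder(base_url)
    finder.feed(html)
    return finder.pdf_links
```

For example, calling `find_pdf_links('<a href="/pub/irs-pdf/f1040.pdf">Form 1040</a>', "https://www.irs.gov/forms-pubs/about-form-1040")` yields the absolute URL of the PDF. If that value differs from the one seen on the previous check, the PDF monitor’s URL needs updating.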

Example: Let’s monitor the PDF file for “Form 1040” on the page https://www.irs.gov/forms-pubs/about-form-1040. Here are the steps that we need to follow:

  1. Add a PDF monitor for Form 1040. Clicking the Form 1040 hyperlink on the webpage gives the PDF’s URL: https://www.irs.gov/pub/irs-pdf/f1040.pdf.

  2. Add a monitor that watches the URL of the PDF for changes. If the URL changes, update the PDF monitor added in the previous step. Here are the steps to monitor the PDF’s URL from the source webpage:

    • Add a web page monitor for https://www.irs.gov/forms-pubs/about-form-1040
    • Select “Form 1040”, the element that carries the hyperlink to the PDF file.
    • Expand the selection panel to show the selectors and the preview.
    • Search for “href” in the attribute or property list and click it to select it. Once href is selected, the preview shows the PDF’s URL.
    • Save the selection, then configure the other settings (check interval, actions, etc.) on the Options page and save.

    The webpage monitor created with the steps above monitors the PDF’s link as text. To view the monitored text in detail, navigate to the change history. By default, the change history shows the “visual” view of the monitor. The link URL is not visibly rendered on the page, so it does not appear in the visual view. Switch to the “text” view to see it.

Updating the URL of the PDF monitor

When the link for the PDF changes as monitored in the previous step, you will need to update the URL for your PDF monitor. Here are the steps to do this:

  1. Go to the Options page of the PDF monitor.
  2. Click on “Edit” on the Options page and replace the existing URL with the new URL.

FAQ

1. Can I monitor parts of the PDF document?

No, PDF monitors are full-page monitors by default.

2. Can I monitor PDFs on my local device?

No, PDF monitors run only on Distill’s cloud servers and are available in the Flexi and Enterprise plans.

3. Are there any file size limits for PDFs that can be monitored?

There is no size limit for a PDF. However, if a large PDF takes too long to load, the monitor might display a parsing error because it couldn’t load the PDF successfully.

4. Is it possible to monitor password-protected PDF files with Distill?

No, Distill PDF monitors cannot work on password-protected PDFs.

5. What happens when a PDF I’m monitoring gets deleted?

If the PDF has been deleted or removed, the monitor will show you an error.

6. How do I compare two versions of the same PDF file?

If you want to compare two versions of the same PDF file that are now hosted on different URLs, you’ll need to update the URL of the existing monitor in Distill. For example, the URL of the PDF might have changed from www.somewebsite.com/content/version1doc.pdf to www.somewebsite.com/content/version2doc.pdf because the contents of the two versions differ.

To do this, click the down caret icon on the respective PDF monitor, select “Edit Options”, edit the source on the Options page by replacing the old URL with the new one, and save.

After the check, the PDF monitor will display the change history between the two versions.
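
To get a feel for what a comparison between two versions contains, here is a minimal sketch that diffs two pieces of text with Python’s standard `difflib` module. It is only an illustration and not Distill’s diff. The function name `text_diff` and the version labels are made up, and the sketch assumes the text has already been extracted from each PDF, which is out of scope here.

```python
import difflib


def text_diff(old_text, new_text):
    """Return unified-diff lines describing how new_text differs from old_text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="version1",  # illustrative labels only
            tofile="version2",
            lineterm="",
        )
    )
```

Lines prefixed with `-` appear only in the old version and lines prefixed with `+` appear only in the new one. Two identical texts produce an empty list.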

Troubleshooting

PDFs can be generated in different ways, and some of them may not work with Distill. When a PDF does not work, you will see an error with an error code in the check log. The common error codes and ways to troubleshoot them are listed below.

  1. ERR_PDF_PARSE is usually encountered when Distill is able to download the file but fails to parse it. This usually happens when the file is either not a PDF file or is in a format that Distill does not support. You can wait for a few checks to see if Distill is able to download and parse it correctly.
  2. E_DOWNLOAD means that the PDF file download didn’t complete successfully, for example because the download was interrupted. Distill will automatically retry.
  3. E_PDF_UNKNOWN_TYPE is encountered when Distill tries to download the file but the website doesn’t send a PDF. This usually means that either the file is not a PDF file or the website didn’t send a file at all. This can happen for different reasons. For example, the website can choose to block requests or require cookies before letting one download the file.

If the error persists after a few checks, please email us at support@distill.io.
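
Before emailing support, it can help to confirm that the URL really serves a PDF. The sketch below is one rough way to check this from your own machine; it is not how Distill classifies errors. It relies on the fact that PDF files begin with a `%PDF-` header, and it leniently assumes the header appears somewhere in the first 1024 bytes. The function names `fetch_head` and `looks_like_pdf` are illustrative.

```python
import urllib.request


def fetch_head(url, size=1024, timeout=30):
    """Download only the first `size` bytes of the resource at `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read(size)


def looks_like_pdf(data):
    """Return True if the bytes contain the PDF header near the start."""
    # Assumption: tolerate a few leading bytes before the "%PDF-" header.
    return b"%PDF-" in data[:1024]
```

A call such as `looks_like_pdf(fetch_head(url))` returning False suggests the site is sending something other than a PDF, for example an HTML login or error page.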

You can also follow along with this step-by-step video guide to create a PDF monitor.

For advanced PDF monitoring, such as comparing PDFs, tracking changes, and identifying new PDF links, refer to this video.

PDF files are large, so checking and diffing them consumes more resources. The cost of a PDF monitor is accounted for as checks in the account: 1 PDF check is counted as 2 checks.
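
As a quick worked example of this accounting, the sketch below estimates how many checks a set of monitors consumes. The factor of 2 comes from the rule above. The function name `checks_consumed` is made up, and the sketch assumes that every non-PDF check counts as exactly 1 check, which this article does not state.

```python
PDF_CHECK_WEIGHT = 2  # from the article: 1 PDF check is counted as 2 checks


def checks_consumed(pdf_checks, other_checks=0):
    """Estimate the total checks used by PDF and non-PDF monitors."""
    # Assumption: each non-PDF check counts as exactly 1 check.
    return pdf_checks * PDF_CHECK_WEIGHT + other_checks
```

For instance, a PDF monitor that runs once a day for 30 days performs 30 PDF checks, so `checks_consumed(30)` gives 60.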
