Monitor PDF for changes
Available in the Flexi and Enterprise plans
How to track a PDF file for changes?
Distill can monitor a PDF document for changes, provided that the document is accessible via a public URL. To use this feature, open the web app at https://monitor.distill.io. The browser extension does not support it.
If the PDF file is linked from a webpage, click on the link to open the PDF and copy its URL. Once you have the URL, follow the steps below to add a PDF monitor:
Access the web app’s Watchlist located at https://monitor.distill.io.
Click on the “Add Monitor” button and select “PDF” from the list.
Enter the URL of the PDF file on the Source page and click on the “Save” button.
An options window will open. On this page, you can configure settings such as the check interval and the actions to take when a change occurs. Once done, save the monitor.
Your PDF monitor is now set up. You can check its contents by clicking on the text preview of the monitor. If a change is found, you will be notified via the actions you set on the Options page.
You can further compare two versions of the PDF monitor by clicking on the “Explore diff” option under the change history.
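Conceptually, comparing two versions of a PDF amounts to a text diff. The following is a minimal sketch of that idea, assuming the text of each version has already been extracted into a string. It uses Python's standard `difflib` module purely as an illustration and is not Distill's implementation. The function name `diff_versions` and the sample strings are hypothetical.

```python
import difflib


def diff_versions(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines between two versions of extracted text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical text extracted from two versions of the same PDF.
old = "Section 1\nOld wording\nSection 2"
new = "Section 1\nNew wording\nSection 2"

for line in diff_versions(old, new):
    print(line)
```

Removed lines are prefixed with `-` and added lines with `+`. Two identical versions produce an empty diff.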
How can I monitor changes to a PDF file if its link on the website changes frequently?
In some cases, you might need to monitor a PDF file that is linked on a webpage where the link itself changes frequently. In such cases, you will need to create the following two monitors:
- Add a PDF monitor using the PDF’s URL.
- Add a webpage monitor for the webpage where the PDF link appears. You will need to monitor the “href” attribute to watch the URL of the PDF for changes.
When the second monitor alerts you to a link change, you will need to manually update the URL of the first monitor (the PDF monitor).
Example: Let’s monitor the PDF file for “Form 1040” on the page https://www.irs.gov/forms-pubs/about-form-1040. Here are the steps that we need to follow:
We need to add a PDF monitor for Form 1040. By clicking on the hyperlink to Form 1040 on the webpage, we will get the PDF’s URL: https://www.irs.gov/pub/irs-pdf/f1040.pdf.
We will need to add a second monitor that watches the URL of the PDF for changes. If the URL changes, we need to update the PDF monitor added in the previous step. Here are the steps to monitor the PDF URL from the source webpage:
- Add a web page monitor for https://www.irs.gov/forms-pubs/about-form-1040
- Select the “Form 1040” link, which points to the PDF file.
- Expand the selection panel to show the selectors and the preview.
- Search for “href” in the attribute or property list and click on it to select it. Once href has been selected, you will see the PDF’s URL in the preview.
- Save the selection, configure the other settings (check interval, actions, etc.) on the Options page, and save the monitor.
A webpage monitor set up with the above steps monitors the PDF’s link as text. To view the monitored text in detail, navigate to the change history. By default, the change history shows the “visual” view of the monitor. The URL of the link is not visible on the rendered page, so it will not show in the visual view. Switch to the “text” view to see it.
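To make it concrete what monitoring the “href” attribute captures, here is a minimal sketch that pulls the PDF link out of a page's HTML and resolves it to an absolute URL. It uses only Python's standard library and is not how Distill works internally. The class and function names and the sample markup are hypothetical; the real page's markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkFinder(HTMLParser):
    """Collect href values of <a> tags whose target ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".pdf"):
            # Resolve relative links against the page URL.
            self.pdf_links.append(urljoin(self.base_url, href))


def find_pdf_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs of all PDF links found in the HTML."""
    finder = PdfLinkFinder(base_url)
    finder.feed(html)
    return finder.pdf_links


# Hypothetical markup resembling a link to a PDF on a webpage.
sample_html = '<p><a href="/pub/irs-pdf/f1040.pdf">Form 1040</a></p>'
page_url = "https://www.irs.gov/forms-pubs/about-form-1040"
print(find_pdf_links(sample_html, page_url))
```

If the resolved URL differs from the one saved in the PDF monitor, the PDF monitor's URL needs to be updated.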
Updating the URL of the PDF monitor
When the link for the PDF changes as monitored in the previous step, you will need to update the URL for your PDF monitor. Here are the steps to do this:
- Go to the Options page of the PDF monitor.
- On the Options page, click on “Edit” and replace the existing URL with the new URL.
PDFs can be generated in different ways, and some of them may not work with Distill. When a PDF does not work, you will see an error with an error code in the check log. The following are the common error codes and ways to troubleshoot them.
- ERR_PDF_PARSE error is usually encountered when Distill is able to download the file but fails to parse it. This usually happens when the file is either not a PDF file or is in a format that is not supported by Distill. You can wait for a few checks to see if Distill is able to download and parse it correctly.
- E_DOWNLOAD means that the PDF file download didn’t complete successfully. This can happen when the download is interrupted. Distill will automatically retry.
- E_PDF_UNKNOWN_TYPE is encountered when Distill tries to download the file but does not receive a PDF. This usually means that either the file is not a PDF file or the website didn’t send one. This can happen for different reasons. For example, the website may block requests or require cookies before allowing the file to be downloaded.
If the error persists after a few checks, please email us at email@example.com.
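When troubleshooting these errors, it can help to check independently whether the URL actually returns a PDF. The sketch below assumes the response body has already been downloaded as bytes and relies on the fact that PDF files begin with the `%PDF-` signature. The function names and the returned messages are hypothetical, and this is not how Distill itself classifies errors.

```python
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(body: bytes) -> bool:
    """Heuristic: does the downloaded content carry a PDF signature?"""
    # Some files have a few stray bytes before the header, so search
    # the first 1024 bytes rather than only the very start.
    return PDF_SIGNATURE in body[:1024]


def diagnose(body: bytes, content_type: str) -> str:
    """Return a short, human-readable guess about a downloaded response."""
    if not body:
        return "empty response: the server sent no file"
    if looks_like_pdf(body):
        return "looks like a PDF"
    if "html" in content_type.lower():
        return "HTML page instead of a PDF (possibly a block or login page)"
    return "not a PDF"


# Hypothetical responses for illustration.
print(diagnose(b"%PDF-1.7\n...", "application/pdf"))
print(diagnose(b"<html>Access denied</html>", "text/html"))
```

A response that turns out to be an HTML page usually points to the blocked-request or cookie scenarios described above.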
You can also follow along with this step-by-step video guide to create a PDF monitor.
For advanced PDF monitoring, including comparing PDFs, tracking changes, and identifying new PDF links, refer to this video.
Because PDF files are large, checking and diffing them consumes more resources. For this reason, PDF checks cost more of the checks in your account: 1 PDF check is counted as 2 checks.
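To estimate how this affects your account, here is a small sketch of the arithmetic. The factor of 2 comes from the rule above. The 30-day month and the 6-hour interval are assumptions chosen for illustration, and the function names are hypothetical.

```python
PDF_CHECK_COST = 2  # from the rule above: 1 PDF check counts as 2 checks


def checks_consumed(pdf_checks: int) -> int:
    """Checks charged to the account for a given number of PDF checks."""
    if pdf_checks < 0:
        raise ValueError("pdf_checks must be non-negative")
    return pdf_checks * PDF_CHECK_COST


def monthly_pdf_cost(interval_hours: float, days: int = 30) -> int:
    """Estimate checks used per month by one PDF monitor (assumes `days` days)."""
    runs = int(days * 24 / interval_hours)
    return checks_consumed(runs)


# One PDF monitor checked every 6 hours over an assumed 30-day month:
# 4 runs per day * 30 days = 120 PDF checks, counted as 240 checks.
print(monthly_pdf_cost(6))
```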