Since Zap Technology Solutions (now Argil DX) released its free Security Scanner, a lot of folks have been scanning their websites. However, a few scans reported that sites were more vulnerable than they actually were. Something wrong with the security scanner? No. It is due to a simple misconfiguration of the website’s AEM/CQ5-based “Page Not Found” response. You see, the standard HTTP response code for a successfully served page is 200 (OK), while a “Page Not Found” page should typically have a response code of 404 (Not Found). On the misconfigured sites, the “Page Not Found” page was being served with a 200 instead.
Regardless of the response code, your customers will see the same content. However, what happens behind the scenes could hurt your site’s performance and may even endanger your site. If you’re using Adobe Dispatcher to cache your AEM/CQ5-served content, Dispatcher only caches content that has a 200 response code. If your content responds with a 404 (or another error code), Dispatcher doesn’t cache it.
Imagine a scenario in which a user mistypes a page name (hmoe.html rather than home.html) in the URL for your site. They receive a “Page Not Found” response, and that page responds with a 200 code. Dispatcher will now cache that page (hmoe.html) on your web server. Since the page doesn’t really exist on your site, there will never be a request to clear it from your Dispatcher cache. It will sit there until it is manually deleted or until its parent page is deactivated. It’s probably not a very big page… Maybe 20 KB. However, imagine that 500 users per month make similar typos on URLs. That would be 500 errant pages in your Dispatcher cache per month (assuming none of the typos have already been cached). These pages will likely stay on your web servers’ drives for a very long time. Eventually, they could cause your drives to fill up, which at the very least would prevent new content from being cached and at worst could cause your web server to stop serving content.
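To make the mechanism concrete, here is a toy Python model of the scenario above. It is only a sketch: `ToyDispatcherCache`, `render`, and `simulate` are made-up names for illustration, not part of Dispatcher or AEM, and the model covers only the "cache it if the status is 200" rule described in this post. The 500 typos and the 20 KB page size are the figures from the example.

```python
from dataclasses import dataclass, field

# Hypothetical set of pages that really exist on the site.
REAL_PAGES = {"/content/home.html"}

# Size of the "Page Not Found" page, using the 20 KB guess from the text.
ERROR_PAGE_BYTES = 20 * 1024


@dataclass
class ToyDispatcherCache:
    """Toy stand-in for a cache that stores a response only if its status is 200."""
    files: dict = field(default_factory=dict)  # path -> size in bytes

    def handle(self, path, status, body):
        if status == 200:
            self.files[path] = len(body)
        return status

    def total_bytes(self):
        return sum(self.files.values())


def render(path, soft_404):
    """Stand-in for the publish instance: returns (status, body) for a path."""
    if path in REAL_PAGES:
        return 200, b"<html>home</html>"
    # A misconfigured site serves its error page with 200 instead of 404.
    status = 200 if soft_404 else 404
    return status, b"x" * ERROR_PAGE_BYTES


def simulate(paths, soft_404):
    """Send each path through the toy cache and return the cache afterwards."""
    cache = ToyDispatcherCache()
    for path in paths:
        status, body = render(path, soft_404)
        cache.handle(path, status, body)
    return cache


if __name__ == "__main__":
    typos = [f"/content/typo{i}.html" for i in range(500)]
    for soft_404 in (True, False):
        cache = simulate(typos, soft_404)
        print(soft_404, len(cache.files), cache.total_bytes())
```

With the error page served as 200, all 500 mistyped URLs end up in the toy cache (about 10 MB at 20 KB each). With a proper 404, none of them do.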
You might say that it’s unlikely enough “Page Not Found” pages would be served to cause your drives to fill up. You might be right. However, imagine another, more nefarious scenario. A person (let’s call them a hacker) could become aware that your site’s “Page Not Found” page responds with a 200 code, which could cause it to be cached (not only by your systems but also by your CDN, if your site uses one). It wouldn’t take a very sophisticated hacker to repeatedly request random page names from your site, each of which would be cached and help fill up your web servers’ drives.
Here’s how to check how your “Page Not Found” page responds… Determine the URL of a page that you know is served by AEM/CQ5. Let’s use the example http://www.abcd.com/content/home.html. Change the page name (home) to a name that you know doesn’t exist on your site (example: blah), so your new URL will be http://www.abcd.com/content/blah.html. Browse to /response-check.html, enter your new URL into the box provided, and click “Check Response.” It should provide you with the response code that the URL returns. If it’s 200 (and you know the page should not exist), it will likely take a development effort to get AEM/CQ5 to respond with the appropriate 404 response code.
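If you would rather run the same check from a script, something like the following Python sketch should work. It uses only the standard library. The helper names (`make_probe_url`, `fetch_status`, `looks_like_soft_404`) are made up for this example, and www.abcd.com is just the placeholder site from above. Note that `urlopen` follows redirects, so a redirect to an error page that itself returns 200 will be reported as 200.

```python
import posixpath
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def make_probe_url(known_url, bogus_name="blah"):
    """Swap the page name in known_url for one that should not exist."""
    parts = urlsplit(known_url)
    directory, page = posixpath.split(parts.path)
    _, extension = posixpath.splitext(page)
    new_path = posixpath.join(directory, bogus_name + extension)
    # Drop any query string or fragment so only the page name changes.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


def fetch_status(url, timeout=10):
    """Return the HTTP status code the server sends for a GET of url."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is what we want.
        return err.code


def looks_like_soft_404(status):
    """True if a page that should not exist came back with a success code."""
    return 200 <= status < 300


# Example usage (makes a real network request, so it is left commented out):
# probe = make_probe_url("http://www.abcd.com/content/home.html")
# status = fetch_status(probe)
# print(probe, status, "misconfigured" if looks_like_soft_404(status) else "ok")
```

Pick a bogus page name that you are sure does not exist on your site; if the script reports 200 for it, your “Page Not Found” response needs attention.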
This is just one of the many hidden issues that might exist in your AEM/CQ5 implementation without being readily apparent to your marketing team or to your customers. Nevertheless, if this type of issue were to be exploited, you can be sure that it would affect your customers and that marketing would take note. Take a few minutes to test your “Page Not Found” response and then sign up for the security scanner to scan your AEM/CQ5-based website for other security issues.