Page capture hooks

The SEO4Ajax crawler is designed to detect automatically when a page is ready to be captured by monitoring HTTP requests and responses. However, you can reduce the time the crawler takes to capture pages by using the hooks described on this page.

The onCaptureReady callback

The SEO4Ajax crawler injects a JavaScript function onCaptureReady in all pages before capturing them. To optimize the capture time, you can call the onCaptureReady function from your page as soon as the page is ready to be captured.

The onCaptureReady function allows:

indicating the page is ready to be captured
managing redirects
managing error pages

If you only need to indicate to the crawler the page is ready to be captured, just call the onCaptureReady function without any parameters as soon as the Ajax content has been inserted into the DOM.

Here is an example using jQuery:

$.ajax({
  url: "http://example.com/api/content"
}).done(function(data) {
  // insert your JSON data in the DOM
  if (window.onCaptureReady) {
    window.onCaptureReady();
  }
});

If you need to indicate to the crawler that the page currently being captured is invalid (e.g. the Ajax content cannot be retrieved or a server error occurred), you can pass an HTTP status code as the first parameter when calling the onCaptureReady function. The SEO4Ajax API will return this code as the status code when the page URL is requested.

Here is an example using jQuery:

$.ajax({
  url: "http://example.com/api/invalid-content"
}).done(function(data) {
  // ...
}).fail(function() {
  // An error occurred, the API should return this page with a 503 HTTP response
  if (window.onCaptureReady) {
    window.onCaptureReady(503);
  }
});

If you need to indicate to the crawler that the page being captured is a redirect, pass the redirect status code as the first parameter and the URL of the redirect as the second parameter. The SEO4Ajax API will return this URL in the HTTP Location header.

Here is an example using jQuery:

$.ajax({
  url: "http://example.com/api/content"
}).done(function(data) {
  if (window.onCaptureReady) {
    // We prefer that people use the "http://example.com/" URL to access this page
    if (data.redirection) {
      window.onCaptureReady(302, "http://example.com/");
    }
  }
});
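The three call forms above (no parameters, a status code, a status code plus a redirect URL) can be gathered behind one small wrapper. The sketch below is only an illustration: notifyCrawler and its root parameter are hypothetical names that are not part of SEO4Ajax, and root exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper, not part of the SEO4Ajax API.
// Calls the injected onCaptureReady function if it is present and
// returns true when the crawler was notified, false otherwise.
function notifyCrawler(status, redirectUrl, root) {
  // Assumption: in a browser the hook lives on window; `root` allows tests to pass a stand-in.
  var target = root || (typeof window !== "undefined" ? window : globalThis);
  if (typeof target.onCaptureReady !== "function") {
    // Regular visitor: the crawler has not injected the function.
    return false;
  }
  if (redirectUrl !== undefined) {
    target.onCaptureReady(status, redirectUrl); // redirect
  } else if (status !== undefined) {
    target.onCaptureReady(status); // error page
  } else {
    target.onCaptureReady(); // page is ready
  }
  return true;
}
```

With such a wrapper, the examples above would reduce to notifyCrawler(), notifyCrawler(503) and notifyCrawler(302, "http://example.com/").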

The willCallOnCaptureReady flag

The global variable willCallOnCaptureReady indicates whether the crawler must wait for the onCaptureReady() callback to be called before capturing the page. Its default value is false.

When it is set to true, the onCaptureReady() function call is mandatory. If the function is never called, the capture will not be saved. After 3 retries, the path of the page will appear in the errors view of the console.

If willCallOnCaptureReady is set to false and the onCaptureReady() function call is omitted, the capture will be triggered once the network is idle.

This value should be set as soon as possible. The safest way to set it is to insert the inline script <script>window.willCallOnCaptureReady = true;</script> in the <head> of your HTML page.
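Because the call becomes mandatory once the flag is true, every code path, including the failure path, should end with a call to onCaptureReady. The following is a minimal sketch of that idea. It assumes the flag has already been set by the inline script in the <head>; captureAfterLoad, loadContent and root are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch: make sure onCaptureReady is called on success AND on failure.
// Assumption: window.willCallOnCaptureReady = true was already set in the <head>.
// `loadContent(onSuccess, onFailure)` stands in for your own Ajax loader.
// `root` stands in for window so the sketch can run outside a browser.
function captureAfterLoad(loadContent, root) {
  function ready(status) {
    if (typeof root.onCaptureReady !== "function") {
      return; // regular visitor, nothing to notify
    }
    if (status === undefined) {
      root.onCaptureReady();
    } else {
      root.onCaptureReady(status);
    }
  }

  loadContent(
    function onSuccess(data) {
      // insert your data in the DOM here, then notify the crawler
      ready();
    },
    function onFailure() {
      // still notify the crawler, otherwise the capture would never be saved
      ready(503);
    }
  );
}
```

In a page, this would be called as captureAfterLoad(myLoader, window), where myLoader is whatever function performs the Ajax request.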

Special paths

An alternative to onCaptureReady is to use special paths to indicate to the crawler that a page does not exist.

/error, /#error, /#!error, /#!/error indicate that the page was not found (HTTP 404)
/4xx, /#!4xx, /#!/4xx allow customizing the HTTP error code (i.e. 400 < HTTP code < 500)

Here is an example of code using jQuery:

$.ajax({
  url: "http://example.com/api/content"
}).done(function(data) {
  // ...
}).fail(function(jqXHR, textStatus, errorThrown) {
  if (jqXHR.status === 404) {
    // Switch to the special /404 path so the crawler treats the page as not found
    history.replaceState(null, null, "/404");
  }
});
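The /4xx form can be generalized to other client error codes. The sketch below maps a status code to the matching special path, using the range quoted above (400 < HTTP code < 500). specialPathFor is a hypothetical helper, not something provided by SEO4Ajax.

```javascript
// Hypothetical helper: returns the special path for an HTTP error status,
// or null when the status is outside the range 400 < code < 500.
function specialPathFor(status) {
  if (Number.isInteger(status) && status > 400 && status < 500) {
    return "/" + status;
  }
  return null;
}

// In the browser, the result would be passed to history.replaceState, e.g.:
//   var path = specialPathFor(jqXHR.status);
//   if (path) { history.replaceState(null, null, path); }
```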

Special titles

You can also use special values for the document.title property to indicate to the crawler that a page does not exist.

404, error, 404 error, error 404, 404 not found, page not found indicate that the page was not found (HTTP 404)

Here is an example of code using jQuery:

$.ajax({
  url: "http://example.com/api/content"
}).done(function(data) {
  // ...
}).fail(function(jqXHR, textStatus, errorThrown) {
  if (jqXHR.status === 404) {
    // Use a special title so the crawler treats the page as not found
    document.title = "error";
  }
});

The data-s4a-remove-element attribute

You can remove any element from the document before the capture of the page. In order to do this, just add the special attribute data-s4a-remove-element to the element you want to remove from the document.

Here is an example of code using jQuery:

$(".cookies-banner").attr("data-s4a-remove-element", "");

The data-s4a-keep-script attribute

By default, the SEO4Ajax crawler removes all script tags from the captured document. You can alter this behavior and force the crawler to keep any script tags in the document. In order to do this, just add the special attribute data-s4a-keep-script to the script tag you want to keep in the document.
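As an illustration, a script added at runtime could be marked as follows. This is a sketch under assumptions: addKeptScript and the doc parameter are hypothetical names, the empty attribute value mirrors the data-s4a-remove-element example above, and whether the crawler treats dynamically inserted scripts the same way as scripts present in the markup is not confirmed here.

```javascript
// Hypothetical helper: creates a script element flagged with data-s4a-keep-script
// and appends it to the <head> of the given document.
// `doc` is normally window.document; it is a parameter so the sketch can be tested.
function addKeptScript(doc, src) {
  var script = doc.createElement("script");
  script.src = src;
  // Assumption: an empty value is enough, as with data-s4a-remove-element.
  script.setAttribute("data-s4a-keep-script", "");
  doc.head.appendChild(script);
  return script;
}
```

In a page, this would be called as addKeptScript(document, "/js/widget.js"), where the path is a placeholder. For scripts already present in the HTML, the attribute can simply be written on the script tag itself.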