Jekyll linkchecker

2019, Sep 24    

Below are notes on stuff I find relevant and keep losing.

I use Jekyll for most pages, so adding a link checker makes sense. Inspiration from here

In Gemfile, add

gem 'html-proofer'

In .gitlab-ci.yml, I then have a test step looking something like this

test:
  stage: test
  before_script:
  - bundle install
  script:
  - bundle exec jekyll build -d ./test/$BASE_URL
  - bundle exec htmlproofer --assume-extension --url-ignore '#' --log-level :debug ./test
  artifacts:
    paths:
    - test
    when: on_failure

--assume-extension is needed because Jekyll links to somefile instead of somefile.html. See e.g. Stack Overflow
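To make the effect of the flag concrete, here is a minimal sketch of the idea in Ruby. It is not html-proofer's actual code; `resolve_link` is a hypothetical helper that only illustrates how an extensionless link can be matched against a file with .html appended.

```ruby
require "tmpdir"

# Hypothetical helper, NOT html-proofer's implementation.
# Returns the file a link points to, or nil if nothing matches.
def resolve_link(site_dir, href, assume_extension: false)
  path = File.join(site_dir, href)
  return path if File.file?(path)
  # With the flag on, also try the same path with .html appended.
  return "#{path}.html" if assume_extension && File.file?("#{path}.html")
  nil
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "somefile.html"), "<p>hello</p>")
  p resolve_link(dir, "somefile")                          # no match without the flag
  p resolve_link(dir, "somefile", assume_extension: true)  # finds somefile.html
end
```

Without the flag, a link to somefile is reported as broken even though somefile.html exists in the build directory.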

Note that the build directory (i.e. ./test/$BASE_URL above) should be changed depending on your Jekyll baseurl. If you want to copy the files, the actual site will reside in ./test/$BASE_URL.

The way GitLab has implemented Pages, linking to a non-existent page on an existing repo will not yield an error with the link checker. It will successfully fetch a login page instead. This is bad, and I currently have no workaround.

I get errors like

Received a 0 for http://about/  in ./test/about/index.html

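In the HTML, I have <link rel="canonical" href="//about/">. The error is that I have an extra / in the URL. A URL starting with // is protocol-relative, so about is treated as the host name, which is why the checker tries to fetch http://about/.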

From the jekyll config file _config.yml,

baseurl: ""
url: "/"

The value url includes a trailing /. Removing it (i.e. setting url: "") solves it.
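A small Ruby sketch of why the trailing / matters, assuming the theme builds the canonical link by plain concatenation of url, baseurl and the page URL (I have not checked the template, so `canonical_href` is a hypothetical stand-in for it):

```ruby
require "uri"

# Hypothetical stand-in for a template that concatenates
# site.url, site.baseurl and page.url.
def canonical_href(url, baseurl, page_url)
  "#{url}#{baseurl}#{page_url}"
end

broken = canonical_href("/", "", "/about/")
fixed  = canonical_href("", "", "/about/")

puts broken                        # //about/
puts fixed                         # /about/

# A leading // makes the first path segment the host.
p URI.parse(broken).host           # "about"
p URI.parse(fixed).host            # nil
```

With url: "" the result is an ordinary absolute path, which the link checker resolves against the build directory instead of the network.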

Encoding errors

Yes, Ruby 1.9 introduced some changes to how string encodings are handled.

To set the encoding explicitly, add the following to the Gemfile

# Work around issue with invalid byte sequence in US-ASCII
# From https://gitlab.henriksen.is/espen/website/commit/e92948c585b88b6e8a5462d908319d3bd2c379b0
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
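To illustrate what goes wrong, here is a minimal sketch that tags the same bytes with two different encodings. The sample string is made up, and I am assuming the CI image falls back to US-ASCII because no UTF-8 locale is set.

```ruby
# The UTF-8 bytes for "cafe" with an accented e, as raw binary data.
raw = "caf\xC3\xA9".b

# The same bytes, labelled with two different encodings.
as_ascii = raw.dup.force_encoding(Encoding::US_ASCII)
as_utf8  = raw.dup.force_encoding(Encoding::UTF_8)

puts as_ascii.valid_encoding?  # false: bytes above 127 are invalid in US-ASCII
puts as_utf8.valid_encoding?   # true
puts as_utf8.length            # 4 characters, stored in 5 bytes
```

Setting Encoding.default_external makes Ruby label data it reads from files as UTF-8, so content with non-ASCII characters ends up in the valid case above.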