Easily checking for dead links on docs.gethue.com

Published on 17 October 2019 in Administration / Version 4 - 2 minutes read - Last modified on 14 September 2020

docs.gethue.com is continuously getting refreshed content. In addition, a series of broken links (returning a 404) has been fixed. Here is how it was done.

First, we installed muffet, a fast link-checking crawler that is very easy to use:

sudo snap install muffet

Then, after booting the Hugo documentation server, we point muffet at its URL. We also exclude certain URLs to avoid noisy false positives, and pass -f so that URL fragments are ignored:

muffet http://localhost:35741/ --exclude ".*releases.*" -f

And here is the output:

$ muffet http://localhost:35741/ --exclude ".*releases.*" -f
http://localhost:35741/user/
	404 http://localhost:35741/administrator/configuration/editor/#connectors
http://localhost:35741/developer/parsers/
	404 http://localhost:35741/administrator/configuration/editor/#postgresql
http://localhost:35741/administrator/administration/reference/
	dial tcp4 127.0.0.1:5555: connect: connection refused http://localhost:5555/tasks
http://localhost:35741/user/querying/
	404 http://wiki.apache.org/hadoop/Hive/AdminManual/Configuration
http://localhost:35741/administrator/installation/cloud/
	dial tcp4 127.0.0.1:16686: connect: connection refused http://localhost:16686
	lookup prometheus on 127.0.0.53:53: server misbehaving http://prometheus:9090/graph
	lookup prometheus on 127.0.0.53:53: server misbehaving http://prometheus:9090/targets
http://localhost:35741/administrator/configuration/apps/
	404 http://localhost:35741/administrator/configuration/dashboard
	404 http://localhost:35741/administrator/configuration/editor/
	404 http://localhost:35741/developer/editor/
	404 http://localhost:35741/user/editor/
	dialing to the given TCP address timed out https://dev.mysql.com/downloads/connector/j/
http://localhost:35741/user/browsing/
	404 http://localhost:35741/administrator/configuration/external/
http://localhost:35741/developer/sdk/
	404 http://localhost:8000
	404 https://github.com/cloudera/hue/blob/master/desktop/core/src/desktop/static/desktop/js/autocomplete/jison
	404 https://github.com/cloudera/hue/tree/master/desktop/libs/metadata/catalog
http://localhost:35741/administrator/configuration/connectors/
	404 http://localhost:35741/administrator/configuration/external/
	404 http://localhost:35741/user/browsers#adls
	404 http://localhost:35741/user/browsers#s3
	404 http://localhost:35741/user/browsers/
http://localhost:35741/developer/development/
	404 http://docs.python.org/library/hotshot.html
	404 https://en.wikipedia.org/wiki/Hue_(Software
	404 https://twitter.com/gethue!
	lookup developer on 127.0.0.53:53: server misbehaving
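
A report like this mixes broken pages on the docs site itself with failures on external sites and on local services that were simply not running. To get a short list of the internal pages to fix, the 404s under the docs server URL can be filtered out of the saved output. This is a minimal sketch, assuming the output was redirected to a file (muffet.log is a hypothetical name) and that error lines have the form "404 <url>", indented or not. internal_404s is an illustrative helper, not part of muffet:

```shell
#!/usr/bin/env bash
# Sketch: list the docs pages that returned 404, from saved muffet output.
# Assumption: error lines look like "404 <url>", with optional leading whitespace.

# Hypothetical helper: reads muffet output on stdin and prints the unique
# URLs that returned 404 and start with the given base URL.
internal_404s() {
  local base="${1:-http://localhost:35741/}"
  # Field 1 is the status, field 2 the URL; keep only URLs under the base.
  awk -v base="$base" '$1 == "404" && index($2, base) == 1 { print $2 }' | sort -u
}

# Usage:
#   muffet http://localhost:35741/ --exclude ".*releases.*" -f > muffet.log
#   internal_404s < muffet.log
```

Duplicates are collapsed because the same broken target is often linked from several pages.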

Et voilà! After a few search-and-replace passes over the documentation content, we have a cleaner experience. The next step is to add the link check to the Continuous Integration, to fully automate the process and save developer time.
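
A CI step for this could be a small wrapper that runs the same command and fails the job when the checker reports problems. The following is only a sketch under assumptions, not the Hue project's actual CI configuration: it assumes the docs server is already running, that the checker exits with a non-zero status when it finds broken links, and DOCS_URL, LINK_CHECKER and check_links are names chosen for illustration:

```shell
#!/usr/bin/env bash
# Sketch of a CI step: fail the build when the link checker reports problems.
# Assumptions: the docs server is already serving DOCS_URL, and the checker
# exits with a non-zero status when it finds broken links.

DOCS_URL="${DOCS_URL:-http://localhost:35741/}"
LINK_CHECKER="${LINK_CHECKER:-muffet}"   # overridable, e.g. for a dry run

check_links() {
  # Same arguments as the manual run shown earlier
  if "$LINK_CHECKER" "$DOCS_URL" --exclude ".*releases.*" -f; then
    echo "Link check passed"
  else
    echo "Link check failed: fix the URLs listed above" >&2
    return 1
  fi
}

# In the CI job, once the server is up:
#   check_links || exit 1
```

Keeping the checker name in a variable makes it possible to try the wrapper with a stand-in command before wiring it into the real pipeline.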

Any feedback or questions? Feel free to comment here, on the Forum, or @gethue, and quick start SQL querying!

 

Romain from the Hue Team

