In a recent project, we had the opportunity to connect a statically generated site, built with Nuxt.js, to the headless content management system Livingdocs for the Swiss government's website ch.ch.
Because the CMS contains a large amount of content (roughly 1,100 pages), the API server was brought to its knees by the heavy load of requests during build time.
Let me tell you how we were able to solve this problem.
Fetching the same data over and over
Due to the nature of Nuxt.js and Livingdocs, we need to fetch data from the API all across the code base:
We pull the publications and generate the Nuxt pages from them (see the sketch below).
We crawl the menus and build pages from there, generating them with the correct URL.
Then there are various cases of pages having to resolve other content: internal links, FAQ content, references to the page in another language, etc.
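To give an idea of what this looks like, the route generation from publications might be sketched roughly like this in nuxt.config.js (the endpoint path and field names are placeholders, not the actual Livingdocs API):
// nuxt.config.js (sketch): endpoint and field names are placeholders
const axios = require('axios')

module.exports = {
  generate: {
    async routes() {
      // fetch all publications from the CMS (placeholder endpoint)
      const { data } = await axios.get(`${process.env.API_URL}/publications`, {
        headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
      })
      // one route per publication, e.g. based on a slug-like field
      return data.map((publication) => `/${publication.slug}`)
    },
  },
}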
All of this results in hitting the API repeatedly for the same data. With 1,100 pages and several more data points on top, this is not good. Apart from exhausting the server, it also feels like terrible engineering.
Storing the data locally
An obvious solution would be to store the fetched data and share it across all the various API calls. We failed at implementing this: Nuxt apparently doesn't generate the content in one single, continuous thread of execution, which leaves us no way to share data between API calls.
Instead, we needed an external solution: A proxy cache to run alongside the Nuxt process.
Since we couldn't find any software that fit our needs or wasn't too complicated to use, we decided to code it ourselves!
The idea is simple: we route the requests to the API through a locally running service. The service fetches the data from the API and stores it locally. Every subsequent request to the same address will then be served from the store, thus avoiding multiple, useless requests for the same data.
Since we're already in JavaScript land, we use Node and Express to create a simple cache service:
const express = require('express')

const port = 1337
const app = express()
const cache = {} // The memory is enough to hold all the data

app.get('/*', async (req, res) => {
  // `id` identifies the request in the cache; we derive it from the URL below
  if (cache[id]) {
    // return the cached data
  } else {
    // fetch from the API and store it in the cache
  }
})

const server = app.listen(port, () => {
  console.log('Listening...')
})
We need to identify the records in the cache somehow. Since the API returns the data in a stateless manner, we can simply use the URL for identification. We need to assemble the full URL to the API anyway, so we use that as the identifier. The base URL comes from the .env file:
require('dotenv').config() // parse the .env file

const makeId = (str) => {
  return encodeURIComponent(str)
}

const API_URL = process.env.API_URL

app.get('/*', async (req, res) => {
  const url = `${API_URL}${req.originalUrl}` // build the URL to the API server
  const id = makeId(url) // make an id out of it
  if (cache[id]) {
    // ...
  }
})
If there's something in the cache, we return that to the requesting client, i.e. our Nuxt.js process:
if (cache[id]) {
  res.set('Content-Type', 'application/json')
  res.send(cache[id])
  res.end()
} else {
  // ...
}
Now for the other case, we request the data from the API, proxy it to the client and store it in the cache:
const http = require('http')

// ...
} else {
  http.get(url, (resp) => {
    let data = ''
    resp.on('data', (chunk) => { // store the chunks in data
      data += chunk
    })
    resp.on('end', () => { // no more chunks to be received, now send it onward
      res.set('Content-Type', 'application/json')
      res.send(data)
      res.end()
      cache[id] = data // don't forget to store it in the cache!
    })
  })
}
})
This is basically it. We just need to tweak a thing or two to be able to fetch data from the Livingdocs API.
First, we want this code to handle the https as well as the http protocol. To do that, we import the https module:
const https = require('https')
Based on the incoming URL, we decide which module to use:
let requestTool = http
if (url.startsWith('https://')) {
  requestTool = https
}

// Formerly: http.get(url, (resp) => {
requestTool.get(url, (resp) => {
  // ...
})
Another thing is authorization: the API wants a request token before delivering its data to us. The easiest way to pass this on is to just... well, pass it on:
// get the authorization header from the original request
const auth = req.get('Authorization')
if (!auth) {
  return res.send('No authorization header provided')
}

const headers = {
  Authorization: auth, // put the header in the new request
}
Now include the headers in the request:
requestTool.get(url, { headers }, (resp) => {
And we're good to go! This is how you prevent your server from going down due to too many requests.
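Putting the pieces together, the complete route handler of the cache server looks roughly like this (a consolidation of the snippets above, without error handling):
app.get('/*', async (req, res) => {
  const url = `${API_URL}${req.originalUrl}`
  const id = makeId(url)

  // serve from the cache if we already fetched this URL
  if (cache[id]) {
    res.set('Content-Type', 'application/json')
    res.send(cache[id])
    res.end()
    return
  }

  // pass the Authorization header on to the API
  const auth = req.get('Authorization')
  if (!auth) {
    return res.send('No authorization header provided')
  }
  const headers = { Authorization: auth }

  // pick http or https depending on the protocol of the target URL
  const requestTool = url.startsWith('https://') ? https : http

  requestTool.get(url, { headers }, (resp) => {
    let data = ''
    resp.on('data', (chunk) => {
      data += chunk
    })
    resp.on('end', () => {
      res.set('Content-Type', 'application/json')
      res.send(data)
      res.end()
      cache[id] = data // store the response for subsequent requests
    })
  })
})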
Fetching the data from the local cache
We now need to point Nuxt's API requests at the cache instead of the real server, i.e. http://localhost:1337.
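One way to wire this up, assuming the Nuxt code also reads its API base URL from the environment (the variable names and URLs below are placeholders):
# .env used during `nuxt generate` (placeholders)
# the cache server still points at the real API
API_URL=https://cms.example.com
# the Nuxt code fetches everything through the local cache instead
NUXT_API_BASE=http://localhost:1337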
We could now start this script by hand alongside the Nuxt generation process. But good engineering makes things more reliable: let's start the proxy cache automatically when Nuxt generates the site.
For this, we slightly adapt the command in package.json:
"generate": "node cacheServer.js & nuxt generate",
Now there is a little caveat: the cache server keeps running after the generator has finished. To be able to stop the cache, we add another route to it:
app.post('/kill', (req, res) => {
  server.close()
  res.sendStatus(200)
  res.end()
})
Now we can POST to http://localhost:1337/kill to stop the cache. We do this by hooking into the sitemap generation, adding this to the nuxt.config.js:
// axios needs to be required at the top of nuxt.config.js
hooks: {
  sitemap: {
    generate: {
      async done() {
        await axios.post('http://localhost:1337/kill')
      },
    },
  },
}
Generating the site should now take less time and be easier on the API server.
This may seem like working around the problem instead of actually solving it; a real solution would probably mean changes to the Nuxt.js core. If you have a better way of doing this, I sure would love to hear about it!
Featured image by Ioana Mitrea.