I have several projects that use Solr as the search engine, with custom schemas to suit the data being stored. This means that when I test the project code, it needs to be tested against a Solr instance using the correct schema.
I also like to use GitHub Actions to run tests automatically when I push to the repository or create a pull request. GitHub Actions lets you create service containers so that your tests can conveniently use tools like Solr without a lot of setup. Unfortunately, it's not straightforward to change the Solr schema that a Solr service container uses.
It took me a while to figure out how to combine all these tools at once, so here's how I did it.
Initial setup
The project I wanted to test was a CKAN extension, and the examples in this blog post are based on it. Here's the initial test.yml that I started with, which didn't use a custom Solr schema, and here's the working test.yml I ended up with. Both of them are for a repository called ckanext-example.
The code snippets that follow are simplified a bit for clarity.
The test.yml that I started out with defines a CKAN container and two service containers, Solr and Postgres, all versioned according to the CKAN version we're testing against. The CKAN container is where the test job will be run.
name: Tests
on: [push, pull_request]
jobs:
  test:
    strategy:
      matrix:
        ckan-version: ["2.10", 2.9]
      fail-fast: false
    name: CKAN ${{ matrix.ckan-version }}
    runs-on: ubuntu-latest
    container:
      image: openknowledge/ckan-dev:${{ matrix.ckan-version }}
    services:
      solr:
        image: ckan/ckan-solr:${{ matrix.ckan-version }}
      postgres:
        image: ckan/ckan-postgres-dev:${{ matrix.ckan-version }}
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: postgres
        options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
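When the job runs in a container, each service container is reachable under its service name, which is why the Solr service being called solr matters. As a minimal sketch, assuming the tests read the Solr URL from CKAN's CKAN_SOLR_URL environment variable and that the core is named ckan (neither is shown in the simplified workflow above), the job container could be pointed at the service like this:

```yaml
    container:
      image: openknowledge/ckan-dev:${{ matrix.ckan-version }}
      env:
        # "solr" is the key used under `services:`, which doubles as the hostname.
        # The core name "ckan" is an assumption; adjust it to your own core.
        CKAN_SOLR_URL: http://solr:8983/solr/ckan
```

If your test.ini already sets solr_url, this variable may not be needed.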
Here's what the Docker environment on the runner looks like:
My first idea was to mount the config files in my project repository as volumes for the Solr service container, like I would do locally:
    services:
      solr:
        image: ckan/ckan-solr:${{ matrix.ckan-version }}
        env:
          SOLR_CONFIG_CKAN_DIR: /opt/solr/server/solr/configsets/ckan/conf
          SOLR_HEAP: 1024m
        volumes:
          - ./solr/schema.xml:${SOLR_CONFIG_CKAN_DIR}/managed-schema
          - ./solr/german_dictionary.txt:${SOLR_CONFIG_CKAN_DIR}/german_dictionary.txt
          - ./solr/solrconfig.xml:${SOLR_CONFIG_CKAN_DIR}/solrconfig.xml
This didn't work. The service containers are all started up before the steps of the GitHub Actions job are run, including checking out the code of the repository. That means that the files I wanted to mount as volumes didn't yet exist when the service container was started up.
Create your own Solr container
I realised I had to create the Solr container as part of the job's steps, instead of using a service container.
name: Tests
on: [push, pull_request]
jobs:
  test:
    ...
    env:
      WORKDIR: /__w/ckanext-example/ckanext-example
      SOLR_CONFIG_CKAN_DIR: /opt/solr/server/solr/configsets/ckan/conf
    ...
    steps:
      - uses: actions/checkout@v3
      - name: Create solr container
        run: |
          /usr/bin/docker create --name test_solr --network ${{ job.container.network }} --network-alias solr \
            --workdir $WORKDIR --publish 8983:8983 \
            -e "SOLR_HEAP=1024m" -e "SOLR_SCHEMA_FILE=$SOLR_CONFIG_CKAN_DIR/managed-schema" \
            -e GITHUB_ACTIONS=true -e CI=true -v "${{ github.workspace }}/solr/schema.xml":"$SOLR_CONFIG_CKAN_DIR/managed-schema" \
            -v "${{ github.workspace }}/solr/german_dictionary.txt":"$SOLR_CONFIG_CKAN_DIR/german_dictionary.txt" \
            -v "${{ github.workspace }}/solr/solrconfig.xml":"$SOLR_CONFIG_CKAN_DIR/solrconfig.xml" \
            ckan/ckan-solr:${{ matrix.ckan-version }}
          docker start test_solr
The first step above checks out the code, of course.
The second step creates and starts the Solr container. Most of the arguments passed to the docker create command are the same as would be given when the GitHub Actions runner creates the service container normally. I've added the new volumes and also defined a couple of env variables to make things simpler:

WORKDIR: /__w/ckanext-example/ckanext-example
We'll need to refer to this workdir when creating the Solr container and when running commands on the CKAN container. For convenience, I've defined it here as an env variable. Replace ckanext-example with the name of your repository.

SOLR_CONFIG_CKAN_DIR: /opt/solr/server/solr/configsets/ckan/conf
As this long path is used several times in the arguments to docker create, I created an env variable to use as a shortcut. Be sure to replace ckan in this path with the name of the core you're using.

-v "${{ github.workspace }}/solr/schema.xml":"$SOLR_CONFIG_CKAN_DIR/managed-schema"
Here's where you mount your custom Solr schema as a volume. You can see that I mounted several additional files as volumes too, because they were needed for my setup. All my Solr config files are in a directory called solr/ in my repository.
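To confirm that the mount actually took effect, a follow-up step can compare the file inside the container with the one in the repository. This is a minimal sketch rather than part of the original workflow; the step name is illustrative, and it assumes the test_solr container is still running when the step executes:

```yaml
      - name: Check that the custom schema is mounted
        run: |
          # Read the schema from inside the container and compare it byte for byte
          # with the repository copy. cmp exits non-zero if they differ (or if
          # docker exec produced no output), which fails the step.
          docker exec test_solr cat "$SOLR_CONFIG_CKAN_DIR/managed-schema" \
            | cmp - "${{ github.workspace }}/solr/schema.xml"
```

This only checks the mounted file in the configset directory; it does not prove that the core has loaded it.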
Create your own container to run the tests in
If we're creating and running a Solr container as part of the job steps, we can't run the job itself in a container. That would give us a system like this:
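Instead, we add another step to the job, to create and run the container that the tests will run in. This time, we call docker create exactly the way the GitHub Actions runner would. The tail -f /dev/null entrypoint at the end of the command keeps the container running so that later steps can execute commands in it.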
    steps:
      ...
      - name: Create ckan container
        run: |
          /usr/bin/docker create --name test_ckan --network ${{ job.container.network }} --network-alias ckan \
            -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" \
            -v "/home/runner/work":"/__w" -v "/home/runner/work/_temp":"/__w/_temp" \
            -v "/home/runner/work/_actions":"/__w/_actions" -v "/opt/hostedtoolcache":"/__t" \
            -v "/home/runner/work/_temp/_github_home":"/github/home" \
            -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" \
            --entrypoint "tail" openknowledge/ckan-dev:${{ matrix.ckan-version }} "-f" "/dev/null"
          docker start test_ckan
The Postgres service container can remain as it is. It's part of the Docker local network and can communicate with both the CKAN and Solr containers.
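Unlike the Postgres service, which has health-check options, the hand-made Solr container gets no readiness check, so the tests could start before Solr is listening. Below is a minimal sketch of a wait step. It assumes the step runs directly on the runner rather than inside a job container, and that port 8983 is published as in the docker create call for test_solr. The step name, retry count and sleep interval are illustrative choices.

```yaml
      - name: Wait for Solr
        run: |
          # Poll Solr's CoreAdmin STATUS endpoint through the published port.
          # 30 attempts with a 2 second pause gives roughly one minute in total.
          for attempt in $(seq 1 30); do
            if curl --silent --fail "http://localhost:8983/solr/admin/cores?action=STATUS" > /dev/null; then
              echo "Solr responded on attempt $attempt"
              exit 0
            fi
            sleep 2
          done
          # Solr never answered: print its logs to help debugging, then fail the step.
          echo "Solr did not respond in time" >&2
          docker logs test_solr
          exit 1
```

A successful response only shows that Solr is answering HTTP requests; it does not by itself confirm that your core was created with the custom schema.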
Run your tests with docker exec
Lastly, any further steps in the job (like actually running the tests) have to be updated so that they're called in the new CKAN container, using docker exec. Notice that we're using the $WORKDIR env variable to specify the paths to the setup script and the tests.
      - name: Install requirements and set up ckanext
        run: |
          docker exec test_ckan $WORKDIR/bin/install_test_requirements.sh ${{ matrix.ckan-version }}
      - name: Run tests
        run: |
          docker exec test_ckan pytest --ckan-ini=$WORKDIR/test.ini \
            --disable-warnings $WORKDIR/ckanext/example/tests
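Because the Solr and CKAN containers are now created by hand, their output does not appear in the job log the way a step's output does. As a hedged sketch (the step name is illustrative and not part of the original workflow), a final step that only runs when an earlier step has failed can print the Solr container's logs:

```yaml
      - name: Show Solr logs on failure
        # failure() is true only if a previous step in the job failed.
        if: failure()
        run: docker logs test_solr
```

This can make schema errors much easier to spot, since Solr typically reports them when it loads the core.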
Conclusion
GitHub Actions service containers are so convenient to set up and use that I was surprised there was no way to use a custom Solr schema for testing, something I need to do in most of my projects. I hope seeing how I made this work will be helpful for your future testing!