Drupal 8 and Solr: Google-fast search on your own website | Why and how




Speed is important for a Drupal website, very important. For visitors as well as search engines, it's an essential benchmark for success. But how do you keep your Drupal system fast, even when it holds millions of pages and documents? Solr is the answer; here is why, and how to implement it in Drupal 8.

(Dutch version here)

At the moment, we’re migrating a Drupal 7 website to a Drupal 8 system in which there will be millions of pages and documents in the near future. The aim is to have end-users navigate mainly through the search function. That means it has to be extremely fast.

Drupal 8 core contains a great search function, but because it searches the MySQL database by default, it becomes too slow when the website contains a lot of pages and documents.

Fast search within Drupal 8 with Solr 6

Just to be clear: when you have a Drupal 8 website with a lot of content, the system will be fast enough when you navigate by clicking menu items or links in content and blocks. Here we're talking about the search function and getting relevant search results fast.

The Apache Solr search engine has been Drupal's friend in this for years. Implementing Solr gives your Drupal system a fast search function and offers extra features such as:

  • Fuzzy search: spelling mistakes and variations are tolerated.
  • Faceted search: search filters.
  • Rich text snippets: a piece of text, like the ones Google shows you, in which the search query is highlighted. If you have implemented schema.org, found data is automatically presented ‘rich’ and placed in the correct context with the relevant label.
  • ‘Did you mean…’ function: suggestions for synonyms or keywords close to your search query.
  • Configuring the most relevant results for a query, so the visitor sees the most relevant content at the top. Examples of what you can rank on: content type, comment count, date, sticky, title, headings (h1->h6), body text and lots more.

The fuzzy search is especially welcome. Solr really is a beautiful engine and it's incredibly fast: millions of documents are no problem, provided, of course, that your server capacity is configured correctly.
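To make these features a little more concrete, here is a minimal PHP sketch of raw Solr request parameters that roughly correspond to them. It is not how Search API Solr builds its queries (the module does that for you through the Solarium library). The field names such as tm_title and ss_type are hypothetical placeholders, and the spellcheck parameter assumes the spellcheck component is enabled on the Solr request handler.

```php
<?php
// Hypothetical sketch: Solr request parameters behind the features listed above.
// Field names (tm_title, tm_body, ss_type, bs_sticky) are placeholders; use the
// field names Search API Solr generates for your own index.
function solr_feature_params(string $keys): array
{
    return [
        'q'           => $keys,
        'defType'     => 'edismax',             // query parser that supports per-field boosts
        'qf'          => 'tm_title^5 tm_body',  // relevance: title matches weigh 5x more than body
        'bq'          => 'bs_sticky:true^2',    // relevance: extra boost for sticky content
        'hl'          => 'true',                // rich snippets with the query highlighted
        'hl.fl'       => 'tm_body',
        'facet'       => 'true',                // faceted search (filters)
        'facet.field' => 'ss_type',
        'spellcheck'  => 'true',                // "Did you mean..." (needs the spellcheck component)
        'rows'        => 10,
        'wt'          => 'json',
    ];
}

// Usage: build the query string for a request to Solr's select handler.
echo http_build_query(solr_feature_params('drupal~1 solr~1')), PHP_EOL;
```

The boosts (^5, ^2) are the knobs behind "most relevant results on top"; which values work best depends on your content.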

Implementing Solr in Drupal 7 versus Drupal 8

Solr in Drupal 7 is relatively simple to implement, because the integration has been developed for years: lots of stable modules have been released that cover all sorts of user stories.

In Drupal 8, the integration has already come quite a long way, but it's not yet plug-and-play like in Drupal 7. The architecture has also changed: Drupal 7 had two important Solr modules, Apache Solr Search and Search API Solr, each with its own ecosystem of modules.

In Drupal 8 the two efforts joined forces. That improves development, and add-on modules no longer have to choose which ecosystem to build on. Modules such as Facet API (now Facets), Facet API Pretty Paths and Search API Autocomplete can now focus on one implementation instead of two (source).

Implementing Solr in Drupal 8

At the time of writing, the Drupal 8 Solr module is in beta1, which in this case means it is stable enough to use, but not yet plug-and-play.

Below is an explanation of how we got it to work:

The Solr server

We set up a new VPS with Solr installed on it. Solr couldn't run on the website's live server, because the hosting provider in question didn't support it.

Besides that, Solr can be managed and secured better on a dedicated, isolated server. By opening the right ports and closing the rest, a connection can be made between the public server of the Drupal 8 website and the Solr server.

All other ports are closed with a firewall. You don't need graphical interfaces, only SSH access secured with keys. There is one GUI, Solr's own admin interface, and we blocked that with the firewall as well. You can also do this with Linux iptables, but make sure to add a service that restores the rules, or else you'll lose your security on reboot.

You can calculate the capacity the Solr server needs, see here.

Install and configure Drupal Solr module

We installed the following modules:

  • Search (core)
  • Search API - contrib module
  • Search API Solr - contrib module
  • Composer Manager - contrib module, used for managing external libraries. In this case we use it to install the Solarium library.

Optional

  • Search Autocomplete - contrib module, used so that end users can also search through Solr with the search box in the frontend. This makes fuzzy search and instant search suggestions relatively easy. It doesn't work out-of-the-box at the moment, which is why we wrote custom code for it (see further down this blog post).

Installing and configuring the Drupal 8 connection with Solr

I'm not going to explain the installation of Solr itself in this blog post; you can find instructions for that here.

1. Install the ‘Solarium’ library in Drupal 8

Once you've installed the modules mentioned above, first go to Reports -> Composer manager. There you will see the command you have to run in your terminal to install the ‘Solarium’ library.

As soon as you've run ‘composer drupal-update’, the Solarium library is installed in [Drupal root]/vendor/solarium.

FYI: the ‘vendor’ folder is probably listed in your .gitignore file.

2. Add Solr ‘server’ to Drupal 8

  • Go to Configuration -> Search and metadata -> Search API.
  • Click ‘Add server’ and configure the Solr server, for example for a local installation.

Most important settings:

  • Solr version (we use version 6)
  • Port
  • Host

Click ‘Save’. Your Drupal 8 system now has a connection to the Solr server, but it isn't doing anything with it yet.
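If the connection doesn't seem to work, it helps to know which URLs the Host and Port settings end up pointing at. The sketch below assumes Solr's default /solr path and default port 8983; the core name drupal and the function name solr_endpoint are hypothetical. Requesting the ping URL from the web server (for example with curl) is a quick way to check that the firewall lets the Drupal server through.

```php
<?php
// Hypothetical helper: build the Solr URL for a core and request handler.
// Assumes the default '/solr' path; adjust if your Solr runs under another path.
function solr_endpoint(string $host, int $port, string $core, string $handler = 'select'): string
{
    return sprintf('http://%s:%d/solr/%s/%s', $host, $port, rawurlencode($core), $handler);
}

// Ping handler: answers with status OK when the core is up and reachable.
$ping = solr_endpoint('localhost', 8983, 'drupal', 'admin/ping');

// Select handler: a match-all query that returns only the document count.
$select = solr_endpoint('localhost', 8983, 'drupal')
    . '?' . http_build_query(['q' => '*:*', 'rows' => 0, 'wt' => 'json']);

echo $ping, PHP_EOL, $select, PHP_EOL;
```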

3. Add ‘Index’

To actually get Drupal content indexed in Solr, you have to create an ‘index’. Go to Configuration -> Search and metadata -> Search API and click ‘Add index’.

After you've created the index, open it, click the ‘Fields’ tab and then ‘Add fields’.

Now you can configure the index:

This interface is still a little cumbersome, probably because the Drupal Solr module is still in beta1. For example, you can't see which fields you've already added to the index, so take care not to add fields more than once.

It's manual work, but it's doable.

You can see the added fields under the ‘Fields’ tab:

4. Configure ‘Processors’

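Click the ‘Processors’ tab. Here you indicate how content and search queries are processed before indexing and searching, for example which items are indexed (published content only, content access) and whether HTML is stripped or case is ignored.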
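As an illustration of what such processing does, here is a simplified plain-PHP imitation of two common processors, ‘HTML filter’ and ‘Ignore case’. It is only a sketch of the idea, not Search API's own implementation (the real processors are plugins you configure in the UI), and the function name preprocess_text is hypothetical. The point is that the same transformation is applied to indexed text and to search keys, so they match.

```php
<?php
// Hypothetical, simplified imitation of the 'HTML filter' and 'Ignore case' processors.
function preprocess_text(string $html): string
{
    // HTML filter: replace tags with spaces so words in adjacent elements don't merge.
    // (Simplified: a '>' inside an attribute value would confuse this pattern.)
    $text = preg_replace('/<[^>]+>/', ' ', $html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Ignore case: lower-case everything, multibyte-safe when mbstring is available.
    $text = function_exists('mb_strtolower') ? mb_strtolower($text, 'UTF-8') : strtolower($text);

    // Collapse the runs of whitespace left behind by removed tags.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo preprocess_text('<p>Drupal &amp; <b>Solr</b></p><p>Search</p>'), PHP_EOL; // drupal & solr search
```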

5. Let’s index

Your Drupal content will now be indexed in Solr on each cron run; you can also start indexing manually. The index page shows the indexing status:
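With millions of documents, it is worth estimating how many cron runs a full index takes. The sketch assumes the per-index ‘Cron batch size’ setting of Search API, where 0 means no indexing during cron and -1 means everything in one run; as far as we know the default is 50 items, but check the index's edit form. The function name cron_runs_needed is hypothetical.

```php
<?php
// Hypothetical helper: how many cron runs does indexing $items items take?
// $batchSize mirrors the index's "Cron batch size" setting (assumed semantics:
// 0 = no indexing during cron, -1 = index everything in a single run).
function cron_runs_needed(int $items, int $batchSize): ?int
{
    if ($batchSize === 0) {
        return null; // cron never indexes; start indexing manually instead
    }
    if ($batchSize < 0) {
        return $items > 0 ? 1 : 0;
    }
    // Ceiling division: a partially filled last batch still needs its own run.
    return intdiv($items + $batchSize - 1, $batchSize);
}

echo cron_runs_needed(1000000, 50), PHP_EOL; // 20000
```

At one cron run per hour, 20,000 runs is more than two years, so for a large initial import you will rather start indexing manually or raise the batch size.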

6. Frontend: queries and search results for end users

This is what it’s all about in the end, search results for the Drupal website visitor:

  • Super-fast
  • Most relevant results on top
  • Spelling mistakes, no problem

Think: Google for your own website ;)

The modules available for this don't work well enough (yet) for our implementation. Views didn't seem to work either: you can configure a View based on the index, but it didn't return any results.

For now, custom code seemed the best option, also to get the global ‘Search autocomplete’ box working.

>> Check the code on GitHub here >>

Explanation of some code snippets

The code comments explain the details; here are a few of the code snippets explained:

hook_page_attachments_alter works around the bug ‘Input field value is not sent with the AJAX request when using URL callback with custom view and exposed filter’. Search Autocomplete can only use contextual filters; with this hook, you can use your own filter (or set a default).

The LuciusSolrController consists of:

__construct, the constructor; there's a @todo in here:

$config = \Drupal::config('search_api.server.solr');

This shouldn't actually happen here; the configuration should come in as a service (through dependency injection, which is still missing for this part).

create

Different services are retrieved from the service container here and passed (injected) into the controller.
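The @todo in __construct is about dependency injection: instead of calling \Drupal::config() in the constructor, create() takes the service from the container and passes it to the constructor. Below is a plain-PHP sketch of that pattern which runs without Drupal (PHP 8.0+). In a real controller extending ControllerBase, create() receives Symfony's ContainerInterface and you would call $container->get('config.factory'); the array container, the ConfigReader interface, the class names and the setting keys here are hypothetical stand-ins.

```php
<?php
// Plain-PHP sketch of constructor injection.
// Hypothetical stand-ins: ConfigReader replaces Drupal's ConfigFactoryInterface,
// and a plain array replaces the service container.

interface ConfigReader
{
    public function get(string $name): array;
}

final class ArrayConfigReader implements ConfigReader
{
    public function __construct(private array $configs) {}

    public function get(string $name): array
    {
        return $this->configs[$name] ?? [];
    }
}

final class SolrSearchController
{
    private array $serverConfig;

    // The dependency is passed in, instead of calling \Drupal::config() here.
    public function __construct(ConfigReader $config)
    {
        $this->serverConfig = $config->get('search_api.server.solr');
    }

    // Same role as Drupal's create(): fetch services from the container and
    // hand them to the constructor. In Drupal: $container->get('config.factory').
    public static function create(array $container): self
    {
        return new self($container['config.factory']);
    }

    public function serverSetting(string $key): mixed
    {
        return $this->serverConfig[$key] ?? null;
    }
}

// Usage with illustrative settings.
$container = [
    'config.factory' => new ArrayConfigReader([
        'search_api.server.solr' => ['host' => 'localhost', 'port' => 8983],
    ]),
];
$controller = SolrSearchController::create($container);
echo $controller->serverSetting('port'), PHP_EOL; // 8983
```

Because the controller only depends on the interface, it can be tested with a fake configuration like this one, which is one of the main benefits of injecting services.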

search

We build the custom search here. This function has a second parameter, $processing_callback, which is meant for the autocomplete version but still has to be adjusted.

lucius_solr_autocomplete_doc_processing

This builds the autocomplete results.

lucius_solr_autocomplete

This would have to invoke the lucius_solr_autocomplete_doc_processing callback.
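A minimal sketch of how the autocomplete part can be split into a search function and a per-document processing callback, in the spirit of $processing_callback. It decodes a Solr JSON response (the documents are under response.docs) and returns the value/label pairs that Drupal's autocomplete widget reads. The function names and the ss_title field are hypothetical; this is not the code from the repository.

```php
<?php
// Hypothetical sketch, not the repository's code.

// Processing callback: one Solr document -> one autocomplete suggestion.
// 'ss_title' is a placeholder field name.
function autocomplete_doc_processing(array $doc): ?array
{
    if (!isset($doc['ss_title'])) {
        return null; // skip documents without a title
    }
    $title = (string) $doc['ss_title'];
    return [
        'value' => $title,                                        // text put into the search box
        'label' => htmlspecialchars($title, ENT_QUOTES, 'UTF-8'), // escaped text shown in the dropdown
    ];
}

// Decodes a Solr JSON response and applies the callback to each document,
// keeping at most $limit suggestions.
function autocomplete_results(string $solrJson, callable $processDoc, int $limit = 10): array
{
    $data = json_decode($solrJson, true);
    $docs = $data['response']['docs'] ?? []; // empty on invalid JSON
    $results = [];
    foreach ($docs as $doc) {
        if (count($results) >= $limit) {
            break;
        }
        $item = $processDoc($doc);
        if ($item !== null) {
            $results[] = $item;
        }
    }
    return $results;
}

$json = '{"response":{"numFound":2,"start":0,"docs":[{"ss_title":"Drupal & Solr"},{"id":"no-title"}]}}';
print_r(autocomplete_results($json, 'autocomplete_doc_processing'));
```

In a Drupal controller you would return this array as a JSON response, for example with Symfony's JsonResponse.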

getSolrConnector

Function for retrieving the Solr connector.

prepare_solr_query

Function that makes the query fuzzy and cleans it up with an XSS filter. Other things, such as ‘contains’ matching, probably have to be handled here as well, but for now fuzzy search is sufficient. (Possibly ‘contains’ belongs in the search function.)
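A minimal sketch of what a function like prepare_solr_query could look like, assuming the standard Lucene or edismax query parser, where term~N matches terms within N edits. It escapes Solr's query syntax characters so user input can't change the structure of the query, and uses strip_tags as a simple stand-in for Drupal's \Drupal\Component\Utility\Xss::filter(). The function name prepare_fuzzy_query is hypothetical.

```php
<?php
// Hypothetical sketch: turn user input into a fuzzy Solr query.
function prepare_fuzzy_query(string $keys, int $distance = 1): string
{
    // Stand-in for Drupal's Xss::filter(): drop any markup.
    $keys = strip_tags($keys);
    $terms = preg_split('/\s+/u', trim($keys), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $fuzzy = [];
    foreach ($terms as $term) {
        // Escape Lucene/Solr special characters: + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
        $escaped = preg_replace('/([+\-&|!(){}\[\]^"~*?:\\\\\/])/', '\\\\$1', $term);
        // Append the fuzzy operator with the maximum edit distance.
        $fuzzy[] = $escaped . '~' . $distance;
    }
    return implode(' ', $fuzzy);
}

echo prepare_fuzzy_query('drupal <b>solr</b>'), PHP_EOL; // drupal~1 solr~1
```

Without a number, term~ uses Solr's maximum edit distance of 2, which can give surprising matches on short words; that is why this sketch defaults to 1.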

Beta

This module is also still in beta, which means it should be tested extensively for speed, security, stability and documentation.

I can imagine you have questions about it; let us know.

Frontend search results screenshots

Since the Drupal 8 website contains private data, we can't show those results. Instead, here are a few similar screenshots from a Solr implementation in OpenLucius (a work management system contributed as a Drupal distribution).

Autocomplete search box

Search results including in-document search

Using Apache Tika to index the content of attached documents (doc, xls, pdf, zip, etc.):

FYI: Data collections in Solr

You can create data collections in Solr. If you want to index more than one website or external data source, or make them more searchable, it's good to put them in separate data collections. Depending on your version of Solr, these are called:

  • Core (Solr v4)
  • Shard/collection (Solr v5)
  • Collection (Solr v6)

To summarize

Does your Drupal website contain a lot of content and documents? Solr helps big time. In Drupal 7, this integration is already mature and relatively easy to set up: no custom code needed.

In Drupal 8, the integration is available in beta1, but custom code is still needed to present the results the way you want to the website visitor. I won't be surprised if custom code is no longer needed in a few months, because the Drupal community is constantly working hard on the Solr modules.
