org.opencms.search.solr.README.md Maven / Gradle / Ivy

`Version 1.0 (12/2013)`


# Abstract #

Apache Solr has matured over the years and can now be called an enterprise search platform built on Apache Lucene. It is a standalone enterprise search server with a REST-like API. You put documents into it (called "indexing") via XML, JSON or binary over HTTP. You query it via HTTP GET and receive XML, JSON, or binary results. To learn in more detail what Solr is and how it works, please visit the [Apache Solr](http://lucene.apache.org/solr/) project website. Searching through Solr's powerful and flexible REST-like interface reduces development complexity. Moreover, you can rely on existing graphical interfaces that provide comfortable AJAX-based search functionality to the end users of your internet/intranet application.

Since version 8.5, OpenCms integrates Apache Solr. This document gives you a brief introduction to the Solr/OpenCms integration details. It uses links that refer to a locally installed OpenCms version >= 8.5. Assuming you run OpenCms on localhost:8080 and the OpenCmsServlet is reachable under http://localhost:8080/opencms/opencms, you can click the links and the examples will open.

# Searching for Content in OpenCms #
*OpenCms 8.5 integrates Apache Solr, not only for full-text search but as a powerful enterprise search platform as well.*


## Full featured faceted search based on Solr ##

The OpenCms standard distribution includes a full-featured search demo that shows:

- Faceted search
- Auto completion
- Spellchecking
- Pagination
- Sorting
- Share results
- and more ...

Click here to open the [full featured faceted search based on Solr](http://localhost:8080/opencms/opencms/demo/search-page/).


## Retrieve OpenCms content via HTTP endpoint ##

Imagine you want to search for "OpenCms" in all blog articles that have been changed within the last week and sort the results by modification date:

```
http://localhost:8080/opencms/opencms/handleSolrSelect
                                         // URL of the Solr HTTP endpoint
    ?q=OpenCms                           // Search for the word 'OpenCms'
    &fq=type:bs-blog                     // Restrict the results by type
    &fq=lastmodified:[NOW-7DAY TO NOW]   // Filter query on the field lastmodified with a range of seven days
    &sort=lastmodified desc              // Sort the result by date beginning with the newest one
```



## Pass any Solr query to the Solr select request handler ##

You can pass any valid Solr input parameters to the new OpenCms Solr request handler (handleSolrSelect).

To get familiar with the Solr query syntax have a look at [Solr query syntax](https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing). OpenCms uses the [edismax](https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser) query parser as default. For advanced query syntax features the [Solr Reference Guide](https://cwiki.apache.org/confluence/display/solr/Searching) will lend a hand.

Please note that many characters in the Solr query syntax (most notably the plus sign "+") are special characters in URLs, so when constructing request URLs manually, you must properly URL-encode these characters.

```
                                                       q=  +popularity:[10   TO   *]     +section:0
http://localhost:8080/opencms/opencms/handleSolrSelect?q=%2Bpopularity:[10%20TO%20*]%20%2Bsection:0
```
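A minimal sketch of building such a request URL in Java, assuming Java 10+ for `URLEncoder.encode(String, Charset)`. `SolrSelectUrl` is a hypothetical helper, not part of the OpenCms API. It takes a list of key/value pairs rather than a map because Solr parameters such as `fq` may repeat. Note that `URLEncoder` performs form encoding, so spaces become `+` instead of `%20`; in a query string both decode to a space.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not OpenCms API): builds a handleSolrSelect URL with
// every parameter key and value URL-encoded. Requires Java 10+.
final class SolrSelectUrl {

    private SolrSelectUrl() {}

    /** Form-encodes one value: spaces become '+', '+' becomes %2B, ':' becomes %3A. */
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Joins the parameters in order; a List is used because keys like fq may repeat. */
    static String build(String endpoint, List<Map.Entry<String, String>> params) {
        String query = params.stream()
                .map(p -> encode(p.getKey()) + "=" + encode(p.getValue()))
                .collect(Collectors.joining("&"));
        return query.isEmpty() ? endpoint : endpoint + "?" + query;
    }
}
```

For example, passing the four parameters of the last-week query (`q`, two `fq`, `sort`) yields `...handleSolrSelect?q=OpenCms&fq=type%3Abs-blog&fq=lastmodified%3A%5BNOW-7DAY+TO+NOW%5D&sort=lastmodified+desc`.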


For more information, see Yonik Seeley's blog on [Nested Queries in Solr](http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/).


## Handle the response ##

The response produced by OpenCms/Solr can be XML or JSON. With the additional parameter 'wt' you can specify the [QueryResponseWriter](http://wiki.apache.org/solr/QueryResponseWriter) that Solr should use. For a query that filters on type:v8article, contentdate:[NOW-1DAY TO NOW] and Title_prop:Flower (the filter queries listed in the response header below), a result can look like this:

```xml

  
    0
    7
    
      dismax
      *,score
      50
      *:*
      
        type:v8article
        contentdate:[NOW-1DAY TO NOW]
        Title_prop:Flower
      
      0
    
  
  
    
      51041618-77f5-11e0-be13-000c2972a6a4
      [B:[B@6c1cb5
      /sites/default/.content/article/a_00003.html
      v8article
      .html
      2011-05-06T15:27:13Z
      2011-08-17T13:58:29Z
      2012-09-03T10:41:13.56Z
      1970-01-01T00:00:00Z
      292278994-08-17T07:12:55.807Z
      
        en
        de
      
      
        en
        de
      
      /system/modules/com.alkacon.opencms.v8.template3/templates/main.jsp
      /.content/style
      OpenCms 8 Demo
      Flower Today
      Nachfolgend finden Sie aktuelle Meldungen und Veranstaltungen rund um die Blume.
      In this section, you find current flower related news and events.
      
        News from the world of flowers you find current flower related news and events.
      
      
        Neuigkeiten aus der Welt der Blumen Blume aktuell Nachfolgend [...]
      
      2012-09-03T10:45:47.055Z
      1.0
    
    
      ac56418f-77fd-11e0-be13-000c2972a6a4
      [B:[B@1d0e4a2
      /sites/default/.content/article/a_00030.html
      v8article
      .html
      2011-05-06T16:27:02Z
      2011-08-17T14:03:27Z
      2012-09-03T10:41:18.155Z
      1970-01-01T00:00:00Z
      292278994-08-17T07:12:55.807Z
      
        en
        de
      
      
        en
        de
      
      /system/modules/com.alkacon.opencms.v8.template3/templates/main.jsp
      /.content/style
      OpenCms 8 Demo
      Flower Dictionary
      In der Botanik existieren zahlreiche Gewächsfamilien [...]
      There are many different types of plants [...]
      
        The different types of flowers Flower Dictionary of plants and flowers [...]
      
      
        Die verschiedenen Gewächsfamilien Blumen Lexikon In der Botanik existieren zahlreiche [...]
      
      2012-09-03T10:45:49.265Z
      1.0
    
  

```
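To consume the XML response without SolrJ, a DOM parser from the JDK is sufficient. The sketch below assumes the standard layout of Solr's XML response writer: a `<result>` element carrying a `numFound` attribute, with one `<doc>` per hit whose single-valued fields are typed elements such as `<str name="path">`. `SolrXmlResult` is a hypothetical helper; multi-valued fields (`<arr>`) are ignored for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: minimal reader for Solr's XML response writer output (wt=xml).
final class SolrXmlResult {

    final long numFound;
    final List<String> paths;

    private SolrXmlResult(long numFound, List<String> paths) {
        this.numFound = numFound;
        this.paths = paths;
    }

    /** Reads numFound from the first result element and the "path" field of each doc. */
    static SolrXmlResult parse(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element result = (Element) dom.getElementsByTagName("result").item(0);
            if (result == null) {
                throw new IllegalArgumentException("No result element in response");
            }
            List<String> paths = new ArrayList<>();
            NodeList docs = result.getElementsByTagName("doc");
            for (int i = 0; i < docs.getLength(); i++) {
                NodeList fields = docs.item(i).getChildNodes();
                for (int j = 0; j < fields.getLength(); j++) {
                    Node field = fields.item(j);
                    // Single-valued fields are typed elements carrying a name attribute.
                    if (field instanceof Element && "path".equals(((Element) field).getAttribute("name"))) {
                        paths.add(field.getTextContent());
                    }
                }
            }
            return new SolrXmlResult(Long.parseLong(result.getAttribute("numFound")), paths);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse Solr response", e);
        }
    }
}
```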

## Send a Java-API query ##
```java
  String query = "fq=type:v8article&fq=lastmodified:[NOW-1DAY TO NOW]&fq=Title_prop:Flower";
  CmsSolrIndex index = OpenCms.getSearchManager().getIndexSolr("Solr Online");
  // getCmsObject() stands for the CmsObject of the current user (e.g. from a JSP action element)
  CmsSolrResultList results = index.search(getCmsObject(), query);
  for (CmsSearchResource sResource : results) {
    String path = sResource.getField(I_CmsSearchField.FIELD_PATH);
    Date date = sResource.getDateField(I_CmsSearchField.FIELD_DATE_LASTMODIFIED);
    List<String> cats = sResource.getMultivaluedField(I_CmsSearchField.FIELD_CATEGORY);
  }
```

The class org.opencms.search.solr.CmsSolrResultList encapsulates a list of 'OpenCms resource documents' (`CmsSearchResource`). This list can be accessed exactly like an `ArrayList` whose entries are `CmsSearchResource` objects, which extend `CmsResource` and hold the Solr implementation of `I_CmsSearchDocument` as a member. **This enables you to deal with the resulting list as you do with a well-known `List` and to work on its entries as you do on `CmsResource`**.
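The query string passed to `index.search(...)` above uses `key=value` pairs joined by `&`, and a key such as `fq` may occur several times. The hypothetical helper below (not part of the OpenCms API) sketches how such a string decomposes into repeatable parameters. It deliberately does no URL decoding, because the example string is not URL-encoded: it contains plain spaces and brackets, and decoding would turn a literal `+` into a space. Values that themselves contain `&` are not supported by this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits an unencoded parameter string such as "fq=a&fq=b"
// into an ordered map whose values keep every occurrence of a repeated key.
final class SolrParamString {

    private SolrParamString() {}

    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;                           // tolerate a leading "&" as in "&fq=..."
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1); // split on the first '=' only
            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return params;
    }
}
```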


## Use CmsSolrQuery class for querying Solr ##

```java
  CmsSolrIndex index = OpenCms.getSearchManager().getIndexSolr("Solr Online");
  CmsSolrQuery squery = new CmsSolrQuery(getCmsObject(), "path:/sites/default/xmlcontent/article_0001.html");
  CmsSolrResultList results = index.search(getCmsObject(), squery);
```
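Depending on the Solr version and query parser, characters such as `/` and `:` inside a term value can be read as query syntax (Lucene 4 and later treat `/.../` as a regular expression). Quoting the value (`path:"/sites/..."`) is one option; the other is to backslash-escape it, as in the minimal sketch below. The escaped character set is an assumption modeled on SolrJ's `ClientUtils.escapeQueryChars`, and whether `CmsSolrQuery` already escapes its input is not covered here. `SolrTermEscaper` is hypothetical.

```java
// Hypothetical helper: backslash-escapes query-syntax characters so that a value
// is read as one literal term.
final class SolrTermEscaper {

    // Assumed special-character set, modeled on SolrJ's ClientUtils.escapeQueryChars.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    private SolrTermEscaper() {}

    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');                    // prefix special characters with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds "field:term" with the term escaped. */
    static String fieldQuery(String field, String value) {
        return field + ":" + escape(value);
    }
}
```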

## Advanced search features ##

- [Faceted search](http://wiki.apache.org/solr/SimpleFacetParameters)
- [Highlighting](http://wiki.apache.org/solr/HighlightingParameters)
- [Range queries](http://wiki.apache.org/solr/SolrQuerySyntax)
- [Sorting](http://wiki.apache.org/solr/CommonQueryParameters)
- [Spellchecking](http://wiki.apache.org/solr/SpellCheckComponent)
- [Auto suggestion/completion/correction](http://wiki.apache.org/solr/Suggester)
- [Thesaurus/Synonyms](http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters)
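Most of these features are switched on with plain request parameters. As a sketch, the hypothetical helper below assembles the standard Solr parameters `facet`, `facet.field`, `facet.mincount`, `hl` and `hl.fl`. The field names in the usage example (`type`, `category_exact`, `content`) come from the default index fields described later; highlighting assumes that the chosen field is stored in the schema.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles standard Solr faceting and highlighting parameters.
final class SolrFeatureParams {

    private SolrFeatureParams() {}

    static List<String> facetAndHighlight(List<String> facetFields, String highlightField) {
        List<String> params = new ArrayList<>();
        params.add("facet=true");
        for (String field : facetFields) {
            params.add("facet.field=" + field);   // one facet.field parameter per field
        }
        params.add("facet.mincount=1");           // omit facet values without any hit
        params.add("hl=true");
        params.add("hl.fl=" + highlightField);    // field used to build highlighted snippets
        return params;
    }
}
```

Joined with `&` and appended to a query such as `q=Flower`, `facetAndHighlight(List.of("type", "category_exact"), "content")` gives `facet=true&facet.field=type&facet.field=category_exact&facet.mincount=1&hl=true&hl.fl=content`.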


## Querying multiple cores (indexes) ##

'Core' is the Solr term for what you may think of as a separate index, so this section says core instead of index. Multiple cores should only be required if you have completely different applications but want a single Solr server to manage all the data. See [Solr Core Administration](http://wiki.apache.org/solr/CoreAdmin) for detailed information. If you have configured multiple Solr cores and want to query a specific one, you have to tell Solr/OpenCms which core/index to search. This is done with a special parameter:

```
http://localhost:8080/opencms/opencms/handleSolrSelect?
                              // The URI of the OpenCms Solr Select Handler configured in 'opencms-system.xml'
    &core=My Solr-Index Name  // Searches on the core with the name 'My Solr-Index Name'
    &q=content_en:Flower      // for the text 'Flower'
```



## Using the standard OpenCms Solr collector ##

OpenCms delivers a standard Solr collector with two collector names: byQuery, which simply passes a query string, and byContext, which passes a query string and lets OpenCms use the user's request context. The implementing class for this collector is org.opencms.file.collectors.CmsSolrCollector.

```jsp

    
    
        			
        
            Solr Collector Demo
            
                
                <%-- Title of the article --%>
                ${content.value.Title}
                <%-- The text field of the article with image --%>
                
                    <%-- Set the required variables for the image. --%>
                    								
                        <%-- Output of the image using cms:img tag --%>				
                        ${(cms.container.width - 20) / 3}
                        <%-- The image is scaled to the one third of the container width. --%>
                        
                    									
                    ${cms:trimToSize(cms:stripHtml(content.value.Text), 300)}
                
                
            
        
    

```




# Indexing content of OpenCms #

The OpenCms search/index configuration is done in the file **'opencms-search.xml'** (/webapps/<OPENCMS_WEBAPP>/WEB-INF/config/opencms-search.xml). The following sections explain the OpenCms-specific configuration options.

But before going into details, let's say a few words about the OpenCms index strategy in general. In the past, a typical approach was to create multiple Lucene indexes per use case. For example, if you managed multiple sites or language versions within a single OpenCms instance, you would have created an index per site/language/use case. Such an index contained only those documents/resources that were related to that site/language/use case. Nowadays the approach is to index all data (across all sites, languages and use cases) in one big index.

**Having all resources in one big index**

- reduces expense for pushing same data into several indexes,
- reduces computational effort during the indexing process and
- moves responsibilities from the index time to the search time.
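With one big index, scoping moves into the query: instead of choosing a per-site or per-language index, a search adds filter queries. A minimal sketch, assuming the default fields `parent-folders` and `res_locales` (listed under the default index fields below) and a site root given with a trailing slash as in `/sites/default/`. `ScopeFilters` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: search-time filter queries that replace per-site/per-language indexes.
final class ScopeFilters {

    private ScopeFilters() {}

    /** Either argument may be null to leave that dimension unrestricted. */
    static List<String> forScope(String siteRoot, String locale) {
        List<String> fq = new ArrayList<>();
        if (siteRoot != null) {
            // parent-folders lists every parent path of a resource, so this keeps one subtree.
            fq.add("parent-folders:\"" + siteRoot + "\"");
        }
        if (locale != null) {
            fq.add("res_locales:" + locale);   // resources that exist in this locale
        }
        return fq;
    }
}
```

Each returned entry is meant to be sent as its own `fq` parameter, so Solr can cache the site filter and the locale filter independently.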


### Embedded Solr Server ###

An optional node solr (XPath: opencms/search/solr) is available. To simply enable the embedded Solr server, opencms-search.xml should start like this:

```xml



  
    

      [...]

  

```

Optionally, you can configure the Solr home directory and the name of the main Solr configuration file **solr.xml**. OpenCms then concatenates those two paths: **{solr_home}{configfile}**

**Example:**

```xml

    /my/solr/home/folder
    rabbit.xml

```
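If the two values are joined by plain string concatenation as described, the home directory in the example above needs a trailing `/`; otherwise the result would be `/my/solr/home/folderrabbit.xml`. Whether OpenCms inserts a separator itself is not stated here, so ending the home path with `/` is the safe choice. The hypothetical helper below contrasts literal concatenation with a separator-aware join.

```java
// Hypothetical helper: contrasts {solr_home}{configfile} concatenation with a join
// that inserts a '/' only when needed.
final class SolrConfigPath {

    private SolrConfigPath() {}

    /** Plain string concatenation of home directory and config file name. */
    static String concat(String solrHome, String configFile) {
        return solrHome + configFile;
    }

    /** Adds a '/' only if the home directory does not already end with one. */
    static String join(String solrHome, String configFile) {
        return solrHome.endsWith("/") ? solrHome + configFile : solrHome + "/" + configFile;
    }
}
```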

To disable Solr system-wide, remove the **solr** node or set its enabled attribute to 'false':

```xml

```


### External Solr Server ###

It is also possible to use an external HTTP Solr server. To do so, replace the line
```xml

```
with the following:
```xml

```

The OpenCms SolrSelect request handler does not support the external HTTP Solr server. So if your HTTP Solr server is directly reachable at **http://{yourSolrServer}**, no permission check is performed and secret indexed data will be accessible. This means that you are **responsible yourself** for resources that have permission restrictions set in the VFS of OpenCms. You can, however, use the method

**org.opencms.search.solr.CmsSolrIndex.search(CmsObject, SolrQuery)** or 

**org.opencms.search.solr.CmsSolrIndex.search(CmsObject, String)**

to make sure permissions are checked for HTTP Solr servers as well. A future version of OpenCms may feature secure access to HTTP Solr servers.


### Configuring Solr search indexes ###

By default OpenCms comes with a "Solr Online" and a "Solr Offline" index. To add a new Solr index, you can use the default configuration as a template.

```xml

  Solr Online
  auto
  Online
  all
  solr_fields
  
    solr_source
  

```


### Index sources ###

Index sources for Solr can be configured in the file **opencms-search.xml** exactly the same way as you do for Lucene indexes. In order to use the advanced XSD field mapping for XML contents, you must add the new document type **xmlcontent-solr** to the list of document types that are indexed:

```xml

  solr_source
  
  
    /sites/default/
  
  
    xmlcontent-solr
    containerpage
    xmlpage
    text
    pdf
    image
    msoffice-ole2
    msoffice-ooxml
    openoffice
  

```


### Solr XML document types ###

A special document type called **xmlcontent-solr**, implemented in **CmsSolrDocumentXmlContent**, performs a localized content extraction that is later used to fill the Solr input documents. As explained in section "Custom fields for XML contents", it is possible to define a mapping between elements defined in the XSD of an XML resource type and a field of the Solr document. The values for those XSD field mappings are also extracted by the document type **xmlcontent-solr**. Moreover, there is another Solr-related document type that does the extraction for container pages: **containerpage-solr**.

```xml

  xmlcontent-solr
  org.opencms.search.solr.CmsSolrDocumentXmlContent
  
    text/html
  
  
    xmlcontent-solr
  

```

### The Solr default field configuration ###

By default the field configuration for OpenCms Solr indexes is implemented by the class **org.opencms.search.solr.CmsSolrFieldConfiguration**. The Solr field configuration declared in **opencms-search.xml** looks like the following.

```xml

  solr_fields
  The Solr search index field configuration.
  

```

### Migrating a Lucene index to a Solr index ###

An existing Lucene field configuration can easily be transformed into a Solr field configuration. To do so, create a new Solr field configuration: you can use the snippet shown in the section above as a template and copy the list of fields from the Lucene index you want to convert into that skeleton.

There exists a specific strategy to map the Lucene field names to Solr field names:

* **Exact name matching:** OpenCms tries to find an explicit Solr field whose name exactly matches the value of the name attribute. E.g. OpenCms tries to find an explicit Solr field definition named **meta** for `<field name="meta"> ... </field>`. To make use of this strategy, you have to edit Solr's **schema.xml** manually and add explicit field definitions named exactly like the Lucene fields.

* **Type specific fields:** The existing Lucene configuration does not designate type-specific field definitions, but Solr's **schema.xml** defines different data types for fields. If you want to use those type-specific advantages (like language-specific field analyzing/tokenizing) without modifying Solr's **schema.xml**, you have to define at least a type attribute for those fields. The value of the attribute **type** can be the name of any `<dynamicField>` configured in the **schema.xml** that starts with `*_`. The resulting field inside the Solr document is then named `<luceneFieldName>_<dynamicFieldSuffix>`.

* **Fallback:** If you have not defined a type attribute and there is no explicit field in the **schema.xml** named like the Lucene field, OpenCms uses **text_general** as fallback. E.g. a Lucene field `<field name="title" index="true"> ... </field>` will be stored as a dynamic field named **title_txt** in the Solr index.
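The naming strategy above can be sketched as a small function. This is a hypothetical illustration, not OpenCms's implementation: the precedence (exact match, then type attribute, then fallback) is assumed from the order of the list, and the type attribute is accepted either as a dynamic field name like `*_de` or as the bare suffix `de`.

```java
import java.util.Set;

// Hypothetical sketch of the Lucene-to-Solr field name strategy; the precedence
// (exact match, then type attribute, then fallback) follows the list order above.
final class LuceneToSolrFieldName {

    private LuceneToSolrFieldName() {}

    /**
     * @param luceneName     value of the Lucene field's name attribute
     * @param type           value of the optional type attribute, or null
     * @param explicitFields names of the explicit field definitions in schema.xml
     */
    static String map(String luceneName, String type, Set<String> explicitFields) {
        if (explicitFields.contains(luceneName)) {
            return luceneName;                                   // exact name matching
        }
        if (type != null && !type.isEmpty()) {
            // Accept either "*_de" (dynamic field name) or "de" (suffix only).
            String suffix = type.startsWith("*_") ? type.substring(2) : type;
            return luceneName + "_" + suffix;                    // type specific field
        }
        return luceneName + "_txt";                              // text_general fallback
    }
}
```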

An original field configuration like:

```xml
      
        standard
        The standard OpenCms 8.0 search index field configuration.
        
          
            
          
          
            Title
          
          
            Title
          
          
            Keywords
          
          
            Description
          
          
            Title
            Keywords
            Description
          
        
      
```

After the conversion, it could look like this:

```xml
      
        standard
        The standard OpenCms 8.0 Solr search index field configuration.
        
          
            
          
          
            Title
          
          
            Title
          
          
            Keywords
          
          
            Description
          
          
            Title
            Keywords
            Description
          
        
      
```

## Indexed data ##

The following sections will show what data is indexed by default and what possibilities are offered by OpenCms to configure / implement additional field configurations / mappings.

### The Solr index schema (schema.xml) ###

Have a look at the Solr **schema.xml** first. In the file **<CATALINA_HOME>/webapps/<OPENCMS_WEBAPP>/WEB-INF/solr/conf/schema.xml** you will find the field definitions used by OpenCms, which are summarized in the next section.

```xml

   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   

   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
 

 
 
 
 
 
 
 
 
 
 
 
 
 
  
 

 
 
 

 id
```

### Default index fields ###

OpenCms indexes several pieces of information for each resource by default:

* **id** - Structure id used as unique identifier for a document (the structure id of the resource)
* **path** - Full root path (the root path of the resource, e.g. /sites/default/flower_en/.content/article.html)
* **path_hierarchy** - The full path as a path-tokenized field (field type: text_path)
* **parent-folders** - Parent folders (multi-valued field containing an entry for each parent path)
* **type** - Type name (the resource type name)
* **res_locales** - Existing locale nodes for XML content and all available locales in case of binary files
* **created** - The creation date (the date when the resource itself was created)
* **lastmodified** - The date last modified (the last modification date of the resource itself)
* **contentdate** - The content date (the date when the resource's content was last modified)
* **released** - The release and expiration date of the resource
* **content** - A general content field that holds all extracted resource data (all languages, type text_general)
* **contentblob** - The serialized extraction result (content_blob) to improve the extraction performance while indexing
* **category** - All categories as general text
* **category_exact** - All categories as exact strings for faceting
* **`text_<locale>`** - Extracted textual content optimized for language-specific search (default languages: en, de, el, es, fr, hu, it)
* **timestamp** - The time when the document was last indexed
* **`*_prop`** - All properties of a resource as searchable and stored text (field name: `<Property_Definition_Name>_prop`)
* **`*_exact`** - All properties as exact, not stored strings (field name: `<Property_Definition_Name>_exact`)

### Custom fields for XML contents ###

Declarative field configuration with field mappings can also be done via the **XSD-Content-Definition** of an XML resource type, as defined in the **DefaultAppinfoTypes.xsd**:

```xsd
  
    
      
    
    
    
  

  
    
      
    
    
    
    
    
    
    
  
```

You can declare search field mappings for XML content elements directly in the XSD content definition. An XSD using this feature can then look like:

```xml
  
    
      
        Author
      
    
    
      
        Homepage
        
        search.special
        dateReleased
        special
      
    
    
      
    
    
      
    
    
      
    
    
      
    
  
```

### Dynamic field mappings ###

If the requirements for the field mapping are more "dynamic" than just **'static piece of content' -> 'specified field defined in the Solr schema'**, you can implement the interface **org.opencms.search.fields.I_CmsSearchFieldMapping**.

### Custom field configuration ###

Declarative field configurations with field mappings can be defined in the file **opencms-search.xml**. You can use exactly the same features as already known for OpenCms Lucene field configurations.

### Extend the CmsSolrFieldConfiguration ###

If the standard configuration options are still not flexible enough, you can extend the class **org.opencms.search.solr.CmsSolrFieldConfiguration** and define a custom Solr field configuration in the **opencms-search.xml**:

```xml
  
    solr_fields
    The Solr search index field configuration.
    
  
```




# Behind the walls #

## The request handler ##
The class org.opencms.main.OpenCmsSolrHandler offers the same functionality as the default select request handler of a standard Solr server installation. In the OpenCms default system configuration (opencms-system.xml) the Solr request handler is configured as follows:
```xml
  
    
  
```

Alternatively, the request handler class can be used as a servlet. To do so, add the handler class to
the WEB-INF/web.xml of your OpenCms application:

```xml
  
    The OpenCms Solr servlet.
    OpenCmsSolrServlet
    org.opencms.main.OpenCmsSolrHandler
    1
  
    [...]
  
    OpenCmsSolrServlet
    /solr/*
  
```

## Permission check ##
OpenCms performs a permission check for all resulting documents, discards those that
the current user is not allowed to read, and expands the result with the next best matching
documents on the fly. This security check is very costly and should be
replaced/improved with a pure index-based permission check.
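The permission check described above can be sketched as a loop that keeps fetching raw hits until enough readable documents are collected or the index is exhausted. This is a generic illustration of the idea, not OpenCms's implementation; `PermissionFilteredSearch`, the page-fetching function and the permission predicate are hypothetical. It also shows why the check is expensive: when many hits are not readable, several pages must be fetched and tested to fill a single result page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Predicate;

// Hypothetical sketch (not OpenCms code): collects readable hits, fetching further
// pages of raw hits while the permission check discards some of them.
final class PermissionFilteredSearch {

    private PermissionFilteredSearch() {}

    /**
     * @param fetchPage returns raw hits for (start, pageSize); an empty list means no more hits
     * @param canRead   per-document permission check for the current user
     * @param rows      number of readable hits wanted
     */
    static <T> List<T> search(BiFunction<Integer, Integer, List<T>> fetchPage,
                              Predicate<T> canRead, int rows) {
        List<T> permitted = new ArrayList<>();
        int start = 0;
        while (permitted.size() < rows) {
            List<T> page = fetchPage.apply(start, rows);
            if (page.isEmpty()) {
                break;                                  // index exhausted
            }
            for (T hit : page) {
                if (permitted.size() < rows && canRead.test(hit)) {
                    permitted.add(hit);                 // keep only readable documents
                }
            }
            start += page.size();                       // continue with the next best matches
        }
        return permitted;
    }
}
```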


## Configurable post processor ##
OpenCms can post-process Solr documents after they have been checked for permissions. This allows you to add fields to a found document before the search result is returned. To use the post processor, add an optional parameter to the search index as follows:

```xml
  
    Solr Offline
    offline
    Offline
    all
    solr_fields
    
      [...]
    
    my.package.MyPostProcessor
  
```

The specified class for the parameter **org.opencms.search.solr.CmsSolrIndex.postProcessor** must be an implementation of **org.opencms.search.solr.I_CmsSolrPostSearchProcessor**.

## Multilingual support ##
There is a default strategy for multi-language support within the OpenCms Solr search index. For binary documents, the language is determined automatically based on the extracted text. The default mechanism is implemented with: [Language detection](http://code.google.com/p/language-detection/)

For XML contents the concrete language/locale information is known, and the localized fields end with an underscore followed by the locale, e.g. **content_en, content_de or text_en, text_de**. By default all field mappings defined within the XSD of a resource type are extended by `_<locale>`.
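Localized field names can be derived mechanically from this convention. A minimal sketch, assuming locale names are plain language codes such as `en` or `de` (a `Locale` with a country would yield e.g. `content_de_CH`); the typed variant mirrors the `arelease_<locale>_dt` naming shown in the OpenCmsDateTime FAQ. `LocalizedFieldNames` is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: derives "<field>_<locale>" names as used for localized fields.
final class LocalizedFieldNames {

    private LocalizedFieldNames() {}

    /** ("content", Locale.GERMAN) gives "content_de". */
    static String localized(String field, Locale locale) {
        return field + "_" + locale;          // Locale.toString(): "de", or "de_CH" with a country
    }

    /** ("arelease", Locale.ENGLISH, "dt") gives "arelease_en_dt". */
    static String localizedTyped(String field, Locale locale, String typeSuffix) {
        return localized(field, locale) + "_" + typeSuffix;
    }
}
```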

## Multilingual dependency resolving ##
Based on the file names of resources, OpenCms can index documents whose content is distributed over more than one resource. The standard implementation can be found at:

**org.opencms.search.documents.CmsDocumentDependency**

## Extraction result cache ##
To improve indexing performance, the extraction result is cached for siblings.

**@see org.opencms.search.extractors.I_CmsExtractionResult**
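In OpenCms, siblings share the same resource id (and therefore the same content) while each has its own structure id, so a cache keyed by resource id extracts the content only once per set of siblings. The sketch below illustrates the idea with a `ConcurrentHashMap`; `SiblingExtractionCache` and its string-based extractor are hypothetical stand-ins, not the `I_CmsExtractionResult` API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: caches extraction results per resource id, so siblings
// (same resource id, different structure ids) are extracted only once.
final class SiblingExtractionCache {

    private final Map<String, String> byResourceId = new ConcurrentHashMap<>();
    private final Function<String, String> extractor;

    SiblingExtractionCache(Function<String, String> extractor) {
        this.extractor = extractor;   // e.g. text extraction for a PDF or Office file
    }

    /** Returns the cached result, running the extractor only on the first request per id. */
    String extract(String resourceId) {
        return byResourceId.computeIfAbsent(resourceId, extractor);
    }
}
```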


# Frequently asked questions #

## How is Solr integrated in general? ##

Independently of OpenCms, a standard Solr server offers an HTTP interface that is reachable
at: [Standard Solr Server URL](http://localhost:8983/solr/select). To query a Solr server, you can append any valid Solr query (see [Solr query syntax](http://wiki.apache.org/solr/SolrQuerySyntax)) to this URL. The HTTP response can be either JSON or XML, and the answer to the query
`http://localhost:8983/solr/select?q=*:*&rows=2` could look like:

```xml
  
    
      0
      32
      
        *:*
        2
        0
     
     
       ...
       ...
     
  
```

Solr is implemented in Java, and the Apache library SolrJ lets you access a running Solr server from native Java code. The Solr integration in OpenCms offers both interfaces, Java and HTTP. The default URL for requesting Solr responses from OpenCms is:

**http://localhost:8080/opencms/opencms/handleSolrSelect**

This handler can answer any syntactically correct Solr query.

The following code shows a simple example how to use the OpenCms Java API to send a Solr query:

```jsp
//////////////////
// SEARCH START //
//////////////////

CmsObject cmsO = new CmsJspActionElement(pageContext, request, response).getCmsObject();

String queryParam = request.getParameter("query");
// Compare strings with isEmpty(), not with != "" (which compares references)
String query = (((queryParam != null) && !queryParam.isEmpty()) ? "q=" + queryParam : "")
                + "&fq=type:ddarticle&sort=path asc&rows=5";

CmsSolrResultList hits = OpenCms.getSearchManager().getIndexSolr("Solr Offline").search(cmsO, query);
if (hits.size() > 0) { %>
  New way: 
    <%= hits.getNumFound() %> found / rows <%= hits.getRows() %>
  

  <%
    //////////////////
    // RESULTS LOOP //
    //////////////////
    for (CmsSearchResource resource : hits) { %>
      
        
          Path: <%= resource.getRootPath() %>
          German: <%= resource.getDocument().getFieldValueAsString("Title_de")%>
          English: <%= resource.getDocument().getFieldValueAsString("Title_en")%>
        
       
    <% } %>
  

<% } %>
```

## How to sort text for specific languages? ##

In this example, text is sorted according to the default German rules provided by Java. The
rules for sorting German are taken from Java's locale support (`java.util.Locale`).
Locales are typically defined as a combination of language and country, but you can specify just
the language if you want. For example, if you specify "de" as the language, you will get sorting
that works well for German. If you specify "de" as the language and "CH" as the
country, you will get German sorting specifically tailored for Switzerland. You can see a list of
supported locales [here](http://docs.oracle.com/javase/1.5.0/docs/guide/intl/locale.doc.html). For more general information about how text analysis
works in Solr, have a look at the [Language Analysis](https://cwiki.apache.org/confluence/display/solr/Language+Analysis) page.

```xml


  
    
    
  

...


...


```
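The effect of locale-aware collation can be reproduced with Java's own `java.text.Collator`, independent of Solr. In plain code-point order a word starting with `Ä` (U+00C4) sorts after `Zebra`, because `Ä` comes after all ASCII letters; with the German collator it sorts next to `Apfel`. `GermanSorting` is a hypothetical demo class.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo: plain code-point ordering versus Java's German collation rules.
final class GermanSorting {

    private GermanSorting() {}

    /** Sorts with String.compareTo, i.e. by UTF-16 code unit values. */
    static List<String> byCodePoint(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(null);
        return sorted;
    }

    /** Sorts with the collator Java provides for the German locale. */
    static List<String> german(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(Locale.GERMAN));
        return sorted;
    }
}
```

For `Zebra`, `Äpfel` and `Apfel`, code-point order puts `Äpfel` last, whereas the German collator puts `Zebra` last.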

## How to highlight the search query in results? ##

### Does OpenCms support result highlighting? ###
Yes, use the OpenCms Solr Select handler at:

http://localhost:8080/opencms/opencms/handleSolrSelect

and you will find the highlighting section below the list of documents within the returned
XML/JSON:

```xml

  
    
      YIPI YOHO text text text
    
  
  [...]

```

### Does the Java API of OpenCms support highlighting? ###
Currently the OpenCms search API does not support full-featured Solr highlighting. But you can
make use of Solr's default highlighting mechanism in one of two ways:

1. Call org.opencms.search.solr.CmsSolrResultList#getSolrQueryResponse(), which returns a
SolrQueryResponse documented at: http://lucene.apache.org/solr/api-3_6_1/org/apache/solr/response/SolrQueryResponse.html

2. Use the above-mentioned OpenCms Solr Select handler at:
localhost:8080/opencms/opencms/handleSolrSelect

### Is highlighting a performance killer? ###
Yes. For this reason, highlighting is turned off for the first search. After all non-permitted
resources have been filtered out of the result list, highlighting is performed in a second step.


## Solr indexing questions ##

### Please explain the differences between the "Solr Online and Offline"? ###
As the names suggest, Offline indexes also contain changes that have
not yet been published, while Online indexes only contain those resources that have already
been published. The "Online EN VFS" is a Lucene-based index and also contains only those
resources that have been published.

### When executing a Solr query, does only the solr index get used? ###
No. Permissions are checked afterwards by the OpenCms API.

### Where to find general information about Solr? ###
If you are interested in Solr in general, the [Solr wiki](http://wiki.apache.org/solr/) is a good starting point.
Documentation on the OpenCms side can be found in the distributed PDF file.

### Is there a way to create a full backup of the complete index? ###
You can copy the index folder `WEB-INF/index/${INDEX_NAME}` by hand.

### How to rebuild indexes with a fail-safe? ###
Edit the opencms-search.xml within your WEB-INF/config directory and add the following node
to your index:

```xml
true
```

This will create a snapshot as explained here:

http://wiki.apache.org/solr/CollectionDistribution

### Solr result size gets limited to 50 by default, how to get more than 50 results? ###
To return only permission-checked resources (which is an expensive task), only this limited
number of results is returned. For paging over results, please have a look at the Solr
parameters rows and start: http://wiki.apache.org/solr/CommonQueryParameters

Since version 8.5.x you can increase the number of returned documents to a size of your choice.
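Paging uses the standard `start` (zero-based offset of the first hit) and `rows` (page size) parameters. A minimal sketch of the arithmetic for one-based page numbers; `SolrPaging` is a hypothetical helper, and `numFound` is the total hit count reported in the response.

```java
// Hypothetical helper: computes Solr start/rows values for page-based navigation.
final class SolrPaging {

    private SolrPaging() {}

    /** Zero-based offset of the first hit on a one-based page. */
    static int start(int page, int rows) {
        if (page < 1 || rows < 1) {
            throw new IllegalArgumentException("page and rows must be >= 1");
        }
        return (page - 1) * rows;
    }

    /** Number of pages needed for numFound hits (ceiling division). */
    static long pageCount(long numFound, int rows) {
        return (numFound + rows - 1) / rows;
    }

    /** Query-string fragment for one page, e.g. "start=20&rows=10". */
    static String params(int page, int rows) {
        return "start=" + start(page, rows) + "&rows=" + rows;
    }
}
```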


## Solr mailing list questions ##

### A class cast exception is thrown, what can I do? ###
You have to set the right classes for the index and the field configuration; otherwise the Lucene
search index implementation is used.
```xml
[...]
  [...]

```

### Is it possible to map elements with maxOccurs > 1? ###
Yes. Since version 8.5.1 they are mapped to a multi-valued field.

### How to index OpenCmsDateTime elements? ###
```xml

  

```
This XSD search field mapping will result in multiple Solr fields (one per locale): arelease_<locale>_dt
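Such a date field can then be used in a range query. Solr expects dates in ISO-8601 UTC form such as `2011-05-06T15:27:13Z`, which is what `java.time.Instant#toString` produces for whole seconds. A minimal sketch; `SolrDateRange` is hypothetical, and the field name `arelease_en_dt` simply follows the naming described above.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: builds a Solr range filter on a date field, from a given instant until now.
final class SolrDateRange {

    private SolrDateRange() {}

    static String since(String field, Instant from) {
        // Instant.toString() yields ISO-8601 UTC, e.g. 2013-12-01T10:15:30Z for whole seconds.
        return field + ":[" + from.truncatedTo(ChronoUnit.SECONDS) + " TO NOW]";
    }
}
```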


# Solr development references #

## General documentation ##
- [The official Apache Solr documentation page](http://lucene.apache.org/solr/documentation.html)
- [Apache Solr Reference Guide from lucidworks](http://docs.lucidworks.com/display/solr/Apache+Solr+Reference+Guide)
- [SearchHub as another information source for Apache Solr related topics](http://searchhub.org)

## Performance guides ##
- [Apache Solr performance Wiki page](http://wiki.apache.org/solr/SolrPerformanceFactors)
- [The Seven Deadly Sins of Solr](http://searchhub.org/2010/01/21/the-seven-deadly-sins-of-solr/)

## Spellchecker configuration ##
- [Apache Solr Spellchecker and Configuration](http://www.arunchinnachamy.com/apache-solr-spellchecker-configuration/)
- [Super flexible AutoComplete with Apache Solr](http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/)

## External data sources ##
- [The "Data Import Request Handler" to adapt external data](http://wiki.apache.org/solr/DataImportHandler)

## Permission architecture ##
- [How Nuxeo indexes ACLs](https://github.com/nuxeo/nuxeo-solr/tree/master/architecture)