
org.apache.lucene.search.spans.package.html






The calculus of spans.

A span is a <doc,startPosition,endPosition> tuple.

The following span query operators are implemented:

  • A {@link org.apache.lucene.search.spans.SpanTermQuery SpanTermQuery} matches all spans containing a particular {@link org.apache.lucene.index.Term Term}.
  • A {@link org.apache.lucene.search.spans.SpanNearQuery SpanNearQuery} matches spans which occur near one another, and can be used to implement things like phrase search (when constructed from {@link org.apache.lucene.search.spans.SpanTermQuery}s) and inter-phrase proximity (when constructed from other {@link org.apache.lucene.search.spans.SpanNearQuery}s).
  • A {@link org.apache.lucene.search.spans.SpanOrQuery SpanOrQuery} merges spans from a number of other {@link org.apache.lucene.search.spans.SpanQuery}s.
  • A {@link org.apache.lucene.search.spans.SpanNotQuery SpanNotQuery} removes spans matching one {@link org.apache.lucene.search.spans.SpanQuery SpanQuery} which overlap (or come near) spans matching another. This can be used, e.g., to implement within-paragraph search.
  • A {@link org.apache.lucene.search.spans.SpanFirstQuery SpanFirstQuery} matches spans of a wrapped query q whose end position is less than n, where q and n are the two constructor arguments. This can be used to constrain matches to the first part of the document.
  • A {@link org.apache.lucene.search.spans.SpanPositionRangeQuery SpanPositionRangeQuery} is a more general form of SpanFirstQuery that can constrain matches to arbitrary portions of the document.

In all cases, output spans are minimally inclusive. In other words, a span formed by matching a span in x and a span in y starts at the lesser of the two starts and ends at the greater of the two ends.
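
The "minimally inclusive" rule can be illustrated with a small stand-alone sketch. The class `SpanSketch` and its `cover` method are hypothetical names invented for this illustration; they are not part of Lucene and only model the <doc,startPosition,endPosition> tuple described above.

```java
/** Hypothetical stand-in for a span tuple; NOT a Lucene class. */
final class SpanSketch {
    final int doc;
    final int start;
    final int end;

    SpanSketch(int doc, int start, int end) {
        this.doc = doc;
        this.start = start;
        this.end = end;
    }

    /** Smallest span that includes both inputs, which must lie in the same document. */
    static SpanSketch cover(SpanSketch x, SpanSketch y) {
        if (x.doc != y.doc) {
            throw new IllegalArgumentException("spans must share a document");
        }
        // Lesser of the two starts, greater of the two ends.
        return new SpanSketch(x.doc, Math.min(x.start, y.start), Math.max(x.end, y.end));
    }

    @Override
    public String toString() {
        return "<" + doc + "," + start + "," + end + ">";
    }

    public static void main(String[] args) {
        SpanSketch x = new SpanSketch(7, 3, 5);
        SpanSketch y = new SpanSketch(7, 9, 11);
        System.out.println(cover(x, y)); // prints <7,3,11>
    }
}
```

The order of the arguments does not matter: `cover(y, x)` yields the same tuple.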

For example, a span query which matches "John Kerry" within ten words of "George Bush" within the first 100 words of the document could be constructed with:

SpanQuery john   = new SpanTermQuery(new Term("content", "john"));
SpanQuery kerry  = new SpanTermQuery(new Term("content", "kerry"));
SpanQuery george = new SpanTermQuery(new Term("content", "george"));
SpanQuery bush   = new SpanTermQuery(new Term("content", "bush"));

SpanQuery johnKerry =
   new SpanNearQuery(new SpanQuery[] {john, kerry}, 0, true);

SpanQuery georgeBush =
   new SpanNearQuery(new SpanQuery[] {george, bush}, 0, true);

SpanQuery johnKerryNearGeorgeBush =
   new SpanNearQuery(new SpanQuery[] {johnKerry, georgeBush}, 10, false);

SpanQuery johnKerryNearGeorgeBushAtStart =
   new SpanFirstQuery(johnKerryNearGeorgeBush, 100);
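
To make the combined constraint concrete, the following sketch checks two phrase spans by hand. It is a simplified model, not Lucene code: `NearFirstSketch`, `gap`, and `matches` are hypothetical names, end positions are assumed to be exclusive, the slop is assumed to count the positions left unmatched between the two spans, and the `<` comparison follows the SpanFirstQuery description on this page.

```java
/** Hypothetical model of the example query; none of these names are Lucene APIs. */
final class NearFirstSketch {

    /** Unmatched positions between spans [s1,e1) and [s2,e2), in either order. */
    static int gap(int s1, int e1, int s2, int e2) {
        return Math.max(s1, s2) - Math.min(e1, e2);
    }

    /** True if the spans lie within slop of each other and the covering span ends before limit. */
    static boolean matches(int s1, int e1, int s2, int e2, int slop, int limit) {
        int coverEnd = Math.max(e1, e2); // minimally inclusive end
        return gap(s1, e1, s2, e2) <= slop && coverEnd < limit;
    }

    public static void main(String[] args) {
        // "john kerry" at positions 4-5 (end 6), "george bush" at positions 12-13 (end 14)
        System.out.println(matches(4, 6, 12, 14, 10, 100)); // prints true
        // Same phrases, but "george bush" starts 34 positions after "john kerry" ends
        System.out.println(matches(4, 6, 40, 42, 10, 100)); // prints false
    }
}
```

In the first call the gap is 12 - 6 = 6, which is within the slop of 10, and the covering span ends at 14, well inside the first 100 positions.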

Span queries may be freely intermixed with other Lucene queries. So, for example, the above query can be restricted to documents which also use the word "iraq" with:

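// Both clauses are required. On newer Lucene versions, where BooleanQuery is
// immutable, build the query with BooleanQuery.Builder instead.
BooleanQuery query = new BooleanQuery();
query.add(johnKerryNearGeorgeBushAtStart, BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("content", "iraq")), BooleanClause.Occur.MUST);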



