/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.solr.cloud.api.collections;

import com.codahale.metrics.Timer;
import java.lang.invoke.MethodHandles;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.solr.cloud.OverseerTaskProcessor;
import org.apache.solr.cloud.Stats;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.ZkNodeProps;
import org.apache.solr.common.cloud.ZkStateReader;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.util.stats.MetricUtils;
import org.apache.zookeeper.data.Stat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
* This command returns stats about the Overseer, the cluster state updater and collection API
* activity occurring within the current Overseer node (this is important because distributed
* operations occurring on other nodes are not included in these stats, for example
* distributed cluster state updates or Per Replica States updates).
*
* <p>More fundamentally, when the Collection API command execution is distributed, this specific
* command is not being run on the Overseer anyway (but then not much is running on the
* overseer as cluster state updates are distributed as well) so Overseer stats and status can't be
* returned and actually do not even make sense. Zookeeper based queue metrics do not make sense
* either because Zookeeper queues are then not used.
*
* <p>The {@link Stats} instance returned by {@link CollectionCommandContext#getOverseerStats()}
* when running in the Overseer is created in Overseer.start() and passed to the cluster state
* updater from where it is also propagated to the various Zookeeper queues to register various
* events. This class is the only place where it is used in the Collection API implementation, and
* only to return results.
*
* <p>TODO: create a new command returning node specific Collection API/Config set API/cluster state
* updates stats such as success and failures?
*
* <p>The structure of the returned results is as follows:
*
* <ul>
*   <li>{@code leader}: {@code ID} of the current overseer leader node
*   <li>{@code overseer_queue_size}: count of entries in the {@code /overseer/queue}
*       Zookeeper queue/directory
*   <li>{@code overseer_work_queue_size}: count of entries in the
*       {@code /overseer/queue-work} Zookeeper queue/directory
*   <li>{@code overseer_collection_queue_size}: count of entries in the
*       {@code /overseer/collection-queue-work} Zookeeper queue/directory
*   <li>{@code overseer_operations}: map (of maps) of success and error counts for
*       operations. The operations (keys) tracked in this map are:
*       <ul>
*         <li>{@code am_i_leader} (Overseer checking it is still the elected Overseer as it
*             processes cluster state update messages)
*         <li>{@code configset_}<i>{@code <config set operation>}</i>
*         <li>Cluster state change operation names from {@link
*             org.apache.solr.common.params.CollectionParams.CollectionAction} (not all of them!)
*             and {@link org.apache.solr.cloud.overseer.OverseerAction} (the complete list:
*             {@code create}, {@code delete}, {@code createshard}, {@code deleteshard},
*             {@code addreplica}, {@code addreplicaprop}, {@code deletereplicaprop},
*             {@code balanceshardunique}, {@code modifycollection}, {@code state}, {@code leader},
*             {@code deletecore}, {@code addroutingrule}, {@code removeroutingrule},
*             {@code updateshardstate}, {@code downnode} and {@code quit}, with this last one
*             unlikely to be observed since the Overseer is exiting right away)
*         <li>{@code update_state} (when Overseer cluster state updater persists changes in
*             Zookeeper)
*       </ul>
*       For each key, the value is a map composed of:
*       <ul>
*         <li>{@code requests}: success count of the given operation
*         <li>{@code errors}: error count of the operation
*         <li>More metrics (see below)
*       </ul>
*   <li>{@code collection_operations}: map (of maps) of success and error counts for
*       collection related operations. The operations (keys) tracked in this map are all
*       operations that start with {@code collection_}, but the {@code collection_} prefix is
*       stripped from the returned value. Possible keys are therefore:
*       <ul>
*         <li>{@code am_i_leader}: originating in a stat called {@code collection_am_i_leader}
*             representing Overseer checking it is still the elected Overseer as it processes
*             Collection API and Config Set API messages.
*         <li>Collection API operation names from {@link
*             org.apache.solr.common.params.CollectionParams.CollectionAction} (the stripped
*             {@code collection_} prefix gets added in {@link
*             OverseerCollectionMessageHandler#getTimerName(String)})
*       </ul>
*       For each key, the value is a map composed of:
*       <ul>
*         <li>{@code requests}: success count of the given operation
*         <li>{@code errors}: error count of the operation
*         <li>{@code recent_failures}: an optional entry containing a list of maps, each map
*             having two entries, one with key {@code request} holding the failed request's
*             properties (a {@link ZkNodeProps}) and the other with key {@code response} holding
*             the corresponding response properties (a {@link
*             org.apache.solr.client.solrj.SolrResponse}).
*         <li>More metrics (see below)
*       </ul>
*   <li>{@code overseer_queue}: metrics on operations done on the Zookeeper queue
*       {@code /overseer/queue} (see metrics below).
*       The operations that can be done on the queue and that can be keys whose values are a
*       metrics map are:
*       <ul>
*         <li>{@code offer}
*         <li>{@code peek}
*         <li>{@code peek_wait}
*         <li>{@code peek_wait_forever}
*         <li>{@code peekTopN_wait}
*         <li>{@code peekTopN_wait_forever}
*         <li>{@code poll}
*         <li>{@code remove}
*         <li>{@code remove_event}
*         <li>{@code take}
*       </ul>
*   <li>{@code overseer_internal_queue}: same as above but for queue
*       {@code /overseer/queue-work}
*   <li>{@code collection_queue}: same as above but for queue
*       {@code /overseer/collection-queue-work}
* </ul>
*
* <p>Maps returned as values of keys in {@code overseer_operations},
* {@code collection_operations}, {@code overseer_queue}, {@code overseer_internal_queue}
* and {@code collection_queue} include additional stats. These stats are provided by {@link
* MetricUtils}, and represent metrics on each type of operation execution (be it failed or
* successful), see calls to {@link Stats#time(String)}. The metric keys are:
*
* <ul>
*   <li>{@code avgRequestsPerSecond}
*   <li>{@code 5minRateRequestsPerSecond}
*   <li>{@code 15minRateRequestsPerSecond}
*   <li>{@code avgTimePerRequest}
*   <li>{@code medianRequestTime}
*   <li>{@code 75thPcRequestTime}
*   <li>{@code 95thPcRequestTime}
*   <li>{@code 99thPcRequestTime}
*   <li>{@code 999thPcRequestTime}
* </ul>
*/
public class OverseerStatusCmd implements CollApiCmds.CollectionApiCommand {
  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
  private final CollectionCommandContext ccc;

  public OverseerStatusCmd(CollectionCommandContext ccc) {
    this.ccc = ccc;
  }

  @Override
  public void call(ClusterState state, ZkNodeProps message, NamedList