org.apache.drill.exec.resourcemgr.config.package-info Maven / Gradle / Ivy

Go to download
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/**
 * This package contains the configuration components of ResourceManagement feature in Drill. ResourceManagement will
 * have it's own configuration file supporting the similar hierarchy of files as supported by Drill's current
 * configuration and supports HOCON format. All the supported files for ResourceManagement is listed in
 * {@link org.apache.drill.common.config.ConfigConstants}. However whether the feature is enabled/disabled is still
 * controlled by a configuration {@link org.apache.drill.exec.ExecConstants#RM_ENABLED} available in the Drill's main
 * configuration file. The rm config files will be parsed and loaded only when the feature is enabled. The
 * configuration is a hierarchical tree {@link org.apache.drill.exec.resourcemgr.config.ResourcePoolTree} of
 * {@link org.apache.drill.exec.resourcemgr.config.ResourcePool}. At the top will be the root pool which represents
 * the entire resources (only memory in version 1) which is available to ResourceManager to use for admitting queries.
 * It is assumed that all the nodes in the Drill cluster is homogeneous and given same amount of memory resources.
 * The root pool can be further divided into child ResourcePools to divide the resources among multiple child pools.
 * Each child pool get's a resource share from it's parent resource pool. In theory there is no limit on the number
 * of ResourcePools that can be configured to divide the cluster resources.
 * 
 *   In addition to other parameters defined later root ResourcePool also supports a configuration
 * {@link org.apache.drill.exec.resourcemgr.config.ResourcePoolTreeImpl#ROOT_POOL_QUEUE_SELECTION_POLICY_KEY} which
 * helps to select exactly one leaf pool out of all the possible options available for a query. For details please
 * see package-info.java of {@link org.apache.drill.exec.resourcemgr.config.selectionpolicy.QueueSelectionPolicy}.
 * {@link org.apache.drill.exec.resourcemgr.config.ResourcePoolTree#selectOneQueue(org.apache.drill.exec.ops.QueryContext,
 * org.apache.drill.exec.resourcemgr.NodeResources)} method is used by parallelizer to get a queue which will be used
 * to admit a query. The selected queue resource constraints are used by parallelizer to allocate proper resources
 * to a query so that it remains within the bounds.
 * 
 * 
 *   The ResourcePools falls under 2 category:
 * 

 *   Intermediate Pool: As the name suggests all the pools between root and leaf pool falls under this
 *   category. It helps to navigate a query through the ResourcePoolTree hierarchy to find leaf pools using selectors.
 *   The intermediate ResourcePool help to subdivide a parent resource pool resource and doesn't have an actual queue
 *   associated with it. A query will only be executed in a queue associated with a ResourcePool not the ResourcePool
 *   itself.
 *   
 *   Leaf Pool: All the ResourcePools which doesn't have any child pools associated with it are leaf
 *   ResourcePools. All the leaf pools should have a unique name associated with it and should always have exactly one
 *   queue configured with it. The queue of a leaf pool is where the queries will be admitted and a resource slice will
 *   be given to it. All the leaf ResourcePools will collectively comprise of all the resource share available to
 *   Drill's ResourceManager to allocate to all the queries.
 *   
 * 
 * Configurations Supported by ResourcePool:
 *   
 *     {@link org.apache.drill.exec.resourcemgr.config.ResourcePoolImpl#POOL_MEMORY_SHARE_KEY}: Percentage of
 *     memory share of parent ResourcePool assigned to this pool
 *     {@link org.apache.drill.exec.resourcemgr.config.ResourcePoolImpl#POOL_SELECTOR_KEY}: A selector assigned
 *     to this pool. For details please see package-info.java of
 *     {@link org.apache.drill.exec.resourcemgr.config.selectors.ResourcePoolSelector}
 *     
 *     {@link org.apache.drill.exec.resourcemgr.config.ResourcePoolImpl#POOL_QUEUE_KEY}: Queue configuration
 *     associated with this pool. It should always be configured for a leaf pool only. If configured with an
 *     intermediate pool then it will be ignored.
 *     
 *   
 * 
 * 
 *   A queue always have 1:1 relationship with a leaf pool. Queries are admitted and executed with a resource slice
 *   from the queue. It supports following configurations:
 *   

 *     {@link org.apache.drill.exec.resourcemgr.config.QueryQueueConfigImpl#MAX_ADMISSIBLE_KEY}: Upper bound on the
 *     total number of queries that can be admitted inside a queue. After this limit is reached all the queries
 *     will be moved to waiting state.
 *     {@link org.apache.drill.exec.resourcemgr.config.QueryQueueConfigImpl#MAX_WAITING_KEY}: Limits the
 *     total number of queries that can be in waiting state inside a queue. After this limit is reached all the new
 *     queries will be failed immediately.
 *      {@link org.apache.drill.exec.resourcemgr.config.QueryQueueConfigImpl#MAX_QUERY_MEMORY_PER_NODE_KEY}:
 *     Limits the maximum memory any query in this queue can consume on any node in the cluster. This is to limit a
 *     query from a queue to consume all the resources on a node so that other queues query can also have some
 *     resources available for it. Ideally it's advised that sum of value of this parameter for all queues should not
 *     exceed the total memory on a node.
 *     
 *      {@link org.apache.drill.exec.resourcemgr.config.QueryQueueConfigImpl#WAIT_FOR_PREFERRED_NODES_KEY}: This
 *     configuration helps to decide if an admitted query in a queue should wait until it has available resources on all
 *     the nodes assigned to it by planner for its execution. By default it's true. When set to false then for the nodes
 *     which doesn't have available resources for a query will be replaced with another node with enough resources.
 *     
 *   
 * 
 * Once all the configuration are parsed an in-memory structures are created then for each query planner will select
 * a queue where a query can be admitted. The queue selection process happens by traversing the ResourcePoolTree. During
 * traversal process the query metadata is evaluated against assigned selector of a ResourcePool. If the selector
 * returns true then traversal continues to it's child pools otherwise it stops there and tries another pool. With
 * the traversal it finds all the leaf pools which are eligible for admitting the query and store that information in
 * {@link org.apache.drill.exec.resourcemgr.config.QueueAssignmentResult}. Later the selected pools are passed to
 * configured QueueSelectionPolicy to select one queue for the query. Planner uses that selected queue's max query
 * memory per node parameter to limit resource assignment to all the fragments of a query on a node. After a query is
 * planned with resource constraints it is sent to leader of that queue to ask for admission. If admitted the query
 * required resources are reserved in global state store and query is executed on the cluster. For details please see
 * the design document and functional spec linked in 
 * DRILL-7026
 */
package org.apache.drill.exec.resourcemgr.config;