# This is the implementation of Bullet, a real-time query engine, in Apache Storm.
# The name of the Storm topology
bullet.topology.name: "bullet-topology"
# Enable metrics collection for the topology. The registered metrics classes under bullet.topology.metrics.classes are loaded.
# This enables the default Storm built-in metrics. See: http://storm.apache.org/releases/1.0.1/Metrics.html
# Depending on your cluster configuration, things such as on/off heap usage, CPU usage, garbage collection time/types,
# send/receive queue sizes and overflows, execute/ack/fail counts and other such system metrics can be obtained.
bullet.topology.metrics.enable: false
# Enable the collection of the Bullet built-in metrics. These will be sent to your metrics classes. This is in addition
# to the Storm built-in metrics. Currently supported are:
#   Counting metrics: These behave like org.apache.storm.metric.api.CountMetric but do not reset their value when
#   getValueAndReset() is called. This custom metric is defined in com.yahoo.bullet.storm.metric.AbsoluteCountMetric.
#     1. bullet_active_queries : A metric that counts the number of queries active per JoinBolt. Note that this metric
#        will increase and decrease as queries come and go. Make sure to configure the emit period accordingly,
#        otherwise short-running queries may not even be counted.
#     2. bullet_created_queries : A metric that counts the number of queries created per JoinBolt.
#     3. bullet_improper_queries : A metric that counts the number of semantically incorrect queries created per JoinBolt.
#     4. bullet_rate_exceeded_queries : A metric that counts the number of queries that exceeded the configured rate limit.
#     5. bullet_batched_queries : A metric that counts the number of queries stored in the ReplayBolt(s).
#     6. bullet_active_replays : A metric that counts the number of active replays processed by the QuerySpout(s).
#   Map counting metrics: These behave similarly to the above counting metrics but keep a map of keys and their
#   associated counts rather than a single count. This custom metric is defined in
#   com.yahoo.bullet.storm.metric.MapCountMetric.
#     1. bullet_active_replays : A metric that counts the number of active replays in the ReplayBolt(s) per component
#        being replayed to (where the keys are the component task IDs).
#     2. bullet_created_replays : A metric that counts the number of created replays in the ReplayBolt(s) per component
#        being replayed to (where the keys are the component task IDs).
#   Averaging metrics: These are org.apache.storm.metric.api.ReducedMetrics
#     1. bullet_filter_latency : A metric that computes the average latency to filter a single record in the Filter Bolt.
#        The latency is computed from the record tuple that the Filter Bolts receive from your Data Source component.
#        If the record tuple is of the format (record, timestamp), then Bullet WILL try to read your timestamp as a
#        Long, and subtract it from the time when it finishes filtering a record to compute the filtering latency.
#        You will get a RuntimeException if your tuple contains more than just the record but the second position
#        is not a Long.
bullet.topology.metrics.built.in.enable: false
# The time period mapping from metric name to seconds for emitting the metric. This is passed to Storm when the metrics are registered.
# A default value for all metrics can also be set by using the "default" key.
bullet.topology.metrics.built.in.emit.interval.mapping:
  bullet_active_queries: 10
  default: 10
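# Example for the emit interval mapping (a sketch; the interval values are illustrative, not recommendations). It emits
# bullet_active_queries more often than the other metrics so that short-running queries are more likely to be counted:
#
# bullet.topology.metrics.built.in.emit.interval.mapping:
#   bullet_active_queries: 5
#   bullet_filter_latency: 30
#   default: 60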
# You can list the fully qualified class names of your custom implementations. Each class must implement
# org.apache.storm.metric.api.IMetricsConsumer. Additionally, it must provide a static method with the signature
# public static void register(Config, BulletStormConfig), where Config is of type org.apache.storm.Config and
# BulletStormConfig is of type com.yahoo.bullet.storm.BulletStormConfig.
# Your register method is responsible for calling registerMetricsConsumer to add itself to the provided
# Config. You can add settings such as your argument for the prepare method call, or your parallelism for the
# IMetricsConsumer in this register method. Any custom metrics that you have should also be added to Config by
# adding (not replacing) to the Config.TOPOLOGY_WORKER_METRICS map. For an example, see the implementation of
# com.yahoo.bullet.storm.SigarLoggingMetricsConsumer.
bullet.topology.metrics.classes:
  # This uses the LoggingMetricsConsumer and collects a CPU metric using org.apache.storm.metrics.sigar.CPUMetric. It
  # runs with a parallelism of 1. The CPU metric is a worker level metric.
  - "com.yahoo.bullet.storm.SigarLoggingMetricsConsumer"
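# Example of registering an additional consumer alongside the default one (a sketch). The class
# com.example.metrics.MyMetricsConsumer is a hypothetical name used only for illustration; it would have to implement
# IMetricsConsumer and provide the static register method described for bullet.topology.metrics.classes:
#
# bullet.topology.metrics.enable: true
# bullet.topology.metrics.classes:
#   - "com.yahoo.bullet.storm.SigarLoggingMetricsConsumer"
#   - "com.example.metrics.MyMetricsConsumer"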
# Enable query replay for the topology. If query replay is enabled, filter and join bolts will request replay when they
# start up. Replay bolts will send stored queries in batches to the bolts that need replay.
bullet.topology.replay.enable: false
# Interval in milliseconds at which bolts send query replay requests. Replay requests are sent repeatedly in the event
# that the signal is missed or replay fails. These requests are sent until replay is completed.
bullet.topology.replay.request.interval: 30000
# The max number of queries sent per batch during replay.
bullet.topology.replay.batch.size: 10000
# Enable compression of query batches.
bullet.topology.replay.batch.compress.enable: false
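# Example of turning on query replay (a sketch; the values are illustrative, not recommendations). With these values,
# bolts repeat their replay request every 10 s until replay completes, and stored queries are sent in compressed
# batches of at most 5000:
#
# bullet.topology.replay.enable: true
# bullet.topology.replay.request.interval: 10000
# bullet.topology.replay.batch.size: 5000
# bullet.topology.replay.batch.compress.enable: true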
# The following CPU loads and memory on and off heap control their respective component's CPU
# and memory configuration. These settings are only used for the RAS scheduler.
# The parallelism setting controls the number of executors used for each component.
bullet.topology.dsl.spout.enable: false
bullet.topology.dsl.spout.parallelism: 10
bullet.topology.dsl.spout.cpu.load: 50.0
bullet.topology.dsl.spout.memory.on.heap.load: 256.0
bullet.topology.dsl.spout.memory.off.heap.load: 160.0
bullet.topology.dsl.bolt.enable: false
bullet.topology.dsl.bolt.parallelism: 10
bullet.topology.dsl.bolt.cpu.load: 50.0
bullet.topology.dsl.bolt.memory.on.heap.load: 256.0
bullet.topology.dsl.bolt.memory.off.heap.load: 160.0
bullet.topology.dsl.deserializer.enable: false
bullet.topology.bullet.spout.class.name:
bullet.topology.bullet.spout.args: []
bullet.topology.bullet.spout.parallelism: 10
bullet.topology.bullet.spout.cpu.load: 50.0
bullet.topology.bullet.spout.memory.on.heap.load: 256.0
bullet.topology.bullet.spout.memory.off.heap.load: 160.0
bullet.topology.query.spout.parallelism: 2
bullet.topology.query.spout.cpu.load: 20.0
bullet.topology.query.spout.memory.on.heap.load: 256.0
bullet.topology.query.spout.memory.off.heap.load: 160.0
bullet.topology.tick.spout.cpu.load: 20.0
bullet.topology.tick.spout.memory.on.heap.load: 128.0
bullet.topology.tick.spout.memory.off.heap.load: 160.0
bullet.topology.filter.bolt.parallelism: 16
bullet.topology.filter.bolt.cpu.load: 100.0
bullet.topology.filter.bolt.memory.on.heap.load: 256.0
bullet.topology.filter.bolt.memory.off.heap.load: 160.0
bullet.topology.join.bolt.parallelism: 2
bullet.topology.join.bolt.cpu.load: 100.0
bullet.topology.join.bolt.memory.on.heap.load: 512.0
bullet.topology.join.bolt.memory.off.heap.load: 160.0
bullet.topology.result.bolt.parallelism: 2
bullet.topology.result.bolt.cpu.load: 20.0
bullet.topology.result.bolt.memory.on.heap.load: 256.0
bullet.topology.result.bolt.memory.off.heap.load: 160.0
bullet.topology.loop.bolt.parallelism: 2
bullet.topology.loop.bolt.cpu.load: 20.0
bullet.topology.loop.bolt.memory.on.heap.load: 256.0
bullet.topology.loop.bolt.memory.off.heap.load: 160.0
bullet.topology.replay.bolt.parallelism: 2
bullet.topology.replay.bolt.cpu.load: 100.0
bullet.topology.replay.bolt.memory.on.heap.load: 256.0
bullet.topology.replay.bolt.memory.off.heap.load: 160.0
# This is the number of ticks after which the partitioning (if you use one - see bullet.query.partitioner.enable and
# bullet.query.partitioner.class.name) stats will be reported. The default values for this and the tick interval (see
# bullet.topology.tick.spout.interval.ms) make this every 5 minutes (3000 ticks * 100 ms/tick).
bullet.topology.filter.bolt.stats.report.ticks: 3000
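# Example for the stats report ticks (illustrative value): the reporting period is the number of ticks multiplied by
# bullet.topology.tick.spout.interval.ms. To report roughly every minute with a 100 ms tick, use 600 ticks
# (600 ticks * 100 ms/tick = 60 s):
#
# bullet.topology.filter.bolt.stats.report.ticks: 600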
# This is the number of ticks for which a query will be buffered past its expiry in order to wait for
# the final windows/results to trickle in from the Filter Bolts. If tuples from your Filter Bolts take longer than
# (bullet.topology.tick.spout.interval.ms * this setting) to get to the Join Bolt, that data will be LOST and will not
# be in the final result for the query. If you have bullet.query.window.disable set to true, then this is the value
# that is used in waiting for your final (and only) intermediate results to arrive from your Filter Bolts.
# The defaults make this (100ms * 3) or ~0.3s after the query is done for the final results to come back.
# All queries without windows or with record based windows are buffered past their expiry using this.
bullet.topology.join.bolt.query.post.finish.buffer.ticks: 3
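# Example for the post finish buffer (illustrative value): if tuples can take up to about a second to travel from the
# Filter Bolts to the Join Bolt, a buffer of 10 ticks with a 100 ms tick waits 10 * 100 ms = 1 s after the query is done:
#
# bullet.topology.join.bolt.query.post.finish.buffer.ticks: 10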
# This is the number of ticks for which a query will be delayed in the Join Bolts before it is marked as started. This
# delay in the Join Bolt offsets the windows in this stage so that windows from the Filter Bolts arrive slightly before
# the window is marked as closed in the Join Bolt. This only applies to queries with windows that are time-based.
# If tuples from your Filter Bolts arrive sooner than this (bullet.topology.tick.spout.interval.ms * this setting), that
# data is lost. For this reason, you should make sure the bullet.query.window.min.emit.every.ms is at least twice this.
# The defaults make this (100ms * 2) or ~0.2s.
bullet.topology.join.bolt.query.pre.start.delay.ticks: 2
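# Example of keeping the pre start delay consistent with the window settings (illustrative values). A 5 tick delay with
# a 100 ms tick is 5 * 100 ms = 500 ms, so the minimum window emit interval (a bullet-core setting) should be at least
# 2 * 500 ms = 1000 ms:
#
# bullet.topology.tick.spout.interval.ms: 100
# bullet.topology.join.bolt.query.pre.start.delay.ticks: 5
# bullet.query.window.min.emit.every.ms: 1000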
# Bullet Storm uses ticks under the hood as a "clock"-like mechanism to do metadata and query updates (such as checking
# if queries have expired). This is particularly relevant if your data volume is low since your queries depend on these
# ticks to emit data appropriately. This setting controls how frequently a tick happens: the number of milliseconds
# between ticks. The bullet.query.window.min.emit.every.ms should be several times greater than this value. See
# bullet.topology.join.bolt.query.pre.start.delay.ticks. Also, this cannot be lower than 10 ms.
bullet.topology.tick.spout.interval.ms: 100
# The Loop Bolt switches your PubSub into QUERY_SUBMISSION mode to loop back metadata messages to the rest of the
# topology. If you need settings for your publisher in this mode that are different from the settings
# used in the ResultBolt, override them here. They will be merged into the config and replace the existing settings in
# your Loop Bolts. This setting needs to be a Map if provided. The example below pretends that your PubSub settings
# start with bullet.pubsub.custom. You will provide yours.
# Example:
#
# bullet.topology.loop.bolt.pubsub.overrides:
#   bullet.pubsub.custom.publisher.setting: 1
#   bullet.pubsub.custom.nested.publisher.setting:
#     foo: bar
#     bar: baz
bullet.topology.loop.bolt.pubsub.overrides: {}
# Treats the Bullet connector provided to the DSL Spout (using bullet.dsl.connector.class.name) as a SpoutConnector
# and uses the provided class (provided using bullet.topology.dsl.spout.connector.class.name) as the spout to load for
# the SpoutConnector. This exists so that you can use the DSL Spout with an existing spout that you do not wish to
# modify AND still hook in DSL components like a BulletRecordConverter or a BulletDeserializer (with or without a
# DSLBolt). However, the SpoutConnector you pass with (bullet.dsl.connector.class.name) must implement the read
# interface to read data from your spout.
bullet.topology.dsl.spout.connector.as.spout.enable: false
# The name of the Spout to load if you are using a SpoutConnector as a Spout in your DSL Spout.
bullet.topology.dsl.spout.connector.class.name: null
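# Example of wrapping an existing spout with the DSL Spout (a sketch). Both class names are hypothetical and stand in
# for your own SpoutConnector implementation and your existing spout:
#
# bullet.topology.dsl.spout.enable: true
# bullet.topology.dsl.spout.connector.as.spout.enable: true
# bullet.dsl.connector.class.name: "com.example.MySpoutConnector"
# bullet.topology.dsl.spout.connector.class.name: "com.example.MyExistingSpout"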
########################################################################################################################
################################################# Custom Storm Settings ################################################
########################################################################################################################
# You can add custom storm settings if you want by prefixing them with "bullet.topology.custom."
# They will be added as is to the Storm config when submitting the topology.
# You can also set Storm settings by adding them as "-c setting=value" when submitting your job with "storm jar ..."
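# Example of a custom Storm setting (a sketch). topology.workers is a standard Storm setting, used here only for
# illustration, and the value is not a recommendation. This assumes the "bullet.topology.custom." prefix is what marks
# the setting to be passed through to the Storm config:
#
# bullet.topology.custom.topology.workers: 4
#
# The command line alternative for the same Storm setting (the jar and remaining arguments are elided):
#
# storm jar ... -c topology.workers=4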
########################################################################################################################
######################################### PubSub and PubSub Storm DRPC defaults ########################################
########################################################################################################################
# This is the type of PubSub context to use for the Result Bolt and the Query Spout. The Loop Bolt uses QUERY_SUBMISSION
# since it submits messages using the query publisher of your pubsub.
bullet.pubsub.context.name: "QUERY_PROCESSING"
# This is the name of the concrete implementation of PubSub to use. By default, this is the DRPC implementation of PubSub
# that comes with bullet-storm.
############################################# IMPORTANT! ###################################################
# The DRPC pubsub does not support windowing currently. You should set
# bullet.query.window.disable: true
# if you are using this pubsub. This setting will also cause the topology to not hook in the Loop Bolt. This is required.
############################################# IMPORTANT! ###################################################
bullet.pubsub.class.name: "com.yahoo.bullet.storm.drpc.DRPCPubSub"
# Common DRPC PubSub Settings
# This is your DRPC servers' hostnames.
bullet.pubsub.storm.drpc.servers:
  - "127.0.0.1"
# This is your DRPC function name (a namespace since DRPC is shared in your Storm cluster) to use. Queries and Results
# will be sent to and received from here.
bullet.pubsub.storm.drpc.function: "bullet-query"
# QUERY_SUBMISSION DRPC PubSub Settings
# The following settings form the URL to send the queries to. If you have multiple servers defined for
# bullet.pubsub.storm.drpc.servers, a query will be randomly POSTed to one of them. The URL will be of the form:
# ${drpc.http.protocol}://${random.server}:${drpc.http.port}/${drpc.http.path}/${drpc.function}
bullet.pubsub.storm.drpc.http.protocol: "http"
bullet.pubsub.storm.drpc.http.port: 3774
bullet.pubsub.storm.drpc.http.path: "drpc"
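# Example: with the defaults in this file (the single server "127.0.0.1" and the function "bullet-query"), the URL
# pattern for sending queries works out to:
#
# http://127.0.0.1:3774/drpc/bullet-query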
# This is the timeout in ms for connecting to the DRPC server to send a query.
bullet.pubsub.storm.drpc.http.connect.timeout.ms: 5000
# This is the number of times to retry when connecting to the DRPC server to send a query.
bullet.pubsub.storm.drpc.http.connect.retry.limit: 3
# QUERY_PROCESSING DRPC PubSub Settings
# This is the maximum number of pending queries that can be read by a subscriber (QuerySpout) before an ack is needed.
bullet.pubsub.storm.drpc.max.uncommitted.messages: 50
########################################################################################################################
########################################################################################################################
## You can also configure the core Bullet settings here. For documentation and defaults for those settings, refer to:
## https://github.com/yahoo/bullet-core/blob/master/src/main/resources/bullet_defaults.yaml
## Remember to use bullet.query.window.disable: true if you're using the DRPCPubSub!
########################################################################################################################
########################################################################################################################