diff --git a/docs/35.0.0/api-reference/api-reference.md b/docs/35.0.0/api-reference/api-reference.md new file mode 100644 index 0000000000..dd4a4ab638 --- /dev/null +++ b/docs/35.0.0/api-reference/api-reference.md @@ -0,0 +1,44 @@ +--- +id: api-reference +title: API reference +sidebar_label: Overview +--- + + + + +This topic is an index to the Apache Druid API documentation. + +## HTTP APIs +* [Druid SQL queries](./sql-api.md) to submit SQL queries using the Druid SQL API. +* [SQL-based ingestion](./sql-ingestion-api.md) to submit SQL-based batch ingestion requests. +* [JSON querying](./json-querying-api.md) to submit JSON-based native queries. +* [Tasks](./tasks-api.md) to manage data ingestion operations. +* [Supervisors](./supervisor-api.md) to manage supervisors for data ingestion lifecycle and data processing. +* [Retention rules](./retention-rules-api.md) to define and manage data retention rules across datasources. +* [Data management](./data-management-api.md) to manage data segments. +* [Automatic compaction](./automatic-compaction-api.md) to optimize segment sizes after ingestion. +* [Lookups](./lookups-api.md) to manage and modify key-value datasources. +* [Service status](./service-status-api.md) to monitor components within the Druid cluster. +* [Dynamic configuration](./dynamic-configuration-api.md) to configure the behavior of the Coordinator and Overlord processes. +* [Legacy metadata](./legacy-metadata-api.md) to retrieve datasource metadata. + +## Java APIs +* [SQL JDBC driver](./sql-jdbc.md) to connect to Druid and make Druid SQL queries using the Avatica JDBC driver. \ No newline at end of file diff --git a/docs/35.0.0/api-reference/automatic-compaction-api.md b/docs/35.0.0/api-reference/automatic-compaction-api.md new file mode 100644 index 0000000000..f3744a45f0 --- /dev/null +++ b/docs/35.0.0/api-reference/automatic-compaction-api.md @@ -0,0 +1,1592 @@ +--- +id: automatic-compaction-api +title: Automatic compaction API +sidebar_label: Automatic compaction +--- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + + +This topic describes the status and configuration API endpoints for [automatic compaction using Coordinator duties](../data-management/automatic-compaction.md#auto-compaction-using-coordinator-duties) in Apache Druid. You can configure automatic compaction in the Druid web console or API. + +:::info[Experimental] + +Instead of the automatic compaction API, you can use the supervisor API to submit auto-compaction jobs using compaction supervisors. For more information, see [Auto-compaction using compaction supervisors](../data-management/automatic-compaction.md#auto-compaction-using-compaction-supervisors). + +::: + +In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. Replace it with the information for your deployment. For example, use `http://localhost:8888` for quickstart deployments. + +## Manage automatic compaction + +### Create or update automatic compaction configuration + +Creates or updates the automatic compaction configuration for a datasource. Pass the automatic compaction as a JSON object in the request body. + +The automatic compaction configuration requires only the `dataSource` property. Druid fills all other properties with default values if not specified. See [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) for configuration details. 
+ +Note that this endpoint returns an HTTP `200 OK` message code even if the datasource name does not exist. + +#### URL + +`POST` `/druid/coordinator/v1/config/compaction` + +#### Responses + + + + + + +*Successfully submitted auto compaction configuration* + + + + +--- +#### Sample request + +The following example creates an automatic compaction configuration for the datasource `wikipedia_hour`, which was ingested with `HOUR` segment granularity. This automatic compaction configuration performs compaction on `wikipedia_hour`, resulting in compacted segments that represent a day interval of data. + +In this example: + +* `wikipedia_hour` is a datasource with `HOUR` segment granularity. +* `skipOffsetFromLatest` is set to `PT0S`, meaning that no data is skipped. +* `partitionsSpec` is set to the default `dynamic`, allowing Druid to dynamically determine the optimal partitioning strategy. +* `type` is set to `index_parallel`, meaning that parallel indexing is used. +* `segmentGranularity` is set to `DAY`, meaning that each compacted segment is a day of data. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction"\ +--header 'Content-Type: application/json' \ +--data '{ + "dataSource": "wikipedia_hour", + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "partitionsSpec": { + "type": "dynamic" + }, + "type": "index_parallel" + }, + "granularitySpec": { + "segmentGranularity": "DAY" + } +}' +``` + + + + + +```HTTP +POST /druid/coordinator/v1/config/compaction HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 281 + +{ + "dataSource": "wikipedia_hour", + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "partitionsSpec": { + "type": "dynamic" + }, + "type": "index_parallel" + }, + "granularitySpec": { + "segmentGranularity": "DAY" + } +} +``` + + + + +#### Sample response + +A successful request returns an HTTP `200 OK` message code and an empty response body. + + +### Remove automatic compaction configuration + +Removes the automatic compaction configuration for a datasource. This updates the compaction status of the datasource to "Not enabled." + +#### URL + +`DELETE` `/druid/coordinator/v1/config/compaction/{dataSource}` + +#### Responses + + + + + + +*Successfully deleted automatic compaction configuration* + + + + + +*Datasource does not have automatic compaction or invalid datasource name* + + + + +--- + + +#### Sample request + + + + + + +```shell +curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction/wikipedia_hour" +``` + + + + + +```HTTP +DELETE /druid/coordinator/v1/config/compaction/wikipedia_hour HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +A successful request returns an HTTP `200 OK` message code and an empty response body. + +### Update capacity for compaction tasks + +:::info +This API is now deprecated. Use [Update cluster-level compaction config](#update-cluster-level-compaction-config) instead. +::: + +Updates the capacity for compaction tasks. The minimum number of compaction tasks is 1 and the maximum is 2147483647. + +Note that while the max compaction tasks can theoretically be set to 2147483647, the practical limit is determined by the available cluster capacity and is capped at 10% of the cluster's total capacity. 
+ +#### URL + +`POST` `/druid/coordinator/v1/config/compaction/taskslots` + +#### Query parameters + +To limit the maximum number of compaction tasks, use the optional query parameters `ratio` and `max`: + +* `ratio` (optional) + * Type: Float + * Default: 0.1 + * Limits the ratio of the total task slots to compaction task slots. +* `max` (optional) + * Type: Int + * Default: 2147483647 + * Limits the maximum number of task slots for compaction tasks. + +#### Responses + + + + + + +*Successfully updated compaction configuration* + + + + + +*Invalid `max` value* + + + + +--- + +#### Sample request + + + + + + +```shell +curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction/taskslots?ratio=0.2&max=250000" +``` + + + + + +```HTTP +POST /druid/coordinator/v1/config/compaction/taskslots?ratio=0.2&max=250000 HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +A successful request returns an HTTP `200 OK` message code and an empty response body. + +## View automatic compaction configuration + +### Get all automatic compaction configurations + +Retrieves all automatic compaction configurations. Returns a `compactionConfigs` object containing the active automatic compaction configurations of all datasources. + +You can use this endpoint to retrieve `compactionTaskSlotRatio` and `maxCompactionTaskSlots` values for managing resource allocation of compaction tasks. + +#### URL + +`GET` `/druid/coordinator/v1/config/compaction` + +#### Responses + + + + + + +*Successfully retrieved automatic compaction configurations* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction" +``` + + + + + +```HTTP +GET /druid/coordinator/v1/config/compaction HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "compactionConfigs": [ + { + "dataSource": "wikipedia_hour", + "taskPriority": 25, + "inputSegmentSizeBytes": 100000000000000, + "maxRowsPerSegment": null, + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "maxRowsInMemory": null, + "appendableIndexSpec": null, + "maxBytesInMemory": null, + "maxTotalRows": null, + "splitHintSpec": null, + "partitionsSpec": { + "type": "dynamic", + "maxRowsPerSegment": 5000000, + "maxTotalRows": null + }, + "indexSpec": null, + "indexSpecForIntermediatePersists": null, + "maxPendingPersists": null, + "pushTimeout": null, + "segmentWriteOutMediumFactory": null, + "maxNumConcurrentSubTasks": null, + "maxRetry": null, + "taskStatusCheckPeriodMs": null, + "chatHandlerTimeout": null, + "chatHandlerNumRetries": null, + "maxNumSegmentsToMerge": null, + "totalNumMergeTasks": null, + "maxColumnsToMerge": null, + "type": "index_parallel", + "forceGuaranteedRollup": false + }, + "granularitySpec": { + "segmentGranularity": "DAY", + "queryGranularity": null, + "rollup": null + }, + "dimensionsSpec": null, + "metricsSpec": null, + "transformSpec": null, + "ioConfig": null, + "taskContext": null + }, + { + "dataSource": "wikipedia", + "taskPriority": 25, + "inputSegmentSizeBytes": 100000000000000, + "maxRowsPerSegment": null, + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "maxRowsInMemory": null, + "appendableIndexSpec": null, + "maxBytesInMemory": null, + "maxTotalRows": null, + "splitHintSpec": null, + "partitionsSpec": { + "type": "dynamic", + "maxRowsPerSegment": 5000000, + "maxTotalRows": null + }, + "indexSpec": null, + "indexSpecForIntermediatePersists": null, + "maxPendingPersists": null, + "pushTimeout": null, + "segmentWriteOutMediumFactory": null, + "maxNumConcurrentSubTasks": null, + "maxRetry": null, + "taskStatusCheckPeriodMs": null, + "chatHandlerTimeout": null, + "chatHandlerNumRetries": null, + "maxNumSegmentsToMerge": null, + "totalNumMergeTasks": null, + "maxColumnsToMerge": null, + "type": "index_parallel", + "forceGuaranteedRollup": false + }, + "granularitySpec": { + "segmentGranularity": "DAY", + "queryGranularity": null, + "rollup": null + }, + "dimensionsSpec": null, + "metricsSpec": null, + "transformSpec": null, + "ioConfig": null, + "taskContext": null + } + ], + "compactionTaskSlotRatio": 0.1, + "maxCompactionTaskSlots": 2147483647, + +} +``` +
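+
+To scan a large response for just the configured datasources and the current task slot settings, you can filter it client-side. A minimal sketch, assuming the `jq` utility is available on the machine issuing the request:
+
+```shell
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction" \
+  | jq '{dataSources: [.compactionConfigs[].dataSource], compactionTaskSlotRatio, maxCompactionTaskSlots}'
+```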
+ +### Get automatic compaction configuration + +Retrieves the automatic compaction configuration for a datasource. + +#### URL + +`GET` `/druid/coordinator/v1/config/compaction/{dataSource}` + +#### Responses + + + + + + +*Successfully retrieved configuration for datasource* + + + + + +*Invalid datasource or datasource does not have automatic compaction enabled* + + + + +--- + +#### Sample request + +The following example retrieves the automatic compaction configuration for datasource `wikipedia_hour`. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction/wikipedia_hour" +``` + + + + + +```HTTP +GET /druid/coordinator/v1/config/compaction/wikipedia_hour HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "dataSource": "wikipedia_hour", + "taskPriority": 25, + "inputSegmentSizeBytes": 100000000000000, + "maxRowsPerSegment": null, + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "maxRowsInMemory": null, + "appendableIndexSpec": null, + "maxBytesInMemory": null, + "maxTotalRows": null, + "splitHintSpec": null, + "partitionsSpec": { + "type": "dynamic", + "maxRowsPerSegment": 5000000, + "maxTotalRows": null + }, + "indexSpec": null, + "indexSpecForIntermediatePersists": null, + "maxPendingPersists": null, + "pushTimeout": null, + "segmentWriteOutMediumFactory": null, + "maxNumConcurrentSubTasks": null, + "maxRetry": null, + "taskStatusCheckPeriodMs": null, + "chatHandlerTimeout": null, + "chatHandlerNumRetries": null, + "maxNumSegmentsToMerge": null, + "totalNumMergeTasks": null, + "maxColumnsToMerge": null, + "type": "index_parallel", + "forceGuaranteedRollup": false + }, + "granularitySpec": { + "segmentGranularity": "DAY", + "queryGranularity": null, + "rollup": null + }, + "dimensionsSpec": null, + "metricsSpec": null, + "transformSpec": null, + "ioConfig": null, + "taskContext": null +} +``` +
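+
+Because a datasource without automatic compaction returns a 404, the HTTP status code alone is enough to check whether auto-compaction is enabled. A minimal sketch, assuming the placeholders are replaced with your Router address:
+
+```shell
+# Prints 200 if wikipedia_hour has an auto-compaction configuration, 404 otherwise.
+curl -s -o /dev/null -w "%{http_code}\n" \
+  "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction/wikipedia_hour"
+```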
+ +### Get automatic compaction configuration history + +Retrieves the history of the automatic compaction configuration for a datasource. Returns an empty list if the datasource does not exist or there is no compaction history for the datasource. + +The response contains a list of objects with the following keys: +* `globalConfig`: A JSON object containing automatic compaction configuration that applies to the entire cluster. +* `compactionConfig`: A JSON object containing the automatic compaction configuration for the datasource. +* `auditInfo`: A JSON object containing information about the change made, such as `author`, `comment` or `ip`. +* `auditTime`: The date and time when the change was made. + +#### URL + +`GET` `/druid/coordinator/v1/config/compaction/{dataSource}/history` + +#### Query parameters +* `interval` (optional) + * Type: ISO-8601 + * Limits the results within a specified interval. Use `/` as the delimiter for the interval string. +* `count` (optional) + * Type: Int + * Limits the number of results. + +#### Responses + + + + + + +*Successfully retrieved configuration history* + + + + + +*Invalid `count` value* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction/wikipedia_hour/history" +``` + + + + + +```HTTP +GET /druid/coordinator/v1/config/compaction/wikipedia_hour/history HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +[ + { + "globalConfig": { + "compactionTaskSlotRatio": 0.1, + "maxCompactionTaskSlots": 2147483647, + "compactionPolicy": { + "type": "newestSegmentFirst", + "priorityDatasource": "wikipedia" + }, + "useSupervisors": true, + "engine": "native" + }, + "compactionConfig": { + "dataSource": "wikipedia_hour", + "taskPriority": 25, + "inputSegmentSizeBytes": 100000000000000, + "maxRowsPerSegment": null, + "skipOffsetFromLatest": "P1D", + "tuningConfig": null, + "granularitySpec": { + "segmentGranularity": "DAY", + "queryGranularity": null, + "rollup": null + }, + "dimensionsSpec": null, + "metricsSpec": null, + "transformSpec": null, + "ioConfig": null, + "taskContext": null + }, + "auditInfo": { + "author": "", + "comment": "", + "ip": "127.0.0.1" + }, + "auditTime": "2023-07-31T18:15:19.302Z" + }, + { + "globalConfig": { + "compactionTaskSlotRatio": 0.1, + "maxCompactionTaskSlots": 2147483647, + "compactionPolicy": { + "type": "newestSegmentFirst" + }, + "useSupervisors": false, + "engine": "native" + }, + "compactionConfig": { + "dataSource": "wikipedia_hour", + "taskPriority": 25, + "inputSegmentSizeBytes": 100000000000000, + "maxRowsPerSegment": null, + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "maxRowsInMemory": null, + "appendableIndexSpec": null, + "maxBytesInMemory": null, + "maxTotalRows": null, + "splitHintSpec": null, + "partitionsSpec": { + "type": "dynamic", + "maxRowsPerSegment": 5000000, + "maxTotalRows": null + }, + "indexSpec": null, + "indexSpecForIntermediatePersists": null, + "maxPendingPersists": null, + "pushTimeout": null, + "segmentWriteOutMediumFactory": null, + "maxNumConcurrentSubTasks": null, + "maxRetry": null, + "taskStatusCheckPeriodMs": null, + "chatHandlerTimeout": null, + "chatHandlerNumRetries": null, + "maxNumSegmentsToMerge": null, + "totalNumMergeTasks": null, + "maxColumnsToMerge": null, + "type": "index_parallel", + "forceGuaranteedRollup": false + }, + "granularitySpec": { + "segmentGranularity": "DAY", + "queryGranularity": null, + "rollup": null + }, + "dimensionsSpec": null, + "metricsSpec": null, + "transformSpec": null, + "ioConfig": null, + "taskContext": null + }, + "auditInfo": { + "author": "", + "comment": "", + "ip": "127.0.0.1" + }, + "auditTime": "2023-07-31T18:16:16.362Z" + } +] +``` +
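+
+To see only the most recent change rather than the full history, you can sort the returned array by `auditTime` client-side. A minimal sketch, assuming `jq` is available:
+
+```shell
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction/wikipedia_hour/history" \
+  | jq 'max_by(.auditTime) | {auditTime, auditInfo, compactionConfig}'
+```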
+ +## View automatic compaction status + +### Get segments awaiting compaction + +Returns the total size of segments awaiting compaction for a given datasource. Returns a 404 response if a datasource does not have automatic compaction enabled. + +#### URL + +`GET` `/druid/coordinator/v1/compaction/progress?dataSource={dataSource}` + +#### Query parameter +* `dataSource` (required) + * Type: String + * Name of the datasource for this status information. + +#### Responses + + + + + + +*Successfully retrieved segment size awaiting compaction* + + + + + +*Unknown datasource name or datasource does not have automatic compaction enabled* + + + + +--- + +#### Sample request + +The following example retrieves the remaining segments to be compacted for datasource `wikipedia_hour`. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/compaction/progress?dataSource=wikipedia_hour" +``` + + + + + +```HTTP +GET /druid/coordinator/v1/compaction/progress?dataSource=wikipedia_hour HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "remainingSegmentSize": 7615837 +} +``` +
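+
+Since automatic compaction runs periodically, this endpoint is convenient for polling until a datasource has no eligible data left to compact. A minimal sketch, assuming `jq` is available and a 60-second polling interval is acceptable:
+
+```shell
+while true; do
+  remaining=$(curl -s "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/compaction/progress?dataSource=wikipedia_hour" \
+    | jq -r '.remainingSegmentSize')
+  echo "Bytes awaiting compaction: ${remaining}"
+  [ "${remaining}" = "0" ] && break
+  sleep 60
+done
+```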
+ + +### Get compaction status and statistics + +Retrieves an array of `latestStatus` objects representing the status and statistics from the latest automatic compaction run for all datasources with automatic compaction enabled. + +#### Compaction status response + +The `latestStatus` object has the following properties: +* `dataSource`: Name of the datasource for this status information. +* `scheduleStatus`: Automatic compaction scheduling status. Possible values are `NOT_ENABLED` and `RUNNING`. Returns `RUNNING ` if the datasource has an active automatic compaction configuration submitted. Otherwise, returns `NOT_ENABLED`. +* `bytesAwaitingCompaction`: Total bytes of this datasource waiting to be compacted by the automatic compaction (only consider intervals/segments that are eligible for automatic compaction). +* `bytesCompacted`: Total bytes of this datasource that are already compacted with the spec set in the automatic compaction configuration. +* `bytesSkipped`: Total bytes of this datasource that are skipped (not eligible for automatic compaction) by the automatic compaction. +* `segmentCountAwaitingCompaction`: Total number of segments of this datasource waiting to be compacted by the automatic compaction (only consider intervals/segments that are eligible for automatic compaction). +* `segmentCountCompacted`: Total number of segments of this datasource that are already compacted with the spec set in the automatic compaction configuration. +* `segmentCountSkipped`: Total number of segments of this datasource that are skipped (not eligible for automatic compaction) by the automatic compaction. +* `intervalCountAwaitingCompaction`: Total number of intervals of this datasource waiting to be compacted by the automatic compaction (only consider intervals/segments that are eligible for automatic compaction). +* `intervalCountCompacted`: Total number of intervals of this datasource that are already compacted with the spec set in the automatic compaction configuration. +* `intervalCountSkipped`: Total number of intervals of this datasource that are skipped (not eligible for automatic compaction) by the automatic compaction. + +#### URL + +`GET` `/druid/coordinator/v1/compaction/status` + +#### Query parameters +* `dataSource` (optional) + * Type: String + * Filter the result by name of a specific datasource. + +#### Responses + + + + + + +*Successfully retrieved `latestStatus` object* + + + + +--- +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/compaction/status" +``` + + + + + +```HTTP +GET /druid/coordinator/v1/compaction/status HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "latestStatus": [ + { + "dataSource": "wikipedia_api", + "scheduleStatus": "RUNNING", + "bytesAwaitingCompaction": 0, + "bytesCompacted": 0, + "bytesSkipped": 64133616, + "segmentCountAwaitingCompaction": 0, + "segmentCountCompacted": 0, + "segmentCountSkipped": 8, + "intervalCountAwaitingCompaction": 0, + "intervalCountCompacted": 0, + "intervalCountSkipped": 1 + }, + { + "dataSource": "wikipedia_hour", + "scheduleStatus": "RUNNING", + "bytesAwaitingCompaction": 0, + "bytesCompacted": 5998634, + "bytesSkipped": 0, + "segmentCountAwaitingCompaction": 0, + "segmentCountCompacted": 1, + "segmentCountSkipped": 0, + "intervalCountAwaitingCompaction": 0, + "intervalCountCompacted": 1, + "intervalCountSkipped": 0 + } + ] +} +``` +
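+
+The raw byte counts can be turned into a quick per-datasource summary, for example the percentage of eligible bytes that are already compacted. A minimal sketch, assuming `jq` is available:
+
+```shell
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/compaction/status" \
+  | jq '.latestStatus[] | {
+      dataSource,
+      percentCompacted: (if .bytesCompacted + .bytesAwaitingCompaction > 0
+        then 100 * .bytesCompacted / (.bytesCompacted + .bytesAwaitingCompaction)
+        else null end)
+    }'
+```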
+ +## [Experimental] Unified Compaction APIs + +This section describes the new unified compaction APIs which can be used regardless of whether compaction supervisors are enabled (i.e. `useSupervisors` is `true`) or not in the compaction dynamic config. + +- If compaction supervisors are disabled, the APIs read or write the compaction dynamic config, same as the Coordinator-based compaction APIs above. +- If compaction supervisors are enabled, the APIs read or write the corresponding compaction supervisors. In conjunction with the APIs described below, the supervisor APIs may also be used to read or write the compaction supervisors as they offer greater flexibility and also serve information related to supervisor and task statuses. + +### Update cluster-level compaction config + +Updates cluster-level configuration for compaction tasks which applies to all datasources, unless explicitly overridden in the datasource compaction config. +This includes the following fields: + +|Config|Description|Default value| +|------|-----------|-------------| +|`compactionTaskSlotRatio`|Ratio of number of slots taken up by compaction tasks to the number of total task slots across all workers.|0.1| +|`maxCompactionTaskSlots`|Maximum number of task slots that can be taken up by compaction tasks and sub-tasks. Minimum number of task slots available for compaction is 1. When using MSQ engine or Native engine with range partitioning, a single compaction job occupies more than one task slot. In this case, the minimum is 2 so that at least one compaction job can always run in the cluster.|2147483647 (i.e. total task slots)| +|`compactionPolicy`|Policy to choose intervals for compaction. Currently, the only supported policy is [Newest segment first](#compaction-policy-newestsegmentfirst).|Newest segment first| +|`useSupervisors`|Whether compaction should be run on Overlord using supervisors instead of Coordinator duties.|false| +|`engine`|Engine used for running compaction tasks, unless overridden in the datasource-level compaction config. Possible values are `native` and `msq`. `msq` engine can be used for compaction only if `useSupervisors` is `true`.|`native`| + +#### Compaction policy `newestSegmentFirst` + +|Field|Description|Default value| +|-----|-----------|-------------| +|`type`|This must always be `newestSegmentFirst`|| +|`priorityDatasource`|Datasource to prioritize for compaction. The intervals of this datasource are chosen for compaction before the intervals of any other datasource. 
Within this datasource, the intervals are prioritized based on the chosen compaction policy.|None| + + +#### URL + +`POST` `/druid/indexer/v1/compaction/config/cluster` + +#### Responses + + + + + + +*Successfully updated compaction configuration* + + + + + +*Invalid `max` value* + + + + +--- + +#### Sample request + + + + + + +```shell +curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/config/cluster" \ +--header 'Content-Type: application/json' \ +--data '{ + "compactionTaskSlotRatio": 0.5, + "maxCompactionTaskSlots": 1500, + "compactionPolicy": { + "type": "newestSegmentFirst", + "priorityDatasource": "wikipedia" + }, + "useSupervisors": true, + "engine": "msq" +}' +``` + + + + + +```HTTP +POST /druid/indexer/v1/compaction/config/cluster HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json + +{ + "compactionTaskSlotRatio": 0.5, + "maxCompactionTaskSlots": 1500, + "compactionPolicy": { + "type": "newestSegmentFirst", + "priorityDatasource": "wikipedia" + }, + "useSupervisors": true, + "engine": "msq" +} +``` + + + + +#### Sample response + +A successful request returns an HTTP `200 OK` message code and an empty response body. + +### Get cluster-level compaction config + +Retrieves cluster-level configuration for compaction tasks which applies to all datasources, unless explicitly overridden in the datasource compaction config. +This includes all the fields listed in [Update cluster-level compaction config](#update-cluster-level-compaction-config). + +#### URL + +`GET` `/druid/indexer/v1/compaction/config/cluster` + +#### Responses + + + + + +*Successfully retrieved cluster compaction configuration* + + + + +--- + +#### Sample request + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/config/cluster" +``` + + + + +```HTTP +GET /druid/indexer/v1/compaction/config/cluster HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response +
+ View the response + +```json +{ + "compactionTaskSlotRatio": 0.5, + "maxCompactionTaskSlots": 1500, + "compactionPolicy": { + "type": "newestSegmentFirst", + "priorityDatasource": "wikipedia" + }, + "useSupervisors": true, + "engine": "msq" +} +``` + +
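+
+Whether a `POST` to the update endpoint merges partial payloads is not covered here, so a safe pattern is to read the current cluster-level config, change only the field you need, and submit the whole object back. A minimal sketch that enables compaction supervisors, assuming `jq` is available:
+
+```shell
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/config/cluster" \
+  | jq '.useSupervisors = true' \
+  | curl -s --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/config/cluster" \
+      --header 'Content-Type: application/json' \
+      --data @-
+```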
+ +### Get automatic compaction configurations for all datasources + +Retrieves all datasource compaction configurations. + +#### URL + +`GET` `/druid/indexer/v1/compaction/config/datasources` + +#### Responses + + + + + + +*Successfully retrieved automatic compaction configurations* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/config/datasources" +``` + + + + + +```HTTP +GET /druid/indexer/v1/compaction/config/datasources HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "compactionConfigs": [ + { + "dataSource": "wikipedia_hour", + "taskPriority": 25, + "inputSegmentSizeBytes": 100000000000000, + "maxRowsPerSegment": null, + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "maxRowsInMemory": null, + "appendableIndexSpec": null, + "maxBytesInMemory": null, + "maxTotalRows": null, + "splitHintSpec": null, + "partitionsSpec": { + "type": "dynamic", + "maxRowsPerSegment": 5000000, + "maxTotalRows": null + }, + "indexSpec": null, + "indexSpecForIntermediatePersists": null, + "maxPendingPersists": null, + "pushTimeout": null, + "segmentWriteOutMediumFactory": null, + "maxNumConcurrentSubTasks": null, + "maxRetry": null, + "taskStatusCheckPeriodMs": null, + "chatHandlerTimeout": null, + "chatHandlerNumRetries": null, + "maxNumSegmentsToMerge": null, + "totalNumMergeTasks": null, + "maxColumnsToMerge": null, + "type": "index_parallel", + "forceGuaranteedRollup": false + }, + "granularitySpec": { + "segmentGranularity": "DAY", + "queryGranularity": null, + "rollup": null + }, + "dimensionsSpec": null, + "metricsSpec": null, + "transformSpec": null, + "ioConfig": null, + "taskContext": null + }, + { + "dataSource": "wikipedia", + "taskPriority": 25, + "inputSegmentSizeBytes": 100000000000000, + "maxRowsPerSegment": null, + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "maxRowsInMemory": null, + "appendableIndexSpec": null, + "maxBytesInMemory": null, + "maxTotalRows": null, + "splitHintSpec": null, + "partitionsSpec": { + "type": "dynamic", + "maxRowsPerSegment": 5000000, + "maxTotalRows": null + }, + "indexSpec": null, + "indexSpecForIntermediatePersists": null, + "maxPendingPersists": null, + "pushTimeout": null, + "segmentWriteOutMediumFactory": null, + "maxNumConcurrentSubTasks": null, + "maxRetry": null, + "taskStatusCheckPeriodMs": null, + "chatHandlerTimeout": null, + "chatHandlerNumRetries": null, + "maxNumSegmentsToMerge": null, + "totalNumMergeTasks": null, + "maxColumnsToMerge": null, + "type": "index_parallel", + "forceGuaranteedRollup": false + }, + "granularitySpec": { + "segmentGranularity": "DAY", + "queryGranularity": null, + "rollup": null + }, + "dimensionsSpec": null, + "metricsSpec": null, + "transformSpec": null, + "ioConfig": null, + "taskContext": null + } + ] +} +``` +
+ +### Get automatic compaction configuration for a datasource + +Retrieves the automatic compaction configuration for a datasource. + +#### URL + +`GET` `/druid/indexer/v1/compaction/config/datasources/{dataSource}` + +#### Responses + + + + + + +*Successfully retrieved configuration for datasource* + + + + + +*Invalid datasource or datasource does not have automatic compaction enabled* + + + + +--- + +#### Sample request + +The following example retrieves the automatic compaction configuration for datasource `wikipedia_hour`. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/config/datasources/wikipedia_hour" +``` + + + + + +```HTTP +GET /druid/indexer/v1/compaction/config/datasources/wikipedia_hour HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "dataSource": "wikipedia_hour", + "taskPriority": 25, + "inputSegmentSizeBytes": 100000000000000, + "maxRowsPerSegment": null, + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "maxRowsInMemory": null, + "appendableIndexSpec": null, + "maxBytesInMemory": null, + "maxTotalRows": null, + "splitHintSpec": null, + "partitionsSpec": { + "type": "dynamic", + "maxRowsPerSegment": 5000000, + "maxTotalRows": null + }, + "indexSpec": null, + "indexSpecForIntermediatePersists": null, + "maxPendingPersists": null, + "pushTimeout": null, + "segmentWriteOutMediumFactory": null, + "maxNumConcurrentSubTasks": null, + "maxRetry": null, + "taskStatusCheckPeriodMs": null, + "chatHandlerTimeout": null, + "chatHandlerNumRetries": null, + "maxNumSegmentsToMerge": null, + "totalNumMergeTasks": null, + "maxColumnsToMerge": null, + "type": "index_parallel", + "forceGuaranteedRollup": false + }, + "granularitySpec": { + "segmentGranularity": "DAY", + "queryGranularity": null, + "rollup": null + }, + "dimensionsSpec": null, + "metricsSpec": null, + "transformSpec": null, + "ioConfig": null, + "taskContext": null +} +``` +
+ +### Create or update automatic compaction configuration for a datasource + +Creates or updates the automatic compaction configuration for a datasource. Pass the automatic compaction as a JSON object in the request body. + +The automatic compaction configuration requires only the `dataSource` property. Druid fills all other properties with default values if not specified. See [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) for configuration details. + +Note that this endpoint returns an HTTP `200 OK` message code even if the datasource name does not exist. + +#### URL + +`POST` `/druid/indexer/v1/compaction/config/datasources/wikipedia_hour` + +#### Responses + + + + + + +*Successfully submitted auto compaction configuration* + + + + +--- +#### Sample request + +The following example creates an automatic compaction configuration for the datasource `wikipedia_hour`, which was ingested with `HOUR` segment granularity. This automatic compaction configuration performs compaction on `wikipedia_hour`, resulting in compacted segments that represent a day interval of data. + +In this example: + +* `wikipedia_hour` is a datasource with `HOUR` segment granularity. +* `skipOffsetFromLatest` is set to `PT0S`, meaning that no data is skipped. +* `partitionsSpec` is set to the default `dynamic`, allowing Druid to dynamically determine the optimal partitioning strategy. +* `type` is set to `index_parallel`, meaning that parallel indexing is used. +* `segmentGranularity` is set to `DAY`, meaning that each compacted segment is a day of data. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/config/datasources/wikipedia_hour"\ +--header 'Content-Type: application/json' \ +--data '{ + "dataSource": "wikipedia_hour", + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "partitionsSpec": { + "type": "dynamic" + }, + "type": "index_parallel" + }, + "granularitySpec": { + "segmentGranularity": "DAY" + } +}' +``` + + + + + +```HTTP +POST /druid/indexer/v1/compaction/config/datasources/wikipedia_hour HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 281 + +{ + "dataSource": "wikipedia_hour", + "skipOffsetFromLatest": "PT0S", + "tuningConfig": { + "partitionsSpec": { + "type": "dynamic" + }, + "type": "index_parallel" + }, + "granularitySpec": { + "segmentGranularity": "DAY" + } +} +``` + + + + +#### Sample response + +A successful request returns an HTTP `200 OK` message code and an empty response body. + + +### Delete automatic compaction configuration for a datasource + +Removes the automatic compaction configuration for a datasource. This updates the compaction status of the datasource to "Not enabled." + +#### URL + +`DELETE` `/druid/indexer/v1/compaction/config/datasources/{dataSource}` + +#### Responses + + + + + + +*Successfully deleted automatic compaction configuration* + + + + + +*Datasource does not have automatic compaction or invalid datasource name* + + + + +--- + + +#### Sample request + + + + + + +```shell +curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/config/datasources/wikipedia_hour" +``` + + + + + +```HTTP +DELETE /druid/indexer/v1/compaction/config/wikipedia_hour HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +A successful request returns an HTTP `200 OK` message code and an empty response body. 
+ +### Get compaction status for all datasources + +Retrieves an array of `latestStatus` objects representing the status and statistics from the latest automatic compaction run for all the datasources to which the user has read access. +The response payload is in the same format as [Compaction status response](#compaction-status-response). + +#### URL + +`GET` `/druid/indexer/v1/compaction/status/datasources` + +#### Responses + + + + + + +*Successfully retrieved `latestStatus` object* + + + + +--- +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/status/datasources" +``` + + + + + +```HTTP +GET /druid/indexer/v1/compaction/status/datasources HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "latestStatus": [ + { + "dataSource": "wikipedia_api", + "scheduleStatus": "RUNNING", + "bytesAwaitingCompaction": 0, + "bytesCompacted": 0, + "bytesSkipped": 64133616, + "segmentCountAwaitingCompaction": 0, + "segmentCountCompacted": 0, + "segmentCountSkipped": 8, + "intervalCountAwaitingCompaction": 0, + "intervalCountCompacted": 0, + "intervalCountSkipped": 1 + }, + { + "dataSource": "wikipedia_hour", + "scheduleStatus": "RUNNING", + "bytesAwaitingCompaction": 0, + "bytesCompacted": 5998634, + "bytesSkipped": 0, + "segmentCountAwaitingCompaction": 0, + "segmentCountCompacted": 1, + "segmentCountSkipped": 0, + "intervalCountAwaitingCompaction": 0, + "intervalCountCompacted": 1, + "intervalCountSkipped": 0 + } + ] +} +``` +
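+
+To get a single cluster-wide number, such as the total bytes still awaiting compaction across all visible datasources, you can aggregate the response client-side. A minimal sketch, assuming `jq` is available:
+
+```shell
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/status/datasources" \
+  | jq '[.latestStatus[].bytesAwaitingCompaction] | add'
+```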
+ +### Get compaction status for a single datasource + +Retrieves the latest status from the latest automatic compaction run for a datasource. The response payload is in the same format as [Compaction status response](#compaction-status-response) with zero or one entry. + +#### URL + +`GET` `/druid/indexer/v1/compaction/status/datasources/{dataSource}` + +#### Responses + + + + + + +*Successfully retrieved `latestStatus` object* + + + + +--- +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/status/datasources/wikipedia_hour" +``` + + + + + +```HTTP +GET /druid/indexer/v1/compaction/status/datasources/wikipedia_hour HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "latestStatus": [ + { + "dataSource": "wikipedia_hour", + "scheduleStatus": "RUNNING", + "bytesAwaitingCompaction": 0, + "bytesCompacted": 5998634, + "bytesSkipped": 0, + "segmentCountAwaitingCompaction": 0, + "segmentCountCompacted": 1, + "segmentCountSkipped": 0, + "intervalCountAwaitingCompaction": 0, + "intervalCountCompacted": 1, + "intervalCountSkipped": 0 + } + ] +} +``` +
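+
+Because the `latestStatus` array can contain zero entries for a datasource, a script should handle that case explicitly. A minimal sketch that prints the schedule status or falls back to `NOT_ENABLED`, assuming `jq` is available:
+
+```shell
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/compaction/status/datasources/wikipedia_hour" \
+  | jq -r '.latestStatus[0].scheduleStatus // "NOT_ENABLED"'
+```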
diff --git a/docs/35.0.0/api-reference/data-management-api.md b/docs/35.0.0/api-reference/data-management-api.md new file mode 100644 index 0000000000..fe37c6a814 --- /dev/null +++ b/docs/35.0.0/api-reference/data-management-api.md @@ -0,0 +1,607 @@ +--- +id: data-management-api +title: Data management API +sidebar_label: Data management +--- + + + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +This topic describes the data management API endpoints for Apache Druid. +This includes information on how to mark segments as used or unused and delete them from Druid. + +In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. +Replace it with the information for your deployment. +For example, use `http://localhost:8888` for quickstart deployments. + +:::info +- Coordinator APIs for data management are now deprecated. Use new APIs served by the Overlord instead. +- Do not use these APIs while an indexing task or kill task is in progress for the same datasource and interval. +::: + +## Segment management + +You can mark segments as used by sending POST requests to the datasource, but the Coordinator may subsequently mark segments as unused if they meet any configured [drop rules](../operations/rule-configuration.md#drop-rules). +Even if these API requests update segments to used, you still need to configure a [load rule](../operations/rule-configuration.md#load-rules) to load them onto Historical processes. + +When you use these APIs concurrently with an indexing task or a kill task, the behavior is undefined. +Druid terminates some segments and marks others as used. +Furthermore, it is possible that all segments could be unused, yet an indexing task might still be able to read data from these segments and complete successfully. + +All of the following APIs, except [Segment deletion](#segment-deletion) are served by the Overlord as it is the service responsible for performing actions on segment metadata on behalf of indexing tasks. +This makes it the single source of truth for segment metadata, thus ensuring a consistent view across the Druid cluster and allowing the Overlord to cache metadata to improve performance. + +### Segment IDs + +You must provide segment IDs when using many of the endpoints described in this topic. +For information on segment IDs, see [Segment identification](../design/segments.md#segment-identification). +For information on finding segment IDs in the web console, see [Segments](../operations/web-console.md#segments). + +### Mark a single segment unused + +Marks the state of a segment as unused, using the segment ID. +This is a "soft delete" of the segment from Historicals. +To undo this action, [mark the segment used](#mark-a-single-segment-as-used). + +Note that this endpoint returns an HTTP `200 OK` response code even if the segment ID or datasource doesn't exist. +Check the response payload to confirm if any segment was actually updated. + +#### URL + +`DELETE` `/druid/indexer/v1/datasources/{datasource}/segments/{segmentId}` + +#### Header + +The following headers are required for this request: + +```json +Content-Type: application/json +Accept: application/json, text/plain +``` + +#### Responses + + + + + + +*Successfully updated segment* + + + + +--- + +#### Sample request + +The following example updates the segment `wikipedia_hour_2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z_2023-08-10T04:12:03.860Z` from datasource `wikipedia_hour` as `unused`. 
+ + + + + + +```shell +curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z_2023-08-10T04:12:03.860Z" \ +--header 'Content-Type: application/json' \ +--header 'Accept: application/json, text/plain' +``` + + + + + +```HTTP +DELETE /druid/indexer/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z_2023-08-10T04:12:03.860Z HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Accept: application/json, text/plain +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "segmentStateChanged": true, + "numChangedSegments": 1 +} +``` +
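+
+Because this endpoint returns `200 OK` even for an unknown segment ID or datasource, scripts should check the response payload rather than the status code. A minimal sketch, assuming `jq` is available and `SEGMENT_ID` is a placeholder for a real segment ID:
+
+```shell
+changed=$(curl -s --request DELETE \
+  "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour/segments/SEGMENT_ID" \
+  --header 'Content-Type: application/json' \
+  --header 'Accept: application/json, text/plain' \
+  | jq -r '.numChangedSegments')
+if [ "${changed}" != "1" ]; then
+  echo "No segment was marked unused; check the segment ID and datasource name."
+fi
+```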
+ +### Mark a single segment as used + +Marks the state of a segment as used, using the segment ID. + +#### URL + +`POST` `/druid/indexer/v1/datasources/{datasource}/segments/{segmentId}` + +#### Header + +The following headers are required for this request: + +```json +Content-Type: application/json +Accept: application/json, text/plain +``` + +#### Responses + + + + + + +*Successfully updated segments* + + + + +--- + +#### Sample request + +The following example updates the segment with ID `wikipedia_hour_2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z_2023-08-10T04:12:03.860Z` to used. + + + + + + +```shell +curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z_2023-08-10T04:12:03.860Z" \ +--header 'Content-Type: application/json' \ +--header 'Accept: application/json, text/plain' +``` + + + + + +```HTTP +POST /druid/indexer/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z_2023-08-10T04:12:03.860Z HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Accept: application/json, text/plain +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "segmentStateChanged": true, + "numChangedSegments": 1 +} +``` +
+ +### Mark a group of segments unused + +Marks the state of a group of segments as unused, using an array of segment IDs or an interval. +Pass the array of segment IDs or interval as a JSON object in the request body. + +For the interval, specify the start and end times as ISO 8601 strings to identify segments inclusive of the start time and exclusive of the end time. +Optionally, specify an array of segment versions with interval. Druid updates only the segments completely contained +within the specified interval that match the optional list of versions; partially overlapping segments are not affected. + +#### URL + +`POST` `/druid/indexer/v1/datasources/{datasource}/markUnused` + +#### Request body + +The group of segments is sent as a JSON request payload that accepts the following properties: + +|Property|Description|Required|Example| +|--------|-----------|--------|-------| +|`interval`|ISO 8601 segments interval.|Yes, if `segmentIds` is not specified.|`"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"`| +|`segmentIds`|List of segment IDs.|Yes, if `interval` is not specified.|`["segmentId1", "segmentId2"]`| +|`versions`|List of segment versions. Must be provided with `interval`.|No.|`["2024-03-14T16:00:04.086Z", ""2024-03-12T16:00:04.086Z"]`| + +#### Responses + + + + + + +*Successfully updated segments* + + + + + +*Invalid datasource name* + + + + + +*Invalid request payload* + + + + +--- + +#### Sample request + +The following example marks two segments from the `wikipedia_hour` datasource unused based on their segment IDs. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour/markUnused" \ +--header 'Content-Type: application/json' \ +--data '{ + "segmentIds": [ + "wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z", + "wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z" + ] +}' +``` + + + + + +```HTTP +POST /druid/indexer/v1/datasources/wikipedia_hour/markUnused HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 230 + +{ + "segmentIds": [ + "wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z", + "wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z" + ] +} +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "numChangedSegments": 2 +} +``` +
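+
+The sample above marks segments by ID. To mark segments by time interval instead, pass the `interval` property described in the request body table; a minimal sketch using the example interval shown there:
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour/markUnused" \
+--header 'Content-Type: application/json' \
+--data '{
+  "interval": "2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"
+}'
+```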
+ +### Mark a group of segments used + +Marks the state of a group of segments as used, using an array of segment IDs or an interval. +Pass the array of segment IDs or interval as a JSON object in the request body. + +For the interval, specify the start and end times as ISO 8601 strings to identify segments inclusive of the start time and exclusive of the end time. +Optionally, specify an array of segment versions with interval. Druid updates only the segments completely contained +within the specified interval that match the optional list of versions; partially overlapping segments are not affected. + +#### URL + +`POST` `/druid/indexer/v1/datasources/{datasource}/markUsed` + +#### Request body + +The group of segments is sent as a JSON request payload that accepts the following properties: + +|Property|Description|Required|Example| +|--------|-----------|--------|-------| +|`interval`|ISO 8601 segments interval.|Yes, if `segmentIds` is not specified.|`"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"`| +|`segmentIds`|List of segment IDs.|Yes, if `interval` is not specified.|`["segmentId1", "segmentId2"]`| +|`versions`|List of segment versions. Must be provided with `interval`.|No.|`["2024-03-14T16:00:04.086Z", ""2024-03-12T16:00:04.086Z"]`| + +#### Responses + + + + + + +*Successfully updated segments* + + + + + +*Invalid datasource name* + + + + + +*Invalid request payload* + + + + +--- + +#### Sample request + +The following example marks two segments from the `wikipedia_hour` datasource used based on their segment IDs. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour/markUsed" \ +--header 'Content-Type: application/json' \ +--data '{ + "segmentIds": [ + "wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z", + "wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z" + ] +}' +``` + + + + + +```HTTP +POST /druid/indexer/v1/datasources/wikipedia_hour/markUsed HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 230 + +{ + "segmentIds": [ + "wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z", + "wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z" + ] +} +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "numChangedSegments": 2 +} +``` +
+ +### Mark all segments unused + +Marks the state of all segments of a datasource as unused. +This action performs a "soft delete" of the segments from Historicals. + +Note that this endpoint returns an HTTP `200 OK` response code even if the datasource doesn't exist. +Check the response payload to confirm if any segment was actually updated. + +#### URL + +`DELETE` `/druid/indexer/v1/datasources/{datasource}` + +#### Responses + + + + + + +*Successfully updated segments* + + + + +--- + +#### Sample request + + + + + + +```shell +curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour" +``` + + + + + +```HTTP +DELETE /druid/indexer/v1/datasources/wikipedia_hour HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "numChangedSegments": 24 +} +``` +
+ +### Mark all non-overshadowed segments used + +Marks the state of all unused segments of a datasource as used given that they are not already overshadowed by other segments. +The endpoint returns the number of changed segments. + +Note that this endpoint returns an HTTP `200 OK` response code even if the datasource doesn't exist. +Check the response payload to get the number of segments actually updated. + +#### URL + +`POST` `/druid/indexer/v1/datasources/{datasource}` + +#### Header + +The following headers are required for this request: + +```json +Content-Type: application/json +Accept: application/json, text/plain +``` + +#### Responses + + + + + + +*Successfully updated segments* + + + + +--- + +#### Sample request + +The following example updates all unused segments of `wikipedia_hour` to used. +`wikipedia_hour` contains one unused segment eligible to be marked as used. + + + + + + +```shell +curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour" \ +--header 'Content-Type: application/json' \ +--header 'Accept: application/json, text/plain' +``` + + + + + +```HTTP +POST /druid/indexer/v1/datasources/wikipedia_hour HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Accept: application/json, text/plain +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "numChangedSegments": 1 +} +``` +
+ +## Segment deletion + +### Permanently delete segments + +The DELETE endpoint sends a [kill task](../ingestion/tasks.md) for a given interval and datasource. The interval value is an ISO 8601 string delimited by `_`. This request permanently deletes all metadata for unused segments and removes them from deep storage. + +Note that this endpoint returns an HTTP `200 OK` response code even if the datasource doesn't exist. + +This endpoint supersedes the deprecated endpoint: `DELETE /druid/coordinator/v1/datasources/{datasource}?kill=true&interval={interval}` + +#### URL + +`DELETE` `/druid/coordinator/v1/datasources/{datasource}/intervals/{interval}` + +#### Responses + + + + + + +*Successfully sent kill task* + + + + +--- + +#### Sample request + +The following example sends a kill task to permanently delete segments in the datasource `wikipedia_hour` from the interval `2015-09-12` to `2015-09-13`. + + + + + + +```shell +curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/datasources/wikipedia_hour/intervals/2015-09-12_2015-09-13" +``` + + + + + +```HTTP +DELETE /druid/coordinator/v1/datasources/wikipedia_hour/intervals/2015-09-12_2015-09-13 HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +A successful request returns an HTTP `200 OK` and an empty response body. diff --git a/docs/35.0.0/api-reference/dynamic-configuration-api.md b/docs/35.0.0/api-reference/dynamic-configuration-api.md new file mode 100644 index 0000000000..cad61e4b88 --- /dev/null +++ b/docs/35.0.0/api-reference/dynamic-configuration-api.md @@ -0,0 +1,665 @@ +--- +id: dynamic-configuration-api +title: Dynamic configuration API +sidebar_label: Dynamic configuration +--- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + + +This document describes the API endpoints to retrieve and manage dynamic configurations for the [Coordinator](../design/coordinator.md) and [Overlord](../design/overlord.md) in Apache Druid. + +In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. +Replace it with the information for your deployment. +For example, use `http://localhost:8888` for quickstart deployments. + +## Coordinator dynamic configuration + +The Coordinator has dynamic configurations to tune certain behavior on the fly, without requiring a service restart. +For information on the supported properties, see [Coordinator dynamic configuration](../configuration/index.md#dynamic-configuration). + +### Get dynamic configuration + +Retrieves the current Coordinator dynamic configuration. Returns a JSON object with the dynamic configuration properties. + +#### URL + +`GET` `/druid/coordinator/v1/config` + +#### Responses + + + + + + +*Successfully retrieved dynamic configuration* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config" +``` + + + + + +```HTTP +GET /druid/coordinator/v1/config HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+View the response + ```json +{ + "millisToWaitBeforeDeleting": 900000, + "maxSegmentsToMove": 100, + "replicantLifetime": 15, + "replicationThrottleLimit": 500, + "balancerComputeThreads": 1, + "killDataSourceWhitelist": [], + "killPendingSegmentsSkipList": [], + "maxSegmentsInNodeLoadingQueue": 500, + "decommissioningNodes": [], + "decommissioningMaxPercentOfMaxSegmentsToMove": 70, + "pauseCoordination": false, + "replicateAfterLoadTimeout": false, + "maxNonPrimaryReplicantsToLoad": 2147483647, + "useRoundRobinSegmentAssignment": true, + "smartSegmentLoading": true, + "debugDimensions": null, + "turboLoadingNodes": [], + "cloneServers": {} +} +``` + 
+ +### Update dynamic configuration + +Submits a JSON-based dynamic configuration spec to the Coordinator. +For information on the supported properties, see [Dynamic configuration](../configuration/index.md#dynamic-configuration). + +#### URL + +`POST` `/druid/coordinator/v1/config` + +#### Header parameters + +The endpoint supports a set of optional header parameters to populate the `author` and `comment` fields in the configuration history. + +* `X-Druid-Author` + * Type: String + * Author of the configuration change. +* `X-Druid-Comment` + * Type: String + * Description for the update. + +#### Responses + + + + + + +*Successfully updated dynamic configuration* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config" \ +--header 'Content-Type: application/json' \ +--data '{ + "millisToWaitBeforeDeleting": 900000, + "maxSegmentsToMove": 5, + "percentOfSegmentsToConsiderPerMove": 100, + "useBatchedSegmentSampler": true, + "replicantLifetime": 15, + "replicationThrottleLimit": 10, + "balancerComputeThreads": 1, + "emitBalancingStats": true, + "killDataSourceWhitelist": [], + "killPendingSegmentsSkipList": [], + "maxSegmentsInNodeLoadingQueue": 100, + "decommissioningNodes": [], + "decommissioningMaxPercentOfMaxSegmentsToMove": 70, + "pauseCoordination": false, + "replicateAfterLoadTimeout": false, + "maxNonPrimaryReplicantsToLoad": 2147483647, + "useRoundRobinSegmentAssignment": true, + "turboLoadingNodes": [], + "cloneServers": {} +}' +``` + + + + + +```HTTP +POST /druid/coordinator/v1/config HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 683 + +{ + "millisToWaitBeforeDeleting": 900000, + "maxSegmentsToMove": 5, + "percentOfSegmentsToConsiderPerMove": 100, + "useBatchedSegmentSampler": true, + "replicantLifetime": 15, + "replicationThrottleLimit": 10, + "balancerComputeThreads": 1, + "emitBalancingStats": true, + "killDataSourceWhitelist": [], + "killPendingSegmentsSkipList": [], + "maxSegmentsInNodeLoadingQueue": 100, + "decommissioningNodes": [], + "decommissioningMaxPercentOfMaxSegmentsToMove": 70, + "pauseCoordination": false, + "replicateAfterLoadTimeout": false, + "maxNonPrimaryReplicantsToLoad": 2147483647, + "useRoundRobinSegmentAssignment": true, + "turboLoadingNodes": [], + "cloneServers": {} +} +``` + + + + +#### Sample response + +A successful request returns an HTTP `200 OK` message code and an empty response body. + +### Get dynamic configuration history + +Retrieves the history of changes to Coordinator dynamic configuration over an interval of time. Returns an empty array if there are no history records available. + +#### URL + +`GET` `/druid/coordinator/v1/config/history` + +#### Query parameters + +The endpoint supports a set of optional query parameters to filter results. + +* `interval` + * Type: String + * Limit the results to the specified time interval in ISO 8601 format delimited with `/`. For example, `2023-07-13/2023-07-19`. The default interval is one week. You can change this period by setting `druid.audit.manager.auditHistoryMillis` in the `runtime.properties` file for the Coordinator. + +* `count` + * Type: Integer + * Limit the number of results to the last `n` entries. + +#### Responses + + + + + + +*Successfully retrieved history* + + + + + +--- + +#### Sample request + +The following example retrieves the dynamic configuration history between `2022-07-13` and `2024-07-19`. The response is limited to 10 entries. 
+ + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/history?interval=2022-07-13%2F2024-07-19&count=10" +``` + + + + + +```HTTP +GET /druid/coordinator/v1/config/history?interval=2022-07-13/2024-07-19&count=10 HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response
+
+```json
+[
+  {
+    "key": "coordinator.config",
+    "type": "coordinator.config",
+    "auditInfo": {
+      "author": "",
+      "comment": "",
+      "ip": "127.0.0.1"
+    },
+    "payload": "{\"millisToWaitBeforeDeleting\":900000,\"maxSegmentsToMove\":5,\"replicantLifetime\":15,\"replicationThrottleLimit\":10,\"balancerComputeThreads\":1,\"killDataSourceWhitelist\":[],\"killPendingSegmentsSkipList\":[],\"maxSegmentsInNodeLoadingQueue\":100,\"decommissioningNodes\":[],\"decommissioningMaxPercentOfMaxSegmentsToMove\":70,\"pauseCoordination\":false,\"replicateAfterLoadTimeout\":false,\"maxNonPrimaryReplicantsToLoad\":2147483647,\"useRoundRobinSegmentAssignment\":true,\"smartSegmentLoading\":true,\"debugDimensions\":null}",
+    "auditTime": "2023-10-03T20:59:51.622Z"
+  }
+]
+```
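+
+The `payload` field in each history entry is itself a JSON-encoded string. If you want to inspect a configuration change directly, you can decode that field on the client. The following is a minimal sketch only; it assumes a quickstart Router at `localhost:8888` and that `jq` is installed:
+
+```shell
+# Fetch the most recent Coordinator dynamic configuration change and
+# decode its JSON-encoded `payload` field.
+# Assumes jq is installed and the Router runs at localhost:8888 (quickstart).
+curl -s "http://localhost:8888/druid/coordinator/v1/config/history?count=1" \
+  | jq -r '.[0].payload | fromjson'
+```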
+ +## Overlord dynamic configuration + +The Overlord has dynamic configurations to tune how Druid assigns tasks to workers. +For information on the supported properties, see [Overlord dynamic configuration](../configuration/index.md#overlord-dynamic-configuration). + +### Get dynamic configuration + +Retrieves the current Overlord dynamic configuration. +Returns a JSON object with the dynamic configuration properties. +Returns an empty response body if there is no current Overlord dynamic configuration. + +#### URL + +`GET` `/druid/indexer/v1/worker` + +#### Responses + + + + + + +*Successfully retrieved dynamic configuration* + + + + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/worker" +``` + + + + + +```HTTP +GET /druid/indexer/v1/worker HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "type": "default", + "selectStrategy": { + "type": "fillCapacityWithCategorySpec", + "workerCategorySpec": { + "categoryMap": {}, + "strong": true + } + }, + "autoScaler": null +} +``` + +
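+
+The response above uses the `fillCapacityWithCategorySpec` worker select strategy. You can switch strategies by submitting a new spec to the update endpoint described in the next section. As an illustration only, the following sketch posts a minimal spec that uses the `equalDistribution` strategy; the quickstart Router address is an assumption:
+
+```shell
+# Sketch: set the Overlord worker select strategy to equalDistribution.
+# Assumes a quickstart deployment with the Router at localhost:8888.
+curl "http://localhost:8888/druid/indexer/v1/worker" \
+--header 'Content-Type: application/json' \
+--data '{
+  "type": "default",
+  "selectStrategy": { "type": "equalDistribution" },
+  "autoScaler": null
+}'
+```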
+ +### Update dynamic configuration + +Submits a JSON-based dynamic configuration spec to the Overlord. +For information on the supported properties, see [Overlord dynamic configuration](../configuration/index.md#overlord-dynamic-configuration). + +#### URL + +`POST` `/druid/indexer/v1/worker` + +#### Header parameters + +The endpoint supports a set of optional header parameters to populate the `author` and `comment` fields in the configuration history. + +* `X-Druid-Author` + * Type: String + * Author of the configuration change. +* `X-Druid-Comment` + * Type: String + * Description for the update. + +#### Responses + + + + + + +*Successfully updated dynamic configuration* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/worker" \ +--header 'Content-Type: application/json' \ +--data '{ + "type": "default", + "selectStrategy": { + "type": "fillCapacityWithCategorySpec", + "workerCategorySpec": { + "categoryMap": {}, + "strong": true + } + }, + "autoScaler": null +}' +``` + + + + + +```HTTP +POST /druid/indexer/v1/worker HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 196 + +{ + "type": "default", + "selectStrategy": { + "type": "fillCapacityWithCategorySpec", + "workerCategorySpec": { + "categoryMap": {}, + "strong": true + } + }, + "autoScaler": null +} +``` + + + + +#### Sample response + +A successful request returns an HTTP `200 OK` message code and an empty response body. + +### Get dynamic configuration history + +Retrieves the history of changes to Overlord dynamic configuration over an interval of time. Returns an empty array if there are no history records available. + +#### URL + +`GET` `/druid/indexer/v1/worker/history` + +#### Query parameters + +The endpoint supports a set of optional query parameters to filter results. + +* `interval` + * Type: String + * Limit the results to the specified time interval in ISO 8601 format delimited with `/`. For example, `2023-07-13/2023-07-19`. The default interval is one week. You can change this period by setting `druid.audit.manager.auditHistoryMillis` in the `runtime.properties` file for the Overlord. + +* `count` + * Type: Integer + * Limit the number of results to the last `n` entries. + +#### Responses + + + + + + +*Successfully retrieved history* + + + + +--- + +#### Sample request + +The following example retrieves the dynamic configuration history between `2022-07-13` and `2024-07-19`. The response is limited to 10 entries. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/worker/history?interval=2022-07-13%2F2024-07-19&count=10" +``` + + + + + +```HTTP +GET /druid/indexer/v1/worker/history?interval=2022-07-13%2F2024-07-19&count=10 HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +[ + { + "key": "worker.config", + "type": "worker.config", + "auditInfo": { + "author": "", + "comment": "", + "ip": "127.0.0.1" + }, + "payload": "{\"type\":\"default\",\"selectStrategy\":{\"type\":\"fillCapacityWithCategorySpec\",\"workerCategorySpec\":{\"categoryMap\":{},\"strong\":true}},\"autoScaler\":null}", + "auditTime": "2023-10-03T21:49:49.991Z" + } +] +``` + +
+
+### Get an array of worker nodes in the cluster
+
+Returns an array of all the worker nodes in the cluster along with their corresponding metadata.
+
+#### URL
+
+`GET` `/druid/indexer/v1/workers`
+
+#### Responses
+
+
+
+
+*Successfully retrieved worker nodes*
+
+
+
+---
+
+#### Sample request
+
+
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/workers"
+```
+
+
+
+
+```HTTP
+GET /druid/indexer/v1/workers HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+
+#### Sample response
+
+ View the response + +```json +[ + { + "worker": { + "scheme": "http", + "host": "localhost:8091", + "ip": "198.51.100.0", + "capacity": 2, + "version": "0", + "category": "_default_worker_category" + }, + "currCapacityUsed": 0, + "currParallelIndexCapacityUsed": 0, + "availabilityGroups": [], + "runningTasks": [], + "lastCompletedTaskTime": "2023-09-29T19:13:05.505Z", + "blacklistedUntil": null + } +] +``` + +
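+
+Each entry reports the worker's total task `capacity` along with `currCapacityUsed`, so this endpoint is a convenient way to estimate free task slots across the cluster. A minimal sketch, assuming `jq` is installed and a quickstart Router at `localhost:8888`:
+
+```shell
+# Estimate free task slots: capacity minus capacity in use, summed over all workers.
+# Assumes jq is installed and a quickstart Router at localhost:8888.
+curl -s "http://localhost:8888/druid/indexer/v1/workers" \
+  | jq '[.[] | .worker.capacity - .currCapacityUsed] | add'
+```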
+ +### Get scaling events + +Returns Overlord scaling events if autoscaling runners are in use. +Returns an empty response body if there are no Overlord scaling events. + +#### URL + +`GET` `/druid/indexer/v1/scaling` + +#### Responses + + + + + + +*Successfully retrieved scaling events* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/scaling" +``` + + + + + +```HTTP +GET /druid/indexer/v1/scaling HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +A successful request returns a `200 OK` response and an array of scaling events. diff --git a/docs/35.0.0/api-reference/json-querying-api.md b/docs/35.0.0/api-reference/json-querying-api.md new file mode 100644 index 0000000000..5d03ec8b31 --- /dev/null +++ b/docs/35.0.0/api-reference/json-querying-api.md @@ -0,0 +1,925 @@ +--- +id: json-querying-api +title: JSON querying API +sidebar_label: JSON querying +--- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + + +This topic describes the API endpoints to submit JSON-based [native queries](../querying/querying.md) to Apache Druid. + +In this topic, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the server address of deployment and the service port. For example, on the quickstart configuration, replace `http://ROUTER_IP:ROUTER_PORT` with `http://localhost:8888`. + + +## Submit a query + +Submits a JSON-based native query. The body of the request is the native query itself. + +Druid supports different types of queries for different use cases. All queries require the following properties: +* `queryType`: A string representing the type of query. Druid supports the following native query types: `timeseries`, `topN`, `groupBy`, `timeBoundaries`, `segmentMetadata`, `datasourceMetadata`, `scan`, and `search`. +* `dataSource`: A string or object defining the source of data to query. The most common value is the name of the datasource to query. For more information, see [Datasources](../querying/datasource.md). + +For additional properties based on your query type or use case, see [available native queries](../querying/querying.md#available-queries). + +### URL + +`POST` `/druid/v2` + +### Query parameters + +* `pretty` (optional) + * Druid returns the response in a pretty-printed format using indentation and line breaks. + +### Responses + + + + + + +*Successfully submitted query* + + + + + +*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:* + +```json +{ + "error": "A well-defined error code.", + "errorMessage": "A message with additional details about the error.", + "errorClass": "Class of exception that caused this error.", + "host": "The host on which the error occurred." +} +``` +For more information on possible error messages, see [query execution failures](../querying/querying.md#query-execution-failures). + + + + +--- + +### Example query: `topN` + +The following example shows a `topN` query. The query analyzes the `social_media` datasource to return the top five users from the `username` dimension with the highest number of views from the `views` metric. 
+ + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/v2?pretty=null" \ +--header 'Content-Type: application/json' \ +--data '{ + "queryType": "topN", + "dataSource": "social_media", + "dimension": "username", + "threshold": 5, + "metric": "views", + "granularity": "all", + "aggregations": [ + { + "type": "longSum", + "name": "views", + "fieldName": "views" + } + ], + "intervals": [ + "2022-01-01T00:00:00.000/2024-01-01T00:00:00.000" + ] +}' +``` + + + + +```HTTP +POST /druid/v2?pretty=null HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 336 + +{ + "queryType": "topN", + "dataSource": "social_media", + "dimension": "username", + "threshold": 5, + "metric": "views", + "granularity": "all", + "aggregations": [ + { + "type": "longSum", + "name": "views", + "fieldName": "views" + } + ], + "intervals": [ + "2022-01-01T00:00:00.000/2024-01-01T00:00:00.000" + ] +} +``` + + + + +#### Example response: `topN` + +
+ View the response + + ```json +[ + { + "timestamp": "2023-07-03T18:49:54.848Z", + "result": [ + { + "views": 11591218026, + "username": "gus" + }, + { + "views": 11578638578, + "username": "miette" + }, + { + "views": 11561618880, + "username": "leon" + }, + { + "views": 11552609824, + "username": "mia" + }, + { + "views": 11551537517, + "username": "milton" + } + ] + } +] + ``` +
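+
+For longer native queries, it can be convenient to keep the query in a file and reference it from curl rather than inlining the JSON. A minimal sketch, assuming the `topN` query above is saved as `topn_query.json` (a hypothetical file name) and a quickstart Router at `localhost:8888`:
+
+```shell
+# Submit a native query stored in a file.
+# Assumes the topN query above is saved as topn_query.json and the Router
+# runs at localhost:8888 (quickstart).
+curl "http://localhost:8888/druid/v2" \
+--header 'Content-Type: application/json' \
+--data @topn_query.json
+```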
+ +### Example query: `groupBy` + +The following example submits a JSON query of the `groupBy` type to retrieve the `username` with the highest votes to posts ratio from the `social_media` datasource. + +In this query: +* The `upvoteSum` aggregation calculates the sum of the `upvotes` for each user. +* The `postCount` aggregation calculates the sum of posts for each user. +* The `upvoteToPostRatio` is a post-aggregation of the `upvoteSum` and the `postCount`, divided to calculate the ratio. +* The result is sorted based on the `upvoteToPostRatio` in descending order. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/v2" \ +--header 'Content-Type: application/json' \ +--data '{ + "queryType": "groupBy", + "dataSource": "social_media", + "dimensions": ["username"], + "granularity": "all", + "aggregations": [ + { "type": "doubleSum", "name": "upvoteSum", "fieldName": "upvotes" }, + { "type": "count", "name": "postCount", "fieldName": "post_title" } + ], + "postAggregations": [ + { + "type": "arithmetic", + "name": "upvoteToPostRatio", + "fn": "/", + "fields": [ + { "type": "fieldAccess", "name": "upvoteSum", "fieldName": "upvoteSum" }, + { "type": "fieldAccess", "name": "postCount", "fieldName": "postCount" } + ] + } + ], + "intervals": ["2022-01-01T00:00:00.000/2024-01-01T00:00:00.000"], + "limitSpec": { + "type": "default", + "limit": 1, + "columns": [ + { "dimension": "upvoteToPostRatio", "direction": "descending" } + ] + } +}' +``` + + + + + +```HTTP +POST /druid/v2?pretty=null HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 817 + +{ + "queryType": "groupBy", + "dataSource": "social_media", + "dimensions": ["username"], + "granularity": "all", + "aggregations": [ + { "type": "doubleSum", "name": "upvoteSum", "fieldName": "upvotes" }, + { "type": "count", "name": "postCount", "fieldName": "post_title" } + ], + "postAggregations": [ + { + "type": "arithmetic", + "name": "upvoteToPostRatio", + "fn": "/", + "fields": [ + { "type": "fieldAccess", "name": "upvoteSum", "fieldName": "upvoteSum" }, + { "type": "fieldAccess", "name": "postCount", "fieldName": "postCount" } + ] + } + ], + "intervals": ["2022-01-01T00:00:00.000/2024-01-01T00:00:00.000"], + "limitSpec": { + "type": "default", + "limit": 1, + "columns": [ + { "dimension": "upvoteToPostRatio", "direction": "descending" } + ] + } +} +``` + + + + +#### Example response: `groupBy` + +
+ View the response + +```json +[ + { + "version": "v1", + "timestamp": "2022-01-01T00:00:00.000Z", + "event": { + "upvoteSum": 8.0419541E7, + "upvoteToPostRatio": 69.53014661762697, + "postCount": 1156614, + "username": "miette" + } + } +] +``` +
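+
+As a quick sanity check on the post-aggregation, the returned ratio is consistent with the aggregates in the same row: 80,419,541 upvotes divided by 1,156,614 posts is approximately 69.53, which matches the reported `upvoteToPostRatio`.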
+ +## Get segment information for query + +Retrieves an array that contains objects with segment information, including the server locations associated with the query provided in the request body. + +### URL + +`POST` `/druid/v2/candidates` + +### Query parameters + +* `pretty` (optional) + * Druid returns the response in a pretty-printed format using indentation and line breaks. + +### Responses + + + + + + +*Successfully retrieved segment information* + + + + + +*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:* + +```json +{ + "error": "A well-defined error code.", + "errorMessage": "A message with additional details about the error.", + "errorClass": "Class of exception that caused this error.", + "host": "The host on which the error occurred." +} +``` + +For more information on possible error messages, see [query execution failures](../querying/querying.md#query-execution-failures). + + + + +--- + +### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/candidates" \ +--header 'Content-Type: application/json' \ +--data '{ + "queryType": "topN", + "dataSource": "social_media", + "dimension": "username", + "threshold": 5, + "metric": "views", + "granularity": "all", + "aggregations": [ + { + "type": "longSum", + "name": "views", + "fieldName": "views" + } + ], + "intervals": [ + "2022-01-01T00:00:00.000/2024-01-01T00:00:00.000" + ] +}' +``` + + + + + +```HTTP +POST /druid/v2/candidates HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 336 + +{ + "queryType": "topN", + "dataSource": "social_media", + "dimension": "username", + "threshold": 5, + "metric": "views", + "granularity": "all", + + "aggregations": [ + { + "type": "longSum", + "name": "views", + "fieldName": "views" + } + ], + "intervals": [ + "2020-01-01T00:00:00.000/2024-01-01T00:00:00.000" + ] +} +``` + + + + +### Sample response + +
+ View the response + + ```json +[ + { + "interval": "2023-07-03T18:00:00.000Z/2023-07-03T19:00:00.000Z", + "version": "2023-07-03T18:51:18.905Z", + "partitionNumber": 0, + "size": 21563693, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-03T19:00:00.000Z/2023-07-03T20:00:00.000Z", + "version": "2023-07-03T19:00:00.657Z", + "partitionNumber": 0, + "size": 6057236, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-05T21:00:00.000Z/2023-07-05T22:00:00.000Z", + "version": "2023-07-05T21:09:58.102Z", + "partitionNumber": 0, + "size": 223926186, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-05T21:00:00.000Z/2023-07-05T22:00:00.000Z", + "version": "2023-07-05T21:09:58.102Z", + "partitionNumber": 1, + "size": 20244827, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-05T22:00:00.000Z/2023-07-05T23:00:00.000Z", + "version": "2023-07-05T22:00:00.524Z", + "partitionNumber": 0, + "size": 104628051, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-05T22:00:00.000Z/2023-07-05T23:00:00.000Z", + "version": "2023-07-05T22:00:00.524Z", + "partitionNumber": 1, + "size": 1603995, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-05T23:00:00.000Z/2023-07-06T00:00:00.000Z", + "version": "2023-07-05T23:21:55.242Z", + "partitionNumber": 0, + "size": 181506843, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T00:00:00.000Z/2023-07-06T01:00:00.000Z", + "version": "2023-07-06T00:02:08.498Z", + "partitionNumber": 0, + "size": 9170974, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T00:00:00.000Z/2023-07-06T01:00:00.000Z", + "version": "2023-07-06T00:02:08.498Z", + "partitionNumber": 1, + "size": 23969632, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T01:00:00.000Z/2023-07-06T02:00:00.000Z", + "version": "2023-07-06T01:13:53.982Z", + "partitionNumber": 0, + "size": 599895, + "locations": [ + { + "name": "localhost:8083", + "host": 
"localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T01:00:00.000Z/2023-07-06T02:00:00.000Z", + "version": "2023-07-06T01:13:53.982Z", + "partitionNumber": 1, + "size": 1627041, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T02:00:00.000Z/2023-07-06T03:00:00.000Z", + "version": "2023-07-06T02:55:50.701Z", + "partitionNumber": 0, + "size": 629753, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T02:00:00.000Z/2023-07-06T03:00:00.000Z", + "version": "2023-07-06T02:55:50.701Z", + "partitionNumber": 1, + "size": 1342360, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T04:00:00.000Z/2023-07-06T05:00:00.000Z", + "version": "2023-07-06T04:02:36.562Z", + "partitionNumber": 0, + "size": 2131434, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T05:00:00.000Z/2023-07-06T06:00:00.000Z", + "version": "2023-07-06T05:23:27.856Z", + "partitionNumber": 0, + "size": 797161, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T05:00:00.000Z/2023-07-06T06:00:00.000Z", + "version": "2023-07-06T05:23:27.856Z", + "partitionNumber": 1, + "size": 1176858, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T06:00:00.000Z/2023-07-06T07:00:00.000Z", + "version": "2023-07-06T06:46:34.638Z", + "partitionNumber": 0, + "size": 2148760, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T07:00:00.000Z/2023-07-06T08:00:00.000Z", + "version": "2023-07-06T07:38:28.050Z", + "partitionNumber": 0, + "size": 2040748, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T08:00:00.000Z/2023-07-06T09:00:00.000Z", + "version": "2023-07-06T08:27:31.407Z", + "partitionNumber": 0, + "size": 678723, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T08:00:00.000Z/2023-07-06T09:00:00.000Z", + "version": "2023-07-06T08:27:31.407Z", + 
"partitionNumber": 1, + "size": 1437866, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T10:00:00.000Z/2023-07-06T11:00:00.000Z", + "version": "2023-07-06T10:02:42.079Z", + "partitionNumber": 0, + "size": 1671296, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T11:00:00.000Z/2023-07-06T12:00:00.000Z", + "version": "2023-07-06T11:27:23.902Z", + "partitionNumber": 0, + "size": 574893, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T11:00:00.000Z/2023-07-06T12:00:00.000Z", + "version": "2023-07-06T11:27:23.902Z", + "partitionNumber": 1, + "size": 1427384, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T12:00:00.000Z/2023-07-06T13:00:00.000Z", + "version": "2023-07-06T12:52:00.846Z", + "partitionNumber": 0, + "size": 2115172, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T14:00:00.000Z/2023-07-06T15:00:00.000Z", + "version": "2023-07-06T14:32:33.926Z", + "partitionNumber": 0, + "size": 589108, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T14:00:00.000Z/2023-07-06T15:00:00.000Z", + "version": "2023-07-06T14:32:33.926Z", + "partitionNumber": 1, + "size": 1392649, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T15:00:00.000Z/2023-07-06T16:00:00.000Z", + "version": "2023-07-06T15:53:25.467Z", + "partitionNumber": 0, + "size": 2037851, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T16:00:00.000Z/2023-07-06T17:00:00.000Z", + "version": "2023-07-06T16:02:26.568Z", + "partitionNumber": 0, + "size": 230400650, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": "2023-07-06T16:00:00.000Z/2023-07-06T17:00:00.000Z", + "version": "2023-07-06T16:02:26.568Z", + "partitionNumber": 1, + "size": 38209056, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + }, + { + "interval": 
"2023-07-06T17:00:00.000Z/2023-07-06T18:00:00.000Z", + "version": "2023-07-06T17:00:02.391Z", + "partitionNumber": 0, + "size": 211099463, + "locations": [ + { + "name": "localhost:8083", + "host": "localhost:8083", + "hostAndTlsPort": null, + "maxSize": 300000000000, + "type": "historical", + "tier": "_default_tier", + "priority": 0 + } + ] + } +] + ``` +
diff --git a/docs/35.0.0/api-reference/legacy-metadata-api.md b/docs/35.0.0/api-reference/legacy-metadata-api.md
new file mode 100644
index 0000000000..d22be18a7e
--- /dev/null
+++ b/docs/35.0.0/api-reference/legacy-metadata-api.md
@@ -0,0 +1,344 @@
+---
+id: legacy-metadata-api
+title: Legacy metadata API
+sidebar_label: Legacy metadata
+---
+
+
+
+This document describes the legacy API endpoints to retrieve datasource metadata from Apache Druid. Use the [SQL metadata tables](../querying/sql-metadata-tables.md) to retrieve datasource metadata instead.
+
+## Segment loading
+
+`GET /druid/coordinator/v1/loadstatus`
+
+Returns the percentage of segments actually loaded in the cluster versus segments that should be loaded in the cluster.
+
+`GET /druid/coordinator/v1/loadstatus?simple`
+
+Returns the number of segments left to load until segments that should be loaded in the cluster are available for queries. This does not include segment replication counts.
+
+`GET /druid/coordinator/v1/loadstatus?full`
+
+Returns the number of segments left to load in each tier until segments that should be loaded in the cluster are all available. This includes segment replication counts.
+
+`GET /druid/coordinator/v1/loadstatus?full&computeUsingClusterView`
+
+Returns the number of segments not yet loaded for each tier until all segments loading in the cluster are available.
+The result includes segment replication counts. It also factors in the number of available nodes that are of a service type that can load the segment when computing the number of segments remaining to load.
+A segment is considered fully loaded when:
+- Druid has replicated it the number of times configured in the corresponding load rule.
+- Or the number of replicas for the segment in each tier where it is configured to be replicated equals the available nodes of a service type that are currently allowed to load the segment in the tier.
+
+`GET /druid/coordinator/v1/loadqueue`
+
+Returns the IDs of segments to load and drop for each Historical process.
+
+`GET /druid/coordinator/v1/loadqueue?simple`
+
+Returns the number of segments to load and drop, as well as the total segment load and drop size in bytes for each Historical process.
+
+`GET /druid/coordinator/v1/loadqueue?full`
+
+Returns the serialized JSON of segments to load and drop for each Historical process.
+
+## Segment loading by datasource
+
+Note that all _interval_ query parameters are ISO 8601 strings—for example, 2016-06-27/2016-06-28.
+Also note that these APIs only guarantee that the segments are available at the time of the call.
+Segments can still go missing afterward because of Historical process failures or other reasons.
+
+`GET /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?forceMetadataRefresh={boolean}&interval={myInterval}`
+
+Returns the percentage of segments actually loaded in the cluster versus segments that should be loaded in the cluster for the given
+datasource over the given interval (or last 2 weeks if interval is not given). `forceMetadataRefresh` is required to be set.
+* Setting `forceMetadataRefresh` to true will force the coordinator to poll latest segment metadata from the metadata store
+(Note: `forceMetadataRefresh=true` refreshes Coordinator's metadata cache of all datasources. 
This can be a heavy operation in terms +of the load on the metadata store but can be necessary to make sure that we verify all the latest segments' load status) +* Setting `forceMetadataRefresh` to false will use the metadata cached on the coordinator from the last force/periodic refresh. +If no used segments are found for the given inputs, this API returns `204 No Content` + +`GET /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?simple&forceMetadataRefresh={boolean}&interval={myInterval}` + +Returns the number of segments left to load until segments that should be loaded in the cluster are available for the given datasource +over the given interval (or last 2 weeks if interval is not given). This does not include segment replication counts. `forceMetadataRefresh` is required to be set. +* Setting `forceMetadataRefresh` to true will force the coordinator to poll latest segment metadata from the metadata store +(Note: `forceMetadataRefresh=true` refreshes Coordinator's metadata cache of all datasources. This can be a heavy operation in terms +of the load on the metadata store but can be necessary to make sure that we verify all the latest segments' load status) +* Setting `forceMetadataRefresh` to false will use the metadata cached on the coordinator from the last force/periodic refresh. +If no used segments are found for the given inputs, this API returns `204 No Content` + +`GET /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?full&forceMetadataRefresh={boolean}&interval={myInterval}` + +Returns the number of segments left to load in each tier until segments that should be loaded in the cluster are all available for the given datasource over the given interval (or last 2 weeks if interval is not given). This includes segment replication counts. `forceMetadataRefresh` is required to be set. +* Setting `forceMetadataRefresh` to true will force the coordinator to poll latest segment metadata from the metadata store +(Note: `forceMetadataRefresh=true` refreshes Coordinator's metadata cache of all datasources. This can be a heavy operation in terms +of the load on the metadata store but can be necessary to make sure that we verify all the latest segments' load status) +* Setting `forceMetadataRefresh` to false will use the metadata cached on the coordinator from the last force/periodic refresh. + +You can pass the optional query parameter `computeUsingClusterView` to factor in the available cluster services when calculating +the segments left to load. See [Coordinator Segment Loading](#segment-loading) for details. +If no used segments are found for the given inputs, this API returns `204 No Content` + +## Metadata store information + +:::info + Note: Much of this information is available in a simpler, easier-to-use form through the Druid SQL + [`sys.segments`](../querying/sql-metadata-tables.md#segments-table) table. +::: + +`GET /druid/coordinator/v1/metadata/segments` + +Returns a list of all segments for each datasource enabled in the cluster. + +`GET /druid/coordinator/v1/metadata/segments?datasources={dataSourceName1}&datasources={dataSourceName2}` + +Returns a list of all segments for one or more specific datasources enabled in the cluster. + +`GET /druid/coordinator/v1/metadata/segments?includeOvershadowedStatus` + +Returns a list of all segments for each datasource with the full segment metadata and an extra field `overshadowed`. 
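+
+For example, you can pull this list for a quick inspection of segment metadata. The following sketch assumes a quickstart deployment with the Router at `localhost:8888`:
+
+```shell
+# List all used segments, including the `overshadowed` flag for each segment.
+# Assumes a quickstart deployment with the Router at localhost:8888.
+curl -s "http://localhost:8888/druid/coordinator/v1/metadata/segments?includeOvershadowedStatus"
+```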
+
+`GET /druid/coordinator/v1/metadata/segments?includeOvershadowedStatus&includeRealtimeSegments`
+
+Returns a list of all published and realtime segments for each datasource with the full segment metadata and extra fields `overshadowed`, `realtime`, and `numRows`. Realtime segments are returned only when `druid.centralizedDatasourceSchema.enabled` is set on the Coordinator.
+
+`GET /druid/coordinator/v1/metadata/segments?includeOvershadowedStatus&datasources={dataSourceName1}&datasources={dataSourceName2}`
+
+Returns a list of all segments for one or more specific datasources with the full segment metadata and an extra field `overshadowed`.
+
+`GET /druid/coordinator/v1/metadata/segments?includeOvershadowedStatus&includeRealtimeSegments&datasources={dataSourceName1}&datasources={dataSourceName2}`
+
+Returns a list of all published and realtime segments for the specified datasources with the full segment metadata and extra fields `overshadowed`, `realtime`, and `numRows`. Realtime segments are returned only when `druid.centralizedDatasourceSchema.enabled` is set on the Coordinator.
+
+`GET /druid/coordinator/v1/metadata/datasources`
+
+Returns a list of the names of datasources with at least one used segment in the cluster, retrieved from the metadata database. Users should call this API to get the eventual state that the system will be in.
+
+`GET /druid/coordinator/v1/metadata/datasources?includeUnused`
+
+Returns a list of the names of datasources, regardless of whether there are used segments belonging to those datasources in the cluster or not.
+
+`GET /druid/coordinator/v1/metadata/datasources?includeDisabled`
+
+Returns a list of the names of datasources, regardless of whether the datasource is disabled or not.
+
+`GET /druid/coordinator/v1/metadata/datasources?full`
+
+Returns a list of all datasources with at least one used segment in the cluster. Returns all metadata about those datasources as stored in the metadata store.
+
+`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}`
+
+Returns full metadata for a datasource as stored in the metadata store.
+
+`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments`
+
+Returns a list of all segments for a datasource as stored in the metadata store.
+
+`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments?full`
+
+Returns a list of all segments for a datasource with the full segment metadata as stored in the metadata store.
+
+`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments/{segmentId}`
+
+Returns full segment metadata for a specific segment as stored in the metadata store, if the segment is used. If the
+segment is unused, or is unknown, a 404 response is returned.
+
+`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments/{segmentId}?includeUnused=true`
+
+Returns full segment metadata for a specific segment as stored in the metadata store. If it is unknown, a 404 response
+is returned.
+
+`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments`
+
+Returns a list of all segments, overlapping with any of the given intervals, for a datasource as stored in the metadata store. The request body is an array of ISO 8601 interval strings like `[interval1, interval2,...]`—for example, `["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]`. 
+ +`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments?full` + +Returns a list of all segments, overlapping with any of given intervals, for a datasource with the full segment metadata as stored in the metadata store. Request body is array of string ISO 8601 intervals like `[interval1, interval2,...]`—for example, `["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]`. + +`POST /druid/coordinator/v1/metadata/dataSourceInformation` + +Returns information about the specified datasources, including the datasource schema. + +`POST /druid/coordinator/v1/metadata/bootstrapSegments` + +Returns information about bootstrap segments for all datasources. The returned set includes all broadcast segments if broadcast rules are configured. + + + +## Datasources + +Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`—for example, `2016-06-27_2016-06-28`. + +`GET /druid/coordinator/v1/datasources` + +Returns a list of datasource names found in the cluster as seen by the coordinator. This view is updated every [`druid.coordinator.period`](../configuration/index.md#coordinator-operation). + +`GET /druid/coordinator/v1/datasources?simple` + +Returns a list of JSON objects containing the name and properties of datasources found in the cluster. Properties include segment count, total segment byte size, replicated total segment byte size, minTime, and maxTime. + +`GET /druid/coordinator/v1/datasources?full` + +Returns a list of datasource names found in the cluster with all metadata about those datasources. + +`GET /druid/coordinator/v1/datasources/{dataSourceName}` + +Returns a JSON object containing the name and properties of a datasource. Properties include segment count, total segment byte size, replicated total segment byte size, minTime, and maxTime. + +`GET /druid/coordinator/v1/datasources/{dataSourceName}?full` + +Returns full metadata for a datasource. + +`GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals` + +Returns a set of segment intervals. + +`GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals?simple` + +Returns a map of an interval to a JSON object containing the total byte size of segments and number of segments for that interval. + +`GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals?full` + +Returns a map of an interval to a map of segment metadata to a set of server names that contain the segment for that interval. + +`GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}` + +Returns a set of segment ids for an interval. + +`GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?simple` + +Returns a map of segment intervals contained within the specified interval to a JSON object containing the total byte size of segments and number of segments for an interval. + +`GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?full` + +Returns a map of segment intervals contained within the specified interval to a map of segment metadata to a set of server names that contain the segment for an interval. + +`GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}/serverview` + +Returns a map of segment intervals contained within the specified interval to information about the servers that contain the segment for an interval. + +`GET /druid/coordinator/v1/datasources/{dataSourceName}/segments` + +Returns a list of all segments for a datasource in the cluster. 
+
+`GET /druid/coordinator/v1/datasources/{dataSourceName}/segments?full`
+
+Returns a list of all segments for a datasource in the cluster with the full segment metadata.
+
+`GET /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}`
+
+Returns full segment metadata for a specific segment in the cluster.
+
+`GET /druid/coordinator/v1/datasources/{dataSourceName}/tiers`
+
+Returns the tiers that a datasource exists in.
+
+## Intervals
+
+Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/` as in `2016-06-27_2016-06-28`.
+
+`GET /druid/coordinator/v1/intervals`
+
+Returns all intervals for all datasources with total size and count.
+
+`GET /druid/coordinator/v1/intervals/{interval}`
+
+Returns aggregated total size and count for all intervals that intersect the given ISO interval.
+
+`GET /druid/coordinator/v1/intervals/{interval}?simple`
+
+Returns total size and count for each interval within the given ISO interval.
+
+`GET /druid/coordinator/v1/intervals/{interval}?full`
+
+Returns total size and count for each datasource for each interval within the given ISO interval.
+
+## Server information
+
+`GET /druid/coordinator/v1/servers`
+
+Returns a list of server URLs using the format `{hostname}:{port}`. Note that
+processes of different types running on the same host appear multiple times with
+different ports.
+
+`GET /druid/coordinator/v1/servers?simple`
+
+Returns a list of server data objects in which each object has the following keys:
+* `host`: host URL in the form `{hostname}:{port}`
+* `type`: process type (`indexer-executor`, `historical`)
+* `currSize`: storage size currently used
+* `maxSize`: maximum storage size
+* `priority`
+* `tier`
+
+
+## Query server
+
+This section documents the API endpoints for the services that reside on Query servers (Brokers) in the suggested [three-server configuration](../design/architecture.md#druid-servers).
+
+### Broker
+
+#### Datasource information
+
+Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
+as in `2016-06-27_2016-06-28`.
+
+:::info
+ Note: Much of this information is available in a simpler, easier-to-use form through the Druid SQL
+ [`INFORMATION_SCHEMA.TABLES`](../querying/sql-metadata-tables.md#tables-table),
+ [`INFORMATION_SCHEMA.COLUMNS`](../querying/sql-metadata-tables.md#columns-table), and
+ [`sys.segments`](../querying/sql-metadata-tables.md#segments-table) tables.
+:::
+
+`GET /druid/v2/datasources`
+
+Returns a list of queryable datasources.
+
+`GET /druid/v2/datasources/{dataSourceName}`
+
+Returns the dimensions and metrics of the datasource. Optionally, you can provide the request parameter `full` to get the list of served intervals with the dimensions and metrics being served for those intervals. You can also provide the request parameter `interval` explicitly to refer to a particular interval.
+
+If no interval is specified, a default interval spanning a configurable period before the current time is used. The default duration of this interval is specified in ISO 8601 duration format via `druid.query.segmentMetadata.defaultHistory`.
+
+`GET /druid/v2/datasources/{dataSourceName}/dimensions`
+
+:::info
+ This API is deprecated and will be removed in future releases. Please use [SegmentMetadataQuery](../querying/segmentmetadataquery.md) instead,
+ which provides more comprehensive information and supports all dataSource types including streaming dataSources. 
 If you're using SQL, consider the [INFORMATION_SCHEMA tables](../querying/sql-metadata-tables.md) as well.
+:::
+
+Returns the dimensions of the datasource.
+
+`GET /druid/v2/datasources/{dataSourceName}/metrics`
+
+:::info
+ This API is deprecated and will be removed in future releases. Please use [SegmentMetadataQuery](../querying/segmentmetadataquery.md) instead,
+ which provides more comprehensive information and supports all dataSource types including streaming dataSources. If you're using SQL, consider the [INFORMATION_SCHEMA tables](../querying/sql-metadata-tables.md) as well.
+:::
+
+Returns the metrics of the datasource.
+
+`GET /druid/v2/datasources/{dataSourceName}/candidates?intervals={comma-separated-intervals}&numCandidates={numCandidates}`
+
+Returns segment information lists including server locations for the given datasource and intervals. If `numCandidates` is not specified, the endpoint returns all servers for each interval.
diff --git a/docs/35.0.0/api-reference/lookups-api.md b/docs/35.0.0/api-reference/lookups-api.md
new file mode 100644
index 0000000000..4a122917b5
--- /dev/null
+++ b/docs/35.0.0/api-reference/lookups-api.md
@@ -0,0 +1,279 @@
+---
+id: lookups-api
+title: Lookups API
+sidebar_label: Lookups
+---
+
+
+
+This document describes the API endpoints to configure, update, retrieve, and manage lookups for Apache Druid.
+
+## Configure lookups
+
+### Bulk update
+
+Lookups can be updated in bulk by posting a JSON object to `/druid/coordinator/v1/lookups/config`. The format of the JSON object is as follows:
+
+```json
+{
+  "<tierName>": {
+    "<lookupName>": {
+      "version": "<version>",
+      "lookupExtractorFactory": {
+        "type": "<someExtractorFactoryType>",
+        "<someExtractorField>": "<someExtractorValue>"
+      }
+    }
+  }
+}
+```
+
+Note that `version` is an arbitrary string assigned by the user. When you update an existing lookup, you must specify a lexicographically higher version.
+
+For example, a config might look something like:
+
+```json
+{
+  "__default": {
+    "country_code": {
+      "version": "v0",
+      "lookupExtractorFactory": {
+        "type": "map",
+        "map": {
+          "77483": "United States"
+        }
+      }
+    },
+    "site_id": {
+      "version": "v0",
+      "lookupExtractorFactory": {
+        "type": "cachedNamespace",
+        "extractionNamespace": {
+          "type": "jdbc",
+          "connectorConfig": {
+            "createTables": true,
+            "connectURI": "jdbc:mysql:\/\/localhost:3306\/druid",
+            "user": "druid",
+            "password": "diurd"
+          },
+          "table": "lookupTable",
+          "keyColumn": "country_id",
+          "valueColumn": "country_name",
+          "tsColumn": "timeColumn"
+        },
+        "firstCacheTimeout": 120000,
+        "injective": true
+      }
+    },
+    "site_id_customer1": {
+      "version": "v0",
+      "lookupExtractorFactory": {
+        "type": "map",
+        "map": {
+          "847632": "Internal Use Only"
+        }
+      }
+    },
+    "site_id_customer2": {
+      "version": "v0",
+      "lookupExtractorFactory": {
+        "type": "map",
+        "map": {
+          "AHF77": "Home"
+        }
+      }
+    }
+  },
+  "realtime_customer1": {
+    "country_code": {
+      "version": "v0",
+      "lookupExtractorFactory": {
+        "type": "map",
+        "map": {
+          "77483": "United States"
+        }
+      }
+    },
+    "site_id_customer1": {
+      "version": "v0",
+      "lookupExtractorFactory": {
+        "type": "map",
+        "map": {
+          "847632": "Internal Use Only"
+        }
+      }
+    }
+  },
+  "realtime_customer2": {
+    "country_code": {
+      "version": "v0",
+      "lookupExtractorFactory": {
+        "type": "map",
+        "map": {
+          "77483": "United States"
+        }
+      }
+    },
+    "site_id_customer2": {
+      "version": "v0",
+      "lookupExtractorFactory": {
+        "type": "map",
+        "map": {
+          "AHF77": "Home"
+        }
+      }
+    }
+  }
+}
+```
+
+All entries in the map will UPDATE existing entries. No entries will be deleted. 
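+
+As an illustration of the bulk update format, the following sketch creates or updates a single map lookup in the `__default` tier. The quickstart Router address and the lookup contents are assumptions for the example:
+
+```shell
+# Bulk update: create or update the country_code map lookup in the __default tier.
+# Assumes a quickstart deployment with the Router at localhost:8888.
+curl "http://localhost:8888/druid/coordinator/v1/lookups/config" \
+--header 'Content-Type: application/json' \
+--data '{
+  "__default": {
+    "country_code": {
+      "version": "v1",
+      "lookupExtractorFactory": {
+        "type": "map",
+        "map": { "77483": "United States" }
+      }
+    }
+  }
+}'
+```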
+ +### Update lookup + +A `POST` to a particular lookup extractor factory via `/druid/coordinator/v1/lookups/config/{tier}/{id}` creates or updates that specific extractor factory. + +For example, a post to `/druid/coordinator/v1/lookups/config/realtime_customer1/site_id_customer1` might contain the following: + +```json +{ + "version": "v1", + "lookupExtractorFactory": { + "type": "map", + "map": { + "847632": "Internal Use Only" + } + } +} +``` + +This will replace the `site_id_customer1` lookup in the `realtime_customer1` with the definition above. + +Assign a unique version identifier each time you update a lookup extractor factory. Otherwise the call will fail. + +### Get all lookups + +A `GET` to `/druid/coordinator/v1/lookups/config/all` will return all known lookup specs for all tiers. + +### Get lookup + +A `GET` to a particular lookup extractor factory is accomplished via `/druid/coordinator/v1/lookups/config/{tier}/{id}` + +Using the prior example, a `GET` to `/druid/coordinator/v1/lookups/config/realtime_customer2/site_id_customer2` should return + +```json +{ + "version": "v1", + "lookupExtractorFactory": { + "type": "map", + "map": { + "AHF77": "Home" + } + } +} +``` + +### Delete lookup + +A `DELETE` to `/druid/coordinator/v1/lookups/config/{tier}/{id}` will remove that lookup from the cluster. If it was last lookup in the tier, then tier is deleted as well. + +### Delete tier + +A `DELETE` to `/druid/coordinator/v1/lookups/config/{tier}` will remove that tier from the cluster. + +### List tier names + +A `GET` to `/druid/coordinator/v1/lookups/config` will return a list of known tier names in the dynamic configuration. +To discover a list of tiers currently active in the cluster in addition to ones known in the dynamic configuration, the parameter `discover=true` can be added as per `/druid/coordinator/v1/lookups/config?discover=true`. + +### List lookup names + +A `GET` to `/druid/coordinator/v1/lookups/config/{tier}` will return a list of known lookup names for that tier. + +These end points can be used to get the propagation status of configured lookups to processes using lookups such as Historicals. + +## Lookup status + +### List load status of all lookups + +`GET` `/druid/coordinator/v1/lookups/status` with optional query parameter `detailed`. + +### List load status of lookups in a tier + +`GET` `/druid/coordinator/v1/lookups/status/{tier}` with optional query parameter `detailed`. + +### List load status of single lookup + +`GET` `/druid/coordinator/v1/lookups/status/{tier}/{lookup}` with optional query parameter `detailed`. + +### List lookup state of all processes + +`GET` `/druid/coordinator/v1/lookups/nodeStatus` with optional query parameter `discover` to discover tiers advertised by other Druid nodes, or by default, returning all configured lookup tiers. The default response will also include the lookups which are loaded, being loaded, or being dropped on each node, for each tier, including the complete lookup spec. Add the optional query parameter `detailed=false` to only include the 'version' of the lookup instead of the complete spec. + +### List lookup state of processes in a tier + +`GET` `/druid/coordinator/v1/lookups/nodeStatus/{tier}` + +### List lookup state of single process + +`GET` `/druid/coordinator/v1/lookups/nodeStatus/{tier}/{host:port}` + +## Internal API + +The Peon, Router, Broker, and Historical processes all have the ability to consume lookup configuration. 
+There is an internal API these processes use to list/load/drop their lookups starting at `/druid/listen/v1/lookups`. +These follow the same convention for return values as the cluster wide dynamic configuration. Following endpoints +can be used for debugging purposes but not otherwise. + +### Get lookups + +A `GET` to the process at `/druid/listen/v1/lookups` will return a json map of all the lookups currently active on the process. +The return value will be a json map of the lookups to their extractor factories. + +```json +{ + "site_id_customer2": { + "version": "v1", + "lookupExtractorFactory": { + "type": "map", + "map": { + "AHF77": "Home" + } + } + } +} +``` + +### Get lookup + +A `GET` to the process at `/druid/listen/v1/lookups/some_lookup_name` will return the LookupExtractorFactory for the lookup identified by `some_lookup_name`. +The return value will be the json representation of the factory. + +```json +{ + "version": "v1", + "lookupExtractorFactory": { + "type": "map", + "map": { + "AHF77": "Home" + } + } +} +``` \ No newline at end of file diff --git a/docs/35.0.0/api-reference/retention-rules-api.md b/docs/35.0.0/api-reference/retention-rules-api.md new file mode 100644 index 0000000000..c21e546abd --- /dev/null +++ b/docs/35.0.0/api-reference/retention-rules-api.md @@ -0,0 +1,562 @@ +--- +id: retention-rules-api +title: Retention rules API +sidebar_label: Retention rules +--- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + + +This topic describes the API endpoints for managing retention rules in Apache Druid. You can configure retention rules in the Druid web console or API. + +Druid uses retention rules to determine what data is retained in the cluster. Druid supports load, drop, and broadcast rules. For more information, see [Using rules to drop and retain data](../operations/rule-configuration.md). + +In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. Replace it with the information for your deployment. For example, use `http://localhost:8888` for quickstart deployments. + +## Update retention rules for a datasource + +Updates one or more retention rules for a datasource. The request body takes an array of retention rule objects. For details on defining retention rules, see the following sources: + +* [Load rules](../operations/rule-configuration.md#load-rules) +* [Drop rules](../operations/rule-configuration.md#drop-rules) +* [Broadcast rules](../operations/rule-configuration.md#broadcast-rules) + +This request overwrites any existing rules for the datasource. +Druid reads rules in the order in which they appear; for more information, see [rule structure](../operations/rule-configuration.md). + +Note that this endpoint returns an HTTP `200 OK` even if the datasource does not exist. + +### URL + +`POST` `/druid/coordinator/v1/rules/{dataSource}` + +### Header parameters + +The endpoint supports a set of optional header parameters to populate the `author` and `comment` fields in the `auditInfo` property for audit history. + +* `X-Druid-Author` (optional) + * Type: String + * A string representing the author making the configuration change. +* `X-Druid-Comment` (optional) + * Type: String + * A string describing the update. + +### Responses + + + + + + +*Successfully updated retention rules for specified datasource* + + + + +--- + +### Sample request + +The following example sets a set of broadcast, load, and drop retention rules for the `kttm1` datasource. 
+ + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/kttm1" \ +--header 'X-Druid-Author: doc intern' \ +--header 'X-Druid-Comment: submitted via api' \ +--header 'Content-Type: application/json' \ +--data '[ + { + "type": "broadcastForever" + }, + { + "type": "loadForever", + "tieredReplicants": { + "_default_tier": 2 + }, + "useDefaultTierForNull": true + }, + { + "type": "dropByPeriod", + "period": "P1M" + } +]' +``` + + + + + +```HTTP +POST /druid/coordinator/v1/rules/kttm1 HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +X-Druid-Author: doc intern +X-Druid-Comment: submitted via api +Content-Type: application/json +Content-Length: 273 + +[ + { + "type": "broadcastForever" + }, + { + "type": "loadForever", + "tieredReplicants": { + "_default_tier": 1 + }, + "useDefaultTierForNull": true + }, + { + "type": "dropByPeriod", + "period": "P1M" + } +] +``` + + + + +### Sample response + +A successful request returns an HTTP `200 OK` message code and an empty response body. + +## Update default retention rules for all datasources + +Updates one or more default retention rules for all datasources. Submit retention rules as an array of objects in the request body. For details on defining retention rules, see the following sources: + +* [Load rules](../operations/rule-configuration.md#load-rules) +* [Drop rules](../operations/rule-configuration.md#drop-rules) +* [Broadcast rules](../operations/rule-configuration.md#broadcast-rules) + +This request overwrites any existing rules for all datasources. To remove default retention rules for all datasources, submit an empty rule array in the request body. Rules are read in the order in which they appear; for more information, see [rule structure](../operations/rule-configuration.md). + +### URL + +`POST` `/druid/coordinator/v1/rules/_default` + +### Header parameters + +The endpoint supports a set of optional header parameters to populate the `author` and `comment` fields in the `auditInfo` property for audit history. + +* `X-Druid-Author` (optional) + * Type: String + * A string representing the author making the configuration change. +* `X-Druid-Comment` (optional) + * Type: String + * A string describing the update. + +### Responses + + + + + + +*Successfully updated default retention rules* + + + + + +*Error with request body* + + + + +--- + +### Sample request + +The following example updates the default retention rule for all datasources with a `loadByInterval` rule. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/_default" \ +--header 'Content-Type: application/json' \ +--data '[ + { + "type": "loadByInterval", + "tieredReplicants": {}, + "useDefaultTierForNull": false, + "interval": "2010-01-01/2020-01-01" + } +]' +``` + + + + + +```HTTP +POST /druid/coordinator/v1/rules/_default HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 205 + +[ + { + "type": "loadByInterval", + "tieredReplicants": {}, + "useDefaultTierForNull": false, + "interval": "2010-01-01/2020-01-01" + } +] +``` + + + + +### Sample response + +A successful request returns an HTTP `200 OK` message code and an empty response body. + +## Get an array of all retention rules + +Retrieves all current retention rules in the cluster including the default retention rule. Returns an array of objects for each datasource and their associated retention rules. 
+ +### URL + +`GET` `/druid/coordinator/v1/rules` + +### Responses + + + + + + +*Successfully retrieved retention rules* + + + + +--- + +### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules" +``` + + + + + +```HTTP +GET /druid/coordinator/v1/rules HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +### Sample response + +
+ View the response
+
+ ```json
+{
+  "_default": [
+    {
+      "tieredReplicants": {
+        "_default_tier": 2
+      },
+      "type": "loadForever"
+    }
+  ],
+  "social_media": [
+    {
+      "interval": "2023-01-01T00:00:00.000Z/2023-02-01T00:00:00.000Z",
+      "type": "dropByInterval"
+    }
+  ],
+  "wikipedia_api": []
+}
+ ```
+
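+
+If you only need part of this map, you can filter the response on the client side. The following sketch is illustrative only and not part of the Druid API; it assumes the `jq` JSON processor is installed and that a `social_media` datasource exists:
+
+```shell
+# List every entry in the rules map, including the _default rules.
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules" | jq 'keys'
+
+# Print only the rules defined for the social_media datasource.
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules" | jq '."social_media"'
+```
+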
+ +## Get an array of retention rules for a datasource + +Retrieves an array of rule objects for a single datasource. Returns an empty array if there are no retention rules. + +Note that this endpoint returns an HTTP `200 OK` message code even if the datasource doesn't exist. + +### URL + +`GET` `/druid/coordinator/v1/rules/{dataSource}` + +### Query parameters + +* `full` (optional) + * Includes the default retention rule for the datasource in the response. + +### Responses + + + + + + +*Successfully retrieved retention rules* + + + + +--- + +### Sample request + +The following example retrieves the custom retention rules and default retention rules for datasource with the name `social_media`. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/social_media?full=null" +``` + + + + + +```HTTP +GET /druid/coordinator/v1/rules/social_media?full=null HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +### Sample response + +
+ View the response + + ```json +[ + { + "interval": "2020-01-01T00:00:00.000Z/2022-02-01T00:00:00.000Z", + "type": "dropByInterval" + }, + { + "interval": "2010-01-01T00:00:00.000Z/2020-01-01T00:00:00.000Z", + "tieredReplicants": { + "_default_tier": 2 + }, + "type": "loadByInterval" + }, + { + "tieredReplicants": { + "_default_tier": 2 + }, + "type": "loadForever" + } +] + ``` + +
+ +## Get audit history for all datasources + +Retrieves the audit history of rules for all datasources over an interval of time. The default interval is 1 week. You can change this period by setting `druid.audit.manager.auditHistoryMillis` in the `runtime.properties` file for the Coordinator. + +### URL + +`GET` `/druid/coordinator/v1/rules/history` + +### Query parameters + +Note that the following query parameters cannot be chained. + +* `interval` (optional) + * Type: ISO 8601. + * Limits the number of results to the specified time interval. Delimit with `/`. For example, `2023-07-13/2023-07-19`. +* `count` (optional) + * Type: Int + * Limits the number of results to the last `n` entries. + +### Responses + + + + + + +*Successfully retrieved audit history* + + + + + +*Request in the incorrect format* + + + + + +*`count` query parameter too large* + + + + +--- + +### Sample request + +The following example retrieves the audit history for all datasources from `2023-07-13` to `2023-07-19`. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/history?interval=2023-07-13%2F2023-07-19" +``` + + + + + +```HTTP +GET /druid/coordinator/v1/rules/history?interval=2023-07-13/2023-07-19 HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +### Sample response + +
+ View the response + + ```json +[ + { + "key": "social_media", + "type": "rules", + "auditInfo": { + "author": "console", + "comment": "test", + "ip": "127.0.0.1" + }, + "payload": "[{\"interval\":\"2023-01-01T00:00:00.000Z/2023-02-01T00:00:00.000Z\",\"type\":\"dropByInterval\"}]", + "auditTime": "2023-07-13T18:05:33.066Z" + }, + { + "key": "social_media", + "type": "rules", + "auditInfo": { + "author": "console", + "comment": "test", + "ip": "127.0.0.1" + }, + "payload": "[]", + "auditTime": "2023-07-18T18:10:21.203Z" + }, + { + "key": "wikipedia_api", + "type": "rules", + "auditInfo": { + "author": "console", + "comment": "test", + "ip": "127.0.0.1" + }, + "payload": "[{\"tieredReplicants\":{\"_default_tier\":2},\"type\":\"loadForever\"}]", + "auditTime": "2023-07-18T18:10:44.519Z" + }, + { + "key": "wikipedia_api", + "type": "rules", + "auditInfo": { + "author": "console", + "comment": "test", + "ip": "127.0.0.1" + }, + "payload": "[]", + "auditTime": "2023-07-18T18:11:02.110Z" + }, + { + "key": "social_media", + "type": "rules", + "auditInfo": { + "author": "console", + "comment": "test", + "ip": "127.0.0.1" + }, + "payload": "[{\"interval\":\"2023-07-03T18:49:54.848Z/2023-07-03T18:49:55.861Z\",\"type\":\"dropByInterval\"}]", + "auditTime": "2023-07-18T18:32:50.060Z" + }, + { + "key": "social_media", + "type": "rules", + "auditInfo": { + "author": "console", + "comment": "test", + "ip": "127.0.0.1" + }, + "payload": "[{\"interval\":\"2020-01-01T00:00:00.000Z/2022-02-01T00:00:00.000Z\",\"type\":\"dropByInterval\"}]", + "auditTime": "2023-07-18T18:34:09.657Z" + }, + { + "key": "social_media", + "type": "rules", + "auditInfo": { + "author": "console", + "comment": "test", + "ip": "127.0.0.1" + }, + "payload": "[{\"interval\":\"2020-01-01T00:00:00.000Z/2022-02-01T00:00:00.000Z\",\"type\":\"dropByInterval\"},{\"tieredReplicants\":{\"_default_tier\":2},\"type\":\"loadForever\"}]", + "auditTime": "2023-07-18T18:38:37.223Z" + }, + { + "key": "social_media", + "type": "rules", + "auditInfo": { + "author": "console", + "comment": "test", + "ip": "127.0.0.1" + }, + "payload": "[{\"interval\":\"2020-01-01T00:00:00.000Z/2022-02-01T00:00:00.000Z\",\"type\":\"dropByInterval\"},{\"interval\":\"2010-01-01T00:00:00.000Z/2020-01-01T00:00:00.000Z\",\"tieredReplicants\":{\"_default_tier\":2},\"type\":\"loadByInterval\"}]", + "auditTime": "2023-07-18T18:49:43.964Z" + } +] + ``` +
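+
+Because `interval` and `count` cannot be combined, use one or the other. The following sketch retrieves only the last five audit entries instead of a time interval:
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/history?count=5"
+```
+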
diff --git a/docs/35.0.0/api-reference/service-status-api.md b/docs/35.0.0/api-reference/service-status-api.md new file mode 100644 index 0000000000..47d2a5a6d3 --- /dev/null +++ b/docs/35.0.0/api-reference/service-status-api.md @@ -0,0 +1,1469 @@ +--- +id: service-status-api +title: Service status API +sidebar_label: Service status +--- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + + + +This document describes the API endpoints to retrieve service status, cluster information for Apache Druid. + +In this document, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the server address of deployment and the service port. For example, on the quickstart configuration, replace `http://ROUTER_IP:ROUTER_PORT` with `http://localhost:8888`. + +## Common + +All services support the following endpoints. + +You can use each endpoint with the ports for each type of service. The following table contains port addresses for a local configuration: + +|Service|Port address| +| ------ | ------------ | +| Coordinator|8081| +| Overlord|8081| +| Router|8888| +| Broker|8082| +| Historical|8083| +| Middle Manager|8091| + +### Get service information + +Retrieves the Druid version, loaded extensions, memory used, total memory, and other useful information about the individual service. + +Modify the host and port for the endpoint to match the service to query. Refer to the [default service ports](#common) for the port numbers. + +#### URL + +`GET` `/status` + +#### Responses + + + + + + +
+ +*Successfully retrieved service information* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/status" +``` + + + + + +```http +GET /status HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { + "version": "26.0.0", + "modules": [ + { + "name": "org.apache.druid.common.aws.AWSModule", + "artifact": "druid-aws-common", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.common.gcp.GcpModule", + "artifact": "druid-gcp-common", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.storage.hdfs.HdfsStorageDruidModule", + "artifact": "druid-hdfs-storage", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.indexing.kafka.KafkaIndexTaskModule", + "artifact": "druid-kafka-indexing-service", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.query.aggregation.datasketches.theta.SketchModule", + "artifact": "druid-datasketches", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.query.aggregation.datasketches.theta.oldapi.OldApiSketchModule", + "artifact": "druid-datasketches", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchModule", + "artifact": "druid-datasketches", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.query.aggregation.datasketches.tuple.ArrayOfDoublesSketchModule", + "artifact": "druid-datasketches", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.query.aggregation.datasketches.hll.HllSketchModule", + "artifact": "druid-datasketches", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.query.aggregation.datasketches.kll.KllSketchModule", + "artifact": "druid-datasketches", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.msq.guice.MSQExternalDataSourceModule", + "artifact": "druid-multi-stage-query", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.msq.guice.MSQIndexingModule", + "artifact": "druid-multi-stage-query", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.msq.guice.MSQDurableStorageModule", + "artifact": "druid-multi-stage-query", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.msq.guice.MSQServiceClientModule", + "artifact": "druid-multi-stage-query", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.msq.guice.MSQSqlModule", + "artifact": "druid-multi-stage-query", + "version": "26.0.0" + }, + { + "name": "org.apache.druid.msq.guice.SqlTaskModule", + "artifact": "druid-multi-stage-query", + "version": "26.0.0" + } + ], + "memory": { + "maxMemory": 268435456, + "totalMemory": 268435456, + "freeMemory": 139060688, + "usedMemory": 129374768, + "directMemory": 134217728 + } + } + ``` +
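+
+To pick out individual fields from this response, you can filter it on the client side. The following sketch assumes the `jq` JSON processor is installed and prints only the Druid version and the set of loaded extension artifacts:
+
+```shell
+curl -s "http://ROUTER_IP:ROUTER_PORT/status" | jq '{version: .version, extensions: ([.modules[].artifact] | unique)}'
+```
+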
+ +### Get service health + +Retrieves the online status of the individual Druid service. It is a simple health check to determine if the service is running and accessible. If online, it will always return a boolean `true` value, indicating that the service can receive API calls. This endpoint is suitable for automated health checks. + +Modify the host and port for the endpoint to match the service to query. Refer to the [default service ports](#common) for the port numbers. + +Additional checks for readiness should use the [Historical segment readiness](#get-segment-readiness) and [Broker query readiness](#get-broker-query-readiness) endpoints. + +#### URL + +`GET` `/status/health` + +#### Responses + + + + + + +
+ +*Successfully retrieved service health* + +
+
+ +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/status/health" +``` + + + + + +```http +GET /status/health HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + true + ``` + +
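+
+Because the endpoint returns `true` with a success status code when the service is reachable, an automated check can rely on the exit code of `curl --fail` alone. The following is a minimal sketch of such a check:
+
+```shell
+# Exits non-zero if the service is unreachable or returns an error status.
+curl --fail --silent "http://ROUTER_IP:ROUTER_PORT/status/health" > /dev/null || echo "service is not healthy" >&2
+```
+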
+ + +### Get configuration properties + +Retrieves the current configuration properties of the individual service queried. + +Modify the host and port for the endpoint to match the service to query. Refer to the [default service ports](#common) for the port numbers. + +#### URL + +`GET` `/status/properties` + +#### Responses + + + + + + +
+ +*Successfully retrieved service configuration properties* + +
+
+ +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/status/properties" +``` + + + + + +```http +GET /status/properties HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { +{ + "gopherProxySet": "false", + "awt.toolkit": "sun.lwawt.macosx.LWCToolkit", + "druid.monitoring.monitors": "[\"org.apache.druid.java.util.metrics.JvmMonitor\"]", + "java.specification.version": "11", + "sun.cpu.isalist": "", + "druid.plaintextPort": "8888", + "sun.jnu.encoding": "UTF-8", + "druid.indexing.doubleStorage": "double", + "druid.metadata.storage.connector.port": "1527", + "java.class.path": "/Users/genericUserPath", + "log4j.shutdownHookEnabled": "true", + "java.vm.vendor": "Homebrew", + "sun.arch.data.model": "64", + "druid.extensions.loadList": "[\"druid-hdfs-storage\", \"druid-kafka-indexing-service\", \"druid-datasketches\", \"druid-multi-stage-query\"]", + "java.vendor.url": "https://github.com/Homebrew/homebrew-core/issues", + "druid.router.coordinatorServiceName": "druid/coordinator", + "user.timezone": "UTC", + "druid.global.http.eagerInitialization": "false", + "os.name": "Mac OS X", + "java.vm.specification.version": "11", + "sun.java.launcher": "SUN_STANDARD", + "user.country": "US", + "sun.boot.library.path": "/opt/homebrew/Cellar/openjdk@11/11.0.19/libexec/openjdk.jdk/Contents/Home/lib", + "sun.java.command": "org.apache.druid.cli.Main server router", + "http.nonProxyHosts": "local|*.local|169.254/16|*.169.254/16", + "jdk.debug": "release", + "druid.metadata.storage.connector.host": "localhost", + "sun.cpu.endian": "little", + "druid.zk.paths.base": "/druid", + "user.home": "/Users/genericUser", + "user.language": "en", + "java.specification.vendor": "Oracle Corporation", + "java.version.date": "2023-04-18", + "java.home": "/opt/homebrew/Cellar/openjdk@11/11.0.19/libexec/openjdk.jdk/Contents/Home", + "druid.service": "druid/router", + "druid.selectors.coordinator.serviceName": "druid/coordinator", + "druid.metadata.storage.connector.connectURI": "jdbc:derby://localhost:1527/var/druid/metadata.db;create=true", + "file.separator": "/", + "druid.selectors.indexing.serviceName": "druid/overlord", + "java.vm.compressedOopsMode": "Zero based", + "druid.metadata.storage.type": "derby", + "line.separator": "\n", + "druid.log.path": "/Users/genericUserPath", + "java.vm.specification.vendor": "Oracle Corporation", + "java.specification.name": "Java Platform API Specification", + "druid.indexer.logs.directory": "var/druid/indexing-logs", + "java.awt.graphicsenv": "sun.awt.CGraphicsEnvironment", + "druid.router.defaultBrokerServiceName": "druid/broker", + "druid.storage.storageDirectory": "var/druid/segments", + "sun.management.compiler": "HotSpot 64-Bit Tiered Compilers", + "ftp.nonProxyHosts": "local|*.local|169.254/16|*.169.254/16", + "java.runtime.version": "11.0.19+0", + "user.name": "genericUser", + "druid.indexer.logs.type": "file", + "druid.host": "localhost", + "log4j2.is.webapp": "false", + "path.separator": ":", + "os.version": "12.6.5", + "druid.lookup.enableLookupSyncOnStartup": "false", + "java.runtime.name": "OpenJDK Runtime Environment", + "druid.zk.service.host": "localhost", + "file.encoding": "UTF-8", + "druid.sql.planner.useGroupingSetForExactDistinct": "true", + "druid.router.managementProxy.enabled": "true", + "java.vm.name": "OpenJDK 64-Bit Server VM", + "java.vendor.version": "Homebrew", + "druid.startup.logging.logProperties": "true", + "java.vendor.url.bug": "https://github.com/Homebrew/homebrew-core/issues", + "log4j.shutdownCallbackRegistry": "org.apache.druid.common.config.Log4jShutdown", + "java.io.tmpdir": "var/tmp", + "druid.sql.enable": "true", + "druid.emitter.logging.logLevel": "info", + "java.version": 
"11.0.19", + "user.dir": "/Users/genericUser/Downloads/apache-druid-26.0.0", + "os.arch": "aarch64", + "java.vm.specification.name": "Java Virtual Machine Specification", + "druid.node.type": "router", + "java.awt.printerjob": "sun.lwawt.macosx.CPrinterJob", + "sun.os.patch.level": "unknown", + "java.util.logging.manager": "org.apache.logging.log4j.jul.LogManager", + "java.library.path": "/Users/genericUserPath", + "java.vendor": "Homebrew", + "java.vm.info": "mixed mode", + "java.vm.version": "11.0.19+0", + "druid.emitter": "noop", + "sun.io.unicode.encoding": "UnicodeBig", + "druid.storage.type": "local", + "java.class.version": "55.0", + "socksNonProxyHosts": "local|*.local|169.254/16|*.169.254/16", + "druid.server.hiddenProperties": "[\"druid.s3.accessKey\",\"druid.s3.secretKey\",\"druid.metadata.storage.connector.password\", \"password\", \"key\", \"token\", \"pwd\"]" +} +``` + +
+ +### Get node discovery status and cluster integration confirmation + +Retrieves a JSON map of the form `{"selfDiscovered": true/false}`, indicating whether the node has received a confirmation from the central node discovery mechanism (currently ZooKeeper) of the Druid cluster that the node has been added to the cluster. + +Only consider a Druid node "healthy" or "ready" in automated deployment/container management systems when this endpoint returns `{"selfDiscovered": true}`. Nodes experiencing network issues may become isolated and are not healthy. +For nodes that use Zookeeper segment discovery, a response of `{"selfDiscovered": true}` indicates that the node's Zookeeper client has started receiving data from the Zookeeper cluster, enabling timely discovery of segments and other nodes. + +#### URL + +`GET` `/status/selfDiscovered/status` + +#### Responses + + + + + + +
+ +*Node was successfully added to the cluster* + +
+
+ +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/status/selfDiscovered/status" +``` + + + + + +```http +GET /status/selfDiscovered/status HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { + "selfDiscovered": true + } + ``` + +
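+
+In a startup or deployment script, you might poll this endpoint until the node reports itself as discovered before routing traffic to it. The following is a rough sketch that assumes the `jq` JSON processor is installed:
+
+```shell
+# Block until the node has been confirmed as a member of the cluster.
+until curl -s "http://ROUTER_IP:ROUTER_PORT/status/selfDiscovered/status" | jq -e '.selfDiscovered' > /dev/null; do
+  echo "waiting for self-discovery..."
+  sleep 5
+done
+```
+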
+ + +### Get node self-discovery status + +Returns an HTTP status code to indicate node discovery within the Druid cluster. This endpoint is similar to the `status/selfDiscovered/status` endpoint, but relies on HTTP status codes alone. +Use this endpoint for monitoring checks that are unable to examine the response body. For example, AWS load balancer health checks. + +#### URL + +`GET` `/status/selfDiscovered` + +#### Responses + + + + + + +
+ +*Successfully retrieved node status* + +
+ + + +
+ +*Unsuccessful node self-discovery* + +
+
+ +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/status/selfDiscovered" +``` + + + + + +```http +GET /status/selfDiscovered HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +A successful response to this endpoint results in an empty response body. + +## Coordinator + +### Get Coordinator leader address + +Retrieves the address of the current leader Coordinator of the cluster. If any request is sent to a non-leader Coordinator, the request is automatically redirected to the leader Coordinator. + +#### URL + +`GET` `/druid/coordinator/v1/leader` + +#### Responses + + + + + + +
+ +*Successfully retrieved leader Coordinator address* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/leader" +``` + + + + + +```http +GET /druid/coordinator/v1/leader HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + http://localhost:8081 + ``` + +
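+
+Because the response body is the leader's base URL, a script can capture it and address follow-up requests to the leader directly. A minimal sketch:
+
+```shell
+# Store the leader address, then check the leader's health.
+LEADER=$(curl -s "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/leader")
+curl "$LEADER/status/health"
+```
+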
+ +### Get Coordinator leader status + +Retrieves a JSON object with a `leader` key. Returns `true` if this server is the current leader Coordinator of the cluster. To get the individual address of the leader Coordinator node, see the [leader endpoint](#get-coordinator-leader-address). + +Use this endpoint as a load balancer status check when you only want the active leader to be considered in-service at the load balancer. + +#### URL + +`GET` `/druid/coordinator/v1/isLeader` + +#### Responses + + + + + + +
+ +*Current server is the leader* + +
+ + + +
+ +*Current server is not the leader* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://COORDINATOR_IP:COORDINATOR_PORT/druid/coordinator/v1/isLeader" +``` + + + + + +```http +GET /druid/coordinator/v1/isLeader HTTP/1.1 +Host: http://COORDINATOR_IP:COORDINATOR_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { + "leader": true + } + ``` + +
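+
+Since only the leader returns a success status code, a load balancer health check or shell script can rely on the HTTP status alone. A minimal sketch:
+
+```shell
+if curl --fail --silent --output /dev/null "http://COORDINATOR_IP:COORDINATOR_PORT/druid/coordinator/v1/isLeader"; then
+  echo "this Coordinator is currently the leader"
+else
+  echo "this Coordinator is not the leader"
+fi
+```
+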
+
+
+### Get Historical cloning status
+
+Retrieves the current status of Historical cloning from the Coordinator.
+
+#### URL
+
+`GET` `/druid/coordinator/v1/config/cloneStatus`
+
+#### Responses
+
+
+
+
+
+
+ +*Successfully retrieved cloning status* + +
+
+ +#### Sample request + + + + + + +```shell +curl "http://COORDINATOR_IP:COORDINATOR_PORT/druid/coordinator/v1/config/cloneStatus" +``` + + + + + +```http +GET /druid/coordinator/v1/config/cloneStatus HTTP/1.1 +Host: http://COORDINATOR_IP:COORDINATOR_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "cloneStatus": [ + { + "sourceServer": "localhost:8089", + "targetServer": "localhost:8083", + "state": "IN_PROGRESS", + "segmentLoadsRemaining": 0, + "segmentDropsRemaining": 0, + "bytesToLoad": 0 + } + ] +} +``` + +
+ +### Get Broker dynamic configuration view + +Retrieves the list of Brokers which have an up-to-date view of Coordinator dynamic configuration. + +#### URL + +`GET` `/druid/coordinator/v1/config/syncedBrokers` + +#### Responses + + + + + + +
+
+*Successfully retrieved Broker configuration view*
+
+
+ +#### Sample request + + + + + + +```shell +curl "http://COORDINATOR_IP:COORDINATOR_PORT/druid/coordinator/v1/config/syncedBrokers" +``` + + + + + +```http +GET /druid/coordinator/v1/config/syncedBrokers HTTP/1.1 +Host: http://COORDINATOR_IP:COORDINATOR_PORT +``` + + + + +#### Sample response + +
+ View the response + +```json +{ + "syncedBrokers": [ + { + "host": "localhost", + "port": 8082, + "lastSyncTimestampMillis": 1745756337472 + } + ] +} +``` + +
+ +## Overlord + +### Get Overlord leader address + +Retrieves the address of the current leader Overlord of the cluster. In a cluster of multiple Overlords, only one Overlord assumes the leading role, while the remaining Overlords remain on standby. + +#### URL + +`GET` `/druid/indexer/v1/leader` + +#### Responses + + + + + + +
+ +*Successfully retrieved leader Overlord address* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/leader" +``` + + + + + +```http +GET /druid/indexer/v1/leader HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + http://localhost:8081 + ``` + +
+ + +### Get Overlord leader status + +Retrieves a JSON object with a `leader` property. The value can be `true` or `false`, indicating if this server is the current leader Overlord of the cluster. To get the individual address of the leader Overlord node, see the [leader endpoint](#get-overlord-leader-address). + +Use this endpoint as a load balancer status check when you only want the active leader to be considered in-service at the load balancer. + +#### URL + +`GET` `/druid/indexer/v1/isLeader` + +#### Responses + + + + + + +
+ +*Current server is the leader* + +
+ + + +
+ +*Current server is not the leader* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://OVERLORD_IP:OVERLORD_PORT/druid/indexer/v1/isLeader" +``` + + + + + +```http +GET /druid/indexer/v1/isLeader HTTP/1.1 +Host: http://OVERLORD_IP:OVERLORD_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { + "leader": true + } + ``` + +
+ + +## Middle Manager + +### Get Middle Manager state status + +Retrieves the enabled state of the Middle Manager process. Returns JSON object keyed by the combined `druid.host` and `druid.port` with a boolean `true` or `false` state as the value. + +#### URL + +`GET` `/druid/worker/v1/enabled` + +#### Responses + + + + + + +
+ +*Successfully retrieved Middle Manager state* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT/druid/worker/v1/enabled" +``` + + + + + +```http +GET /druid/worker/v1/enabled HTTP/1.1 +Host: http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { + "localhost:8091": true + } + ``` + +
+ +### Get active tasks + +Retrieves a list of active tasks being run on the Middle Manager. Returns JSON list of task ID strings. Note that for normal usage, you should use the `/druid/indexer/v1/tasks` [Tasks API](./tasks-api.md) endpoint or one of the task state specific variants instead. + +#### URL + +`GET` `/druid/worker/v1/tasks` + +#### Responses + + + + + + +
+ +*Successfully retrieved active tasks* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT/druid/worker/v1/tasks" +``` + + + + + +```http +GET /druid/worker/v1/tasks HTTP/1.1 +Host: http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + [ + "index_parallel_wikipedia_mgchefio_2023-06-13T22:18:05.360Z" + ] + ``` + +
+ +### Get task log + +Retrieves task log output stream by task ID. For normal usage, you should use the `/druid/indexer/v1/task/{taskId}/log` +[Tasks API](./tasks-api.md) endpoint instead. + +#### URL + +`GET` `/druid/worker/v1/task/{taskId}/log` + +### Shut down running task + +Shuts down a running task by ID. For normal usage, you should use the `/druid/indexer/v1/task/{taskId}/shutdown` +[Tasks API](./tasks-api.md) endpoint instead. + +#### URL + +`POST` `/druid/worker/v1/task/{taskId}/shutdown` + +#### Responses + + + + + +
+ +*Successfully shut down a task* + +
+
+
+---
+
+#### Sample request
+
+The following example shuts down a task with the specified ID `index_kafka_wikiticker_f7011f8ffba384b_fpeclode`.
+
+
+
+
+
+
+```shell
+curl --request POST "http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT/druid/worker/v1/task/index_kafka_wikiticker_f7011f8ffba384b_fpeclode/shutdown"
+```
+
+
+
+
+
+```http
+POST /druid/worker/v1/task/index_kafka_wikiticker_f7011f8ffba384b_fpeclode/shutdown HTTP/1.1
+Host: http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT
+```
+
+
+
+
+#### Sample response
+
+ View the response + + ```json + { + "task":"index_kafka_wikiticker_f7011f8ffba384b_fpeclode" + } + ``` + +
+ +### Disable Middle Manager + +Disables a Middle Manager, causing it to stop accepting new tasks but complete all existing tasks. Returns a JSON object +keyed by the combined `druid.host` and `druid.port`. + +#### URL + +`POST` `/druid/worker/v1/disable` + +#### Responses + + + + + + +
+ +*Successfully disabled Middle Manager* + +
+
+
+#### Sample request
+
+
+
+
+
+
+```shell
+curl --request POST "http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT/druid/worker/v1/disable"
+```
+
+
+
+
+
+```http
+POST /druid/worker/v1/disable HTTP/1.1
+Host: http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT
+```
+
+
+
+
+#### Sample response
+
+ View the response + + ```json + { + "localhost:8091":"disabled" + } + ``` + +
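+
+A common use for this endpoint is draining a Middle Manager before maintenance: disable it, wait for its running tasks to finish, then restart or decommission it and re-enable it afterward (see the next section). The following rough sketch assumes the `jq` JSON processor is installed:
+
+```shell
+# Stop accepting new tasks on this Middle Manager.
+curl --request POST "http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT/druid/worker/v1/disable"
+
+# Wait until the list of active tasks is empty.
+until curl -s "http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT/druid/worker/v1/tasks" | jq -e 'length == 0' > /dev/null; do
+  echo "waiting for running tasks to finish..."
+  sleep 30
+done
+```
+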
+ +### Enable Middle Manager + +Enables a Middle Manager, allowing it to accept new tasks again if it was previously disabled. Returns a JSON object keyed by the combined `druid.host` and `druid.port`. + +#### URL + +`POST` `/druid/worker/v1/enable` + +#### Responses + + + + + + +
+ +*Successfully enabled Middle Manager* + +
+
+
+#### Sample request
+
+
+
+
+
+
+```shell
+curl --request POST "http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT/druid/worker/v1/enable"
+```
+
+
+
+
+
+```http
+POST /druid/worker/v1/enable HTTP/1.1
+Host: http://MIDDLEMANAGER_IP:MIDDLEMANAGER_PORT
+```
+
+
+
+
+#### Sample response
+
+ View the response + + ```json + { + "localhost:8091":"enabled" + } + ``` + +
+
+## Historical
+
+### Get segment load status
+
+Retrieves a JSON object of the form `{"cacheInitialized":value}`, where value is either `true` or `false` indicating if all segments in the local cache have been loaded.
+
+Use this endpoint to know when a Historical process is ready to be queried after a restart.
+
+#### URL
+
+`GET` `/druid/historical/v1/loadstatus`
+
+#### Responses
+
+
+
+
+
+
+ +*Successfully retrieved status* + +
+
+ +#### Sample request + + + + + + +```shell +curl "http://HISTORICAL_IP:HISTORICAL_PORT/druid/historical/v1/loadstatus" +``` + + + + + +```http +GET /druid/historical/v1/loadstatus HTTP/1.1 +Host: http://HISTORICAL_IP:HISTORICAL_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { + "cacheInitialized": true + } + ``` + +
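+
+For example, a rolling-restart script might poll this endpoint and wait until the local segment cache is fully loaded before moving on to the next Historical. A rough sketch, assuming the `jq` JSON processor is installed:
+
+```shell
+until curl -s "http://HISTORICAL_IP:HISTORICAL_PORT/druid/historical/v1/loadstatus" | jq -e '.cacheInitialized' > /dev/null; do
+  echo "waiting for the segment cache to load..."
+  sleep 10
+done
+```
+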
+ +### Get segment readiness + +Retrieves a status code to indicate if all segments in the local cache have been loaded. Similar to `/druid/historical/v1/loadstatus`, but instead of returning JSON with a flag, it returns status codes. + +#### URL + +`GET` `/druid/historical/v1/readiness` + +#### Responses + + + + + + +
+ +*Segments in local cache successfully loaded* + +
+ + + +
+ +*Segments in local cache have not been loaded* + +
+
+ +#### Sample request + + + + + + +```shell +curl "http://HISTORICAL_IP:HISTORICAL_PORT/druid/historical/v1/readiness" +``` + + + + + +```http +GET /druid/historical/v1/readiness HTTP/1.1 +Host: http://HISTORICAL_IP:HISTORICAL_PORT +``` + + + + +#### Sample response + +A successful response to this endpoint results in an empty response body. + +## Load Status + +### Get Broker query load status + +Retrieves a flag indicating if the Broker knows about all segments in the cluster. Use this endpoint to know when a Broker service is ready to accept queries after a restart. + +#### URL + +`GET` `/druid/broker/v1/loadstatus` + +#### Responses + + + + + + +
+ +*Segments successfully loaded* + +
+
+
+#### Sample request
+
+
+
+
+
+
+```shell
+curl "http://BROKER_IP:BROKER_PORT/druid/broker/v1/loadstatus"
+```
+
+
+
+
+
+```http
+GET /druid/broker/v1/loadstatus HTTP/1.1
+Host: http://BROKER_IP:BROKER_PORT
+```
+
+
+
+
+#### Sample response
+
+ View the response + + ```json + { + "inventoryInitialized": true + } + ``` + +
+ +### Get Broker query readiness + +Retrieves a status code to indicate Broker readiness. Readiness signifies the Broker knows about all segments in the cluster and is ready to accept queries after a restart. Similar to `/druid/broker/v1/loadstatus`, but instead of returning a JSON, it returns status codes. + +#### URL + +`GET` `/druid/broker/v1/readiness` + +#### Responses + + + + + + +
+ +*Segments successfully loaded* + +
+ + + +
+ +*Segments have not been loaded* + +
+
+ +#### Sample request + + + + + + +```shell +curl "http://BROKER_IP:BROKER_PORT/druid/broker/v1/readiness" +``` + + + + + +```http +GET /druid/broker/v1/readiness HTTP/1.1 +Host: http://BROKER_IP:BROKER_PORT +``` + + + + +#### Sample response + +A successful response to this endpoint results in an empty response body. diff --git a/docs/35.0.0/api-reference/sql-api.md b/docs/35.0.0/api-reference/sql-api.md new file mode 100644 index 0000000000..af60cee4c8 --- /dev/null +++ b/docs/35.0.0/api-reference/sql-api.md @@ -0,0 +1,1727 @@ +--- +id: sql-api +title: Druid SQL API +sidebar_label: Druid SQL +--- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + +:::info + Apache Druid supports two query languages: Druid SQL and [native queries](../querying/querying.md). + This document describes the SQL language. +::: + +In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. Replace it with the information for your deployment. For example, use `http://localhost:8888` for quickstart deployments. + +## Query from Historicals + +### Submit a query + +Submits a SQL-based query in the JSON or text format request body. +Returns a JSON object with the query results and optional metadata for the results. You can also use this endpoint to query [metadata tables](../querying/sql-metadata-tables.md). + +Each query has an associated SQL query ID. You can set this ID manually using the SQL context parameter `sqlQueryId`. If not set, Druid automatically generates `sqlQueryId` and returns it in the response header for `X-Druid-SQL-Query-Id`. Note that you need the `sqlQueryId` to [cancel a query](#cancel-a-query). + +#### URL + +`POST` `/druid/v2/sql` + +#### JSON Format Request body + +To send queries in JSON format, the `Content-Type` in the HTTP request MUST be `application/json`. +If there are multiple `Content-Type` headers, the **first** one is used. + +The request body takes the following properties: + +* `query`: SQL query string. HTTP requests are permitted to include multiple `SET` statements to assign [SQL query context parameter](../querying/sql-query-context.md) values to apply to the query statement, see [SET](../querying/sql.md#set) for details. Context parameters set by `SET` statements take priority over values set in `context`. +* `resultFormat`: String that indicates the format to return query results. Select one of the following formats: + * `object`: Returns a JSON array of JSON objects with the HTTP response header `Content-Type: application/json`. + Object field names match the columns returned by the SQL query in the same order as the SQL query. + + * `array`: Returns a JSON array of JSON arrays with the HTTP response header `Content-Type: application/json`. + Each inner array has elements matching the columns returned by the SQL query, in order. + + * `objectLines`: Returns newline-delimited JSON objects with the HTTP response header `Content-Type: text/plain`. + Newline separation facilitates parsing the entire response set as a stream if you don't have a streaming JSON parser. + This format includes a single trailing newline character so you can detect a truncated response. + + * `arrayLines`: Returns newline-delimited JSON arrays with the HTTP response header `Content-Type: text/plain`. + Newline separation facilitates parsing the entire response set as a stream if you don't have a streaming JSON parser. + This format includes a single trailing newline character so you can detect a truncated response. 
+ + * `csv`: Returns comma-separated values with one row per line. Sent with the HTTP response header `Content-Type: text/csv`. + Druid uses double quotes to escape individual field values. For example, a value with a comma returns `"A,B"`. + If the field value contains a double quote character, Druid escapes it with a second double quote character. + For example, `foo"bar` becomes `foo""bar`. + This format includes a single trailing newline character so you can detect a truncated response. + +* `header`: Boolean value that determines whether to return information on column names. When set to `true`, Druid returns the column names as the first row of the results. To also get information on the column types, set `typesHeader` or `sqlTypesHeader` to `true`. For a comparative overview of data formats and configurations for the header, see the [Query output format](#query-output-format) table. + +* `typesHeader`: Adds Druid runtime type information in the header. Requires `header` to be set to `true`. Complex types, like sketches, will be reported as `COMPLEX` if a particular complex type name is known for that field, or as `COMPLEX` if the particular type name is unknown or mixed. + +* `sqlTypesHeader`: Adds SQL type information in the header. Requires `header` to be set to `true`. + + For compatibility, Druid returns the HTTP header `X-Druid-SQL-Header-Included: yes` when all of the following conditions are met: + * The `header` property is set to true. + * The version of Druid supports `typesHeader` and `sqlTypesHeader`, regardless of whether either property is set. + +* `context`: JSON object containing optional [SQL query context parameters](../querying/sql-query-context.md), such as to set the query ID, time zone, and whether to use an approximation algorithm for distinct count. You can also set the context through the SQL SET command. For more information, see [Druid SQL overview](../querying/sql.md#set). + +* `parameters`: List of query parameters for parameterized queries. Each parameter in the array should be a JSON object containing the parameter's SQL data type and parameter value. For more information on using dynamic parameters, see [Dynamic parameters](../querying/sql.md#dynamic-parameters). For a list of supported SQL types, see [Data types](../querying/sql-data-types.md). + + For example: + + ```json + { + "query": "SELECT \"arrayDouble\", \"stringColumn\" FROM \"array_example\" WHERE ARRAY_CONTAINS(\"arrayDouble\", ?) AND \"stringColumn\" = ?", + "parameters": [ + {"type": "ARRAY", "value": [999.0, null, 5.5]}, + {"type": "VARCHAR", "value": "bar"} + ] + } + ``` + +##### Text Format Request body + +Druid also allows you to submit SQL queries in text format which is simpler than above JSON format. +To do this, just set the `Content-Type` request header to `text/plain` or `application/x-www-form-urlencoded`, and pass SQL via the HTTP Body. + +If `application/x-www-form-urlencoded` is used, make sure the SQL query is URL-encoded. + +If there are multiple `Content-Type` headers, the **first** one is used. + +For response, the `resultFormat` is always `object` with the HTTP response header `Content-Type: application/json`. +If you want more control over the query context or response format, use the above JSON format request body instead. 
+ +The following example demonstrates how to submit a SQL query in text format: + +```commandline +echo 'SELECT 1' | curl -H 'Content-Type: text/plain' http://ROUTER_IP:ROUTER_PORT/druid/v2/sql --data @- +``` + +We can also use `application/x-www-form-urlencoded` to submit URL-encoded SQL queries as shown by the following examples: + +```commandline +echo 'SELECT%20%31' | curl http://ROUTER_IP:ROUTER_PORT/druid/v2/sql --data @- +echo 'SELECT 1' | curl http://ROUTER_IP:ROUTER_PORT/druid/v2/sql --data-urlencode @- +``` + +The `curl` tool uses `application/x-www-form-urlencoded` as Content-Type header if the header is not given. + +The first example pass the URL-encoded query `SELECT%20%31`, which is `SELECT 1`, to the `curl` and `curl` will directly sends it to the server. +While the second example passes the raw query `SELECT 1` to `curl` and the `curl` encodes the query to `SELECT%20%31` because of `--data-urlencode` option and sends the encoded text to the server. + +#### Responses + + + + + + +*Successfully submitted query* + + + + + +*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:* + +```json +{ + "error": "A well-defined error code.", + "errorMessage": "A message with additional details about the error.", + "errorClass": "Class of exception that caused this error.", + "host": "The host on which the error occurred." +} +``` + + + + +*Request not sent due to unexpected conditions. Returns a JSON object detailing the error with the following format:* + +```json +{ + "error": "A well-defined error code.", + "errorMessage": "A message with additional details about the error.", + "errorClass": "Class of exception that caused this error.", + "host": "The host on which the error occurred." +} +``` + + + + +#### Client-side error handling and truncated responses + +Druid reports errors that occur before the response body is sent as JSON with an HTTP 500 status code. The errors are reported using the same format as [native Druid query errors](../querying/querying.md#query-errors). +If an error occurs while Druid is sending the response body, the server handling the request stops the response midstream and logs an error. + +This means that when you call the SQL API, you must properly handle response truncation. +For `object` and `array` formats, truncated responses are invalid JSON. +For line-oriented formats, Druid includes a newline character as the final character of every complete response. Absence of a final newline character indicates a truncated response. + +If you detect a truncated response, treat it as an error. + +--- + +#### Sample request + +In the following example, this query demonstrates the following actions: +- Retrieves all rows from the `wikipedia` datasource. +- Filters the results where the `user` value is `BlueMoon2662`. +- Applies the `sqlTimeZone` context parameter to set the time zone of results to `America/Los_Angeles`. +- Returns descriptors for `header`, `typesHeader`, and `sqlTypesHeader`. 
+ + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql" \ +--header 'Content-Type: application/json' \ +--data '{ + "query": "SELECT * FROM wikipedia WHERE user='\''BlueMoon2662'\''", + "context" : {"sqlTimeZone" : "America/Los_Angeles"}, + "header" : true, + "typesHeader" : true, + "sqlTypesHeader" : true +}' +``` + + + + + +```HTTP +POST /druid/v2/sql HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 201 + +{ + "query": "SELECT * FROM wikipedia WHERE user='BlueMoon2662'", + "context" : {"sqlTimeZone" : "America/Los_Angeles"}, + "header" : true, + "typesHeader" : true, + "sqlTypesHeader" : true +} +``` + + + + +You can also specify query-level context parameters directly within the SQL query string using the `SET` command. For more details, see [SET](../querying/sql.md#set). + +The following request body is functionally equivalent to the previous example and uses SET instead of the `context` parameter: + +```JSON +{ + "query": "SET sqlTimeZone='America/Los_Angeles'; SELECT * FROM wikipedia WHERE user='BlueMoon2662'", + "header": true, + "typesHeader": true, + "sqlTypesHeader": true +} +``` + + +#### Sample response + +
+ View the response + +```json +[ + { + "__time": { + "type": "LONG", + "sqlType": "TIMESTAMP" + }, + "channel": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "cityName": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "comment": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "countryIsoCode": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "countryName": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "isAnonymous": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "isMinor": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "isNew": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "isRobot": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "isUnpatrolled": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "metroCode": { + "type": "LONG", + "sqlType": "BIGINT" + }, + "namespace": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "page": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "regionIsoCode": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "regionName": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "user": { + "type": "STRING", + "sqlType": "VARCHAR" + }, + "delta": { + "type": "LONG", + "sqlType": "BIGINT" + }, + "added": { + "type": "LONG", + "sqlType": "BIGINT" + }, + "deleted": { + "type": "LONG", + "sqlType": "BIGINT" + } + }, + { + "__time": "2015-09-11T17:47:53.259-07:00", + "channel": "#ja.wikipedia", + "cityName": null, + "comment": "/* 対戦通算成績と得失点 */", + "countryIsoCode": null, + "countryName": null, + "isAnonymous": "false", + "isMinor": "true", + "isNew": "false", + "isRobot": "false", + "isUnpatrolled": "false", + "metroCode": null, + "namespace": "Main", + "page": "アルビレックス新潟の年度別成績一覧", + "regionIsoCode": null, + "regionName": null, + "user": "BlueMoon2662", + "delta": 14, + "added": 14, + "deleted": 0 + } +] +``` +
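+
+For line-oriented result formats, remember to check for the trailing newline that marks a complete response, as described in the client-side error handling notes above. The following sketch writes an `arrayLines` result to a file and treats a missing final newline as truncation; the query and file name are illustrative only:
+
+```shell
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql" \
+--header 'Content-Type: application/json' \
+--data '{"query": "SELECT channel, COUNT(*) AS edits FROM wikipedia GROUP BY channel", "resultFormat": "arrayLines"}' \
+--output results.jsonl
+
+# A complete response ends with a newline; command substitution strips it, leaving an empty string.
+if [ -z "$(tail -c 1 results.jsonl)" ]; then
+  echo "response complete"
+else
+  echo "response truncated; treat it as an error" >&2
+fi
+```
+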
+ +### Cancel a query + +Cancels a query on the Router or the Broker with the associated `sqlQueryId`. The `sqlQueryId` can be manually set when the query is submitted in the query context parameter, or if not set, Druid will generate one and return it in the response header when the query is successfully submitted. Note that Druid does not enforce a unique `sqlQueryId` in the query context. If you've set the same `sqlQueryId` for multiple queries, Druid cancels all requests with that query ID. + +When you cancel a query, Druid handles the cancellation in a best-effort manner. Druid immediately marks the query as canceled and aborts the query execution as soon as possible. However, the query may continue running for a short time after you make the cancellation request. + +Cancellation requests require READ permission on all resources used in the SQL query. + +#### URL + +`DELETE` `/druid/v2/sql/{sqlQueryId}` + +#### Responses + + + + + + +*Successfully deleted query* + + + + + +*Authorization failure* + + + + + +*Invalid `sqlQueryId` or query was completed before cancellation request* + + + + +--- + +#### Sample request + +The following example cancels a request with the set query ID `request01`. + + + + + + +```shell +curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/request01" +``` + + + + + +```HTTP +DELETE /druid/v2/sql/request01 HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +A successful response results in an `HTTP 202` message code and an empty response body. + +### Query output format + +The following table shows examples of how Druid returns the column names and data types based on the result format and the type request. +In all cases, `header` is true. +The examples includes the first row of results, where the value of `user` is `BlueMoon2662`. + +``` +| Format | typesHeader | sqlTypesHeader | Example output | +|--------|-------------|----------------|--------------------------------------------------------------------------------------------| +| object | true | false | [ { "user" : { "type" : "STRING" } }, { "user" : "BlueMoon2662" } ] | +| object | true | true | [ { "user" : { "type" : "STRING", "sqlType" : "VARCHAR" } }, { "user" : "BlueMoon2662" } ] | +| object | false | true | [ { "user" : { "sqlType" : "VARCHAR" } }, { "user" : "BlueMoon2662" } ] | +| object | false | false | [ { "user" : null }, { "user" : "BlueMoon2662" } ] | +| array | true | false | [ [ "user" ], [ "STRING" ], [ "BlueMoon2662" ] ] | +| array | true | true | [ [ "user" ], [ "STRING" ], [ "VARCHAR" ], [ "BlueMoon2662" ] ] | +| array | false | true | [ [ "user" ], [ "VARCHAR" ], [ "BlueMoon2662" ] ] | +| array | false | false | [ [ "user" ], [ "BlueMoon2662" ] ] | +| csv | true | false | user STRING BlueMoon2662 | +| csv | true | true | user STRING VARCHAR BlueMoon2662 | +| csv | false | true | user VARCHAR BlueMoon2662 | +| csv | false | false | user BlueMoon2662 | +``` + +## Query from deep storage + +You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules. + +Note that at least one segment of a datasource must be available on a Historical process so that the Broker can plan your query. A quick way to check if this is true is whether or not a datasource is visible in the Druid console. + + +For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). 
+ +### Submit a query + +Submit a query for data stored in deep storage. Any data ingested into Druid is placed into deep storage. The query is contained in the "query" field in the JSON object within the request payload. + +Note that at least part of a datasource must be available on a Historical process so that Druid can plan your query and only the user who submits a query can see the results. + +#### URL + +`POST` `/druid/v2/sql/statements` + +#### Request body + +Generally, the `sql` and `sql/statements` endpoints support the same response body fields with minor differences. For general information about the available fields, see [Submit a query to the `sql` endpoint](#submit-a-query). + +Keep the following in mind when submitting queries to the `sql/statements` endpoint: + +- Apart from the context parameters mentioned [here](../multi-stage-query/reference.md#context-parameters) there are additional context parameters for `sql/statements` specifically: + + - `executionMode` determines how query results are fetched. Druid currently only supports `ASYNC`. You must manually retrieve your results after the query completes. + - `selectDestination` determines where final results get written. By default, results are written to task reports. Set this parameter to `durableStorage` to instruct Druid to write the results from SELECT queries to durable storage, which allows you to fetch larger result sets. For result sets with more than 3000 rows, it is highly recommended to use `durableStorage`. Note that this requires you to have [durable storage for MSQ](../operations/durable-storage.md) enabled. + +#### Responses + + + + + + +*Successfully queried from deep storage* + + + + + +*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:* + +```json +{ + "error": "Summary of the encountered error.", + "errorClass": "Class of exception that caused this error.", + "host": "The host on which the error occurred.", + "errorCode": "Well-defined error code.", + "persona": "Role or persona associated with the error.", + "category": "Classification of the error.", + "errorMessage": "Summary of the encountered issue with expanded information.", + "context": "Additional context about the error." +} +``` + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements" \ +--header 'Content-Type: application/json' \ +--data '{ + "query": "SELECT * FROM wikipedia WHERE user='\''BlueMoon2662'\''", + "context": { + "executionMode":"ASYNC" + } +}' +``` + + + + + +```HTTP +POST /druid/v2/sql/statements HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 134 + +{ + "query": "SELECT * FROM wikipedia WHERE user='BlueMoon2662'", + "context": { + "executionMode":"ASYNC" + } +} +``` + + + + +#### Sample response + +
+ View the response + + ```json +{ + "queryId": "query-b82a7049-b94f-41f2-a230-7fef94768745", + "state": "ACCEPTED", + "createdAt": "2023-07-26T21:16:25.324Z", + "schema": [ + { + "name": "__time", + "type": "TIMESTAMP", + "nativeType": "LONG" + }, + { + "name": "channel", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "cityName", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "comment", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "countryIsoCode", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "countryName", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "isAnonymous", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "isMinor", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "isNew", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "isRobot", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "isUnpatrolled", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "metroCode", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "namespace", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "page", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "regionIsoCode", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "regionName", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "user", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "delta", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "added", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "deleted", + "type": "BIGINT", + "nativeType": "LONG" + } + ], + "durationMs": -1 +} + ``` +
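+
+Because `executionMode` is `ASYNC`, the submit call only returns a query ID that you then use to check on the query and retrieve its results. The following rough sketch (assuming the `jq` JSON processor is installed) captures the ID for use with the status endpoint described in the next section:
+
+```shell
+QUERY_ID=$(curl -s "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements" \
+--header 'Content-Type: application/json' \
+--data '{"query": "SELECT * FROM wikipedia WHERE user='\''BlueMoon2662'\''", "context": {"executionMode": "ASYNC"}}' \
+| jq -r '.queryId')
+
+# Check on the query later using the status endpoint.
+curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/$QUERY_ID"
+```
+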
+ +### Get query status + +Retrieves information about the query associated with the given query ID. The response matches the response from the POST API if the query is accepted or running and the execution mode is `ASYNC`. In addition to the fields that this endpoint shares with `POST /sql/statements`, a completed query's status includes the following: + +- A `result` object that summarizes information about your results, such as the total number of rows and sample records. +- A `pages` object that includes the following information for each page of results: + - `numRows`: the number of rows in that page of results. + - `sizeInBytes`: the size of the page. + - `id`: the page number that you can use to reference a specific page when you get query results. + +If the optional query parameter `detail` is supplied, then the response also includes the following: +- A `stages` object that summarizes information about the different stages being used for query execution, such as stage number, phase, start time, duration, input and output information, processing methods, and partitioning. +- A `counters` object that provides details on the rows, bytes, and files processed at various stages for each worker across different channels, along with sort progress. +- A `warnings` object that provides details about any warnings. + +#### URL + +`GET` `/druid/v2/sql/statements/{queryId}` + +#### Query parameters +* `detail` (optional) + * Type: Boolean + * Default: false + * Fetch additional details about the query, which includes the information about different stages, counters for each stage, and any warnings. + +#### Responses + + + + + + +*Successfully retrieved query status* + + + + + +*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:* + +```json +{ + "error": "Summary of the encountered error.", + "errorCode": "Well-defined error code.", + "persona": "Role or persona associated with the error.", + "category": "Classification of the error.", + "errorMessage": "Summary of the encountered issue with expanded information.", + "context": "Additional context about the error." +} +``` + + + + +#### Sample request + +The following example retrieves the status of a query with specified ID `query-9b93f6f7-ab0e-48f5-986a-3520f84f0804`. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/query-9b93f6f7-ab0e-48f5-986a-3520f84f0804?detail=true" +``` + + + + + +```HTTP +GET /druid/v2/sql/statements/query-9b93f6f7-ab0e-48f5-986a-3520f84f0804?detail=true HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json +{ + "queryId": "query-9b93f6f7-ab0e-48f5-986a-3520f84f0804", + "state": "SUCCESS", + "createdAt": "2023-07-26T22:57:46.620Z", + "schema": [ + { + "name": "__time", + "type": "TIMESTAMP", + "nativeType": "LONG" + }, + { + "name": "channel", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "cityName", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "comment", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "countryIsoCode", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "countryName", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "isAnonymous", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "isMinor", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "isNew", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "isRobot", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "isUnpatrolled", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "metroCode", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "namespace", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "page", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "regionIsoCode", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "regionName", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "user", + "type": "VARCHAR", + "nativeType": "STRING" + }, + { + "name": "delta", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "added", + "type": "BIGINT", + "nativeType": "LONG" + }, + { + "name": "deleted", + "type": "BIGINT", + "nativeType": "LONG" + } + ], + "durationMs": 25591, + "result": { + "numTotalRows": 1, + "totalSizeInBytes": 375, + "dataSource": "__query_select", + "sampleRecords": [ + [ + 1442018873259, + "#ja.wikipedia", + "", + "/* 対戦通算成績と得失点 */", + "", + "", + 0, + 1, + 0, + 0, + 0, + 0, + "Main", + "アルビレックス新潟の年度別成績一覧", + "", + "", + "BlueMoon2662", + 14, + 14, + 0 + ] + ], + "pages": [ + { + "id": 0, + "numRows": 1, + "sizeInBytes": 375 + } + ] + }, + "stages": [ + { + "stageNumber": 0, + "definition": { + "id": "query-9b93f6f7-ab0e-48f5-986a-3520f84f0804_0", + "input": [ + { + "type": "table", + "dataSource": "wikipedia", + "intervals": [ + "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z" + ], + "filter": { + "type": "equals", + "column": "user", + "matchValueType": "STRING", + "matchValue": "BlueMoon2662" + }, + "filterFields": [ + "user" + ] + } + ], + "processor": { + "type": "scan", + "query": { + "queryType": "scan", + "dataSource": { + "type": "inputNumber", + "inputNumber": 0 + }, + "intervals": { + "type": "intervals", + "intervals": [ + "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z" + ] + }, + "virtualColumns": [ + { + "type": "expression", + "name": "v0", + "expression": "'BlueMoon2662'", + "outputType": "STRING" + } + ], + "resultFormat": "compactedList", + "limit": 1001, + "filter": { + "type": "equals", + "column": "user", + "matchValueType": "STRING", + "matchValue": "BlueMoon2662" + }, + "columns": [ + "__time", + "added", + "channel", + "cityName", + "comment", + "commentLength", + "countryIsoCode", + "countryName", + "deleted", + "delta", + "deltaBucket", + "diffUrl", + "flags", + "isAnonymous", + "isMinor", + "isNew", + "isRobot", + "isUnpatrolled", + "metroCode", + "namespace", + "page", + "regionIsoCode", + "regionName", + "v0" + ], + "context": { + "__resultFormat": "array", + "__user": "allowAll", + "executionMode": 
"async", + "finalize": true, + "maxNumTasks": 2, + "maxParseExceptions": 0, + "queryId": "33b53acb-7533-4880-a81b-51c16c489eab", + "scanSignature": "[{\"name\":\"__time\",\"type\":\"LONG\"},{\"name\":\"added\",\"type\":\"LONG\"},{\"name\":\"channel\",\"type\":\"STRING\"},{\"name\":\"cityName\",\"type\":\"STRING\"},{\"name\":\"comment\",\"type\":\"STRING\"},{\"name\":\"commentLength\",\"type\":\"LONG\"},{\"name\":\"countryIsoCode\",\"type\":\"STRING\"},{\"name\":\"countryName\",\"type\":\"STRING\"},{\"name\":\"deleted\",\"type\":\"LONG\"},{\"name\":\"delta\",\"type\":\"LONG\"},{\"name\":\"deltaBucket\",\"type\":\"LONG\"},{\"name\":\"diffUrl\",\"type\":\"STRING\"},{\"name\":\"flags\",\"type\":\"STRING\"},{\"name\":\"isAnonymous\",\"type\":\"STRING\"},{\"name\":\"isMinor\",\"type\":\"STRING\"},{\"name\":\"isNew\",\"type\":\"STRING\"},{\"name\":\"isRobot\",\"type\":\"STRING\"},{\"name\":\"isUnpatrolled\",\"type\":\"STRING\"},{\"name\":\"metroCode\",\"type\":\"STRING\"},{\"name\":\"namespace\",\"type\":\"STRING\"},{\"name\":\"page\",\"type\":\"STRING\"},{\"name\":\"regionIsoCode\",\"type\":\"STRING\"},{\"name\":\"regionName\",\"type\":\"STRING\"},{\"name\":\"v0\",\"type\":\"STRING\"}]", + "sqlOuterLimit": 1001, + "sqlQueryId": "33b53acb-7533-4880-a81b-51c16c489eab", + "sqlStringifyArrays": false + }, + "columnTypes": [ + "LONG", + "LONG", + "STRING", + "STRING", + "STRING", + "LONG", + "STRING", + "STRING", + "LONG", + "LONG", + "LONG", + "STRING", + "STRING", + "STRING", + "STRING", + "STRING", + "STRING", + "STRING", + "STRING", + "STRING", + "STRING", + "STRING", + "STRING", + "STRING" + ], + "granularity": { + "type": "all" + }, + "legacy": false + } + }, + "signature": [ + { + "name": "__boost", + "type": "LONG" + }, + { + "name": "__time", + "type": "LONG" + }, + { + "name": "added", + "type": "LONG" + }, + { + "name": "channel", + "type": "STRING" + }, + { + "name": "cityName", + "type": "STRING" + }, + { + "name": "comment", + "type": "STRING" + }, + { + "name": "commentLength", + "type": "LONG" + }, + { + "name": "countryIsoCode", + "type": "STRING" + }, + { + "name": "countryName", + "type": "STRING" + }, + { + "name": "deleted", + "type": "LONG" + }, + { + "name": "delta", + "type": "LONG" + }, + { + "name": "deltaBucket", + "type": "LONG" + }, + { + "name": "diffUrl", + "type": "STRING" + }, + { + "name": "flags", + "type": "STRING" + }, + { + "name": "isAnonymous", + "type": "STRING" + }, + { + "name": "isMinor", + "type": "STRING" + }, + { + "name": "isNew", + "type": "STRING" + }, + { + "name": "isRobot", + "type": "STRING" + }, + { + "name": "isUnpatrolled", + "type": "STRING" + }, + { + "name": "metroCode", + "type": "STRING" + }, + { + "name": "namespace", + "type": "STRING" + }, + { + "name": "page", + "type": "STRING" + }, + { + "name": "regionIsoCode", + "type": "STRING" + }, + { + "name": "regionName", + "type": "STRING" + }, + { + "name": "v0", + "type": "STRING" + } + ], + "shuffleSpec": { + "type": "mix" + }, + "maxWorkerCount": 1 + }, + "phase": "FINISHED", + "workerCount": 1, + "partitionCount": 1, + "shuffle": "mix", + "output": "localStorage", + "startTime": "2024-07-31T15:20:21.255Z", + "duration": 103 + }, + { + "stageNumber": 1, + "definition": { + "id": "query-9b93f6f7-ab0e-48f5-986a-3520f84f0804_1", + "input": [ + { + "type": "stage", + "stage": 0 + } + ], + "processor": { + "type": "limit", + "limit": 1001 + }, + "signature": [ + { + "name": "__boost", + "type": "LONG" + }, + { + "name": "__time", + "type": "LONG" + }, + { + "name": "added", + "type": "LONG" + 
}, + { + "name": "channel", + "type": "STRING" + }, + { + "name": "cityName", + "type": "STRING" + }, + { + "name": "comment", + "type": "STRING" + }, + { + "name": "commentLength", + "type": "LONG" + }, + { + "name": "countryIsoCode", + "type": "STRING" + }, + { + "name": "countryName", + "type": "STRING" + }, + { + "name": "deleted", + "type": "LONG" + }, + { + "name": "delta", + "type": "LONG" + }, + { + "name": "deltaBucket", + "type": "LONG" + }, + { + "name": "diffUrl", + "type": "STRING" + }, + { + "name": "flags", + "type": "STRING" + }, + { + "name": "isAnonymous", + "type": "STRING" + }, + { + "name": "isMinor", + "type": "STRING" + }, + { + "name": "isNew", + "type": "STRING" + }, + { + "name": "isRobot", + "type": "STRING" + }, + { + "name": "isUnpatrolled", + "type": "STRING" + }, + { + "name": "metroCode", + "type": "STRING" + }, + { + "name": "namespace", + "type": "STRING" + }, + { + "name": "page", + "type": "STRING" + }, + { + "name": "regionIsoCode", + "type": "STRING" + }, + { + "name": "regionName", + "type": "STRING" + }, + { + "name": "v0", + "type": "STRING" + } + ], + "shuffleSpec": { + "type": "maxCount", + "clusterBy": { + "columns": [ + { + "columnName": "__boost", + "order": "ASCENDING" + } + ] + }, + "partitions": 1 + }, + "maxWorkerCount": 1 + }, + "phase": "FINISHED", + "workerCount": 1, + "partitionCount": 1, + "shuffle": "globalSort", + "output": "localStorage", + "startTime": "2024-07-31T15:20:21.355Z", + "duration": 10, + "sort": true + } + ], + "counters": { + "0": { + "0": { + "input0": { + "type": "channel", + "rows": [ + 24433 + ], + "bytes": [ + 7393933 + ], + "files": [ + 22 + ], + "totalFiles": [ + 22 + ] + } + } + }, + "1": { + "0": { + "sortProgress": { + "type": "sortProgress", + "totalMergingLevels": -1, + "levelToTotalBatches": {}, + "levelToMergedBatches": {}, + "totalMergersForUltimateLevel": -1, + "triviallyComplete": true, + "progressDigest": 1 + } + } + } + }, + "warnings": [] +} + ``` +
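+
+The status payload above contains what a client needs to track a query programmatically. The following is a minimal Python sketch, assuming a quickstart Router at `http://localhost:8888` and the query ID from the sample response; it prints the overall state and a per-stage progress summary using the `state`, `durationMs`, and `stages` fields shown above. Error handling is omitted for brevity.
+
+```python
+import requests
+
+# Hypothetical values: substitute your Router address and query ID.
+ROUTER = "http://localhost:8888"
+QUERY_ID = "query-9b93f6f7-ab0e-48f5-986a-3520f84f0804"
+
+# Fetch the status payload and print a per-stage progress summary.
+response = requests.get(f"{ROUTER}/druid/v2/sql/statements/{QUERY_ID}")
+response.raise_for_status()
+status = response.json()
+
+print(f"{status['queryId']}: {status['state']} ({status.get('durationMs', -1)} ms)")
+for stage in status.get("stages", []):
+    print(
+        f"  stage {stage['stageNumber']}: {stage.get('phase', 'NEW')}, "
+        f"workers={stage.get('workerCount', 0)}, duration={stage.get('duration', 0)} ms"
+    )
+```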
+
+
+### Get query results
+
+Retrieves results for completed queries. Results are separated into pages, so you can use the optional `page` parameter to refine the results you get. Druid returns information about the composition of each page and its page number (`id`). For information about pages, see [Get query status](#get-query-status).
+
+If a page number isn't passed, all results are returned sequentially in the same response. If you have large result sets, you may encounter timeouts based on the value configured for `druid.router.http.readTimeout`.
+
+Getting the query results for an ingestion query returns an empty response.
+
+#### URL
+
+`GET` `/druid/v2/sql/statements/{queryId}/results`
+
+#### Query parameters
+* `page` (optional)
+  * Type: Int
+  * Fetch results based on page numbers. If not specified, all results are returned sequentially starting from page 0 to N in the same response.
+* `resultFormat` (optional)
+  * Type: String
+  * Defines the format in which the results are presented. The following options are supported: `arrayLines`, `objectLines`, `array`, `object`, and `csv`. The default is `object`.
+* `filename` (optional)
+  * Type: String
+  * If set, attaches a `Content-Disposition` header to the response with the value of `attachment; filename={filename}`. The filename must not be longer than 255 characters and must not contain the characters `/`, `\`, `:`, `*`, `?`, `"`, `<`, `>`, `|`, `\0`, `\n`, or `\r`.
+
+#### Responses
+
+
+*Successfully retrieved query results*
+
+
+*Query in progress. Returns a JSON object detailing the error with the following format:*
+
+```json
+{
+  "error": "Summary of the encountered error.",
+  "errorCode": "Well-defined error code.",
+  "persona": "Role or persona associated with the error.",
+  "category": "Classification of the error.",
+  "errorMessage": "Summary of the encountered issue with expanded information.",
+  "context": "Additional context about the error."
+}
+```
+
+
+*Query not found, failed, or canceled*
+
+
+*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
+
+```json
+{
+  "error": "Summary of the encountered error.",
+  "errorCode": "Well-defined error code.",
+  "persona": "Role or persona associated with the error.",
+  "category": "Classification of the error.",
+  "errorMessage": "Summary of the encountered issue with expanded information.",
+  "context": "Additional context about the error."
+}
+```
+
+---
+
+#### Sample request
+
+The following example retrieves the results of a query with the ID `query-f3bca219-173d-44d4-bdc7-5002e910352f`.
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/query-f3bca219-173d-44d4-bdc7-5002e910352f/results"
+```
+
+
+```HTTP
+GET /druid/v2/sql/statements/query-f3bca219-173d-44d4-bdc7-5002e910352f/results HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+#### Sample response
+
+ View the response + + ```json +[ + { + "__time": 1442018818771, + "channel": "#en.wikipedia", + "cityName": "", + "comment": "added project", + "countryIsoCode": "", + "countryName": "", + "isAnonymous": 0, + "isMinor": 0, + "isNew": 0, + "isRobot": 0, + "isUnpatrolled": 0, + "metroCode": 0, + "namespace": "Talk", + "page": "Talk:Oswald Tilghman", + "regionIsoCode": "", + "regionName": "", + "user": "GELongstreet", + "delta": 36, + "added": 36, + "deleted": 0 + }, + { + "__time": 1442018820496, + "channel": "#ca.wikipedia", + "cityName": "", + "comment": "Robot inserta {{Commonscat}} que enllaça amb [[commons:category:Rallicula]]", + "countryIsoCode": "", + "countryName": "", + "isAnonymous": 0, + "isMinor": 1, + "isNew": 0, + "isRobot": 1, + "isUnpatrolled": 0, + "metroCode": 0, + "namespace": "Main", + "page": "Rallicula", + "regionIsoCode": "", + "regionName": "", + "user": "PereBot", + "delta": 17, + "added": 17, + "deleted": 0 + }, + { + "__time": 1442018825474, + "channel": "#en.wikipedia", + "cityName": "Auburn", + "comment": "/* Status of peremptory norms under international law */ fixed spelling of 'Wimbledon'", + "countryIsoCode": "AU", + "countryName": "Australia", + "isAnonymous": 1, + "isMinor": 0, + "isNew": 0, + "isRobot": 0, + "isUnpatrolled": 0, + "metroCode": 0, + "namespace": "Main", + "page": "Peremptory norm", + "regionIsoCode": "NSW", + "regionName": "New South Wales", + "user": "60.225.66.142", + "delta": 0, + "added": 0, + "deleted": 0 + }, + { + "__time": 1442018828770, + "channel": "#vi.wikipedia", + "cityName": "", + "comment": "fix Lỗi CS1: ngày tháng", + "countryIsoCode": "", + "countryName": "", + "isAnonymous": 0, + "isMinor": 1, + "isNew": 0, + "isRobot": 1, + "isUnpatrolled": 0, + "metroCode": 0, + "namespace": "Main", + "page": "Apamea abruzzorum", + "regionIsoCode": "", + "regionName": "", + "user": "Cheers!-bot", + "delta": 18, + "added": 18, + "deleted": 0 + }, + { + "__time": 1442018831862, + "channel": "#vi.wikipedia", + "cityName": "", + "comment": "clean up using [[Project:AWB|AWB]]", + "countryIsoCode": "", + "countryName": "", + "isAnonymous": 0, + "isMinor": 0, + "isNew": 0, + "isRobot": 1, + "isUnpatrolled": 0, + "metroCode": 0, + "namespace": "Main", + "page": "Atractus flammigerus", + "regionIsoCode": "", + "regionName": "", + "user": "ThitxongkhoiAWB", + "delta": 18, + "added": 18, + "deleted": 0 + }, + { + "__time": 1442018833987, + "channel": "#vi.wikipedia", + "cityName": "", + "comment": "clean up using [[Project:AWB|AWB]]", + "countryIsoCode": "", + "countryName": "", + "isAnonymous": 0, + "isMinor": 0, + "isNew": 0, + "isRobot": 1, + "isUnpatrolled": 0, + "metroCode": 0, + "namespace": "Main", + "page": "Agama mossambica", + "regionIsoCode": "", + "regionName": "", + "user": "ThitxongkhoiAWB", + "delta": 18, + "added": 18, + "deleted": 0 + }, + { + "__time": 1442018837009, + "channel": "#ca.wikipedia", + "cityName": "", + "comment": "/* Imperi Austrohongarès */", + "countryIsoCode": "", + "countryName": "", + "isAnonymous": 0, + "isMinor": 0, + "isNew": 0, + "isRobot": 0, + "isUnpatrolled": 0, + "metroCode": 0, + "namespace": "Main", + "page": "Campanya dels Balcans (1914-1918)", + "regionIsoCode": "", + "regionName": "", + "user": "Jaumellecha", + "delta": -20, + "added": 0, + "deleted": 20 + }, + { + "__time": 1442018839591, + "channel": "#en.wikipedia", + "cityName": "", + "comment": "adding comment on notability and possible COI", + "countryIsoCode": "", + "countryName": "", + "isAnonymous": 0, + "isMinor": 0, + "isNew": 1, + 
"isRobot": 0, + "isUnpatrolled": 1, + "metroCode": 0, + "namespace": "Talk", + "page": "Talk:Dani Ploeger", + "regionIsoCode": "", + "regionName": "", + "user": "New Media Theorist", + "delta": 345, + "added": 345, + "deleted": 0 + }, + { + "__time": 1442018841578, + "channel": "#en.wikipedia", + "cityName": "", + "comment": "Copying assessment table to wiki", + "countryIsoCode": "", + "countryName": "", + "isAnonymous": 0, + "isMinor": 0, + "isNew": 0, + "isRobot": 1, + "isUnpatrolled": 0, + "metroCode": 0, + "namespace": "User", + "page": "User:WP 1.0 bot/Tables/Project/Pubs", + "regionIsoCode": "", + "regionName": "", + "user": "WP 1.0 bot", + "delta": 121, + "added": 121, + "deleted": 0 + }, + { + "__time": 1442018845821, + "channel": "#vi.wikipedia", + "cityName": "", + "comment": "clean up using [[Project:AWB|AWB]]", + "countryIsoCode": "", + "countryName": "", + "isAnonymous": 0, + "isMinor": 0, + "isNew": 0, + "isRobot": 1, + "isUnpatrolled": 0, + "metroCode": 0, + "namespace": "Main", + "page": "Agama persimilis", + "regionIsoCode": "", + "regionName": "", + "user": "ThitxongkhoiAWB", + "delta": 18, + "added": 18, + "deleted": 0 + } +] + ``` +
+ +### Cancel a query + +Cancels a running or accepted query. + +#### URL + +`DELETE` `/druid/v2/sql/statements/{queryId}` + +#### Responses + + + + + + +*A no op operation since the query is not in a state to be cancelled* + + + + + +*Successfully accepted query for cancellation* + + + + + +*Invalid query ID. Returns a JSON object detailing the error with the following format:* + +```json +{ + "error": "Summary of the encountered error.", + "errorCode": "Well-defined error code.", + "persona": "Role or persona associated with the error.", + "category": "Classification of the error.", + "errorMessage": "Summary of the encountered issue with expanded information.", + "context": "Additional context about the error." +} +``` + + + + +--- + +#### Sample request + +The following example cancels a query with specified ID `query-945c9633-2fa2-49ab-80ae-8221c38c024da`. + + + + + + +```shell +curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/query-945c9633-2fa2-49ab-80ae-8221c38c024da" +``` + + + + + +```HTTP +DELETE /druid/v2/sql/statements/query-945c9633-2fa2-49ab-80ae-8221c38c024da HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +A successful request returns an HTTP `202 ACCEPTED` message code and an empty response body. diff --git a/docs/35.0.0/api-reference/sql-ingestion-api.md b/docs/35.0.0/api-reference/sql-ingestion-api.md new file mode 100644 index 0000000000..59942aff8e --- /dev/null +++ b/docs/35.0.0/api-reference/sql-ingestion-api.md @@ -0,0 +1,850 @@ +--- +id: sql-ingestion-api +title: SQL-based ingestion API +sidebar_label: SQL-based ingestion +--- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + +:::info + This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) + extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which + ingestion method is right for you. +::: + +The **Query** view in the web console provides a friendly experience for the multi-stage query task engine (MSQ task engine) and multi-stage query architecture. We recommend using the web console if you don't need a programmatic interface. + +When using the API for the MSQ task engine, the action you want to take determines the endpoint you use: + +- `/druid/v2/sql/task`: Submit a query for ingestion. +- `/druid/indexer/v1/task`: Interact with a query, including getting its status or details, or canceling the query. This page describes a few of the Overlord Task APIs that you can use with the MSQ task engine. For information about Druid APIs, see the [API reference for Druid](../ingestion/tasks.md). + +In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. Replace it with the information for your deployment. For example, use `http://localhost:8888` for quickstart deployments. + +## Submit a query + +Submits queries to the MSQ task engine. + +The `/druid/v2/sql/task` endpoint accepts the following: + +- [SQL requests in the JSON-over-HTTP form](sql-api.md#request-body) using the +`query`, `context`, and `parameters` fields. The endpoint ignores the `resultFormat`, `header`, `typesHeader`, and `sqlTypesHeader` fields. +- [INSERT](../multi-stage-query/reference.md#insert) and [REPLACE](../multi-stage-query/reference.md#replace) statements. +- SELECT queries (experimental feature). 
SELECT query results are collected from workers by the controller, and written into the [task report](#get-the-report-for-a-query-task) as an array of arrays. The behavior and result format of plain SELECT queries (without INSERT or REPLACE) is subject to change. + +### URL + +`POST` `/druid/v2/sql/task` + +### Responses + + + + + + +*Successfully submitted query* + + + + + +*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:* + +```json +{ + "error": "A well-defined error code.", + "errorMessage": "A message with additional details about the error.", + "errorClass": "Class of exception that caused this error.", + "host": "The host on which the error occurred." +} +``` + + + + +*Request not sent due to unexpected conditions. Returns a JSON object detailing the error with the following format:* + +```json +{ + "error": "A well-defined error code.", + "errorMessage": "A message with additional details about the error.", + "errorClass": "Class of exception that caused this error.", + "host": "The host on which the error occurred." +} +``` + + + + +--- + +### Sample request + +The following example shows a query that fetches data from an external JSON source and inserts it into a table named `wikipedia`. +The example specifies two query context parameters: + +- `maxNumTasks=3`: Limits the maximum number of parallel tasks to 3. +- `finalizeAggregations=false`: Ensures that Druid saves the aggregation's intermediate type during ingestion. For more information, see [Rollup](../multi-stage-query/concepts.md#rollup). + + + + + + +```HTTP +POST /druid/v2/sql/task HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json + +{ + "query": "SET maxNumTasks=3;\nSET finalizeAggregations=false;\nINSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n '{\"type\": \"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\n )\n)\nPARTITIONED BY DAY" +} +``` + + + + + + +```shell +curl --location --request POST 'http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/task' \ + --header 'Content-Type: application/json' \ + --data '{ + "query": "SET maxNumTasks=3;\nSET finalizeAggregations=false;\nINSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '\''{\"type\": \"http\", \"uris\": 
[\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n '\''{\"type\": \"json\"}'\'',\n '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\''\n )\n)\nPARTITIONED BY DAY" +}' +``` + + + + + + +```python +import json +import requests + +url = "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/task" + +payload = json.dumps({ + "query": "SET maxNumTasks=3;\nSET finalizeAggregations=false;\nINSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n '{\"type\": \"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\n )\n)\nPARTITIONED BY DAY" +}) +headers = { + 'Content-Type': 'application/json' +} + +response = requests.post(url, headers=headers, data=payload) + +print(response.text) + +``` + + + + + +### Sample response + +
+ View the response + +```json +{ + "taskId": "query-431c4a18-9dde-4ec8-ab82-ec7fd17d5a4e", + "state": "RUNNING" +} +``` +
+ +**Response fields** + +| Field | Description | +|---|---| +| `taskId` | Controller task ID. You can use Druid's standard [Tasks API](./tasks-api.md) to interact with this controller task. | +| `state` | Initial state for the query. | + +## Get the status for a query task + +Retrieves the status of a query task. It returns a JSON object with the task's status code, runner status, task type, datasource, and other relevant metadata. + +### URL + +`GET` `/druid/indexer/v1/task/{taskId}/status` + +### Responses + + + + + + +
+ +*Successfully retrieved task status* + +
+ + + +
+ +*Cannot find task with ID* + +
+
+
+---
+
+### Sample request
+
+The following example shows how to retrieve the status of a task with the ID `query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e`.
+
+
+```HTTP
+GET /druid/indexer/v1/task/query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e/status HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+```shell
+curl --location --request GET 'http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e/status'
+```
+
+
+```python
+import requests
+
+url = "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e/status"
+
+# The status endpoint is a GET request with no request body.
+response = requests.get(url)
+
+print(response.text)
+```
+
+
+### Sample response
+
+ View the response + +```json +{ + "task": "query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e", + "status": { + "id": "query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e", + "groupId": "query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e", + "type": "query_controller", + "createdTime": "2022-09-14T22:12:00.183Z", + "queueInsertionTime": "1970-01-01T00:00:00.000Z", + "statusCode": "RUNNING", + "status": "RUNNING", + "runnerStatusCode": "RUNNING", + "duration": -1, + "location": { + "host": "localhost", + "port": 8100, + "tlsPort": -1 + }, + "dataSource": "kttm_simple", + "errorMsg": null + } +} +``` +
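+
+Because the status endpoint returns immediately, client code typically polls it until the controller task reaches a terminal state. The following is a minimal Python sketch, assuming a quickstart Router at `http://localhost:8888` and the controller task ID from the example above; it checks `status.statusCode` every few seconds until the task succeeds or fails.
+
+```python
+import time
+
+import requests
+
+# Hypothetical values: substitute your Router address and controller task ID.
+ROUTER = "http://localhost:8888"
+TASK_ID = "query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e"
+
+# Poll the task status until it reaches a terminal state.
+while True:
+    response = requests.get(f"{ROUTER}/druid/indexer/v1/task/{TASK_ID}/status")
+    response.raise_for_status()
+    status_code = response.json()["status"]["statusCode"]
+    print(f"Task {TASK_ID} is {status_code}")
+    if status_code in ("SUCCESS", "FAILED"):
+        break
+    time.sleep(5)
+```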
+
+## Get the report for a query task
+
+Retrieves the task report for a query.
+The report provides detailed information about the query task, including the stages, warnings, and errors.
+
+Keep the following in mind when using the task API to view reports:
+
+- The task report for an entire job is associated with the `query_controller` task. The `query_worker` tasks don't have their own reports; their information is incorporated into the controller report.
+- The task report API may report `404 Not Found` temporarily while the task is in the process of starting up.
+- As an experimental feature, the MSQ task engine supports running SELECT queries. SELECT query results are written into
+the `multiStageQuery.payload.results.results` task report key as an array of arrays. The behavior and result format of plain
+SELECT queries (without INSERT or REPLACE) is subject to change.
+- `multiStageQuery.payload.results.resultsTruncated` denotes whether the results in the report have been truncated to keep the report from growing too large.
+
+For an explanation of the fields in a report, see [Report response fields](#report-response-fields).
+
+### URL
+
+
+`GET` `/druid/indexer/v1/task/{taskId}/reports`
+
+### Responses
+
+
+ +*Successfully retrieved task report* + +
+
+
+---
+
+### Sample request
+
+The following example shows how to retrieve the report for a query with the task ID `query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e`.
+
+
+```HTTP
+GET /druid/indexer/v1/task/query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e/reports HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+```shell
+curl --location --request GET 'http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e/reports'
+```
+
+
+```python
+import requests
+
+url = "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e/reports"
+
+# The report endpoint is a GET request with no request body.
+response = requests.get(url)
+
+print(response.text)
+```
+
+
+### Sample response
+
+The response shows an example report for a query.
+
+View the response + +```json +{ + "multiStageQuery": { + "type": "multiStageQuery", + "taskId": "query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e", + "payload": { + "status": { + "status": "SUCCESS", + "startTime": "2022-09-14T22:12:09.266Z", + "durationMs": 28227, + "workers": { + "0": [ + { + "workerId": "query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e-worker0_0", + "state": "SUCCESS", + "durationMs": 15511, + "pendingMs": 137 + } + ] + }, + "pendingTasks": 0, + "runningTasks": 2, + "segmentLoadWaiterStatus": { + "state": "SUCCESS", + "dataSource": "kttm_simple", + "startTime": "2022-09-14T23:12:09.266Z", + "duration": 15, + "totalSegments": 1, + "usedSegments": 1, + "precachedSegments": 0, + "onDemandSegments": 0, + "pendingSegments": 0, + "unknownSegments": 0 + }, + "segmentReport": { + "shardSpec": "NumberedShardSpec", + "details": "Cannot use RangeShardSpec, RangedShardSpec only supports string CLUSTER BY keys. Using NumberedShardSpec instead." + } + }, + "stages": [ + { + "stageNumber": 0, + "definition": { + "id": "71ecb11e-09d7-42f8-9225-1662c8e7e121_0", + "input": [ + { + "type": "external", + "inputSource": { + "type": "http", + "uris": [ + "https://static.imply.io/example-data/kttm-v2/kttm-v2-2019-08-25.json.gz" + ], + "httpAuthenticationUsername": null, + "httpAuthenticationPassword": null + }, + "inputFormat": { + "type": "json", + "flattenSpec": null, + "featureSpec": {}, + "keepNullColumns": false + }, + "signature": [ + { + "name": "timestamp", + "type": "STRING" + }, + { + "name": "agent_category", + "type": "STRING" + }, + { + "name": "agent_type", + "type": "STRING" + } + ] + } + ], + "processor": { + "type": "scan", + "query": { + "queryType": "scan", + "dataSource": { + "type": "inputNumber", + "inputNumber": 0 + }, + "intervals": { + "type": "intervals", + "intervals": [ + "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z" + ] + }, + "resultFormat": "compactedList", + "columns": [ + "agent_category", + "agent_type", + "timestamp" + ], + "context": { + "finalize": false, + "finalizeAggregations": false, + "groupByEnableMultiValueUnnesting": false, + "scanSignature": "[{\"name\":\"agent_category\",\"type\":\"STRING\"},{\"name\":\"agent_type\",\"type\":\"STRING\"},{\"name\":\"timestamp\",\"type\":\"STRING\"}]", + "sqlInsertSegmentGranularity": "{\"type\":\"all\"}", + "sqlQueryId": "3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e", + "sqlReplaceTimeChunks": "all" + }, + "granularity": { + "type": "all" + } + } + }, + "signature": [ + { + "name": "__boost", + "type": "LONG" + }, + { + "name": "agent_category", + "type": "STRING" + }, + { + "name": "agent_type", + "type": "STRING" + }, + { + "name": "timestamp", + "type": "STRING" + } + ], + "shuffleSpec": { + "type": "targetSize", + "clusterBy": { + "columns": [ + { + "columnName": "__boost" + } + ] + }, + "targetSize": 3000000 + }, + "maxWorkerCount": 1, + "shuffleCheckHasMultipleValues": true + }, + "phase": "FINISHED", + "workerCount": 1, + "partitionCount": 1, + "startTime": "2022-09-14T22:12:11.663Z", + "duration": 19965, + "sort": true + }, + { + "stageNumber": 1, + "definition": { + "id": "71ecb11e-09d7-42f8-9225-1662c8e7e121_1", + "input": [ + { + "type": "stage", + "stage": 0 + } + ], + "processor": { + "type": "segmentGenerator", + "dataSchema": { + "dataSource": "kttm_simple", + "timestampSpec": { + "column": "__time", + "format": "millis", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "timestamp", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true 
+ }, + { + "type": "string", + "name": "agent_category", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "agent_type", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "arbitrary", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [ + "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z" + ] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "columnMappings": [ + { + "queryColumn": "timestamp", + "outputColumn": "timestamp" + }, + { + "queryColumn": "agent_category", + "outputColumn": "agent_category" + }, + { + "queryColumn": "agent_type", + "outputColumn": "agent_type" + } + ], + "tuningConfig": { + "maxNumWorkers": 1, + "maxRowsInMemory": 100000, + "rowsPerSegment": 3000000 + } + }, + "signature": [], + "maxWorkerCount": 1 + }, + "phase": "FINISHED", + "workerCount": 1, + "partitionCount": 1, + "startTime": "2022-09-14T22:12:31.602Z", + "duration": 5891 + } + ], + "counters": { + "0": { + "0": { + "input0": { + "type": "channel", + "rows": [ + 465346 + ], + "files": [ + 1 + ], + "totalFiles": [ + 1 + ] + }, + "output": { + "type": "channel", + "rows": [ + 465346 + ], + "bytes": [ + 43694447 + ], + "frames": [ + 7 + ] + }, + "shuffle": { + "type": "channel", + "rows": [ + 465346 + ], + "bytes": [ + 41835307 + ], + "frames": [ + 73 + ] + }, + "sortProgress": { + "type": "sortProgress", + "totalMergingLevels": 3, + "levelToTotalBatches": { + "0": 1, + "1": 1, + "2": 1 + }, + "levelToMergedBatches": { + "0": 1, + "1": 1, + "2": 1 + }, + "totalMergersForUltimateLevel": 1, + "progressDigest": 1 + } + } + }, + "1": { + "0": { + "input0": { + "type": "channel", + "rows": [ + 465346 + ], + "bytes": [ + 41835307 + ], + "frames": [ + 73 + ] + }, + "segmentGenerationProgress": { + "type": "segmentGenerationProgress", + "rowsProcessed": 465346, + "rowsPersisted": 465346, + "rowsMerged": 465346 + } + } + } + } + } + } +} +``` + +
+ + + +The following table describes the response fields when you retrieve a report for a MSQ task engine using the `/druid/indexer/v1/task/{taskId}/reports` endpoint: + +| Field | Description | +|---|---| +| `multiStageQuery.taskId` | Controller task ID. | +| `multiStageQuery.payload.status` | Query status container. | +| `multiStageQuery.payload.status.status` | RUNNING, SUCCESS, or FAILED. | +| `multiStageQuery.payload.status.startTime` | Start time of the query in ISO format. Only present if the query has started running. | +| `multiStageQuery.payload.status.durationMs` | Milliseconds elapsed after the query has started running. -1 denotes that the query hasn't started running yet. | +| `multiStageQuery.payload.status.workers` | Workers for the controller task.| +| `multiStageQuery.payload.status.workers.` | Array of worker tasks including retries. | +| `multiStageQuery.payload.status.workers.[].workerId` | Id of the worker task.| | +| `multiStageQuery.payload.status.workers.[].status` | RUNNING, SUCCESS, or FAILED.| +| `multiStageQuery.payload.status.workers.[].durationMs` | Milliseconds elapsed between when the worker task was first requested and when it finished. It is -1 for worker tasks with status RUNNING.| +| `multiStageQuery.payload.status.workers.[].pendingMs` | Milliseconds elapsed between when the worker task was first requested and when it fully started RUNNING. Actual work time can be calculated using `actualWorkTimeMS = durationMs - pendingMs`.| +| `multiStageQuery.payload.status.pendingTasks` | Number of tasks that are not fully started. -1 denotes that the number is currently unknown. | +| `multiStageQuery.payload.status.runningTasks` | Number of currently running tasks. Should be at least 1 since the controller is included. | +| `multiStageQuery.payload.status.segmentLoadStatus` | Segment loading container. Only present after the segments have been published. | +| `multiStageQuery.payload.status.segmentLoadStatus.state` | Either INIT, WAITING, SUCCESS, FAILED or TIMED_OUT. | +| `multiStageQuery.payload.status.segmentLoadStatus.startTime` | Time since which the controller has been waiting for the segments to finish loading. | +| `multiStageQuery.payload.status.segmentLoadStatus.duration` | The duration in milliseconds that the controller has been waiting for the segments to load. | +| `multiStageQuery.payload.status.segmentLoadStatus.totalSegments` | The total number of segments generated by the job. This includes tombstone segments (if any). | +| `multiStageQuery.payload.status.segmentLoadStatus.usedSegments` | The number of segments which are marked as used based on the load rules. Unused segments can be cleaned up at any time. | +| `multiStageQuery.payload.status.segmentLoadStatus.precachedSegments` | The number of segments which are marked as precached and served by historicals, as per the load rules. | +| `multiStageQuery.payload.status.segmentLoadStatus.onDemandSegments` | The number of segments which are not loaded on any historical, as per the load rules. | +| `multiStageQuery.payload.status.segmentLoadStatus.pendingSegments` | The number of segments remaining to be loaded. | +| `multiStageQuery.payload.status.segmentLoadStatus.unknownSegments` | The number of segments whose status is unknown. | +| `multiStageQuery.payload.status.segmentReport` | Segment report. Only present if the query is an ingestion. | +| `multiStageQuery.payload.status.segmentReport.shardSpec` | Contains the shard spec chosen. 
| +| `multiStageQuery.payload.status.segmentReport.details` | Contains further reasoning about the shard spec chosen. | +| `multiStageQuery.payload.status.errorReport` | Error object. Only present if there was an error. | +| `multiStageQuery.payload.status.errorReport.taskId` | The task that reported the error, if known. May be a controller task or a worker task. | +| `multiStageQuery.payload.status.errorReport.host` | The hostname and port of the task that reported the error, if known. | +| `multiStageQuery.payload.status.errorReport.stageNumber` | The stage number that reported the error, if it happened during execution of a specific stage. | +| `multiStageQuery.payload.status.errorReport.error` | Error object. Contains `errorCode` at a minimum, and may contain other fields as described in the [error code table](../multi-stage-query/reference.md#error-codes). Always present if there is an error. | +| `multiStageQuery.payload.status.errorReport.error.errorCode` | One of the error codes from the [error code table](../multi-stage-query/reference.md#error-codes). Always present if there is an error. | +| `multiStageQuery.payload.status.errorReport.error.errorMessage` | User-friendly error message. Not always present, even if there is an error. | +| `multiStageQuery.payload.status.errorReport.exceptionStackTrace` | Java stack trace in string form, if the error was due to a server-side exception. | +| `multiStageQuery.payload.stages` | Array of query stages. | +| `multiStageQuery.payload.stages[].stageNumber` | Each stage has a number that differentiates it from other stages. | +| `multiStageQuery.payload.stages[].phase` | Either NEW, READING_INPUT, POST_READING, RESULTS_COMPLETE, or FAILED. Only present if the stage has started. | +| `multiStageQuery.payload.stages[].workerCount` | Number of parallel tasks that this stage is running on. Only present if the stage has started. | +| `multiStageQuery.payload.stages[].partitionCount` | Number of output partitions generated by this stage. Only present if the stage has started and has computed its number of output partitions. | +| `multiStageQuery.payload.stages[].startTime` | Start time of this stage. Only present if the stage has started. | +| `multiStageQuery.payload.stages[].duration` | The number of milliseconds that the stage has been running. Only present if the stage has started. | +| `multiStageQuery.payload.stages[].sort` | A boolean that is set to `true` if the stage does a sort as part of its execution. | +| `multiStageQuery.payload.stages[].definition` | The object defining what the stage does. | +| `multiStageQuery.payload.stages[].definition.id` | The unique identifier of the stage. | +| `multiStageQuery.payload.stages[].definition.input` | Array of inputs that the stage has. | +| `multiStageQuery.payload.stages[].definition.broadcast` | Array of input indexes that get broadcasted. Only present if there are inputs that get broadcasted. | +| `multiStageQuery.payload.stages[].definition.processor` | An object defining the processor logic. | +| `multiStageQuery.payload.stages[].definition.signature` | The output signature of the stage. | + +## Cancel a query task + +Cancels a query task. +Returns a JSON object with the ID of the task that was canceled successfully. + +### URL + +`POST` `/druid/indexer/v1/task/{taskId}/shutdown` + +### Responses + + + + + + +
+ +*Successfully shut down task* + +
+ + + +
+ +*Cannot find task with ID or task is no longer running* + +
+
+ +--- + +### Sample request + +The following example shows how to cancel a query task with the ID `query-655efe33-781a-4c50-ae84-c2911b42d63c`. + + + + + + +```HTTP +POST /druid/indexer/v1/task/query-655efe33-781a-4c50-ae84-c2911b42d63c/shutdown HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + + + +```shell +curl --location --request POST 'http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-655efe33-781a-4c50-ae84-c2911b42d63c/shutdown' +``` + + + + + + +```python +import requests + +url = "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-655efe33-781a-4c50-ae84-c2911b42d63c/shutdown" + +payload = {} +headers = {} + +response = requests.post(url, headers=headers, data=payload) + +print(response.text) +print(response.text) +``` + + + + + +### Sample response + +The response shows the ID of the task that was canceled. + +```json +{ + "task": "query-655efe33-781a-4c50-ae84-c2911b42d63c" +} +``` \ No newline at end of file diff --git a/docs/35.0.0/api-reference/sql-jdbc.md b/docs/35.0.0/api-reference/sql-jdbc.md new file mode 100644 index 0000000000..affe9ea738 --- /dev/null +++ b/docs/35.0.0/api-reference/sql-jdbc.md @@ -0,0 +1,251 @@ +--- +id: sql-jdbc +title: SQL JDBC driver API +sidebar_label: SQL JDBC driver +--- + + + +:::info + Apache Druid supports two query languages: Druid SQL and [native queries](../querying/querying.md). + This document describes the SQL language. +::: + +You can make [Druid SQL](../querying/sql.md) queries using the [Avatica JDBC driver](https://calcite.apache.org/avatica/downloads/). +We recommend using Avatica JDBC driver version 1.23.0 or later. Note that starting with Avatica 1.21.0, you may need to set the [`transparent_reconnection`](https://calcite.apache.org/avatica/docs/client_reference.html#transparent_reconnection) property to `true` if you notice intermittent query failures. + +Once you've downloaded the Avatica client jar, add it to your classpath. + +Example connection string: + +``` +jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true +``` + +Or, to use the protobuf protocol instead of JSON: + +``` +jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica-protobuf/;transparent_reconnection=true;serialization=protobuf +``` + +The `url` is the `/druid/v2/sql/avatica/` endpoint on the Router, which routes JDBC connections to a consistent Broker. +For more information, see [Connection stickiness](#connection-stickiness). + +Set `transparent_reconnection` to `true` so your connection is not interrupted if the pool of Brokers changes membership, +or if a Broker is restarted. + +Set `serialization` to `protobuf` if using the protobuf endpoint. + +Note that as of the time of this writing, Avatica 1.23.0, the latest version, does not support passing +[connection context parameters](../querying/sql-query-context.md) from the JDBC connection string to Druid. These context parameters +must be passed using a `Properties` object instead. Refer to the Java code below for an example. + +Example Java code: + +```java +// Connect to /druid/v2/sql/avatica/ on your Broker. +String url = "jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true"; + +// Set any connection context parameters you need here. +// Any property from https://druid.apache.org/docs/latest/querying/sql-query-context.html can go here. 
+Properties connectionProperties = new Properties();
+connectionProperties.setProperty("sqlTimeZone", "Etc/UTC");
+// To connect to a Druid deployment protected by basic authentication,
+// you can incorporate authentication details from https://druid.apache.org/docs/latest/operations/security-overview
+connectionProperties.setProperty("user", "admin");
+connectionProperties.setProperty("password", "password1");
+
+try (Connection connection = DriverManager.getConnection(url, connectionProperties)) {
+  try (
+      final Statement statement = connection.createStatement();
+      final ResultSet resultSet = statement.executeQuery(query)
+  ) {
+    while (resultSet.next()) {
+      // process result set
+    }
+  }
+}
+```
+
+For a runnable example that includes a query that you might run, see [Examples](#examples).
+
+You can also use a protocol buffers JDBC connection with Druid, which offers reduced bloat and potential performance
+improvements for larger result sets. To use it, apply the following connection URL instead; everything else remains the same:
+
+```
+String url = "jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica-protobuf/;transparent_reconnection=true;serialization=protobuf";
+```
+
+:::info
+  The protobuf endpoint is also known to work with the official [Golang Avatica driver](https://github.com/apache/calcite-avatica-go).
+:::
+
+Table metadata is available over JDBC using `connection.getMetaData()` or by querying the
+[INFORMATION_SCHEMA tables](../querying/sql-metadata-tables.md). For an example of this, see [Get the metadata for a datasource](#get-the-metadata-for-a-datasource).
+
+## Connection stickiness
+
+Druid's JDBC server does not share connection state between Brokers. This means that if you're using JDBC and have
+multiple Druid Brokers, you should either connect to a specific Broker or use a load balancer with sticky sessions
+enabled. The Druid Router process provides connection stickiness when balancing JDBC requests, and can be used to achieve
+the necessary stickiness even with a normal non-sticky load balancer. Please see the
+[Router](../design/router.md) documentation for more details.
+
+Note that the non-JDBC [JSON over HTTP](sql-api.md#submit-a-query) API is stateless and does not require stickiness.
+
+## Dynamic parameters
+
+You can use [parameterized queries](../querying/sql.md#dynamic-parameters) in JDBC code, as in this example:
+
+```java
+PreparedStatement statement = connection.prepareStatement("SELECT COUNT(*) AS cnt FROM druid.foo WHERE dim1 = ? OR dim1 = ?");
+statement.setString(1, "abc");
+statement.setString(2, "def");
+final ResultSet resultSet = statement.executeQuery();
+```
+
+Sample code where dynamic parameters replace arrays using STRING_TO_ARRAY:
+```java
+PreparedStatement statement = connection.prepareStatement("select l1 from numfoo where SCALAR_IN_ARRAY(l1, STRING_TO_ARRAY(CAST(? 
as varchar),','))"); +List li = ImmutableList.of(0, 7); +String sqlArg = Joiner.on(",").join(li); +statement.setString(1, sqlArg); +statement.executeQuery(); +``` + +Sample code using native array: +```java +PreparedStatement statement = connection.prepareStatement("select l1 from numfoo where SCALAR_IN_ARRAY(l1, ?)"); +Iterable list = ImmutableList.of(0, 7); +ArrayFactoryImpl arrayFactoryImpl = new ArrayFactoryImpl(TimeZone.getDefault()); +AvaticaType type = ColumnMetaData.scalar(Types.INTEGER, SqlType.INTEGER.name(), Rep.INTEGER); +Array array = arrayFactoryImpl.createArray(type, list); +statement.setArray(1, array); +statement.executeQuery(); +``` + +## Examples + + + +The following section contains two complete samples that use the JDBC connector: + +- [Get the metadata for a datasource](#get-the-metadata-for-a-datasource) shows you how to query the `INFORMATION_SCHEMA` to get metadata like column names. +- [Query data](#query-data) runs a select query against the datasource. + +You can try out these examples after verifying that you meet the [prerequisites](#prerequisites). + +For more information about the connection options, see [Client Reference](https://calcite.apache.org/avatica/docs/client_reference.html). + +### Prerequisites + +Make sure you meet the following requirements before trying these examples: + +- A supported [Java version](../operations/java.md) + +- [Avatica JDBC driver](https://calcite.apache.org/avatica/downloads/). You can add the JAR to your `CLASSPATH` directly or manage it externally, such as through Maven and a `pom.xml` file. + +- An available Druid instance. You can use the `micro-quickstart` configuration described in [Quickstart (local)](../tutorials/index.md). The examples assume that you are using the quickstart, so no authentication or authorization is expected unless explicitly mentioned. + +- The example `wikipedia` datasource from the quickstart is loaded on your Druid instance. If you have a different datasource loaded, you can still try these examples. You'll have to update the table name and column names to match your datasource. + +### Get the metadata for a datasource + +Metadata, such as column names, is available either through the [`INFORMATION_SCHEMA`](../querying/sql-metadata-tables.md) table or through `connection.getMetaData()`. The following example uses the `INFORMATION_SCHEMA` table to retrieve and print the list of column names for the `wikipedia` datasource that you loaded during a previous tutorial. + +```java +import java.sql.*; +import java.util.Properties; + +public class JdbcListColumns { + + public static void main(String[] args) + { + // Connect to /druid/v2/sql/avatica/ on your Router. + // You can connect to a Broker but must configure connection stickiness if you do. + String url = "jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true"; + + String query = "SELECT COLUMN_NAME,* FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'wikipedia' and TABLE_SCHEMA='druid'"; + + // Set any connection context parameters you need here. + // Any property from https://druid.apache.org/docs/latest/querying/sql-query-context.html can go here. 
+ Properties connectionProperties = new Properties(); + + try (Connection connection = DriverManager.getConnection(url, connectionProperties)) { + try ( + final Statement statement = connection.createStatement(); + final ResultSet rs = statement.executeQuery(query) + ) { + while (rs.next()) { + String columnName = rs.getString("COLUMN_NAME"); + System.out.println(columnName); + } + } + } catch (SQLException e) { + throw new RuntimeException(e); + } + + } +} +``` + +### Query data + +Now that you know what columns are available, you can start querying the data. The following example queries the datasource named `wikipedia` for the timestamps and comments from Japan. It also sets the [query context parameter](../querying/sql-query-context.md) `sqlTimeZone`. Optionally, you can also parameterize queries by using [dynamic parameters](#dynamic-parameters). + +```java +import java.sql.*; +import java.util.Properties; + +public class JdbcCountryAndTime { + + public static void main(String[] args) + { + // Connect to /druid/v2/sql/avatica/ on your Router. + // You can connect to a Broker but must configure connection stickiness if you do. + String url = "jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true"; + + //The query you want to run. + String query = "SELECT __time, isRobot, countryName, comment FROM wikipedia WHERE countryName='Japan'"; + + // Set any connection context parameters you need here. + // Any property from https://druid.apache.org/docs/latest/querying/sql-query-context.html can go here. + Properties connectionProperties = new Properties(); + connectionProperties.setProperty("sqlTimeZone", "America/Los_Angeles"); + + try (Connection connection = DriverManager.getConnection(url, connectionProperties)) { + try ( + final Statement statement = connection.createStatement(); + final ResultSet rs = statement.executeQuery(query) + ) { + while (rs.next()) { + Timestamp timeStamp = rs.getTimestamp("__time"); + String comment = rs.getString("comment"); + System.out.println(timeStamp); + System.out.println(comment); + } + } + } catch (SQLException e) { + throw new RuntimeException(e); + } + + } +} +``` diff --git a/docs/35.0.0/api-reference/supervisor-api.md b/docs/35.0.0/api-reference/supervisor-api.md new file mode 100644 index 0000000000..38e68d4e13 --- /dev/null +++ b/docs/35.0.0/api-reference/supervisor-api.md @@ -0,0 +1,3652 @@ +--- +id: supervisor-api +title: Supervisor API +sidebar_label: Supervisors +--- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + +This topic describes the API endpoints to manage and monitor supervisors for Apache Druid. +The topic uses the Apache Kafka term offset to refer to the identifier for records in a partition. If you are using Amazon Kinesis, the equivalent is sequence number. + +In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. Replace it with the information for your deployment. For example, use `http://localhost:8888` for quickstart deployments. + +## Supervisor information + +The following table lists the properties of a supervisor object: + +|Property|Type|Description| +|---|---|---| +|`id`|String|Unique identifier.| +|`state`|String|Generic state of the supervisor. Available states:`UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. See [Supervisor reference](../ingestion/supervisor.md#status-report) for more information.| +|`detailedState`|String|Detailed state of the supervisor. 
This property contains a more descriptive, implementation-specific state that may provide more insight into the supervisor's activities than the `state` property. See [Apache Kafka ingestion](../ingestion/kafka-ingestion.md) and [Amazon Kinesis ingestion](../ingestion/kinesis-ingestion.md) for supervisor-specific states.| +|`healthy`|Boolean|Supervisor health indicator.| +|`spec`|Object|Container object for the supervisor configuration.| +|`suspended`|Boolean|Indicates whether the supervisor is in a suspended state.| + +### Get an array of active supervisor IDs + +Returns an array of strings representing the names of active supervisors. If there are no active supervisors, it returns an empty array. + +#### URL + +`GET` `/druid/indexer/v1/supervisor` + +#### Responses + + + + + + +*Successfully retrieved array of active supervisor IDs* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor" +``` + + + + + +```HTTP +GET /druid/indexer/v1/supervisor HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + [ + "wikipedia_stream", + "social_media" + ] + ``` +
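+
+A common use for this endpoint is checking whether a particular supervisor is currently active before acting on it. The following is a minimal Python sketch, assuming a quickstart Router at `http://localhost:8888` and a hypothetical supervisor named `social_media`.
+
+```python
+import requests
+
+# Hypothetical values: substitute your Router address and supervisor ID.
+ROUTER = "http://localhost:8888"
+SUPERVISOR_ID = "social_media"
+
+# The endpoint returns a JSON array of active supervisor IDs,
+# for example ["wikipedia_stream", "social_media"].
+active_ids = requests.get(f"{ROUTER}/druid/indexer/v1/supervisor").json()
+
+if SUPERVISOR_ID in active_ids:
+    print(f"Supervisor {SUPERVISOR_ID} is active")
+else:
+    print(f"Supervisor {SUPERVISOR_ID} is not active")
+```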
+ +### Get an array of active supervisor objects + +Retrieves an array of active supervisor objects. If there are no active supervisors, it returns an empty array. For reference on the supervisor object properties, see the preceding [table](#supervisor-information). + +#### URL + +`GET` `/druid/indexer/v1/supervisor?full` + +#### Responses + + + + + + +*Successfully retrieved supervisor objects* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor?full=null" +``` + + + + + +```HTTP +GET /druid/indexer/v1/supervisor?full=null HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + [ + { + "id": "wikipedia_stream", + "state": "RUNNING", + "detailedState": "CONNECTING_TO_STREAM", + "healthy": true, + "spec": { + "type": "kafka", + "spec": { + "dataSchema": { + "dataSource": "wikipedia_stream", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9042" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + } + }, + "dataSchema": { + "dataSource": "wikipedia_stream", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ 
+ { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9042" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "context": null, + "suspended": false + }, + "suspended": false + }, + { + "id": "social_media", + "state": "RUNNING", + "detailedState": "RUNNING", + "healthy": true, + "spec": { + "type": "kafka", + "spec": { + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "post_title", + 
"multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + } + }, + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": 
"SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "context": null, + "suspended": false + }, + "suspended": false + } + ] + ``` +
+ +### Get an array of supervisor states + +Retrieves an array of objects representing active supervisors and their current state. If there are no active supervisors, it returns an empty array. For reference on the supervisor object properties, see the preceding [table](#supervisor-information). + +#### URL + +`GET` `/druid/indexer/v1/supervisor?state=true` + +#### Responses + + + + + + +*Successfully retrieved supervisor state objects* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor?state=true" +``` + + + + + +```HTTP +GET /druid/indexer/v1/supervisor?state=true HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + [ + { + "id": "wikipedia_stream", + "state": "UNHEALTHY_SUPERVISOR", + "detailedState": "UNABLE_TO_CONNECT_TO_STREAM", + "healthy": false, + "suspended": false + }, + { + "id": "social_media", + "state": "RUNNING", + "detailedState": "RUNNING", + "healthy": true, + "suspended": false + } + ] + ``` + +
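+
+Because this endpoint returns a compact array, it is convenient for scripted health checks. The following sketch, which assumes the `jq` command-line JSON processor is installed on the machine issuing the request, prints only the IDs of supervisors that report as unhealthy:
+
+```shell
+# Print the IDs of supervisors whose state objects report healthy == false.
+# Assumes jq is installed.
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor?state=true" \
+  | jq -r '.[] | select(.healthy == false) | .id'
+```
+
+Against the sample response above, this prints `wikipedia_stream`.
+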
+ +### Get supervisor specification + +Retrieves the specification for a single supervisor. The returned specification includes the `dataSchema`, `ioConfig`, and `tuningConfig` objects. + +#### URL + +`GET` `/druid/indexer/v1/supervisor/{supervisorId}` + +#### Responses + + + + + + +*Successfully retrieved supervisor spec* + + + + + +*Invalid supervisor ID* + + + + +--- + +#### Sample request + +The following example shows how to retrieve the specification of a supervisor with the name `wikipedia_stream`. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/wikipedia_stream" +``` + + + + + +```HTTP +GET /druid/indexer/v1/supervisor/wikipedia_stream HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + + +#### Sample response + +
+ View the response + + ```json +{ + "type": "kafka", + "spec": { + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + } + }, + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": 
"string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "context": null, + "suspended": false +} + ``` +
+ +### Get supervisor status + +Retrieves the current status report for a single supervisor. The report contains the state of the supervisor tasks and an array of recently thrown exceptions. + +For additional information about the status report, see [Supervisor reference](../ingestion/supervisor.md#status-report). + +#### URL + +`GET` `/druid/indexer/v1/supervisor/{supervisorId}/status` + +#### Responses + + + + + + +*Successfully retrieved supervisor status* + + + + + +*Invalid supervisor ID* + + + + +--- + +#### Sample request + +The following example shows how to retrieve the status of a supervisor with the name `social_media`. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/status" +``` + + + + + +```HTTP +GET /druid/indexer/v1/supervisor/social_media/status HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { + "id": "social_media", + "generationTime": "2023-07-05T23:24:43.934Z", + "payload": { + "dataSource": "social_media", + "stream": "social_media", + "partitions": 1, + "replicas": 1, + "durationSeconds": 3600, + "activeTasks": [ + { + "id": "index_kafka_social_media_ab72ae4127c591c_flcbhdlh", + "startingOffsets": { + "0": 3176381 + }, + "startTime": "2023-07-05T23:21:39.321Z", + "remainingSeconds": 3415, + "type": "ACTIVE", + "currentOffsets": { + "0": 3296632 + }, + "lag": { + "0": 3 + } + } + ], + "publishingTasks": [], + "latestOffsets": { + "0": 3296635 + }, + "minimumLag": { + "0": 3 + }, + "aggregateLag": 3, + "offsetsLastUpdated": "2023-07-05T23:24:30.212Z", + "suspended": false, + "healthy": true, + "state": "RUNNING", + "detailedState": "RUNNING", + "recentErrors": [] + } + } + ``` +
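+
+The status payload includes per-partition and aggregate lag, which makes this endpoint a natural source for monitoring. The following sketch, assuming the `jq` JSON processor is installed, extracts just the state and lag figures:
+
+```shell
+# Report supervisor state and consumer lag without the full status payload.
+# Assumes jq is installed.
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/status" \
+  | jq '{state: .payload.state, aggregateLag: .payload.aggregateLag, minimumLag: .payload.minimumLag}'
+```
+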
+ +### Get supervisor health + +Retrieves the current health report for a single supervisor. The health of a supervisor is determined by the supervisor's `state` (as returned by the `/status` endpoint) and the `druid.supervisor.*` Overlord configuration thresholds. + +#### URL + +`GET` `/druid/indexer/v1/supervisor/{supervisorId}/health` + +#### Responses + + + + + +*Supervisor is healthy* + + + + + +*Invalid supervisor ID* + + + + + +*Supervisor is unhealthy* + + + + + +--- + +#### Sample request + +The following example shows how to retrieve the health report for a supervisor with the name `social_media`. + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/health" +``` + + + + +```HTTP +GET /druid/indexer/v1/supervisor/social_media/health HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { + "healthy": false + } + ``` +
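+
+Because the endpoint signals health through the HTTP response code as well as the body, you can use it directly in a monitoring script. The following sketch treats any non-200 response, including an invalid supervisor ID, as unhealthy:
+
+```shell
+# Exit non-zero when the supervisor does not report healthy.
+code=$(curl -s -o /dev/null -w "%{http_code}" \
+  "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/health")
+if [ "$code" -eq 200 ]; then
+  echo "social_media is healthy"
+else
+  echo "social_media is not healthy (HTTP $code)"
+  exit 1
+fi
+```
+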
+ +### Get supervisor ingestion stats + +Returns a snapshot of the current ingestion row counters for each task being managed by the supervisor, along with moving averages for the row counters. See [Row stats](../ingestion/tasks.md#row-stats) for more information. + +#### URL + +`GET` `/druid/indexer/v1/supervisor/{supervisorId}/stats` + +#### Responses + + + + + +*Successfully retrieved supervisor stats* + + + + + +*Invalid supervisor ID* + + + + + +--- + +#### Sample request + +The following example shows how to retrieve the current ingestion row counters for a supervisor with the name `custom_data`. + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/custom_data/stats" +``` + + + + + +```HTTP +GET /druid/indexer/v1/supervisor/custom_data/stats HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { + "0": { + "index_kafka_custom_data_881d621078f6b7c_ccplchbi": { + "movingAverages": { + "buildSegments": { + "5m": { + "processed": 53.401225142603316, + "processedBytes": 5226.400757148808, + "unparseable": 0.0, + "thrownAway": 0.0, + "processedWithError": 0.0 + }, + "15m": { + "processed": 56.92994990102502, + "processedBytes": 5571.772059828217, + "unparseable": 0.0, + "thrownAway": 0.0, + "processedWithError": 0.0 + }, + "1m": { + "processed": 37.134921285556636, + "processedBytes": 3634.2766230628677, + "unparseable": 0.0, + "thrownAway": 0.0, + "processedWithError": 0.0 + } + } + }, + "totals": { + "buildSegments": { + "processed": 665, + "processedBytes": 65079, + "processedWithError": 0, + "thrownAway": 0, + "unparseable": 0 + } + } + } + } + } + ``` +
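+
+To track throughput over time, you can reduce the stats snapshot to a single number. The following sketch, assuming the `jq` JSON processor is installed, sums the total processed row count across all tasks managed by the supervisor:
+
+```shell
+# Sum the total processed rows across every task group and task.
+# Assumes jq is installed.
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/custom_data/stats" \
+  | jq '[.[] | .[] | .totals.buildSegments.processed] | add'
+```
+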
+ +## Audit history + +An audit history provides a comprehensive log of events, including supervisor configuration, creation, suspension, and modification history. + +### Get audit history for all supervisors + +Retrieves an audit history of specs for all supervisors. + +#### URL + +`GET` `/druid/indexer/v1/supervisor/history` + +#### Responses + + + + + + +*Successfully retrieved audit history* + + + + +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/history" +``` + + + + + +```HTTP +GET /druid/indexer/v1/supervisor/history HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json +{ + "social_media": [ + { + "spec": { + "type": "kafka", + "spec": { + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + } + }, + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + 
"createBitmapIndex": true + }, + { + "type": "string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "context": null, + "suspended": false + }, + "version": "2023-07-03T18:51:02.970Z" + } + ] +} + ``` +
+ +### Get audit history for a specific supervisor + +Retrieves an audit history of specs for a single supervisor. + +#### URL + +`GET` `/druid/indexer/v1/supervisor/{supervisorId}/history` + +#### Query parameters + +* `count` (optional) + * Type: Integer + * Limit the number of results to the last `n` entries. Must be greater than 0 if specified. + +#### Responses + + + + + + +*Successfully retrieved supervisor audit history* + + + + + +*Invalid supervisor ID* + + + + +--- + +#### Sample request + +The following examples show how to retrieve the audit history of a supervisor with the name `wikipedia_stream`. + +**Get all history entries:** + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/wikipedia_stream/history" +``` + + + + + +```HTTP +GET /druid/indexer/v1/supervisor/wikipedia_stream/history HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +**Get last 10 history entries:** + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/wikipedia_stream/history?count=10" +``` + + + + + +```HTTP +GET /druid/indexer/v1/supervisor/wikipedia_stream/history?count=10 HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json +[ + { + "spec": { + "type": "kafka", + "spec": { + "dataSchema": { + "dataSource": "wikipedia_stream", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9042" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + } + }, + "dataSchema": { + "dataSource": "wikipedia_stream", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true 
+ }, + { + "type": "string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9042" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "context": null, + "suspended": false + }, + "version": "2023-07-05T20:59:16.872Z" + } +] + ``` +
+ +## Manage supervisors + +### Create or update a supervisor + +Creates a new supervisor spec or updates an existing one with new configuration and schema information. When updating a supervisor spec, the datasource must remain the same as the previous supervisor. + +You can define a supervisor spec for [Apache Kafka](../ingestion/kafka-ingestion.md) or [Amazon Kinesis](../ingestion/kinesis-ingestion.md) streaming ingestion methods. + +The following table lists the properties of a supervisor spec: + +|Property|Type|Description|Required| +|--------|----|-----------|--------| +|`type`|String|The supervisor type. One of`kafka` or `kinesis`.|Yes| +|`spec`|Object|The container object for the supervisor configuration.|Yes| +|`ioConfig`|Object|The I/O configuration object to define the connection and I/O-related settings for the supervisor and indexing task.|Yes| +|`dataSchema`|Object|The schema for the indexing task to use during ingestion. See [`dataSchema`](../ingestion/ingestion-spec.md#dataschema) for more information.|Yes| +|`tuningConfig`|Object|The tuning configuration object to define performance-related settings for the supervisor and indexing tasks.|No| + +When you call this endpoint on an existing supervisor, the running supervisor signals its tasks to stop reading and begin publishing, exiting itself. Druid then uses the provided configuration from the request body to create a new supervisor. Druid submits a new schema while retaining existing publishing tasks and starts new tasks at the previous task offsets. +This way, you can apply configuration changes without a pause in ingestion. + +#### URL + +`POST` `/druid/indexer/v1/supervisor` + +#### Responses + + + + + + +*Successfully created a new supervisor or updated an existing supervisor* + + + + + +*Request body content type is not in JSON format* + + + + +--- + +#### Sample request + +The following example uses JSON input format to create a supervisor spec for Kafka with a `social_media` datasource and `social_media` topic. 
+ + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor" \ +--header 'Content-Type: application/json' \ +--data '{ + "type": "kafka", + "spec": { + "ioConfig": { + "type": "kafka", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "useEarliestOffset": true + }, + "tuningConfig": { + "type": "kafka" + }, + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso" + }, + "dimensionsSpec": { + "dimensions": [ + "username", + "post_title", + { + "type": "long", + "name": "views" + }, + { + "type": "long", + "name": "upvotes" + }, + { + "type": "long", + "name": "comments" + }, + "edited" + ] + }, + "granularitySpec": { + "queryGranularity": "none", + "rollup": false, + "segmentGranularity": "hour" + } + } + } +}' +``` + + + + + +```HTTP +POST /druid/indexer/v1/supervisor HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +Content-Type: application/json +Content-Length: 1359 + +{ + "type": "kafka", + "spec": { + "ioConfig": { + "type": "kafka", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "useEarliestOffset": true + }, + "tuningConfig": { + "type": "kafka" + }, + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso" + }, + "dimensionsSpec": { + "dimensions": [ + "username", + "post_title", + { + "type": "long", + "name": "views" + }, + { + "type": "long", + "name": "upvotes" + }, + { + "type": "long", + "name": "comments" + }, + "edited" + ] + }, + "granularitySpec": { + "queryGranularity": "none", + "rollup": false, + "segmentGranularity": "hour" + } + } + } +} +``` + + + + +#### Sample request with `skipRestartIfUnmodified` + +The following example sets the `skipRestartIfUnmodified` flag to true. With this flag set to true, the Supervisor will only restart if there has been a modification to the SupervisorSpec. If left unset, the flag defaults to false. +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor?skipRestartIfUnmodified=true" \ +--header 'Content-Type: application/json' \ +--data '{ + "type": "kafka", + "spec": { + "ioConfig": { + "type": "kafka", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "useEarliestOffset": true + }, + "tuningConfig": { + "type": "kafka" + }, + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso" + }, + "dimensionsSpec": { + "dimensions": [ + "username", + "post_title", + { + "type": "long", + "name": "views" + }, + { + "type": "long", + "name": "upvotes" + }, + { + "type": "long", + "name": "comments" + }, + "edited" + ] + }, + "granularitySpec": { + "queryGranularity": "none", + "rollup": false, + "segmentGranularity": "hour" + } + } + } +}' +``` + +#### Sample response + +
+ View the response + + ```json +{ + "id": "social_media" +} + ``` +
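+
+A common workflow is to keep the supervisor spec in a file under version control, submit it, and then confirm that the supervisor is running. The following sketch assumes a local file named `social-media-supervisor.json` containing the spec above (the file name is illustrative) and that the `jq` JSON processor is installed:
+
+```shell
+# Submit the spec from a local file. The file name is an assumption; use your own.
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor" \
+  --header 'Content-Type: application/json' \
+  --data @social-media-supervisor.json
+
+# Confirm the supervisor is up by checking its detailed state. Assumes jq is installed.
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/status" \
+  | jq '.payload.detailedState'
+```
+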
+ +### Suspend a running supervisor + +Suspends a single running supervisor. Returns the updated supervisor spec, where the `suspended` property is set to `true`. The suspended supervisor continues to emit logs and metrics. +Indexing tasks remain suspended until you [resume the supervisor](#resume-a-supervisor). + +#### URL + +`POST` `/druid/indexer/v1/supervisor/{supervisorId}/suspend` + +#### Responses + + + + + + +*Successfully shut down supervisor* + + + + + +*Supervisor already suspended* + + + + + +*Invalid supervisor ID* + + + + +--- + +#### Sample request + +The following example shows how to suspend a running supervisor with the name `social_media`. + + + + + + +```shell +curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/suspend" +``` + + + + + +```HTTP +POST /druid/indexer/v1/supervisor/social_media/suspend HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json +{ + "type": "kafka", + "spec": { + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + } + }, + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": 
"string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "context": null, + "suspended": true +} + ``` +
+ +### Suspend all supervisors + +Suspends all supervisors. Note that this endpoint returns an HTTP `200 Success` code message even if there are no supervisors or running supervisors to suspend. + +#### URL + +`POST` `/druid/indexer/v1/supervisor/suspendAll` + +#### Responses + + + + + + +*Successfully suspended all supervisors* + + + + +--- + +#### Sample request + + + + + + +```shell +curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/suspendAll" +``` + + + + + +```HTTP +POST /druid/indexer/v1/supervisor/suspendAll HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json +{ + "status": "success" +} + ``` +
+ +### Resume a supervisor + +Resumes indexing tasks for a supervisor. Returns an updated supervisor spec with the `suspended` property set to `false`. + +#### URL + +`POST` `/druid/indexer/v1/supervisor/{supervisorId}/resume` + +#### Responses + + + + + + +*Successfully resumed supervisor* + + + + + +*Supervisor already running* + + + + + +*Invalid supervisor ID* + + + + +--- + +#### Sample request + +The following example resumes a previously suspended supervisor with name `social_media`. + + + + + + +```shell +curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/resume" +``` + + + + + +```HTTP +POST /druid/indexer/v1/supervisor/social_media/resume HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json +{ + "type": "kafka", + "spec": { + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + } + }, + "dataSchema": { + "dataSource": "social_media", + "timestampSpec": { + "column": "__time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "username", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": 
"string", + "name": "post_title", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "long", + "name": "views", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "upvotes", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "long", + "name": "comments", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": false + }, + { + "type": "string", + "name": "edited", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "HOUR", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "tuningConfig": { + "type": "kafka", + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 150000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxRowsPerSegment": 5000000, + "maxTotalRows": null, + "intermediatePersistPeriod": "PT10M", + "maxPendingPersists": 0, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "reportParseExceptions": false, + "handoffConditionTimeout": 0, + "resetOffsetAutomatically": false, + "segmentWriteOutMediumFactory": null, + "workerThreads": null, + "chatRetries": 8, + "httpTimeout": "PT10S", + "shutdownTimeout": "PT80S", + "offsetFetchPeriod": "PT30S", + "intermediateHandoffPeriod": "P2147483647D", + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "skipSequenceNumberAvailabilityCheck": false, + "repartitionTransitionDuration": "PT120S" + }, + "ioConfig": { + "topic": "social_media", + "inputFormat": { + "type": "json" + }, + "replicas": 1, + "taskCount": 1, + "taskDuration": "PT3600S", + "consumerProperties": { + "bootstrap.servers": "localhost:9094" + }, + "autoScalerConfig": null, + "pollTimeout": 100, + "startDelay": "PT5S", + "period": "PT30S", + "useEarliestOffset": true, + "completionTimeout": "PT1800S", + "lateMessageRejectionPeriod": null, + "earlyMessageRejectionPeriod": null, + "lateMessageRejectionStartDateTime": null, + "configOverrides": null, + "idleConfig": null, + "stream": "social_media", + "useEarliestSequenceNumber": true + }, + "context": null, + "suspended": false +} + ``` +
+
+### Resume all supervisors
+
+Resumes all supervisors. Note that this endpoint returns an HTTP `200 OK` response code even if there are no supervisors or suspended supervisors to resume.
+
+#### URL
+
+`POST` `/druid/indexer/v1/supervisor/resumeAll`
+
+#### Responses
+
+
+
+
+
+
+*Successfully resumed all supervisors*
+
+
+
+
+---
+
+#### Sample request
+
+
+
+
+
+
+```shell
+curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/resumeAll"
+```
+
+
+
+
+
+```HTTP
+POST /druid/indexer/v1/supervisor/resumeAll HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+
+
+#### Sample response
+
+ View the response + + ```json +{ + "status": "success" +} + ``` +
+ +### Reset a supervisor + +The supervisor must be running for this endpoint to be available. + +Resets the specified supervisor. This endpoint clears supervisor metadata, prompting the supervisor to resume data reading. The supervisor restarts from the earliest or latest available position, depending on the value of the `useEarliestOffset` property. +After clearing all stored offsets, the supervisor kills and recreates active tasks, +so that tasks begin reading from valid positions. + +Use this endpoint to recover from a stopped state due to missing offsets. Use this endpoint with caution as it may result in skipped messages and lead to data loss or duplicate data. + +The indexing service keeps track of the latest persisted offsets to provide exactly-once ingestion guarantees across tasks. Subsequent tasks must start reading from where the previous task completed for Druid to accept the generated segments. If the messages at the expected starting offsets are no longer available, the supervisor refuses to start and in-flight tasks fail. Possible causes for missing messages include the message retention period elapsing or the topic being removed and re-created. Use the `reset` endpoint to recover from this condition. + +#### URL + +`POST` `/druid/indexer/v1/supervisor/{supervisorId}/reset` + +#### Responses + + + + + + +*Successfully reset supervisor* + + + + + +*Invalid supervisor ID* + + + + +--- + +#### Sample request + +The following example shows how to reset a supervisor with the name `social_media`. + + + + + + +```shell +curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/reset" +``` + + + + + +```HTTP +POST /druid/indexer/v1/supervisor/social_media/reset HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json +{ + "id": "social_media" +} + ``` +
+
+### Reset offsets for a supervisor
+
+The supervisor must be running for this endpoint to be available.
+
+Resets the specified offsets for partitions without resetting the entire set.
+
+This endpoint clears only the stored offsets, prompting the supervisor to resume reading data from the specified offsets.
+If there are no stored offsets, the specified offsets are set in the metadata store.
+
+After resetting stored offsets, the supervisor kills and recreates any active tasks pertaining to the specified partitions,
+so that tasks begin reading from the specified offsets. For partitions that are not specified in this operation, the supervisor resumes from the last stored offset.
+
+Use this endpoint with caution. It can cause skipped messages, leading to data loss or duplicate data.
+
+#### URL
+
+`POST` `/druid/indexer/v1/supervisor/{supervisorId}/resetOffsets`
+
+#### Responses
+
+
+
+
+
+
+*Successfully reset offsets*
+
+
+
+
+
+*Invalid supervisor ID*
+
+
+
+
+---
+
+#### Reset offsets metadata
+
+This section presents the structure and details of the reset offsets metadata payload.
+
+| Field | Type | Description | Required |
+|---------|---------|---------|---------|
+| `type` | String | The type of reset offsets metadata payload. It must match the supervisor's `type`. Possible values: `kafka` or `kinesis`. | Yes |
+| `partitions` | Object | An object representing the reset metadata. See below for details. | Yes |
+
+#### Partitions
+
+The following table defines the fields within the `partitions` object in the reset offsets metadata payload.
+
+| Field | Type | Description | Required |
+|---------|---------|---------|---------|
+| `type` | String | Must be set to `end`. Indicates the end sequence numbers for the reset offsets. | Yes |
+| `stream` | String | The stream to be reset. It must be a valid stream consumed by the supervisor. | Yes |
+| `partitionOffsetMap` | Object | A map of partitions to corresponding offsets for the stream to be reset. | Yes |
+
+#### Sample request
+
+The following example shows how to reset offsets for a Kafka supervisor named `social_media`. In this example, the supervisor reads from the Kafka topic `ads_media_stream` and has the stored offsets `{"0": 0, "1": 10, "2": 20, "3": 40}`.
+
+
+
+
+
+
+```shell
+curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/resetOffsets" \
+--header 'Content-Type: application/json' \
+--data-raw '{"type":"kafka","partitions":{"type":"end","stream":"ads_media_stream","partitionOffsetMap":{"0":100, "2": 650}}}'
+```
+
+
+
+
+
+```HTTP
+POST /druid/indexer/v1/supervisor/social_media/resetOffsets HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+Content-Type: application/json
+
+{
+  "type": "kafka",
+  "partitions": {
+    "type": "end",
+    "stream": "ads_media_stream",
+    "partitionOffsetMap": {
+      "0": 100,
+      "2": 650
+    }
+  }
+}
+```
+
+The example operation resets offsets only for partitions `0` and `2`, to 100 and 650 respectively. After a successful reset,
+when the supervisor's tasks restart, they resume reading from `{"0": 100, "1": 10, "2": 650, "3": 40}`.
+
+
+
+
+#### Sample response
+
+ View the response + + ```json +{ + "id": "social_media" +} + ``` +
+ +### Terminate a supervisor + +Terminates a supervisor and its associated indexing tasks, triggering the publishing of their segments. When you terminate a supervisor, Druid places a tombstone marker in the metadata store to prevent reloading on restart. + +The terminated supervisor still exists in the metadata store and its history can be retrieved. + +#### URL + +`POST` `/druid/indexer/v1/supervisor/{supervisorId}/terminate` + +#### Responses + + + + + + +*Successfully terminated a supervisor* + + + + + +*Invalid supervisor ID or supervisor not running* + + + + +--- + +#### Sample request + + + + + + +```shell +curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/terminate" +``` + + + + + +```HTTP +POST /druid/indexer/v1/supervisor/social_media/terminate HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json +{ + "id": "social_media" +} + ``` +
+
+### Terminate all supervisors
+
+Terminates all supervisors. Terminated supervisors still exist in the metadata store and their history can be retrieved. Note that this endpoint returns an HTTP `200 OK` response code even if there are no supervisors or running supervisors to terminate.
+
+#### URL
+
+`POST` `/druid/indexer/v1/supervisor/terminateAll`
+
+#### Responses
+
+
+
+
+
+
+*Successfully terminated all supervisors*
+
+
+
+
+---
+
+#### Sample request
+
+
+
+
+
+
+```shell
+curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/terminateAll"
+```
+
+
+
+
+
+```HTTP
+POST /druid/indexer/v1/supervisor/terminateAll HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+
+
+#### Sample response
+
+ View the response + + ```json +{ + "status": "success" +} + ``` +
+
+### Handoff task groups for a supervisor early
+
+Triggers early handoff for the specified task groups of a supervisor. This is a best-effort API and makes no guarantees that handoff actually occurs.
+
+#### URL
+
+`POST` `/druid/indexer/v1/supervisor/{supervisorId}/taskGroups/handoff`
+
+#### Sample request
+
+The following example shows how to hand off task groups `1`, `2`, and `3` early for a supervisor named `social_media`.
+
+
+
+
+
+
+```shell
+curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/taskGroups/handoff" \
+--header 'Content-Type: application/json' \
+--data-raw '{"taskGroupIds": [1, 2, 3]}'
+```
+
+
+
+
+
+```HTTP
+POST /druid/indexer/v1/supervisor/social_media/taskGroups/handoff HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+Content-Type: application/json
+
+{
+  "taskGroupIds": [1, 2, 3]
+}
+```
+
+
+
+
+#### Sample response
+
+ View the response +(empty response) +
+ +### Shut down a supervisor + +Shuts down a supervisor. This endpoint is deprecated and will be removed in future releases. Use the equivalent [terminate](#terminate-a-supervisor) endpoint instead. + +#### URL + +`POST` `/druid/indexer/v1/supervisor/{supervisorId}/shutdown` diff --git a/docs/35.0.0/api-reference/tasks-api.md b/docs/35.0.0/api-reference/tasks-api.md new file mode 100644 index 0000000000..f53037f84e --- /dev/null +++ b/docs/35.0.0/api-reference/tasks-api.md @@ -0,0 +1,1663 @@ +--- +id: tasks-api +title: Tasks API +sidebar_label: Tasks +--- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + + +This document describes the API endpoints for task retrieval, submission, and deletion for Apache Druid. Tasks are individual jobs performed by Druid to complete operations such as ingestion, querying, and compaction. + +In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for the Router service address and port. For example, on the quickstart configuration, use `http://localhost:8888`. + +## Task information and retrieval + +### Get an array of tasks + +Retrieves an array of all tasks in the Druid cluster. Each task object includes information on its ID, status, associated datasource, and other metadata. For definitions of the response properties, see the [Tasks table](../querying/sql-metadata-tables.md#tasks-table). + +#### URL + +`GET` `/druid/indexer/v1/tasks` + +#### Query parameters + +The endpoint supports a set of optional query parameters to filter results. + +|Parameter|Type|Description| +|---|---|---| +|`state`|String|Filter list of tasks by task state, valid options are `running`, `complete`, `waiting`, and `pending`.| +| `datasource`|String| Return tasks filtered by Druid datasource.| +| `createdTimeInterval`|String (ISO-8601)| Return tasks created within the specified interval. Use `_` as the delimiter for the interval string. Do not use `/`. For example, `2023-06-27_2023-06-28`.| +| `max`|Integer|Maximum number of `complete` tasks to return. Only applies when `state` is set to `complete`.| +| `type`|String|Filter tasks by task type. See [task documentation](../ingestion/tasks.md) for more details.| + +#### Responses + + + + + + +
+ +*Successfully retrieved list of tasks* + +
+ + + +
+ +*Invalid `state` query parameter value* + +
+ + + +
+ +*Invalid query parameter* + +
+
+ +--- + +#### Sample request + +The following example shows how to retrieve a list of tasks filtered with the following query parameters: +* State: `complete` +* Datasource: `wikipedia_api` +* Time interval: between `2015-09-12` and `2015-09-13` +* Max entries returned: `10` +* Task type: `query_worker` + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/tasks/?state=complete&datasource=wikipedia_api&createdTimeInterval=2015-09-12_2015-09-13&max=10&type=query_worker" +``` + + + + + +```HTTP +GET /druid/indexer/v1/tasks/?state=complete&datasource=wikipedia_api&createdTimeInterval=2015-09-12_2015-09-13&max=10&type=query_worker HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + [ + { + "id": "query-223549f8-b993-4483-b028-1b0d54713cad-worker0_0", + "groupId": "query-223549f8-b993-4483-b028-1b0d54713cad", + "type": "query_worker", + "createdTime": "2023-06-22T22:11:37.012Z", + "queueInsertionTime": "1970-01-01T00:00:00.000Z", + "statusCode": "SUCCESS", + "status": "SUCCESS", + "runnerStatusCode": "NONE", + "duration": 17897, + "location": { + "host": "localhost", + "port": 8101, + "tlsPort": -1 + }, + "dataSource": "wikipedia_api", + "errorMsg": null + }, + { + "id": "query-fa82fa40-4c8c-4777-b832-cabbee5f519f-worker0_0", + "groupId": "query-fa82fa40-4c8c-4777-b832-cabbee5f519f", + "type": "query_worker", + "createdTime": "2023-06-20T22:51:21.302Z", + "queueInsertionTime": "1970-01-01T00:00:00.000Z", + "statusCode": "SUCCESS", + "status": "SUCCESS", + "runnerStatusCode": "NONE", + "duration": 16911, + "location": { + "host": "localhost", + "port": 8101, + "tlsPort": -1 + }, + "dataSource": "wikipedia_api", + "errorMsg": null + }, + { + "id": "query-5419da7a-b270-492f-90e6-920ecfba766a-worker0_0", + "groupId": "query-5419da7a-b270-492f-90e6-920ecfba766a", + "type": "query_worker", + "createdTime": "2023-06-20T22:45:53.909Z", + "queueInsertionTime": "1970-01-01T00:00:00.000Z", + "statusCode": "SUCCESS", + "status": "SUCCESS", + "runnerStatusCode": "NONE", + "duration": 17030, + "location": { + "host": "localhost", + "port": 8101, + "tlsPort": -1 + }, + "dataSource": "wikipedia_api", + "errorMsg": null + } + ] + ``` + +
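+When scripting against this endpoint, it is often convenient to reduce the response to just the task IDs. The following sketch is one way to do that with `jq`, a separate command-line JSON processor that is not part of Druid; the state and datasource values are only examples.
+
+```shell
+# List the IDs of running tasks for one datasource (assumes jq is installed).
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/tasks?state=running&datasource=wikipedia_api" \
+  | jq -r '.[].id'
+```
+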
+ +### Get an array of complete tasks + +Retrieves an array of completed tasks in the Druid cluster. This is functionally equivalent to `/druid/indexer/v1/tasks?state=complete`. For definitions of the response properties, see the [Tasks table](../querying/sql-metadata-tables.md#tasks-table). + +#### URL + +`GET` `/druid/indexer/v1/completeTasks` + +#### Query parameters + +The endpoint supports a set of optional query parameters to filter results. + +|Parameter|Type|Description| +|---|---|---| +| `datasource`|String| Return tasks filtered by Druid datasource.| +| `createdTimeInterval`|String (ISO-8601)| Return tasks created within the specified interval. The interval string should be delimited by `_` instead of `/`. For example, `2023-06-27_2023-06-28`.| +| `max`|Integer|Maximum number of `complete` tasks to return. Only applies when `state` is set to `complete`.| +| `type`|String|Filter tasks by task type. See [task documentation](../ingestion/tasks.md) for more details.| + +#### Responses + + + + + + +
+ +*Successfully retrieved list of complete tasks* + +
+ + + +
+ +*Request sent to incorrect service* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/completeTasks" +``` + + + + + +```HTTP +GET /druid/indexer/v1/completeTasks HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + [ + { + "id": "query-223549f8-b993-4483-b028-1b0d54713cad-worker0_0", + "groupId": "query-223549f8-b993-4483-b028-1b0d54713cad", + "type": "query_worker", + "createdTime": "2023-06-22T22:11:37.012Z", + "queueInsertionTime": "1970-01-01T00:00:00.000Z", + "statusCode": "SUCCESS", + "status": "SUCCESS", + "runnerStatusCode": "NONE", + "duration": 17897, + "location": { + "host": "localhost", + "port": 8101, + "tlsPort": -1 + }, + "dataSource": "wikipedia_api", + "errorMsg": null + }, + { + "id": "query-223549f8-b993-4483-b028-1b0d54713cad", + "groupId": "query-223549f8-b993-4483-b028-1b0d54713cad", + "type": "query_controller", + "createdTime": "2023-06-22T22:11:28.367Z", + "queueInsertionTime": "1970-01-01T00:00:00.000Z", + "statusCode": "SUCCESS", + "status": "SUCCESS", + "runnerStatusCode": "NONE", + "duration": 30317, + "location": { + "host": "localhost", + "port": 8100, + "tlsPort": -1 + }, + "dataSource": "wikipedia_api", + "errorMsg": null + } + ] + ``` + +
+ +### Get an array of running tasks + +Retrieves an array of running task objects in the Druid cluster. It is functionally equivalent to `/druid/indexer/v1/tasks?state=running`. For definitions of the response properties, see the [Tasks table](../querying/sql-metadata-tables.md#tasks-table). + +#### URL + +`GET` `/druid/indexer/v1/runningTasks` + +#### Query parameters + +The endpoint supports a set of optional query parameters to filter results. + +|Parameter|Type|Description| +|---|---|---| +| `datasource`|String| Return tasks filtered by Druid datasource.| +| `createdTimeInterval`|String (ISO-8601)| Return tasks created within the specified interval. The interval string should be delimited by `_` instead of `/`. For example, `2023-06-27_2023-06-28`.| +| `max`|Integer|Maximum number of `complete` tasks to return. Only applies when `state` is set to `complete`.| +| `type`|String|Filter tasks by task type. See [task documentation](../ingestion/tasks.md) for more details.| + +#### Responses + + + + + + +
+ +*Successfully retrieved list of running tasks* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/runningTasks" +``` + + + + + +```HTTP +GET /druid/indexer/v1/runningTasks HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + + +#### Sample response + +
+ View the response + + ```json + [ + { + "id": "query-32663269-ead9-405a-8eb6-0817a952ef47", + "groupId": "query-32663269-ead9-405a-8eb6-0817a952ef47", + "type": "query_controller", + "createdTime": "2023-06-22T22:54:43.170Z", + "queueInsertionTime": "2023-06-22T22:54:43.170Z", + "statusCode": "RUNNING", + "status": "RUNNING", + "runnerStatusCode": "RUNNING", + "duration": -1, + "location": { + "host": "localhost", + "port": 8100, + "tlsPort": -1 + }, + "dataSource": "wikipedia_api", + "errorMsg": null + } + ] + ``` + +
+ +### Get an array of waiting tasks + +Retrieves an array of waiting tasks in the Druid cluster. It is functionally equivalent to `/druid/indexer/v1/tasks?state=waiting`. For definitions of the response properties, see the [Tasks table](../querying/sql-metadata-tables.md#tasks-table). + +#### URL + +`GET` `/druid/indexer/v1/waitingTasks` + +#### Query parameters + +The endpoint supports a set of optional query parameters to filter results. + +|Parameter|Type|Description| +|---|---|---| +| `datasource`|String| Return tasks filtered by Druid datasource.| +| `createdTimeInterval`|String (ISO-8601)| Return tasks created within the specified interval. The interval string should be delimited by `_` instead of `/`. For example, `2023-06-27_2023-06-28`.| +| `max`|Integer|Maximum number of `complete` tasks to return. Only applies when `state` is set to `complete`.| +| `type`|String|Filter tasks by task type. See [task documentation](../ingestion/tasks.md) for more details.| + +#### Responses + + + + + + +
+ +*Successfully retrieved list of waiting tasks* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/waitingTasks" +``` + + + + + +```HTTP +GET /druid/indexer/v1/waitingTasks HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + [ + { + "id": "index_parallel_wikipedia_auto_biahcbmf_2023-06-26T21:08:05.216Z", + "groupId": "index_parallel_wikipedia_auto_biahcbmf_2023-06-26T21:08:05.216Z", + "type": "index_parallel", + "createdTime": "2023-06-26T21:08:05.217Z", + "queueInsertionTime": "1970-01-01T00:00:00.000Z", + "statusCode": "RUNNING", + "status": "RUNNING", + "runnerStatusCode": "WAITING", + "duration": -1, + "location": { + "host": null, + "port": -1, + "tlsPort": -1 + }, + "dataSource": "wikipedia_auto", + "errorMsg": null + }, + { + "id": "index_parallel_wikipedia_auto_afggfiec_2023-06-26T21:08:05.546Z", + "groupId": "index_parallel_wikipedia_auto_afggfiec_2023-06-26T21:08:05.546Z", + "type": "index_parallel", + "createdTime": "2023-06-26T21:08:05.548Z", + "queueInsertionTime": "1970-01-01T00:00:00.000Z", + "statusCode": "RUNNING", + "status": "RUNNING", + "runnerStatusCode": "WAITING", + "duration": -1, + "location": { + "host": null, + "port": -1, + "tlsPort": -1 + }, + "dataSource": "wikipedia_auto", + "errorMsg": null + }, + { + "id": "index_parallel_wikipedia_auto_jmmddihf_2023-06-26T21:08:06.644Z", + "groupId": "index_parallel_wikipedia_auto_jmmddihf_2023-06-26T21:08:06.644Z", + "type": "index_parallel", + "createdTime": "2023-06-26T21:08:06.671Z", + "queueInsertionTime": "1970-01-01T00:00:00.000Z", + "statusCode": "RUNNING", + "status": "RUNNING", + "runnerStatusCode": "WAITING", + "duration": -1, + "location": { + "host": null, + "port": -1, + "tlsPort": -1 + }, + "dataSource": "wikipedia_auto", + "errorMsg": null + } + ] + ``` + +
+ +### Get an array of pending tasks + +Retrieves an array of pending tasks in the Druid cluster. It is functionally equivalent to `/druid/indexer/v1/tasks?state=pending`. For definitions of the response properties, see the [Tasks table](../querying/sql-metadata-tables.md#tasks-table). + +#### URL + +`GET` `/druid/indexer/v1/pendingTasks` + +#### Query parameters + +The endpoint supports a set of optional query parameters to filter results. + +|Parameter|Type|Description| +|---|---|---| +| `datasource`|String| Return tasks filtered by Druid datasource.| +| `createdTimeInterval`|String (ISO-8601)| Return tasks created within the specified interval. The interval string should be delimited by `_` instead of `/`. For example, `2023-06-27_2023-06-28`.| +| `max`|Integer|Maximum number of `complete` tasks to return. Only applies when `state` is set to `complete`.| +| `type`|String|Filter tasks by task type. See [task documentation](../ingestion/tasks.md) for more details.| + +#### Responses + + + + + + +
+ +*Successfully retrieved list of pending tasks* + +
+
+ +--- + +#### Sample request + + + + + + +```shell +curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/pendingTasks" +``` + + + + + +```HTTP +GET /druid/indexer/v1/pendingTasks HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + [ + { + "id": "query-7b37c315-50a0-4b68-aaa8-b1ef1f060e67", + "groupId": "query-7b37c315-50a0-4b68-aaa8-b1ef1f060e67", + "type": "query_controller", + "createdTime": "2023-06-23T19:53:06.037Z", + "queueInsertionTime": "2023-06-23T19:53:06.037Z", + "statusCode": "RUNNING", + "status": "RUNNING", + "runnerStatusCode": "PENDING", + "duration": -1, + "location": { + "host": null, + "port": -1, + "tlsPort": -1 + }, + "dataSource": "wikipedia_api", + "errorMsg": null + }, + { + "id": "query-544f0c41-f81d-4504-b98b-f9ab8b36ef36", + "groupId": "query-544f0c41-f81d-4504-b98b-f9ab8b36ef36", + "type": "query_controller", + "createdTime": "2023-06-23T19:53:06.616Z", + "queueInsertionTime": "2023-06-23T19:53:06.616Z", + "statusCode": "RUNNING", + "status": "RUNNING", + "runnerStatusCode": "PENDING", + "duration": -1, + "location": { + "host": null, + "port": -1, + "tlsPort": -1 + }, + "dataSource": "wikipedia_api", + "errorMsg": null + } + ] + ``` + +
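+A pending list that stays non-empty usually means the cluster has run out of task slots. As a quick check, you can count the pending tasks with `jq` (assumed to be installed); this is only a sketch.
+
+```shell
+# Count how many tasks are currently waiting for a worker slot.
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/pendingTasks" | jq 'length'
+```
+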
+ +### Get task payload + +Retrieves the payload of a task given the task ID. It returns a JSON object with the task ID and payload that includes task configuration details and relevant specifications associated with the execution of the task. + +#### URL + +`GET` `/druid/indexer/v1/task/{taskId}` + +#### Responses + + + + + + +
+ +*Successfully retrieved payload of task* + +
+ + + +
+ +*Cannot find task with ID* + +
+
+
+---
+
+#### Sample request
+
+The following example shows how to retrieve the payload of the task with the specified ID `index_parallel_wikipedia_short_iajoonnd_2023-07-07T17:53:12.174Z`.
+
+
+
+
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/index_parallel_wikipedia_short_iajoonnd_2023-07-07T17:53:12.174Z"
+```
+
+
+
+
+
+```HTTP
+GET /druid/indexer/v1/task/index_parallel_wikipedia_short_iajoonnd_2023-07-07T17:53:12.174Z HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+
+
+
+#### Sample response
+
+ View the response + + ```json + { + "task": "index_parallel_wikipedia_short_iajoonnd_2023-07-07T17:53:12.174Z", + "payload": { + "type": "index_parallel", + "id": "index_parallel_wikipedia_short_iajoonnd_2023-07-07T17:53:12.174Z", + "groupId": "index_parallel_wikipedia_short_iajoonnd_2023-07-07T17:53:12.174Z", + "resource": { + "availabilityGroup": "index_parallel_wikipedia_short_iajoonnd_2023-07-07T17:53:12.174Z", + "requiredCapacity": 1 + }, + "spec": { + "dataSchema": { + "dataSource": "wikipedia_short", + "timestampSpec": { + "column": "time", + "format": "iso", + "missingValue": null + }, + "dimensionsSpec": { + "dimensions": [ + { + "type": "string", + "name": "cityName", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "countryName", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + }, + { + "type": "string", + "name": "regionName", + "multiValueHandling": "SORTED_ARRAY", + "createBitmapIndex": true + } + ], + "dimensionExclusions": [ + "__time", + "time" + ], + "includeAllDimensions": false, + "useSchemaDiscovery": false + }, + "metricsSpec": [], + "granularitySpec": { + "type": "uniform", + "segmentGranularity": "DAY", + "queryGranularity": { + "type": "none" + }, + "rollup": false, + "intervals": [ + "2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z" + ] + }, + "transformSpec": { + "filter": null, + "transforms": [] + } + }, + "ioConfig": { + "type": "index_parallel", + "inputSource": { + "type": "local", + "baseDir": "quickstart/tutorial", + "filter": "wikiticker-2015-09-12-sampled.json.gz" + }, + "inputFormat": { + "type": "json" + }, + "appendToExisting": false, + "dropExisting": false + }, + "tuningConfig": { + "type": "index_parallel", + "maxRowsPerSegment": 5000000, + "appendableIndexSpec": { + "type": "onheap", + "preserveExistingMetrics": false + }, + "maxRowsInMemory": 25000, + "maxBytesInMemory": 0, + "skipBytesInMemoryOverheadCheck": false, + "maxTotalRows": null, + "numShards": null, + "splitHintSpec": null, + "partitionsSpec": { + "type": "dynamic", + "maxRowsPerSegment": 5000000, + "maxTotalRows": null + }, + "indexSpec": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "indexSpecForIntermediatePersists": { + "bitmap": { + "type": "roaring" + }, + "dimensionCompression": "lz4", + "stringDictionaryEncoding": { + "type": "utf8" + }, + "metricCompression": "lz4", + "longEncoding": "longs" + }, + "maxPendingPersists": 0, + "forceGuaranteedRollup": false, + "reportParseExceptions": false, + "pushTimeout": 0, + "segmentWriteOutMediumFactory": null, + "maxNumConcurrentSubTasks": 1, + "maxRetry": 3, + "taskStatusCheckPeriodMs": 1000, + "chatHandlerTimeout": "PT10S", + "chatHandlerNumRetries": 5, + "maxNumSegmentsToMerge": 100, + "totalNumMergeTasks": 10, + "logParseExceptions": false, + "maxParseExceptions": 2147483647, + "maxSavedParseExceptions": 0, + "maxColumnsToMerge": -1, + "awaitSegmentAvailabilityTimeoutMillis": 0, + "maxAllowedLockCount": -1, + "partitionDimensions": [] + } + }, + "context": { + "forceTimeChunkLock": true, + "useLineageBasedSegmentAllocation": true + }, + "dataSource": "wikipedia_short" + } +} + ``` + +
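+Because the payload embeds the full ingestion spec, this endpoint is a convenient way to inspect how a task was configured. The following sketch, which assumes `jq` is installed, extracts only the `ioConfig` from the payload shown above.
+
+```shell
+# Inspect the input source and input format that a task was submitted with.
+TASK_ID="index_parallel_wikipedia_short_iajoonnd_2023-07-07T17:53:12.174Z"
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/${TASK_ID}" | jq '.payload.spec.ioConfig'
+```
+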
+ +### Get task status + +Retrieves the status of a task given the task ID. It returns a JSON object with the task's status code, runner status, task type, datasource, and other relevant metadata. + +#### URL + +`GET` `/druid/indexer/v1/task/{taskId}/status` + +#### Responses + + + + + + +
+ +*Successfully retrieved task status* + +
+ + + +
+ +*Cannot find task with ID* + +
+
+
+---
+
+#### Sample request
+
+The following example shows how to retrieve the status of the task with the specified ID `query-223549f8-b993-4483-b028-1b0d54713cad`.
+
+
+
+
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-223549f8-b993-4483-b028-1b0d54713cad/status"
+```
+
+
+
+
+
+```HTTP
+GET /druid/indexer/v1/task/query-223549f8-b993-4483-b028-1b0d54713cad/status HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+
+
+
+#### Sample response
+
+ View the response + + ```json + { + "task": "query-223549f8-b993-4483-b028-1b0d54713cad", + "status": { + "id": "query-223549f8-b993-4483-b028-1b0d54713cad", + "groupId": "query-223549f8-b993-4483-b028-1b0d54713cad", + "type": "query_controller", + "createdTime": "2023-06-22T22:11:28.367Z", + "queueInsertionTime": "1970-01-01T00:00:00.000Z", + "statusCode": "RUNNING", + "status": "RUNNING", + "runnerStatusCode": "RUNNING", + "duration": -1, + "location": {"host": "localhost", "port": 8100, "tlsPort": -1}, + "dataSource": "wikipedia_api", + "errorMsg": null + } + } + ``` + +
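+The `statusCode` field changes to a terminal value such as `SUCCESS` or `FAILED` once the task finishes, so this endpoint works well for polling. The following sketch, assuming `jq` is installed and a real task ID is substituted, checks every ten seconds until the task leaves the `RUNNING` state.
+
+```shell
+# Poll a task until it reaches a terminal state.
+TASK_ID="query-223549f8-b993-4483-b028-1b0d54713cad"   # replace with your task ID
+while true; do
+  STATUS=$(curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/${TASK_ID}/status" \
+    | jq -r '.status.statusCode')
+  echo "Task ${TASK_ID} is ${STATUS}"
+  [ "${STATUS}" != "RUNNING" ] && break
+  sleep 10
+done
+```
+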
+
+### Get task segments
+
+:::info
+ This API is no longer supported and always returns a 404 response.
+ Use the metric `segment/added/bytes` instead to identify the segment IDs committed by a task.
+:::
+
+#### URL
+
+`GET` `/druid/indexer/v1/task/{taskId}/segments`
+
+#### Responses
+
+
+
+
+
+
+```json
+{
+  "error": "Segment IDs committed by a task action are not persisted anymore. Use the metric 'segment/added/bytes' to identify the segments created by a task."
+}
+```
+
+
+
+
+---
+
+#### Sample request
+
+The following example shows a request for the task segments of the task with the specified ID `query-52a8aafe-7265-4427-89fe-dc51275cc470`.
+
+
+
+
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-52a8aafe-7265-4427-89fe-dc51275cc470/segments"
+```
+
+
+
+
+
+```HTTP
+GET /druid/indexer/v1/task/query-52a8aafe-7265-4427-89fe-dc51275cc470/segments HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+
+
+#### Sample response
+
+The request returns a `404 Not Found` response with the error message shown above, because this API is no longer supported.
+
+### Get task log
+
+Retrieves the event log associated with a task. It returns a list of logged events during the lifecycle of the task. The endpoint is useful for providing information about the execution of the task, including any errors or warnings raised.
+
+Task logs are automatically retrieved from the Middle Manager/Indexer or from long-term storage. For reference, see [Task logs](../ingestion/tasks.md#task-logs).
+
+#### URL
+
+`GET` `/druid/indexer/v1/task/{taskId}/log`
+
+#### Query parameters
+
+* `offset` (optional)
+  * Type: Int
+  * Excludes the specified number of initial entries from the response.
+
+#### Responses
+
+
+
+
+
+ +*Successfully retrieved task log* + +
+
+
+---
+
+#### Sample request
+
+The following example shows how to retrieve the task log of the task with the specified ID `index_kafka_social_media_0e905aa31037879_nommnaeg`.
+
+
+
+
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/index_kafka_social_media_0e905aa31037879_nommnaeg/log"
+```
+
+
+
+
+
+```HTTP
+GET /druid/indexer/v1/task/index_kafka_social_media_0e905aa31037879_nommnaeg/log HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+
+
+#### Sample response
+
+ View the response + + ```json + 2023-07-03T22:11:17,891 INFO [qtp1251996697-122] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Sequence[index_kafka_social_media_0e905aa31037879_0] end offsets updated from [{0=9223372036854775807}] to [{0=230985}]. + 2023-07-03T22:11:17,900 INFO [qtp1251996697-122] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Saved sequence metadata to disk: [SequenceMetadata{sequenceId=0, sequenceName='index_kafka_social_media_0e905aa31037879_0', assignments=[0], startOffsets={0=230985}, exclusiveStartPartitions=[], endOffsets={0=230985}, sentinel=false, checkpointed=true}] + 2023-07-03T22:11:17,901 INFO [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Received resume command, resuming ingestion. + 2023-07-03T22:11:17,901 INFO [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Finished reading partition[0], up to[230985]. + 2023-07-03T22:11:17,902 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-kafka-supervisor-dcanhmig-1, groupId=kafka-supervisor-dcanhmig] Resetting generation and member id due to: consumer pro-actively leaving the group + 2023-07-03T22:11:17,902 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-kafka-supervisor-dcanhmig-1, groupId=kafka-supervisor-dcanhmig] Request joining group due to: consumer pro-actively leaving the group + 2023-07-03T22:11:17,902 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-kafka-supervisor-dcanhmig-1, groupId=kafka-supervisor-dcanhmig] Unsubscribed all topics or patterns and assigned partitions + 2023-07-03T22:11:17,912 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Persisted rows[0] and (estimated) bytes[0] + 2023-07-03T22:11:17,916 INFO [[index_kafka_social_media_0e905aa31037879_nommnaeg]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data with commit metadata [AppenderatorDriverMetadata{segments={}, lastSegmentIds={}, callerMetadata={nextPartitions=SeekableStreamEndSequenceNumbers{stream='social_media', partitionSequenceNumberMap={0=230985}}}}] for segments: + 2023-07-03T22:11:17,917 INFO [[index_kafka_social_media_0e905aa31037879_nommnaeg]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Persisted stats: processed rows: [0], persisted rows[0], sinks: [0], total fireHydrants (across sinks): [0], persisted fireHydrants (across sinks): [0] + 2023-07-03T22:11:17,919 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Pushing [0] segments in background + 2023-07-03T22:11:17,921 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Persisted rows[0] and (estimated) bytes[0] + 2023-07-03T22:11:17,924 INFO [[index_kafka_social_media_0e905aa31037879_nommnaeg]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data with commit metadata [AppenderatorDriverMetadata{segments={}, lastSegmentIds={}, callerMetadata={nextPartitions=SeekableStreamStartSequenceNumbers{stream='social_media', partitionSequenceNumberMap={0=230985}, exclusivePartitions=[]}, 
publishPartitions=SeekableStreamEndSequenceNumbers{stream='social_media', partitionSequenceNumberMap={0=230985}}}}] for segments: + 2023-07-03T22:11:17,924 INFO [[index_kafka_social_media_0e905aa31037879_nommnaeg]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Persisted stats: processed rows: [0], persisted rows[0], sinks: [0], total fireHydrants (across sinks): [0], persisted fireHydrants (across sinks): [0] + 2023-07-03T22:11:17,925 INFO [[index_kafka_social_media_0e905aa31037879_nommnaeg]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Preparing to push (stats): processed rows: [0], sinks: [0], fireHydrants (across sinks): [0] + 2023-07-03T22:11:17,925 INFO [[index_kafka_social_media_0e905aa31037879_nommnaeg]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Push complete... + 2023-07-03T22:11:17,929 INFO [[index_kafka_social_media_0e905aa31037879_nommnaeg]-publish] org.apache.druid.indexing.seekablestream.SequenceMetadata - With empty segment set, start offsets [SeekableStreamStartSequenceNumbers{stream='social_media', partitionSequenceNumberMap={0=230985}, exclusivePartitions=[]}] and end offsets [SeekableStreamEndSequenceNumbers{stream='social_media', partitionSequenceNumberMap={0=230985}}] are the same, skipping metadata commit. + 2023-07-03T22:11:17,930 INFO [[index_kafka_social_media_0e905aa31037879_nommnaeg]-publish] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Published [0] segments with commit metadata [{nextPartitions=SeekableStreamStartSequenceNumbers{stream='social_media', partitionSequenceNumberMap={0=230985}, exclusivePartitions=[]}, publishPartitions=SeekableStreamEndSequenceNumbers{stream='social_media', partitionSequenceNumberMap={0=230985}}}] + 2023-07-03T22:11:17,930 INFO [[index_kafka_social_media_0e905aa31037879_nommnaeg]-publish] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Published 0 segments for sequence [index_kafka_social_media_0e905aa31037879_0] with metadata [AppenderatorDriverMetadata{segments={}, lastSegmentIds={}, callerMetadata={nextPartitions=SeekableStreamStartSequenceNumbers{stream='social_media', partitionSequenceNumberMap={0=230985}, exclusivePartitions=[]}, publishPartitions=SeekableStreamEndSequenceNumbers{stream='social_media', partitionSequenceNumberMap={0=230985}}}}]. 
+ 2023-07-03T22:11:17,931 INFO [[index_kafka_social_media_0e905aa31037879_nommnaeg]-publish] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Saved sequence metadata to disk: [] + 2023-07-03T22:11:17,932 INFO [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Handoff complete for segments: + 2023-07-03T22:11:17,932 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-kafka-supervisor-dcanhmig-1, groupId=kafka-supervisor-dcanhmig] Resetting generation and member id due to: consumer pro-actively leaving the group + 2023-07-03T22:11:17,932 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-kafka-supervisor-dcanhmig-1, groupId=kafka-supervisor-dcanhmig] Request joining group due to: consumer pro-actively leaving the group + 2023-07-03T22:11:17,933 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Metrics scheduler closed + 2023-07-03T22:11:17,933 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Closing reporter org.apache.kafka.common.metrics.JmxReporter + 2023-07-03T22:11:17,933 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Metrics reporters closed + 2023-07-03T22:11:17,935 INFO [task-runner-0-priority-0] org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for consumer-kafka-supervisor-dcanhmig-1 unregistered + 2023-07-03T22:11:17,936 INFO [task-runner-0-priority-0] org.apache.druid.curator.announcement.PathChildrenAnnouncer - Unannouncing [/druid/internal-discovery/PEON/localhost:8100] + 2023-07-03T22:11:17,972 INFO [task-runner-0-priority-0] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced self [{"druidNode":{"service":"druid/middleManager","host":"localhost","bindOnHost":false,"plaintextPort":8100,"port":-1,"tlsPort":-1,"enablePlaintextPort":true,"enableTlsPort":false},"nodeType":"peon","services":{"dataNodeService":{"type":"dataNodeService","tier":"_default_tier","maxSize":0,"type":"indexer-executor","serverType":"indexer-executor","priority":0},"lookupNodeService":{"type":"lookupNodeService","lookupTier":"__default"}}}]. 
+ 2023-07-03T22:11:17,972 INFO [task-runner-0-priority-0] org.apache.druid.curator.announcement.PathChildrenAnnouncer - Unannouncing [/druid/announcements/localhost:8100] + 2023-07-03T22:11:17,996 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: { + "id" : "index_kafka_social_media_0e905aa31037879_nommnaeg", + "status" : "SUCCESS", + "duration" : 3601130, + "errorMsg" : null, + "location" : { + "host" : null, + "port" : -1, + "tlsPort" : -1 + } + } + 2023-07-03T22:11:17,998 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS] + 2023-07-03T22:11:18,005 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER] + 2023-07-03T22:11:18,009 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@6491006{HTTP/1.1, (http/1.1)}{0.0.0.0:8100} + 2023-07-03T22:11:18,009 INFO [main] org.eclipse.jetty.server.session - node0 Stopped scavenging + 2023-07-03T22:11:18,012 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@742aa00a{/,null,STOPPED} + 2023-07-03T22:11:18,014 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL] + 2023-07-03T22:11:18,014 INFO [main] org.apache.druid.server.coordination.ZkCoordinator - Stopping ZkCoordinator for [DruidServerMetadata{name='localhost:8100', hostAndPort='localhost:8100', hostAndTlsPort='null', maxSize=0, tier='_default_tier', type=indexer-executor, priority=0}] + 2023-07-03T22:11:18,014 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopping... + 2023-07-03T22:11:18,014 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopped. + 2023-07-03T22:11:18,014 INFO [main] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_kafka_social_media_0e905aa31037879_nommnaeg]. + 2023-07-03T22:11:18,014 INFO [main] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping forcefully (status: [PUBLISHING]) + 2023-07-03T22:11:18,019 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited. Lookup notices are not handled anymore. + 2023-07-03T22:11:18,020 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - Closed lookup [name]. + 2023-07-03T22:11:18,020 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting + 2023-07-03T22:11:18,147 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x1000097ceaf0007 closed + 2023-07-03T22:11:18,147 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1000097ceaf0007 + 2023-07-03T22:11:18,151 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT] + Finished peon task + ``` + +
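+If you only need recent output from a long-running task rather than the full log, you can pass the `offset` query parameter described above. The following sketch assumes the documented behavior of skipping the first entries of the log; the offset value and task ID are only examples.
+
+```shell
+# Fetch the task log, skipping the first 1000 entries.
+TASK_ID="index_kafka_social_media_0e905aa31037879_nommnaeg"   # replace with your task ID
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/${TASK_ID}/log?offset=1000"
+```
+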
+ +### Get task completion report + +Retrieves a [task completion report](../ingestion/tasks.md#task-reports) for a task. It returns a JSON object with information about the number of rows ingested, and any parse exceptions that Druid raised. + +#### URL + +`GET` `/druid/indexer/v1/task/{taskId}/reports` + +#### Responses + + + + + + +
+ +*Successfully retrieved task report* + +
+
+
+---
+
+#### Sample request
+
+The following example shows how to retrieve the completion report of the task with the specified ID `query-52a8aafe-7265-4427-89fe-dc51275cc470`.
+
+
+
+
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-52a8aafe-7265-4427-89fe-dc51275cc470/reports"
+```
+
+
+
+
+
+```HTTP
+GET /druid/indexer/v1/task/query-52a8aafe-7265-4427-89fe-dc51275cc470/reports HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+
+
+#### Sample response
+
+ View the response + + ```json + { + "ingestionStatsAndErrors": { + "type": "ingestionStatsAndErrors", + "taskId": "query-52a8aafe-7265-4427-89fe-dc51275cc470", + "payload": { + "ingestionState": "COMPLETED", + "unparseableEvents": {}, + "rowStats": { + "determinePartitions": { + "processed": 0, + "processedBytes": 0, + "processedWithError": 0, + "thrownAway": 0, + "unparseable": 0 + }, + "buildSegments": { + "processed": 39244, + "processedBytes": 17106256, + "processedWithError": 0, + "thrownAway": 0, + "unparseable": 0 + } + }, + "errorMsg": null, + "segmentAvailabilityConfirmed": false, + "segmentAvailabilityWaitTimeMs": 0 + } + } + } + ``` + +
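+The `rowStats` object is usually the part to check when validating an ingestion. As a sketch, assuming `jq` is installed, you can summarize the row counts for the `buildSegments` phase as follows.
+
+```shell
+# Summarize row counts from a task completion report.
+TASK_ID="query-52a8aafe-7265-4427-89fe-dc51275cc470"   # replace with your task ID
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/${TASK_ID}/reports" \
+  | jq '.ingestionStatsAndErrors.payload.rowStats.buildSegments | {processed, unparseable, processedWithError}'
+```
+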
+ +## Task operations + +### Submit a task + +Submits a JSON-based ingestion spec or supervisor spec to the Overlord. It returns the task ID of the submitted task. For information on creating an ingestion spec, refer to the [ingestion spec reference](../ingestion/ingestion-spec.md). + +Note that for most batch ingestion use cases, you should use the [SQL-ingestion API](./sql-ingestion-api.md) instead of JSON-based batch ingestion. + +#### URL + +`POST` `/druid/indexer/v1/task` + +#### Responses + + + + + + +
+ +*Successfully submitted task* + +
+ + + +
+ +*Missing information in query* + +
+ + + +
+ +*Incorrect request body media type* + +
+ + + +
+ +*Unexpected token or characters in request body* + +
+
+
+---
+
+#### Sample request
+
+The following request is an example of submitting a task to create a datasource named `wikipedia_auto`.
+
+
+
+
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task" \
+--header 'Content-Type: application/json' \
+--data '{
+  "type" : "index_parallel",
+  "spec" : {
+    "dataSchema" : {
+      "dataSource" : "wikipedia_auto",
+      "timestampSpec": {
+        "column": "time",
+        "format": "iso"
+      },
+      "dimensionsSpec" : {
+        "useSchemaDiscovery": true
+      },
+      "metricsSpec" : [],
+      "granularitySpec" : {
+        "type" : "uniform",
+        "segmentGranularity" : "day",
+        "queryGranularity" : "none",
+        "intervals" : ["2015-09-12/2015-09-13"],
+        "rollup" : false
+      }
+    },
+    "ioConfig" : {
+      "type" : "index_parallel",
+      "inputSource" : {
+        "type" : "local",
+        "baseDir" : "quickstart/tutorial/",
+        "filter" : "wikiticker-2015-09-12-sampled.json.gz"
+      },
+      "inputFormat" : {
+        "type" : "json"
+      },
+      "appendToExisting" : false
+    },
+    "tuningConfig" : {
+      "type" : "index_parallel",
+      "maxRowsPerSegment" : 5000000,
+      "maxRowsInMemory" : 25000
+    }
+  }
+}'
+
+```
+
+
+
+
+```HTTP
+POST /druid/indexer/v1/task HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+Content-Type: application/json
+Content-Length: 952
+
+{
+  "type" : "index_parallel",
+  "spec" : {
+    "dataSchema" : {
+      "dataSource" : "wikipedia_auto",
+      "timestampSpec": {
+        "column": "time",
+        "format": "iso"
+      },
+      "dimensionsSpec" : {
+        "useSchemaDiscovery": true
+      },
+      "metricsSpec" : [],
+      "granularitySpec" : {
+        "type" : "uniform",
+        "segmentGranularity" : "day",
+        "queryGranularity" : "none",
+        "intervals" : ["2015-09-12/2015-09-13"],
+        "rollup" : false
+      }
+    },
+    "ioConfig" : {
+      "type" : "index_parallel",
+      "inputSource" : {
+        "type" : "local",
+        "baseDir" : "quickstart/tutorial/",
+        "filter" : "wikiticker-2015-09-12-sampled.json.gz"
+      },
+      "inputFormat" : {
+        "type" : "json"
+      },
+      "appendToExisting" : false
+    },
+    "tuningConfig" : {
+      "type" : "index_parallel",
+      "maxRowsPerSegment" : 5000000,
+      "maxRowsInMemory" : 25000
+    }
+  }
+}
+```
+
+
+
+
+#### Sample response
+
+ View the response + + ```json + { + "task": "index_parallel_wikipedia_odofhkle_2023-06-23T21:07:28.226Z" + } + ``` + +
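+Because the response contains only the generated task ID, a common pattern is to capture that ID and feed it to the status endpoint described earlier. The following is a minimal sketch that assumes `jq` is installed and that the ingestion spec above is saved locally as `wikipedia-index.json` (a hypothetical file name).
+
+```shell
+# Submit a task spec from a file, then immediately check the task status.
+TASK_ID=$(curl -s --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task" \
+  --header 'Content-Type: application/json' \
+  --data @wikipedia-index.json | jq -r '.task')
+echo "Submitted task: ${TASK_ID}"
+curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/${TASK_ID}/status" | jq -r '.status.statusCode'
+```
+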
+
+### Shut down a task
+
+Shuts down a task if it is not already complete. Returns a JSON object with the ID of the task that was shut down successfully.
+
+#### URL
+
+`POST` `/druid/indexer/v1/task/{taskId}/shutdown`
+
+#### Responses
+
+
+
+
+
+ +*Successfully shut down task* + +
+ + + +
+ +*Cannot find task with ID or task is no longer running* + +
+
+
+---
+
+#### Sample request
+
+The following request shows how to shut down a task with the ID `query-52a8aafe-7265-4427-89fe-dc51275cc470`.
+
+
+
+
+
+
+```shell
+curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-52a8aafe-7265-4427-89fe-dc51275cc470/shutdown"
+```
+
+
+
+
+
+```HTTP
+POST /druid/indexer/v1/task/query-52a8aafe-7265-4427-89fe-dc51275cc470/shutdown HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+```
+
+
+
+
+#### Sample response
+
+  View the response
+
+  ```json
+  {
+    "task": "query-52a8aafe-7265-4427-89fe-dc51275cc470"
+  }
+  ```
+
+ +### Shut down all tasks for a datasource + +Shuts down all tasks for a specified datasource. If successful, it returns a JSON object with the name of the datasource whose tasks are shut down. + +#### URL + +`POST` `/druid/indexer/v1/datasources/{datasource}/shutdownAllTasks` + +#### Responses + + + + + + +
+ +*Successfully shut down tasks* + +
+ + + +
+ +*Error or datasource does not have a running task* + +
+
+ +--- + +#### Sample request + +The following request is an example of shutting down all tasks for datasource `wikipedia_auto`. + + + + + + +```shell +curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_auto/shutdownAllTasks" +``` + + + + + +```HTTP +POST /druid/indexer/v1/datasources/wikipedia_auto/shutdownAllTasks HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+  View the response
+
+  ```json
+  {
+    "dataSource": "wikipedia_auto"
+  }
+  ```
+
+
+## Task management
+
+### Retrieve status objects for tasks
+
+Retrieves a list of task status objects for the task IDs provided in the request body. It returns a set of JSON objects with the status, duration, and location of each task, along with any error messages.
+
+#### URL
+
+`POST` `/druid/indexer/v1/taskStatus`
+
+#### Responses
+
+
+
+
+
+ +*Successfully retrieved status objects* + +
+ + + +
+ +*Missing request body or incorrect request body type* + +
+
+
+---
+
+#### Sample request
+
+The following request is an example of retrieving status objects for the task IDs `index_parallel_wikipedia_auto_jndhkpbo_2023-06-26T17:23:05.308Z` and `index_parallel_wikipedia_auto_jbgiianh_2023-06-26T23:17:56.769Z`.
+
+
+
+
+
+
+```shell
+curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/taskStatus" \
+--header 'Content-Type: application/json' \
+--data '["index_parallel_wikipedia_auto_jndhkpbo_2023-06-26T17:23:05.308Z","index_parallel_wikipedia_auto_jbgiianh_2023-06-26T23:17:56.769Z"]'
+```
+
+
+
+
+
+```HTTP
+POST /druid/indexer/v1/taskStatus HTTP/1.1
+Host: http://ROUTER_IP:ROUTER_PORT
+Content-Type: application/json
+Content-Length: 134
+
+["index_parallel_wikipedia_auto_jndhkpbo_2023-06-26T17:23:05.308Z", "index_parallel_wikipedia_auto_jbgiianh_2023-06-26T23:17:56.769Z"]
+```
+
+
+
+
+
+#### Sample response
+
+ View the response + + ```json + { + "index_parallel_wikipedia_auto_jbgiianh_2023-06-26T23:17:56.769Z": { + "id": "index_parallel_wikipedia_auto_jbgiianh_2023-06-26T23:17:56.769Z", + "status": "SUCCESS", + "duration": 10630, + "errorMsg": null, + "location": { + "host": "localhost", + "port": 8100, + "tlsPort": -1 + } + }, + "index_parallel_wikipedia_auto_jndhkpbo_2023-06-26T17:23:05.308Z": { + "id": "index_parallel_wikipedia_auto_jndhkpbo_2023-06-26T17:23:05.308Z", + "status": "SUCCESS", + "duration": 11012, + "errorMsg": null, + "location": { + "host": "localhost", + "port": 8100, + "tlsPort": -1 + } + } + } + ``` + +
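+One way to use this endpoint is to build the ID list from another list endpoint and post it back in a single batch. The following sketch, assuming `jq` is installed, chains the `runningTasks` endpoint into `taskStatus`.
+
+```shell
+# Fetch the status objects for every currently running task in one call.
+IDS=$(curl -s "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/runningTasks" | jq '[.[].id]')
+curl -s --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/taskStatus" \
+  --header 'Content-Type: application/json' \
+  --data "${IDS}"
+```
+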
+
+### Clean up pending segments for a datasource
+
+Manually cleans up the pending segments table in metadata storage for the given `datasource`. It returns a JSON object with
+`numDeleted`, the number of rows deleted from the pending segments table. The
+`druid.coordinator.kill.pendingSegments.on` [Coordinator setting](../configuration/index.md#data-management)
+automates this operation to run periodically.
+
+#### URL
+
+`DELETE` `/druid/indexer/v1/pendingSegments/{datasource}`
+
+#### Responses
+
+
+
+
+
+ +*Successfully deleted pending segments* + +
+
+ +--- + +#### Sample request + +The following request is an example of cleaning up pending segments for the `wikipedia_api` datasource. + + + + + + +```shell +curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/pendingSegments/wikipedia_api" +``` + + + + + +```HTTP +DELETE /druid/indexer/v1/pendingSegments/wikipedia_api HTTP/1.1 +Host: http://ROUTER_IP:ROUTER_PORT +``` + + + + +#### Sample response + +
+ View the response + + ```json + { + "numDeleted": 2 + } + ``` + +
\ No newline at end of file diff --git a/docs/35.0.0/assets/compaction-config.png b/docs/35.0.0/assets/compaction-config.png new file mode 100644 index 0000000000..9dbcfefa80 Binary files /dev/null and b/docs/35.0.0/assets/compaction-config.png differ diff --git a/docs/35.0.0/assets/datasources-action-button.png b/docs/35.0.0/assets/datasources-action-button.png new file mode 100644 index 0000000000..6a52b8444d Binary files /dev/null and b/docs/35.0.0/assets/datasources-action-button.png differ diff --git a/docs/35.0.0/assets/druid-architecture.png b/docs/35.0.0/assets/druid-architecture.png new file mode 100644 index 0000000000..954a87bc1b Binary files /dev/null and b/docs/35.0.0/assets/druid-architecture.png differ diff --git a/docs/35.0.0/assets/druid-architecture.svg b/docs/35.0.0/assets/druid-architecture.svg new file mode 100644 index 0000000000..9d0e67188f --- /dev/null +++ b/docs/35.0.0/assets/druid-architecture.svg @@ -0,0 +1,19 @@ + + \ No newline at end of file diff --git a/docs/35.0.0/assets/druid-column-types.png b/docs/35.0.0/assets/druid-column-types.png new file mode 100644 index 0000000000..9db56c0681 Binary files /dev/null and b/docs/35.0.0/assets/druid-column-types.png differ diff --git a/docs/35.0.0/assets/druid-dataflow-2x.png b/docs/35.0.0/assets/druid-dataflow-2x.png new file mode 100644 index 0000000000..ab1c583e43 Binary files /dev/null and b/docs/35.0.0/assets/druid-dataflow-2x.png differ diff --git a/docs/35.0.0/assets/druid-dataflow-3.png b/docs/35.0.0/assets/druid-dataflow-3.png new file mode 100644 index 0000000000..355215cbce Binary files /dev/null and b/docs/35.0.0/assets/druid-dataflow-3.png differ diff --git a/docs/35.0.0/assets/druid-manage-1.png b/docs/35.0.0/assets/druid-manage-1.png new file mode 100644 index 0000000000..0d10c6e7bc Binary files /dev/null and b/docs/35.0.0/assets/druid-manage-1.png differ diff --git a/docs/35.0.0/assets/druid-timeline.png b/docs/35.0.0/assets/druid-timeline.png new file mode 100644 index 0000000000..40380e2794 Binary files /dev/null and b/docs/35.0.0/assets/druid-timeline.png differ diff --git a/docs/35.0.0/assets/files/kttm-kafka-supervisor.json b/docs/35.0.0/assets/files/kttm-kafka-supervisor.json new file mode 100644 index 0000000000..2096f9c7cd --- /dev/null +++ b/docs/35.0.0/assets/files/kttm-kafka-supervisor.json @@ -0,0 +1,66 @@ +{ + "type": "kafka", + "spec": { + "ioConfig": { + "type": "kafka", + "consumerProperties": { + "bootstrap.servers": "localhost:9092" + }, + "topic": "kttm", + "inputFormat": { + "type": "json" + }, + "useEarliestOffset": true + }, + "tuningConfig": { + "type": "kafka" + }, + "dataSchema": { + "dataSource": "kttm-kafka-supervisor-api", + "timestampSpec": { + "column": "timestamp", + "format": "iso" + }, + "dimensionsSpec": { + "dimensions": [ + "session", + "number", + "client_ip", + "language", + "adblock_list", + "app_version", + "path", + "loaded_image", + "referrer", + "referrer_host", + "server_ip", + "screen", + "window", + { + "type": "long", + "name": "session_length" + }, + "timezone", + "timezone_offset", + { + "type": "json", + "name": "event" + }, + { + "type": "json", + "name": "agent" + }, + { + "type": "json", + "name": "geo_ip" + } + ] + }, + "granularitySpec": { + "queryGranularity": "none", + "rollup": false, + "segmentGranularity": "day" + } + } + } +} \ No newline at end of file diff --git a/docs/35.0.0/assets/indexing_service.png b/docs/35.0.0/assets/indexing_service.png new file mode 100644 index 0000000000..a4462a413c Binary files /dev/null and 
b/docs/35.0.0/assets/indexing_service.png differ diff --git a/docs/35.0.0/assets/multi-stage-query/msq-ui-download-query-results.png b/docs/35.0.0/assets/multi-stage-query/msq-ui-download-query-results.png new file mode 100644 index 0000000000..e428cb2dfd Binary files /dev/null and b/docs/35.0.0/assets/multi-stage-query/msq-ui-download-query-results.png differ diff --git a/docs/35.0.0/assets/multi-stage-query/tutorial-msq-convert.png b/docs/35.0.0/assets/multi-stage-query/tutorial-msq-convert.png new file mode 100644 index 0000000000..f16941af67 Binary files /dev/null and b/docs/35.0.0/assets/multi-stage-query/tutorial-msq-convert.png differ diff --git a/docs/35.0.0/assets/multi-stage-query/ui-annotated.png b/docs/35.0.0/assets/multi-stage-query/ui-annotated.png new file mode 100644 index 0000000000..5a98c00d19 Binary files /dev/null and b/docs/35.0.0/assets/multi-stage-query/ui-annotated.png differ diff --git a/docs/35.0.0/assets/multi-stage-query/ui-empty.png b/docs/35.0.0/assets/multi-stage-query/ui-empty.png new file mode 100644 index 0000000000..7c30d5a671 Binary files /dev/null and b/docs/35.0.0/assets/multi-stage-query/ui-empty.png differ diff --git a/docs/35.0.0/assets/native-queries-01.png b/docs/35.0.0/assets/native-queries-01.png new file mode 100644 index 0000000000..27fd29b632 Binary files /dev/null and b/docs/35.0.0/assets/native-queries-01.png differ diff --git a/docs/35.0.0/assets/nested-combined-json.png b/docs/35.0.0/assets/nested-combined-json.png new file mode 100644 index 0000000000..f98bfcf538 Binary files /dev/null and b/docs/35.0.0/assets/nested-combined-json.png differ diff --git a/docs/35.0.0/assets/nested-display-data-types.png b/docs/35.0.0/assets/nested-display-data-types.png new file mode 100644 index 0000000000..2776068ee4 Binary files /dev/null and b/docs/35.0.0/assets/nested-display-data-types.png differ diff --git a/docs/35.0.0/assets/nested-examine-schema.png b/docs/35.0.0/assets/nested-examine-schema.png new file mode 100644 index 0000000000..11769a162a Binary files /dev/null and b/docs/35.0.0/assets/nested-examine-schema.png differ diff --git a/docs/35.0.0/assets/nested-extract-as-type.png b/docs/35.0.0/assets/nested-extract-as-type.png new file mode 100644 index 0000000000..c54a5eeb62 Binary files /dev/null and b/docs/35.0.0/assets/nested-extract-as-type.png differ diff --git a/docs/35.0.0/assets/nested-extract-elements.png b/docs/35.0.0/assets/nested-extract-elements.png new file mode 100644 index 0000000000..9f7076b50d Binary files /dev/null and b/docs/35.0.0/assets/nested-extract-elements.png differ diff --git a/docs/35.0.0/assets/nested-group-aggregate.png b/docs/35.0.0/assets/nested-group-aggregate.png new file mode 100644 index 0000000000..2d1907fe64 Binary files /dev/null and b/docs/35.0.0/assets/nested-group-aggregate.png differ diff --git a/docs/35.0.0/assets/nested-msq-ingestion-transform.png b/docs/35.0.0/assets/nested-msq-ingestion-transform.png new file mode 100644 index 0000000000..b46fde8593 Binary files /dev/null and b/docs/35.0.0/assets/nested-msq-ingestion-transform.png differ diff --git a/docs/35.0.0/assets/nested-msq-ingestion.png b/docs/35.0.0/assets/nested-msq-ingestion.png new file mode 100644 index 0000000000..0487ee1883 Binary files /dev/null and b/docs/35.0.0/assets/nested-msq-ingestion.png differ diff --git a/docs/35.0.0/assets/nested-parse-deserialize.png b/docs/35.0.0/assets/nested-parse-deserialize.png new file mode 100644 index 0000000000..881a67164b Binary files /dev/null and 
b/docs/35.0.0/assets/nested-parse-deserialize.png differ diff --git a/docs/35.0.0/assets/nested-retrieve-json.png b/docs/35.0.0/assets/nested-retrieve-json.png new file mode 100644 index 0000000000..4f5fa0f969 Binary files /dev/null and b/docs/35.0.0/assets/nested-retrieve-json.png differ diff --git a/docs/35.0.0/assets/nested-return-json.png b/docs/35.0.0/assets/nested-return-json.png new file mode 100644 index 0000000000..9a67aaa71d Binary files /dev/null and b/docs/35.0.0/assets/nested-return-json.png differ diff --git a/docs/35.0.0/assets/retention-rules.png b/docs/35.0.0/assets/retention-rules.png new file mode 100644 index 0000000000..59061d5511 Binary files /dev/null and b/docs/35.0.0/assets/retention-rules.png differ diff --git a/docs/35.0.0/assets/security-model-1.png b/docs/35.0.0/assets/security-model-1.png new file mode 100644 index 0000000000..55c7f24c54 Binary files /dev/null and b/docs/35.0.0/assets/security-model-1.png differ diff --git a/docs/35.0.0/assets/security-model-2.png b/docs/35.0.0/assets/security-model-2.png new file mode 100644 index 0000000000..dcb256bacc Binary files /dev/null and b/docs/35.0.0/assets/security-model-2.png differ diff --git a/docs/35.0.0/assets/segmentPropagation.png b/docs/35.0.0/assets/segmentPropagation.png new file mode 100644 index 0000000000..e1ec82029e Binary files /dev/null and b/docs/35.0.0/assets/segmentPropagation.png differ diff --git a/docs/35.0.0/assets/services-overview.png b/docs/35.0.0/assets/services-overview.png new file mode 100644 index 0000000000..157ce608e5 Binary files /dev/null and b/docs/35.0.0/assets/services-overview.png differ diff --git a/docs/35.0.0/assets/set-query-context-insert-query.png b/docs/35.0.0/assets/set-query-context-insert-query.png new file mode 100644 index 0000000000..d156597d2a Binary files /dev/null and b/docs/35.0.0/assets/set-query-context-insert-query.png differ diff --git a/docs/35.0.0/assets/set-query-context-open-context-dialog.png b/docs/35.0.0/assets/set-query-context-open-context-dialog.png new file mode 100644 index 0000000000..765caa0d72 Binary files /dev/null and b/docs/35.0.0/assets/set-query-context-open-context-dialog.png differ diff --git a/docs/35.0.0/assets/set-query-context-query-view.png b/docs/35.0.0/assets/set-query-context-query-view.png new file mode 100644 index 0000000000..9d25d3c664 Binary files /dev/null and b/docs/35.0.0/assets/set-query-context-query-view.png differ diff --git a/docs/35.0.0/assets/set-query-context-run-the-query.png b/docs/35.0.0/assets/set-query-context-run-the-query.png new file mode 100644 index 0000000000..27f29f8390 Binary files /dev/null and b/docs/35.0.0/assets/set-query-context-run-the-query.png differ diff --git a/docs/35.0.0/assets/set-query-context-set-context-parameters.png b/docs/35.0.0/assets/set-query-context-set-context-parameters.png new file mode 100644 index 0000000000..17fa110501 Binary files /dev/null and b/docs/35.0.0/assets/set-query-context-set-context-parameters.png differ diff --git a/docs/35.0.0/assets/spectator-histogram-size-comparison.png b/docs/35.0.0/assets/spectator-histogram-size-comparison.png new file mode 100644 index 0000000000..306f45abd8 Binary files /dev/null and b/docs/35.0.0/assets/spectator-histogram-size-comparison.png differ diff --git a/docs/35.0.0/assets/supervisor-actions.png b/docs/35.0.0/assets/supervisor-actions.png new file mode 100644 index 0000000000..2797cf69ea Binary files /dev/null and b/docs/35.0.0/assets/supervisor-actions.png differ diff --git a/docs/35.0.0/assets/supervisor-info-dialog.png 
b/docs/35.0.0/assets/supervisor-info-dialog.png new file mode 100644 index 0000000000..3be424a413 Binary files /dev/null and b/docs/35.0.0/assets/supervisor-info-dialog.png differ diff --git a/docs/35.0.0/assets/supervisor-view.png b/docs/35.0.0/assets/supervisor-view.png new file mode 100644 index 0000000000..e3100cdd3b Binary files /dev/null and b/docs/35.0.0/assets/supervisor-view.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-00.png b/docs/35.0.0/assets/tutorial-batch-data-loader-00.png new file mode 100644 index 0000000000..793b6c1232 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-00.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-01.png b/docs/35.0.0/assets/tutorial-batch-data-loader-01.png new file mode 100644 index 0000000000..2ff1d6398b Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-01.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-015.png b/docs/35.0.0/assets/tutorial-batch-data-loader-015.png new file mode 100644 index 0000000000..fd588caea4 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-015.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-02.png b/docs/35.0.0/assets/tutorial-batch-data-loader-02.png new file mode 100644 index 0000000000..736188cb13 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-02.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-03.png b/docs/35.0.0/assets/tutorial-batch-data-loader-03.png new file mode 100644 index 0000000000..74bb8c88fe Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-03.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-04.png b/docs/35.0.0/assets/tutorial-batch-data-loader-04.png new file mode 100644 index 0000000000..e4237cda8a Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-04.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-05.png b/docs/35.0.0/assets/tutorial-batch-data-loader-05.png new file mode 100644 index 0000000000..d245dde67a Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-05.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-06.png b/docs/35.0.0/assets/tutorial-batch-data-loader-06.png new file mode 100644 index 0000000000..285fd57ba2 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-06.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-07.png b/docs/35.0.0/assets/tutorial-batch-data-loader-07.png new file mode 100644 index 0000000000..481838d789 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-07.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-08.png b/docs/35.0.0/assets/tutorial-batch-data-loader-08.png new file mode 100644 index 0000000000..b64c5a4e0d Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-08.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-09.png b/docs/35.0.0/assets/tutorial-batch-data-loader-09.png new file mode 100644 index 0000000000..bec3085f67 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-09.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-10.png b/docs/35.0.0/assets/tutorial-batch-data-loader-10.png new file mode 100644 index 0000000000..857a5a5c4f Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-10.png differ diff --git 
a/docs/35.0.0/assets/tutorial-batch-data-loader-11.png b/docs/35.0.0/assets/tutorial-batch-data-loader-11.png new file mode 100644 index 0000000000..bf7e304b8a Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-11.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-data-loader-12.png b/docs/35.0.0/assets/tutorial-batch-data-loader-12.png new file mode 100644 index 0000000000..f195b9ca50 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-data-loader-12.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-submit-task-01.png b/docs/35.0.0/assets/tutorial-batch-submit-task-01.png new file mode 100644 index 0000000000..01b91427fc Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-submit-task-01.png differ diff --git a/docs/35.0.0/assets/tutorial-batch-submit-task-02.png b/docs/35.0.0/assets/tutorial-batch-submit-task-02.png new file mode 100644 index 0000000000..ba7caeb22c Binary files /dev/null and b/docs/35.0.0/assets/tutorial-batch-submit-task-02.png differ diff --git a/docs/35.0.0/assets/tutorial-compaction-01.png b/docs/35.0.0/assets/tutorial-compaction-01.png new file mode 100644 index 0000000000..aeb9bf36fc Binary files /dev/null and b/docs/35.0.0/assets/tutorial-compaction-01.png differ diff --git a/docs/35.0.0/assets/tutorial-compaction-02.png b/docs/35.0.0/assets/tutorial-compaction-02.png new file mode 100644 index 0000000000..836d8a7a7c Binary files /dev/null and b/docs/35.0.0/assets/tutorial-compaction-02.png differ diff --git a/docs/35.0.0/assets/tutorial-compaction-03.png b/docs/35.0.0/assets/tutorial-compaction-03.png new file mode 100644 index 0000000000..d51f8f8a8a Binary files /dev/null and b/docs/35.0.0/assets/tutorial-compaction-03.png differ diff --git a/docs/35.0.0/assets/tutorial-compaction-04.png b/docs/35.0.0/assets/tutorial-compaction-04.png new file mode 100644 index 0000000000..46c5b1d261 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-compaction-04.png differ diff --git a/docs/35.0.0/assets/tutorial-compaction-05.png b/docs/35.0.0/assets/tutorial-compaction-05.png new file mode 100644 index 0000000000..e692694aff Binary files /dev/null and b/docs/35.0.0/assets/tutorial-compaction-05.png differ diff --git a/docs/35.0.0/assets/tutorial-compaction-06.png b/docs/35.0.0/assets/tutorial-compaction-06.png new file mode 100644 index 0000000000..55c999f9d1 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-compaction-06.png differ diff --git a/docs/35.0.0/assets/tutorial-compaction-07.png b/docs/35.0.0/assets/tutorial-compaction-07.png new file mode 100644 index 0000000000..661e89784c Binary files /dev/null and b/docs/35.0.0/assets/tutorial-compaction-07.png differ diff --git a/docs/35.0.0/assets/tutorial-compaction-08.png b/docs/35.0.0/assets/tutorial-compaction-08.png new file mode 100644 index 0000000000..6e3f1aa037 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-compaction-08.png differ diff --git a/docs/35.0.0/assets/tutorial-deletion-01.png b/docs/35.0.0/assets/tutorial-deletion-01.png new file mode 100644 index 0000000000..942f057d7e Binary files /dev/null and b/docs/35.0.0/assets/tutorial-deletion-01.png differ diff --git a/docs/35.0.0/assets/tutorial-deletion-02.png b/docs/35.0.0/assets/tutorial-deletion-02.png new file mode 100644 index 0000000000..516fdf7fe8 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-deletion-02.png differ diff --git a/docs/35.0.0/assets/tutorial-deletion-03.png b/docs/35.0.0/assets/tutorial-deletion-03.png new file mode 100644 index 
0000000000..666ff7a89e Binary files /dev/null and b/docs/35.0.0/assets/tutorial-deletion-03.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-01.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-01.png new file mode 100644 index 0000000000..7f8d0daacd Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-01.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-02.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-02.png new file mode 100644 index 0000000000..8475eeba2b Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-02.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-03.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-03.png new file mode 100644 index 0000000000..dc7400404f Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-03.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-04.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-04.png new file mode 100644 index 0000000000..5703066959 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-04.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-05.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-05.png new file mode 100644 index 0000000000..c920f05658 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-05.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-06.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-06.png new file mode 100644 index 0000000000..4fb96dd47c Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-06.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-07.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-07.png new file mode 100644 index 0000000000..b3013b735d Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-07.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-08.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-08.png new file mode 100644 index 0000000000..b1cdd2df16 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-08.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-09.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-09.png new file mode 100644 index 0000000000..e2045ac895 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-09.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-10.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-10.png new file mode 100644 index 0000000000..39eaa3750a Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-10.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-11.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-11.png new file mode 100644 index 0000000000..7bd3d9a25e Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-11.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-data-loader-12.png b/docs/35.0.0/assets/tutorial-kafka-data-loader-12.png new file mode 100644 index 0000000000..ed952b135b Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-data-loader-12.png differ diff --git a/docs/35.0.0/assets/tutorial-kafka-submit-supervisor-01.png b/docs/35.0.0/assets/tutorial-kafka-submit-supervisor-01.png new file mode 100644 index 0000000000..809c0c6733 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-kafka-submit-supervisor-01.png differ diff --git a/docs/35.0.0/assets/tutorial-query-01.png 
b/docs/35.0.0/assets/tutorial-query-01.png new file mode 100644 index 0000000000..99354cbdfe Binary files /dev/null and b/docs/35.0.0/assets/tutorial-query-01.png differ diff --git a/docs/35.0.0/assets/tutorial-query-02.png b/docs/35.0.0/assets/tutorial-query-02.png new file mode 100644 index 0000000000..4d789f5989 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-query-02.png differ diff --git a/docs/35.0.0/assets/tutorial-query-03.png b/docs/35.0.0/assets/tutorial-query-03.png new file mode 100644 index 0000000000..841d36bfe8 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-query-03.png differ diff --git a/docs/35.0.0/assets/tutorial-query-04.png b/docs/35.0.0/assets/tutorial-query-04.png new file mode 100644 index 0000000000..7c713e367c Binary files /dev/null and b/docs/35.0.0/assets/tutorial-query-04.png differ diff --git a/docs/35.0.0/assets/tutorial-query-05.png b/docs/35.0.0/assets/tutorial-query-05.png new file mode 100644 index 0000000000..4b3d78d155 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-query-05.png differ diff --git a/docs/35.0.0/assets/tutorial-query-06.png b/docs/35.0.0/assets/tutorial-query-06.png new file mode 100644 index 0000000000..cb35a07871 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-query-06.png differ diff --git a/docs/35.0.0/assets/tutorial-query-07.png b/docs/35.0.0/assets/tutorial-query-07.png new file mode 100644 index 0000000000..aa94d629f8 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-query-07.png differ diff --git a/docs/35.0.0/assets/tutorial-query-deepstorage-retention-rule.png b/docs/35.0.0/assets/tutorial-query-deepstorage-retention-rule.png new file mode 100644 index 0000000000..9dee37bdea Binary files /dev/null and b/docs/35.0.0/assets/tutorial-query-deepstorage-retention-rule.png differ diff --git a/docs/35.0.0/assets/tutorial-quickstart-01.png b/docs/35.0.0/assets/tutorial-quickstart-01.png new file mode 100644 index 0000000000..649708b7c4 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-quickstart-01.png differ diff --git a/docs/35.0.0/assets/tutorial-quickstart-02.png b/docs/35.0.0/assets/tutorial-quickstart-02.png new file mode 100644 index 0000000000..5edec67c3f Binary files /dev/null and b/docs/35.0.0/assets/tutorial-quickstart-02.png differ diff --git a/docs/35.0.0/assets/tutorial-quickstart-03.png b/docs/35.0.0/assets/tutorial-quickstart-03.png new file mode 100644 index 0000000000..917f25d040 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-quickstart-03.png differ diff --git a/docs/35.0.0/assets/tutorial-quickstart-04.png b/docs/35.0.0/assets/tutorial-quickstart-04.png new file mode 100644 index 0000000000..e847ef550c Binary files /dev/null and b/docs/35.0.0/assets/tutorial-quickstart-04.png differ diff --git a/docs/35.0.0/assets/tutorial-quickstart-05.png b/docs/35.0.0/assets/tutorial-quickstart-05.png new file mode 100644 index 0000000000..da3ed0dfa6 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-quickstart-05.png differ diff --git a/docs/35.0.0/assets/tutorial-retention-00.png b/docs/35.0.0/assets/tutorial-retention-00.png new file mode 100644 index 0000000000..a3f84a9fe6 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-retention-00.png differ diff --git a/docs/35.0.0/assets/tutorial-retention-01.png b/docs/35.0.0/assets/tutorial-retention-01.png new file mode 100644 index 0000000000..35a97c2626 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-retention-01.png differ diff --git a/docs/35.0.0/assets/tutorial-retention-02.png 
b/docs/35.0.0/assets/tutorial-retention-02.png new file mode 100644 index 0000000000..f38fad0d27 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-retention-02.png differ diff --git a/docs/35.0.0/assets/tutorial-retention-03.png b/docs/35.0.0/assets/tutorial-retention-03.png new file mode 100644 index 0000000000..256836a2d4 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-retention-03.png differ diff --git a/docs/35.0.0/assets/tutorial-retention-04.png b/docs/35.0.0/assets/tutorial-retention-04.png new file mode 100644 index 0000000000..d39495f87d Binary files /dev/null and b/docs/35.0.0/assets/tutorial-retention-04.png differ diff --git a/docs/35.0.0/assets/tutorial-retention-05.png b/docs/35.0.0/assets/tutorial-retention-05.png new file mode 100644 index 0000000000..638a752fac Binary files /dev/null and b/docs/35.0.0/assets/tutorial-retention-05.png differ diff --git a/docs/35.0.0/assets/tutorial-retention-06.png b/docs/35.0.0/assets/tutorial-retention-06.png new file mode 100644 index 0000000000..f47cbffbb1 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-retention-06.png differ diff --git a/docs/35.0.0/assets/tutorial-sql-aggregate-query.png b/docs/35.0.0/assets/tutorial-sql-aggregate-query.png new file mode 100644 index 0000000000..0ffbff60e0 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-sql-aggregate-query.png differ diff --git a/docs/35.0.0/assets/tutorial-sql-auto-queries.png b/docs/35.0.0/assets/tutorial-sql-auto-queries.png new file mode 100644 index 0000000000..dc04a8de6f Binary files /dev/null and b/docs/35.0.0/assets/tutorial-sql-auto-queries.png differ diff --git a/docs/35.0.0/assets/tutorial-sql-count-distinct-help.png b/docs/35.0.0/assets/tutorial-sql-count-distinct-help.png new file mode 100644 index 0000000000..5327972d2a Binary files /dev/null and b/docs/35.0.0/assets/tutorial-sql-count-distinct-help.png differ diff --git a/docs/35.0.0/assets/tutorial-sql-count-distinct.png b/docs/35.0.0/assets/tutorial-sql-count-distinct.png new file mode 100644 index 0000000000..5fb9b2ae0b Binary files /dev/null and b/docs/35.0.0/assets/tutorial-sql-count-distinct.png differ diff --git a/docs/35.0.0/assets/tutorial-sql-demo-queries.png b/docs/35.0.0/assets/tutorial-sql-demo-queries.png new file mode 100644 index 0000000000..16fc040a67 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-sql-demo-queries.png differ diff --git a/docs/35.0.0/assets/tutorial-sql-query-plan.png b/docs/35.0.0/assets/tutorial-sql-query-plan.png new file mode 100644 index 0000000000..03f3c3cc6e Binary files /dev/null and b/docs/35.0.0/assets/tutorial-sql-query-plan.png differ diff --git a/docs/35.0.0/assets/tutorial-sql-result-column-actions.png b/docs/35.0.0/assets/tutorial-sql-result-column-actions.png new file mode 100644 index 0000000000..16518d4bff Binary files /dev/null and b/docs/35.0.0/assets/tutorial-sql-result-column-actions.png differ diff --git a/docs/35.0.0/assets/tutorial-theta-01.png b/docs/35.0.0/assets/tutorial-theta-01.png new file mode 100644 index 0000000000..2411fbf194 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-theta-01.png differ diff --git a/docs/35.0.0/assets/tutorial-theta-02.png b/docs/35.0.0/assets/tutorial-theta-02.png new file mode 100644 index 0000000000..ce849fd36a Binary files /dev/null and b/docs/35.0.0/assets/tutorial-theta-02.png differ diff --git a/docs/35.0.0/assets/tutorial-theta-03.png b/docs/35.0.0/assets/tutorial-theta-03.png new file mode 100644 index 0000000000..316bf7f0b0 Binary files /dev/null and 
b/docs/35.0.0/assets/tutorial-theta-03.png differ diff --git a/docs/35.0.0/assets/tutorial-theta-04.png b/docs/35.0.0/assets/tutorial-theta-04.png new file mode 100644 index 0000000000..21f383af6d Binary files /dev/null and b/docs/35.0.0/assets/tutorial-theta-04.png differ diff --git a/docs/35.0.0/assets/tutorial-theta-05.png b/docs/35.0.0/assets/tutorial-theta-05.png new file mode 100644 index 0000000000..ec2c8df6d3 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-theta-05.png differ diff --git a/docs/35.0.0/assets/tutorial-theta-06.png b/docs/35.0.0/assets/tutorial-theta-06.png new file mode 100644 index 0000000000..4048aa2389 Binary files /dev/null and b/docs/35.0.0/assets/tutorial-theta-06.png differ diff --git a/docs/35.0.0/assets/tutorial-theta-07.png b/docs/35.0.0/assets/tutorial-theta-07.png new file mode 100644 index 0000000000..369b5914ad Binary files /dev/null and b/docs/35.0.0/assets/tutorial-theta-07.png differ diff --git a/docs/35.0.0/assets/tutorial-theta-08.png b/docs/35.0.0/assets/tutorial-theta-08.png new file mode 100644 index 0000000000..59a6bc051e Binary files /dev/null and b/docs/35.0.0/assets/tutorial-theta-08.png differ diff --git a/docs/35.0.0/assets/web-console-0.7-tasks.png b/docs/35.0.0/assets/web-console-0.7-tasks.png new file mode 100644 index 0000000000..80080ba8ed Binary files /dev/null and b/docs/35.0.0/assets/web-console-0.7-tasks.png differ diff --git a/docs/35.0.0/assets/web-console-01-home-view.png b/docs/35.0.0/assets/web-console-01-home-view.png new file mode 100644 index 0000000000..39b6e8a1a6 Binary files /dev/null and b/docs/35.0.0/assets/web-console-01-home-view.png differ diff --git a/docs/35.0.0/assets/web-console-02-data-loader-1.png b/docs/35.0.0/assets/web-console-02-data-loader-1.png new file mode 100644 index 0000000000..ecd18c01f9 Binary files /dev/null and b/docs/35.0.0/assets/web-console-02-data-loader-1.png differ diff --git a/docs/35.0.0/assets/web-console-03-data-loader-2.png b/docs/35.0.0/assets/web-console-03-data-loader-2.png new file mode 100644 index 0000000000..bfb7be59cf Binary files /dev/null and b/docs/35.0.0/assets/web-console-03-data-loader-2.png differ diff --git a/docs/35.0.0/assets/web-console-04-datasources.png b/docs/35.0.0/assets/web-console-04-datasources.png new file mode 100644 index 0000000000..fab3cec452 Binary files /dev/null and b/docs/35.0.0/assets/web-console-04-datasources.png differ diff --git a/docs/35.0.0/assets/web-console-05-retention.png b/docs/35.0.0/assets/web-console-05-retention.png new file mode 100644 index 0000000000..96278525a8 Binary files /dev/null and b/docs/35.0.0/assets/web-console-05-retention.png differ diff --git a/docs/35.0.0/assets/web-console-06-segments.png b/docs/35.0.0/assets/web-console-06-segments.png new file mode 100644 index 0000000000..9e9e9ab985 Binary files /dev/null and b/docs/35.0.0/assets/web-console-06-segments.png differ diff --git a/docs/35.0.0/assets/web-console-07-supervisors.png b/docs/35.0.0/assets/web-console-07-supervisors.png new file mode 100644 index 0000000000..70391bd642 Binary files /dev/null and b/docs/35.0.0/assets/web-console-07-supervisors.png differ diff --git a/docs/35.0.0/assets/web-console-08-supervisor-status.png b/docs/35.0.0/assets/web-console-08-supervisor-status.png new file mode 100644 index 0000000000..1bcfccdfe6 Binary files /dev/null and b/docs/35.0.0/assets/web-console-08-supervisor-status.png differ diff --git a/docs/35.0.0/assets/web-console-09-task-status.png b/docs/35.0.0/assets/web-console-09-task-status.png new file mode 
100644 index 0000000000..100e8ada0e Binary files /dev/null and b/docs/35.0.0/assets/web-console-09-task-status.png differ diff --git a/docs/35.0.0/assets/web-console-10-servers.png b/docs/35.0.0/assets/web-console-10-servers.png new file mode 100644 index 0000000000..a3e0084e12 Binary files /dev/null and b/docs/35.0.0/assets/web-console-10-servers.png differ diff --git a/docs/35.0.0/assets/web-console-11-query-sql.png b/docs/35.0.0/assets/web-console-11-query-sql.png new file mode 100644 index 0000000000..a144774f46 Binary files /dev/null and b/docs/35.0.0/assets/web-console-11-query-sql.png differ diff --git a/docs/35.0.0/assets/web-console-12-query-rune.png b/docs/35.0.0/assets/web-console-12-query-rune.png new file mode 100644 index 0000000000..8c5e270562 Binary files /dev/null and b/docs/35.0.0/assets/web-console-12-query-rune.png differ diff --git a/docs/35.0.0/assets/web-console-13-lookups.png b/docs/35.0.0/assets/web-console-13-lookups.png new file mode 100644 index 0000000000..fa0bd0b060 Binary files /dev/null and b/docs/35.0.0/assets/web-console-13-lookups.png differ diff --git a/docs/35.0.0/comparisons/druid-vs-elasticsearch.md b/docs/35.0.0/comparisons/druid-vs-elasticsearch.md new file mode 100644 index 0000000000..82752aa7ad --- /dev/null +++ b/docs/35.0.0/comparisons/druid-vs-elasticsearch.md @@ -0,0 +1,39 @@ +--- +id: druid-vs-elasticsearch +title: "Apache Druid vs Elasticsearch" +--- + + + + +We are not experts on search systems; if anything is incorrect about our portrayal, please let us know on the mailing list or via some other means. + +Elasticsearch is a search system based on Apache Lucene. It provides full text search for schema-free documents +and provides access to raw event-level data. Elasticsearch is increasingly adding support for analytics and aggregations. +[Some members of the community](https://groups.google.com/forum/#!msg/druid-development/nlpwTHNclj8/sOuWlKOzPpYJ) have pointed out +that the resource requirements for data ingestion and aggregation in Elasticsearch are much higher than those of Druid. + +Elasticsearch also does not support data summarization/roll-up at ingestion time, which can compact the data that needs to be +stored by up to 100x with real-world data sets. This leads to Elasticsearch having greater storage requirements. + +Druid focuses on OLAP workflows. Druid is optimized for high performance (fast aggregation and ingestion) at low cost, +and supports a wide range of analytic operations. Druid has some basic search support for structured event data, but does not support +full text search. Druid also does not support completely unstructured data. Measures must be defined in a Druid schema such that +summarization/roll-up can be done. diff --git a/docs/35.0.0/comparisons/druid-vs-key-value.md b/docs/35.0.0/comparisons/druid-vs-key-value.md new file mode 100644 index 0000000000..57f3dec66d --- /dev/null +++ b/docs/35.0.0/comparisons/druid-vs-key-value.md @@ -0,0 +1,46 @@ +--- +id: druid-vs-key-value +title: "Apache Druid vs. Key/Value Stores (HBase/Cassandra/OpenTSDB)" +--- + + + + +Druid is highly optimized for scans and aggregations, and it supports arbitrarily deep drill-downs into data sets. This same functionality +is supported in key/value stores in two ways: + +1. Pre-compute all permutations of possible user queries +2. Range scans on event data + +When pre-computing results, the key is the exact parameters of the query, and the value is the result of the query.
+The queries return extremely quickly, but at the cost of flexibility, as ad-hoc exploratory queries are not possible with +pre-computing every possible query permutation. Pre-computing all permutations of all ad-hoc queries leads to result sets +that grow exponentially with the number of columns of a data set, and pre-computing queries for complex real-world data sets +can require hours of pre-processing time. + +The other approach to using key/value stores for aggregations is to use the dimensions of an event as the key and the event measures as the value. +Aggregations are done by issuing range scans on this data. Timeseries-specific databases such as OpenTSDB use this approach. +One of the limitations here is that the key/value storage model does not have indexes for any kind of filtering other than prefix ranges, +which can be used to filter a query down to a metric and time range, but cannot resolve complex predicates to narrow the exact data to scan. +When the number of rows to scan gets large, this limitation can greatly reduce performance. It is also harder to achieve good +locality with key/value stores because most don’t support pushing down aggregates to the storage layer. + +For arbitrary exploration of data (flexible data filtering), Druid's custom column format enables ad-hoc queries without pre-computation. The format +also enables fast scans on columns, which is important for good aggregation performance. diff --git a/docs/35.0.0/comparisons/druid-vs-kudu.md b/docs/35.0.0/comparisons/druid-vs-kudu.md new file mode 100644 index 0000000000..b992a1633d --- /dev/null +++ b/docs/35.0.0/comparisons/druid-vs-kudu.md @@ -0,0 +1,39 @@ +--- +id: druid-vs-kudu +title: "Apache Druid vs Kudu" +--- + + + + +Kudu's storage format enables single row updates, whereas updates to existing Druid segments require recreating the segment, so theoretically +the process for updating old values should be higher latency in Druid. However, the requirements in Kudu for maintaining extra head space to store +updates, as well as organizing data by id instead of time, have the potential to introduce some extra latency and to read +data that is not needed to answer a query at query time. + +Druid summarizes/rolls up data at ingestion time, which in practice reduces the raw data that needs to be +stored significantly (up to 40 times on average), and increases performance of scanning raw data significantly. +Druid segments also contain bitmap indexes for fast filtering, which Kudu does not currently support. +Druid's segment architecture is heavily geared towards fast aggregates and filters, and towards OLAP workflows. Appends are very +fast in Druid, whereas updates of older data are higher latency. This is by design, as the data Druid is good for is typically event data +that does not need to be updated too frequently. Kudu supports arbitrary primary keys with uniqueness constraints, and +efficient lookup by ranges of those keys. Kudu chooses not to include an execution engine, but supports sufficient +operations so as to allow node-local processing from the execution engines. This means that Kudu can support multiple frameworks on the same data (e.g., MR, Spark, and SQL). +Druid includes its own query layer that allows it to push down aggregations and computations directly to data processes for faster query processing.
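The ingestion-time rollup mentioned above is configured through the `granularitySpec` and `metricsSpec` of an ingestion spec. The following is only a rough sketch (the datasource and column names are hypothetical, not from this documentation set) of a `dataSchema` that pre-aggregates raw events down to one stored row per minute per dimension combination:

```json
{
  "dataSchema": {
    "dataSource": "example_events",
    "timestampSpec": { "column": "timestamp", "format": "iso" },
    "dimensionsSpec": { "dimensions": ["channel", "user"] },
    "metricsSpec": [
      { "type": "count", "name": "count" },
      { "type": "longSum", "name": "added", "fieldName": "added" }
    ],
    "granularitySpec": {
      "segmentGranularity": "day",
      "queryGranularity": "minute",
      "rollup": true
    }
  }
}
```

With `rollup` enabled, rows that share the same truncated timestamp and dimension values are combined into a single stored row, which is where the storage reduction described above comes from.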
diff --git a/docs/35.0.0/comparisons/druid-vs-redshift.md b/docs/35.0.0/comparisons/druid-vs-redshift.md new file mode 100644 index 0000000000..3e2c7b9ead --- /dev/null +++ b/docs/35.0.0/comparisons/druid-vs-redshift.md @@ -0,0 +1,62 @@ +--- +id: druid-vs-redshift +title: "Apache Druid vs Redshift" +--- + + + + +### How does Druid compare to Redshift? + +To draw a distinction, Redshift started out as ParAccel (Actian), which Amazon licenses and has since heavily modified. + +Aside from potential performance differences, there are some functional differences: + +### Real-time data ingestion + +Because Druid is optimized to provide insight into massive quantities of streaming data, it is able to load and aggregate data in real time. + +Generally, traditional data warehouses, including column stores, work only with batch ingestion and are not optimal for regularly ingesting streaming data. + +### Druid is a read-oriented analytical data store + +Druid’s write semantics are not as fluid, and Druid does not support full joins (we support large table to small table joins). Redshift provides full SQL support, including joins and insert/update statements. + +### Data distribution model + +Druid’s data distribution is segment-based and leverages a highly available "deep" storage such as S3 or HDFS. Scaling up (or down) does not require massive copy actions or downtime; in fact, losing any number of Historical processes does not result in data loss because new Historical processes can always be brought up by reading data from "deep" storage. + +By contrast, ParAccel’s data distribution model is hash-based. Expanding the cluster requires re-hashing the data across the nodes, making it difficult to perform without taking downtime. Amazon’s Redshift works around this issue with a multi-step process: + +* set the cluster into read-only mode +* copy data from the cluster to a new cluster that exists in parallel +* redirect traffic to the new cluster + +### Replication strategy + +Druid employs segment-level data distribution, meaning that more processes can be added and rebalanced without having to perform a staged swap. The replication strategy also makes all replicas available for querying. Replication is done automatically and without any impact on performance. + +ParAccel’s hash-based distribution generally means that replication is conducted via hot spares. This puts a numerical limit on the number of nodes you can lose without losing data, and this replication strategy often does not allow the hot spare to help share query load. + +### Indexing strategy + +Along with column-oriented structures, Druid uses indexing structures to speed up query execution when a filter is provided. Indexing structures do increase storage overhead (and make it more difficult to allow for mutation), but they also significantly speed up queries. + +ParAccel does not appear to employ indexing strategies. diff --git a/docs/35.0.0/comparisons/druid-vs-spark.md b/docs/35.0.0/comparisons/druid-vs-spark.md new file mode 100644 index 0000000000..4d3a6b43da --- /dev/null +++ b/docs/35.0.0/comparisons/druid-vs-spark.md @@ -0,0 +1,42 @@ +--- +id: druid-vs-spark +title: "Apache Druid vs Spark" +--- + + + + +Druid and Spark are complementary solutions, as Druid can be used to accelerate OLAP queries in Spark. + +Spark is a general cluster computing framework initially designed around the concept of Resilient Distributed Datasets (RDDs).
+RDDs enable data reuse by persisting intermediate results +in memory and enable Spark to provide fast computations for iterative algorithms. +This is especially beneficial for certain workflows such as machine +learning, where the same operation may be applied over and over +again until some result is converged upon. The generality of Spark makes it very suitable as an engine to process (clean or transform) data. +Although Spark provides the ability to query data through Spark SQL, much like Hadoop, the query latencies are not specifically targeted to be interactive (sub-second). + +Druid focuses on extremely low latency queries and is ideal for powering applications used by thousands of users, where each query must +return fast enough that users can interactively explore the data. Druid fully indexes all data, and can act as a middle layer between Spark and your application. +One typical setup seen in production is to process data in Spark, and load the processed data into Druid for faster access. + +For more information about using Druid and Spark together, including benchmarks of the two systems, please see: + +https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani diff --git a/docs/35.0.0/comparisons/druid-vs-sql-on-hadoop.md b/docs/35.0.0/comparisons/druid-vs-sql-on-hadoop.md new file mode 100644 index 0000000000..00e4473125 --- /dev/null +++ b/docs/35.0.0/comparisons/druid-vs-sql-on-hadoop.md @@ -0,0 +1,82 @@ +--- +id: druid-vs-sql-on-hadoop +title: "Apache Druid vs SQL-on-Hadoop" +--- + + + + +SQL-on-Hadoop engines provide an +execution engine for various data formats and data stores, and +many can be made to push computations down to Druid, while providing a SQL interface to Druid. + +For a direct comparison between the technologies, and for deciding when to use only one or the other, things basically come down to your +product requirements and what the systems were designed to do. + +Druid was designed to + +1. be an always-on service +1. ingest data in real-time +1. handle slice-n-dice style ad-hoc queries + +SQL-on-Hadoop engines generally sidestep Map/Reduce, instead querying data directly from HDFS or, in some cases, other storage systems. +Some of these engines (including Impala and Presto) can be co-located with HDFS data nodes and coordinate with them to achieve data locality for queries. +What does this mean? We can talk about it in terms of three general areas: + +1. Queries +1. Data Ingestion +1. Query Flexibility + +### Queries + +Druid segments store data in a custom column format. Segments are scanned directly as part of queries, and each Druid server +calculates a set of results that are eventually merged at the Broker level. This means that the data transferred between servers +consists of queries and results, and all computation is done internally within the Druid servers. + +Most SQL-on-Hadoop engines are responsible for query planning and execution for underlying storage layers and storage formats. +They are processes that stay on even if there is no query running (eliminating the JVM startup costs from Hadoop MapReduce). +Some (Impala/Presto) SQL-on-Hadoop engines have daemon processes that can be run where the data is stored, virtually eliminating network transfer costs. There is still +some latency overhead (e.g. serialization/deserialization time) associated with pulling data from the underlying storage layer into the computation layer. We are unaware of exactly +how much of a performance impact this makes.
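To make the scatter-gather flow above concrete, the following is a minimal native query of the kind a client might POST to the Broker; the Broker fans it out to the servers that host the relevant segments and merges their partial results. The datasource name, interval, and metric here are hypothetical placeholders, not part of this documentation set:

```json
{
  "queryType": "timeseries",
  "dataSource": "example_events",
  "intervals": ["2024-01-01/2024-02-01"],
  "granularity": "day",
  "aggregations": [
    { "type": "longSum", "name": "added", "fieldName": "added" }
  ]
}
```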
+ +### Data Ingestion + +Druid is built to allow for real-time ingestion of data. You can ingest data and query it immediately upon ingestion; +the latency between when an event occurs and when it is reflected in the data is dominated by how long it takes to deliver the event to Druid. + +SQL-on-Hadoop engines, being based on data in HDFS or some other backing store, are limited in their data ingestion rates by the +rate at which that backing store can make data available. Generally, the backing store is the biggest bottleneck for +how quickly data can become available. + +### Query Flexibility + +Druid's query language is fairly low level and maps to how Druid operates internally. Although Druid can be combined with a high level query +planner to support most SQL queries and analytic SQL queries (minus joins among large tables), +base Druid is less flexible than SQL-on-Hadoop solutions for generic processing. + +SQL-on-Hadoop engines support SQL-style queries with full joins. + +## Druid vs Parquet + +Parquet is a column storage format that is designed to work with SQL-on-Hadoop engines. Parquet doesn't have a query execution engine, and instead +relies on external sources to pull data out of it. + +Druid's storage format is highly optimized for linear scans. Although Druid has support for nested data, Parquet's storage format is much +more hierarchical, and is designed more for binary chunking. In theory, this should lead to faster scans in Druid. diff --git a/docs/35.0.0/configuration/extensions.md b/docs/35.0.0/configuration/extensions.md new file mode 100644 index 0000000000..ae8d5987d2 --- /dev/null +++ b/docs/35.0.0/configuration/extensions.md @@ -0,0 +1,178 @@ +--- +id: extensions +title: "Extensions" +--- + + + +Druid implements an extension system that allows for adding functionality at runtime. Extensions +are commonly used to add support for deep storages (like HDFS and S3), metadata stores (like MySQL +and PostgreSQL), new aggregators, new input formats, and so on. + +Production clusters will generally use at least two extensions: one for deep storage and one for a +metadata store. Many clusters will also use additional extensions. + +## Core extensions + +Core extensions are maintained by Druid committers. + +|Name|Description|Docs| +|----|-----------|----| +|druid-avro-extensions|Support for data in Apache Avro data format.|[link](../development/extensions-core/avro.md)| +|druid-azure-extensions|Microsoft Azure deep storage.|[link](../development/extensions-core/azure.md)| +|druid-basic-security|Support for Basic HTTP authentication and role-based access control.|[link](../development/extensions-core/druid-basic-security.md)| +|druid-bloom-filter|Support for providing Bloom filters in Druid queries.|[link](../development/extensions-core/bloom-filter.md)| +|druid-catalog|This extension allows users to configure, update, retrieve, and manage metadata stored in Druid's catalog. |[link](../development/extensions-core/catalog.md)| +|druid-datasketches|Support for approximate counts and set operations with [Apache DataSketches](https://datasketches.apache.org/).|[link](../development/extensions-core/datasketches-extension.md)| +|druid-google-extensions|Google Cloud Storage deep storage.|[link](../development/extensions-core/google.md)| +|druid-hdfs-storage|HDFS deep storage.|[link](../development/extensions-core/hdfs.md)| +|druid-histogram|Approximate histograms and quantiles aggregator.
Deprecated, please use the [DataSketches quantiles aggregator](../development/extensions-core/datasketches-quantiles.md) from the `druid-datasketches` extension instead.|[link](../development/extensions-core/approximate-histograms.md)| +|druid-kafka-extraction-namespace|Apache Kafka-based namespaced lookup. Requires namespace lookup extension.|[link](../querying/kafka-extraction-namespace.md)| +|druid-kafka-indexing-service|Supervised exactly-once Apache Kafka ingestion for the indexing service.|[link](../ingestion/kafka-ingestion.md)| +|druid-kinesis-indexing-service|Supervised exactly-once Kinesis ingestion for the indexing service.|[link](../ingestion/kinesis-ingestion.md)| +|druid-kerberos|Kerberos authentication for druid processes.|[link](../development/extensions-core/druid-kerberos.md)| +|druid-lookups-cached-global|A module for [lookups](../querying/lookups.md) providing a jvm-global eager caching for lookups. It provides JDBC and URI implementations for fetching lookup data.|[link](../querying/lookups-cached-global.md)| +|druid-lookups-cached-single| Per lookup caching module to support the use cases where a lookup need to be isolated from the global pool of lookups |[link](../development/extensions-core/druid-lookups.md)| +|druid-multi-stage-query| Support for the multi-stage query architecture for Apache Druid and the multi-stage query task engine.|[link](../multi-stage-query/index.md)| +|druid-orc-extensions|Support for data in Apache ORC data format.|[link](../development/extensions-core/orc.md)| +|druid-parquet-extensions|Support for data in Apache Parquet data format. Requires druid-avro-extensions to be loaded.|[link](../development/extensions-core/parquet.md)| +|druid-protobuf-extensions| Support for data in Protobuf data format.|[link](../development/extensions-core/protobuf.md)| +|druid-s3-extensions|Interfacing with data in Amazon S3, and using S3 as deep storage.|[link](../development/extensions-core/s3.md)| +|druid-ec2-extensions|Interfacing with AWS EC2 for autoscaling middle managers|UNDOCUMENTED| +|druid-aws-rds-extensions|Support for AWS token based access to AWS RDS DB Cluster.|[link](../development/extensions-core/druid-aws-rds.md)| +|druid-stats|Statistics related module including variance and standard deviation.|[link](../development/extensions-core/stats.md)| +|mysql-metadata-storage|MySQL metadata store.|[link](../development/extensions-core/mysql.md)| +|postgresql-metadata-storage|PostgreSQL metadata store.|[link](../development/extensions-core/postgresql.md)| +|simple-client-sslcontext|Simple SSLContext provider module to be used by Druid's internal HttpClient when talking to other Druid processes over HTTPS.|[link](../development/extensions-core/simple-client-sslcontext.md)| +|druid-pac4j|OpenID Connect authentication for druid processes.|[link](../development/extensions-core/druid-pac4j.md)| +|druid-kubernetes-extensions|Druid cluster deployment on Kubernetes without Zookeeper.|[link](../development/extensions-core/kubernetes.md)| +|druid-kubernetes-overlord-extensions|Support for launching tasks in k8s without Middle Managers|[link](../development/extensions-core/k8s-jobs.md)| + +## Community extensions + +:::info + Community extensions are not maintained by Druid committers, although we accept patches from community members using these extensions. They may not have been as extensively tested as the core extensions. +::: + +A number of community members have contributed their own extensions to Druid that are not packaged with the default Druid tarball. 
+If you'd like to take on maintenance for a community extension, please post on [dev@druid.apache.org](https://lists.apache.org/list.html?dev@druid.apache.org) to let us know! + +All of these community extensions can be downloaded using [pull-deps](../operations/pull-deps.md) while specifying a `-c` coordinate option to pull `org.apache.druid.extensions.contrib:{EXTENSION_NAME}:{DRUID_VERSION}`. + +|Name|Description|Docs| +|----|-----------|----| +|aliyun-oss-extensions|Aliyun OSS deep storage |[link](../development/extensions-contrib/aliyun-oss-extensions.md)| +|ambari-metrics-emitter|Ambari Metrics Emitter |[link](../development/extensions-contrib/ambari-metrics-emitter.md)| +|druid-cassandra-storage|Apache Cassandra deep storage.|[link](../development/extensions-contrib/cassandra.md)| +|druid-cloudfiles-extensions|Rackspace Cloudfiles deep storage.|[link](../development/extensions-contrib/cloudfiles.md)| +|druid-compressed-bigdecimal|Compressed Big Decimal Type | [link](../development/extensions-contrib/compressed-big-decimal.md)| +|druid-ddsketch|Support for DDSketch approximate quantiles based on [DDSketch](https://github.com/datadog/sketches-java) | [link](../development/extensions-contrib/ddsketch-quantiles.md)| +|druid-deltalake-extensions|Support for ingesting Delta Lake tables.|[link](../development/extensions-contrib/delta-lake.md)| +|druid-distinctcount|DistinctCount aggregator|[link](../development/extensions-contrib/distinctcount.md)| +|druid-exact-count-bitmap|Support for exact cardinality counting using Roaring Bitmap over a Long column.|[link](../development/extensions-contrib/druid-exact-count-bitmap.md)| +|druid-iceberg-extensions|Support for ingesting Iceberg tables.|[link](../development/extensions-contrib/iceberg.md)| +|druid-redis-cache|A cache implementation for Druid based on Redis.|[link](../development/extensions-contrib/redis-cache.md)| +|druid-time-min-max|Min/Max aggregator for timestamp.|[link](../development/extensions-contrib/time-min-max.md)| +|sqlserver-metadata-storage|Microsoft SQLServer metadata store.|[link](../development/extensions-contrib/sqlserver.md)| +|graphite-emitter|Graphite metrics emitter|[link](../development/extensions-contrib/graphite.md)| +|statsd-emitter|StatsD metrics emitter|[link](../development/extensions-contrib/statsd.md)| +|kafka-emitter|Kafka metrics emitter|[link](../development/extensions-contrib/kafka-emitter.md)| +|druid-thrift-extensions|Support thrift ingestion |[link](../development/extensions-contrib/thrift.md)| +|druid-opentsdb-emitter|OpenTSDB metrics emitter |[link](../development/extensions-contrib/opentsdb-emitter.md)| +|materialized-view-selection, materialized-view-maintenance|Materialized View|[link](../development/extensions-contrib/materialized-view.md)| +|druid-moving-average-query|Support for [Moving Average](https://en.wikipedia.org/wiki/Moving_average) and other Aggregate [Window Functions](https://en.wikibooks.org/wiki/Structured_Query_Language/Window_functions) in Druid queries.|[link](../development/extensions-contrib/moving-average-query.md)| +|druid-influxdb-emitter|InfluxDB metrics emitter|[link](../development/extensions-contrib/influxdb-emitter.md)| +|druid-momentsketch|Support for approximate quantile queries using the [momentsketch](https://github.com/stanford-futuredata/momentsketch) library|[link](../development/extensions-contrib/momentsketch-quantiles.md)| +|druid-tdigestsketch|Support for approximate sketch aggregators based on 
[T-Digest](https://github.com/tdunning/t-digest)|[link](../development/extensions-contrib/tdigestsketch-quantiles.md)| +|gce-extensions|GCE Extensions|[link](../development/extensions-contrib/gce-extensions.md)| +|prometheus-emitter|Exposes [Druid metrics](../operations/metrics.md) for [Prometheus](https://prometheus.io/)|[link](../development/extensions-contrib/prometheus.md)| +|druid-spectator-histogram|Support for efficient approximate percentile queries|[link](../development/extensions-contrib/spectator-histogram.md)| +|druid-rabbit-indexing-service|Support for creating and managing [RabbitMQ](https://www.rabbitmq.com/) indexing tasks|[link](../development/extensions-contrib/rabbit-stream-ingestion.md)| +|druid-ranger-security|Support for access control through Apache Ranger.|[link](../development/extensions-contrib/druid-ranger-security.md)| + +## Promoting community extensions to core extensions + +Please post on [dev@druid.apache.org](https://lists.apache.org/list.html?dev@druid.apache.org) if you'd like an extension to be promoted to core. +If we see a community extension actively supported by the community, we can promote it to core based on community feedback. + +For information how to create your own extension, please see [here](../development/modules.md). + +## Loading extensions + +### Loading core extensions + +Apache Druid bundles all [core extensions](../configuration/extensions.md#core-extensions) out of the box. +See the [list of extensions](../configuration/extensions.md#core-extensions) for your options. You +can load bundled extensions by adding their names to your common.runtime.properties +`druid.extensions.loadList` property. For example, to load the postgresql-metadata-storage and +druid-hdfs-storage extensions, use the configuration: + +```properties +druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"] +``` + +These extensions are located in the `extensions` directory of the distribution. + +:::info + Druid bundles two sets of configurations: one for the [quickstart](../tutorials/index.md) and + one for a [clustered configuration](../tutorials/cluster.md). Make sure you are updating the correct + `common.runtime.properties` for your setup. +::: + +:::info + Because of licensing, the mysql-metadata-storage extension does not include the required MySQL JDBC driver. For instructions + on how to install this library, see the [MySQL extension page](../development/extensions-core/mysql.md). +::: + +### Loading community extensions + +You can also load community and third-party extensions not already bundled with Druid. To do this, first download the extension and +then install it into your `extensions` directory. You can download extensions from their distributors directly, or +if they are available from Maven, the included [pull-deps](../operations/pull-deps.md) can download them for you. To use *pull-deps*, +specify the full Maven coordinate of the extension in the form `groupId:artifactId:version`. For example, +for the (hypothetical) extension *com.example:druid-example-extension:1.0.0*, run: + +```shell +java \ + -cp "lib/*" \ + -Ddruid.extensions.directory="extensions" \ + -Ddruid.extensions.hadoopDependenciesDir="hadoop-dependencies" \ + org.apache.druid.cli.Main tools pull-deps \ + --no-default-hadoop \ + -c "com.example:druid-example-extension:1.0.0" +``` + +You only have to install the extension once. 
Then, add `"druid-example-extension"` to +`druid.extensions.loadList` in common.runtime.properties to instruct Druid to load the extension. + +:::info + Please make sure all the Extensions related configuration properties listed [here](../configuration/index.md#extensions) are set correctly. +::: + +:::info + The Maven `groupId` for almost every [community extension](../configuration/extensions.md#community-extensions) is `org.apache.druid.extensions.contrib`. The `artifactId` is the name + of the extension, and the version is the latest Druid stable version. +::: + +### Loading extensions from the classpath + +If you add your extension jar to the classpath at runtime, Druid will also load it into the system. This mechanism is relatively easy to reason about, +but it also means that you have to ensure that all dependency jars on the classpath are compatible. That is, Druid makes no provisions while using +this method to maintain class loader isolation so you must make sure that the jars on your classpath are mutually compatible. diff --git a/docs/35.0.0/configuration/human-readable-byte.md b/docs/35.0.0/configuration/human-readable-byte.md new file mode 100644 index 0000000000..0f412b69ab --- /dev/null +++ b/docs/35.0.0/configuration/human-readable-byte.md @@ -0,0 +1,98 @@ +--- +id: human-readable-byte +title: "Human-readable Byte Configuration Reference" +--- + + + + +This page documents configuration properties related to bytes. + +These properties can be configured through 2 ways: +1. a simple number in bytes +2. a number with a unit suffix + +## A number in bytes + +Given that cache size is 3G, there's a configuration as below + +```properties +# 3G bytes = 3_000_000_000 bytes +druid.cache.sizeInBytes=3000000000 +``` + + +## A number with a unit suffix + +When you have to put a large number for some configuration as above, it is easy to make a mistake such as extra or missing 0s. Druid supports a better way, a number with a unit suffix. + +Given a disk of 1T, the configuration can be + +```properties +druid.segmentCache.locations=[{"path":"/segment-cache-00","maxSize":"1t"},{"path":"/segment-cache-01","maxSize":"1200g"}] +``` + +Note: in above example, both `1t` and `1T` are acceptable since it's case-insensitive. +Also, only integers are valid as the number part. For example, you can't replace `1200g` with `1.2t`. + +### Supported Units +In the world of computer, a unit like `K` is ambiguous. It means 1000 or 1024 in different contexts, for more information please see [Here](https://en.wikipedia.org/wiki/Binary_prefix). + +To make it clear, the base of units are defined in Druid as below + +| Unit | Description | Base | +|---|---|---| +| K | Kilo Decimal Byte | 1_000 | +| M | Mega Decimal Byte | 1_000_000 | +| G | Giga Decimal Byte | 1_000_000_000 | +| T | Tera Decimal Byte | 1_000_000_000_000 | +| P | Peta Decimal Byte | 1_000_000_000_000_000 | +| Ki | Kilo Binary Byte | 1024 | +| Mi | Mega Binary Byte | 1024 * 1024 | +| Gi | Giga Binary Byte | 1024 * 1024 * 1024 | +| Ti | Tera Binary Byte | 1024 * 1024 * 1024 * 1024 | +| Pi | Peta Binary Byte | 1024 * 1024 * 1024 * 1024 * 1024 | +| KiB | Kilo Binary Byte | 1024 | +| MiB | Mega Binary Byte | 1024 * 1024 | +| GiB | Giga Binary Byte | 1024 * 1024 * 1024 | +| TiB | Tera Binary Byte | 1024 * 1024 * 1024 * 1024 | +| PiB | Peta Binary Byte | 1024 * 1024 * 1024 * 1024 * 1024 | + +Unit is case-insensitive. `k`, `kib`, `ki`, `KiB`, `Ki`, `kiB` are all acceptable. 
+ +Here are some examples + +```properties +# 1G bytes = 1_000_000_000 bytes +druid.cache.sizeInBytes=1g +``` + +```properties +# 256MiB bytes = 256 * 1024 * 1024 bytes +druid.cache.sizeInBytes=256MiB +``` + +```properties +# 256Mi = 256MiB = 256 * 1024 * 1024 bytes +druid.cache.sizeInBytes=256Mi +``` + + + diff --git a/docs/35.0.0/configuration/index.md b/docs/35.0.0/configuration/index.md new file mode 100644 index 0000000000..8aa5e81846 --- /dev/null +++ b/docs/35.0.0/configuration/index.md @@ -0,0 +1,2320 @@ +--- +id: index +title: "Configuration reference" +--- + + + +This page documents all of the configuration properties for each Druid service type. + +## Recommended configuration file organization + +A recommended way of organizing Druid configuration files can be seen in the `conf` directory in the Druid package root, shown below: + +```sh +$ ls -R conf +druid + +conf/druid: +_common broker coordinator historical middleManager overlord + +conf/druid/_common: +common.runtime.properties log4j2.xml + +conf/druid/broker: +jvm.config runtime.properties + +conf/druid/coordinator: +jvm.config runtime.properties + +conf/druid/historical: +jvm.config runtime.properties + +conf/druid/middleManager: +jvm.config runtime.properties + +conf/druid/overlord: +jvm.config runtime.properties +``` + +Each directory has a `runtime.properties` file containing configuration properties for the specific Druid service corresponding to the directory, such as `historical`. + +The `jvm.config` files contain JVM flags such as heap sizing properties for each service. + +Common properties shared by all services are placed in `_common/common.runtime.properties`. + +## Configuration interpolation + +Configuration values can be interpolated from System Properties, Environment Variables, or local files. Below is an example of how this can be used: + +```properties +druid.metadata.storage.type=${env:METADATA_STORAGE_TYPE} +druid.processing.tmpDir=${sys:java.io.tmpdir} +druid.segmentCache.locations=${file:UTF-8:/config/segment-cache-def.json} +``` + +Interpolation is also recursive so you can do: + +```properties +druid.segmentCache.locations=${file:UTF-8:${env:SEGMENT_DEF_LOCATION}} +``` + +If the property is not set, an exception will be thrown on startup, but a default can be provided if desired. Setting a default value will not work with file interpolation as an exception will be thrown if the file does not exist. + +```properties +druid.metadata.storage.type=${env:METADATA_STORAGE_TYPE:-mysql} +druid.processing.tmpDir=${sys:java.io.tmpdir:-/tmp} +``` + +If you need to set a variable that is wrapped by `${...}` but do not want it to be interpolated, you can escape it by adding another `$`. For example: + +```properties +config.name=$${value} +``` + +## Common configurations + +The properties under this section are common configurations that should be shared across all Druid services in a cluster. + +### JVM configuration best practices + +There are four JVM parameters that we set on all of our services: + +* `-Duser.timezone=UTC`: This sets the default timezone of the JVM to UTC. We always set this and do not test with other default timezones, so local timezones might work, but they also might uncover weird and interesting bugs. To issue queries in a non-UTC timezone, see [query granularities](../querying/granularities.md#period-granularities) +* `-Dfile.encoding=UTF-8` This is similar to timezone, we test assuming UTF-8. Local encodings might work, but they also might result in weird and interesting bugs. 
+* `-Djava.io.tmpdir=` Various parts of Druid use temporary files to interact with the file system. These files can become quite large. This means that systems that have small `/tmp` directories can cause problems for Druid. Therefore, set the JVM tmp directory to a location with ample space. + + Also consider the following when configuring the JVM tmp directory: + * The temp directory should not be volatile tmpfs. + * This directory should also have good read and write speed. + * Avoid NFS mount. + * The `org.apache.druid.java.util.metrics.SysMonitor` requires execute privileges on files in `java.io.tmpdir`. If you are using the system monitor, do not set `java.io.tmpdir` to `noexec`. +* `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager` This allows log4j2 to handle logs for non-log4j2 components (like jetty) which use standard java logging. + +### Extensions + +Many of Druid's external dependencies can be plugged in as modules. Extensions can be provided using the following configs: + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.extensions.directory`|The root extension directory where user can put extensions related files. Druid will load extensions stored under this directory.|`extensions` (This is a relative path to Druid's working directory)| +|`druid.extensions.hadoopDependenciesDir`|The root Hadoop dependencies directory where user can put Hadoop related dependencies files. Druid will load the dependencies based on the Hadoop coordinate specified in the Hadoop index task.|`hadoop-dependencies` (This is a relative path to Druid's working directory| +|`druid.extensions.loadList`|A JSON array of extensions to load from extension directories by Druid. If it is not specified, its value will be `null` and Druid will load all the extensions under `druid.extensions.directory`. If its value is empty list `[]`, then no extensions will be loaded at all. It is also allowed to specify absolute path of other custom extensions not stored in the common extensions directory.|null| +|`druid.extensions.searchCurrentClassloader`|This is a boolean flag that determines if Druid will search the main classloader for extensions. It defaults to true but can be turned off if you have reason to not automatically add all modules on the classpath.|true| +|`druid.extensions.useExtensionClassloaderFirst`|This is a boolean flag that determines if Druid extensions should prefer loading classes from their own jars rather than jars bundled with Druid. If false, extensions must be compatible with classes provided by any jars bundled with Druid. If true, extensions may depend on conflicting versions.|false| +|`druid.extensions.hadoopContainerDruidClasspath`|Hadoop Indexing launches Hadoop jobs and this configuration provides way to explicitly set the user classpath for the Hadoop job. By default, this is computed automatically by Druid based on the Druid service classpath and set of extensions. However, sometimes you might want to be explicit to resolve dependency conflicts between Druid and Hadoop.|null| +|`druid.extensions.addExtensionsToHadoopContainer`|Only applicable if `druid.extensions.hadoopContainerDruidClasspath` is provided. If set to true, then extensions specified in the loadList are added to Hadoop container classpath. 
Note that when `druid.extensions.hadoopContainerDruidClasspath` is not provided then extensions are always added to Hadoop container classpath.|false| + +### Modules + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.modules.excludeList`|A JSON array of canonical class names (e.g., `"org.apache.druid.somepackage.SomeModule"`) of module classes which shouldn't be loaded, even if they are found in extensions specified by `druid.extensions.loadList`, or in the list of core modules specified to be loaded on a particular Druid service type. Useful when some useful extension contains some module, which shouldn't be loaded on some Druid service type because some dependencies of that module couldn't be satisfied.|[]| + +### ZooKeeper + +We recommend just setting the base ZK path and the ZK service host, but all ZK paths that Druid uses can be overwritten to absolute paths. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.zk.paths.base`|Base ZooKeeper path.|`/druid`| +|`druid.zk.service.host`|The ZooKeeper hosts to connect to. This is a REQUIRED property and therefore a host address must be supplied.|none| +|`druid.zk.service.user`|The username to authenticate with ZooKeeper. This is an optional property.|none| +|`druid.zk.service.pwd`|The [Password Provider](../operations/password-provider.md) or the string password to authenticate with ZooKeeper. This is an optional property.|none| +|`druid.zk.service.authScheme`|digest is the only authentication scheme supported. |digest| + +#### ZooKeeper behavior + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.zk.service.sessionTimeoutMs`|ZooKeeper session timeout, in milliseconds.|`30000`| +|`druid.zk.service.connectionTimeoutMs`|ZooKeeper connection timeout, in milliseconds.|`15000`| +|`druid.zk.service.compress`|Boolean flag for whether or not created Znodes should be compressed.|`true`| +|`druid.zk.service.acl`|Boolean flag for whether or not to enable ACL security for ZooKeeper. If ACL is enabled, zNode creators will have all permissions.|`false`| +|`druid.zk.service.pathChildrenCacheStrategy`|Dictates the underlying caching strategy for service announcements. Set true to let announcers to use Apache Curator's PathChildrenCache strategy, otherwise NodeCache strategy. Consider using NodeCache strategy when you are dealing with huge number of ZooKeeper watches in your cluster.|`true`| + +#### Path configuration + +Druid interacts with ZooKeeper through a set of standard path configurations. We recommend just setting the base ZooKeeper path, but all ZooKeeper paths that Druid uses can be overwritten to absolute paths. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.zk.paths.base`|Base ZooKeeper path.|`/druid`| +|`druid.zk.paths.propertiesPath`|ZooKeeper properties path.|`${druid.zk.paths.base}/properties`| +|`druid.zk.paths.announcementsPath`|Druid service announcement path.|`${druid.zk.paths.base}/announcements`| +|`druid.zk.paths.liveSegmentsPath`|Current path for where Druid services announce their segments.|`${druid.zk.paths.base}/segments`| +|`druid.zk.paths.coordinatorPath`|Used by the Coordinator for leader election.|`${druid.zk.paths.base}/coordinator`| + +The indexing service also uses its own set of paths. These configs can be included in the common configuration. 
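For example, following the recommendation to set only the ZooKeeper host and base paths, a `common.runtime.properties` snippet might look like the following sketch (the ZooKeeper addresses are placeholders):

```properties
# ZooKeeper ensemble to connect to (placeholder hosts)
druid.zk.service.host=zk1.example.com:2181,zk2.example.com:2181
# Base path for Druid ZooKeeper nodes; other druid.zk.paths.* values default to children of this base
druid.zk.paths.base=/druid
# Base path for indexing-service nodes; defaults to ${druid.zk.paths.base}/indexer if unset
druid.zk.paths.indexer.base=/druid/indexer
```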

|Property|Description|Default|
|--------|-----------|-------|
|`druid.zk.paths.indexer.base`|Base ZooKeeper path for indexing service nodes.|`${druid.zk.paths.base}/indexer`|
|`druid.zk.paths.indexer.announcementsPath`|Middle Managers announce themselves here.|`${druid.zk.paths.indexer.base}/announcements`|
|`druid.zk.paths.indexer.tasksPath`|Used to assign tasks to Middle Managers.|`${druid.zk.paths.indexer.base}/tasks`|
|`druid.zk.paths.indexer.statusPath`|Parent path for announcement of task statuses.|`${druid.zk.paths.indexer.base}/status`|

If `druid.zk.paths.base` and `druid.zk.paths.indexer.base` are both set, and none of the other `druid.zk.paths.*` or `druid.zk.paths.indexer.*` values are set, then the other properties are evaluated relative to their respective `base`.
For example, if `druid.zk.paths.base` is set to `/druid1` and `druid.zk.paths.indexer.base` is set to `/druid2`, then `druid.zk.paths.announcementsPath` defaults to `/druid1/announcements` while `druid.zk.paths.indexer.announcementsPath` defaults to `/druid2/announcements`.

The following path is used for service discovery. It is **not** affected by `druid.zk.paths.base` and **must** be specified separately.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.discovery.curator.path`|Services announce themselves under this ZooKeeper path.|`/druid/discovery`|

### TLS

#### General configuration

|Property|Description|Default|
|--------|-----------|-------|
|`druid.enablePlaintextPort`|Enable/Disable HTTP connector.|`true`|
|`druid.enableTlsPort`|Enable/Disable HTTPS connector.|`false`|

Although it is not recommended, you can enable both the HTTP and HTTPS connectors at the same time. The respective ports are configurable using the `druid.plaintextPort`
and `druid.tlsPort` properties on each service. See the `Configuration` section of individual services for the valid and default values of these ports.

#### Jetty server TLS configuration

Druid uses Jetty as an embedded web server. To learn more about TLS/SSL, certificates, and related concepts in Jetty, including explanations of the configuration settings below, see "Configuring SSL/TLS KeyStores" in the [Jetty Operations Guide](https://www.eclipse.org/jetty/documentation.php).

For information about TLS/SSL support in Java in general, see the [Java Secure Socket Extension (JSSE) Reference Guide](https://docs.oracle.com/en/java/javase/17/security/java-secure-socket-extension-jsse-reference-guide.html).
The [Java Cryptography Architecture Standard Algorithm Name Documentation for JDK 17](https://docs.oracle.com/en/java/javase/17/docs/specs/security/standard-names.html) lists all possible
values for the following properties, among others provided by the Java implementation.

|Property|Description|Default|Required|
|--------|-----------|-------|--------|
|`druid.server.https.keyStorePath`|The file path or URL of the TLS/SSL KeyStore.|none|yes|
|`druid.server.https.keyStoreType`|The type of the KeyStore.|none|yes|
|`druid.server.https.certAlias`|Alias of TLS/SSL certificate for the connector.|none|yes|
|`druid.server.https.keyStorePassword`|The [Password Provider](../operations/password-provider.md) or String password for the KeyStore.|none|yes|

The following table contains optional advanced configuration options. Use them with caution.
+ +|Property|Description|Default|Required| +|--------|-----------|-------|--------| +|`druid.server.https.keyManagerFactoryAlgorithm`|Algorithm to use for creating KeyManager, more details [here](https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/JSSERefGuide.html#KeyManager).|`javax.net.ssl.KeyManagerFactory.getDefaultAlgorithm()`|no| +|`druid.server.https.keyManagerPassword`|The [Password Provider](../operations/password-provider.md) or String password for the Key Manager.|none|no| +|`druid.server.https.includeCipherSuites`|List of cipher suite names to include. You can either use the exact cipher suite name or a regular expression.|Jetty's default include cipher list|no| +|`druid.server.https.excludeCipherSuites`|List of cipher suite names to exclude. You can either use the exact cipher suite name or a regular expression.|Jetty's default exclude cipher list|no| +|`druid.server.https.includeProtocols`|List of exact protocols names to include.|Jetty's default include protocol list|no| +|`druid.server.https.excludeProtocols`|List of exact protocols names to exclude.|Jetty's default exclude protocol list|no| + +#### Internal client TLS configuration (requires `simple-client-sslcontext` extension) + +These properties apply to the SSLContext that will be provided to the internal HTTP client that Druid services use to communicate with each other. These properties require the `simple-client-sslcontext` extension to be loaded. Without it, Druid services will be unable to communicate with each other when TLS is enabled. + +|Property|Description|Default|Required| +|--------|-----------|-------|--------| +|`druid.client.https.protocol`|SSL protocol to use.|`TLSv1.2`|no| +|`druid.client.https.trustStoreType`|The type of the key store where trusted root certificates are stored.|`java.security.KeyStore.getDefaultType()`|no| +|`druid.client.https.trustStorePath`|The file path or URL of the TLS/SSL Key store where trusted root certificates are stored.|none|yes| +|`druid.client.https.trustStoreAlgorithm`|Algorithm to be used by TrustManager to validate certificate chains|`javax.net.ssl.TrustManagerFactory.getDefaultAlgorithm()`|no| +|`druid.client.https.trustStorePassword`|The [Password Provider](../operations/password-provider.md) or String password for the Trust Store.|none|yes| + +This [document](https://docs.oracle.com/en/java/javase/17/docs/specs/security/standard-names.html) lists all the possible +values for the above mentioned configs among others provided by Java implementation. + +### Authentication and authorization + +|Property|Type|Description|Default|Required| +|--------|-----------|--------|--------|--------| +|`druid.auth.authenticatorChain`|JSON List of Strings|List of Authenticator type names|["allowAll"]|no| +|`druid.escalator.type`|String|Type of the Escalator that should be used for internal Druid communications. This Escalator must use an authentication scheme that is supported by an Authenticator in `druid.auth.authenticatorChain`.|`noop`|no| +|`druid.auth.authorizers`|JSON List of Strings|List of Authorizer type names |["allowAll"]|no| +|`druid.auth.unsecuredPaths`| List of Strings|List of paths for which security checks will not be performed. All requests to these paths will be allowed.|[]|no| +|`druid.auth.allowUnauthenticatedHttpOptions`|Boolean|If true, skip authentication checks for HTTP OPTIONS requests. This is needed for certain use cases, such as supporting CORS pre-flight requests. 
Note that disabling authentication checks for OPTIONS requests will allow unauthenticated users to determine what Druid endpoints are valid (by checking if the OPTIONS request returns a 200 instead of 404), so enabling this option may reveal information about server configuration, including information about what extensions are loaded (if those extensions add endpoints).|false|no| + +For more information, please see [Authentication and Authorization](../operations/auth.md). + +For configuration options for specific auth extensions, please refer to the extension documentation. + +### Startup logging + +All services can log debugging information on startup. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.startup.logging.logProperties`|Log all properties on startup (from common.runtime.properties, runtime.properties, and the JVM command line).|false| +|`druid.startup.logging.maskProperties`|Masks sensitive properties (passwords, for example) containing theses words.|["password"]| + +Note that some sensitive information may be logged if these settings are enabled. + +### Request logging + +All services that can serve queries can also log the query requests they see. Broker services can additionally log the SQL requests (both from HTTP and JDBC) they see. +For an example of setting up request logging, see [Request logging](../operations/request-logging.md). + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.request.logging.type`|How to log every query request. Choices: `noop`, [`file`](#file-request-logging), [`emitter`](#emitter-request-logging), [`slf4j`](#slf4j-request-logging), [`filtered`](#filtered-request-logging), [`composing`](#composing-request-logging), [`switching`](#switching-request-logging)|`noop` (request logging disabled by default)| + +To enable sending all the HTTP requests to a log, set `org.apache.druid.jetty.RequestLog` to the `DEBUG` level. See [Logging](../configuration/logging.md) for more information. + +#### File request logging + +The `file` request logger stores daily request logs on disk. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.request.logging.dir`| Historical, Realtime, and Broker services maintain request logs of all of the requests they get (interaction is via POST, so normal request logs don’t generally capture information about the actual query), this specifies the directory to store the request logs in. | none| +|`druid.request.logging.filePattern`| [Joda datetime format](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html) for each file.| "yyyy-MM-dd'.log'"| +|`druid.request.logging.durationToRetain`| Period to retain the request logs on disk. The period should be at least as long as roll period.| none| +|`druid.request.logging.rollPeriod`| Defines the log rotation period for request logs. The period should be at least `PT1H`. For periods smaller than 1 day, it is recommended to use `"yyyy-MM-dd-HH'.log'"` as the file pattern.| P1D| + +The format of request logs is TSV, one line per requests, with five fields: timestamp, remote\_addr, native\_query, query\_context, sql\_query. + +For native JSON request, the `sql_query` field is empty. 
For example: + +```txt +2019-01-14T10:00:00.000Z 127.0.0.1 {"queryType":"topN","dataSource":{"type":"table","name":"wikiticker"},"virtualColumns":[],"dimension":{"type":"LegacyDimensionSpec","dimension":"page","outputName":"page","outputType":"STRING"},"metric":{"type":"LegacyTopNMetricSpec","metric":"count"},"threshold":10,"intervals":{"type":"LegacySegmentSpec","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"count"}],"postAggregations":[],"context":{"queryId":"74c2d540-d700-4ebd-b4a9-3d02397976aa"},"descending":false} {"query/time":100,"query/bytes":800,"success":true,"identity":"user1"} +``` + +For SQL query request, the `native_query` field is empty. For example: + +```txt +2019-01-14T10:00:00.000Z 127.0.0.1 {"sqlQuery/time":100, "sqlQuery/planningTimeMs":10, "sqlQuery/bytes":600, "success":true, "identity":"user1"} {"query":"SELECT page, COUNT(*) AS Edits FROM wikiticker WHERE TIME_IN_INTERVAL(\"__time\", '2015-09-12/2015-09-13') GROUP BY page ORDER BY Edits DESC LIMIT 10","context":{"sqlQueryId":"c9d035a0-5ffd-4a79-a865-3ffdadbb5fdd","nativeQueryIds":"[490978e4-f5c7-4cf6-b174-346e63cf8863]"}} +``` + +#### Emitter request logging + +The `emitter` request logger emits every request to the external location specified in the [emitter](#metrics-monitors) configuration. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.request.logging.feed`|Feed name for requests.|none| + +#### SLF4J request logging + +The `slf4j` request logger logs every request using SLF4J. It serializes native queries into JSON in the log message regardless of the SLF4J format specification. Requests are logged under the class `org.apache.druid.server.log.LoggingRequestLogger`. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.request.logging.setMDC`|If you want to set MDC entries within the log entry, set this value to `true`. Your logging system must be configured to support MDC in order to format this data.|false| +|`druid.request.logging.setContextMDC`|Set to "true" to add the Druid query `context` to the MDC entries. Only applies when `setMDC` is `true`.|false| + +For a native query, the following MDC fields are populated when `setMDC` is `true`: + +|MDC field|Description| +|---------|-----------| +|`queryId` |The query ID| +|`sqlQueryId`|The SQL query ID if this query is part of a SQL request| +|`dataSource`|The datasource the query was against| +|`queryType` |The type of the query| +|`hasFilters`|If the query has any filters| +|`remoteAddr`|The remote address of the requesting client| +|`duration` |The duration of the query interval| +|`resultOrdering`|The ordering of results| +|`descending`|If the query is a descending query| + +#### Filtered request logging + +The `filtered` request logger filters requests based on the query type or how long a query takes to complete. +For native queries, the logger only logs requests when the `query/time` metric exceeds the threshold provided in `queryTimeThresholdMs`. +For SQL queries, it only logs requests when the `sqlQuery/time` metric exceeds threshold provided in `sqlQueryTimeThresholdMs`. +See [Metrics](../operations/metrics.md) for more details on query metrics. + +Requests that meet the threshold are logged using the request logger type set in `druid.request.logging.delegate.type`. 
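For example, a minimal sketch of a filtered request logger in `runtime.properties` might look like the following, which logs only native queries slower than 5 seconds and SQL queries slower than 10 seconds, and delegates the actual logging to the `slf4j` request logger:

```properties
druid.request.logging.type=filtered
# Log native queries only when the query/time metric exceeds 5000 ms
druid.request.logging.queryTimeThresholdMs=5000
# Log SQL queries only when the sqlQuery/time metric exceeds 10000 ms
druid.request.logging.sqlQueryTimeThresholdMs=10000
# Delegate request logger that writes the requests that pass the filter
druid.request.logging.delegate.type=slf4j
```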
+ +|Property|Description|Default| +|--------|-----------|-------| +|`druid.request.logging.queryTimeThresholdMs`|Threshold value for the `query/time` metric in milliseconds.|0, i.e., no filtering| +|`druid.request.logging.sqlQueryTimeThresholdMs`|Threshold value for the `sqlQuery/time` metric in milliseconds.|0, i.e., no filtering| +|`druid.request.logging.mutedQueryTypes` | Query requests of these types are not logged. Query types are defined as string objects corresponding to the "queryType" value for the specified query in the Druid's [native JSON query API](../querying/querying.md). Misspelled query types will be ignored. Example to ignore scan and timeBoundary queries: `["scan", "timeBoundary"]`| []| +|`druid.request.logging.delegate.type`|Type of delegate request logger to log requests.|none| + +#### Composing request logging + +The `composing` request logger emits request logs to multiple request loggers. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.request.logging.loggerProviders`|List of request loggers for emitting request logs.|none| + +#### Switching request logging + +The `switching` request logger routes native query request logs to one request logger and SQL query request logs to another request logger. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.request.logging.nativeQueryLogger`|Request logger for emitting native query request logs.|none| +|`druid.request.logging.sqlQueryLogger`|Request logger for emitting SQL query request logs.|none| + +### Audit logging + +Coordinator and Overlord log changes to lookups, segment load/drop rules, and dynamic configuration changes for auditing. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.audit.manager.type`|Type of audit manager used for handling audited events. Audited events are logged when set to `log` or persisted in metadata store when set to `sql`.|sql| +|`druid.audit.manager.logLevel`|Log level of audit events with possible values DEBUG, INFO, WARN. This property is used only when `druid.audit.manager.type` is set to `log`.|INFO| +|`druid.audit.manager.auditHistoryMillis`|Default duration for querying audit history.|1 week| +|`druid.audit.manager.includePayloadAsDimensionInMetric`|Boolean flag on whether to add `payload` column in service metric.|false| +|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload to store in Druid's metadata store audit table. If the size of audit payload exceeds this value, the audit log would be stored with a message indicating that the payload was omitted instead. Setting `maxPayloadSizeBytes` to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size. Setting to any negative number other than `-1` is invalid. Human-readable format is supported, see [here](human-readable-byte.md). |-1| +|`druid.audit.manager.skipNullField`|If true, the audit payload stored in metadata store will exclude any field with null value. |false| + +### Metadata storage + +These properties specify the JDBC connection and other configuration around the metadata storage. The only services that connect to the metadata storage with these properties are the [Coordinator](../design/coordinator.md) and [Overlord](../design/overlord.md). + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.metadata.storage.type`|The type of metadata storage to use. 
One of `mysql`, `postgresql`, or `derby`.|`derby`| +|`druid.metadata.storage.connector.connectURI`|The JDBC URI for the database to connect to|none| +|`druid.metadata.storage.connector.user`|The username to connect with.|none| +|`druid.metadata.storage.connector.password`|The [Password Provider](../operations/password-provider.md) or String password used to connect with.|none| +|`druid.metadata.storage.connector.createTables`|If Druid requires a table and it doesn't exist, create it?|true| +|`druid.metadata.storage.tables.base`|The base name for tables.|`druid`| +|`druid.metadata.storage.tables.dataSource`|The table to use to look for datasources created by [Kafka Indexing Service](../ingestion/kafka-ingestion.md).|`druid_dataSource`| +|`druid.metadata.storage.tables.pendingSegments`|The table to use to look for pending segments.|`druid_pendingSegments`| +|`druid.metadata.storage.tables.segments`|The table to use to look for segments.|`druid_segments`| +|`druid.metadata.storage.tables.rules`|The table to use to look for segment load/drop rules.|`druid_rules`| +|`druid.metadata.storage.tables.config`|The table to use to look for configs.|`druid_config`| +|`druid.metadata.storage.tables.tasks`|Used by the indexing service to store tasks.|`druid_tasks`| +|`druid.metadata.storage.tables.taskLog`|Used by the indexing service to store task logs.|`druid_tasklogs`| +|`druid.metadata.storage.tables.taskLock`|Used by the indexing service to store task locks.|`druid_tasklocks`| +|`druid.metadata.storage.tables.supervisors`|Used by the indexing service to store supervisor configurations.|`druid_supervisors`| +|`druid.metadata.storage.tables.audit`|The table to use for audit history of configuration changes, such as Coordinator rules.|`druid_audit`| +|`druid.metadata.storage.tables.useShortIndexNames`|Whether to use SHA-based unique index names to ensure all indices are created.|`false`| + +### Deep storage + +The configurations concern how to push and pull [Segments](../design/segments.md) from deep storage. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.storage.type`|The type of deep storage to use. One of `local`, `noop`, `s3`, `hdfs`, `c*`.|local| + +#### Local deep storage + +Local deep storage uses the local filesystem. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.storage.storageDirectory`|Directory on disk to use as deep storage.|`/tmp/druid/localStorage`| + +#### Noop deep storage + +This deep storage doesn't do anything. There are no configs. + +#### S3 deep storage + +This deep storage is used to interface with Amazon's S3. Note that the `druid-s3-extensions` extension must be loaded. +The below table shows some important configurations for S3. See [S3 Deep Storage](../development/extensions-core/s3.md) for full configurations. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.storage.bucket`|S3 bucket name.|none| +|`druid.storage.baseKey`|S3 object key prefix for storage.|none| +|`druid.storage.disableAcl`|Boolean flag for ACL. If this is set to `false`, the full control would be granted to the bucket owner. This may require to set additional permissions. See [S3 permissions settings](../development/extensions-core/s3.md#s3-permissions-settings).|false| +|`druid.storage.archiveBucket`|S3 bucket name for archiving when running the _archive task_.|none| +|`druid.storage.archiveBaseKey`|S3 object key prefix for archiving.|none| +|`druid.storage.sse.type`|Server-side encryption type. 
Should be one of `s3`, `kms`, and `custom`. See the below [Server-side encryption section](../development/extensions-core/s3.md#server-side-encryption) for more details.|None| +|`druid.storage.sse.kms.keyId`|AWS KMS key ID. This is used only when `druid.storage.sse.type` is `kms` and can be empty to use the default key ID.|None| +|`druid.storage.sse.custom.base64EncodedKey`|Base64-encoded key. Should be specified if `druid.storage.sse.type` is `custom`.|None| +|`druid.storage.useS3aSchema`|If true, use the "s3a" filesystem when using Hadoop-based ingestion. If false, the "s3n" filesystem will be used. Only affects Hadoop-based ingestion.|false| + +#### HDFS deep storage + +This deep storage is used to interface with HDFS. You must load the `druid-hdfs-storage` extension. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.storage.storageDirectory`|HDFS directory to use as deep storage.|none| + +#### Cassandra deep storage + +This deep storage is used to interface with Cassandra. You must load the `druid-cassandra-storage` extension. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.storage.host`|Cassandra host.|none| +|`druid.storage.keyspace`|Cassandra key space.|none| + +#### Centralized datasource schema (Experimental) + +This is an [experimental feature](../development/experimental.md) to improve datasource schema management by persisting segment schemas to the metadata store and caching them on the Coordinator. +Traditionally, Brokers issue segment metadata queries to data nodes and tasks to fetch the schemas of all available segments. +Each Broker then individually builds the schema of a datasource by combining the schemas of all the segments of that datasource. +This mechanism is redundant and prone to errors as there is no single source of truth for schemas. + +Centralized schema management improves upon this design as follows: +- Tasks publish segment schema along with segment metadata to the database. +- Tasks announce schema for realtime segments periodically to the Coordinator. +- Coordinator caches segment schemas and builds a combined schema for each datasource. +- Broker poll the datasource schema cached on the Coordinator rather than building it on their own. +- Brokers still retain the ability to build a datasource schema if they are unable to fetch it from the Coordinator. + +|Property|Description|Default|Required| +|--------|-----------|-------|--------| +|`druid.centralizedDatasourceSchema.enabled`|Boolean flag for enabling datasource schema building and caching on the Coordinator. This property should be specified in the common runtime properties.|false|No.| +|`druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled`| This config should be set when CentralizedDatasourceSchema feature is enabled. This should be specified in the Middle Manager runtime properties.|false|No.| + +If you enable this feature, you can query datasources that are only stored in deep storage and are not loaded on a Historical. For more information, see [Query from deep storage](../querying/query-from-deep-storage.md). + +For stale schema cleanup configs, refer to properties with the prefix `druid.coordinator.kill.segmentSchema` in [Metadata Management](#metadata-management). + +### Ingestion security configuration + +#### HDFS input source + +You can set the following property to specify permissible protocols for +the [HDFS input source](../ingestion/input-sources.md#hdfs-input-source). 
+ +|Property|Possible values|Description|Default| +|--------|---------------|-----------|-------| +|`druid.ingestion.hdfs.allowedProtocols`|List of protocols|Allowed protocols for the HDFS input source.|`["hdfs"]`| + +#### HTTP input source + +You can set the following property to specify permissible protocols for +the [HTTP input source](../ingestion/input-sources.md#http-input-source). + +|Property| Possible values | Description |Default| +|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|-------| +|`druid.ingestion.http.allowedProtocols`| List of protocols | Allowed protocols for the HTTP input source. |`["http", "https"]`| +|`druid.ingestion.http.allowedHeaders`| A list of permitted request headers for the HTTP input source. By default, the list is empty, which means no headers are allowed in the ingestion specification. |`[]`| + +### External data access security configuration + +#### JDBC connections to external databases + +You can use the following properties to specify permissible JDBC options for: + +* [SQL input source](../ingestion/input-sources.md#sql-input-source) +* [globally cached JDBC lookups](../querying/lookups-cached-global.md#jdbc-lookup) +* [JDBC Data Fetcher for per-lookup caching](../development/extensions-core/druid-lookups.md#data-fetcher-layer). + +These properties do not apply to metadata storage connections. + +|Property|Possible values| Description |Default| +|--------|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------| +|`druid.access.jdbc.enforceAllowedProperties`|Boolean| When true, Druid applies `druid.access.jdbc.allowedProperties` to JDBC connections starting with `jdbc:postgresql:`, `jdbc:mysql:`, or `jdbc:mariadb:`. When false, Druid allows any kind of JDBC connections without JDBC property validation. This config is for backward compatibility especially during upgrades since enforcing allow list can break existing ingestion jobs or lookups based on JDBC. This config is deprecated and will be removed in a future release. |true| +|`druid.access.jdbc.allowedProperties`|List of JDBC properties| Defines a list of allowed JDBC properties. Druid always enforces the list for all JDBC connections starting with `jdbc:postgresql:`, `jdbc:mysql:`, and `jdbc:mariadb:` if `druid.access.jdbc.enforceAllowedProperties` is set to true.

This option is tested against MySQL connector 8.2.0, MariaDB connector 2.7.4, and PostgreSQL connector 42.2.14. Other connector versions might not work. |`["useSSL", "requireSSL", "ssl", "sslmode"]`| +|`druid.access.jdbc.allowUnknownJdbcUrlFormat`|Boolean| When false, Druid only accepts JDBC connections starting with `jdbc:postgresql:` or `jdbc:mysql:`. When true, Druid allows JDBC connections to any kind of database, but only enforces `druid.access.jdbc.allowedProperties` for PostgreSQL and MySQL/MariaDB. |true| + +### Task logging + +You can use the `druid.indexer` configuration to set a [long-term storage](#log-long-term-storage) location for task log files, and to set a [retention policy](#log-retention-policy). + +For more information about ingestion tasks and the services of generating logs, see the [task reference](../ingestion/tasks.md). + +#### Log long-term storage + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.indexer.logs.type`|Where to store task logs. `noop`, [`s3`](#s3-task-logs), [`azure`](#azure-blob-store-task-logs), [`google`](#google-cloud-storage-task-logs), [`hdfs`](#hdfs-task-logs), [`file`](#file-task-logs) |`file`| + +##### File task logs + +Store task logs in the local filesystem. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.indexer.logs.directory`|Local filesystem path.|log| + +##### S3 task logs + +Store task logs in S3. Note that the `druid-s3-extensions` extension must be loaded. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.indexer.logs.s3Bucket`|S3 bucket name.|none| +|`druid.indexer.logs.s3Prefix`|S3 key prefix.|none| +|`druid.indexer.logs.disableAcl`|Boolean flag for ACL. If this is set to `false`, the full control would be granted to the bucket owner. If the task logs bucket is the same as the deep storage (S3) bucket, then the value of this property will need to be set to true if druid.storage.disableAcl has been set to true.|false| + +##### Azure Blob Store task logs + +Store task logs in Azure Blob Store. To enable this feature, load the `druid-azure-extensions` extension, and configure deep storage for Azure. Druid uses the same authentication method configured for deep storage and stores task logs in the same storage account (set in `druid.azure.account`). + +| Property | Description | Default | +|---|---|---| +| `druid.indexer.logs.container` | The Azure Blob Store container to write logs to. | Must be set. | +| `druid.indexer.logs.prefix` | The path to prepend to logs. | Must be set. | + +##### Google Cloud Storage task logs + +Store task logs in Google Cloud Storage. + +Note: The `druid-google-extensions` extension must be loaded, and this uses the same storage settings as the deep storage module for google. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.indexer.logs.bucket`|The Google Cloud Storage bucket to write logs to|none| +|`druid.indexer.logs.prefix`|The path to prepend to logs|none| + +##### HDFS task logs + +Store task logs in HDFS. Note that the `druid-hdfs-storage` extension must be loaded. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.indexer.logs.directory`|The directory to store logs.|none| + +#### Log retention policy + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.indexer.logs.kill.enabled`|Boolean value for whether to enable deletion of old task logs. 
If set to true, the Overlord periodically submits kill tasks at the interval set by `druid.indexer.logs.kill.delay`. These tasks delete task logs from the log directory, as well as the corresponding entries in the tasks and task logs tables in metadata storage, except for tasks created in the last `druid.indexer.logs.kill.durationToRetain` period. |false|
|`druid.indexer.logs.kill.durationToRetain`| Required if kill is enabled. Duration, in milliseconds, for which task logs and task-related metadata entries are retained. Logs and entries created within the last `durationToRetain` milliseconds are not deleted. |None|
|`druid.indexer.logs.kill.initialDelay`| Optional. Number of milliseconds after Overlord startup before the first automatic kill run. |random value less than 300000 (5 mins)|
|`druid.indexer.logs.kill.delay`|Optional. Number of milliseconds between successive automatic kill runs. |21600000 (6 hours)|

### API error response

You can configure Druid API error responses to hide internal information such as the Druid class name, stack trace, thread name, servlet name, code, line/column number, host, or IP address.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.server.http.showDetailedJettyErrors`|When set to true, any error from the Jetty layer / Jetty filter includes the following fields in the JSON response: `servlet`, `message`, `url`, `status`, and `cause`, if it exists. When set to false, the JSON response only includes `message`, `url`, and `status`. The field values remain unchanged.|true|
|`druid.server.http.errorResponseTransform.strategy`|Error response transform strategy. The strategy controls how Druid transforms error responses from Druid services. When unset or set to `none`, Druid leaves error responses unchanged.|`none`|

#### Error response transform strategy

You can use an error response transform strategy to transform error responses from within Druid services to hide internal information.
When you specify an error response transform strategy other than `none`, Druid transforms the error responses from Druid services as follows:

* For any query API that fails in the Router service, Druid sets the fields `errorClass` and `host` to null. Druid applies the transformation strategy to the `errorMessage` field.
* For any SQL query API that fails, for example `POST /druid/v2/sql/...`, Druid sets the fields `errorClass` and `host` to null. Druid applies the transformation strategy to the `errorMessage` field.
* For any JDBC-related exceptions, Druid turns all checked exceptions into `QueryInterruptedException`; otherwise, Druid attempts to keep the original exception type. For example, if the original exception isn't owned by Druid, it becomes `QueryInterruptedException`. Druid applies the transformation strategy to the `errorMessage` field.

##### No error response transform strategy

In this mode, Druid leaves error responses from underlying services unchanged and returns the unchanged errors to the API client.
This is the default Druid error response mode. To explicitly enable this strategy, set `druid.server.http.errorResponseTransform.strategy` to `none`.

##### Allowed regular expression error response transform strategy

In this mode, Druid validates the error responses from underlying services against a list of regular expressions. Only error messages that match a configured regular expression are returned. To enable this strategy, set `druid.server.http.errorResponseTransform.strategy` to `allowedRegex`.
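As a minimal configuration sketch, the following snippet enables the strategy; the regular expression here matches the Calcite validation errors used in the examples below:

```properties
druid.server.http.errorResponseTransform.strategy=allowedRegex
# Only error messages matching these regular expressions are returned unchanged
druid.server.http.errorResponseTransform.allowedRegex=[".*CalciteContextException.*"]
```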
+ +|Property|Description|Default| +|--------|-----------|-------| +|`druid.server.http.errorResponseTransform.allowedRegex`|The list of regular expressions Druid uses to validate error messages. If the error message matches any of the regular expressions, then Druid includes it in the response unchanged. If the error message does not match any of the regular expressions, Druid replaces the error message with null or with a default message depending on the type of underlying Exception. |`[]`| + +For example, consider the following error response: + +```json +{"error":"Plan validation failed","errorMessage":"org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 38: Object 'nonexistent-datasource' not found","errorClass":"org.apache.calcite.tools.ValidationException","host":null} +``` + +If `druid.server.http.errorResponseTransform.allowedRegex` is set to `[]`, Druid transforms the query error response to the following: + +```json +{"error":"Plan validation failed","errorMessage":null,"errorClass":null,"host":null} +``` + +On the other hand, if `druid.server.http.errorResponseTransform.allowedRegex` is set to `[".*CalciteContextException.*"]` then Druid transforms the query error response to the following: + +```json +{"error":"Plan validation failed","errorMessage":"org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 38: Object 'nonexistent-datasource' not found","errorClass":null,"host":null} +``` + +##### Persona based error response transform strategy + +In this mode, Druid transforms any exceptions which are targeted at non-users personas. Instead of returning such exception directly, the strategy logs the exception against a random id and returns the id along with a generic error message to the user. + +To enable this strategy, set `druid.server.http.errorResponseTransform.strategy` to `persona`. + +### Overlord discovery + +This config is used to find the [Overlord](../design/overlord.md) using Curator service discovery. Only required if you are actually running an Overlord. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.selectors.indexing.serviceName`|The druid.service name of the Overlord service. To start the Overlord with a different name, set it with this property. |druid/overlord| + +### Coordinator discovery + +This config is used to find the [Coordinator](../design/coordinator.md) using Curator service discovery. This config is used by the realtime indexing services to get information about the segments loaded in the cluster. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.selectors.coordinator.serviceName`|The druid.service name of the Coordinator service. To start the Coordinator with a different name, set it with this property. |druid/coordinator| + +### Announcing segments + +You can configure how to announce and unannounce Znodes in ZooKeeper (using Curator). For normal operations you do not need to override any of these configs. + +#### Batch data segment announcer + +In current Druid, multiple data segments may be announced under the same Znode. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.announcer.segmentsPerNode`|Each Znode contains info for up to this many segments.|50| +|`druid.announcer.maxBytesPerNode`|Max byte size for Znode. Allowed range is [1024, 1048576].|524288| +|`druid.announcer.skipDimensionsAndMetrics`|Skip Dimensions and Metrics list from segment announcements. 
NOTE: Enabling this will also remove the dimensions and metrics list from Coordinator and Broker endpoints.|false| +|`druid.announcer.skipLoadSpec`|Skip segment LoadSpec from segment announcements. NOTE: Enabling this will also remove the loadspec from Coordinator and Broker endpoints.|false| + +If you want to turn off the batch data segment announcer, you can add a property to skip announcing segments. **You do not want to enable this config if you have any services using `batch` for `druid.serverview.type`** + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.announcer.skipSegmentAnnouncementOnZk`|Skip announcing segments to ZooKeeper. Note that the batch server view will not work if this is set to true.|false| + +### JavaScript + +Druid supports dynamic runtime extension through JavaScript functions. This functionality can be configured through +the following properties. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.javascript.enabled`|Set to "true" to enable JavaScript functionality. This affects the JavaScript parser, filter, extractionFn, aggregator, post-aggregator, router strategy, and worker selection strategy.|false| + +:::info + JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it. +::: + +### Double column storage + +Prior to version 0.13.0, Druid's storage layer used a 32-bit float representation to store columns created by the +doubleSum, doubleMin, and doubleMax aggregators at indexing time. +Starting from version 0.13.0 the default will be 64-bit floats for Double columns. +Using 64-bit representation for double column will lead to avoid precision loss at the cost of doubling the storage size of such columns. +To keep the old format set the system-wide property `druid.indexing.doubleStorage=float`. +You can also use `floatSum`, `floatMin`, and `floatMax` to use 32-bit float representation. +Support for 64-bit floating point columns was released in Druid 0.11.0, so if you use this feature then older versions of Druid will not be able to read your data segments. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.indexing.doubleStorage`|Set to "float" to use 32-bit double representation for double columns.|double| + +### HTTP client + +All Druid components can communicate with each other over HTTP. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.global.http.numConnections`|Size of connection pool per destination URL. If there are more HTTP requests than this number that all need to speak to the same URL, then they will queue up.|`20`| +|`druid.global.http.eagerInitialization`|Indicates that http connections should be eagerly initialized. If set to true, `numConnections` connections are created upon initialization|`false`| +|`druid.global.http.compressionCodec`|Compression codec to communicate with others. May be "gzip" or "identity".|`gzip`| +|`druid.global.http.readTimeout`|The timeout for data reads.|`PT15M`| +|`druid.global.http.unusedConnectionTimeout`|The timeout for idle connections in connection pool. The connection in the pool will be closed after this timeout and a new one will be established. This timeout should be less than `druid.global.http.readTimeout`. 
Set this timeout = ~90% of `druid.global.http.readTimeout`|`PT4M`| +|`druid.global.http.numMaxThreads`|Maximum number of I/O worker threads|`(number of cores) * 3 / 2 + 1`| +|`druid.global.http.clientConnectTimeout`|The timeout (in milliseconds) for establishing client connections.|500| + +### Common endpoints configuration + +This section contains the configuration options for endpoints that are supported by all services. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.server.hiddenProperties`| If property names or substring of property names (case insensitive) is in this list, responses of the `/status/properties` endpoint do not show these properties | `["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password", "password", "key", "token", "pwd"]` | + +## Master server + +This section contains the configuration options for the services that reside on Master servers (Coordinators and Overlords) in the suggested [three-server configuration](../design/architecture.md#druid-servers). + +### Coordinator + +For general Coordinator services information, see [Coordinator service](../design/coordinator.md). + +#### Static Configuration + +These Coordinator static configurations can be defined in the `coordinator/runtime.properties` file. + +##### Coordinator service config + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.host`|The host for the current service. This is used to advertise the current service location as reachable from another service and should generally be specified such that `http://${druid.host}/` could actually talk to this service.|`InetAddress.getLocalHost().getCanonicalHostName()`| +|`druid.bindOnHost`|Indicating whether the service's internal jetty server bind on `druid.host`. Default is false, which means binding to all interfaces.|false| +|`druid.plaintextPort`|This is the port to actually listen on; unless port mapping is used, this will be the same port as is on `druid.host`|8081| +|`druid.tlsPort`|TLS port for HTTPS connector, if [druid.enableTlsPort](../operations/tls-support.md) is set then this config will be used. If `druid.host` contains port then that port will be ignored. This should be a non-negative integer.|8281| +|`druid.service`|The name of the service. This is used as a dimension when emitting metrics and alerts to differentiate between the various services.|`druid/coordinator`| +|`druid.labels`|Optional JSON object of key-value pairs that define custom labels for the server. These labels are displayed in the web console under the "Services" tab. Example: `druid.labels={"location":"Airtrunk"}` or `druid.labels.location=Airtrunk`|`null`| + +##### Coordinator operation + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.coordinator.period`|The run period for the Coordinator. The Coordinator operates by maintaining the current state of the world in memory and periodically looking at the set of "used" segments and segments being served to make decisions about whether any changes need to be made to the data topology. This property sets the delay between each of these runs.|`PT60S`| +|`druid.coordinator.startDelay`|The operation of the Coordinator works on the assumption that it has an up-to-date view of the state of the world when it runs, the current ZooKeeper interaction code, however, is written in a way that doesn’t allow the Coordinator to know for a fact that it’s done loading the current state of the world. 
This delay is a hack to give it enough time to believe that it has all the data.|`PT300S`| +|`druid.coordinator.load.timeout`|The timeout duration for when the Coordinator assigns a segment to a Historical service.|`PT15M`| +|`druid.coordinator.balancer.strategy`|The [balancing strategy](../design/coordinator.md#balancing-segments-in-a-tier) used by the Coordinator to distribute segments among the Historical servers in a tier. The `cost` strategy distributes segments by minimizing a cost function, `diskNormalized` weights these costs with the disk usage ratios of the servers and `random` distributes segments randomly.|`cost`| +|`druid.coordinator.loadqueuepeon.http.repeatDelay`|The start and repeat delay (in milliseconds) for the load queue peon, which manages the load/drop queue of segments for any server.|1 minute| +|`druid.coordinator.loadqueuepeon.http.batchSize`|Number of segment load/drop requests to batch in one HTTP request. Note that it must be smaller than or equal to the `druid.segmentCache.numLoadingThreads` config on Historical service. If this value is not configured, the coordinator uses the value of the `numLoadingThreads` for the respective server. | `druid.segmentCache.numLoadingThreads` | +|`druid.coordinator.asOverlord.enabled`|Boolean value for whether this Coordinator service should act like an Overlord as well. This configuration allows users to simplify a Druid cluster by not having to deploy any standalone Overlord services. If set to true, then Overlord console is available at `http://coordinator-host:port/console.html` and be sure to set `druid.coordinator.asOverlord.overlordService` also.|false| +|`druid.coordinator.asOverlord.overlordService`| Required, if `druid.coordinator.asOverlord.enabled` is `true`. This must be same value as `druid.service` on standalone Overlord services and `druid.selectors.indexing.serviceName` on Middle Managers.|NULL| + +##### Data management + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.coordinator.period.indexingPeriod`|Period to run data management duties on the Coordinator including launching compact tasks and performing clean up of unused data. It is recommended to keep this value longer than `druid.manager.segments.pollDuration`.|`PT1800S` (30 mins)| +|`druid.coordinator.kill.pendingSegments.on`|Boolean flag for whether or not the Coordinator clean up old entries in the `pendingSegments` table of metadata store. If set to true, Coordinator will check the created time of most recently complete task. If it doesn't exist, it finds the created time of the earliest running/pending/waiting tasks. Once the created time is found, then for all datasources not in the `killPendingSegmentsSkipList` (see [Dynamic configuration](#dynamic-configuration)), Coordinator will ask the Overlord to clean up the entries 1 day or more older than the found created time in the `pendingSegments` table. This will be done periodically based on `druid.coordinator.period.indexingPeriod` specified.|true| +|`druid.coordinator.kill.on`|Boolean flag to enable the Coordinator to submit a kill task for unused segments and delete them permanently from the metadata store and deep storage.|false| +|`druid.coordinator.kill.period`| The frequency of sending kill tasks to the indexing service. The value must be greater than or equal to `druid.coordinator.period.indexingPeriod`. 
Only applies if kill is turned on.|Same as `druid.coordinator.period.indexingPeriod`| +|`druid.coordinator.kill.durationToRetain`|Duration, in ISO 8601 format, relative to the current time that identifies the data interval of segments to retain. When `druid.coordinator.kill.on` is true, any segment with a data interval ending before `now - durationToRetain` is eligible for permanent deletion. For example, if `durationToRetain` is set to `P90D`, unused segments with time intervals ending 90 days in the past are eligible for deletion. If `durationToRetain` is set to a negative ISO 8601 period, segments with future intervals ending before `now - durationToRetain` are also eligible for deletion.|`P90D`| +|`druid.coordinator.kill.ignoreDurationToRetain`|A way to override `druid.coordinator.kill.durationToRetain` and tell the coordinator that you do not care about the end date of unused segment intervals when it comes to killing them. If true, the coordinator considers all unused segments as eligible to be killed.|false| +|`druid.coordinator.kill.bufferPeriod`|The amount of time that a segment must be unused before it is able to be permanently removed from metadata and deep storage. This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused.|`P30D`| +|`druid.coordinator.kill.maxSegments`|The number of unused segments to kill per kill task. This number must be greater than 0. This only applies when `druid.coordinator.kill.on=true`.|100| +|`druid.coordinator.kill.maxInterval`|The largest interval, as an [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations), of segments to delete per kill task. Set to zero, e.g. `PT0S`, for unlimited. This only applies when `druid.coordinator.kill.on=true`.|`P30D`| + +##### Metadata management + +|Property|Description|Required|Default| +|--------|-----------|---------|-------| +|`druid.coordinator.period.metadataStoreManagementPeriod`|How often to run metadata management tasks in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. |No | `PT1H`| +|`druid.coordinator.kill.supervisor.on`| Boolean value for whether to enable automatic deletion of terminated supervisors. If set to true, Coordinator will periodically remove terminated supervisors from the supervisor table in metadata storage.| No |true| +|`druid.coordinator.kill.supervisor.period`| How often to do automatic deletion of terminated supervisor in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Value must be equal to or greater than `druid.coordinator.period.metadataStoreManagementPeriod`. Only applies if `druid.coordinator.kill.supervisor.on` is set to true.| No| `P1D`| +|`druid.coordinator.kill.supervisor.durationToRetain`| Duration of terminated supervisor to be retained from created time in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Only applies if `druid.coordinator.kill.supervisor.on` is set to true.| Yes if `druid.coordinator.kill.supervisor.on` is set to true.| `P90D`| +|`druid.coordinator.kill.audit.on`| Boolean value for whether to enable automatic deletion of audit logs. If set to true, Coordinator will periodically remove audit logs from the audit table entries in metadata storage.| No | True| +|`druid.coordinator.kill.audit.period`| How often to do automatic deletion of audit logs in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Value must be equal to or greater than `druid.coordinator.period.metadataStoreManagementPeriod`. 
Only applies if `druid.coordinator.kill.audit.on` is set to true.| No| `P1D`| +|`druid.coordinator.kill.audit.durationToRetain`| Duration of audit logs to be retained from created time in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Only applies if `druid.coordinator.kill.audit.on` is set to true.| Yes if `druid.coordinator.kill.audit.on` is set to true.| `P90D`| +|`druid.coordinator.kill.compaction.on`| Boolean value for whether to enable automatic deletion of compaction configurations. If set to true, Coordinator will periodically remove compaction configuration of inactive datasource (datasource with no used and unused segments) from the config table in metadata storage. | No |True| +|`druid.coordinator.kill.compaction.period`| How often to do automatic deletion of compaction configurations in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Value must be equal to or greater than `druid.coordinator.period.metadataStoreManagementPeriod`. Only applies if `druid.coordinator.kill.compaction.on` is set to true.| No| `P1D`| +|`druid.coordinator.kill.rule.on`| Boolean value for whether to enable automatic deletion of rules. If set to true, Coordinator will periodically remove rules of inactive datasource (datasource with no used and unused segments) from the rule table in metadata storage.| No | True| +|`druid.coordinator.kill.rule.period`| How often to do automatic deletion of rules in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Value must be equal to or greater than `druid.coordinator.period.metadataStoreManagementPeriod`. Only applies if `druid.coordinator.kill.rule.on` is set to true.| No| `P1D`| +|`druid.coordinator.kill.rule.durationToRetain`| Duration of rules to be retained from created time in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Only applies if `druid.coordinator.kill.rule.on` is set to true.| Yes if `druid.coordinator.kill.rule.on` is set to true.| `P90D`| +|`druid.coordinator.kill.datasource.on`| Boolean value for whether to enable automatic deletion of datasource metadata (Note: datasource metadata only exists for datasource created from supervisor). If set to true, Coordinator will periodically remove datasource metadata of terminated supervisor from the datasource table in metadata storage. | No | True| +|`druid.coordinator.kill.datasource.period`| How often to do automatic deletion of datasource metadata in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Value must be equal to or greater than `druid.coordinator.period.metadataStoreManagementPeriod`. Only applies if `druid.coordinator.kill.datasource.on` is set to true.| No| `P1D`| +|`druid.coordinator.kill.datasource.durationToRetain`| Duration of datasource metadata to be retained from created time in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Only applies if `druid.coordinator.kill.datasource.on` is set to true.| Yes if `druid.coordinator.kill.datasource.on` is set to true.| `P90D`| +|`druid.coordinator.kill.segmentSchema.on`| Boolean value for whether to enable automatic deletion of unused segment schemas. If set to true, Coordinator will periodically identify segment schemas which are not referenced by any used segment and mark them as unused. At a later point, these unused schemas are deleted. Only applies if [Centralized Datasource schema](#centralized-datasource-schema-experimental) feature is enabled. 
| No | True| +|`druid.coordinator.kill.segmentSchema.period`| How often to do automatic deletion of segment schemas in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Value must be equal to or greater than `druid.coordinator.period.metadataStoreManagementPeriod`. Only applies if `druid.coordinator.kill.segmentSchema.on` is set to true.| No| `P1D`| +|`druid.coordinator.kill.segmentSchema.durationToRetain`| Duration of segment schemas to be retained from the time it was marked as unused in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Only applies if `druid.coordinator.kill.segmentSchema.on` is set to true.| Yes, if `druid.coordinator.kill.segmentSchema.on` is set to true.| `P90D`| + +##### Segment management + +|Property|Possible values|Description|Default| +|--------|---------------|-----------|-------| +|`druid.serverview.type`|batch or http|Segment discovery method to use. "http" enables discovering segments using HTTP instead of ZooKeeper.|http| +|`druid.coordinator.segment.awaitInitializationOnStart`|true or false|Whether the Coordinator will wait for its view of segments to fully initialize before starting up. If set to 'true', the Coordinator's HTTP server will not start up, and the Coordinator will not announce itself as available, until the server view is initialized.|true| + +##### Metadata retrieval + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.manager.config.pollDuration`|How often the manager polls the config table for updates.|`PT1M`| +|`druid.manager.segments.pollDuration`|The duration between polls the Coordinator does for updates to the set of active segments. Generally defines the amount of lag time it can take for the Coordinator to notice new segments.|`PT1M`| +|`druid.manager.segments.useIncrementalCache`|(Experimental) Denotes the usage mode of the segment metadata incremental cache. This cache provides a performance improvement over the polling mechanism currently employed by the Coordinator as it retrieves payloads of only updated segments. Possible cache modes are: (a) `never`: Incremental cache is disabled. (b) `always`: Incremental cache is enabled. Service start-up will be blocked until cache has synced with the metadata store at least once. (c) `ifSynced`: Cache is enabled. This mode does not block service start-up and is a way to retain existing behavior of the Coordinator. If the incremental cache is in modes `always` or `ifSynced`, reads from the cache will block until it has synced with the metadata store at least once after becoming leader. The Coordinator never writes to this cache.|`never`| +|`druid.manager.rules.pollDuration`|The duration between polls the Coordinator does for updates to the set of active rules. Generally defines the amount of lag time it can take for the Coordinator to notice rules.|`PT1M`| +|`druid.manager.rules.defaultRule`|The default rule for the cluster|`_default`| +|`druid.manager.rules.alertThreshold`|The duration after a failed poll upon which an alert should be emitted.|`PT10M`| + +#### Dynamic configuration + +The Coordinator has dynamic configurations to tune certain behavior on the fly, without requiring a service restart. +You can configure these parameters using the [web console](../operations/web-console.md)(recommended) or through the [Coordinator dynamic configuration API](../api-reference/dynamic-configuration-api.md#coordinator-dynamic-configuration). + +The following table shows the dynamic configuration properties for the Coordinator. 
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`millisToWaitBeforeDeleting`|How long the Coordinator needs to be the leader before it can start marking overshadowed segments as unused in metadata storage.| 900000 (15 mins)|
+|`smartSegmentLoading`|Enables ["smart" segment loading mode](#smart-segment-loading) which dynamically computes the optimal values of several properties that maximize Coordinator performance.|true|
+|`maxSegmentsToMove`|The maximum number of segments that can be moved in a Historical tier at any given time.|100|
+|`replicantLifetime`|The maximum number of Coordinator runs for which a segment can wait in the load queue of a Historical before Druid raises an alert.|15|
+|`replicationThrottleLimit`|The maximum number of segment replicas that can be assigned to a Historical tier in a single Coordinator run. This property prevents Historical services from becoming overwhelmed when loading extra replicas of segments that are already available in the cluster.|500|
+|`balancerComputeThreads`|Thread pool size for computing the moving cost of segments during segment balancing. Consider increasing this if you have a lot of segments and moving segments begins to stall.|`num_cores` / 2|
+|`killDataSourceWhitelist`|List of specific data sources for which kill tasks can be issued if `druid.coordinator.kill.on` is true. It can be a comma-separated list of data source names or a JSON array. If `killDataSourceWhitelist` is empty, the Coordinator issues kill tasks for all data sources.|none|
+|`killTaskSlotRatio`|Ratio of total available task slots, including autoscaling if applicable, that can be used by kill tasks. This value must be between 0 and 1. Only applies to kill tasks that are spawned automatically by the Coordinator's auto-kill duty, which is enabled when `druid.coordinator.kill.on` is true.|0.1|
+|`maxKillTaskSlots`|Maximum number of task slots that can be used by kill tasks. This limit only applies to kill tasks that are spawned automatically by the Coordinator's auto-kill duty, which is enabled when `druid.coordinator.kill.on` is true.|`Integer.MAX_VALUE` - no limit|
+|`killPendingSegmentsSkipList`|List of data sources for which pendingSegments are _NOT_ cleaned up if property `druid.coordinator.kill.pendingSegments.on` is true. This can be a list of comma-separated data sources or a JSON array.|none|
+|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments allowed in the load queue of any given server. Use this parameter to load segments faster if, for example, the cluster contains slow-loading nodes or if there are too many segments to be replicated to a particular node (when faster loading is preferred to better segment distribution). The optimal value depends on the loading speed of segments, acceptable replication time, and the number of nodes.|500|
+|`useRoundRobinSegmentAssignment`|Boolean flag for whether segments should be assigned to Historical services in a round robin fashion. When disabled, segment assignment is done using the chosen balancer strategy. When enabled, this can speed up segment assignments, leaving balancing to lazily move the segments to their optimal locations (based on the balancer strategy).|true|
+|`decommissioningNodes`|List of Historical servers to decommission. 
Coordinator will not assign new segments to decommissioning servers, and segments will be moved away from them to be placed on non-decommissioning servers at the maximum rate specified by `maxSegmentsToMove`.|none| +|`pauseCoordination`|Boolean flag for whether or not the Coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` interface. Such duties include: segment balancing, segment compaction, submitting kill tasks for unused segments (if enabled), logging of used segments in the cluster, marking of newly unused or overshadowed segments, matching and execution of load/drop rules for used segments, unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS name nodes with downtime and don't want the Coordinator to be directing Historical nodes to hit the name node with API requests until maintenance is done and the deep store is declared healthy for use again.|false| +|`replicateAfterLoadTimeout`|Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the Coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow Historicals in the cluster. However, the slow Historical may still load the segment later and the Coordinator may issue drop requests if the segment is over-replicated.|false| +|`turboLoadingNodes`| Experimental. List of Historical servers to place in turbo loading mode. These servers use a larger thread-pool to load segments faster but at the cost of query performance. For servers specified in `turboLoadingNodes`, `druid.coordinator.loadqueuepeon.http.batchSize` is ignored and the coordinator uses the value of the respective `numLoadingThreads` instead.
Please use this config with caution. All servers should eventually be removed from this list once the segment loading on the respective historicals is finished. |none| +|`cloneServers`| Experimental. Map from target Historical server to source Historical server which should be cloned by the target. The target Historical does not participate in regular segment assignment or balancing. Instead, the Coordinator mirrors any segment assignment made to the source Historical onto the target Historical, so that the target becomes an exact copy of the source. Segments on the target Historical do not count towards replica counts either. If the source disappears, the target remains in the last known state of the source server until removed from the configuration.
Use this config with caution. All servers should eventually be removed from this list once the desired state on the respective Historicals is achieved. |none|
+
+##### Smart segment loading
+
+The `smartSegmentLoading` mode simplifies Coordinator configuration for segment loading and balancing.
+If you enable this mode, do not provide values for the properties in the table below, as the Coordinator computes them automatically.
+Druid computes the values to optimize Coordinator performance, based on the current state of the cluster.
+
+If you enable `smartSegmentLoading` mode, Druid ignores any value you provide for the following properties.
+
+|Property|Computed value|Description|
+|--------|--------------|-----------|
+|`useRoundRobinSegmentAssignment`|true|Speeds up segment assignment.|
+|`maxSegmentsInNodeLoadingQueue`|0|Removes the limit on load queue size.|
+|`replicationThrottleLimit`|5% of used segments, minimum value 100|Prevents aggressive replication when a Historical disappears only intermittently.|
+|`replicantLifetime`|60|Allows segments to wait about an hour (assuming a Coordinator period of 1 minute) in the load queue before an alert is raised. In `smartSegmentLoading` mode, load queues are not limited by size. Segments might therefore be assigned to a load queue even if the corresponding server is slow to load them.|
+|`maxSegmentsToMove`|2% of used segments, minimum value 100, maximum value 1000|Ensures that some segments are always moving in the cluster to keep it well balanced. The maximum value keeps the Coordinator run times bounded.|
+|`balancerComputeThreads`|`num_cores` / 2|Ensures that there are enough threads to perform balancing computations without hogging all Coordinator resources.|
+
+When `smartSegmentLoading` is disabled, Druid uses the configured values of these properties.
+Disable `smartSegmentLoading` only if you want to explicitly set the values of any of the above properties.
+
+##### Lookups dynamic configuration
+
+These configuration options control Coordinator lookup management. For configurations that affect lookup propagation, see [Dynamic configuration for lookups](../querying/lookups.md#dynamic-configuration).
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.manager.lookups.hostDeleteTimeout`|How long to wait for a `DELETE` request to a particular service before considering the `DELETE` a failure.|`PT1S`|
+|`druid.manager.lookups.hostUpdateTimeout`|How long to wait for a `POST` request to a particular service before considering the `POST` a failure.|`PT10S`|
+|`druid.manager.lookups.deleteAllTimeout`|How long to wait for all `DELETE` requests to finish before considering the delete attempt a failure.|`PT10S`|
+|`druid.manager.lookups.updateAllTimeout`|How long to wait for all `POST` requests to finish before considering the attempt a failure.|`PT60S`|
+|`druid.manager.lookups.threadPoolSize`|How many services can be managed concurrently (concurrent `POST` and `DELETE` requests). Requests beyond this limit wait in a queue until a slot becomes available.|10|
+|`druid.manager.lookups.period`|Number of milliseconds between checks for configuration changes.|120000 (2 minutes)|
+
+##### Automatic compaction dynamic configuration
+
+You can set or update [automatic compaction](../data-management/automatic-compaction.md) properties dynamically using the
+[Automatic compaction API](../api-reference/automatic-compaction-api.md) without restarting Coordinators. 
+ +For details about segment compaction, see [Segment size optimization](../operations/segment-optimization.md). + +You can configure automatic compaction through the following properties: + +|Property|Description|Required| +|--------|-----------|--------| +|`dataSource`|The datasource name to be compacted.|yes| +|`taskPriority`|[Priority](../ingestion/tasks.md#lock-priority) of compaction task.|no (default = 25)| +|`inputSegmentSizeBytes`|Maximum number of total segment bytes processed per compaction task. Since a time chunk must be processed in its entirety, if the segments for a particular time chunk have a total size in bytes greater than this parameter, compaction will not run for that time chunk.|no (default = 100,000,000,000,000 i.e. 100TB)| +|`skipOffsetFromLatest`|The offset for searching segments to be compacted in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Strongly recommended to set for realtime datasources. See [Data handling with compaction](../data-management/compaction.md#data-handling-with-compaction).|no (default = "P1D")| +|`tuningConfig`|Tuning config for compaction tasks. See below [Automatic compaction tuningConfig](#automatic-compaction-tuningconfig).|no| +|`taskContext`|[Task context](../ingestion/tasks.md#context-parameters) for compaction tasks.|no| +|`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction granularitySpec](#automatic-compaction-granularityspec).|no| +|`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction dimensionsSpec](#automatic-compaction-dimensionsspec).|no| +|`transformSpec`|Custom `transformSpec`. See [Automatic compaction transformSpec](#automatic-compaction-transformspec).|no| +|`metricsSpec`|Custom [`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction task preserves any existing metrics regardless of whether `metricsSpec` is specified. If `metricsSpec` is specified, Druid does not reapply any aggregators matching the metric names specified in `metricsSpec` to rows that already have the associated metrics. For rows that do not already have the metric specified in `metricsSpec`, Druid applies the metric aggregator on the source column, then proceeds to combine the metrics across segments as usual. If `metricsSpec` is not specified, Druid automatically discovers the metrics in the existing segments and combines existing metrics with the same metric name across segments. Aggregators for metrics with the same name are assumed to be compatible for combining across segments, otherwise the compaction task may fail.|no| +|`ioConfig`|IO config for compaction tasks. See [Automatic compaction ioConfig](#automatic-compaction-ioconfig).|no| + +Automatic compaction config example: + +```json +{ + "dataSource": "wikiticker", + "granularitySpec" : { + "segmentGranularity" : "none" + } +} +``` + +Compaction tasks fail when higher priority tasks cause Druid to revoke their locks. By default, realtime tasks like ingestion have a higher priority than compaction tasks. Frequent conflicts between compaction tasks and realtime tasks can cause the Coordinator's automatic compaction to hang. +You may see this issue with streaming ingestion from Kafka and Kinesis, which ingest late-arriving data. + +To mitigate this problem, set `skipOffsetFromLatest` to a value large enough so that arriving data tends to fall outside the offset value from the current time. This way you can avoid conflicts between compaction tasks and realtime ingestion tasks. 
+For example, if you want to skip over segments from thirty days prior to the end time of the most recent segment, assign `"skipOffsetFromLatest": "P30D"`. +For more information, see [Avoid conflicts with ingestion](../data-management/automatic-compaction.md#avoid-conflicts-with-ingestion). + +###### Automatic compaction tuningConfig + +Auto-compaction supports a subset of the [tuningConfig for Parallel task](../ingestion/native-batch.md#tuningconfig). + +The following table shows the supported configurations for auto-compaction. + +|Property|Description|Required| +|--------|-----------|--------| +|type|The task type. If you're using Coordinator duties for auto-compaction, set it to `index_parallel`. If you're using compaction supervisors, set it to `autocompact`. |yes| +|`maxRowsInMemory`|Used in determining when intermediate persists to disk should occur. Normally user does not need to set this, but depending on the nature of data, if rows are short in terms of bytes, user may not want to store a million rows in memory and this value should be set.|no (default = 1000000)| +|`maxBytesInMemory`|Used in determining when intermediate persists to disk should occur. Normally this is computed internally and user does not need to set it. This value represents number of bytes to aggregate in heap memory before persisting. This is based on a rough estimate of memory usage and not actual usage. The maximum heap memory usage for indexing is `maxBytesInMemory` * (2 + `maxPendingPersists`)|no (default = 1/6 of max JVM memory)| +|`splitHintSpec`|Used to give a hint to control the amount of data that each first phase task reads. This hint could be ignored depending on the implementation of the input source. See [Split hint spec](../ingestion/native-batch.md#split-hint-spec) for more details.|no (default = size-based split hint spec)| +|`partitionsSpec`|Defines how to partition data in each time chunk, see [`PartitionsSpec`](../ingestion/native-batch.md#partitionsspec)|no (default = `dynamic`)| +|`indexSpec`|Defines segment storage format options to be used at indexing time, see [IndexSpec](../ingestion/ingestion-spec.md#indexspec)|no| +|`indexSpecForIntermediatePersists`|Defines segment storage format options to be used at indexing time for intermediate persisted temporary segments. this can be used to disable dimension/metric compression on intermediate segments to reduce memory required for final merging. however, disabling compression on intermediate segments might increase page cache use while they are used before getting merged into final segment published, see [IndexSpec](../ingestion/ingestion-spec.md#indexspec) for possible values.|no| +|`maxPendingPersists`|Maximum number of persists that can be pending but not started. If this limit would be exceeded by a new intermediate persist, ingestion will block until the currently-running persist finishes. Maximum heap memory usage for indexing scales with `maxRowsInMemory` * (2 + `maxPendingPersists`).|no (default = 0, meaning one persist can be running concurrently with ingestion, and none can be queued up)| +|`pushTimeout`|Milliseconds to wait for pushing segments. It must be >= 0, where 0 means to wait forever.|no (default = 0)| +|`segmentWriteOutMediumFactory`|Segment write-out medium to use when creating segments. 
See [SegmentWriteOutMediumFactory](../ingestion/native-batch.md#segmentwriteoutmediumfactory).|no (default is the value of `druid.peon.defaultSegmentWriteOutMediumFactory.type`)|
+|`maxNumConcurrentSubTasks`|Maximum number of worker tasks that can run in parallel at the same time. The supervisor task spawns worker tasks up to `maxNumConcurrentSubTasks`, regardless of the currently available task slots. If this value is set to 1, the supervisor task processes data ingestion on its own instead of spawning worker tasks. If this value is set too large, too many worker tasks may be created, which can block other ingestion. Check [Capacity Planning](../ingestion/native-batch.md#capacity-planning) for more details.|no (default = 1)|
+|`maxRetry`|Maximum number of retries on task failures.|no (default = 3)|
+|`maxNumSegmentsToMerge`|Max limit for the number of segments that a single task can merge at the same time in the second phase. Used only with `hashed` or `single_dim` partitionsSpec.|no (default = 100)|
+|`totalNumMergeTasks`|Total number of tasks to merge segments in the merge phase when `partitionsSpec` is set to `hashed` or `single_dim`.|no (default = 10)|
+|`taskStatusCheckPeriodMs`|Polling period in milliseconds to check running task statuses.|no (default = 1000)|
+|`chatHandlerTimeout`|Timeout for reporting the pushed segments in worker tasks.|no (default = PT10S)|
+|`chatHandlerNumRetries`|Retries for reporting the pushed segments in worker tasks.|no (default = 5)|
+|`engine` | Engine for compaction. Can be either `native` or `msq`. `msq` uses the MSQ task engine and is only supported with [compaction supervisors](../data-management/automatic-compaction.md#auto-compaction-using-compaction-supervisors). | no (default = native)|
+
+###### Automatic compaction granularitySpec
+
+|Field|Description|Required|
+|-----|-----------|--------|
+|`segmentGranularity`|Time chunking period for the segment granularity. Defaults to 'null', which preserves the original segment granularity. Accepts all [Query granularity](../querying/granularities.md) values.|No|
+|`queryGranularity`|The resolution of timestamp storage within each segment. Defaults to 'null', which preserves the original query granularity. Accepts all [Query granularity](../querying/granularities.md) values.|No|
+|`rollup`|Whether to enable ingestion-time rollup or not. Defaults to null, which preserves the original setting. Note that once data is rolled up, individual records can no longer be recovered. |No|
+
+###### Automatic compaction dimensionsSpec
+
+|Field|Description|Required|
+|-----|-----------|--------|
+|`dimensions`| A list of dimension names or objects. Defaults to null, which preserves the original dimensions. Note that setting this will cause segments manually compacted with `dimensionExclusions` to be compacted again.|No|
+
+###### Automatic compaction transformSpec
+
+|Field|Description|Required|
+|-----|-----------|--------|
+|`filter`| Conditionally filters input rows during compaction. Only rows that pass the filter are included in the compacted segments. Any of Druid's standard [query filters](../querying/filters.md) can be used. Defaults to null, which does not filter any rows. |No|
+
+###### Automatic compaction ioConfig
+
+Auto-compaction supports a subset of the [ioConfig for Parallel task](../ingestion/native-batch.md).
+The following is a list of the supported configurations for auto-compaction. 
+ +|Property|Description|Default|Required| +|--------|-----------|-------|--------| +|`dropExisting`|If `true` the compaction task replaces all existing segments fully contained by the umbrella interval of the compacted segments when the task publishes new segments and tombstones. If compaction fails, Druid does not publish any segments or tombstones. WARNING: this functionality is still in beta. Note that changing this config does not cause intervals to be compacted again.|false|no| + +### Overlord + +For general Overlord service information, see [Overlord](../design/overlord.md). + +#### Overlord static configuration + +These Overlord static configurations can be defined in the `overlord/runtime.properties` file. + +##### Overlord service configs + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.host`|The host for the current service. This is used to advertise the current service location as reachable from another service and should generally be specified such that `http://${druid.host}/` could actually talk to this service.|`InetAddress.getLocalHost().getCanonicalHostName()`| +|`druid.bindOnHost`|Indicating whether the service's internal jetty server bind on `druid.host`. Default is false, which means binding to all interfaces.|false| +|`druid.plaintextPort`|This is the port to actually listen on; unless port mapping is used, this will be the same port as is on `druid.host`.|8090| +|`druid.tlsPort`|TLS port for HTTPS connector, if [druid.enableTlsPort](../operations/tls-support.md) is set then this config will be used. If `druid.host` contains port then that port will be ignored. This should be a non-negative Integer.|8290| +|`druid.service`|The name of the service. This is used as a dimension when emitting metrics and alerts to differentiate between the various services.|`druid/overlord`| +|`druid.labels`|Optional JSON object of key-value pairs that define custom labels for the server. These labels are displayed in the web console under the "Services" tab. Example: `druid.labels={"location":"Airtrunk"}` or `druid.labels.location=Airtrunk`|`null`| + +##### Overlord operations + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.indexer.runner.type`|Indicates whether tasks should be run locally using `local` or in a distributed environment using `remote`. The recommended option is `httpRemote`, which is similar to `remote` but uses HTTP to interact with Middle Managers instead of ZooKeeper.|`httpRemote`| +|`druid.indexer.storage.type`|Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. One of `local` or `metadata`. `local` is mainly for internal testing while `metadata` is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.|`local`| +|`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|`PT24H`| +|`druid.indexer.tasklock.forceTimeChunkLock`|**Setting this to false is still experimental**
If set to true, all tasks are forced to use time chunk locks. If set to false, each task automatically chooses a lock type to use. This configuration can be overridden by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context-parameters). See [Task lock system](../ingestion/tasks.md#task-lock-system) for more details about locking in tasks.|true|
+|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, Druid performs segment allocate actions in batches to improve throughput and reduce the average `task/action/run/time`. See [batching `segmentAllocate` actions](../ingestion/tasks.md#batching-segmentallocate-actions) for details.|true|
+|`druid.indexer.tasklock.batchAllocationWaitTime`|Number of milliseconds after Druid adds the first segment allocate action to a batch, until it executes the batch. Allows the batch to add more requests and improve the average segment allocation run time. This configuration takes effect only if `batchSegmentAllocation` is enabled.|0|
+|`druid.indexer.tasklock.batchAllocationNumThreads`|Number of worker threads to use for batch segment allocation. This represents the maximum number of allocation batches that can be processed in parallel for distinct datasources. Batches for a single datasource are always processed sequentially. This configuration takes effect only if `batchSegmentAllocation` is enabled.|5|
+|`druid.indexer.task.default.context`|Default task context that is applied to all tasks submitted to the Overlord. Any default in this config overrides neither the context values the user provides nor `druid.indexer.tasklock.forceTimeChunkLock`.|empty context|
+|`druid.indexer.queue.maxSize`|Maximum number of active tasks at one time.|`Integer.MAX_VALUE`|
+|`druid.indexer.queue.startDelay`|Sleep this long before starting Overlord queue management. This can be useful to give a cluster time to re-orient itself (for example, after a widespread network issue).|`PT1M`|
+|`druid.indexer.queue.restartDelay`|Sleep this long when Overlord queue management throws an exception before trying again.|`PT30S`|
+|`druid.indexer.queue.storageSyncRate`|Sync Overlord state this often with an underlying task persistence mechanism.|`PT1M`|
+|`druid.indexer.queue.maxTaskPayloadSize`|Maximum allowed size in bytes of a single task payload accepted by the Overlord.|none (allow all task payload sizes)|
+
+The following configs only apply if the Overlord is running in remote mode. For a description of local vs. remote mode, see [Overlord service](../design/overlord.md).
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.indexer.runner.taskAssignmentTimeout`|How long to wait after a task has been assigned to a Middle Manager before throwing an error.|`PT5M`|
+|`druid.indexer.runner.minWorkerVersion`|The minimum Middle Manager version to send tasks to. The version number is a string. This affects the expected behavior during certain operations like comparison against `druid.worker.version`. Specifically, the version comparison follows dictionary order. Use ISO8601 date format for the version to accommodate date comparisons. |"0"|
+| `druid.indexer.runner.parallelIndexTaskSlotRatio`| The ratio of task slots available for parallel indexing supervisor tasks per worker. The specified value must be in the range `[0, 1]`. 
|1|
+|`druid.indexer.runner.compressZnodes`|Indicates whether or not the Overlord should expect Middle Managers to compress Znodes.|true|
+|`druid.indexer.runner.maxZnodeBytes`|The maximum Znode size in bytes that can be created in ZooKeeper. Must be in the range `[10KiB, 2GiB)`. [Human-readable format](human-readable-byte.md) is supported.| 512 KiB |
+|`druid.indexer.runner.taskCleanupTimeout`|How long to wait before failing a task after a Middle Manager is disconnected from ZooKeeper.|`PT15M`|
+|`druid.indexer.runner.taskShutdownLinkTimeout`|How long to wait on a shutdown request to a Middle Manager before timing out.|`PT1M`|
+|`druid.indexer.runner.pendingTasksRunnerNumThreads`|Number of threads used to allocate pending tasks to workers. Must be at least 1.|1|
+|`druid.indexer.runner.maxRetriesBeforeBlacklist`|Number of consecutive times a Middle Manager can fail tasks before the worker is blacklisted. Must be at least 1.|5|
+|`druid.indexer.runner.workerBlackListBackoffTime`|How long to wait before a blacklisted worker is whitelisted again. This value should be greater than the value set for `druid.indexer.runner.workerBlackListCleanupPeriod`.|`PT15M`|
+|`druid.indexer.runner.workerBlackListCleanupPeriod`|A duration after which the cleanup thread will start up to clean blacklisted workers.|`PT5M`|
+|`druid.indexer.runner.maxPercentageBlacklistWorkers`|The maximum percentage of workers to blacklist. This must be between 0 and 100.|20|
+
+If autoscaling is enabled, you can set these additional configs:
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.indexer.autoscale.strategy`|Sets the strategy to run when autoscaling is required. One of `noop`, `ec2` or `gce`.|`noop`|
+|`druid.indexer.autoscale.doAutoscale`|If set to true, autoscaling will be enabled.|false|
+|`druid.indexer.autoscale.provisionPeriod`|How often to check whether or not new Middle Managers should be added.|`PT1M`|
+|`druid.indexer.autoscale.terminatePeriod`|How often to check whether Middle Managers should be removed.|`PT5M`|
+|`druid.indexer.autoscale.originTime`|The starting reference timestamp that the terminate period increments upon.|`2012-01-01T00:55:00.000Z`|
+|`druid.indexer.autoscale.workerIdleTimeout`|How long a worker can be idle (not running a task) before it can be considered for termination.|`PT90M`|
+|`druid.indexer.autoscale.maxScalingDuration`|How long the Overlord waits for a Middle Manager to show up before giving up.|`PT15M`|
+|`druid.indexer.autoscale.numEventsToTrack`|The number of autoscaling related events (node creation and termination) to track.|10|
+|`druid.indexer.autoscale.pendingTaskTimeout`|How long a task can be in "pending" state before the Overlord tries to scale up.|`PT30S`|
+|`druid.indexer.autoscale.workerVersion`|If set, only creates nodes of the set version during autoscaling. Overrides dynamic configuration. |null|
+|`druid.indexer.autoscale.workerPort`|The port that Middle Managers will run on.|8080|
+|`druid.indexer.autoscale.workerCapacityHint`| An estimation of the number of task slots available for each worker launched by the auto scaler when there are no workers running. The auto scaler uses the worker capacity hint to launch workers with an adequate capacity to handle pending tasks. When unset or set to a value less than or equal to 0, the auto scaler scales workers equal to the value for `minNumWorkers` in autoScaler config instead. The auto scaler assumes that each worker, either a Middle Manager or indexer, has the same number of task slots. 
Therefore, when all your workers have the same capacity (homogeneous capacity), set the value for `autoscale.workerCapacityHint` equal to `druid.worker.capacity`. If your workers have different capacities (heterogeneous capacity), set the value to the average of `druid.worker.capacity` across the workers. For example, if two workers have `druid.worker.capacity=10`, and one has `druid.worker.capacity=4`, set `autoscale.workerCapacityHint=8`. Only applies to `pendingTaskBased` provisioning strategy.|-1| + +##### Supervisors + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.supervisor.healthinessThreshold`|The number of successful runs before an unhealthy supervisor is again considered healthy.|3| +|`druid.supervisor.unhealthinessThreshold`|The number of failed runs before the supervisor is considered unhealthy.|3| +|`druid.supervisor.taskHealthinessThreshold`|The number of consecutive task successes before an unhealthy supervisor is again considered healthy.|3| +|`druid.supervisor.taskUnhealthinessThreshold`|The number of consecutive task failures before the supervisor is considered unhealthy.|3| +|`druid.supervisor.storeStackTrace`|Whether full stack traces of supervisor exceptions should be stored and returned by the supervisor `/status` endpoint.|false| +|`druid.supervisor.maxStoredExceptionEvents`|The maximum number of exception events that can be returned through the supervisor `/status` endpoint.|`max(healthinessThreshold, unhealthinessThreshold)`| +|`druid.supervisor.idleConfig.enabled`|If `true`, supervisor can become idle if there is no data on input stream/topic for some time.|false| +|`druid.supervisor.idleConfig.inactiveAfterMillis`|Supervisor is marked as idle if all existing data has been read from input topic and no new data has been published for `inactiveAfterMillis` milliseconds.|`600_000`| + +The `druid.supervisor.idleConfig.*` specification in the Overlord runtime properties defines the default behavior for the entire cluster. See [Idle Configuration in Kafka Supervisor IOConfig](../ingestion/kinesis-ingestion.md#io-configuration) to override it for an individual supervisor. + +##### Segment metadata cache (Experimental) + +The following properties pertain to segment metadata caching on the Overlord that may be used to speed up segment allocation and other metadata operations. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.manager.segments.useIncrementalCache`|Denotes the usage mode of the segment metadata incremental cache. Possible modes are: (a) `never`: Cache is disabled. (b) `always`: Reads are always done from the cache. Service start-up will be blocked until cache has synced with the metadata store at least once. Transactions will block until cache has synced with the metadata store at least once after becoming leader. (c) `ifSynced`: Reads are done from the cache only if it has already synced with the metadata store. This mode does not block service start-up or transactions.|`never`| +|`druid.manager.segments.pollDuration`|Duration (in ISO 8601 format) between successive syncs of the cache with the metadata store. This property is used only when `druid.manager.segments.useIncrementalCache` is set to `always` or `ifSynced`.|`PT1M` (1 minute)| + +##### Auto-kill unused segments (Experimental) + +These configs pertain to the new embedded mode of running [kill tasks on the Overlord](../data-management/delete.md#auto-kill-data-on-the-overlord-experimental). 
+None of the configs that apply to [auto-kill performed by the Coordinator](../data-management/delete.md#auto-kill-data-using-coordinator-duties) are used by this feature. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.manager.segments.killUnused.enabled`|Boolean flag to enable auto-kill of eligible unused segments on the Overlord. This feature can be used only when [segment metadata caching](#segment-metadata-cache-experimental) is enabled on the Overlord and MUST NOT be enabled if `druid.coordinator.kill.on` is already set to `true` on the Coordinator.|`true`| +|`druid.manager.segments.killUnused.bufferPeriod`|Period after which a segment marked as unused becomes eligible for auto-kill on the Overlord. This config is effective only if `druid.manager.segments.killUnused.enabled` is set to `true`.|`P30D` (30 days)| + +#### Overlord dynamic configuration + +The Overlord has dynamic configurations to tune how Druid assigns tasks to workers. +You can configure these parameters using the [web console](../operations/web-console.md) or through the [Overlord dynamic configuration API](../api-reference/dynamic-configuration-api.md#overlord-dynamic-configuration). + +The following table shows the dynamic configuration properties for the Overlord. + +|Property|Description|Default| +|--------|-----------|-------| +|`selectStrategy`| Describes how to assign tasks to Middle Managers. The type can be `equalDistribution`, `equalDistributionWithCategorySpec`, `fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. | `{"type":"equalDistribution"}` | +|`autoScaler`| Only used if [autoscaling](#autoscaler) is enabled.| null | + +The following is an example of an Overlord dynamic config: + +
+<details>
+<summary>Click to view the example</summary>
+
+```json
+{
+  "selectStrategy": {
+    "type": "fillCapacity",
+    "affinityConfig": {
+      "affinity": {
+        "datasource1": ["host1:port", "host2:port"],
+        "datasource2": ["host3:port"]
+      }
+    }
+  },
+  "autoScaler": {
+    "type": "ec2",
+    "minNumWorkers": 2,
+    "maxNumWorkers": 12,
+    "envConfig": {
+      "availabilityZone": "us-east-1a",
+      "nodeData": {
+        "amiId": "${AMI}",
+        "instanceType": "c3.8xlarge",
+        "minInstances": 1,
+        "maxInstances": 1,
+        "securityGroupIds": ["${IDs}"],
+        "keyName": "${KEY_NAME}"
+      },
+      "userData": {
+        "impl": "string",
+        "data": "${SCRIPT_COMMAND}",
+        "versionReplacementString": ":VERSION:",
+        "version": null
+      }
+    }
+  }
+}
+```
+</details>
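+
+You don't need to supply both fields in every update. For instance, the following minimal sketch (using placeholder `host:port` values, as in the example above) changes only the worker select strategy, setting a strong affinity so that tasks for `datasource1` wait for their preferred Middle Managers rather than run elsewhere; see [affinityConfig](#affinityconfig) below for the field details.
+
+```json
+{
+  "selectStrategy": {
+    "type": "fillCapacity",
+    "affinityConfig": {
+      "affinity": {
+        "datasource1": ["host1:port", "host2:port"]
+      },
+      "strong": true
+    }
+  }
+}
+```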
+
+
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers (Middle Managers).
+At a high level, the select strategy determines the list of eligible workers for a given task using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity (`fillCapacity`).
+There are four options for select strategies:
+
+* [`equalDistribution`](#equaldistribution)
+* [`equalDistributionWithCategorySpec`](#equaldistributionwithcategoryspec)
+* [`fillCapacity`](#fillcapacity)
+* [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of the `fillCapacity` and `equalDistribution` strategies) for a given task, the list of workers eligible to be assigned is determined as follows:
+
+* a non-affinity worker if no affinity is specified for that datasource. Any worker not listed in the `affinityConfig` is considered a non-affinity worker.
+* a non-affinity worker if preferred workers are not available and the affinity is _weak_, i.e. `strong: false`.
+* a preferred worker listed in the `affinityConfig` for this datasource if it has available capacity.
+* no worker if preferred workers are not available and the affinity is _strong_, i.e. `strong: true`. In this case, the task remains in "pending" state. The chosen provisioning strategy (e.g. `pendingTaskBased`) may then use the total number of pending tasks to determine if a new node should be provisioned.
+
+Note that every worker listed in the `affinityConfig` is only used for its assigned datasources and no others.
+
+If a `categorySpec` is provided (as part of the `fillCapacityWithCategorySpec` and `equalDistributionWithCategorySpec` strategies), then a task of a given datasource may be assigned to:
+
+* any worker if no category config is given for the task type
+* any worker if a category config is given for the task type but no category is given for the datasource and there's no default category
+* a preferred worker (based on the category config and the category for the datasource) if available
+* any worker if a category config and category are given but no preferred worker is available and the category config is `weak`
+* not assigned at all if preferred workers are not available and the category config is `strong`
+
+In both cases, Druid determines the list of eligible workers and selects one depending on their load, with the goal of either distributing the load equally or filling as few workers as possible.
+
+If you are using autoscaling, use the `fillCapacity` select strategy, since autoscaled nodes cannot
+be assigned a category, and you want the work to be concentrated on the fewest number of workers to allow the empty ones to scale down.
+
+###### `equalDistribution`
+
+Tasks are assigned to the Middle Manager with the most free slots at the time the task begins running.
+This evenly distributes work across your Middle Managers. 
+ +|Property|Description|Default| +|--------|-----------|-------| +|`type`|`equalDistribution`|required; must be `equalDistribution`| +|`affinityConfig`|[`AffinityConfig`](#affinityconfig) object|null (no affinity)| +|`taskLimits`|[`TaskLimits`](#tasklimits) object|null (no limits)| + +###### `equalDistributionWithCategorySpec` + +This strategy is a variant of `equalDistribution`, which supports `workerCategorySpec` field rather than `affinityConfig`. +By specifying `workerCategorySpec`, you can assign tasks to run on different categories of Middle Managers based on the **type** and **dataSource** of the task. +This strategy doesn't work with `AutoScaler` since the behavior is undefined. + +|Property|Description|Default| +|--------|-----------|-------| +|`type`|`equalDistributionWithCategorySpec`|required; must be `equalDistributionWithCategorySpec`| +|`workerCategorySpec`|[`WorkerCategorySpec`](#workercategoryspec) object|null (no worker category spec)| +|`taskLimits`|[`TaskLimits`](#tasklimits) object|null (no limits)| + +The following example shows tasks of type `index_kafka` that default to running on Middle Managers of category `c1`, except for tasks that write to datasource `ds1`, which run on Middle Managers of category `c2`. + +```json +{ + "selectStrategy": { + "type": "equalDistributionWithCategorySpec", + "workerCategorySpec": { + "strong": false, + "categoryMap": { + "index_kafka": { + "defaultCategory": "c1", + "categoryAffinity": { + "ds1": "c2" + } + } + } + } + } +} +``` + +###### `fillCapacity` + +Tasks are assigned to the worker with the most currently-running tasks. This is +useful when you are auto-scaling Middle Managers since it tends to pack some full and +leave others empty. The empty ones can be safely terminated. + +Note that if `druid.indexer.runner.pendingTasksRunnerNumThreads` is set to _N_ > 1, then this strategy will fill _N_ +Middle Managers up to capacity simultaneously, rather than a single Middle Manager. + +|Property|Description|Default| +|--------|-----------|-------| +|`type`| `fillCapacity`|required; must be `fillCapacity`| +|`affinityConfig`| [`AffinityConfig`](#affinityconfig) object |null (no affinity)| +|`taskLimits`|[`TaskLimits`](#tasklimits) object|null (no limits)| + +###### `fillCapacityWithCategorySpec` + +This strategy is a variant of `fillCapacity`, which supports `workerCategorySpec` instead of an `affinityConfig`. +The usage is the same as `equalDistributionWithCategorySpec` strategy. +This strategy doesn't work with `AutoScaler` since the behavior is undefined. + +|Property|Description|Default| +|--------|-----------|-------| +|`type`|`fillCapacityWithCategorySpec`.|required; must be `fillCapacityWithCategorySpec`| +|`workerCategorySpec`|[`WorkerCategorySpec`](#workercategoryspec) object|null (no worker category spec)| +|`taskLimits`|[`TaskLimits`](#tasklimits) object|null (no limits)| + +
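+All four strategies above also accept the optional `taskLimits` field, described under [`taskLimits`](#tasklimits) below. As a minimal sketch based on that description, the following dynamic configuration combines `equalDistribution` with limits that cap parallel indexing tasks at 3 task slots and query controllers at 25% of the total task slots:
+
+```json
+{
+  "selectStrategy": {
+    "type": "equalDistribution",
+    "taskLimits": {
+      "maxSlotCountByType": {
+        "index_parallel": 3
+      },
+      "maxSlotRatioByType": {
+        "query_controller": 0.25
+      }
+    }
+  }
+}
+```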
+
+###### `javascript`
+
+Allows defining arbitrary logic for selecting workers to run tasks using a JavaScript function.
+The function is passed the `remoteTaskRunnerConfig`, a map of `workerId` to available workers, and the task to be executed. It returns the `workerId` on which the task should be run, or null if the task cannot be run.
+It can be used for rapid development of missing features where the worker selection logic is to be changed or tuned often.
+If the selection logic is quite complex and cannot be easily tested in a JavaScript environment,
+it's better to write a Druid extension module that extends the current worker selection strategies written in Java.
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`type`|`javascript`|required; must be `javascript`|
+|`function`|String representing the JavaScript function.| |
+
+The following example shows a function that sends `index_hadoop` tasks to the workers `middleManager1_hostname:8091` and `middleManager2_hostname:8091`, and all other tasks to other available workers.
+
+```json
+{
+  "type":"javascript",
+  "function":"function (config, zkWorkers, task) {\nvar batch_workers = new java.util.ArrayList();\nbatch_workers.add(\"middleManager1_hostname:8091\");\nbatch_workers.add(\"middleManager2_hostname:8091\");\nworkers = zkWorkers.keySet().toArray();\nvar sortedWorkers = new Array()\n;for(var i = 0; i < workers.length; i++){\n sortedWorkers[i] = workers[i];\n}\nArray.prototype.sort.call(sortedWorkers,function(a, b){return zkWorkers.get(b).getCurrCapacityUsed() - zkWorkers.get(a).getCurrCapacityUsed();});\nvar minWorkerVer = config.getMinWorkerVersion();\nfor (var i = 0; i < sortedWorkers.length; i++) {\n var worker = sortedWorkers[i];\n var zkWorker = zkWorkers.get(worker);\n if(zkWorker.canRunTask(task) && zkWorker.isValidVersion(minWorkerVer)){\n if(task.getType() == 'index_hadoop' && batch_workers.contains(worker)){\n return worker;\n } else {\n if(task.getType() != 'index_hadoop' && !batch_workers.contains(worker)){\n return worker;\n }\n }\n }\n}\nreturn null;\n}"
+}
+```
+
+:::info
+ JavaScript-based functionality is disabled by default. Refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
+:::
+
+###### affinityConfig
+
+Use the `affinityConfig` field to pass affinity configuration to the `equalDistribution` and `fillCapacity` strategies.
+If not provided, the default is to have no affinity.
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`affinity`|JSON object mapping a datasource String name to a list of indexing service Middle Manager `host:port` values. Druid doesn't perform DNS resolution, so the 'host' value must match what is configured on the Middle Manager and what the Middle Manager announces itself as (examine the Overlord logs to see what your Middle Manager announces itself as).|`{}`|
+|`strong`|When `true`, tasks for a datasource must be assigned to affinity-mapped Middle Managers. Tasks remain queued until a slot becomes available. When `false`, Druid may assign tasks for a datasource to other Middle Managers when affinity-mapped Middle Managers are unavailable to run queued tasks.|false|
+
+###### workerCategorySpec
+
+You can provide a `workerCategorySpec` to the `equalDistributionWithCategorySpec` and `fillCapacityWithCategorySpec` strategies using the `workerCategorySpec`
+field. If not provided, the default is to not use it at all. 
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`categoryMap`|A JSON map object mapping a task type String name to a [CategoryConfig](#categoryconfig) object, by which you can specify category config for different task types.|`{}`|
+|`strong`|With a weak workerCategorySpec (the default), tasks for a dataSource may be assigned to other Middle Managers if the Middle Managers specified in `categoryMap` are not able to run all pending tasks in the queue for that dataSource. With a strong workerCategorySpec, tasks for a dataSource will only ever be assigned to their specified Middle Managers, and will wait in the pending queue if necessary.|false|
+
+###### `taskLimits`
+
+The `taskLimits` field can be used with the `equalDistribution`, `fillCapacity`, `equalDistributionWithCategorySpec`, and `fillCapacityWithCategorySpec` strategies.
+If you don't provide it, no task limits are applied by default.
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`maxSlotCountByType`|A map where each key is a task type (`String`), and the corresponding value represents the absolute limit on the number of task slots that tasks of this type can occupy. The value is an `Integer` that is greater than or equal to 0. For example, a value of 5 means that tasks of this type can occupy up to 5 task slots in total. If both absolute and ratio limits are specified for the same task type, the effective limit is the smaller of the absolute limit and the limit derived from the corresponding ratio. For example, with `maxSlotCountByType = {"index_parallel": 3, "query_controller": 5}`, parallel indexing tasks can occupy up to 3 task slots, and query controllers can occupy up to 5 task slots.|`{}`|
+|`maxSlotRatioByType`|A map where each key is a task type (`String`), and the corresponding value is a `Double` in the range [0, 1], representing the ratio of task slots that tasks of this type can occupy. This ratio defines the proportion of total task slots a task type can use, calculated as `ratio * totalSlots`. If both absolute and ratio limits are specified for the same task type, the effective limit is the smaller of the absolute limit and the limit derived from the corresponding ratio. For example, with `maxSlotRatioByType = {"index_parallel": 0.5, "query_controller": 0.25}`, parallel indexing tasks can occupy up to 50% of the total task slots, and query controllers can occupy up to 25% of the total task slots.|`{}`|
+
+###### CategoryConfig
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`defaultCategory`|Specifies the default category for a task type.|null|
+|`categoryAffinity`|A JSON map object mapping a datasource String name to a category String name of the Middle Manager. If a category isn't specified for a datasource, the `defaultCategory` is used. If neither a category nor a `defaultCategory` is specified, tasks can run on any available Middle Managers.|null|
+
+##### Autoscaler
+
+Amazon's EC2 and Google's GCE are currently the only supported autoscalers. 
+ +EC2's autoscaler properties are: + +|Property| Description|Default| +|--------|------------|-------| +|`type`|`ec2`|0| +|`minNumWorkers`| The minimum number of workers that can be in the cluster at any given time.|0| +|`maxNumWorkers`| The maximum number of workers that can be in the cluster at any given time.|0| +|`envConfig.availabilityZone` | What Amazon availability zone to run in.|none| +|`envConfig.nodeData`| A JSON object that describes how to launch new nodes.|none; required| +| `envConfig.userData`| A JSON object that describes how to configure new nodes. If you have set `druid.indexer.autoscale.workerVersion`, this must have a `versionReplacementString`. Otherwise, a `versionReplacementString` is not necessary.|none; optional| + +For GCE's properties, please refer to the [gce-extensions](../development/extensions-contrib/gce-extensions.md). + +## Data server + +This section contains the configuration options for the services that reside on Data servers (Middle Managers/Peons and Historicals) in the suggested [three-server configuration](../design/architecture.md#druid-servers). + +Configuration options for the [Indexer process](../design/indexer.md) are also provided here. + +### Middle Manager and Peon + +These Middle Manager and Peon configurations can be defined in the `middleManager/runtime.properties` file. + +#### Middle Manager service config + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.host`|The host for the current service. This is used to advertise the current service location as reachable from another service and should generally be specified such that `http://${druid.host}/` could actually talk to this service|`InetAddress.getLocalHost().getCanonicalHostName()`| +|`druid.bindOnHost`|Indicating whether the service's internal jetty server bind on `druid.host`. Default is false, which means binding to all interfaces.|false| +|`druid.plaintextPort`|This is the port to actually listen on; unless port mapping is used, this will be the same port as is on `druid.host`|8091| +|`druid.tlsPort`|TLS port for HTTPS connector, if [druid.enableTlsPort](../operations/tls-support.md) is set then this config will be used. If `druid.host` contains port then that port will be ignored. This should be a non-negative Integer.|8291| +|`druid.service`|The name of the service. This is used as a dimension when emitting metrics and alerts to differentiate between the various services|`druid/middlemanager`| +|`druid.labels`|Optional JSON object of key-value pairs that define custom labels for the server. These labels are displayed in the web console under the "Services" tab. Example: `druid.labels={"location":"Airtrunk"}` or `druid.labels.location=Airtrunk`|`null`| + +#### Middle Manager configuration + +Middle Managers pass their configurations down to their child peons. 
The Middle Manager requires the following configs: + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.indexer.runner.allowedPrefixes`|Whitelist of prefixes for configs that can be passed down to child peons.|`com.metamx`, `druid`, `org.apache.druid`, `user.timezone`, `file.encoding`, `java.io.tmpdir`, `hadoop`| +|`druid.indexer.runner.compressZnodes`|Indicates whether or not the Middle Managers should compress Znodes.|true| +|`druid.indexer.runner.classpath`|Java classpath for the peon.|`System.getProperty("java.class.path")`| +|`druid.indexer.runner.javaCommand`|Command required to execute java.|java| +|`druid.indexer.runner.javaOpts`|_DEPRECATED_ A string of -X Java options to pass to the peon's JVM. Quotable parameters or parameters with spaces are encouraged to use javaOptsArray|`''`| +|`druid.indexer.runner.javaOptsArray`|A JSON array of strings to be passed in as options to the peon's JVM. This is additive to `druid.indexer.runner.javaOpts` and is recommended for properly handling arguments which contain quotes or spaces like `["-XX:OnOutOfMemoryError=kill -9 %p"]`|`[]`| +|`druid.indexer.runner.maxZnodeBytes`|The maximum size Znode in bytes that can be created in ZooKeeper, should be in the range of [10KiB, 2GiB). [Human-readable format](human-readable-byte.md) is supported.|512KiB| +|`druid.indexer.runner.startPort`|Starting port used for Peon services, should be greater than 1023 and less than 65536.|8100| +|`druid.indexer.runner.endPort`|Ending port used for Peon services, should be greater than or equal to `druid.indexer.runner.startPort` and less than 65536.|65535| +|`druid.indexer.runner.ports`|A JSON array of integers to specify ports that used for Peon services. If provided and non-empty, ports for Peon services will be chosen from these ports. And `druid.indexer.runner.startPort/druid.indexer.runner.endPort` will be completely ignored.|`[]`| +|`druid.worker.ip`|The IP of the worker.|`localhost`| +|`druid.worker.version`|Version identifier for the Middle Manager. The version number is a string. This affects the expected behavior during certain operations like comparison against `druid.indexer.runner.minWorkerVersion`. Specifically, the version comparison follows dictionary order. Use ISO8601 date format for the version to accommodate date comparisons.|0| +|`druid.worker.capacity`|Maximum number of tasks the Middle Manager can accept.|Number of CPUs on the machine - 1| +|`druid.worker.baseTaskDirs`|List of base temporary working directories, one of which is assigned per task in a round-robin fashion. This property can be used to allow usage of multiple disks for indexing. This property is recommended in place of and takes precedence over `${druid.indexer.task.baseTaskDir}`. If this configuration is not set, `${druid.indexer.task.baseTaskDir}` is used. For example, `druid.worker.baseTaskDirs=[\"PATH1\",\"PATH2\",...]`.|null| +|`druid.worker.baseTaskDirSize`|The total amount of bytes that can be used by tasks on any single task dir. This value is treated symmetrically across all directories, that is, if this is 500 GB and there are 3 `baseTaskDirs`, then each of those task directories is assumed to allow for 500 GB to be used and a total of 1.5 TB will potentially be available across all tasks. 
The actual amount of memory assigned to each task is discussed in [Configuring task storage sizes](../ingestion/tasks.md#configuring-task-storage-sizes)|`Long.MAX_VALUE`| +|`druid.worker.category`|A string to name the category that the Middle Manager node belongs to.|`_default_worker_category`| +|`druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled`| This config should be set when [Centralized Datasource Schema](#centralized-datasource-schema-experimental) feature is enabled. |false| + +#### Peon processing + +Processing properties set on the Middle Manager are passed through to Peons. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.processing.buffer.sizeBytes`|This specifies a buffer size (less than 2GiB) for the storage of intermediate results. The computation engine in both the Historical and Realtime processes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed. [Human-readable format](human-readable-byte.md) is supported.|auto (max 1 GiB)| +|`druid.processing.buffer.poolCacheMaxCount`|Processing buffer pool caches the buffers for later use. This is the maximum count that the cache will grow to. Note that pool can create more buffers than it can cache if necessary.|`Integer.MAX_VALUE`| +|`druid.processing.formatString`|Realtime and Historical processes use this format string to name their processing threads.|processing-%s| +|`druid.processing.numMergeBuffers`|The number of direct memory buffers available for merging query results. The buffers are sized by `druid.processing.buffer.sizeBytes`. This property is effectively a concurrency limit for queries that require merging buffers. If you are using any queries that require merge buffers (currently, just groupBy) then you should have at least two of these.|`max(2, druid.processing.numThreads / 4)`| +|`druid.processing.numThreads`|The number of processing threads to have available for parallel processing of segments. Our rule of thumb is `num_cores - 1`, which means that even under heavy load there will still be one core available to do background tasks like talking with ZooKeeper and pulling down segments. If only one core is available, this property defaults to the value `1`.|Number of cores - 1 (or 1)| +|`druid.processing.numTimeoutThreads`|The number of processing threads to have available for handling per-segment query timeouts. Setting this value to `0` removes the ability to service per-segment timeouts, irrespective of `perSegmentTimeout` query context parameter. As these threads are just servicing timers, it's recommended to set this value to some small percent (e.g. 5%) of the total query processing cores available to the peon.|0| +|`druid.processing.fifo`|Enables the processing queue to treat tasks of equal priority in a FIFO manner.|`true`| +|`druid.processing.tmpDir`|Path where temporary files created while processing a query should be stored. If specified, this configuration takes priority over the default `java.io.tmpdir` path.|path represented by `java.io.tmpdir`| +|`druid.processing.intermediaryData.storage.type`|Storage type for intermediary segments of data shuffle between native parallel index tasks.
Set to `local` to store segment files in the local storage of the Middle Manager or Indexer.
Set to `deepstore` to use configured deep storage for better fault tolerance during rolling updates. When the storage type is `deepstore`, Druid stores the data in the `shuffle-data` directory under the configured deep storage path. Druid does not support automated cleanup for the `shuffle-data` directory. You can set up cloud storage lifecycle rules for automated cleanup of data at the `shuffle-data` prefix location.|`local`| + +The amount of direct memory needed by Druid is at least +`druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)`. You can +ensure at least this amount of direct memory is available by providing `-XX:MaxDirectMemorySize=` in +`druid.indexer.runner.javaOptsArray` as documented above. + +#### Peon query configuration + +See [general query configuration](#general-query-configuration). + +#### Peon caching + +You can optionally configure caching to be enabled on the peons by setting caching configs here. + +|Property|Possible Values|Description|Default| +|--------|---------------|-----------|-------| +|`druid.realtime.cache.useCache`|true, false|Enable the cache on the realtime.|false| +|`druid.realtime.cache.populateCache`|true, false|Populate the cache on the realtime.|false| +|`druid.realtime.cache.unCacheable`|All druid query types|All query types to not cache.|`[scan]`| +|`druid.realtime.cache.maxEntrySize`|positive integer|Maximum cache entry size in bytes.|1_000_000| + +See [cache configuration](#cache-configuration) for how to configure cache settings. + +#### Additional Peon configuration + +Although Peons inherit the configurations of their parent Middle Managers, explicit child Peon configs in Middle Manager can be set by prefixing them with: + +```properties +druid.indexer.fork.property +``` + +Additional Peon configs include: + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.peon.mode`|One of `local` or `remote`. Setting this property to `local` means you intend to run the Peon as a standalone process which is not recommended.|`remote`| +|`druid.indexer.task.baseDir`|Base temporary working directory.|`System.getProperty("java.io.tmpdir")`| +|`druid.indexer.task.baseTaskDir`|Base temporary working directory for tasks.|`${druid.indexer.task.baseDir}/persistent/task`| +|`druid.indexer.task.defaultHadoopCoordinates`|Hadoop version to use with HadoopIndexTasks that do not request a particular version.|`org.apache.hadoop:hadoop-client-api:3.3.6`, `org.apache.hadoop:hadoop-client-runtime:3.3.6`| +|`druid.indexer.task.defaultRowFlushBoundary`|Highest row count before persisting to disk. Used for indexing generating tasks.|75000| +|`druid.indexer.task.directoryLockTimeout`|Wait this long for zombie Peons to exit before giving up on their replacements.|PT10M| +|`druid.indexer.task.gracefulShutdownTimeout`|Wait this long on Middle Manager restart for restorable tasks to gracefully exit.|PT5M| +|`druid.indexer.task.hadoopWorkingPath`|Temporary working directory for Hadoop tasks.|`/tmp/druid-indexing`| +|`druid.indexer.task.restoreTasksOnRestart`|If true, Middle Managers will attempt to stop tasks gracefully on shutdown and restore them on restart.|false| +|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/input-sources.md) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. 
This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false| +|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the [`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec). If you use string-based schemaless ingestion and don't specify any dimensions to ingest, you must also set [`includeAllDimensions`](../ingestion/ingestion-spec.md#dimensionsspec) for Druid to store empty columns.

If you set `storeEmptyColumns` to false, Druid SQL queries referencing empty columns will fail. If you intend to leave `storeEmptyColumns` disabled, you should either ingest placeholder data for empty columns or else not query on empty columns.

You can overwrite this configuration by setting `storeEmptyColumns` in the [task context](../ingestion/tasks.md#context-parameters).|true| +|`druid.indexer.task.tmpStorageBytesPerTask`|Maximum number of bytes per task to be used to store temporary files on disk. This config is generally intended for internal usage. Attempts to set it are very likely to be overwritten by the TaskRunner that executes the task, so be sure of what you expect to happen before directly adjusting this configuration parameter. The config is documented here primarily to provide an understanding of what it means if/when someone sees that it has been set. A value of -1 disables this limit. |-1| +|`druid.indexer.task.allowHadoopTaskExecution`|Determines whether the cluster allows `index_hadoop` tasks to be executed. `index_hadoop` is deprecated, and the default of false forces cluster operators to acknowledge the deprecation and consciously opt in to using `index_hadoop` with the understanding that it will be removed in the future.|false| +|`druid.indexer.server.maxChatRequests`|Maximum number of concurrent requests served by a task's chat handler. Set to 0 to disable limiting.|0| + +If the Peon is running in remote mode, there must be an Overlord up and running. Peons in remote mode can set the following configurations: + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.peon.taskActionClient.retry.minWait`|The minimum retry time to communicate with Overlord.|`PT5S`| +|`druid.peon.taskActionClient.retry.maxWait`|The maximum retry time to communicate with Overlord.|`PT1M`| +|`druid.peon.taskActionClient.retry.maxRetryCount`|The maximum number of retries to communicate with Overlord.|13 (about 10 minutes of retrying)| + +##### SegmentWriteOutMediumFactory + +When new segments are created, Druid temporarily stores some preprocessed data in buffers. +The following types of medium exist for the buffers: + +* **Temporary files** (`tmpFile`) are stored under the task working directory (see the `druid.worker.baseTaskDirs` configuration above) and thus share its mounting properties. For example, they could be backed by HDD, SSD, or memory (tmpfs). +This type of medium may do unnecessary disk I/O and requires some disk space to be available. + +* **Off-heap memory** (`offHeapMemory`) creates buffers in off-heap memory of the JVM process that is running a task. +This type of medium is preferred, but it may require you to allow the JVM to have more off-heap memory by changing the `-XX:MaxDirectMemorySize` configuration. It's not yet well understood how the required off-heap memory size relates to the size of the segments being created, but the extra off-heap memory you add shouldn't exceed the configured maximum _heap_ size (`-Xmx`) for the same JVM. + +* **On-heap memory** (`onHeapMemory`) creates buffers using the allocated heap memory of the JVM process running a task. Using on-heap memory introduces garbage collection overhead and so is not recommended in most cases. This type of medium is most helpful for tasks run on external clusters where it may be difficult to allocate and work with direct memory effectively. 
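+
+For example, if you plan to rely on the `offHeapMemory` medium described above, you would typically pair the default medium type (see the `druid.peon.defaultSegmentWriteOutMediumFactory.type` property in the table that follows) with a larger direct memory allowance passed to Peons through `druid.indexer.runner.javaOptsArray`. The following `middleManager/runtime.properties` sketch is illustrative only; the sizes are placeholders, not tuning recommendations.
+
+```properties
+# Illustrative only: use off-heap write-out buffers for segment creation
+# and raise the Peon direct memory cap to leave room for them.
+druid.peon.defaultSegmentWriteOutMediumFactory.type=offHeapMemory
+druid.indexer.runner.javaOptsArray=["-Xmx1g", "-XX:MaxDirectMemorySize=2g"]
+```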
+ +For most types of tasks, `SegmentWriteOutMediumFactory` can be configured per-task (see [Tasks](../ingestion/tasks.md) for more information), but if it's not specified for a task, or it's not supported for a particular task type, then Druid uses the value from the following configuration: + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.peon.defaultSegmentWriteOutMediumFactory.type`|`tmpFile`, `offHeapMemory`, or `onHeapMemory`|`tmpFile`| + +### Indexer + +#### Indexer process configuration + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.host`|The host for the current process. This is used to advertise the current processes location as reachable from another process and should generally be specified such that `http://${druid.host}/` could actually talk to this process|`InetAddress.getLocalHost().getCanonicalHostName()`| +|`druid.bindOnHost`|Indicating whether the process's internal jetty server bind on `druid.host`. Default is false, which means binding to all interfaces.|false| +|`druid.plaintextPort`|This is the port to actually listen on; unless port mapping is used, this will be the same port as is on `druid.host`|8091| +|`druid.tlsPort`|TLS port for HTTPS connector, if [druid.enableTlsPort](../operations/tls-support.md) is set then this config will be used. If `druid.host` contains port then that port will be ignored. This should be a non-negative Integer.|8283| +|`druid.service`|The name of the service. This is used as a dimension when emitting metrics and alerts to differentiate between the various services|`druid/indexer`| +|`druid.labels`|Optional JSON object of key-value pairs that define custom labels for the server. These labels are displayed in the web console under the "Services" tab. Example: `druid.labels={"location":"Airtrunk"}` or `druid.labels.location=Airtrunk`|`null`| + +#### Indexer general configuration + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.worker.version`|Version identifier for the Indexer.|0| +|`druid.worker.capacity`|Maximum number of tasks the Indexer can accept.|Number of available processors - 1| +|`druid.worker.baseTaskDirs`|List of base temporary working directories, one of which is assigned per task in a round-robin fashion. This property can be used to allow usage of multiple disks for indexing. This property is recommended in place of and takes precedence over `${druid.indexer.task.baseTaskDir}`. If this configuration is not set, `${druid.indexer.task.baseTaskDir}` is used. Example: `druid.worker.baseTaskDirs=[\"PATH1\",\"PATH2\",...]`.|null| +|`druid.worker.baseTaskDirSize`|The total amount of bytes that can be used by tasks on any single task dir. This value is treated symmetrically across all directories, that is, if this is 500 GB and there are 3 `baseTaskDirs`, then each of those task directories is assumed to allow for 500 GB to be used and a total of 1.5 TB will potentially be available across all tasks. The actual amount of memory assigned to each task is discussed in [Configuring task storage sizes](../ingestion/tasks.md#configuring-task-storage-sizes)|`Long.MAX_VALUE`| +|`druid.worker.globalIngestionHeapLimitBytes`|Total amount of heap available for ingestion processing. 
This is applied by automatically setting the `maxBytesInMemory` property on tasks.|Configured max JVM heap size / 6| +|`druid.worker.numConcurrentMerges`|Maximum number of segment persist or merge operations that can run concurrently across all tasks.|`druid.worker.capacity` / 2, rounded down| +|`druid.indexer.task.baseDir`|Base temporary working directory.|`System.getProperty("java.io.tmpdir")`| +|`druid.indexer.task.baseTaskDir`|Base temporary working directory for tasks.|`${druid.indexer.task.baseDir}/persistent/tasks`| +|`druid.indexer.task.defaultHadoopCoordinates`|Hadoop version to use with HadoopIndexTasks that do not request a particular version.|`org.apache.hadoop:hadoop-client-api:3.3.6`, `org.apache.hadoop:hadoop-client-runtime:3.3.6`| +|`druid.indexer.task.gracefulShutdownTimeout`|Wait this long on Indexer restart for restorable tasks to gracefully exit.|`PT5M`| +|`druid.indexer.task.hadoopWorkingPath`|Temporary working directory for Hadoop tasks.|`/tmp/druid-indexing`| +|`druid.indexer.task.restoreTasksOnRestart`|If true, the Indexer will attempt to stop tasks gracefully on shutdown and restore them on restart.|false| +|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/input-sources.md) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false| +|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the [`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec).

If you set `storeEmptyColumns` to false, Druid SQL queries referencing empty columns will fail. If you intend to leave `storeEmptyColumns` disabled, you should either ingest placeholder data for empty columns or else not query on empty columns.

You can overwrite this configuration by setting `storeEmptyColumns` in the [task context](../ingestion/tasks.md#context-parameters).|true| +|`druid.peon.taskActionClient.retry.minWait`|The minimum retry time to communicate with Overlord.|`PT5S`| +|`druid.peon.taskActionClient.retry.maxWait`|The maximum retry time to communicate with Overlord.|`PT1M`| +|`druid.peon.taskActionClient.retry.maxRetryCount`|The maximum number of retries to communicate with Overlord.|13 (about 10 minutes of retrying)| + +#### Indexer concurrent requests + +Druid uses Jetty to serve HTTP requests. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.server.http.numThreads`|Number of threads for HTTP requests. Please see the [Indexer Server HTTP threads](../design/indexer.md#server-http-threads) documentation for more details on how the Indexer uses this configuration.|max(10, (Number of cores * 17) / 16 + 2) + 30| +|`druid.server.http.queueSize`|Size of the worker queue used by Jetty server to temporarily store incoming client connections. If this value is set and a request is rejected by jetty because queue is full then client would observe request failure with TCP connection being closed immediately with a completely empty response from server.|Unbounded| +|`druid.server.http.maxIdleTime`|The Jetty max idle time for a connection.|`PT5M`| +|`druid.server.http.enableRequestLimit`|If enabled, no requests would be queued in jetty queue and "HTTP 429 Too Many Requests" error response would be sent. |false| +|`druid.server.http.defaultQueryTimeout`|Query timeout in millis, beyond which unfinished queries will be cancelled|300000| +|`druid.server.http.gracefulShutdownTimeout`|The maximum amount of time Jetty waits after receiving shutdown signal. After this timeout the threads will be forcefully shutdown. This allows any queries that are executing to complete(Only values greater than zero are valid).|`PT30S`| +|`druid.server.http.unannouncePropagationDelay`|How long to wait for ZooKeeper unannouncements to propagate before shutting down Jetty. This is a minimum and `druid.server.http.gracefulShutdownTimeout` does not start counting down until after this period elapses.|`PT0S` (do not wait)| +|`druid.server.http.maxQueryTimeout`|Maximum allowed value (in milliseconds) for `timeout` parameter. See [query-context](../querying/query-context-reference.md) to know more about `timeout`. Query is rejected if the query context `timeout` is greater than this value. |`Long.MAX_VALUE`| +|`druid.server.http.maxRequestHeaderSize`|Maximum size of a request header in bytes. Larger headers consume more memory and can make a server more vulnerable to denial of service attacks.|8 * 1024| +|`druid.server.http.enableForwardedRequestCustomizer`|If enabled, adds Jetty ForwardedRequestCustomizer which reads X-Forwarded-* request headers to manipulate servlet request object when Druid is used behind a proxy.|false| +|`druid.server.http.allowedHttpMethods`|List of HTTP methods that should be allowed in addition to the ones required by Druid APIs. Druid APIs require GET, PUT, POST, and DELETE, which are always allowed. This option is not useful unless you have installed an extension that needs these additional HTTP methods or that adds functionality related to CORS. None of Druid's bundled extensions require these methods.|`[]`| +|`druid.server.http.contentSecurityPolicy`|Content-Security-Policy header value to set on each non-POST response. 
Setting this property to an empty string, or omitting it, both result in the default `frame-ancestors: none` being set.|`frame-ancestors 'none'`| +|`druid.server.http.uriCompliance`|Jetty `UriCompliance` mode for Druid's embedded Jetty servers. To modify, override this config with the string representation of any `UriCompliance` mode that [Jetty supports](https://javadoc.jetty.org/jetty-12/org/eclipse/jetty/http/UriCompliance.html).|LEGACY| +|`druid.server.http.enforceStrictSNIHostChecking`| If enabled, the Jetty server will enforce strict SNI host checking. This means that if a client connects to the server using TLS but does not provide an SNI hostname, or provides an SNI hostname that does not match the server's configured hostname, a request will get a 400 response. Setting this to false is not recommended in production.|true| + +#### Indexer processing resources + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.processing.buffer.sizeBytes`|This specifies a buffer size (less than 2GiB) for the storage of intermediate results. The computation engine in the Indexer processes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed. [Human-readable format](human-readable-byte.md) is supported.|auto (max 1GiB)| +|`druid.processing.buffer.poolCacheMaxCount`|processing buffer pool caches the buffers for later use, this is the maximum count cache will grow to. note that pool can create more buffers than it can cache if necessary.|`Integer.MAX_VALUE`| +|`druid.processing.formatString`|Indexer processes use this format string to name their processing threads.|processing-%s| +|`druid.processing.numMergeBuffers`|The number of direct memory buffers available for merging query results. The buffers are sized by `druid.processing.buffer.sizeBytes`. This property is effectively a concurrency limit for queries that require merging buffers. If you are using any queries that require merge buffers (currently, just groupBy) then you should have at least two of these.|`max(2, druid.processing.numThreads / 4)`| +|`druid.processing.numThreads`|The number of processing threads to have available for parallel processing of segments. Our rule of thumb is `num_cores - 1`, which means that even under heavy load there will still be one core available to do background tasks like talking with ZooKeeper and pulling down segments. If only one core is available, this property defaults to the value `1`.|Number of cores - 1 (or 1)| +|`druid.processing.numTimeoutThreads`|The number of processing threads to have available for handling per-segment query timeouts. Setting this value to `0` removes the ability to service per-segment timeouts, irrespective of `perSegmentTimeout` query context parameter. As these threads are just servicing timers, it's recommended to set this value to some small percent (e.g. 5%) of the total query processing cores available to the indexer.|0| +|`druid.processing.fifo`|If the processing queue should treat tasks of equal priority in a FIFO manner|`true`| +|`druid.processing.tmpDir`|Path where temporary files created while processing a query should be stored. 
If specified, this configuration takes priority over the default `java.io.tmpdir` path.|path represented by `java.io.tmpdir`| + +The amount of direct memory needed by Druid is at least +`druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)`. You can +ensure at least this amount of direct memory is available by providing `-XX:MaxDirectMemorySize=` at the command +line. + +#### Query configurations + +See [general query configuration](#general-query-configuration). + +#### Indexer caching + +You can optionally configure caching to be enabled on the Indexer by setting caching configs here. + +|Property|Possible Values|Description|Default| +|--------|---------------|-----------|-------| +|`druid.realtime.cache.useCache`|true, false|Enable the cache on the realtime.|false| +|`druid.realtime.cache.populateCache`|true, false|Populate the cache on the realtime.|false| +|`druid.realtime.cache.unCacheable`|All druid query types|All query types to not cache.|`[scan]`| +|`druid.realtime.cache.maxEntrySize`|positive integer|Maximum cache entry size in bytes.|1_000_000| + +See [cache configuration](#cache-configuration) for how to configure cache settings. + +Note that only local caches such as the `local`-type cache and `caffeine` cache are supported. If a remote cache such as `memcached` is used, it will be ignored. + +### Historical + +For general Historical service information, see [Historical](../design/historical.md). + +These Historical configurations can be defined in the `historical/runtime.properties` file. + +#### Historical service configuration + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.host`|The host for the current service. This is used to advertise the current service location as reachable from another service and should generally be specified such that `http://${druid.host}/` could actually talk to this service|`InetAddress.getLocalHost().getCanonicalHostName()`| +|`druid.bindOnHost`|Indicating whether the service's internal jetty server bind on `druid.host`. Default is false, which means binding to all interfaces.|false| +|`druid.plaintextPort`|This is the port to actually listen on; unless port mapping is used, this will be the same port as is on `druid.host`|8083| +|`druid.tlsPort`|TLS port for HTTPS connector, if [druid.enableTlsPort](../operations/tls-support.md) is set then this config will be used. If `druid.host` contains port then that port will be ignored. This should be a non-negative Integer.|8283| +|`druid.service`|The name of the service. This is used as a dimension when emitting metrics and alerts to differentiate between the various services|`druid/historical`| +|`druid.labels`|Optional JSON object of key-value pairs that define custom labels for the server. These labels are displayed in the web console under the "Services" tab. Example: `druid.labels={"location":"Airtrunk"}` or `druid.labels.location=Airtrunk`|`null`| + +#### Historical general configuration + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.server.maxSize`|The maximum number of bytes-worth of segments that the service wants assigned to it. The Coordinator service will attempt to assign segments to a Historical service only if this property is greater than the total size of segments served by it. 
Since this property defines the upper limit on the total segment size that can be assigned to a Historical, it defaults to the sum of all `maxSize` values specified within the `druid.segmentCache.locations` property. Human-readable format is supported, see [here](human-readable-byte.md). |Sum of `maxSize` values defined within `druid.segmentCache.locations`| +|`druid.server.tier`| A string to name the distribution tier that the storage service belongs to. Many of the [rules Coordinator services use](../operations/rule-configuration.md) to manage segments can be keyed on tiers. | `_default_tier` | +|`druid.server.priority`|In a tiered architecture, the priority of the tier, thus allowing control over which services are queried. Higher numbers mean higher priority. The default (no priority) works for architectures with no cross replication (tiers that have no data-storage overlap). Data centers typically have equal priority. | 0 | + +#### Storing segments + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.segmentCache.locations`|Segments assigned to a Historical service are first stored on the local file system (in a disk cache) and then served by the Historical service. These locations define where that local cache resides. This value cannot be NULL or EMPTY. Here is an example: `druid.segmentCache.locations=[{"path": "/mnt/druidSegments", "maxSize": "10k", "freeSpacePercent": 1.0}]`. `freeSpacePercent` is optional; if provided, it enforces that much free disk partition space while storing segments. It depends on the `File.getTotalSpace()` and `File.getFreeSpace()` methods, so enable it only if they work for your file system.| none | +|`druid.segmentCache.locationSelector.strategy`|The strategy used to select a location from the configured `druid.segmentCache.locations` for segment distribution. Possible values are `leastBytesUsed`, `roundRobin`, `random`, or `mostAvailableSize`. |leastBytesUsed| +|`druid.segmentCache.deleteOnRemove`|Delete segment files from cache once a service is no longer serving a segment.|true| +|`druid.segmentCache.dropSegmentDelayMillis`|How long a service delays before completely dropping a segment.|30000 (30 seconds)| +|`druid.segmentCache.infoDir`|Historical services keep track of the segments they are serving so that when the service is restarted they can reload the same segments without waiting for the Coordinator to reassign. This path defines where this metadata is kept. The directory will be created if needed.|`${first_location}/info_dir`| +|`druid.segmentCache.announceIntervalMillis`|How frequently to announce segments while segments are loading from cache. Set this value to zero to wait for all segments to be loaded before announcing.|5000 (5 seconds)| +|`druid.segmentCache.numLoadingThreads`|How many segments to drop or load concurrently from deep storage. Note that the work of loading segments involves downloading segments from deep storage, decompressing them, and loading them to a memory-mapped location, so the work is not all I/O bound. Depending on CPU and network load, you could possibly increase this config to a higher value.|max(1,Number of cores / 6)| +|`druid.segmentCache.numBootstrapThreads`|How many segments to load concurrently during Historical startup.|`druid.segmentCache.numLoadingThreads`| +|`druid.segmentCache.lazyLoadOnStart`|Whether or not to load segment columns metadata lazily during Historical startup. 
When set to true, Historical startup time will be dramatically improved by deferring segment loading until the first time that segment takes part in a query, which will incur this cost instead.|false| +|`druid.segmentCache.numThreadsToLoadSegmentsIntoPageCacheOnDownload`|Number of threads to asynchronously read segment index files into null output stream on each new segment download after the Historical service finishes bootstrapping. Recommended to set to 1 or 2 or leave unspecified to disable. See also `druid.segmentCache.numThreadsToLoadSegmentsIntoPageCacheOnBootstrap`|0| +|`druid.segmentCache.numThreadsToLoadSegmentsIntoPageCacheOnBootstrap`|Number of threads to asynchronously read segment index files into null output stream during Historical service bootstrap. This thread pool is terminated after Historical service finishes bootstrapping. Recommended to set to half of available cores. If left unspecified, `druid.segmentCache.numThreadsToLoadSegmentsIntoPageCacheOnDownload` will be used. If both configs are unspecified, this feature is disabled. Preemptively loading segments into page cache helps in the sense that later when a segment is queried, it's already in page cache and only a minor page fault needs to be triggered instead of a more costly major page fault to make the query latency more consistent. Note that loading segment into page cache just does a blind loading of segment index files and will evict any existing segments from page cache at the discretion of operating system when the total segment size on local disk is larger than the page cache usable in the RAM, which roughly equals to total available RAM in the host - druid process memory including both heap and direct memory allocated - memory used by other non druid processes on the host, so it is the user's responsibility to ensure the host has enough RAM to host all the segments to avoid random evictions to fully leverage this feature.|`druid.segmentCache.numThreadsToLoadSegmentsIntoPageCacheOnDownload`| + +In `druid.segmentCache.locations`, `freeSpacePercent` was added because the `maxSize` setting is only a theoretical limit and assumes that much space will always be available for storing segments. In case of any druid bug leading to unaccounted segment files left alone on disk or some other service writing stuff to disk, This check can start failing segment loading early before filling up the disk completely and leaving the host usable otherwise. + +In `druid.segmentCache.locationSelector.strategy`, one of `leastBytesUsed`, `roundRobin`, `random`, or `mostAvailableSize` could be specified to represent the strategy to distribute segments across multiple segment cache locations. + +|Strategy|Description| +|--------|-----------| +|`leastBytesUsed`|Selects a location which has least bytes used in absolute terms.| +|`roundRobin`|Selects a location in a round robin fashion oblivious to the bytes used or the capacity.| +|`random`|Selects a segment cache location randomly each time among the available storage locations.| +|`mostAvailableSize`|Selects a segment cache location that has most free space among the available storage locations.| + +Note that if `druid.segmentCache.numLoadingThreads` > 1, multiple threads can download different segments at the same time. In this case, with the `leastBytesUsed` strategy or `mostAvailableSize` strategy, Historicals may select a sub-optimal storage location because each decision is based on a snapshot of the storage location status of when a segment is requested to download. 
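+
+As an illustration, the following `historical/runtime.properties` sketch spreads the segment cache across two disks, keeps a free-space guard on each, and distributes segments round-robin. The paths, sizes, and percentages are placeholders, not recommendations.
+
+```properties
+# Illustrative only: two cache locations with a 5% free-space guard on each,
+# selected in round-robin order rather than by least bytes used.
+druid.segmentCache.locations=[{"path": "/disk1/druid/segments", "maxSize": "300g", "freeSpacePercent": 5.0}, {"path": "/disk2/druid/segments", "maxSize": "300g", "freeSpacePercent": 5.0}]
+druid.segmentCache.locationSelector.strategy=roundRobin
+```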
+ +#### Historical query configs + +##### Concurrent requests + +Druid uses Jetty to serve HTTP requests. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.server.http.numThreads`|Number of threads for HTTP requests.|max(10, (Number of cores * 17) / 16 + 2) + 30| +|`druid.server.http.queueSize`|Size of the worker queue used by Jetty server to temporarily store incoming client connections. If this value is set and a request is rejected by jetty because queue is full then client would observe request failure with TCP connection being closed immediately with a completely empty response from server.|Unbounded| +|`druid.server.http.maxIdleTime`|The Jetty max idle time for a connection.|`PT5M`| +|`druid.server.http.enableRequestLimit`|If enabled, no requests would be queued in jetty queue and "HTTP 429 Too Many Requests" error response would be sent. |false| +|`druid.server.http.defaultQueryTimeout`|Query timeout in millis, beyond which unfinished queries will be cancelled|300000| +|`druid.server.http.gracefulShutdownTimeout`|The maximum amount of time Jetty waits after receiving shutdown signal. After this timeout the threads will be forcefully shutdown. This allows any queries that are executing to complete(Only values greater than zero are valid).|`PT30S`| +|`druid.server.http.unannouncePropagationDelay`|How long to wait for ZooKeeper unannouncements to propagate before shutting down Jetty. This is a minimum and `druid.server.http.gracefulShutdownTimeout` does not start counting down until after this period elapses.|`PT0S` (do not wait)| +|`druid.server.http.maxQueryTimeout`|Maximum allowed value (in milliseconds) for `timeout` parameter. See [query-context](../querying/query-context-reference.md) to know more about `timeout`. Query is rejected if the query context `timeout` is greater than this value. |`Long.MAX_VALUE`| +|`druid.server.http.maxRequestHeaderSize`|Maximum size of a request header in bytes. Larger headers consume more memory and can make a server more vulnerable to denial of service attacks.|8 * 1024| +|`druid.server.http.contentSecurityPolicy`|Content-Security-Policy header value to set on each non-POST response. Setting this property to an empty string, or omitting it, both result in the default `frame-ancestors: none` being set.|`frame-ancestors 'none'`| + +##### Processing + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.processing.buffer.sizeBytes`|This specifies a buffer size (less than 2GiB), for the storage of intermediate results. The computation engine in both the Historical and Realtime processes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed. [Human-readable format](human-readable-byte.md) is supported.|auto (max 1GiB)| +|`druid.processing.buffer.poolCacheMaxCount`|processing buffer pool caches the buffers for later use, this is the maximum count cache will grow to. note that pool can create more buffers than it can cache if necessary.|`Integer.MAX_VALUE`| +|`druid.processing.formatString`|Realtime and Historical processes use this format string to name their processing threads.|processing-%s| +|`druid.processing.numMergeBuffers`|The number of direct memory buffers available for merging query results. The buffers are sized by `druid.processing.buffer.sizeBytes`. 
This property is effectively a concurrency limit for queries that require merging buffers. If you are using any queries that require merge buffers (currently, just groupBy) then you should have at least two of these.|`max(2, druid.processing.numThreads / 4)`| +|`druid.processing.numThreads`|The number of processing threads to have available for parallel processing of segments. Our rule of thumb is `num_cores - 1`, which means that even under heavy load there will still be one core available to do background tasks like talking with ZooKeeper and pulling down segments. If only one core is available, this property defaults to the value `1`.|Number of cores - 1 (or 1)| +|`druid.processing.numTimeoutThreads`|The number of processing threads to have available for handling per-segment query timeouts. Setting this value to `0` removes the ability to service per-segment timeouts, irrespective of `perSegmentTimeout` query context parameter. As these threads are just servicing timers, it's recommended to set this value to some small percent (e.g. 5%) of the total query processing cores available to the historical.|0| +|`druid.processing.fifo`|If the processing queue should treat tasks of equal priority in a FIFO manner|`true`| +|`druid.processing.tmpDir`|Path where temporary files created while processing a query should be stored. If specified, this configuration takes priority over the default `java.io.tmpdir` path.|path represented by `java.io.tmpdir`| + +The amount of direct memory needed by Druid is at least +`druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)`. You can +ensure at least this amount of direct memory is available by providing `-XX:MaxDirectMemorySize=` at the command +line. + +##### Historical query configuration + +See [general query configuration](#general-query-configuration). + +#### Historical caching + +You can optionally only configure caching to be enabled on the Historical by setting caching configs here. + +|Property|Possible Values|Description|Default| +|--------|---------------|-----------|-------| +|`druid.historical.cache.useCache`|true, false|Enable the cache on the Historical.|false| +|`druid.historical.cache.populateCache`|true, false|Populate the cache on the Historical.|false| +|`druid.historical.cache.unCacheable`|All druid query types|All query types to not cache.|`[scan]`| +|`druid.historical.cache.maxEntrySize`|positive integer|Maximum cache entry size in bytes.|1_000_000| + +See [cache configuration](#cache-configuration) for how to configure cache settings. + +## Query server + +This section contains the configuration options for the services that reside on Query servers (Brokers) in the suggested [three-server configuration](../design/architecture.md#druid-servers). + +Configuration options for the [Router process](../design/router.md) are also provided here. + +### Broker + +For general Broker process information, see [here](../design/broker.md). + +These Broker configurations can be defined in the `broker/runtime.properties` file. + +#### Broker process configs + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.host`|The host for the current process. 
This is used to advertise the current processes location as reachable from another process and should generally be specified such that `http://${druid.host}/` could actually talk to this process|`InetAddress.getLocalHost().getCanonicalHostName()`| +|`druid.bindOnHost`|Indicating whether the process's internal jetty server bind on `druid.host`. Default is false, which means binding to all interfaces.|false| +|`druid.plaintextPort`|This is the port to actually listen on; unless port mapping is used, this will be the same port as is on `druid.host`|8082| +|`druid.tlsPort`|TLS port for HTTPS connector, if [druid.enableTlsPort](../operations/tls-support.md) is set then this config will be used. If `druid.host` contains port then that port will be ignored. This should be a non-negative Integer.|8282| +|`druid.service`|The name of the service. This is used as a dimension when emitting metrics and alerts to differentiate between the various services|`druid/broker`| +|`druid.labels`|Optional JSON object of key-value pairs that define custom labels for the server. These labels are displayed in the web console under the "Services" tab. Example: `druid.labels={"location":"Airtrunk"}` or `druid.labels.location=Airtrunk`|`null`| + +#### Query configuration + +##### Query routing + +|Property|Possible Values|Description|Default| +|--------|---------------|-----------|-------| +|`druid.broker.balancer.type`|`random`, `connectionCount`|Determines how the broker balances connections to Historical processes. `random` choose randomly, `connectionCount` picks the process with the fewest number of active connections to|`random`| +|`druid.broker.select.tier`|`highestPriority`, `lowestPriority`, `custom`, `preferred`|If segments are cross-replicated across tiers in a cluster, you can tell the broker to prefer to select segments in a tier with a certain priority.|`highestPriority`| +|`druid.broker.select.tier.custom.priorities`|An array of integer priorities, such as `[-1, 0, 1, 2]`|Select servers in tiers with a custom priority list.|The config only has effect if `druid.broker.select.tier` is set to `custom`. If `druid.broker.select.tier` is set to `custom` but this config is not specified, the effect is the same as `druid.broker.select.tier` set to `highestPriority`. Any of the integers in this config can be ignored if there's no corresponding tiers with such priorities. Tiers with priorities explicitly specified in this config always have higher priority than those not and those not specified fall back to use `highestPriority` strategy among themselves.| +|`druid.broker.select.tier.preferred.tier`| The preferred tier name. E.g., `_default_tier` | A non-empty value that specifies the preferred tier in which historical servers will be picked up for queries. If there are not enough historical servers from the preferred tier, servers from other tiers (if there are any) will be selected. This config only has effect if `druid.broker.select.tier` is set to `preferred` | null | +|`druid.broker.select.tier.preferred.priority`| `highest`, `lowest` | If there are multiple candidates in a preferred tier, specifies the priority to pick up candidates. By default, the higher priority a historical, the higher chances it will be picked up. This config only has effect if `druid.broker.select.tier` is set to `preferred`| `highest` | + +##### Query prioritization and laning + +Laning strategies allow you to control capacity utilization for heterogeneous query workloads. 
With laning, the broker examines and classifies a query for the purpose of assigning it to a lane. Lanes have capacity limits, enforced by the broker, that can be used to ensure sufficient resources are available for other lanes or for interactive queries (with no lane), or to limit overall throughput for queries within the lane. Requests in excess of the capacity are discarded with an HTTP 429 status code. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.query.scheduler.numThreads`|Maximum number of concurrently-running queries. When this parameter is set lower than `druid.server.http.numThreads`, query requests beyond the limit are put into the Jetty request queue. This has the effect of reserving the leftover Jetty threads for non-query requests.

When this parameter is set equal to or higher than `druid.server.http.numThreads`, it has no effect.|Unbounded| +|`druid.query.scheduler.laning.strategy`|Query laning strategy to use to assign queries to a lane in order to control capacities for certain classes of queries.|`none`| +|`druid.query.scheduler.prioritization.strategy`|Query prioritization strategy to automatically assign priorities.|`manual`| + +##### Prioritization strategies + +###### Manual prioritization strategy + +With this configuration, queries are never assigned a priority automatically, but will preserve a priority manually set on the [query context](../querying/query-context-reference.md) with the `priority` key. This mode can be explicitly set by setting `druid.query.scheduler.prioritization.strategy` to `manual`. + +###### Threshold prioritization strategy + +This prioritization strategy lowers the priority of queries that cross any of a configurable set of thresholds, such as how far in the past the data is, how large of an interval a query covers, or the number of segments taking part in a query. + +This strategy can be enabled by setting `druid.query.scheduler.prioritization.strategy` to `threshold`. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.query.scheduler.prioritization.periodThreshold`|ISO duration threshold for how old data can be queried before automatically adjusting query priority.|none| +|`druid.query.scheduler.prioritization.durationThreshold`|ISO duration threshold for maximum duration a queries interval can span before the priority is automatically adjusted.|none| +|`druid.query.scheduler.prioritization.segmentCountThreshold`|Number threshold for maximum number of segments that can take part in a query before its priority is automatically adjusted.|none| +|`druid.query.scheduler.prioritization.segmentRangeThreshold`|ISO duration threshold for maximum segment range a query can span before the priority is automatically adjusted.|none| +|`druid.query.scheduler.prioritization.adjustment`|Amount to reduce the priority of queries which cross any threshold.|none| + +##### Laning strategies + +###### No laning strategy + +In this mode, queries are never assigned a lane, and the concurrent query count will only be limited by `druid.server.http.numThreads` or `druid.query.scheduler.numThreads`, if set. This is the default Druid query scheduler operating mode. Enable this strategy explicitly by setting `druid.query.scheduler.laning.strategy` to `none`. + +###### 'High/Low' laning strategy + +This laning strategy splits queries with a `priority` below zero into a `low` query lane, automatically. Queries with priority of zero (the default) or above are considered 'interactive'. The limit on `low` queries can be set to some desired percentage of the total capacity (or HTTP thread pool size), reserving capacity for interactive queries. Queries in the `low` lane are _not_ guaranteed their capacity, which may be consumed by interactive queries, but may use up to this limit if total capacity is available. + +If the `low` lane is specified in the [query context](../querying/query-context-reference.md) `lane` parameter, this will override the computed lane. + +This strategy can be enabled by setting `druid.query.scheduler.laning.strategy=hilo`. 
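+
+As an illustration, a Broker might enable this strategy with settings similar to the following sketch. The thread count and percentage are placeholders, and the `maxLowPercent` property is described in the table that follows.
+
+```properties
+# Illustrative only: reserve most capacity for interactive queries and
+# cap low-priority (priority < 0) queries at 20% of the scheduler threads.
+druid.query.scheduler.numThreads=40
+druid.query.scheduler.laning.strategy=hilo
+druid.query.scheduler.laning.maxLowPercent=20
+```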
+ +|Property|Description|Default| +|--------|-----------|-------| +|`druid.query.scheduler.laning.maxLowPercent`|Maximum percent of the smaller of `druid.server.http.numThreads` or `druid.query.scheduler.numThreads`, defining the number of HTTP threads that can be used by queries with a priority lower than 0. Value must be an integer in the range 1 to 100, and will be rounded up.|No default, must be set if using this mode| + +##### Guardrails for materialization of subqueries + +Druid stores the subquery rows in temporary tables that live in the Java heap. It is a good practice to avoid large subqueries in Druid. +Therefore, there are guardrails built into Druid to prevent queries from generating subquery results that can exhaust the heap +space. They can be set at the cluster level or modified per query as desired. +Note the following guardrails that can be set by the cluster admin to limit the subquery results: + +1. `druid.server.http.maxSubqueryRows` in the Broker's config to set a default for the entire cluster, or `maxSubqueryRows` in the query context to set an upper limit on the number of rows a subquery can generate +2. `druid.server.http.maxSubqueryBytes` in the Broker's config to set a default for the entire cluster, or `maxSubqueryBytes` in the query context to set an upper limit on the number of bytes a subquery can generate + +Limiting the subquery by bytes is an experimental feature as it materializes the results differently. + +You can configure `maxSubqueryBytes` to the following values: + +* `disabled`: The default setting. It exempts subqueries from the byte-based limit, effectively disabling this feature. +* `auto`: Druid automatically decides the optimal byte-based limit based upon the heap space available and the maximum number of concurrent queries. +* A positive long value: Manually specifies the number of bytes that the results of a single query's subqueries can occupy on the heap. + +Due to the conversion between Java objects and the Frame format, setting `maxSubqueryBytes` can become slow if the subquery starts generating +rows on the order of 10 million and above. In those scenarios, disable the `maxSubqueryBytes` setting for such queries, assess the number of rows that the subqueries generate, and override `maxSubqueryRows` to an appropriate value. + +If you choose to modify or set any of the above limits, you must also consider the heap size of all Brokers, Historicals, and task Peons that process data for the subqueries to accommodate the subquery results. +There is no formula to calculate the correct value. Trial and error is the best approach. + +###### Manual laning strategy + +This laning strategy is best suited for cases where one or more external applications that query Druid are capable of manually deciding what lane a given query should belong to. Configured with a map of lane names to percent or exact max capacities, queries with a matching `lane` parameter in the [query context](../querying/query-context-reference.md) will be subjected to those limits. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.query.scheduler.laning.lanes.{name}`|Maximum percent or exact limit of queries that can concurrently run in the defined lanes. Any number of lanes may be defined like this. The lane names 'total' and 'default' are reserved for internal use.|No default, must define at least one lane with a limit above 0. 
If `druid.query.scheduler.laning.isLimitPercent` is set to `true`, values must be integers in the range of 1 to 100.| +|`druid.query.scheduler.laning.isLimitPercent`|If set to `true`, the values set for `druid.query.scheduler.laning.lanes` will be treated as a percent of the smaller number of `druid.server.http.numThreads` or `druid.query.scheduler.numThreads`. Note that in this mode, these lane values across lanes are _not_ required to add up to, and can exceed, 100%.|`false`| + +##### Server configuration + +Druid uses Jetty to serve HTTP requests. Each query being processed consumes a single thread from `druid.server.http.numThreads`, so consider defining `druid.query.scheduler.numThreads` to a lower value in order to reserve HTTP threads for responding to health checks, lookup loading, and other non-query, (in most cases) comparatively very short-lived, HTTP requests. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.server.http.numThreads`|Number of threads for HTTP requests.|max(10, (Number of cores * 17) / 16 + 2) + 30| +|`druid.server.http.queueSize`|Size of the worker queue used by Jetty server to temporarily store incoming client connections. If this value is set and a request is rejected by jetty because queue is full then client would observe request failure with TCP connection being closed immediately with a completely empty response from server.|Unbounded| +|`druid.server.http.maxIdleTime`|The Jetty max idle time for a connection.|`PT5M`| +|`druid.server.http.enableRequestLimit`|If enabled, no requests would be queued in jetty queue and "HTTP 429 Too Many Requests" error response would be sent. |false| +|`druid.server.http.defaultQueryTimeout`|Query timeout in millis, beyond which unfinished queries will be cancelled|300000| +|`druid.server.http.maxScatterGatherBytes`|Maximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. Queries that exceed this limit will fail. This is an advance configuration that allows to protect in case Broker is under heavy load and not utilizing the data gathered in memory fast enough and leading to OOMs. This limit can be further reduced at query time using `maxScatterGatherBytes` in the context. Note that having large limit is not necessarily bad if broker is never under heavy concurrent load in which case data gathered is processed quickly and freeing up the memory used. Human-readable format is supported, see [here](human-readable-byte.md). |`Long.MAX_VALUE`| +|`druid.server.http.maxSubqueryRows`|Maximum number of rows from all subqueries per query. Druid stores the subquery rows in temporary tables that live in the Java heap. `druid.server.http.maxSubqueryRows` is a guardrail to prevent the system from exhausting available heap. When a subquery exceeds the row limit, Druid throws a resource limit exceeded exception: "Subquery generated results beyond maximum."

It is a good practice to avoid large subqueries in Druid. However, if you choose to raise the subquery row limit, you must also increase the heap size of all Brokers, Historicals, and task Peons that process data for the subqueries to accommodate the subquery results.

There is no formula to calculate the correct value. Trial and error is the best approach.|100000| +|`druid.server.http.maxSubqueryBytes`|Maximum number of bytes from all subqueries per query. Since the results are stored on the Java heap, `druid.server.http.maxSubqueryBytes` is a guardrail like `druid.server.http.maxSubqueryRows` to prevent the heap space from exhausting. When a subquery exceeds the byte limit, Druid throws a resource limit exceeded exception. A negative value for the guardrail indicates that Druid won't guardrail by memory. This can be set to 'disabled' which disables the results from being limited via the byte limit, 'auto' which sets this value automatically taking free heap space into account, or a positive long value depicting the number of bytes per query's subqueries' results can occupy. This is an experimental feature for now as this materializes the results in a different format.|'disabled'| +|`druid.server.http.gracefulShutdownTimeout`|The maximum amount of time Jetty waits after receiving shutdown signal. After this timeout the threads will be forcefully shutdown. This allows any queries that are executing to complete(Only values greater than zero are valid).|`PT30S`| +|`druid.server.http.unannouncePropagationDelay`|How long to wait for ZooKeeper unannouncements to propagate before shutting down Jetty. This is a minimum and `druid.server.http.gracefulShutdownTimeout` does not start counting down until after this period elapses.|`PT0S` (do not wait)| +|`druid.server.http.maxQueryTimeout`|Maximum allowed value (in milliseconds) for `timeout` parameter. See [query-context](../querying/query-context-reference.md) to know more about `timeout`. Query is rejected if the query context `timeout` is greater than this value. |`Long.MAX_VALUE`| +|`druid.server.http.maxRequestHeaderSize`|Maximum size of a request header in bytes. Larger headers consume more memory and can make a server more vulnerable to denial of service attacks. |8 * 1024| +|`druid.server.http.contentSecurityPolicy`|Content-Security-Policy header value to set on each non-POST response. Setting this property to an empty string, or omitting it, both result in the default `frame-ancestors: none` being set.|`frame-ancestors 'none'`| +|`druid.server.http.enableHSTS`|If set to true, druid services will add strict transport security header `Strict-Transport-Security: max-age=63072000; includeSubDomains` to all HTTP responses|`false`| + +##### Client configuration + +Druid Brokers use an HTTP client to communicate with data servers (Historical servers and real-time tasks). This +client has the following configuration options. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.broker.http.numConnections`|Size of connection pool for the Broker to connect to Historical and real-time processes. If there are more queries than this number that all need to speak to the same process, then they will queue up.|20| +|`druid.broker.http.eagerInitialization`|Indicates that http connections from Broker to Historical and Real-time processes should be eagerly initialized. If set to true, `numConnections` connections are created upon initialization|`true`| +|`druid.broker.http.compressionCodec`|Compression codec the Broker uses to communicate with Historical and real-time processes. 
May be "gzip" or "identity".|`gzip`| +|`druid.broker.http.readTimeout`|The timeout for data reads from Historical servers and real-time tasks.|`PT15M`| +|`druid.broker.http.unusedConnectionTimeout`|The timeout for idle connections in connection pool. The connection in the pool will be closed after this timeout and a new one will be established. This timeout should be less than `druid.broker.http.readTimeout`. Set this timeout = ~90% of `druid.broker.http.readTimeout`|`PT4M`| +|`druid.broker.http.maxQueuedBytes`|Maximum number of bytes queued per query before exerting [backpressure](../operations/basic-cluster-tuning.md#broker-backpressure) on channels to the data servers.

Similar to `druid.server.http.maxScatterGatherBytes`, except that `maxQueuedBytes` triggers [backpressure](../operations/basic-cluster-tuning.md#broker-backpressure) instead of query failure. Set to zero to disable. You can override this setting by using the [`maxQueuedBytes` query context parameter](../querying/query-context-reference.md). Druid supports [human-readable](human-readable-byte.md) format. |25 MB or 2% of maximum Broker heap size, whichever is greater.|
+|`druid.broker.http.numMaxThreads`|Maximum number of I/O worker threads.|`(number of cores) * 3 / 2 + 1`|
+|`druid.broker.http.clientConnectTimeout`|The timeout (in milliseconds) for establishing client connections.|500|
+
+
+##### Retry policy
+
+The Druid Broker can optionally retry queries internally for transient errors.
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.broker.retryPolicy.numTries`|Number of tries.|1|
+
+##### Processing
+
+The Broker uses processing configs for nested groupBy queries.
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.processing.buffer.sizeBytes`|This specifies a buffer size (less than 2GiB) for the storage of intermediate results. The computation engine in both the Historical and Realtime processes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed. [Human-readable format](human-readable-byte.md) is supported.|auto (max 1GiB)|
+|`druid.processing.buffer.poolCacheInitialCount`|Initializes the number of buffers allocated in the intermediate results pool. Note that the pool can create more buffers if necessary.|`0`|
+|`druid.processing.buffer.poolCacheMaxCount`|The processing buffer pool caches buffers for later use; this is the maximum count the cache will grow to. Note that the pool can create more buffers than it can cache if necessary.|`Integer.MAX_VALUE`|
+|`druid.processing.numMergeBuffers`|The number of direct memory buffers available for merging query results. The buffers are sized by `druid.processing.buffer.sizeBytes`. This property is effectively a concurrency limit for queries that require merging buffers. If you are using any queries that require merge buffers (currently, just groupBy) then you should have at least two of these.|`max(2, druid.processing.numThreads / 4)`|
+|`druid.processing.fifo`|Whether the processing queue should treat tasks of equal priority in a FIFO manner.|`true`|
+|`druid.processing.tmpDir`|Path where temporary files created while processing a query should be stored. If specified, this configuration takes priority over the default `java.io.tmpdir` path.|path represented by `java.io.tmpdir`|
+|`druid.processing.merge.useParallelMergePool`|Enable automatic parallel merging for Brokers on a dedicated async ForkJoinPool. If `false`, merges are instead done serially on the `HTTP` thread pool.|`true`|
+|`druid.processing.merge.parallelism`|Size of the ForkJoinPool. Note that the default configuration assumes that the value returned by `Runtime.getRuntime().availableProcessors()` represents 2 hyper-threads per physical core, and multiplies this value by `0.75` in an attempt to size to `1.5` times the number of _physical_ cores.|`Runtime.getRuntime().availableProcessors() * 0.75` (rounded up)|
+|`druid.processing.merge.defaultMaxQueryParallelism`|Default maximum number of parallel merge tasks per query. 
Note that the default configuration assumes that the value returned by `Runtime.getRuntime().availableProcessors()` represents 2 hyper-threads per physical core, and multiplies this value by `0.5` in attempt to size to the number of _physical_ cores.|`Runtime.getRuntime().availableProcessors() * 0.5` (rounded up)| +|`druid.processing.merge.awaitShutdownMillis`|Time to wait for merge ForkJoinPool tasks to complete before ungracefully stopping on process shutdown in milliseconds.|`60_000`| +|`druid.processing.merge.targetRunTimeMillis`|Ideal run-time of each ForkJoinPool merge task, before forking off a new task to continue merging sequences.|100| +|`druid.processing.merge.initialYieldNumRows`|Number of rows to yield per ForkJoinPool merge task, before forking off a new task to continue merging sequences.|16384| +|`druid.processing.merge.smallBatchNumRows`|Size of result batches to operate on in ForkJoinPool merge tasks.|4096| + +The amount of direct memory needed by Druid is at least +`druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + 1)`. You can +ensure at least this amount of direct memory is available by providing `-XX:MaxDirectMemorySize=` at the command +line. + +##### Broker query configuration + +See [general query configuration](#general-query-configuration). + +###### Broker generated query configuration supplementation + +The Broker generates queries internally. This configuration section describes how an operator can augment the configuration +of these queries. + +As of now the only supported augmentation is overriding the default query context. This allows an operator the flexibility +to adjust it as they see fit. A common use of this configuration is to override the query priority of the cluster generated +queries in order to avoid running as a default priority of 0. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.broker.internal.query.config.context`|A string formatted `key:value` map of a query context to add to internally generated broker queries.|null| + +#### SQL + +The Druid SQL server is configured through the following properties on the Broker. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.sql.enable`|Whether to enable SQL at all, including background metadata fetching. If false, this overrides all other SQL-related properties and disables SQL metadata, serving, and planning completely.|true| +|`druid.sql.avatica.enable`|Whether to enable JDBC querying at `/druid/v2/sql/avatica/`.|true| +|`druid.sql.avatica.maxConnections`|Maximum number of open connections for the Avatica server. These are not HTTP connections, but are logical client connections that may span multiple HTTP connections.|25| +|`druid.sql.avatica.maxRowsPerFrame`|Maximum acceptable value for the JDBC client `Statement.setFetchSize` method. This setting determines the maximum number of rows that Druid will populate in a single 'fetch' for a JDBC `ResultSet`. Set this property to -1 to enforce no row limit on the server-side and potentially return the entire set of rows on the initial statement execution. If the JDBC client calls `Statement.setFetchSize` with a value other than -1, Druid uses the lesser value of the client-provided limit and `maxRowsPerFrame`. If `maxRowsPerFrame` is smaller than `minRowsPerFrame`, then the `ResultSet` size will be fixed. 
To handle queries that produce results with a large number of rows, you can increase value of `druid.sql.avatica.maxRowsPerFrame` to reduce the number of fetches required to completely transfer the result set.|5,000| +|`druid.sql.avatica.minRowsPerFrame`|Minimum acceptable value for the JDBC client `Statement.setFetchSize` method. The value for this property must greater than 0. If the JDBC client calls `Statement.setFetchSize` with a lesser value, Druid uses `minRowsPerFrame` instead. If `maxRowsPerFrame` is less than `minRowsPerFrame`, Druid uses the minimum value of the two. For handling queries which produce results with a large number of rows, you can increase this value to reduce the number of fetches required to completely transfer the result set.|100| +|`druid.sql.avatica.maxStatementsPerConnection`|Maximum number of simultaneous open statements per Avatica client connection.|4| +|`druid.sql.avatica.connectionIdleTimeout`|Avatica client connection idle timeout.|`PT5M`| +|`druid.sql.avatica.fetchTimeoutMs`|Avatica fetch timeout, in milliseconds. When a request for the next batch of data takes longer than this time, Druid returns an empty result set, causing the client to poll again. This avoids HTTP timeouts for long-running queries. The default of 5 sec. is good for most cases. |5000| +|`druid.sql.http.enable`|Whether to enable JSON over HTTP querying at `/druid/v2/sql/`.|true| +|`druid.sql.planner.maxTopNLimit`|Maximum threshold for a [TopN query](../querying/topnquery.md). Higher limits will be planned as [GroupBy queries](../querying/groupbyquery.md) instead.|100000| +|`druid.sql.planner.metadataRefreshPeriod`|Throttle for metadata refreshes.|`PT1M`| +|`druid.sql.planner.metadataColumnTypeMergePolicy`|Defines how column types will be chosen when faced with differences between segments when computing the SQL schema. Options are specified as a JSON object, with valid choices of `leastRestrictive` or `latestInterval`. For `leastRestrictive`, Druid will automatically widen the type computed for the schema to a type which data across all segments can be converted into, however planned schema migrations can only take effect once all segments have been re-ingested to the new schema. With `latestInterval`, the column type in most recent time chunks defines the type for the schema. |`leastRestrictive`| +|`druid.sql.planner.useApproximateCountDistinct`|Whether to use an approximate cardinality algorithm for `COUNT(DISTINCT foo)`.|true| +|`druid.sql.planner.useGroupingSetForExactDistinct`|Only relevant when `useApproximateCountDistinct` is disabled. If set to true, exact distinct queries are re-written using grouping sets. Otherwise, exact distinct queries are re-written using joins. This should be set to true for group by query with multiple exact distinct aggregations. This flag can be overridden per query.|false| +|`druid.sql.planner.useApproximateTopN`|Whether to use approximate [TopN queries](../querying/topnquery.md) when a SQL query could be expressed as such. If false, exact [GroupBy queries](../querying/groupbyquery.md) will be used instead.|true| +|`druid.sql.planner.useLexicographicTopN`|Whether to use [TopN queries](../querying/topnquery.md) with lexicographic dimension ordering. If false, [GroupBy queries](../querying/groupbyquery.md) will be used instead for lexicographic ordering. 
When both this and `useApproximateTopN` are false, TopN queries are never used.|false| +|`druid.sql.planner.requireTimeCondition`|Whether to require SQL to have filter conditions on `__time` column so that all generated native queries will have user specified intervals. If true, all queries without filter condition on `__time` column will fail|false| +|`druid.sql.planner.sqlTimeZone`|Sets the default time zone for the server, which will affect how time functions and timestamp literals behave. Should be a time zone name like "America/Los_Angeles" or offset like "-08:00".|UTC| +|`druid.sql.planner.metadataSegmentCacheEnable`|Whether to keep a cache of published segments in broker. If true, broker polls coordinator in background to get segments from metadata store and maintains a local cache. If false, coordinator's REST API will be invoked when broker needs published segments info.|false| +|`druid.sql.planner.metadataSegmentPollPeriod`|How often to poll coordinator for published segments list if `druid.sql.planner.metadataSegmentCacheEnable` is set to true. Poll period is in milliseconds. |60000| +|`druid.sql.planner.authorizeSystemTablesDirectly`|If true, Druid authorizes queries against any of the system schema tables (`sys` in SQL) as `SYSTEM_TABLE` resources which require `READ` access, in addition to permissions based content filtering.|false| +|`druid.sql.planner.useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite. It can be overridden per query with `useNativeQueryExplain` context key.|true| +|`druid.sql.planner.maxNumericInFilters`|Max limit for the amount of numeric values that can be compared for a string type dimension when the entire SQL WHERE clause of a query translates to an [OR](../querying/filters.md#or) of [Bound filter](../querying/filters.md#bound-filter). By default, Druid does not restrict the amount of numeric Bound Filters on String columns, although this situation may block other queries from running. Set this property to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100. Users who submit a query that exceeds the limit of `maxNumericInFilters` should instead rewrite their queries to use strings in the `WHERE` clause instead of numbers. For example, `WHERE someString IN (‘123’, ‘456’)`. If this value is disabled, `maxNumericInFilters` set through query context is ignored.|`-1` (disabled)| +|`druid.sql.approxCountDistinct.function`|Implementation to use for the [`APPROX_COUNT_DISTINCT` function](../querying/sql-aggregations.md). Without extensions loaded, the only valid value is `APPROX_COUNT_DISTINCT_BUILTIN` (a HyperLogLog, or HLL, based implementation). If the [DataSketches extension](../development/extensions-core/datasketches-extension.md) is loaded, this can also be `APPROX_COUNT_DISTINCT_DS_HLL` (alternative HLL implementation) or `APPROX_COUNT_DISTINCT_DS_THETA`.

Theta sketches use significantly more memory than HLL sketches, so you should prefer one of the two HLL implementations.|`APPROX_COUNT_DISTINCT_BUILTIN`| + +:::info + Previous versions of Druid had properties named `druid.sql.planner.maxQueryCount` and `druid.sql.planner.maxSemiJoinRowsInMemory`. + These properties are no longer available. Since Druid 0.18.0, you can use `druid.server.http.maxSubqueryRows` to control the maximum + number of rows permitted across all subqueries. +::: + +#### Broker caching + +You can optionally only configure caching to be enabled on the Broker by setting caching configs here. + +|Property|Possible Values|Description|Default| +|--------|---------------|-----------|-------| +|`druid.broker.cache.useCache`|true, false|Enable the cache on the Broker.|false| +|`druid.broker.cache.populateCache`|true, false|Populate the cache on the Broker.|false| +|`druid.broker.cache.useResultLevelCache`|true, false|Enable result level caching on the Broker.|false| +|`druid.broker.cache.populateResultLevelCache`|true, false|Populate the result level cache on the Broker.|false| +|`druid.broker.cache.resultLevelCacheLimit`|positive integer|Maximum size of query response that can be cached.|`Integer.MAX_VALUE`| +|`druid.broker.cache.unCacheable`|All druid query types|All query types to not cache.|`[scan]`| +|`druid.broker.cache.cacheBulkMergeLimit`|positive integer or 0|Queries with more segments than this number will not attempt to fetch from cache at the broker level, leaving potential caching fetches (and cache result merging) to the Historicals|`Integer.MAX_VALUE`| +|`druid.broker.cache.maxEntrySize`|positive integer|Maximum cache entry size in bytes.|1_000_000| + +See [cache configuration](#cache-configuration) for how to configure cache settings. + +:::info + Note: Even if cache is enabled, for [groupBy](../querying/groupbyquery.md) queries, segment level cache does not work on Brokers. + See [Query caching](../querying/caching.md) for more information. +::: + +#### Segment discovery + +|Property|Possible Values|Description|Default| +|--------|---------------|-----------|-------| +|`druid.serverview.type`|batch or http|Segment discovery method to use. "http" enables discovering segments using HTTP instead of ZooKeeper.|http| +|`druid.broker.segment.watchedTiers`|List of strings|The Broker watches segment announcements from processes that serve segments to build a cache to relate each process to the segments it serves. This configuration allows the Broker to only consider segments being served from a list of tiers. By default, Broker considers all tiers. This can be used to partition your dataSources in specific Historical tiers and configure brokers in partitions so that they are only queryable for specific dataSources. This config is mutually exclusive from `druid.broker.segment.ignoredTiers` and at most one of these can be configured on a Broker.|none| +|`druid.broker.segment.ignoredTiers`|List of strings|The Broker watches segment announcements from processes that serve segments to build a cache to relate each process to the segments it serves. This configuration allows the Broker to ignore the segments being served from a list of tiers. By default, Broker considers all tiers. 
This config is mutually exclusive from `druid.broker.segment.watchedTiers` and at most one of these can be configured on a Broker.|none| +|`druid.broker.segment.watchedDataSources`|List of strings|Broker watches the segment announcements from processes serving segments to build cache of which process is serving which segments, this configuration allows to only consider segments being served from a whitelist of dataSources. By default, Broker would consider all datasources. This can be used to configure brokers in partitions so that they are only queryable for specific dataSources.|none| +|`druid.broker.segment.watchRealtimeTasks`|Boolean|The Broker watches segment announcements from processes that serve segments to build a cache to relate each process to the segments it serves. When `watchRealtimeTasks` is true, the Broker watches for segment announcements from both Historicals and realtime processes. To configure a broker to exclude segments served by realtime processes, set `watchRealtimeTasks` to false. |true| +|`druid.broker.segment.awaitInitializationOnStart`|Boolean|Whether the Broker will wait for its view of segments to fully initialize before starting up. If set to 'true', the Broker's HTTP server will not start up, and the Broker will not announce itself as available, until the server view is initialized. See also `druid.sql.planner.awaitInitializationOnStart`, a related setting.|true| + +## Metrics monitors + +You can configure Druid services to emit [metrics](../operations/metrics.md) regularly from a number of [monitors](#metrics-monitors-for-each-service) via [emitters](#metrics-emitters). The following table lists general configurations for metrics: + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.monitoring.emissionPeriod`| Frequency that Druid emits metrics.|`PT1M`| +|[`druid.monitoring.monitors`](#metrics-monitors-for-each-service)|Sets list of Druid monitors used by a service.|none (no monitors)| +|[`druid.emitter`](#metrics-emitters)|Setting this value initializes one of the emitter modules.|`noop` (metric emission disabled by default)| + +### Metrics monitors for each service + +Metric monitoring is an essential part of Druid operations. +Monitors can be enabled by configuring the property `druid.monitoring.monitors` in the common configuration file, `common.runtime.properties`. +If a monitor is not supported on a certain service, it will simply be ignored while starting up that service. + +The following table lists available monitors and the respective services where they are supported: + +|Name|Description|Service| +|----|-----------|-------| +|`org.apache.druid.client.cache.CacheMonitor`|Emits metrics (to logs) about the segment results cache for Historical and Broker services. 
Reported statistics include hits, misses, rates, and size (bytes and number of entries), as well as timeouts and errors.|Broker, Historical, Indexer, Peon|
+|`org.apache.druid.java.util.metrics.OshiSysMonitor`|Reports on various system activities and statuses using [OSHI](https://github.com/oshi/oshi), a JNA-based (native) Operating System and Hardware Information library for Java.|Any|
+|`org.apache.druid.java.util.metrics.JvmMonitor`|Reports various JVM-related statistics.|Any|
+|`org.apache.druid.java.util.metrics.JvmCpuMonitor`|Reports statistics of CPU consumption by the JVM.|Any|
+|`org.apache.druid.java.util.metrics.CpuAcctDeltaMonitor`|Reports consumed CPU as per the cpuacct cgroup.|Any|
+|`org.apache.druid.java.util.metrics.JvmThreadsMonitor`|Reports thread statistics for the JVM, such as the number of total, daemon, started, and died threads.|Any|
+|`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup.|Any|
+|`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup.|Any|
+|`org.apache.druid.java.util.metrics.CgroupDiskMonitor`|Reports disk statistics as per the blkio cgroup.|Any|
+|`org.apache.druid.java.util.metrics.CgroupMemoryMonitor`|Reports memory statistics as per the memory cgroup.|Any|
+|`org.apache.druid.java.util.metrics.CgroupV2CpuMonitor`| **EXPERIMENTAL** Reports CPU usage from the `cpu.stat` file. Only applicable to `cgroupv2`.|Any|
+|`org.apache.druid.java.util.metrics.CgroupV2DiskMonitor`| **EXPERIMENTAL** Reports disk usage from the `io.stat` file. Only applicable to `cgroupv2`.|Any|
+|`org.apache.druid.java.util.metrics.CgroupV2MemoryMonitor`| **EXPERIMENTAL** Reports memory usage from the `memory.current` and `memory.max` files. Only applicable to `cgroupv2`.|Any|
+|`org.apache.druid.server.metrics.HistoricalMetricsMonitor`|Reports statistics on Historical services.|Historical|
+|`org.apache.druid.server.metrics.SegmentStatsMonitor`| **EXPERIMENTAL** Reports statistics about segments on Historical services. Not to be used when lazy loading is configured.|Historical|
+|`org.apache.druid.server.metrics.QueryCountStatsMonitor`|Reports how many queries have been successful/failed/interrupted.|Broker, Historical, Router, Indexer, Peon|
+|`org.apache.druid.server.metrics.SubqueryCountStatsMonitor`|Reports how many subqueries have been materialized as rows or bytes, and various other statistics related to subquery execution.|Broker|
+|`org.apache.druid.server.emitter.HttpEmittingMonitor`|Reports internal metrics of the `http` or `parametrized` emitter (see below). Must not be used with another emitter type. See the description of the metrics here: https://github.com/apache/druid/pull/4973.|Any|
+|`org.apache.druid.server.metrics.TaskCountStatsMonitor`|Reports how many ingestion tasks are currently running/pending/waiting and also the number of successful/failed tasks per emission period.|Overlord|
+|`org.apache.druid.server.metrics.TaskSlotCountStatsMonitor`|Reports metrics about task slot usage per emission period.|Overlord|
+|`org.apache.druid.server.metrics.WorkerTaskCountStatsMonitor`|Reports how many ingestion tasks are currently running/pending/waiting, the number of successful/failed tasks, and metrics about task slot usage for the reporting worker, per emission period. 
|MiddleManager, Indexer|
+|`org.apache.druid.server.metrics.ServiceStatusMonitor`|Reports a heartbeat for the service.|Any|
+|`org.apache.druid.server.metrics.GroupByStatsMonitor`|Reports metrics for groupBy queries, such as disk and merge buffer utilization.|Broker, Historical, Indexer, Peon|
+
+For example, if you only wanted monitors on all services for system and JVM information, you'd add the following to `common.runtime.properties`:
+
+```properties
+druid.monitoring.monitors=["org.apache.druid.java.util.metrics.OshiSysMonitor","org.apache.druid.java.util.metrics.JvmMonitor"]
+```
+
+All the services in your Druid deployment would have these two monitors.
+
+If you want any service-specific monitors, you must add all the monitors you want to run for that service to the service's `runtime.properties` file, even if they are listed in the common file. The service-specific properties take precedence.
+
+The following example adds the `TaskCountStatsMonitor` and `TaskSlotCountStatsMonitor` as well as the `OshiSysMonitor` and `JvmMonitor` from the previous example to the Overlord service (`coordinator-overlord/runtime.properties`):
+
+```properties
+druid.monitoring.monitors=["org.apache.druid.server.metrics.TaskCountStatsMonitor", "org.apache.druid.server.metrics.TaskSlotCountStatsMonitor", "org.apache.druid.java.util.metrics.OshiSysMonitor","org.apache.druid.java.util.metrics.JvmMonitor"]
+```
+
+If you don't include `OshiSysMonitor` and `JvmMonitor` in the Overlord's `runtime.properties` file, the monitors don't get loaded onto the Overlord despite being specified in the common file.
+
+### Metrics emitters
+
+There are several emitters available:
+
+* `noop` (default) disables metric emission.
+* [`logging`](#logging-emitter-module) emits logs using Log4j2.
+* [`http`](#http-emitter-module) sends `POST` requests of JSON events.
+* [`parametrized`](#parametrized-http-emitter-module) operates like the `http` emitter but fine-tunes the recipient URL based on the event feed.
+* [`composing`](#composing-emitter-module) initializes multiple emitter modules.
+* [`graphite`](#graphite-emitter) emits metrics to a [Graphite](https://graphiteapp.org/) Carbon service.
+* [`switching`](#switching-emitter) initializes and emits to multiple emitter modules based on the event feed.
+
+#### Logging emitter module
+
+To use this emitter module, set `druid.emitter=logging`. The `logging` emitter uses a Log4j2 logger named by
+`druid.emitter.logging.loggerClass` to emit events. Each event is logged as a single `json` object with a
+[Marker](https://logging.apache.org/log4j/2.x/manual/markers.html) as the feed of the event. Users may wish to edit the
+log4j config to route these logs to different sources based on the feed of the event.
+
+|Property|Description| Default|
+|--------|-----------|--------|
+|`druid.emitter.logging.loggerClass`|The class used for logging.|`org.apache.druid.java.util.emitter.core.LoggingEmitter`|
+|`druid.emitter.logging.logLevel`|Choices: debug, info, warn, error. The log level at which messages are logged.|info|
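+
+For example, a minimal sketch that routes all metric events through the logging emitter at the default `info` level:
+
+```properties
+# Emit metric events through Log4j2 instead of disabling emission.
+druid.emitter=logging
+druid.emitter.logging.logLevel=info
+```
+
+Unless you route the emitter's feed elsewhere in your Log4j2 configuration, these events appear in the regular service logs.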
+
+#### HTTP emitter module
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.emitter.http.flushMillis`|How often the internal message buffer is flushed (data is sent).|60000|
+|`druid.emitter.http.flushCount`|How many messages the internal message buffer can hold before flushing (sending).|500|
+|`druid.emitter.http.basicAuthentication`|[Password Provider](../operations/password-provider.md) for providing login and password for authentication in `"login:password"` form. For example, `druid.emitter.http.basicAuthentication=admin:adminpassword` uses the Default Password Provider, which allows plain text passwords.|not specified = no authentication|
+|`druid.emitter.http.flushTimeOut`|The timeout after which an event should be sent to the endpoint, even if internal buffers are not filled, in milliseconds.|not specified = no timeout|
+|`druid.emitter.http.batchingStrategy`|The strategy of how the batch is formatted. "ARRAY" means `[event1,event2]`, "NEWLINES" means `event1\nevent2`, "ONLY_EVENTS" means `event1event2`.|ARRAY|
+|`druid.emitter.http.maxBatchSize`|The maximum batch size, in bytes.|the minimum of (10% of JVM heap size divided by 2) or 5242880 (i.e., 5 MiB)|
+|`druid.emitter.http.batchQueueSizeLimit`|The maximum number of batches in the emitter queue, if there are problems with emitting.|the maximum of (2) or (10% of the JVM heap size divided by 5MiB)|
+|`druid.emitter.http.minHttpTimeoutMillis`|If the timeout implied by the speed at which batches fill up is smaller than this value, the emitter does not even try to send the batch to the endpoint, because the send would likely fail to keep up with the data. Configure this based on the `emitter/successfulSending/minTimeMs` metric. Reasonable values are 10ms to 100ms.|0|
+|`druid.emitter.http.recipientBaseUrl`|The base URL to emit messages to. Druid will POST JSON to be consumed at the HTTP endpoint specified by this property.|none, required config|
+
+#### HTTP emitter module TLS overrides
+
+By default, when sending events to a TLS-enabled receiver, the HTTP Emitter uses an SSLContext obtained from the service described at [Druid's internal communication over TLS](../operations/tls-support.md), that is, the same SSLContext that would be used for internal communications between Druid services.
+
+In some use cases it may be desirable to have the HTTP Emitter use its own separate truststore configuration. For example, there may be organizational policies that prevent the TLS-enabled metrics receiver's certificate from being added to the same truststore used by Druid's internal HTTP client.
+
+The following properties allow the HTTP Emitter to use its own truststore configuration when building its SSLContext.
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.emitter.http.ssl.useDefaultJavaContext`|If set to true, the HttpEmitter will use `SSLContext.getDefault()`, the default Java SSLContext, and all other properties below are ignored.|false|
+|`druid.emitter.http.ssl.trustStorePath`|The file path or URL of the TLS/SSL Key store where trusted root certificates are stored. If this is unspecified, the HTTP Emitter uses the same SSLContext as Druid's internal HTTP client, as described at the beginning of this section, and all other properties below are ignored.|null|
+|`druid.emitter.http.ssl.trustStoreType`|The type of the key store where trusted root certificates are stored.|`java.security.KeyStore.getDefaultType()`|
+|`druid.emitter.http.ssl.trustStoreAlgorithm`|Algorithm to be used by the TrustManager to validate certificate chains.|`javax.net.ssl.TrustManagerFactory.getDefaultAlgorithm()`|
+|`druid.emitter.http.ssl.trustStorePassword`|The [Password Provider](../operations/password-provider.md) or String password for the Trust Store.|none|
+|`druid.emitter.http.ssl.protocol`|TLS protocol to use.|"TLSv1.2"|
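+
+Putting these pieces together, a minimal sketch of an `http` emitter configuration might look like the following. The recipient URL is a placeholder for your own metrics collector:
+
+```properties
+# Send metric events as JSON POST requests to an external collector.
+druid.emitter=http
+druid.emitter.http.recipientBaseUrl=http://metrics-collector.example.com:8080/druid
+# Optional: flush the buffer at least once a minute or every 500 events (the defaults).
+druid.emitter.http.flushMillis=60000
+druid.emitter.http.flushCount=500
+```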
+
+#### Parametrized HTTP emitter module
+
+The parametrized emitter takes the same configs as the [`http` emitter](#http-emitter-module) using the prefix `druid.emitter.parametrized.httpEmitting.`.
+For example:
+
+* `druid.emitter.parametrized.httpEmitting.flushMillis`
+* `druid.emitter.parametrized.httpEmitting.flushCount`
+* `druid.emitter.parametrized.httpEmitting.ssl.trustStorePath`
+
+Do not specify `recipientBaseUrl` with the parametrized emitter.
+Instead use `recipientBaseUrlPattern` described in the table below.
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.emitter.parametrized.recipientBaseUrlPattern`|The URL pattern to send an event to, based on the event's feed. For example, `http://foo.bar/{feed}` sends an event to `http://foo.bar/metrics` if the event's feed is "metrics".|none, required config|
+
+#### Composing emitter module
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.emitter.composing.emitters`|List of emitter modules to load, such as ["logging","http"].|[]|
+
+#### Graphite emitter
+
+To use Graphite as the emitter, set `druid.emitter=graphite`. For configuration details, see [Graphite emitter](../development/extensions-contrib/graphite.md) for the Graphite emitter Druid extension.
+
+#### Switching emitter
+
+To use the switching emitter, set `druid.emitter=switching`.
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.emitter.switching.emitters`|JSON map of feed to list of emitter modules that will be used for the mapped feed, such as `{"metrics":["http"], "alerts":["logging"]}`|{}|
+|`druid.emitter.switching.defaultEmitters`|JSON list of emitter modules to load that will be used if there is no emitter specifically designated for that event's feed, such as `["logging","http"]`.|[]|
+
+
+## Cache configuration
+
+This section describes caching configuration that is common to Broker, Historical, and Middle Manager/Peon processes.
+
+Caching can optionally be enabled on the Broker, Historical, and Middle Manager/Peon processes. See
+[Broker](#broker-caching), [Historical](#historical-caching), and [Peon](#peon-caching) configuration options for how to
+enable it for different processes.
+
+Druid uses a local in-memory cache by default, unless a different type of cache is specified.
+Use the `druid.cache.type` configuration to set a different kind of cache.
+
+Cache settings are set globally, so the same configuration can be re-used
+for both Broker and Historical processes, when defined in the common properties file.
+
+### Cache type
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.cache.type`|`local`, `memcached`, `hybrid`, `caffeine`|The type of cache to use for queries. See below for the configuration options for each cache type.|`caffeine`|
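+
+For example, a sketch that enables segment-level and result-level caching on the Broker with the default Caffeine cache; the cache size shown is illustrative:
+
+```properties
+# Use the default in-heap Caffeine cache, capped at 256 MiB.
+druid.cache.type=caffeine
+druid.cache.sizeInBytes=256MiB
+# Enable query and result-level caching on the Broker.
+druid.broker.cache.useCache=true
+druid.broker.cache.populateCache=true
+druid.broker.cache.useResultLevelCache=true
+druid.broker.cache.populateResultLevelCache=true
+```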
+
+#### Local cache
+
+:::info
+ DEPRECATED: Use caffeine (default as of v0.12.0) instead
+:::
+
+The local cache is deprecated in favor of the Caffeine cache, and may be removed in a future version of Druid. The Caffeine cache affords significantly better performance and control over eviction behavior compared to the `local` cache, and is recommended in any situation where you are using JRE 8u60 or higher.
+
+A simple in-memory LRU cache. The local cache resides in JVM heap memory, so if you enable it, make sure you increase the heap size accordingly.
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.cache.sizeInBytes`|Maximum cache size in bytes. Zero disables caching.|0|
+|`druid.cache.initialSize`|Initial size of the hash table backing the cache.|500000|
+|`druid.cache.logEvictionCount`|If non-zero, log cache eviction every `logEvictionCount` items.|0|
+
+#### Caffeine cache
+
+A highly performant local cache implementation for Druid based on [Caffeine](https://github.com/ben-manes/caffeine). Requires JRE 8u60 or higher if using `COMMON_FJP`.
+
+##### Configuration
+
+The following table shows the configuration options known to this module:
+
+|`runtime.properties`|Description|Default|
+|--------------------|-----------|-------|
+|`druid.cache.type`|Set this to `caffeine` or leave the parameter out.|`caffeine`|
+|`druid.cache.sizeInBytes`|The maximum size of the cache in bytes on heap. It can be configured as described [here](human-readable-byte.md).|min(1GiB, Runtime.maxMemory / 10)|
+|`druid.cache.expireAfter`|The time (in ms) after an access after which a cache entry may be expired.|None (no time limit)|
+|`druid.cache.cacheExecutorFactory`|The executor factory to use for Caffeine maintenance. One of `COMMON_FJP`, `SINGLE_THREAD`, or `SAME_THREAD`.|ForkJoinPool common pool (`COMMON_FJP`)|
+|`druid.cache.evictOnClose`|Whether the close of a namespace (for example, removing a segment from a process) should cause an eager eviction of associated cache values.|`false`|
+
+##### `druid.cache.cacheExecutorFactory`
+
+The following are the possible values for `druid.cache.cacheExecutorFactory`, which controls how maintenance tasks are run:
+
+* `COMMON_FJP` (default): use the common ForkJoinPool. Use with [JRE 8u60 or higher](https://github.com/apache/druid/pull/4810#issuecomment-329922810). Older versions of the JRE may have worse performance than newer JRE versions.
+* `SINGLE_THREAD`: use a single-threaded executor.
+* `SAME_THREAD`: cache maintenance is done eagerly.
+
+##### Metrics
+
+In addition to the normal cache metrics, the Caffeine cache implementation also reports the following in both `total` and `delta`:
+
+|Metric|Description|Normal value|
+|------|-----------|------------|
+|`query/cache/caffeine/*/requests`|Count of hits or misses.|hit + miss|
+|`query/cache/caffeine/*/loadTime`|Length of time Caffeine spends loading new values (unused feature).|0|
+|`query/cache/caffeine/*/evictionBytes`|Size in bytes that have been evicted from the cache.|Varies; tune `sizeInBytes` so that `sizeInBytes`/`evictionBytes` is approximately the rate of cache churn you desire.|
+
+#### Memcached
+
+Uses memcached as the cache backend. This allows all processes to share the same cache. 
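+
+For example, a sketch that points all services at a shared Memcached cluster, using the properties described below; the hostnames and port are placeholders:
+
+```properties
+# Share one Memcached cluster across all services.
+druid.cache.type=memcached
+# Comma-separated host:port pairs of the Memcached nodes.
+druid.cache.hosts=memcached1.example.com:11211,memcached2.example.com:11211
+```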
+ +|Property|Description|Default| +|--------|-----------|-------| +|`druid.cache.expiration`|Memcached [expiration time](https://code.google.com/p/memcached/wiki/NewCommands#Standard_Protocol).|2592000 (30 days)| +|`druid.cache.timeout`|Maximum time in milliseconds to wait for a response from Memcached.|500| +|`druid.cache.hosts`|Comma separated list of Memcached hosts ``. Need to specify all nodes when `druid.cache.clientMode` is set to static. Dynamic mode [automatically identifies nodes in your cluster](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.html) so just specifying the configuration endpoint and port is fine.|none| +|`druid.cache.maxObjectSize`|Maximum object size in bytes for a Memcached object.|52428800 (50 MiB)| +|`druid.cache.memcachedPrefix`|Key prefix for all keys in Memcached.|druid| +|`druid.cache.numConnections`| Number of memcached connections to use.|1| +|`druid.cache.protocol`| Memcached communication protocol. Can be binary or text.|binary| +|`druid.cache.locator`| Memcached locator. Can be consistent or `array_mod`.|consistent| +|`druid.cache.enableTls`|Enable TLS based connection for Memcached client. Boolean.|false| +|`druid.cache.clientMode`|Client Mode. Static mode requires the user to specify individual cluster nodes. Dynamic mode uses [AutoDiscovery](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.HowAutoDiscoveryWorks.html) feature of AWS Memcached. String. ["static"](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.Manual.html) or ["dynamic"](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.Using.ModifyApp.Java.html)|static| +|`druid.cache.skipTlsHostnameVerification`|Skip TLS Hostname Verification. Boolean.|true| + +#### Hybrid + +Uses a combination of any two caches as a two-level L1 / L2 cache. +This may be used to combine a local in-memory cache with a remote memcached cache. + +Cache requests will first check L1 cache before checking L2. +If there is an L1 miss and L2 hit, it will also populate L1. + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.cache.l1.type`|The type of cache to use for L1 cache. See `druid.cache.type` configuration for valid types.|`caffeine`| +|`druid.cache.l2.type`|The type of cache to use for L2 cache. See `druid.cache.type` configuration for valid types.|`caffeine`| +|`druid.cache.l1.*`|Any property valid for the given type of L1 cache can be set using this prefix. For instance, if you are using a `caffeine` L1 cache, specify `druid.cache.l1.sizeInBytes` to set its size.|defaults are the same as for the given cache type| +|`druid.cache.l2.*`|Prefix for L2 cache settings, see description for L1.|defaults are the same as for the given cache type| +|`druid.cache.useL2`|A boolean indicating whether to query L2 cache, if it's a miss in L1. It makes sense to configure this to `false` on Historical processes, if L2 is a remote cache like `memcached`, and this cache also used on brokers, because in this case if a query reached Historical it means that a broker didn't find corresponding results in the same remote cache, so a query to the remote cache from Historical is guaranteed to be a miss.|`true`| +|`druid.cache.populateL2`|A boolean indicating whether to put results into L2 cache.|`true`| + +## General query configuration + +This section describes configurations that control behavior of Druid's query types, applicable to Broker, Historical, and Middle Manager processes. 
+
+### Overriding default query context values
+
+You can override any [query context general parameter](../querying/query-context-reference.md#general-parameters) default value by setting the runtime property in the format of `druid.query.default.context.{query_context_key}`.
+The `druid.query.default.context.{query_context_key}` runtime property prefix applies to all current and future query context keys, the same as how a query context parameter passed with the query works. A value for the same key specified in the query context overrides the runtime property value.
+
+The precedence chain for query context values is as follows:
+
+hard-coded default value in Druid code `<-` runtime property not prefixed with `druid.query.default.context`
+`<-` runtime property prefixed with `druid.query.default.context` `<-` context parameter in the query
+
+Note that not every query context key has a runtime property not prefixed with `druid.query.default.context` that can
+override the hard-coded default value. For example, `maxQueuedBytes` has `druid.broker.http.maxQueuedBytes`,
+but `joinFilterRewriteMaxSize` does not. Hence, the only way to override the hard-coded default
+value of `joinFilterRewriteMaxSize` is with the runtime property `druid.query.default.context.joinFilterRewriteMaxSize`.
+
+To further elaborate on the previous example:
+
+If neither `druid.broker.http.maxQueuedBytes` nor `druid.query.default.context.maxQueuedBytes` is set and
+the query does not have `maxQueuedBytes` in the context, then the hard-coded value in Druid code is used.
+If the runtime properties only contain `druid.broker.http.maxQueuedBytes=x` and the query does not have `maxQueuedBytes` in the
+context, then the value of the property, `x`, is used. However, if the query does have `maxQueuedBytes` in the context,
+then that value is used instead.
+If the runtime properties only contain `druid.query.default.context.maxQueuedBytes=y`, or contain both
+`druid.broker.http.maxQueuedBytes=x` and `druid.query.default.context.maxQueuedBytes=y`, then the value of
+`druid.query.default.context.maxQueuedBytes`, `y`, is used (given that the query does not have `maxQueuedBytes` in the
+context). If the query does have `maxQueuedBytes` in the context, then that value is used instead.
+
+### TopN query config
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.query.topN.minTopNThreshold`|See [TopN Aliasing](../querying/topnquery.md#aliasing) for details.|1000|
+
+### Search query config
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.query.search.maxSearchLimit`|Maximum number of search results to return.|1000|
+|`druid.query.search.searchStrategy`|Default search query strategy.|`useIndexes`|
+
+### SegmentMetadata query config
+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.query.segmentMetadata.defaultHistory`|When no interval is specified in the query, use a default interval of defaultHistory before the end time of the most recent segment, specified in ISO8601 format. 
This property also controls the duration of the default interval used by `GET` `/druid/v2/datasources/{dataSourceName}` interactions for retrieving datasource dimensions and metrics.|`P1W`| +|`druid.query.segmentMetadata.defaultAnalysisTypes`|This can be used to set the Default Analysis Types for all segment metadata queries, this can be overridden when making the query|`["cardinality", "interval", "minmax"]`| + +### GroupBy query config + +This section describes the configurations for groupBy queries. You can set the runtime properties in the `runtime.properties` file on Broker, Historical, and Middle Manager processes. You can set the query context parameters through the [query context](../querying/query-context-reference.md). + +Supported runtime properties: + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space (approximately) to use for per-segment string dictionaries. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|100000000| +|`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space (approximately) to use for per-query string dictionaries. When the dictionary exceeds this size, a spill to disk will be triggered. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|100000000| +|`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, per-query, for spilling result sets to disk when either the merging buffer or the dictionary fills up. Queries that exceed this limit will fail. Set to zero to disable disk spilling.|0 (disabled)| +|`druid.query.groupBy.defaultOnDiskStorage`|Default amount of disk space to use, per-query, for spilling the result sets to disk when either the merging buffer or the dictionary fills up. Set to zero to disable disk spilling for queries which don't override `maxOnDiskStorage` in their context.|`druid.query.groupBy.maxOnDiskStorage`| + +Supported query contexts: + +|Key|Description| +|---|-----------| +|`maxSelectorDictionarySize`|Can be used to lower the value of `druid.query.groupBy.maxMergingDictionarySize` for this query.| +|`maxMergingDictionarySize`|Can be used to lower the value of `druid.query.groupBy.maxMergingDictionarySize` for this query.| +|`maxOnDiskStorage`|Can be used to set `maxOnDiskStorage` to a value between 0 and `druid.query.groupBy.maxOnDiskStorage` for this query. If this query context override exceeds `druid.query.groupBy.maxOnDiskStorage`, the query will use `druid.query.groupBy.maxOnDiskStorage`. Omitting this from the query context will cause the query to use `druid.query.groupBy.defaultOnDiskStorage` for `maxOnDiskStorage`| + +### Advanced configurations + +Supported runtime properties: + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.query.groupBy.singleThreaded`|Merge results using a single thread.|false| +|`druid.query.groupBy.bufferGrouperInitialBuckets`|Initial number of buckets in the off-heap hash table used for grouping results. Set to 0 to use a reasonable default (1024).|0| +|`druid.query.groupBy.bufferGrouperMaxLoadFactor`|Maximum load factor of the off-heap hash table used for grouping results. When the load factor exceeds this size, the table will be grown or spilled to disk. 
Set to 0 to use a reasonable default (0.7).|0| +|`druid.query.groupBy.forceHashAggregation`|Force to use hash-based aggregation.|false| +|`druid.query.groupBy.intermediateCombineDegree`|Number of intermediate processes combined together in the combining tree. Higher degrees will need less threads which might be helpful to improve the query performance by reducing the overhead of too many threads if the server has sufficiently powerful CPU cores.|8| +|`druid.query.groupBy.numParallelCombineThreads`|Hint for the number of parallel combining threads. This should be larger than 1 to turn on the parallel combining feature. The actual number of threads used for parallel combining is min(`druid.query.groupBy.numParallelCombineThreads`, `druid.processing.numThreads`).|1 (disabled)| + +Supported query contexts: + +|Key|Description|Default| +|---|-----------|-------| +|`groupByIsSingleThreaded`|Overrides the value of `druid.query.groupBy.singleThreaded` for this query.| | +|`bufferGrouperInitialBuckets`|Overrides the value of `druid.query.groupBy.bufferGrouperInitialBuckets` for this query.|none| +|`bufferGrouperMaxLoadFactor`|Overrides the value of `druid.query.groupBy.bufferGrouperMaxLoadFactor` for this query.|none| +|`forceHashAggregation`|Overrides the value of `druid.query.groupBy.forceHashAggregation`|none| +|`intermediateCombineDegree`|Overrides the value of `druid.query.groupBy.intermediateCombineDegree`|none| +|`numParallelCombineThreads`|Overrides the value of `druid.query.groupBy.numParallelCombineThreads`|none| +|`sortByDimsFirst`|Sort the results first by dimension values and then by timestamp.|false| +|`forceLimitPushDown`|When all fields in the orderby are part of the grouping key, the broker will push limit application down to the Historical processes. When the sorting order uses fields that are not in the grouping key, applying this optimization can result in approximate results with unknown accuracy, so this optimization is disabled by default in that case. Enabling this context flag turns on limit push down for limit/orderbys that contain non-grouping key columns.|false| + +### Router + +#### Router process configs + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.host`|The host for the current process. This is used to advertise the current processes location as reachable from another process and should generally be specified such that `http://${druid.host}/` could actually talk to this process|`InetAddress.getLocalHost().getCanonicalHostName()`| +|`druid.bindOnHost`|Indicating whether the process's internal jetty server bind on `druid.host`. Default is false, which means binding to all interfaces.|false| +|`druid.plaintextPort`|This is the port to actually listen on; unless port mapping is used, this will be the same port as is on `druid.host`|8888| +|`druid.tlsPort`|TLS port for HTTPS connector, if [druid.enableTlsPort](../operations/tls-support.md) is set then this config will be used. If `druid.host` contains port then that port will be ignored. This should be a non-negative Integer.|9088| +|`druid.service`|The name of the service. This is used as a dimension when emitting metrics and alerts to differentiate between the various services|`druid/router`| +|`druid.labels`|Optional JSON object of key-value pairs that define custom labels for the server. These labels are displayed in the web console under the "Services" tab. 
Example: `druid.labels={"location":"Airtrunk"}` or `druid.labels.location=Airtrunk`|`null`| + +#### Runtime configuration + +|Property|Description|Default| +|--------|-----------|-------| +|`druid.router.defaultBrokerServiceName`|The default Broker to connect to in case service discovery fails.|`druid/broker`| +|`druid.router.tierToBrokerMap`|Queries for a certain tier of data are routed to their appropriate Broker. This value should be an ordered JSON map of tiers to Broker names. The priority of Brokers is based on the ordering.|`{"_default_tier": ""}`| +|`druid.router.defaultRule`|The default rule for all datasources.|`_default`| +|`druid.router.pollPeriod`|How often to poll for new rules.|`PT1M`| +|`druid.router.sql.enable`|Enable routing of SQL queries using strategies. When`true`, the Router uses the strategies defined in `druid.router.strategies` to determine the broker service for a given SQL query. When `false`, the Router uses the `defaultBrokerServiceName`.|`false`| +|`druid.router.strategies`|Please see [Router Strategies](../design/router.md#router-strategies) for details.|`[{"type":"timeBoundary"},{"type":"priority"}]`| +|`druid.router.avatica.balancer.type`|Class to use for balancing Avatica queries across Brokers. Please see [Avatica Query Balancing](../design/router.md#avatica-query-balancing).|`rendezvousHash`| +|`druid.router.managementProxy.enabled`|Enables the Router's [management proxy](../design/router.md#router-as-management-proxy) functionality.|false| +|`druid.router.http.numConnections`|Size of connection pool for the Router to connect to Broker processes. If there are more queries than this number that all need to speak to the same process, then they will queue up.|`20`| +|`druid.router.http.eagerInitialization`|Indicates that http connections from Router to Broker should be eagerly initialized. If set to true, `numConnections` connections are created upon initialization|`true`| +|`druid.router.http.readTimeout`|The timeout for data reads from Broker processes.|`PT15M`| +|`druid.router.http.numMaxThreads`|Maximum number of worker threads to handle HTTP requests and responses|`(number of cores) * 3 / 2 + 1`| +|`druid.router.http.numRequestsQueued`|Maximum number of requests that may be queued to a destination|`1024`| +|`druid.router.http.requestBuffersize`|Size of the content buffer for receiving requests. These buffers are only used for active connections that have requests with bodies that will not fit within the header buffer|`8 * 1024`| +|`druid.router.http.clientConnectTimeout`|The timeout (in milliseconds) for establishing client connections.|500| diff --git a/docs/35.0.0/configuration/logging.md b/docs/35.0.0/configuration/logging.md new file mode 100644 index 0000000000..d740f38b09 --- /dev/null +++ b/docs/35.0.0/configuration/logging.md @@ -0,0 +1,170 @@ +--- +id: logging +title: "Logging" +--- + + + + +Apache Druid services emit logs that to help you debug. +The same services also emit periodic [metrics](../configuration/index.md#metrics-monitors) about their state. +To disable metric info logs set the following runtime property: `-Ddruid.emitter.logging.logLevel=debug`. + +Druid uses [log4j2](http://logging.apache.org/log4j/2.x/) for logging. +The default configuration file log4j2.xml ships with Druid at the following path: `conf/druid/{config}/_common/log4j2.xml`. + +By default, Druid uses `RollingRandomAccessFile` for rollover daily, and keeps log files up to 7 days. +If that's not suitable in your case, modify the `log4j2.xml` accordingly. 
+ +The following example log4j2.xml is based upon the micro quickstart: + +``` + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +Peons always output logs to standard output. Middle Managers redirect task logs from standard output to +[long-term storage](index.md#log-long-term-storage). + +:::info + + Druid shares the log4j configuration file among all services, including task peon processes. + However, you must define a console appender in the logger for your peon processes. + If you don't define a console appender, Druid creates and configures a new console appender + that retains the log level, such as `info` or `warn`, but does not retain any other appender + configuration, including non-console ones. +::: + +## Log directory +The included log4j2.xml configuration for Druid and ZooKeeper writes logs to the `log` directory at the root of the distribution. + +If you want to change the log directory, set the environment variable `DRUID_LOG_DIR` to the right directory before you start Druid. + +## All-in-one start commands + +If you use one of the all-in-one start commands, such as `bin/start-micro-quickstart`, the default configuration for each service has two kinds of log files. +Log4j2 writes the main log file and rotates it periodically. +For example, `log/historical.log`. + +The secondary log file contains anything that is written by the component +directly to standard output or standard error without going through log4j2. +For example, `log/historical.stdout.log`. +This consists mainly of messages from the +Java runtime itself. +This file is not rotated, but it is generally small due to the low volume of messages. +If necessary, you can truncate it using the Linux command `truncate --size 0 log/historical.stdout.log`. + +## Set the logs to asynchronously write + +If your logs are really chatty, you can set them to write asynchronously. +The following example shows a `log4j2.xml` that configures some of the more chatty classes to write asynchronously: + +``` + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` diff --git a/docs/35.0.0/data-management/automatic-compaction.md b/docs/35.0.0/data-management/automatic-compaction.md new file mode 100644 index 0000000000..1a0803bafb --- /dev/null +++ b/docs/35.0.0/data-management/automatic-compaction.md @@ -0,0 +1,370 @@ +--- +id: automatic-compaction +title: "Automatic compaction" +--- + + + +In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks issued by Druid itself. In addition to auto-compaction, you can perform [manual compaction](./manual-compaction.md) using the Overlord APIs. + +:::info + Auto-compaction skips datasources that have a segment granularity of `ALL`. +::: + +As a best practice, you should set up auto-compaction for all Druid datasources. You can run compaction tasks manually for cases where you want to allocate more system resources. For example, you may choose to run multiple compaction tasks in parallel to compact an existing datasource for the first time. See [Compaction](compaction.md) for additional details and use cases. 
+
+This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction.
+
+## Auto-compaction syntax
+
+You can configure automatic compaction dynamically without restarting Druid.
+The automatic compaction system uses the following syntax:
+
+```json
+{
+  "dataSource": <task_datasource>,
+  "ioConfig": <IO config>,
+  "dimensionsSpec": <custom dimensionsSpec>,
+  "transformSpec": <custom transformSpec>,
+  "metricsSpec": <custom metricsSpec>,
+  "tuningConfig": <parallel indexing task tuningConfig>,
+  "granularitySpec": <compaction task granularitySpec>,
+  "skipOffsetFromLatest":