
com.databricks.sdk.service.sql.StatementExecutionAPI Maven / Gradle / Ivy
// Code generated from OpenAPI specs by Databricks SDK Generator. DO NOT EDIT.
package com.databricks.sdk.service.sql;
import com.databricks.sdk.core.ApiClient;
import com.databricks.sdk.support.Generated;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
* The SQL Statement Execution API manages the execution of arbitrary SQL statements and the
* fetching of result data.
*
* **Release status**
*
*
This feature is in [Public Preview].
*
*
**Getting started**
*
*
We suggest beginning with the [SQL Statement Execution API tutorial].
*
*
**Overview of statement execution and result fetching**
*
*
Statement execution begins by issuing a :method:statementexecution/executeStatement request
* with a valid SQL statement and warehouse ID, along with optional parameters such as the data
* catalog and output format.
*
*
When submitting the statement, the call can behave synchronously or asynchronously, based on
* the `wait_timeout` setting. When set between 5-50 seconds (default: 10) the call behaves
* synchronously and waits for results up to the specified timeout; when set to `0s`, the call is
* asynchronous and responds immediately with a statement ID that can be used to poll for status or
* fetch the results in a separate call.
*
*
**Call mode: synchronous**
*
*
In synchronous mode, when statement execution completes within the `wait timeout`, the result
* data is returned directly in the response. This response will contain `statement_id`, `status`,
* `manifest`, and `result` fields. The `status` field confirms success whereas the `manifest` field
* contains the result data column schema and metadata about the result set. The `result` field
* contains the first chunk of result data according to the specified `disposition`, and links to
* fetch any remaining chunks.
*
*
If the execution does not complete before `wait_timeout`, the setting `on_wait_timeout`
* determines how the system responds.
*
*
By default, `on_wait_timeout=CONTINUE`, and after reaching `wait_timeout`, a response is
* returned and statement execution continues asynchronously. The response will contain only
* `statement_id` and `status` fields, and the caller must now follow the flow described for
* asynchronous call mode to poll and fetch the result.
*
*
Alternatively, `on_wait_timeout` can also be set to `CANCEL`; in this case if the timeout is
* reached before execution completes, the underlying statement execution is canceled, and a
* `CANCELED` status is returned in the response.
*
*
**Call mode: asynchronous**
*
*
In asynchronous mode, or after a timed-out synchronous request continues, a `statement_id` and
* `status` will be returned. In this case polling :method:statementexecution/getStatement calls are
* required to fetch the result and metadata.
*
*
Next, a caller must poll until execution completes (`SUCCEEDED`, `FAILED`, etc.) by issuing
* :method:statementexecution/getStatement requests for the given `statement_id`.
*
*
When execution has succeeded, the response will contain `status`, `manifest`, and `result`
* fields. These fields and the structure are identical to those in the response to a successful
* synchronous submission. The `result` field will contain the first chunk of result data, either
* `INLINE` or as `EXTERNAL_LINKS` depending on `disposition`. Additional chunks of result data can
* be fetched by checking for the presence of the `next_chunk_internal_link` field, and iteratively
* `GET` those paths until that field is unset: `GET
* https://$DATABRICKS_HOST/{next_chunk_internal_link}`.
*
*
**Fetching result data: format and disposition**
*
*
To specify the result data format, set the `format` field to `JSON_ARRAY` (JSON),
* `ARROW_STREAM` ([Apache Arrow Columnar]), or `CSV`.
*
*
You can also configure how to fetch the result data in two different modes by setting the
* `disposition` field to `INLINE` or `EXTERNAL_LINKS`.
*
*
The `INLINE` disposition can only be used with the `JSON_ARRAY` format and allows results up
* to 16 MiB. When a statement executed with `INLINE` disposition exceeds this limit, the execution
* is aborted, and no result can be fetched.
*
*
The `EXTERNAL_LINKS` disposition allows fetching large result sets in `JSON_ARRAY`,
* `ARROW_STREAM` and `CSV` formats, and with higher throughput.
*
*
The API uses defaults of `format=JSON_ARRAY` and `disposition=INLINE`. Databricks recommends
* that you explicit setting the format and the disposition for all production use cases.
*
*
**Statement response: statement_id, status, manifest, and result**
*
*
The base call :method:statementexecution/getStatement returns a single response combining
* `statement_id`, `status`, a result `manifest`, and a `result` data chunk or link, depending on
* the `disposition`. The `manifest` contains the result schema definition and the result summary
* metadata. When using `disposition=EXTERNAL_LINKS`, it also contains a full listing of all chunks
* and their summary metadata.
*
*
**Use case: small result sets with INLINE + JSON_ARRAY**
*
*
For flows that generate small and predictable result sets (<= 16 MiB), `INLINE` downloads of
* `JSON_ARRAY` result data are typically the simplest way to execute and fetch result data.
*
*
When the result set with `disposition=INLINE` is larger, the result can be transferred in
* chunks. After receiving the initial chunk with :method:statementexecution/executeStatement or
* :method:statementexecution/getStatement subsequent calls are required to iteratively fetch each
* chunk. Each result response contains a link to the next chunk, when there are additional chunks
* to fetch; it can be found in the field `.next_chunk_internal_link`. This link is an absolute
* `path` to be joined with your `$DATABRICKS_HOST`, and of the form
* `/api/2.0/sql/statements/{statement_id}/result/chunks/{chunk_index}`. The next chunk can be
* fetched by issuing a :method:statementexecution/getStatementResultChunkN request.
*
*
When using this mode, each chunk may be fetched once, and in order. A chunk without a field
* `next_chunk_internal_link` indicates the last chunk was reached and all chunks have been fetched
* from the result set.
*
*
**Use case: large result sets with EXTERNAL_LINKS + ARROW_STREAM**
*
*
Using `EXTERNAL_LINKS` to fetch result data in Arrow format allows you to fetch large result
* sets efficiently. The primary difference from using `INLINE` disposition is that fetched result
* chunks contain resolved `external_links` URLs, which can be fetched with standard HTTP.
*
*
**Presigned URLs**
*
*
External links point to data stored within your workspace's internal DBFS, in the form of a
* presigned URL. The URLs are valid for only a short period, <= 15 minutes. Alongside each
* `external_link` is an expiration field indicating the time at which the URL is no longer valid.
* In `EXTERNAL_LINKS` mode, chunks can be resolved and fetched multiple times and in parallel.
*
*
----
*
*
### **Warning: We recommend you protect the URLs in the EXTERNAL_LINKS.**
*
*
When using the EXTERNAL_LINKS disposition, a short-lived pre-signed URL is generated, which
* the client can use to download the result chunk directly from cloud storage. As the short-lived
* credential is embedded in a pre-signed URL, this URL should be protected.
*
*
Since pre-signed URLs are generated with embedded temporary credentials, you need to remove
* the authorization header from the fetch requests.
*
*
----
*
*
Similar to `INLINE` mode, callers can iterate through the result set, by using the
* `next_chunk_internal_link` field. Each internal link response will contain an external link to
* the raw chunk data, and additionally contain the `next_chunk_internal_link` if there are more
* chunks.
*
*
Unlike `INLINE` mode, when using `EXTERNAL_LINKS`, chunks may be fetched out of order, and in
* parallel to achieve higher throughput.
*
*
**Limits and limitations**
*
*
Note: All byte limits are calculated based on internal storage metrics and will not match byte
* counts of actual payloads.
*
*
- Statements with `disposition=INLINE` are limited to 16 MiB and will abort when this limit is
* exceeded. - Statements with `disposition=EXTERNAL_LINKS` are limited to 100 GiB. - The maximum
* query text size is 16 MiB. - Cancelation may silently fail. A successful response from a cancel
* request indicates that the cancel request was successfully received and sent to the processing
* engine. However, for example, an outstanding statement may complete execution during signal
* delivery, with the cancel signal arriving too late to be meaningful. Polling for status until a
* terminal state is reached is a reliable way to determine the final state. - Wait timeouts are
* approximate, occur server-side, and cannot account for caller delays, network latency from caller
* to service, and similarly. - After a statement has been submitted and a statement_id is returned,
* that statement's status and result will automatically close after either of 2 conditions: - The
* last result chunk is fetched (or resolved to an external link). - One hour passes with no calls
* to get the status or fetch the result. Best practice: in asynchronous clients, poll for status
* regularly (and with backoff) to keep the statement open and alive. - After fetching the last
* result chunk (including chunk_index=0) the statement is automatically closed.
*
*
[Apache Arrow Columnar]: https://arrow.apache.org/overview/ [Public Preview]:
* https://docs.databricks.com/release-notes/release-types.html [SQL Statement Execution API
* tutorial]: https://docs.databricks.com/sql/api/sql-execution-tutorial.html
*/
@Generated
public class StatementExecutionAPI {
private static final Logger LOG = LoggerFactory.getLogger(StatementExecutionAPI.class);
private final StatementExecutionService impl;
/** Regular-use constructor */
public StatementExecutionAPI(ApiClient apiClient) {
impl = new StatementExecutionImpl(apiClient);
}
/** Constructor for mocks */
public StatementExecutionAPI(StatementExecutionService mock) {
impl = mock;
}
public void cancelExecution(String statementId) {
cancelExecution(new CancelExecutionRequest().setStatementId(statementId));
}
/**
* Cancel statement execution.
*
*
Requests that an executing statement be canceled. Callers must poll for status to see the
* terminal state.
*/
public void cancelExecution(CancelExecutionRequest request) {
impl.cancelExecution(request);
}
/**
* Execute a SQL statement.
*
*
Execute a SQL statement, and if flagged as such, await its result for a specified time.
*/
public ExecuteStatementResponse executeStatement(ExecuteStatementRequest request) {
return impl.executeStatement(request);
}
public GetStatementResponse getStatement(String statementId) {
return getStatement(new GetStatementRequest().setStatementId(statementId));
}
/**
* Get status, manifest, and result first chunk.
*
*
This request can be used to poll for the statement's status. When the `status.state` field
* is `SUCCEEDED` it will also return the result manifest and the first chunk of the result data.
* When the statement is in the terminal states `CANCELED`, `CLOSED` or `FAILED`, it returns HTTP
* 200 with the state set. After at least 12 hours in terminal state, the statement is removed
* from the warehouse and further calls will receive an HTTP 404 response.
*
*
**NOTE** This call currently may take up to 5 seconds to get the latest status and result.
*/
public GetStatementResponse getStatement(GetStatementRequest request) {
return impl.getStatement(request);
}
public ResultData getStatementResultChunkN(String statementId, long chunkIndex) {
return getStatementResultChunkN(
new GetStatementResultChunkNRequest()
.setStatementId(statementId)
.setChunkIndex(chunkIndex));
}
/**
* Get result chunk by index.
*
*
After the statement execution has `SUCCEEDED`, the result data can be fetched by chunks.
* Whereas the first chuck with `chunk_index=0` is typically fetched through a `get status`
* request, subsequent chunks can be fetched using a `get result` request. The response structure
* is identical to the nested `result` element described in the `get status` request, and
* similarly includes the `next_chunk_index` and `next_chunk_internal_link` fields for simple
* iteration through the result set.
*/
public ResultData getStatementResultChunkN(GetStatementResultChunkNRequest request) {
return impl.getStatementResultChunkN(request);
}
public StatementExecutionService impl() {
return impl;
}
}