
com.databricks.sdk.service.sql.StatementExecutionAPI

// Code generated from OpenAPI specs by Databricks SDK Generator. DO NOT EDIT.
package com.databricks.sdk.service.sql;

import com.databricks.sdk.core.ApiClient;
import com.databricks.sdk.support.Generated;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * The SQL Statement Execution API manages the execution of arbitrary SQL statements and the
 * fetching of result data.
 *
 * 

**Release status**

This feature is in [Public Preview].

**Getting started**

We suggest beginning with the [SQL Statement Execution API tutorial].

**Overview of statement execution and result fetching**

Statement execution begins by issuing a :method:statementexecution/executeStatement request with a valid SQL statement and warehouse ID, along with optional parameters such as the data catalog and output format.

When submitting the statement, the call can behave synchronously or asynchronously, based on the `wait_timeout` setting. When set to between 5 and 50 seconds (default: 10), the call behaves synchronously and waits for results up to the specified timeout; when set to `0s`, the call is asynchronous and responds immediately with a statement ID that can be used to poll for status or fetch the results in a separate call.
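The `wait_timeout` rule above can be sketched as a small classifier. This is only an illustration: `WaitTimeoutSketch` and its `Mode` enum are hypothetical names, not part of the SDK, and the sketch assumes the value is written as whole seconds followed by `s` (as in `0s`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the SDK): classifies a wait_timeout value
// according to the ranges described in the text above.
final class WaitTimeoutSketch {
  enum Mode { SYNCHRONOUS, ASYNCHRONOUS, INVALID }

  // Assumption: values are whole seconds followed by "s", for example "0s" or "30s".
  private static final Pattern SECONDS = Pattern.compile("(\\d{1,3})s");

  static Mode classify(String waitTimeout) {
    if (waitTimeout == null) {
      return Mode.SYNCHRONOUS; // unset: the documented default of 10 seconds applies
    }
    Matcher m = SECONDS.matcher(waitTimeout);
    if (!m.matches()) {
      return Mode.INVALID;
    }
    int seconds = Integer.parseInt(m.group(1));
    if (seconds == 0) {
      return Mode.ASYNCHRONOUS; // respond immediately with a statement ID
    }
    if (seconds >= 5 && seconds <= 50) {
      return Mode.SYNCHRONOUS; // wait for results up to the timeout
    }
    return Mode.INVALID; // outside the documented 5-50 second range
  }
}
```

For example, `WaitTimeoutSketch.classify("0s")` yields `ASYNCHRONOUS`, while `classify("3s")` yields `INVALID` because it falls below the documented range.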

**Call mode: synchronous**

In synchronous mode, when statement execution completes within the `wait_timeout`, the result data is returned directly in the response. This response will contain `statement_id`, `status`, `manifest`, and `result` fields. The `status` field confirms success whereas the `manifest` field contains the result data column schema and metadata about the result set. The `result` field contains the first chunk of result data according to the specified `disposition`, and links to fetch any remaining chunks.

If the execution does not complete before `wait_timeout`, the setting `on_wait_timeout` determines how the system responds.

By default, `on_wait_timeout=CONTINUE`, and after reaching `wait_timeout`, a response is returned and statement execution continues asynchronously. The response will contain only `statement_id` and `status` fields, and the caller must now follow the flow described for asynchronous call mode to poll and fetch the result.

Alternatively, `on_wait_timeout` can also be set to `CANCEL`; in this case if the timeout is reached before execution completes, the underlying statement execution is canceled, and a `CANCELED` status is returned in the response.

**Call mode: asynchronous**

In asynchronous mode, or after a timed-out synchronous request continues, a `statement_id` and `status` will be returned. In this case polling :method:statementexecution/getStatement calls are required to fetch the result and metadata.

Next, a caller must poll until execution completes (`SUCCEEDED`, `FAILED`, etc.) by issuing :method:statementexecution/getStatement requests for the given `statement_id`.
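The polling flow described above might look roughly like the following. This is a sketch under stated assumptions, not the SDK's own helper: `StatementPollerSketch` is a hypothetical class, and the `fetchState` supplier stands in for a call that issues a getStatement request and returns the `status.state` value as a string. Any state outside the terminal set named in this documentation is treated as "still running".

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical polling helper (not part of the SDK).
final class StatementPollerSketch {
  // Terminal states named in this documentation.
  static final Set<String> TERMINAL = Set.of("SUCCEEDED", "FAILED", "CANCELED", "CLOSED");

  // Polls fetchState until a terminal state is seen, doubling the delay
  // between polls up to maxDelayMillis, for at most maxAttempts polls.
  static String pollUntilTerminal(
      Supplier<String> fetchState, long initialDelayMillis, long maxDelayMillis, int maxAttempts) {
    long delay = initialDelayMillis;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      String state = fetchState.get();
      if (TERMINAL.contains(state)) {
        return state;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(delay);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt flag
          throw new IllegalStateException("interrupted while polling", e);
        }
        delay = Math.min(delay * 2, maxDelayMillis); // exponential backoff with a cap
      }
    }
    throw new IllegalStateException(
        "statement did not reach a terminal state after " + maxAttempts + " polls");
  }
}
```

With the real client, the supplier would wrap a `getStatement(statementId)` call and extract the state from its response; how that state is read from the response object is left out here because it depends on the SDK version.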

When execution has succeeded, the response will contain `status`, `manifest`, and `result` fields. These fields and the structure are identical to those in the response to a successful synchronous submission. The `result` field will contain the first chunk of result data, either `INLINE` or as `EXTERNAL_LINKS` depending on `disposition`. Additional chunks of result data can be fetched by checking for the presence of the `next_chunk_internal_link` field, and iteratively issuing `GET` requests to those paths until that field is unset: `GET https://$DATABRICKS_HOST/{next_chunk_internal_link}`.

**Fetching result data: format and disposition**

To specify the result data format, set the `format` field to `JSON_ARRAY` (JSON), `ARROW_STREAM` ([Apache Arrow Columnar]), or `CSV`.

You can also configure how to fetch the result data in two different modes by setting the `disposition` field to `INLINE` or `EXTERNAL_LINKS`.

The `INLINE` disposition can only be used with the `JSON_ARRAY` format and allows results up to 16 MiB. When a statement executed with `INLINE` disposition exceeds this limit, the execution is aborted, and no result can be fetched.

The `EXTERNAL_LINKS` disposition allows fetching large result sets in `JSON_ARRAY`, `ARROW_STREAM` and `CSV` formats, and with higher throughput.

The API uses defaults of `format=JSON_ARRAY` and `disposition=INLINE`. Databricks recommends that you explicitly set the format and the disposition for all production use cases.

**Statement response: statement_id, status, manifest, and result**

The base call :method:statementexecution/getStatement returns a single response combining `statement_id`, `status`, a result `manifest`, and a `result` data chunk or link, depending on the `disposition`. The `manifest` contains the result schema definition and the result summary metadata. When using `disposition=EXTERNAL_LINKS`, it also contains a full listing of all chunks and their summary metadata.

**Use case: small result sets with INLINE + JSON_ARRAY**

For flows that generate small and predictable result sets (<= 16 MiB), `INLINE` downloads of `JSON_ARRAY` result data are typically the simplest way to execute and fetch result data.

When the result set with `disposition=INLINE` is larger, the result can be transferred in chunks. After receiving the initial chunk with :method:statementexecution/executeStatement or :method:statementexecution/getStatement, subsequent calls are required to iteratively fetch each chunk. Each result response contains a link to the next chunk when there are additional chunks to fetch; it can be found in the field `.next_chunk_internal_link`. This link is an absolute `path` to be joined with your `$DATABRICKS_HOST`, and of the form `/api/2.0/sql/statements/{statement_id}/result/chunks/{chunk_index}`. The next chunk can be fetched by issuing a :method:statementexecution/getStatementResultChunkN request.

When using this mode, each chunk may be fetched once, and in order. A chunk without a field `next_chunk_internal_link` indicates the last chunk was reached and all chunks have been fetched from the result set.
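The in-order chunk walk described above can be sketched as follows. The `InlineChunkWalkerSketch` class, its `Chunk` type, and the `fetchChunk` function are hypothetical stand-ins introduced for illustration; they are not SDK types. A `null` link models a chunk whose `next_chunk_internal_link` field is absent.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch (not SDK code): follows next_chunk_internal_link
// from chunk to chunk, once each and in order.
final class InlineChunkWalkerSketch {
  // Hypothetical stand-in for one result chunk.
  static final class Chunk {
    final List<String> rows;
    final String nextChunkInternalLink; // null when the field is absent (last chunk)

    Chunk(List<String> rows, String nextChunkInternalLink) {
      this.rows = rows;
      this.nextChunkInternalLink = nextChunkInternalLink;
    }
  }

  // Joins the workspace host with the absolute path carried by the link.
  static URI resolve(String host, String internalLink) {
    return URI.create(host).resolve(internalLink);
  }

  // Collects rows from the first chunk and every following chunk, in order.
  static List<String> collectRows(Chunk first, String host, Function<URI, Chunk> fetchChunk) {
    List<String> rows = new ArrayList<>(first.rows);
    Chunk current = first;
    while (current.nextChunkInternalLink != null) {
      current = fetchChunk.apply(resolve(host, current.nextChunkInternalLink));
      rows.addAll(current.rows);
    }
    return rows;
  }
}
```

In a real client, `fetchChunk` would issue an authenticated `GET` against the resolved URI, or call `getStatementResultChunkN` with the chunk index taken from the response.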

**Use case: large result sets with EXTERNAL_LINKS + ARROW_STREAM**

Using `EXTERNAL_LINKS` to fetch result data in Arrow format allows you to fetch large result sets efficiently. The primary difference from using `INLINE` disposition is that fetched result chunks contain resolved `external_links` URLs, which can be fetched with standard HTTP.

**Presigned URLs**

External links point to data stored within your workspace's internal DBFS, in the form of a presigned URL. The URLs are valid for only a short period, <= 15 minutes. Alongside each `external_link` is an expiration field indicating the time at which the URL is no longer valid. In `EXTERNAL_LINKS` mode, chunks can be resolved and fetched multiple times and in parallel.

----

### **Warning: We recommend you protect the URLs in the EXTERNAL_LINKS.**

When using the EXTERNAL_LINKS disposition, a short-lived pre-signed URL is generated, which the client can use to download the result chunk directly from cloud storage. As the short-lived credential is embedded in a pre-signed URL, this URL should be protected.

Since pre-signed URLs are generated with embedded temporary credentials, you need to remove the authorization header from the fetch requests.

----
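As a minimal sketch of the warning above, a download request for a pre-signed URL can be built with the JDK's `java.net.http` client so that no authorization header is attached. `ExternalLinkRequestSketch` and both of its methods are hypothetical helpers, and the expiration check assumes the expiration field has already been parsed into an `Instant`.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;

// Hypothetical helper (not part of the SDK) for fetching EXTERNAL_LINKS chunks.
final class ExternalLinkRequestSketch {
  // Builds a plain GET for the pre-signed URL. No Authorization header is set:
  // the temporary credential is already embedded in the URL itself.
  static HttpRequest buildDownloadRequest(String presignedUrl) {
    return HttpRequest.newBuilder(URI.create(presignedUrl)).GET().build();
  }

  // Returns true once the URL's expiration time has been reached.
  // Assumption: the expiration field was parsed into an Instant by the caller.
  static boolean isExpired(Instant expiration, Instant now) {
    return !now.isBefore(expiration);
  }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofByteArray())`. An expired link would need to be resolved again before downloading.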

Similar to `INLINE` mode, callers can iterate through the result set, by using the `next_chunk_internal_link` field. Each internal link response will contain an external link to the raw chunk data, and additionally contain the `next_chunk_internal_link` if there are more chunks.

Unlike `INLINE` mode, when using `EXTERNAL_LINKS`, chunks may be fetched out of order, and in parallel to achieve higher throughput.
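Parallel fetching of external links could be sketched as below. `ParallelChunkFetchSketch` is a hypothetical helper, and the `download` function is an assumed stand-in for whatever performs the HTTP fetch of a single link. Downloads may complete in any order, but the returned list keeps the order of the input links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper (not part of the SDK): downloads chunks concurrently.
final class ParallelChunkFetchSketch {
  static <T> List<T> fetchAll(
      List<String> externalLinks, Function<String, T> download, int parallelism) {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      // Start every download; at most `parallelism` run at the same time.
      List<CompletableFuture<T>> futures = new ArrayList<>();
      for (String link : externalLinks) {
        futures.add(CompletableFuture.supplyAsync(() -> download.apply(link), pool));
      }
      // Join in input order so results line up with the chunk listing.
      List<T> results = new ArrayList<>();
      for (CompletableFuture<T> future : futures) {
        results.add(future.join());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the links are valid only for a short period, a caller would likely want to keep the batch small enough that every download starts before its link expires.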

**Limits and limitations**

Note: All byte limits are calculated based on internal storage metrics and will not match byte counts of actual payloads.

- Statements with `disposition=INLINE` are limited to 16 MiB and will abort when this limit is exceeded.
- Statements with `disposition=EXTERNAL_LINKS` are limited to 100 GiB.
- The maximum query text size is 16 MiB.
- Cancelation may silently fail. A successful response from a cancel request indicates that the cancel request was successfully received and sent to the processing engine. However, for example, an outstanding statement may complete execution during signal delivery, with the cancel signal arriving too late to be meaningful. Polling for status until a terminal state is reached is a reliable way to determine the final state.
- Wait timeouts are approximate, occur server-side, and cannot account for caller delays, network latency from caller to service, or similar factors.
- After a statement has been submitted and a statement_id is returned, that statement's status and result will automatically close after either of 2 conditions:
  - The last result chunk is fetched (or resolved to an external link).
  - One hour passes with no calls to get the status or fetch the result. Best practice: in asynchronous clients, poll for status regularly (and with backoff) to keep the statement open and alive.
- After fetching the last result chunk (including chunk_index=0) the statement is automatically closed.

[Apache Arrow Columnar]: https://arrow.apache.org/overview/
[Public Preview]: https://docs.databricks.com/release-notes/release-types.html
[SQL Statement Execution API tutorial]: https://docs.databricks.com/sql/api/sql-execution-tutorial.html
 */
@Generated
public class StatementExecutionAPI {
  private static final Logger LOG = LoggerFactory.getLogger(StatementExecutionAPI.class);

  private final StatementExecutionService impl;

  /** Regular-use constructor */
  public StatementExecutionAPI(ApiClient apiClient) {
    impl = new StatementExecutionImpl(apiClient);
  }

  /** Constructor for mocks */
  public StatementExecutionAPI(StatementExecutionService mock) {
    impl = mock;
  }

  public void cancelExecution(String statementId) {
    cancelExecution(new CancelExecutionRequest().setStatementId(statementId));
  }

  /**
   * Cancel statement execution.
   *
   * Requests that an executing statement be canceled. Callers must poll for status to see the
   * terminal state.
   */
  public void cancelExecution(CancelExecutionRequest request) {
    impl.cancelExecution(request);
  }

  /**
   * Execute a SQL statement.
   *
   * Execute a SQL statement, and if flagged as such, await its result for a specified time.
   */
  public ExecuteStatementResponse executeStatement(ExecuteStatementRequest request) {
    return impl.executeStatement(request);
  }

  public GetStatementResponse getStatement(String statementId) {
    return getStatement(new GetStatementRequest().setStatementId(statementId));
  }

  /**
   * Get status, manifest, and result first chunk.
   *
   * This request can be used to poll for the statement's status. When the `status.state` field
   * is `SUCCEEDED` it will also return the result manifest and the first chunk of the result data.
   * When the statement is in the terminal states `CANCELED`, `CLOSED` or `FAILED`, it returns HTTP
   * 200 with the state set. After at least 12 hours in terminal state, the statement is removed
   * from the warehouse and further calls will receive an HTTP 404 response.
   *
   * **NOTE** This call currently may take up to 5 seconds to get the latest status and result.
   */
  public GetStatementResponse getStatement(GetStatementRequest request) {
    return impl.getStatement(request);
  }

  public ResultData getStatementResultChunkN(String statementId, long chunkIndex) {
    return getStatementResultChunkN(
        new GetStatementResultChunkNRequest()
            .setStatementId(statementId)
            .setChunkIndex(chunkIndex));
  }

  /**
   * Get result chunk by index.
   *
   * After the statement execution has `SUCCEEDED`, the result data can be fetched by chunks.
   * Whereas the first chunk with `chunk_index=0` is typically fetched through a `get status`
   * request, subsequent chunks can be fetched using a `get result` request. The response structure
   * is identical to the nested `result` element described in the `get status` request, and
   * similarly includes the `next_chunk_index` and `next_chunk_internal_link` fields for simple
   * iteration through the result set.
   */
  public ResultData getStatementResultChunkN(GetStatementResultChunkNRequest request) {
    return impl.getStatementResultChunkN(request);
  }

  public StatementExecutionService impl() {
    return impl;
  }
}




