com.databricks.sdk.service.sql.StatementExecutionAPI Maven / Gradle / Ivy

Go to download
Show more of this group Show more artifacts with this name
Show all versions of databricks-sdk-java Show documentation
There is a newer version: 0.38.0
// Code generated from OpenAPI specs by Databricks SDK Generator. DO NOT EDIT.
package com.databricks.sdk.service.sql;

import com.databricks.sdk.core.ApiClient;
import com.databricks.sdk.support.Generated;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * The Databricks SQL Statement Execution API can be used to execute SQL statements on a SQL
 * warehouse and fetch the result.
 *
 * **Getting started**
 *
 * 
We suggest beginning with the [Databricks SQL Statement Execution API tutorial].
 *
 * 
**Overview of statement execution and result fetching**
 *
 * 
Statement execution begins by issuing a :method:statementexecution/executeStatement request
 * with a valid SQL statement and warehouse ID, along with optional parameters such as the data
 * catalog and output format. If no other parameters are specified, the server will wait for up to
 * 10s before returning a response. If the statement has completed within this timespan, the
 * response will include the result data as a JSON array and metadata. Otherwise, if no result is
 * available after the 10s timeout expired, the response will provide the statement ID that can be
 * used to poll for results by using a :method:statementexecution/getStatement request.
 *
 * 
You can specify whether the call should behave synchronously, asynchronously or start
 * synchronously with a fallback to asynchronous execution. This is controlled with the
 * `wait_timeout` and `on_wait_timeout` settings. If `wait_timeout` is set between 5-50 seconds
 * (default: 10s), the call waits for results up to the specified timeout; when set to `0s`, the
 * call is asynchronous and responds immediately with a statement ID. The `on_wait_timeout` setting
 * specifies what should happen when the timeout is reached while the statement execution has not
 * yet finished. This can be set to either `CONTINUE`, to fallback to asynchronous mode, or it can
 * be set to `CANCEL`, which cancels the statement.
 *
 * 
In summary: - Synchronous mode - `wait_timeout=30s` and `on_wait_timeout=CANCEL` - The call
 * waits up to 30 seconds; if the statement execution finishes within this time, the result data is
 * returned directly in the response. If the execution takes longer than 30 seconds, the execution
 * is canceled and the call returns with a `CANCELED` state. - Asynchronous mode - `wait_timeout=0s`
 * (`on_wait_timeout` is ignored) - The call doesn't wait for the statement to finish but returns
 * directly with a statement ID. The status of the statement execution can be polled by issuing
 * :method:statementexecution/getStatement with the statement ID. Once the execution has succeeded,
 * this call also returns the result and metadata in the response. - Hybrid mode (default) -
 * `wait_timeout=10s` and `on_wait_timeout=CONTINUE` - The call waits for up to 10 seconds; if the
 * statement execution finishes within this time, the result data is returned directly in the
 * response. If the execution takes longer than 10 seconds, a statement ID is returned. The
 * statement ID can be used to fetch status and results in the same way as in the asynchronous mode.
 *
 * 
Depending on the size, the result can be split into multiple chunks. If the statement
 * execution is successful, the statement response contains a manifest and the first chunk of the
 * result. The manifest contains schema information and provides metadata for each chunk in the
 * result. Result chunks can be retrieved by index with
 * :method:statementexecution/getStatementResultChunkN which may be called in any order and in
 * parallel. For sequential fetching, each chunk, apart from the last, also contains a
 * `next_chunk_index` and `next_chunk_internal_link` that point to the next chunk.
 *
 * 
A statement can be canceled with :method:statementexecution/cancelExecution.
 *
 * 
**Fetching result data: format and disposition**
 *
 * 
To specify the format of the result data, use the `format` field, which can be set to one of
 * the following options: `JSON_ARRAY` (JSON), `ARROW_STREAM` ([Apache Arrow Columnar]), or `CSV`.
 *
 * 
There are two ways to receive statement results, controlled by the `disposition` setting,
 * which can be either `INLINE` or `EXTERNAL_LINKS`:
 *
 * 
- `INLINE`: In this mode, the result data is directly included in the response. It's best
 * suited for smaller results. This mode can only be used with the `JSON_ARRAY` format.
 *
 * 
- `EXTERNAL_LINKS`: In this mode, the response provides links that can be used to download the
 * result data in chunks separately. This approach is ideal for larger results and offers higher
 * throughput. This mode can be used with all the formats: `JSON_ARRAY`, `ARROW_STREAM`, and `CSV`.
 *
 * 
By default, the API uses `format=JSON_ARRAY` and `disposition=INLINE`.
 *
 * 
**Limits and limitations**
 *
 * 
Note: The byte limit for INLINE disposition is based on internal storage metrics and will not
 * exactly match the byte count of the actual payload.
 *
 * 
- Statements with `disposition=INLINE` are limited to 25 MiB and will fail when this limit is
 * exceeded. - Statements with `disposition=EXTERNAL_LINKS` are limited to 100 GiB. Result sets
 * larger than this limit will be truncated. Truncation is indicated by the `truncated` field in the
 * result manifest. - The maximum query text size is 16 MiB. - Cancelation might silently fail. A
 * successful response from a cancel request indicates that the cancel request was successfully
 * received and sent to the processing engine. However, an outstanding statement might have already
 * completed execution when the cancel request arrives. Polling for status until a terminal state is
 * reached is a reliable way to determine the final state. - Wait timeouts are approximate, occur
 * server-side, and cannot account for things such as caller delays and network latency from caller
 * to service. - The system will auto-close a statement after one hour if the client stops polling
 * and thus you must poll at least once an hour. - The results are only available for one hour after
 * success; polling does not extend this.
 *
 * 
[Apache Arrow Columnar]: https://arrow.apache.org/overview/ [Databricks SQL Statement
 * Execution API tutorial]: https://docs.databricks.com/sql/api/sql-execution-tutorial.html
 */
@Generated
public class StatementExecutionAPI {
  private static final Logger LOG = LoggerFactory.getLogger(StatementExecutionAPI.class);

  private final StatementExecutionService impl;

  /** Regular-use constructor */
  public StatementExecutionAPI(ApiClient apiClient) {
    impl = new StatementExecutionImpl(apiClient);
  }

  /** Constructor for mocks */
  public StatementExecutionAPI(StatementExecutionService mock) {
    impl = mock;
  }

  public void cancelExecution(String statementId) {
    cancelExecution(new CancelExecutionRequest().setStatementId(statementId));
  }

  /**
   * Cancel statement execution.
   *
   * 
Requests that an executing statement be canceled. Callers must poll for status to see the
   * terminal state.
   */
  public void cancelExecution(CancelExecutionRequest request) {
    impl.cancelExecution(request);
  }

  public ExecuteStatementResponse executeStatement(String statement, String warehouseId) {
    return executeStatement(
        new ExecuteStatementRequest().setStatement(statement).setWarehouseId(warehouseId));
  }

  /** Execute a SQL statement. */
  public ExecuteStatementResponse executeStatement(ExecuteStatementRequest request) {
    return impl.executeStatement(request);
  }

  public GetStatementResponse getStatement(String statementId) {
    return getStatement(new GetStatementRequest().setStatementId(statementId));
  }

  /**
   * Get status, manifest, and result first chunk.
   *
   * 
This request can be used to poll for the statement's status. When the `status.state` field
   * is `SUCCEEDED` it will also return the result manifest and the first chunk of the result data.
   * When the statement is in the terminal states `CANCELED`, `CLOSED` or `FAILED`, it returns HTTP
   * 200 with the state set. After at least 12 hours in terminal state, the statement is removed
   * from the warehouse and further calls will receive an HTTP 404 response.
   *
   * 
**NOTE** This call currently might take up to 5 seconds to get the latest status and result.
   */
  public GetStatementResponse getStatement(GetStatementRequest request) {
    return impl.getStatement(request);
  }

  public ResultData getStatementResultChunkN(String statementId, long chunkIndex) {
    return getStatementResultChunkN(
        new GetStatementResultChunkNRequest()
            .setStatementId(statementId)
            .setChunkIndex(chunkIndex));
  }

  /**
   * Get result chunk by index.
   *
   * After the statement execution has `SUCCEEDED`, this request can be used to fetch any chunk
   * by index. Whereas the first chunk with `chunk_index=0` is typically fetched with
   * :method:statementexecution/executeStatement or :method:statementexecution/getStatement, this
   * request can be used to fetch subsequent chunks. The response structure is identical to the
   * nested `result` element described in the :method:statementexecution/getStatement request, and
   * similarly includes the `next_chunk_index` and `next_chunk_internal_link` fields for simple
   * iteration through the result set.
   */
  public ResultData getStatementResultChunkN(GetStatementResultChunkNRequest request) {
    return impl.getStatementResultChunkN(request);
  }

  public StatementExecutionService impl() {
    return impl;
  }
}