/*
* Copyright 2016-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with
* the License. A copy of the License is located at
*
* http://aws.amazon.com/apache2.0
*
* or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
* CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
* and limitations under the License.
*/
package com.amazonaws.services.textract;
import javax.annotation.Generated;
import com.amazonaws.*;
import com.amazonaws.regions.*;
import com.amazonaws.services.textract.model.*;
/**
* Interface for accessing Amazon Textract.
*
* Note: Do not implement this interface directly; new methods are added to it regularly. Extend from
* {@link com.amazonaws.services.textract.AbstractAmazonTextract} instead.
*
* Amazon Textract detects and analyzes text in documents and converts it into machine-readable text. This is the API
* reference documentation for Amazon Textract.
*
*/
@Generated("com.amazonaws:aws-java-sdk-code-generator")
public interface AmazonTextract {
/**
* The region metadata service name for computing region endpoints. You can use this value to retrieve metadata
* (such as supported regions) of the service.
*
* @see RegionUtils#getRegionsForService(String)
*/
String ENDPOINT_PREFIX = "textract";
/**
* Analyzes an input document for relationships between detected items.
*
* The types of information returned are as follows:
*
* - Form data (key-value pairs). The related information is returned in two Block objects, each of type
*   KEY_VALUE_SET: a KEY Block object and a VALUE Block object. For example, "Name: Ana Silva Carolina"
*   contains a key and a value. "Name:" is the key. "Ana Silva Carolina" is the value.
*
* - Table and table cell data. A TABLE Block object contains information about a detected table. A CELL
*   Block object is returned for each cell in a table.
*
* - Lines and words of text. A LINE Block object contains one or more WORD Block objects. All lines and
*   words that are detected in the document are returned (including text that doesn't have a relationship
*   with the value of FeatureTypes).
*
* Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in
* tables. A SELECTION_ELEMENT Block object contains information about a selection element, including the
* selection status.
*
* You can choose which type of analysis to perform by specifying the FeatureTypes list.
*
* The output is returned in a list of Block objects.
*
* AnalyzeDocument is a synchronous operation. To analyze documents asynchronously, use StartDocumentAnalysis.
*
* For more information, see Document Text Analysis.
*
* @param analyzeDocumentRequest
* @return Result of the AnalyzeDocument operation returned by the service.
* @throws InvalidParameterException
* An input parameter violated a constraint. For example, in synchronous operations, an
* InvalidParameterException occurs when neither the S3Object nor the Bytes value is supplied in the Document
* request parameter. Validate your parameter before calling the API operation again.
* @throws InvalidS3ObjectException
* Amazon Textract is unable to access the S3 object that's specified in the request. For more information,
* see Configure Access to Amazon S3. For troubleshooting information, see Troubleshooting Amazon S3.
* @throws UnsupportedDocumentException
* The format of the input document isn't supported. Documents for synchronous operations can be in PNG or
* JPEG format. Documents for asynchronous operations can also be in PDF format.
* @throws DocumentTooLargeException
* The document can't be processed because it's too large. The maximum document size for synchronous
* operations is 10 MB. The maximum document size for asynchronous operations is 500 MB for PDF files.
* @throws BadDocumentException
* Amazon Textract isn't able to read the document. For more information on the document limits in Amazon
* Textract, see limits.
* @throws AccessDeniedException
* You aren't authorized to perform the action. Use the Amazon Resource Name (ARN) of an authorized user or
* IAM role to perform the operation.
* @throws ProvisionedThroughputExceededException
* The number of requests exceeded your throughput limit. If you want to increase this limit, contact Amazon
* Textract.
* @throws InternalServerErrorException
* Amazon Textract experienced a service issue. Try your call again.
* @throws ThrottlingException
* Amazon Textract is temporarily unable to process the request. Try your call again.
* @throws HumanLoopQuotaExceededException
* Indicates you have exceeded the maximum number of active human in the loop workflows available.
* @sample AmazonTextract.AnalyzeDocument
* @see "AWS API Documentation"
*/
AnalyzeDocumentResult analyzeDocument(AnalyzeDocumentRequest analyzeDocumentRequest);
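/*
Illustrative sketch, not part of the AWS SDK and not run against the service. It shows how the KEY and VALUE blocks of a KEY_VALUE_SET, described for AnalyzeDocument above, can be paired. The link structure is an assumption drawn from the Textract Block model rather than from this interface. The sketch assumes that a KEY block lists its VALUE block under a VALUE relationship, and that both blocks list their WORD blocks under CHILD relationships. SketchBlock is a hypothetical stand-in for com.amazonaws.services.textract.model.Block, so the example compiles with the JDK alone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the SDK's Block model: only the fields this sketch reads.
final class SketchBlock {
    final String id;
    final String blockType;   // e.g. "KEY_VALUE_SET" or "WORD"
    final String entityType;  // "KEY" or "VALUE" for KEY_VALUE_SET blocks, otherwise null
    final String text;        // set on WORD blocks in this sketch
    final Map<String, List<String>> relationships; // relationship type -> related block ids

    SketchBlock(String id, String blockType, String entityType, String text,
                Map<String, List<String>> relationships) {
        this.id = id;
        this.blockType = blockType;
        this.entityType = entityType;
        this.text = text;
        this.relationships = relationships;
    }

    List<String> related(String relationshipType) {
        return relationships.getOrDefault(relationshipType, List.of());
    }
}

final class KeyValueSketch {
    // Joins the text of the WORD blocks that a block lists under its CHILD relationship.
    static String childText(SketchBlock block, Map<String, SketchBlock> byId) {
        List<String> words = new ArrayList<>();
        for (String childId : block.related("CHILD")) {
            SketchBlock child = byId.get(childId);
            if (child != null && "WORD".equals(child.blockType)) {
                words.add(child.text);
            }
        }
        return String.join(" ", words);
    }

    // Maps the text of each KEY block to the text of the VALUE block(s) it points to.
    static Map<String, String> formFields(List<SketchBlock> blocks) {
        Map<String, SketchBlock> byId = new LinkedHashMap<>();
        for (SketchBlock b : blocks) {
            byId.put(b.id, b);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (SketchBlock b : blocks) {
            if (!"KEY_VALUE_SET".equals(b.blockType) || !"KEY".equals(b.entityType)) {
                continue;
            }
            List<String> values = new ArrayList<>();
            for (String valueId : b.related("VALUE")) {
                SketchBlock value = byId.get(valueId);
                if (value != null) {
                    values.add(childText(value, byId));
                }
            }
            fields.put(childText(b, byId), String.join(" ", values));
        }
        return fields;
    }
}
```

With the SDK, the stand-ins would be built from AnalyzeDocumentResult.getBlocks(), using each Block's getId(), getBlockType(), getEntityTypes(), getText() and getRelationships(). That mapping is left out here.
*/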
/**
* Analyzes an input document for financially related relationships between text.
*
* Information is returned as ExpenseDocuments and separated as follows:
*
* - LineItemGroups: A data set containing LineItems, which store information about the lines of text, such
*   as an item purchased and its price on a receipt.
*
* - SummaryFields: Contains all other information on a receipt, such as header information or the vendor's
*   name.
*
* @param analyzeExpenseRequest
* @return Result of the AnalyzeExpense operation returned by the service.
* @throws InvalidParameterException
* An input parameter violated a constraint. For example, in synchronous operations, an
* InvalidParameterException occurs when neither the S3Object nor the Bytes value is supplied in the Document
* request parameter. Validate your parameter before calling the API operation again.
* @throws InvalidS3ObjectException
* Amazon Textract is unable to access the S3 object that's specified in the request. For more information,
* see Configure Access to Amazon S3. For troubleshooting information, see Troubleshooting Amazon S3.
* @throws UnsupportedDocumentException
* The format of the input document isn't supported. Documents for synchronous operations can be in PNG or
* JPEG format. Documents for asynchronous operations can also be in PDF format.
* @throws DocumentTooLargeException
* The document can't be processed because it's too large. The maximum document size for synchronous
* operations is 10 MB. The maximum document size for asynchronous operations is 500 MB for PDF files.
* @throws BadDocumentException
* Amazon Textract isn't able to read the document. For more information on the document limits in Amazon
* Textract, see limits.
* @throws AccessDeniedException
* You aren't authorized to perform the action. Use the Amazon Resource Name (ARN) of an authorized user or
* IAM role to perform the operation.
* @throws ProvisionedThroughputExceededException
* The number of requests exceeded your throughput limit. If you want to increase this limit, contact Amazon
* Textract.
* @throws InternalServerErrorException
* Amazon Textract experienced a service issue. Try your call again.
* @throws ThrottlingException
* Amazon Textract is temporarily unable to process the request. Try your call again.
* @sample AmazonTextract.AnalyzeExpense
* @see "AWS API Documentation"
*/
AnalyzeExpenseResult analyzeExpense(AnalyzeExpenseRequest analyzeExpenseRequest);
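/*
Illustrative sketch, not part of the AWS SDK. It adds up the prices of the LineItems returned for a receipt. ExpenseFieldSketch is a hypothetical stand-in for the SDK's expense field types. The "PRICE" type label is an assumption about the labels Textract assigns, so check the labels in real AnalyzeExpense output before relying on it. For brevity, the LineItemGroups are flattened into one list of line items.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical stand-in for one detected expense field: a type label and the detected value text.
final class ExpenseFieldSketch {
    final String type;   // assumed labels such as "ITEM" or "PRICE"
    final String value;  // e.g. "$4.50"

    ExpenseFieldSketch(String type, String value) {
        this.type = type;
        this.value = value;
    }
}

final class ExpenseSketch {
    // Strips currency symbols and thousands separators; returns null if no number remains.
    static BigDecimal parseAmount(String text) {
        String cleaned = text.replaceAll("[^0-9.\\-]", "");
        if (cleaned.isEmpty()) {
            return null;
        }
        try {
            return new BigDecimal(cleaned);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Each inner list holds the fields of one line item (all LineItemGroups flattened together).
    static BigDecimal sumPrices(List<List<ExpenseFieldSketch>> lineItems) {
        BigDecimal total = BigDecimal.ZERO;
        for (List<ExpenseFieldSketch> lineItem : lineItems) {
            for (ExpenseFieldSketch field : lineItem) {
                if ("PRICE".equals(field.type)) {
                    BigDecimal amount = parseAmount(field.value);
                    if (amount != null) {
                        total = total.add(amount);
                    }
                }
            }
        }
        return total;
    }
}
```

BigDecimal is used instead of double so that receipt amounts add up without binary rounding error.
*/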
/**
*
* Detects text in the input document. Amazon Textract can detect lines of text and the words that make up a line of
* text. The input document must be an image in JPEG or PNG format. DetectDocumentText
returns the
* detected text in an array of Block objects.
*
*
* Each document page has as an associated Block
of type PAGE. Each PAGE Block
object is
* the parent of LINE Block
objects that represent the lines of detected text on a page. A LINE
* Block
object is a parent for each word that makes up the line. Words are represented by
* Block
objects of type WORD.
*
*
* DetectDocumentText
is a synchronous operation. To analyze documents asynchronously, use
* StartDocumentTextDetection.
*
*
* For more information, see Document Text Detection.
*
*
* @param detectDocumentTextRequest
* @return Result of the DetectDocumentText operation returned by the service.
* @throws InvalidParameterException
* An input parameter violated a constraint. For example, in synchronous operations, an
* InvalidParameterException occurs when neither the S3Object nor the Bytes value is supplied in the Document
* request parameter. Validate your parameter before calling the API operation again.
* @throws InvalidS3ObjectException
* Amazon Textract is unable to access the S3 object that's specified in the request. For more information,
* see Configure Access to Amazon S3. For troubleshooting information, see Troubleshooting Amazon S3.
* @throws UnsupportedDocumentException
* The format of the input document isn't supported. Documents for synchronous operations can be in PNG or
* JPEG format. Documents for asynchronous operations can also be in PDF format.
* @throws DocumentTooLargeException
* The document can't be processed because it's too large. The maximum document size for synchronous
* operations is 10 MB. The maximum document size for asynchronous operations is 500 MB for PDF files.
* @throws BadDocumentException
* Amazon Textract isn't able to read the document. For more information on the document limits in Amazon
* Textract, see limits.
* @throws AccessDeniedException
* You aren't authorized to perform the action. Use the Amazon Resource Name (ARN) of an authorized user or
* IAM role to perform the operation.
* @throws ProvisionedThroughputExceededException
* The number of requests exceeded your throughput limit. If you want to increase this limit, contact Amazon
* Textract.
* @throws InternalServerErrorException
* Amazon Textract experienced a service issue. Try your call again.
* @throws ThrottlingException
* Amazon Textract is temporarily unable to process the request. Try your call again.
* @sample AmazonTextract.DetectDocumentText
* @see "AWS API Documentation"
*/
DetectDocumentTextResult detectDocumentText(DetectDocumentTextRequest detectDocumentTextRequest);
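/*
Illustrative sketch, not part of the AWS SDK. It walks the PAGE, LINE, and WORD hierarchy that DetectDocumentText returns and rebuilds the lines of each page. TextBlockSketch is a hypothetical stand-in for the SDK's Block. The sketch assumes that parent-to-child links are exposed as the ids of a CHILD relationship.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the SDK's Block model, reduced to what this sketch reads.
final class TextBlockSketch {
    final String id;
    final String type;           // "PAGE", "LINE" or "WORD"
    final String text;           // null for PAGE blocks
    final List<String> childIds; // ids listed under the block's CHILD relationship

    TextBlockSketch(String id, String type, String text, List<String> childIds) {
        this.id = id;
        this.type = type;
        this.text = text;
        this.childIds = childIds;
    }
}

final class PageTextSketch {
    // Collects the children of the given type, in the order the parent lists them.
    static List<TextBlockSketch> children(TextBlockSketch parent, Map<String, TextBlockSketch> byId,
                                          String type) {
        List<TextBlockSketch> result = new ArrayList<>();
        for (String childId : parent.childIds) {
            TextBlockSketch child = byId.get(childId);
            if (child != null && type.equals(child.type)) {
                result.add(child);
            }
        }
        return result;
    }

    // Returns, for each PAGE block in input order, the text of its LINE blocks.
    static List<List<String>> linesByPage(List<TextBlockSketch> blocks) {
        Map<String, TextBlockSketch> byId = new HashMap<>();
        for (TextBlockSketch b : blocks) {
            byId.put(b.id, b);
        }
        List<List<String>> pages = new ArrayList<>();
        for (TextBlockSketch b : blocks) {
            if ("PAGE".equals(b.type)) {
                List<String> lines = new ArrayList<>();
                for (TextBlockSketch line : children(b, byId, "LINE")) {
                    lines.add(line.text);
                }
                pages.add(lines);
            }
        }
        return pages;
    }
}
```

The same children(...) helper, called with "WORD", returns the words that make up a given LINE.
*/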
/**
* Gets the results for an Amazon Textract asynchronous operation that analyzes text in a document.
*
* You start asynchronous text analysis by calling StartDocumentAnalysis, which returns a job identifier
* (JobId). When the text analysis operation finishes, Amazon Textract publishes a completion status to the
* Amazon Simple Notification Service (Amazon SNS) topic that's registered in the initial call to
* StartDocumentAnalysis. To get the results of the text analysis operation, first check that the status value
* published to the Amazon SNS topic is SUCCEEDED. If so, call GetDocumentAnalysis, and pass the job identifier
* (JobId) from the initial call to StartDocumentAnalysis.
*
* GetDocumentAnalysis returns an array of Block objects. The following types of information are returned:
*
* - Form data (key-value pairs). The related information is returned in two Block objects, each of type
*   KEY_VALUE_SET: a KEY Block object and a VALUE Block object. For example, "Name: Ana Silva Carolina"
*   contains a key and a value. "Name:" is the key. "Ana Silva Carolina" is the value.
*
* - Table and table cell data. A TABLE Block object contains information about a detected table. A CELL
*   Block object is returned for each cell in a table.
*
* - Lines and words of text. A LINE Block object contains one or more WORD Block objects. All lines and
*   words that are detected in the document are returned (including text that doesn't have a relationship
*   with the value of the StartDocumentAnalysis FeatureTypes input parameter).
*
* Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in
* tables. A SELECTION_ELEMENT Block object contains information about a selection element, including the
* selection status.
*
* Use the MaxResults parameter to limit the number of blocks that are returned. If there are more results than
* specified in MaxResults, the value of NextToken in the operation response contains a pagination token for
* getting the next set of results. To get the next page of results, call GetDocumentAnalysis, and populate the
* NextToken request parameter with the token value that's returned from the previous call to
* GetDocumentAnalysis.
*
* For more information, see Document Text Analysis.
*
* @param getDocumentAnalysisRequest
* @return Result of the GetDocumentAnalysis operation returned by the service.
* @throws InvalidParameterException
* An input parameter violated a constraint. For example, in synchronous operations, an
* InvalidParameterException occurs when neither the S3Object nor the Bytes value is supplied in the Document
* request parameter. Validate your parameter before calling the API operation again.
* @throws AccessDeniedException
* You aren't authorized to perform the action. Use the Amazon Resource Name (ARN) of an authorized user or
* IAM role to perform the operation.
* @throws ProvisionedThroughputExceededException
* The number of requests exceeded your throughput limit. If you want to increase this limit, contact Amazon
* Textract.
* @throws InvalidJobIdException
* An invalid job identifier was passed to GetDocumentAnalysis or to GetDocumentTextDetection.
* @throws InternalServerErrorException
* Amazon Textract experienced a service issue. Try your call again.
* @throws ThrottlingException
* Amazon Textract is temporarily unable to process the request. Try your call again.
* @throws InvalidS3ObjectException
* Amazon Textract is unable to access the S3 object that's specified in the request. For more information,
* see Configure Access to Amazon S3. For troubleshooting information, see Troubleshooting Amazon S3.
* @throws InvalidKMSKeyException
* Indicates you do not have decrypt permissions with the KMS key entered, or the KMS key was entered
* incorrectly.
* @sample AmazonTextract.GetDocumentAnalysis
* @see "AWS API Documentation"
*/
GetDocumentAnalysisResult getDocumentAnalysis(GetDocumentAnalysisRequest getDocumentAnalysisRequest);
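/*
Illustrative sketch, not part of the AWS SDK. It lays out the CELL blocks of one TABLE, described for GetDocumentAnalysis above, as a grid. CellSketch is a hypothetical stand-in. It assumes that each CELL block carries 1-based row and column indexes, and that the cell's text has already been joined from its child WORD blocks. Merged cells (row and column spans) are ignored.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a CELL block whose text has already been joined from its WORD children.
final class CellSketch {
    final int rowIndex;    // assumed 1-based
    final int columnIndex; // assumed 1-based
    final String text;

    CellSketch(int rowIndex, int columnIndex, String text) {
        this.rowIndex = rowIndex;
        this.columnIndex = columnIndex;
        this.text = text;
    }
}

final class TableGridSketch {
    // Places every cell at [row - 1][column - 1]; positions with no cell stay "".
    // Row and column spans are ignored to keep the sketch short.
    static String[][] toGrid(List<CellSketch> cells) {
        int rows = 0;
        int columns = 0;
        for (CellSketch cell : cells) {
            rows = Math.max(rows, cell.rowIndex);
            columns = Math.max(columns, cell.columnIndex);
        }
        String[][] grid = new String[rows][columns];
        for (String[] row : grid) {
            Arrays.fill(row, "");
        }
        for (CellSketch cell : cells) {
            grid[cell.rowIndex - 1][cell.columnIndex - 1] = cell.text;
        }
        return grid;
    }
}
```
*/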
/**
* Gets the results for an Amazon Textract asynchronous operation that detects text in a document. Amazon Textract
* can detect lines of text and the words that make up a line of text.
*
* You start asynchronous text detection by calling StartDocumentTextDetection, which returns a job identifier
* (JobId). When the text detection operation finishes, Amazon Textract publishes a completion status to the
* Amazon Simple Notification Service (Amazon SNS) topic that's registered in the initial call to
* StartDocumentTextDetection. To get the results of the text detection operation, first check that the status
* value published to the Amazon SNS topic is SUCCEEDED. If so, call GetDocumentTextDetection, and pass the job
* identifier (JobId) from the initial call to StartDocumentTextDetection.
*
* GetDocumentTextDetection returns an array of Block objects.
*
* Each document page has an associated Block of type PAGE. Each PAGE Block object is the parent of LINE Block
* objects that represent the lines of detected text on a page. A LINE Block object is a parent for each word that
* makes up the line. Words are represented by Block objects of type WORD.
*
* Use the MaxResults parameter to limit the number of blocks that are returned. If there are more results than
* specified in MaxResults, the value of NextToken in the operation response contains a pagination token for
* getting the next set of results. To get the next page of results, call GetDocumentTextDetection, and populate
* the NextToken request parameter with the token value that's returned from the previous call to
* GetDocumentTextDetection.
*
* For more information, see Document Text Detection.
*
* @param getDocumentTextDetectionRequest
* @return Result of the GetDocumentTextDetection operation returned by the service.
* @throws InvalidParameterException
* An input parameter violated a constraint. For example, in synchronous operations, an
* InvalidParameterException occurs when neither the S3Object nor the Bytes value is supplied in the Document
* request parameter. Validate your parameter before calling the API operation again.
* @throws AccessDeniedException
* You aren't authorized to perform the action. Use the Amazon Resource Name (ARN) of an authorized user or
* IAM role to perform the operation.
* @throws ProvisionedThroughputExceededException
* The number of requests exceeded your throughput limit. If you want to increase this limit, contact Amazon
* Textract.
* @throws InvalidJobIdException
* An invalid job identifier was passed to GetDocumentAnalysis or to GetDocumentTextDetection.
* @throws InternalServerErrorException
* Amazon Textract experienced a service issue. Try your call again.
* @throws ThrottlingException
* Amazon Textract is temporarily unable to process the request. Try your call again.
* @throws InvalidS3ObjectException
* Amazon Textract is unable to access the S3 object that's specified in the request. For more information,
* see Configure Access to Amazon S3. For troubleshooting information, see Troubleshooting Amazon S3.
* @throws InvalidKMSKeyException
* Indicates you do not have decrypt permissions with the KMS key entered, or the KMS key was entered
* incorrectly.
* @sample AmazonTextract.GetDocumentTextDetection
* @see "AWS API Documentation"
*/
GetDocumentTextDetectionResult getDocumentTextDetection(GetDocumentTextDetectionRequest getDocumentTextDetectionRequest);
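/*
Illustrative sketch, not part of the AWS SDK. It implements the NextToken loop described for GetDocumentTextDetection and GetDocumentAnalysis: request a page, collect its items, and repeat with the returned token until none comes back. PageSketch and PaginationSketch are hypothetical names. The fetchPage function stands in for the real service call, so the loop can run without credentials.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for one response page: the returned items plus the NextToken value.
final class PageSketch<T> {
    final List<T> items;
    final String nextToken; // null when there are no further results

    PageSketch(List<T> items, String nextToken) {
        this.items = items;
        this.nextToken = nextToken;
    }
}

final class PaginationSketch {
    // Requests the first page with no token, then keeps passing the returned NextToken
    // until a response arrives without one.
    static <T> List<T> fetchAll(Function<String, PageSketch<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        String token = null;
        do {
            PageSketch<T> page = fetchPage.apply(token);
            all.addAll(page.items);
            token = page.nextToken;
        } while (token != null);
        return all;
    }
}
```

When wired to the SDK, fetchPage might call getDocumentTextDetection(new GetDocumentTextDetectionRequest().withJobId(jobId).withNextToken(token)) and wrap getBlocks() and getNextToken() from the result. That wiring is an assumption and is not exercised here.
*/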
/**
* Starts the asynchronous analysis of an input document for relationships between detected items such as key-value
* pairs, tables, and selection elements.
*
* StartDocumentAnalysis can analyze text in documents that are in JPEG, PNG, and PDF format. The documents are
* stored in an Amazon S3 bucket. Use DocumentLocation to specify the bucket name and file name of the document.
*
* StartDocumentAnalysis returns a job identifier (JobId) that you use to get the results of the operation. When
* text analysis is finished, Amazon Textract publishes a completion status to the Amazon Simple Notification
* Service (Amazon SNS) topic that you specify in NotificationChannel. To get the results of the text analysis
* operation, first check that the status value published to the Amazon SNS topic is SUCCEEDED. If so, call
* GetDocumentAnalysis, and pass the job identifier (JobId) from the initial call to StartDocumentAnalysis.
*
* For more information, see Document Text Analysis.
*
* @param startDocumentAnalysisRequest
* @return Result of the StartDocumentAnalysis operation returned by the service.
* @throws InvalidParameterException
* An input parameter violated a constraint. For example, in synchronous operations, an
* InvalidParameterException occurs when neither the S3Object nor the Bytes value is supplied in the Document
* request parameter. Validate your parameter before calling the API operation again.
* @throws InvalidS3ObjectException
* Amazon Textract is unable to access the S3 object that's specified in the request. For more information,
* see Configure Access to Amazon S3. For troubleshooting information, see Troubleshooting Amazon S3.
* @throws InvalidKMSKeyException
* Indicates you do not have decrypt permissions with the KMS key entered, or the KMS key was entered
* incorrectly.
* @throws UnsupportedDocumentException
* The format of the input document isn't supported. Documents for synchronous operations can be in PNG or
* JPEG format. Documents for asynchronous operations can also be in PDF format.
* @throws DocumentTooLargeException
* The document can't be processed because it's too large. The maximum document size for synchronous
* operations is 10 MB. The maximum document size for asynchronous operations is 500 MB for PDF files.
* @throws BadDocumentException
* Amazon Textract isn't able to read the document. For more information on the document limits in Amazon
* Textract, see limits.
* @throws AccessDeniedException
* You aren't authorized to perform the action. Use the Amazon Resource Name (ARN) of an authorized user or
* IAM role to perform the operation.
* @throws ProvisionedThroughputExceededException
* The number of requests exceeded your throughput limit. If you want to increase this limit, contact Amazon
* Textract.
* @throws InternalServerErrorException
* Amazon Textract experienced a service issue. Try your call again.
* @throws IdempotentParameterMismatchException
* A ClientRequestToken input parameter was reused with an operation, but at least one of the other input
* parameters is different from the previous call to the operation.
* @throws ThrottlingException
* Amazon Textract is temporarily unable to process the request. Try your call again.
* @throws LimitExceededException
* An Amazon Textract service limit was exceeded. For example, if you start too many asynchronous jobs
* concurrently, calls to start operations (StartDocumentTextDetection, for example) raise a
* LimitExceededException (HTTP status code: 400) until the number of concurrently running jobs is below the
* Amazon Textract service limit.
* @sample AmazonTextract.StartDocumentAnalysis
* @see "AWS API Documentation"
*/
StartDocumentAnalysisResult startDocumentAnalysis(StartDocumentAnalysisRequest startDocumentAnalysisRequest);
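/*
Illustrative sketch, not part of the AWS SDK. It shows one way to choose a ClientRequestToken so that a retried StartDocumentAnalysis call reuses its token, while a changed request gets a new one and so does not trigger IdempotentParameterMismatchException. Hashing the job-defining parameters is an assumed strategy, not SDK behavior. The 64-character hex output assumes the service accepts tokens of up to 64 characters drawn from letters, digits, hyphens, and underscores. RequestTokenSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class RequestTokenSketch {
    // Derives a token from the parameters that define the job. Retrying the same request
    // yields the same token; changing any hashed parameter yields a different token.
    static String clientRequestToken(String bucket, String key, String... featureTypes) {
        String[] features = featureTypes.clone();
        Arrays.sort(features); // FORMS,TABLES and TABLES,FORMS describe the same job
        StringBuilder material = new StringBuilder();
        appendPart(material, bucket);
        appendPart(material, key);
        for (String feature : features) {
            appendPart(material, feature);
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(material.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString(); // 64 lowercase hex characters
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Length-prefixes each part so that ("ab", "c") and ("a", "bc") cannot collide.
    private static void appendPart(StringBuilder sb, String part) {
        sb.append(part.length()).append(':').append(part);
    }
}
```

The result would be passed with StartDocumentAnalysisRequest.withClientRequestToken(...). Any other parameter that changes the job, such as NotificationChannel, would also need to be included in the hash.
*/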
/**
* Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words
* that make up a line of text.
*
* StartDocumentTextDetection can analyze text in documents that are in JPEG, PNG, and PDF format. The documents
* are stored in an Amazon S3 bucket. Use DocumentLocation to specify the bucket name and file name of the
* document.
*
* StartDocumentTextDetection returns a job identifier (JobId) that you use to get the results of the operation.
* When text detection is finished, Amazon Textract publishes a completion status to the Amazon Simple
* Notification Service (Amazon SNS) topic that you specify in NotificationChannel. To get the results of the text
* detection operation, first check that the status value published to the Amazon SNS topic is SUCCEEDED. If so,
* call GetDocumentTextDetection, and pass the job identifier (JobId) from the initial call to
* StartDocumentTextDetection.
*
* For more information, see Document Text Detection.
*
* @param startDocumentTextDetectionRequest
* @return Result of the StartDocumentTextDetection operation returned by the service.
* @throws InvalidParameterException
* An input parameter violated a constraint. For example, in synchronous operations, an
* InvalidParameterException occurs when neither the S3Object nor the Bytes value is supplied in the Document
* request parameter. Validate your parameter before calling the API operation again.
* @throws InvalidS3ObjectException
* Amazon Textract is unable to access the S3 object that's specified in the request. For more information,
* see Configure Access to Amazon S3. For troubleshooting information, see Troubleshooting Amazon S3.
* @throws InvalidKMSKeyException
* Indicates you do not have decrypt permissions with the KMS key entered, or the KMS key was entered
* incorrectly.
* @throws UnsupportedDocumentException
* The format of the input document isn't supported. Documents for synchronous operations can be in PNG or
* JPEG format. Documents for asynchronous operations can also be in PDF format.
* @throws DocumentTooLargeException
* The document can't be processed because it's too large. The maximum document size for synchronous
* operations is 10 MB. The maximum document size for asynchronous operations is 500 MB for PDF files.
* @throws BadDocumentException
* Amazon Textract isn't able to read the document. For more information on the document limits in Amazon
* Textract, see limits.
* @throws AccessDeniedException
* You aren't authorized to perform the action. Use the Amazon Resource Name (ARN) of an authorized user or
* IAM role to perform the operation.
* @throws ProvisionedThroughputExceededException
* The number of requests exceeded your throughput limit. If you want to increase this limit, contact Amazon
* Textract.
* @throws InternalServerErrorException
* Amazon Textract experienced a service issue. Try your call again.
* @throws IdempotentParameterMismatchException
* A ClientRequestToken input parameter was reused with an operation, but at least one of the other input
* parameters is different from the previous call to the operation.
* @throws ThrottlingException
* Amazon Textract is temporarily unable to process the request. Try your call again.
* @throws LimitExceededException
* An Amazon Textract service limit was exceeded. For example, if you start too many asynchronous jobs
* concurrently, calls to start operations (StartDocumentTextDetection, for example) raise a
* LimitExceededException (HTTP status code: 400) until the number of concurrently running jobs is below the
* Amazon Textract service limit.
* @sample AmazonTextract.StartDocumentTextDetection
* @see "AWS API Documentation"
*/
StartDocumentTextDetectionResult startDocumentTextDetection(StartDocumentTextDetectionRequest startDocumentTextDetectionRequest);
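/*
Illustrative sketch, not part of the AWS SDK. It retries a start call that fails with LimitExceededException or ThrottlingException, using exponential backoff. The exception descriptions above only say to try the call again; the backoff schedule and the BackoffSketch name are assumptions. The SDK client's own retry policy may already retry some throttling errors, so a wrapper like this would mainly matter for errors the client does not retry. The sleeper is injectable so that the schedule can be checked without waiting.

```java
import java.util.function.LongConsumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class BackoffSketch {
    // Runs the call, retrying retryable failures with exponential backoff. The wait before
    // retry n (n = 1, 2, ...) is baseMillis times 2^(n - 1), capped at maxMillis.
    static <T> T withBackoff(Supplier<T> call, Predicate<RuntimeException> isRetryable,
                             int maxAttempts, long baseMillis, long maxMillis, LongConsumer sleeper) {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                long delay = Math.min(maxMillis, baseMillis << Math.min(attempt - 1, 30));
                sleeper.accept(delay);
            }
        }
    }

    // A real sleeper: waits on the current thread and preserves the interrupt flag.
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", e);
        }
    }
}
```

In practice, isRetryable might test e instanceof LimitExceededException || e instanceof ThrottlingException, and the sleeper would be BackoffSketch::sleep. Adding random jitter to each delay helps spread out many clients that back off at the same moment.
*/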
/**
* Shuts down this client object, releasing any resources that might be held open. This is an optional method, and
* callers are not expected to call it, but they can call it to explicitly release any open resources. Once a
* client has been shut down, it should not be used to make any more requests.
*/
void shutdown();
/**
* Returns additional metadata for a previously executed successful request, typically used for debugging issues
* where a service isn't acting as expected. This data isn't considered part of the result data returned by an
* operation, so it's available through this separate, diagnostic interface.
*
* Response metadata is only cached for a limited period of time, so if you need to access this extra diagnostic
* information for an executed request, you should use this method to retrieve it as soon as possible after
* executing a request.
*
* @param request
* The originally executed request.
*
* @return The response metadata for the specified request, or null if none is available.
*/
ResponseMetadata getCachedResponseMetadata(AmazonWebServiceRequest request);
}