Amazon KCL and Trim Horizon Checkpoints

How are breakpoints and cropping related to the AWS KCL library?

Documentation page Processing start, shutdown and throttle says:

By default, KCL starts reading records from the tip of stream;, which is the most recently added record. In this configuration, if the data writer application adds records to before any record processors are started, the records are not read by the write processors after they are started.

To change the behavior of the recording processors so that it always reads data from the beginning of the stream, sets the following value in the properties file for your Amazon Kinesis Streams application:

initialPositionInStream = TRIM_HORIZON

Documentation page Amazon Kinesis Java Client Library Client Library Development says:

For streams, the recording processor needs to track records that have already been processed in the shard. KCL takes care of this tracking for you by passing a checkpointer (IRecordProcessorCheckpointer) to process the records. The write processor calls the checkpoint method on this interface to tell KCL how far it has come in processing the fragments. In an event that the worker fails, KCL uses this information to restart processing the shard in the last known processed record.

The first page seems to indicate that KCL resumes at the end of the stream, the second page in the last known processed record (which was marked as processed by RecordProcessor using checkpointer ). In my case, I definitely need to restart the last known processed record. Do I need to set initialPositionInStream to TRIM_HORIZON?

+6
source share
3 answers

With a stream of kinesia you have two options: you can read the latest notes or start from the oldest (TRIM_HORIZON).

But, as soon as you run the application, it just reads from the position that it stopped using its breakpoints. You can see these breakpoints in dynamodb (usually the table name is the name of the application). Therefore, if you restart the application, it will usually continue from where it was stopped.

Answer: no, you do not need to set initialPositionInStream to TRIM_HORIZON.

+7
source

When you read events from a stream of kinesias, you have 4 options:

TRIM_HORIZON are the oldest events that are still in streaming turtles before they are automatically cropped (default is 1 day, but can be increased to 7 days). You will use this option if you want to launch a new application that will process all the records available in the stream, but it will take some time until it can catch up and start processing events in real time.

LATEST are the latest events in the stream and ignore all past events. You will use this option if you run a new application that you want to process in a loop.

AT / AFTER_SEQUENCE_NUMBER - The sequence number is usually the control point that you hold during event processing. These breakpoints allow you to reliably handle events, even in the event of reading failures or when you want to update your version and continue processing all events without losing any of them. The difference between AT / AFTER is based on the time of your checkpoint before or after the successful completion of events.

Note that this is the only shard specific parameter, since all other parameters are global for the stream. When you use KCL, it manages the DynamoDB table for this application with an entry for each shard with the "current" serial number for that shard.

AT_TIMESTAMP - time to evaluate the event placed on the stream. You will use this option if you want to find specific events to process based on their timestamp. For example, when you know that at a certain point in time you had a real event in your service, you can develop an application that will process these specific events, even if you do not have a sequence number.

See the Kinesis documentation for more details: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetShardIterator.html

+1
source

You must use "TRIM_HORIZON". This will only affect the first time your application starts reading records from the stream. After that, he will be continued from the last known position.

0
source

All Articles