brunoselvacj/kafka-connect-kinesis

 
 


Introduction

This Kafka Connect source connector reads records from an Amazon Kinesis stream and writes them to an Apache Kafka topic.

Configuration

KinesisSourceConnector

name=connector1
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.kinesis.KinesisSourceConnector

# Set these required values
aws.secret.key.id=
aws.access.key.id=
kafka.topic=
kinesis.stream=

| Name | Description | Type | Default | Valid Values | Importance |
|---|---|---|---|---|---|
| aws.access.key.id | The AWS access key ID used to authenticate with Kinesis. | string | | | high |
| aws.secret.key.id | The AWS secret access key that pairs with the access key ID. | password | | | high |
| kafka.topic | The Kafka topic to write the data to. | string | | | high |
| kinesis.stream | The Kinesis stream to read from. | string | | | high |
| kinesis.shard.id | The shard of the Kinesis stream to read from. This is a regex, so it can also be used to read all of the shards in the stream. | string | .* | | high |
| kinesis.empty.records.backoff.ms | The number of milliseconds to back off when the stream is empty. | long | 5000 | [500,...,2147483647] | medium |
| kinesis.position | The position in the stream to reset to if no offsets are stored. | string | TRIM_HORIZON | ValidEnum{enum=ShardIteratorType, allowed=[AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER, TRIM_HORIZON, LATEST, AT_TIMESTAMP]} | medium |
| kinesis.record.limit | The number of records to read in each poll of the Kinesis shard. | int | 500 | [1,...,10000] | medium |
| kinesis.region | The AWS region for the Kinesis stream. | string | US_EAST_1 | ValidEnum{enum=Regions, allowed=[GovCloud, US_EAST_1, US_EAST_2, US_WEST_1, US_WEST_2, EU_WEST_1, EU_WEST_2, EU_CENTRAL_1, AP_SOUTH_1, AP_SOUTHEAST_1, AP_SOUTHEAST_2, AP_NORTHEAST_1, AP_NORTHEAST_2, SA_EAST_1, CN_NORTH_1, CA_CENTRAL_1]} | medium |
| kinesis.throughput.exceeded.backoff.ms | The number of milliseconds to back off when a throughput exceeded exception is thrown. | long | 10000 | [500,...,2147483647] | medium |
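
Because kinesis.shard.id is a regex, a single value can select one shard, several, or all of them. The sketch below illustrates the idea with the Java standard library only. The class `ShardFilterSketch` and the method `selectShards` are hypothetical names, not part of the connector, and the sketch assumes the pattern must match the whole shard ID; the connector's actual matching logic may differ.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the connector: shows how a
// kinesis.shard.id regex could select shards, ASSUMING full-match semantics.
final class ShardFilterSketch {

    // Returns the shard IDs that the regex matches in full.
    static List<String> selectShards(String shardIdRegex, List<String> shardIds) {
        Pattern pattern = Pattern.compile(shardIdRegex);
        return shardIds.stream()
                .filter(id -> pattern.matcher(id).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Example shard IDs in the usual Kinesis format.
        List<String> shards = List.of(
                "shardId-000000000000",
                "shardId-000000000001",
                "shardId-000000000002");

        // The default ".*" keeps every shard.
        System.out.println(selectShards(".*", shards));

        // A narrower pattern keeps only the shards ending in 0 or 1.
        System.out.println(selectShards("shardId-00000000000[01]", shards));
    }
}
```

Leaving the default in place is the simplest way to read the whole stream; a narrower pattern is mainly useful when separate connector instances should each handle a subset of shards.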

Data

com.github.jcustenborder.kafka.connect.kinesis.KinesisKey

A partition key is used to group data by shard within a stream.

| Name | Optional | Schema | Default Value | Documentation |
|---|---|---|---|---|
| partitionKey | true | String | | A partition key is used to group data by shard within a stream. The Streams service segregates the data records belonging to a stream into multiple shards, using the partition key associated with each data record to determine which shard a given data record belongs to. Partition keys are Unicode strings with a maximum length limit of 256 bytes. An MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards. A partition key is specified by the applications putting the data into a stream. Identifies which shard in the stream the data record is assigned to. See Record.getPartitionKey() |
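
The MD5 mapping described above can be sketched with the Java standard library. This is an illustration, not the connector's or the Kinesis service's code: `PartitionKeyHashSketch`, `hashKey`, and `inRange` are hypothetical names, the key is assumed to be hashed as UTF-8 bytes, and the two shard hash key ranges in `main` are an assumed even split of the 128-bit space.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of mapping a partition key to a 128-bit integer.
final class PartitionKeyHashSketch {

    // Hashes the key with MD5 and reads the 16-byte digest as an unsigned integer.
    // ASSUMPTION: the key is encoded as UTF-8 before hashing.
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 keeps the value non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is unavailable on this JVM", e);
        }
    }

    // True if the hash lies inside a shard's inclusive hash key range.
    static boolean inRange(BigInteger hash, BigInteger start, BigInteger end) {
        return hash.compareTo(start) >= 0 && hash.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        // ASSUMED two-shard stream that splits the 128-bit space evenly.
        BigInteger half = BigInteger.ONE.shiftLeft(127);
        BigInteger max = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);

        BigInteger hash = hashKey("customer-42");
        String shard = inRange(hash, BigInteger.ZERO, half.subtract(BigInteger.ONE))
                ? "first shard"
                : (inRange(hash, half, max) ? "second shard" : "no shard");
        System.out.println("customer-42 -> " + hash + " -> " + shard);
    }
}
```

Records that share a partition key always hash to the same value, so they land on the same shard for as long as the shard layout does not change.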

com.github.jcustenborder.kafka.connect.kinesis.KinesisValue

The unit of data of the Amazon Kinesis stream, which is composed of a sequence number, a partition key, and a data blob. See Record

| Name | Optional | Schema | Default Value | Documentation |
|---|---|---|---|---|
| sequenceNumber | true | String | | The unique identifier of the record in the stream. See Record.getSequenceNumber() |
| approximateArrivalTimestamp | true | Timestamp | | The approximate time that the record was inserted into the stream. See Record.getApproximateArrivalTimestamp() |
| data | true | Bytes | | The data blob. See Record.getData() |
| partitionKey | true | String | | A partition key is used to group data by shard within a stream. The Streams service segregates the data records belonging to a stream into multiple shards, using the partition key associated with each data record to determine which shard a given data record belongs to. Partition keys are Unicode strings with a maximum length limit of 256 bytes. An MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards. A partition key is specified by the applications putting the data into a stream. Identifies which shard in the stream the data record is assigned to. See Record.getPartitionKey() |
| shardId | true | String | | A shard is a uniquely identified group of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity. Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second and up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). The data capacity of your stream is a function of the number of shards that you specify for the stream. The total capacity of the stream is the sum of the capacities of its shards. |
| streamName | true | String | | The name of the Kinesis stream. |
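
The per-shard limits quoted in the shardId documentation make stream sizing a matter of simple arithmetic. The sketch below is a rough estimate only: `StreamCapacitySketch` and `shardsNeeded` are hypothetical names, the constants are copied from the text above, and the workload figures in `main` are invented for illustration.

```java
// Hypothetical sizing helper based on the per-shard limits quoted above.
final class StreamCapacitySketch {

    static final double WRITE_MB_PER_SEC_PER_SHARD = 1.0;
    static final double READ_MB_PER_SEC_PER_SHARD = 2.0;
    static final double WRITE_RECORDS_PER_SEC_PER_SHARD = 1000.0;

    // Smallest shard count whose summed capacity covers all three demands.
    static int shardsNeeded(double writeMbPerSec, double writeRecordsPerSec, double readMbPerSec) {
        int forWriteBytes = (int) Math.ceil(writeMbPerSec / WRITE_MB_PER_SEC_PER_SHARD);
        int forWriteRecords = (int) Math.ceil(writeRecordsPerSec / WRITE_RECORDS_PER_SEC_PER_SHARD);
        int forReadBytes = (int) Math.ceil(readMbPerSec / READ_MB_PER_SEC_PER_SHARD);
        // A stream always has at least one shard.
        return Math.max(1, Math.max(forWriteBytes, Math.max(forWriteRecords, forReadBytes)));
    }

    public static void main(String[] args) {
        // Invented workload: 3.5 MB/s written, 1,200 records/s written, 4 MB/s read.
        int shards = shardsNeeded(3.5, 1200, 4.0);
        System.out.println("Shards needed: " + shards);
        System.out.println("Total write capacity (MB/s): " + shards * WRITE_MB_PER_SEC_PER_SHARD);
        System.out.println("Total read capacity (MB/s): " + shards * READ_MB_PER_SEC_PER_SHARD);
    }
}
```

In this invented workload the write data rate is the binding constraint, so it alone determines the shard count.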

About

Kafka Connect connector for Amazon Kinesis
