Skip to content

Commit

Permalink
[Feature][Connector-V2] Support single file mode in file sink (#8518)
Browse files Browse the repository at this point in the history
  • Loading branch information
Hisoka-X authored Jan 16, 2025
1 parent edca75b commit e893dee
Show file tree
Hide file tree
Showing 38 changed files with 754 additions and 275 deletions.
63 changes: 32 additions & 31 deletions docs/en/connector-v2/sink/CosFile.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,37 +34,38 @@ By default, we use 2PC commit to ensure `exactly-once`

## Options

| Name | Type | Required | Default | Description |
|---------------------------------------|---------|----------|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| path | string | yes | - | |
| tmp_path | string | no | /tmp/seatunnel | The result file will write to a tmp path first and then use `mv` to submit tmp dir to target dir. Need a COS dir. |
| bucket | string | yes | - | |
| secret_id | string | yes | - | |
| secret_key | string | yes | - | |
| region | string | yes | - | |
| custom_filename | boolean | no | false | Whether you need custom the filename |
| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
| file_format_type | string | no | "csv" | |
| field_delimiter | string | no | '\001' | Only used when file_format is text |
| row_delimiter | string | no | "\n" | Only used when file_format is text |
| have_partition | boolean | no | false | Whether you need processing partitions. |
| partition_by | array | no | - | Only used then have_partition is true |
| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
| is_partition_field_write_in_file | boolean | no | false | Only used then have_partition is true |
| sink_columns | array | no | | When this parameter is empty, all fields are sink columns |
| is_enable_transaction | boolean | no | true | |
| batch_size | int | no | 1000000 | |
| compress_codec | string | no | none | |
| common-options | object | no | - | |
| max_rows_in_memory | int | no | - | Only used when file_format is excel. |
| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. |
| xml_root_tag | string | no | RECORDS | Only used when file_format is xml. |
| xml_row_tag | string | no | RECORD | Only used when file_format is xml. |
| xml_use_attr_format | boolean | no | - | Only used when file_format is xml. |
| parquet_avro_write_timestamp_as_int96 | boolean | no | false | Only used when file_format is parquet. |
| parquet_avro_write_fixed_as_int96 | array | no | - | Only used when file_format is parquet. |
| encoding | string | no | "UTF-8" | Only used when file_format_type is json,text,csv,xml. |
| Name | Type | Required | Default | Description |
|---------------------------------------|---------|----------|--------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| path | string | yes | - | |
| tmp_path | string | no | /tmp/seatunnel | The result file will write to a tmp path first and then use `mv` to submit tmp dir to target dir. Need a COS dir. |
| bucket | string | yes | - | |
| secret_id | string | yes | - | |
| secret_key | string | yes | - | |
| region | string | yes | - | |
| custom_filename | boolean | no | false | Whether you need custom the filename |
| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
| file_format_type | string | no | "csv" | |
| field_delimiter | string | no | '\001' | Only used when file_format is text |
| row_delimiter | string | no | "\n" | Only used when file_format is text |
| have_partition | boolean | no | false | Whether you need processing partitions. |
| partition_by | array | no | - | Only used then have_partition is true |
| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
| is_partition_field_write_in_file | boolean | no | false | Only used then have_partition is true |
| sink_columns | array | no | | When this parameter is empty, all fields are sink columns |
| is_enable_transaction | boolean | no | true | |
| batch_size | int | no | 1000000 | |
| compress_codec | string | no | none | |
| common-options | object | no | - | |
| max_rows_in_memory | int | no | - | Only used when file_format is excel. |
| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. |
| xml_root_tag | string | no | RECORDS | Only used when file_format is xml. |
| xml_row_tag | string | no | RECORD | Only used when file_format is xml. |
| xml_use_attr_format | boolean | no | - | Only used when file_format is xml. |
| single_file_mode | boolean | no | false | Each parallelism will only output one file. When this parameter is turned on, batch_size will not take effect. The output file name does not have a file block suffix. |
| parquet_avro_write_timestamp_as_int96 | boolean | no | false | Only used when file_format is parquet. |
| parquet_avro_write_fixed_as_int96 | array | no | - | Only used when file_format is parquet. |
| encoding | string | no | "UTF-8" | Only used when file_format_type is json,text,csv,xml. |

### path [string]

Expand Down
Loading

0 comments on commit e893dee

Please sign in to comment.