From 374a1ee9d3a27e9bd075f3995c1736ed9e1431fd Mon Sep 17 00:00:00 2001 From: AnthonyTsu1984 <115786031+AnthonyTsu1984@users.noreply.github.com> Date: Thu, 28 Nov 2024 15:46:15 +0800 Subject: [PATCH] update docs Signed-off-by: AnthonyTsu1984 <115786031+AnthonyTsu1984@users.noreply.github.com> --- .../CollectionSchema/CollectionSchema.md | 158 +++++++++++ .../CollectionSchema/add_field.md | 265 ++++++++++++++++++ .../CollectionSchema/construct_from_dict.md | 81 ++++++ .../MilvusClient/CollectionSchema/to_dict.md | 75 +++++ .../MilvusClient/CollectionSchema/verify.md | 70 +++++ 5 files changed, 649 insertions(+) create mode 100644 API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/CollectionSchema.md create mode 100644 API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/add_field.md create mode 100644 API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/construct_from_dict.md create mode 100644 API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/to_dict.md create mode 100644 API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/verify.md diff --git a/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/CollectionSchema.md b/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/CollectionSchema.md new file mode 100644 index 000000000..e5cc13e6c --- /dev/null +++ b/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/CollectionSchema.md @@ -0,0 +1,158 @@ +# CollectionSchema + +A **CollectionSchema** instance represents the schema of a collection. A schema sketches the structure of a collection. + +```python +class pymilvus.CollectionSchema +``` + +## Constructor + +Constructs the schema of a collection by defining fields, data types, and other parameters. + +```python +CollectionSchema( + fields: list, + description: str +) +``` + +**PARAMETERS:** + +- **fields** (*list*) - + + **[REQUIRED]** + + A list of **FieldSchema** objects that define the fields in the collection schema. + +
+ +

what is a field schema?

+ +

A field schema holds the metadata for a single field, while a CollectionSchema ties together a list of FieldSchema objects to define the full schema.

+ +
+ +- **description** (*string*) - + + The description of the schema. + + If a description is not provided, it will be set to an empty string. + +- **kwargs** - + + - **auto_id** (*bool*) + + Whether the primary field automatically increments. + + Setting this to **True** makes the primary field automatically increment. In this case, omit the primary field from the data to insert to avoid errors. + + - **enable_dynamic_field** (*bool*) + + Whether Milvus saves the values of undefined fields in a dynamic field when the data being inserted into the target collection includes fields that are not defined in the collection's schema. + + When you set this to **True**, Milvus creates a field called **$meta** to store any undefined fields and their values from the inserted data. + +
+ +

what is a dynamic field?

+ +

If the data being inserted into the target collection includes fields that are not defined in the collection's schema, those fields will be saved in a dynamic field as key-value pairs.

+ +
+ + - **primary_field** (*str*) + + The name of the primary field. + + The value should be the name of a field listed in **fields**. + + As an alternative, you can set **is_primary** when creating a **FieldSchema** object. + + - **partition_key_field** (*str*) + + The name of the field that serves as the partition key. + + The value should be the name of a field listed in **fields**. + + Setting this makes Milvus manage all partitions in the current collection. + + As an alternative, you can set **is_partition_key** when creating a **FieldSchema** object. + +
+ +

what is a partition key?

+ +

Once a field is designated as the partition key, Milvus hashes the values in this field and automatically distributes entities among partitions accordingly.

+

This is particularly useful when implementing data separation based on a specific key, such as partition-oriented multi-tenancy.

+

As an alternative, you can set partition_key_field when creating a CollectionSchema object.

+ +
+ +**RETURN TYPE:** + +*CollectionSchema* + +**RETURNS:** + +A **CollectionSchema** object. + +**EXCEPTIONS:** + +- **FieldsTypeException**: + + This exception will be raised when the **fields** parameter is not a list. + +- **FieldTypeException**: + + This exception will be raised when a field in the **fields** list is not a **FieldSchema** object. + +- **PrimaryKeyException:** + + This exception will be raised if + + - The **primary_field** parameter has been set but the value is not a string. + + - The **primary_field** parameter has been set but the value is not the name of any listed fields. + +- **PartitionKeyException:** + + This exception will be raised if + + - The **partition_key_field** parameter has been set but the value is not a string. + + - The **partition_key_field** parameter has been set but the value is not the name of any listed fields. + +- **AutoIDException:** + + - This exception will be raised if the **auto_id** parameter has been set but the value is not a boolean. + +## Examples + +```python +from pymilvus import CollectionSchema, FieldSchema, DataType + +# Define fields in a schema +primary_key = FieldSchema( + name="id", + dtype=DataType.INT64, + is_primary=True, +) + +vector = FieldSchema( + name="vector", + dtype=DataType.FLOAT_VECTOR, + dim=768 +) + +# Construct a schema with the predefined fields +schema = CollectionSchema( + fields=[primary_key, vector], + description="example_schema" +) +``` + +## Methods + +The following are the methods of the `CollectionSchema` class: + diff --git a/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/add_field.md b/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/add_field.md new file mode 100644 index 000000000..a0dbcaa09 --- /dev/null +++ b/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/add_field.md @@ -0,0 +1,265 @@ +# add_field() + +This operation adds a field to the schema of a collection. 
+ +## Request Syntax + +```python +add_field( + field_name: str, + datatype: DataType, + **kwargs +) +``` + +**PARAMETERS:** + +- **field_name** (*string*) - + + **[REQUIRED]** + + The name of the field. + +- **datatype** (*[DataType](../../MilvusClient/Collections/DataType.md)*) - + + **[REQUIRED]** + + The data type of the field. + + You can choose from the following options when selecting a data type for different fields: + + - Primary key field: Use **DataType.INT64** or **DataType.VARCHAR**. + + - Scalar fields: Choose from a variety of options, including + + - **DataType.BOOL**, + + - **DataType.INT8**, + + - **DataType.INT16**, + + - **DataType.INT32**, + + - **DataType.INT64**, + + - **DataType.FLOAT**, + + - **DataType.DOUBLE**, + + - **DataType.VARCHAR**, + + - **DataType.JSON**, and + + - **DataType.ARRAY** + + - Vector fields: Select **DataType.BINARY_VECTOR**, **DataType.FLOAT_VECTOR**, **DataType.FLOAT16_VECTOR**, **DataType.BFLOAT16_VECTOR**, or **DataType.SPARSE_FLOAT_VECTOR**. + +- **is_primary** (*bool*) - + + Whether the current field is the primary field in a collection. + +
+ +

notes

+ + + +
+ +- **max_length** (*int*) - + + The maximum byte length of strings that can be inserted. Note that multibyte characters (e.g., Unicode characters) may occupy more than one byte each, so ensure the byte length of inserted strings does not exceed the specified limit. Value range: [1, 65,535]. + + This is mandatory for a **DataType.VARCHAR** field. + +- **element_type** (*str*) - + + The data type of the elements in the field value. + + This is mandatory for a **DataType.ARRAY** field. + +- **max_capacity** (*int*) - + + The maximum number of elements in an Array field value. + + This is mandatory for a **DataType.ARRAY** field. + +- **dim** (*int*) - + + The dimension of the vector embeddings. The value should be an integer greater than 1. + + This is mandatory for a field of the **DataType.FLOAT_VECTOR**, **DataType.BINARY_VECTOR**, **DataType.FLOAT16_VECTOR**, or **DataType.BFLOAT16_VECTOR** type. If you use **DataType.SPARSE_FLOAT_VECTOR**, omit this parameter. + +- **is_partition_key** (*bool*) - + + Whether the current field serves as the partition key. Each collection can have one partition key. + + This parameter is not applicable to Milvus Lite. For more information on Milvus Lite limits, refer to [Run Milvus Lite](https://milvus.io/docs/milvus_lite.md). + +
+ +

what is the partition key?

+ +

To facilitate partition-oriented multi-tenancy, you can set a field as the partition key field so that Milvus hashes the field values and distributes entities among the specified number of partitions accordingly.

+

When retrieving entities, ensure that the boolean expression includes the partition key field so that only entities with a specific field value are returned.

+

For details, refer to Use Partition Key and Multi-tenancy.

+ +
+ +- **is_clustering_key** (*bool*) - + + Whether the current field serves as the clustering key. Each collection can have one clustering key. You can also use the partition key as the clustering key. For details, refer to Clustering Compaction. + +- **mmap_enabled** (*bool*) - + + Whether Milvus maps the field data into memory instead of fully loading it. For detailed settings, refer to MMap-enabled Data Storage. + +- **nullable** (*bool*) - + + A Boolean parameter that specifies whether the field can accept null values. Valid values: + + - **True**: The field can contain null values, so the field is optional and missing data is permitted for entities. + + - **False** (default): The field must contain a valid value for each entity; missing data is not allowed, making the field mandatory. + + For more information, refer to [Nullable & Default](https://milvus.io/docs/nullable-and-default.md). + +- **default_value** (*DataType*) + + Sets a default value for a specific field in a collection schema when creating it. This is particularly useful when you want certain fields to have an initial value even if no value is explicitly provided during data insertion. + +- **analyzer_params** (*dict*) - + + Configures the analyzer for text processing, specifically for `DataType.VARCHAR` fields. This parameter configures tokenizer and filter settings, particularly for text fields used in [keyword matching](https://milvus.io/docs/keyword-match.md) or [full text search](https://milvus.io/docs/full-text-search.md). Depending on the type of analyzer, it can be configured in either of the following ways: + + - Built-in analyzer + + ```python + analyzer_params = { + "type": "standard" # Uses the standard built-in analyzer + } + ``` + + - `type` (*str*) - + + Pre-configured analyzer type built into Milvus, which can be used out-of-the-box by specifying its name. Possible values: `standard`, `english`, `chinese`. 
For more information, refer to [Standard Analyzer](https://milvus.io/docs/standard-analyzer.md), [English Analyzer](https://milvus.io/docs/english-analyzer.md), and [Chinese Analyzer](https://milvus.io/docs/chinese-analyzer.md). + + - Custom analyzer + + ```python + analyzer_params = { + "tokenizer": "standard", + "filter": ["lowercase"], + } + ``` + + - `tokenizer` (*str*) - + + Defines the tokenizer type. Possible values: `standard` (default), `whitespace`, `jieba`. For more information, refer to [Standard Tokenizer](https://milvus.io/docs/standard-tokenizer.md), [Whitespace Tokenizer](https://milvus.io/docs/whitespace-tokenizer.md), and [Jieba Tokenizer](https://milvus.io/docs/jieba-tokenizer.md). + + - `filter` (*Union[List[str], List[dict]]*) - + + Lists filters to refine tokens produced by the tokenizer, with options for built-in filters and custom filters. For more information, refer to [Alphanumonly Filter](https://milvus.io/docs/alphanumonly-filer.md) and others. + +- **enable_analyzer** (*bool*) + + Whether to enable text analysis for the specified `VARCHAR` field. When set to `True`, Milvus uses a text analyzer, which tokenizes and filters the text content of the field. + +- **enable_match** (*bool*) + + Whether to enable keyword matching for the specified `VARCHAR` field. When set to `True`, Milvus creates an inverted index for the field, allowing for quick and efficient keyword lookups. `enable_match` works in conjunction with `enable_analyzer` to provide structured keyword-based text search, with `enable_analyzer` handling tokenization and `enable_match` handling the search operations on these tokens. + +**RETURN TYPE:** + +*[CollectionSchema](CollectionSchema.md)* + +**RETURNS:** + +A **CollectionSchema** object that contains the fields that have been added to the schema. + +**EXCEPTIONS:** + +- **MilvusException** + + This exception will be raised when any error occurs during this operation. 
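Because **max_length** counts bytes rather than characters, a client-side check can catch oversized strings before they are inserted. The following is a minimal sketch that assumes the limit is measured against the UTF-8 encoding of the string; `utf8_byte_length` and `fits_max_length` are hypothetical helpers for illustration and are not part of pymilvus.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_max_length(text: str, max_length: int) -> bool:
    """Check whether the string's UTF-8 byte length stays within max_length."""
    return utf8_byte_length(text) <= max_length


ascii_text = "hello"  # 5 characters, 1 byte each
cjk_text = "你好"  # 2 characters, 3 bytes each in UTF-8

# Same limit, different outcome: the shorter string needs more bytes
print(utf8_byte_length(ascii_text), fits_max_length(ascii_text, 5))
print(utf8_byte_length(cjk_text), fits_max_length(cjk_text, 5))
```

When sizing **max_length** for non-ASCII text, it may help to budget several bytes per character rather than one.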
+ +## Examples + +```python +from pymilvus import MilvusClient, DataType + +# Create an empty schema; fields are added one by one below +schema = MilvusClient.create_schema() + +# Add the primary key field +schema.add_field( + field_name="id", + datatype=DataType.INT64, + is_primary=True +) + +# Add the vector field +schema.add_field( + field_name="vector", + datatype=DataType.FLOAT_VECTOR, + dim=768 +) + +# Add a scalar field to the schema +schema.add_field( + field_name="scalar_01", + datatype=DataType.INT32 +) + +print(schema) + +# Output +# { +# 'auto_id': False, +# 'description': '', +# 'fields': [ +# { +# 'name': 'id', +# 'description': '', +# 'type': <DataType.INT64: 5>, +# 'is_primary': True, +# 'auto_id': False +# }, +# { +# 'name': 'vector', +# 'description': '', +# 'type': <DataType.FLOAT_VECTOR: 101>, +# 'params': {'dim': 768} +# }, +# { +# 'name': 'scalar_01', +# 'description': '', +# 'type': <DataType.INT32: 4> +# } +# ] +# } +``` + +## Related operations + +The following operations are related to `add_field()`: + +- [FieldSchema](../FieldSchema/FieldSchema.md) + +- [DataType](../../MilvusClient/Collections/DataType.md) + +- [construct_from_dict()](construct_from_dict.md) + +- [to_dict()](to_dict.md) + +- [verify()](verify.md) + diff --git a/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/construct_from_dict.md b/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/construct_from_dict.md new file mode 100644 index 000000000..5474ef77f --- /dev/null +++ b/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/construct_from_dict.md @@ -0,0 +1,81 @@ +# construct_from_dict() + +This operation constructs a **CollectionSchema** object from a dictionary representation. + +## Request Syntax + +```python +construct_from_dict( + raw: dict +) +``` + +**PARAMETERS:** + +- **raw** (*dict*) + + A dictionary containing the raw data to construct the collection schema. + +**RETURN TYPE:** + +*CollectionSchema* + +**RETURNS:** + +A **CollectionSchema** object. 
+ +**EXCEPTIONS:** + +- **MilvusException** + + This exception will be raised when any error occurs during this operation. + +## Examples + +```python +from pymilvus import DataType, FieldSchema, CollectionSchema + +# Define the fields +primary_key = FieldSchema( + name="id", + dtype=DataType.INT64, + is_primary=True, +) + +vector = FieldSchema( + name="vector", + dtype=DataType.FLOAT_VECTOR, + dim=768, +) + +# Create dictionary representation +schema_dict = { + "fields": [ + primary_key.to_dict(), + vector.to_dict() + ] +} + +# Reconstruct schema from dictionary +schema = CollectionSchema.construct_from_dict(schema_dict) +# schema is now a CollectionSchema instance reconstructed from the dictionary +print(schema) + +# Output +# {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 768}}]} +``` + +## Related operations + +The following operations are related to `construct_from_dict()`: + +- [FieldSchema](../FieldSchema/FieldSchema.md) + +- [DataType](../../MilvusClient/Collections/DataType.md) + +- [add_field()](add_field.md) + +- [to_dict()](to_dict.md) + +- [verify()](verify.md) + diff --git a/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/to_dict.md b/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/to_dict.md new file mode 100644 index 000000000..af858dd20 --- /dev/null +++ b/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/to_dict.md @@ -0,0 +1,75 @@ +# to_dict() + +This operation converts a CollectionSchema object to a dictionary representation. + +## Request Syntax + +```python +to_dict() +``` + +**PARAMETERS:** + +None + +**RETURN TYPE:** + +*dict* + +**RETURNS:** + +The dictionary representation of the collection schema. + +**EXCEPTIONS:** + +- **MilvusException** + + This exception will be raised when any error occurs during this operation. 
+ +## Examples + +```python +from pymilvus import CollectionSchema, FieldSchema, DataType + +# Create field schemas +primary_key = FieldSchema( + name="id", + dtype=DataType.INT64, + is_primary=True, +) + +vector = FieldSchema( + name="vector", + dtype=DataType.FLOAT_VECTOR, + dim=768, +) + +# Create a CollectionSchema with field schemas + +schema = CollectionSchema( + fields = [primary_key, vector] +) + +# Call to_dict() to get a dictionary representation of the schema + +schema_dict = schema.to_dict() +print(schema_dict) + +# Output +# {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 768}}]} +``` + +## Related operations + +The following operations are related to `to_dict()`: + +- [FieldSchema](../FieldSchema/FieldSchema.md) + +- [DataType](../../MilvusClient/Collections/DataType.md) + +- [add_field()](add_field.md) + +- [construct_from_dict()](construct_from_dict.md) + +- [verify()](verify.md) + diff --git a/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/verify.md b/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/verify.md new file mode 100644 index 000000000..30b8155a2 --- /dev/null +++ b/API_Reference/pymilvus/v2.5.x/MilvusClient/CollectionSchema/verify.md @@ -0,0 +1,70 @@ +# verify() + +This operation performs final validation checks on the CollectionSchema to detect any obvious problems. + +## Request Syntax + +```python +verify() +``` + +**PARAMETERS:** + +None + +**RETURN TYPE:** + +None + +**RETURNS:** + +None + +**EXCEPTIONS:** + +- **MilvusException** + + This exception will be raised when any error occurs during this operation. 
+ +## Examples + +```python +from pymilvus import CollectionSchema, FieldSchema, DataType + +# Create field schemas +primary_key = FieldSchema( + name="id", + dtype=DataType.INT64, + is_primary=True, +) + +vector = FieldSchema( + name="vector", + dtype=DataType.FLOAT_VECTOR, + dim=768, +) + +# Create a CollectionSchema with field schemas + +schema = CollectionSchema( + fields = [primary_key, vector] +) + +# Call verify() to validate the schema +schema.verify() +``` + +## Related operations + +The following operations are related to `verify()`: + +- [FieldSchema](../FieldSchema/FieldSchema.md) + +- [DataType](../../MilvusClient/Collections/DataType.md) + +- [add_field()](add_field.md) + +- [construct_from_dict()](construct_from_dict.md) + +- [to_dict()](to_dict.md) +