
Minor changes
prrao87 committed Apr 30, 2024
1 parent bc367d2 commit 6c95db3
Showing 1 changed file with 11 additions and 9 deletions.
src/content/post/2024-04-30-kuzu-v-0.4.0.md
@@ -79,14 +79,14 @@
the database to a specific directory. The query below exports the database to
`/path/to/export`, utilizing the same configuration parameters as `COPY FROM` statements.

```
-EXPORT DATABASE '/path/to/export' (HEADER=true);
+EXPORT DATABASE '/path/to/export' (FORMAT=csv, HEADER=true);
```
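For reference, exporting to Parquet instead should look like the following (the `FORMAT=parquet` spelling is an assumption, mirroring the `FORMAT=csv` option above):

```
EXPORT DATABASE '/path/to/export' (FORMAT=parquet);
```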

-Underneath, the database files are exported to CSV (by default),
-or Parquet, if desired, and we generate several files that contains the Cypher commands needed to
+The data is exported to CSV with headers included, but you can also export to
+Parquet, if desired. We also generate several files that contain the Cypher commands needed to
import the database, including the node and relationship tables and macros, back into Kùzu.
-You can import the database from `/path/to/export` to the database your current CLI or client session is connected to
-with the `IMPORT DATABASE` command:
+You can import the database from `/path/to/export` to the database your current CLI or client session
+is connected to with the `IMPORT DATABASE` command:

```
IMPORT DATABASE '/path/to/export';
```
@@ -153,11 +153,11 @@
COPY Person FROM (LOAD FROM "person2.csv" RETURN *);

Note that the usual primary key constraints still apply; i.e., if the file `person2.csv` contains a record
whose primary key already exists in the `Person` table, it will produce a `RuntimeError` and the
-transaction will be rolled back. From a performance perspective, you should expect some slow down
+transaction will be rolled back. From a performance perspective, you should expect some slowdown
in terms of records inserted/second for the subsequent bulk inserts (because the system needs to do more I/O to
scan the data that is already stored on disk) but it will still be much faster than
-inserting records one at a time via `CREATE` commands. So,
-you should use this approach if you're inserting large amounts of data into your database.
+inserting records one at a time via `CREATE` commands. We recommend that
+you use this approach if you're inserting large amounts of data into your database.
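The bulk-insert pattern described above can be sketched end to end as follows (the `Person` schema and CSV file names are illustrative, not from the post):

```
CREATE NODE TABLE Person(name STRING, age INT64, PRIMARY KEY (name));
// Initial bulk load into the empty table
COPY Person FROM "person1.csv";
// Subsequent bulk insert into the now non-empty table
COPY Person FROM (LOAD FROM "person2.csv" RETURN *);
```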

### Scan from Pandas PyArrow backend

@@ -238,7 +238,9 @@
useful for users who want to perform search & retrieval using embeddings stored

## Internal ID compression

-We now apply compression to the internal IDs in the storage layer. Internally, for each relationship, we store, in each direction, its source and destination node IDs, and a unique relationship ID. All node and relationship IDs are represented as internal IDs, and compressed as integer values now.
+We now apply compression to the internal IDs in the storage layer. Internally, for each relationship, we store,
+in each direction, its source and destination node IDs, and a unique relationship ID. All node and relationship
+IDs are represented as internal IDs, and are now compressed as integer values.
Applying compression on internal IDs can result in a significant reduction in the size of a Kùzu database. For
LDBC SF100, [we observed](https://github.com/kuzudb/kuzu/pull/3116) a **45%** reduction in size for
the `data.kz` file within the Kùzu database directory.
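As a rough back-of-the-envelope sketch of why integer compression of IDs helps (illustrative numbers and scheme, not Kùzu's actual implementation): if internal IDs are stored as 8-byte integers but the largest ID in a block fits in fewer bytes, packing the block to that width shrinks its ID storage accordingly.

```
uncompressed:  8 bytes per internal ID
max ID < 2^24: 3 bytes per ID suffice
savings on ID storage: 1 - 3/8 = 62.5%
```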
