
Difference between FSQ and LFQ #152

Open
JasonShen-SH opened this issue Aug 6, 2024 · 2 comments

JasonShen-SH commented Aug 6, 2024

Hi,

I was wondering about the difference between FSQ (finite scalar quantization) and LFQ (look-up free quantization) for vector quantization.

Consider a single vector from the encoder output, regardless of the architecture, and suppose the quantizer only maps each dimension to -1 or 1. What is the difference then? LFQ still calls the quantized vector a "codebook vector" within the "codebook". But in essence, that "codebook" is a manually constructed concept: it is nothing more than the set of combinations of the quantization possibilities at each dimension, since it is not learned at all.

Then what exactly happens in FSQ? As the paper says: if we map each entry z_i to L values (followed by rounding to integers), we obtain a quantized ẑ, where ẑ is one of L^d unique possible vectors (d is the vector dimension, L is the number of quantization levels; L = 2 gives -1 and 1). I believe these L^d possible vectors are themselves the codebook anyway.
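To make this concrete, here is a tiny sketch (with hypothetical numbers) enumerating that implicit codebook as all combinations of per-dimension levels:

```python
import itertools

# Hypothetical example: with d = 3 dimensions and L = 2 levels (-1 and 1),
# the implicit "codebook" is just the set of all L**d = 8 sign combinations.
d, L = 3, 2
levels = [-1, 1]
codebook = list(itertools.product(levels, repeat=d))
print(len(codebook))  # 8
```

Nothing here is learned; the "codebook" exists only as the Cartesian product of the per-dimension levels.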

I hope someone can discuss this with me. Many thanks!



function2-llx commented Dec 11, 2024

I think both of them are lookup-free quantization; the LFQ method, a specific and simple instance of lookup-free quantization, just happens to occupy this general term (so, for clarity, I won't use an abbreviation for the general approach).

Lookup-free quantization works without explicitly maintaining embeddings for index lookup (e.g., via nearest-neighbor search); instead, it uses a pre-defined map (typically without any learnable parameters) to obtain an index. As you can verify, both LFQ and FSQ work like this.

Well, in essence, that "codebook" is a manually constructed concept: it is nothing more than combinations of the quantization possibilities at each dimension, since it is not learned at all.

Yes, I agree: the "codebook" in this context is implicitly represented as a set of integers.

So, since both are lookup-free quantization, they can only differ in their specific implementations, that is, in the pre-defined map that produces the index from the (projected) representation vector. As far as I can tell, LFQ simply uses the sign function, while FSQ defines something like f(z) = round((L - 1) / 2 * tanh(z)).
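A minimal sketch of the two maps, assuming the forms above (not the exact implementations in any particular repo):

```python
import numpy as np

def lfq_quantize(z):
    # LFQ: sign of each dimension; map 0 to +1 so codes stay in {-1, 1}
    return np.where(z >= 0, 1.0, -1.0)

def fsq_quantize(z, L=5):
    # FSQ (assumed form): bound each entry with tanh to (-(L-1)/2, (L-1)/2),
    # then round to integers, giving L levels per dimension
    half = (L - 1) / 2
    return np.round(half * np.tanh(z))

z = np.array([-2.3, -0.1, 0.4, 1.7])
print(lfq_quantize(z))       # [-1. -1.  1.  1.]
print(fsq_quantize(z, L=5))  # [-2. -0.  1.  2.]
```

With L = 2 the FSQ map degenerates to two levels per dimension, which is where the two methods look almost identical, as the question points out.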

Note that even without explicitly maintaining codebook embeddings, after obtaining the index (in base L), they still have to project the code to a higher dimension, and I believe the weight of that projection layer plays a role similar to the codebook embedding. I have also noticed that a popular explanation for this is that lookup-free quantization decouples the codebook embedding from the index-lookup procedure.
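For instance, that projection step could look like the following sketch (names and shapes are hypothetical; in practice the weight would be learned):

```python
import numpy as np

rng = np.random.default_rng(0)
code_dim, model_dim = 4, 64

# quantized code, e.g. an LFQ output in {-1, 1}^code_dim
code = np.array([1.0, -1.0, -1.0, 1.0])

# projection weight (random here for illustration; learnable in a real model)
W = rng.normal(size=(code_dim, model_dim))

# project the low-dimensional code up to the model dimension;
# W plays a role analogous to a codebook embedding table
embedded = code @ W
print(embedded.shape)  # (64,)
```

Each distinct code selects a distinct signed combination of the rows of W, which is why the projection weight ends up acting like an implicit codebook embedding.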
