This repo hosts the code and models for the following projects:
- MaskGen: Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
- RAR: Randomized Autoregressive Visual Generation
- TiTok: An Image is Worth 32 Tokens for Reconstruction and Generation
- 01/14/2025: The tech report of MaskGen is available. MaskGen is a powerful and efficient text-to-image masked generative model trained exclusively on open data. For more details, refer to README_MaskGen.
- 11/04/2024: We release the tech report and code for RAR models.
- 10/16/2024: We release an updated set of TiTok tokenizer weights trained with a new single-stage recipe, which makes training easier and improves performance. Weights are available in multiple model sizes for both the VQ and VAE variants of TiTok, which we hope will facilitate research in this area. More details will follow in a tech report.
- 09/25/2024: TiTok is accepted at NeurIPS 2024.
- 09/11/2024: Released the training code for the generator built on TiTok.
- 08/28/2024: Released the training code for TiTok.
- 08/09/2024: Better support for loading pretrained weights from Hugging Face models; thanks to @NielsRogge for the help!
- 07/03/2024: Evaluation scripts for reproducing the results reported in the paper, along with checkpoints of TiTok-B64 and TiTok-S128, are available.
- 06/21/2024: Released the demo code and TiTok-L-32 checkpoints.
- 06/11/2024: The tech report of TiTok is available.
Short Intro on Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens (README)
We introduce TA-TiTok, a novel text-aware transformer-based 1D tokenizer designed to handle both discrete and continuous tokens while effectively aligning reconstructions with textual descriptions. Building on TA-TiTok, we present MaskGen, a versatile text-to-image masked generative model framework. Trained exclusively on open data, MaskGen demonstrates outstanding performance: with 32 continuous tokens, it achieves an FID score of 6.53 on MJHQ-30K, and with 128 discrete tokens, it attains an overall score of 0.57 on GenEval.
See more details at README_MaskGen.
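To make this concrete, here is a minimal sketch of what text-to-image sampling with a MaskGen checkpoint could look like. The module path, class name, checkpoint id, and `generate()` arguments below are assumptions for illustration only, not the repo's actual API; README_MaskGen documents the real entry points.

```python
# Hypothetical sketch of text-to-image sampling with MaskGen. The import
# path, checkpoint id, and generate() signature are assumptions for
# illustration; consult README_MaskGen for the actual API.
import torch

from modeling.maskgen import MaskGen  # assumed module path

model = MaskGen.from_pretrained("bytedance/maskgen-xl")  # hypothetical checkpoint id
model = model.eval().to("cuda")

with torch.no_grad():
    # Masked generative sampling: start from fully masked 1D tokens and
    # iteratively unmask them in parallel, conditioned on the text prompt.
    images = model.generate(
        prompt="a corgi wearing sunglasses on the beach",
        num_steps=16,        # parallel decoding steps (far fewer than AR decoding)
        guidance_scale=7.0,  # classifier-free guidance weight
    )
```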
Short Intro on Randomized Autoregressive Visual Generation (README)
RAR is an autoregressive (AR) image generator that is fully compatible with language modeling. It introduces a randomness annealing strategy with a permuted training objective at no additional cost, which enhances the model's ability to learn bidirectional contexts while leaving the autoregressive framework intact. RAR achieves an FID score of 1.48 on the ImageNet-256 benchmark, demonstrating state-of-the-art performance and significantly outperforming prior AR image generators.
See more details at README_RAR.
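To make the randomness annealing idea concrete, below is a minimal sketch under simplifying assumptions: the linear schedule and the helper name `sample_token_order` are illustrative, not the repo's actual implementation. Early in training, token orders are random permutations; the probability of a random order then decays to zero, so the objective transitions smoothly into the standard raster-order AR objective.

```python
# A minimal sketch of RAR-style randomness annealing, assuming a linear
# schedule; the function name and schedule details are illustrative only.
import torch

def sample_token_order(step: int, total_steps: int, num_tokens: int,
                       anneal_end: float = 0.75) -> torch.Tensor:
    """Anneal from fully random token orders toward raster (left-to-right) order.

    Early in training, every sequence is visited in a random permutation,
    which pushes the model to learn bidirectional context; by `anneal_end`
    of training the order is always raster, matching standard AR decoding.
    """
    r = max(0.0, 1.0 - step / (anneal_end * total_steps))  # P(random order)
    if torch.rand(()).item() < r:
        return torch.randperm(num_tokens)  # permuted AR objective
    return torch.arange(num_tokens)        # standard raster-order objective
```

Because the schedule ends at raster order, inference is identical to a plain AR generator, which is how RAR keeps full compatibility with language-modeling infrastructure.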
Short Intro on An Image is Worth 32 Tokens for Reconstruction and Generation (README)
We present a compact 1D tokenizer that can represent an image with as few as 32 discrete tokens. As a result, it substantially speeds up sampling (e.g., 410× faster than DiT-XL/2) while obtaining competitive generation quality.
See more details at README_TiTok.
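As a rough illustration, the snippet below sketches a round trip from an image to 32 discrete tokens and back. The import path, checkpoint id, and method names are assumptions based on the demo code; treat them as illustrative and check README_TiTok for the actual interface.

```python
# Rough sketch of image -> 32 tokens -> image with TiTok. The import path,
# checkpoint id, and method names are assumptions for illustration; see
# README_TiTok and the demo code for the actual interface.
import torch

from modeling.titok import TiTok  # assumed module path

tokenizer = TiTok.from_pretrained("yucornetto/tokenizer_titok_l32_imagenet")  # assumed id
tokenizer.eval()
tokenizer.requires_grad_(False)

image = torch.rand(1, 3, 256, 256)  # dummy RGB image in [0, 1]
with torch.no_grad():
    # Encode the whole image into a compact 1D sequence of 32 discrete ids.
    tokens = tokenizer.encode(image)[1]["min_encoding_indices"]  # 32 ids per image
    # Decode the 32 ids back to pixels.
    reconstruction = tokenizer.decode_tokens(tokens)
```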
To install the dependencies, run:

pip3 install -r requirements.txt
If you use our work in your research, please cite it using the following BibTeX entries.
@article{kim2025democratizing,
  author  = {Kim, Dongwon and He, Ju and Yu, Qihang and Yang, Chenglin and Shen, Xiaohui and Kwak, Suha and Chen, Liang-Chieh},
  title   = {Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens},
  journal = {arXiv preprint arXiv:2501.07730},
  year    = {2025}
}

@article{yu2024randomized,
  author  = {Yu, Qihang and He, Ju and Deng, Xueqing and Shen, Xiaohui and Chen, Liang-Chieh},
  title   = {Randomized Autoregressive Visual Generation},
  journal = {arXiv preprint arXiv:2411.00776},
  year    = {2024}
}

@article{yu2024an,
  author  = {Yu, Qihang and Weber, Mark and Deng, Xueqing and Shen, Xiaohui and Cremers, Daniel and Chen, Liang-Chieh},
  title   = {An Image is Worth 32 Tokens for Reconstruction and Generation},
  journal = {NeurIPS},
  year    = {2024}
}