Learning by implementing papers
- GPT2, 2019
- ColBERT, 2020
- Coconut, sample code
- Combine Whisper with Qwen 7b
- Reinforce
- On-policy DPO, Tulu 3, code
- Reinforcement on Verifiable Reward (RLVR), Tulu 3, code
- free process rewards without process labels
- Turn an LLM into a Speech to Speech Model
- Test Time Training