A gentle introduction to the latest multi-modal Transfusion model

Recently, Meta and Waymo released their latest paper, Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model. It combines the popular transformer model with the diffusion model so that a single model can be trained on, and make predictions for, multiple modalities. Like Meta's previous work, the Transfusion model is based on the…
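To make the idea of "one model, two objectives" concrete, here is a minimal pure-Python sketch. It assumes the training setup the Transfusion paper describes: a next-token cross-entropy loss on text tokens plus a diffusion loss on images, where the model predicts the added noise and is scored with mean squared error. The two terms are summed with a weighting coefficient. The function names and the `lam` default are hypothetical illustrations, not Meta's code. A real model computes these terms on transformer outputs over batched tensors rather than on Python lists.

```python
import math
from typing import Sequence


def next_token_loss(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean cross-entropy over text positions (the language-modeling term)."""
    total = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp: subtract the max logit before exponentiating.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # Negative log-probability of the correct next token.
        total += log_z - row[target]
    return total / len(targets)


def diffusion_loss(pred_noise: Sequence[float], true_noise: Sequence[float]) -> float:
    """Mean squared error between predicted and true noise (DDPM-style term)."""
    diffs = [(p - t) ** 2 for p, t in zip(pred_noise, true_noise)]
    return sum(diffs) / len(diffs)


def combined_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    pred_noise: Sequence[float],
    true_noise: Sequence[float],
    lam: float = 1.0,  # hypothetical weight; the paper tunes its own coefficient
) -> float:
    """Text loss plus a weighted image (diffusion) loss, optimized jointly."""
    return next_token_loss(logits, targets) + lam * diffusion_loss(pred_noise, true_noise)


if __name__ == "__main__":
    # Toy inputs: one text position with two equally likely tokens,
    # and a two-value "image" whose noise prediction is off by 1 in one slot.
    print(combined_loss([[0.0, 0.0]], [0], [1.0, 0.0], [0.0, 0.0], lam=2.0))
```

With these toy inputs, the text term is ln 2 (two equally likely tokens) and the image term is 0.5, so the combined loss is ln 2 + 2 × 0.5. The point of summing the two losses is that a single set of transformer weights receives gradients from both modalities in every training step.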
