[Figure: RhymeFlow qualitative results]

Training-free video diffusion acceleration

RhymeFlow

Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling

RhymeFlow decouples the denoising trajectories of video frames. Keyframes keep full denoising fidelity, while predictable non-keyframes skip selected steps and are recovered through latent trajectory projection.

Chensheng Dai¹*, Shengjun Zhang¹*, Yifan Li¹, Zhang Zhang¹, Zheng Zhu², Yueqi Duan¹†
¹Tsinghua University, ²GigaAI
*Equal contribution, †Corresponding author

Abstract

Video generation models based on Diffusion Transformers achieve strong synthesis quality, but their 3D attention and long denoising chains create substantial inference latency. Existing training-free methods mostly reduce the cost inside each denoising step, while every frame still follows the same dense timestep schedule.

RhymeFlow introduces an orthogonal acceleration dimension: asynchronous denoising flow scheduling. It detects pivotal keyframes that anchor semantic transitions, applies dense step-by-step denoising to those frames, lets non-keyframes progressively skip predictable denoising steps, and uses lightweight latent trajectory projection to maintain complete temporal context for 3D attention.
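As a rough illustration of the idea behind latent trajectory projection, the sketch below fills in a skipped frame's latent by linearly extrapolating its two most recently computed states to the current timestep. The linear rule, the function name `project_latent`, and the toy timesteps and values are all assumptions made for illustration; this page does not specify the projection the paper actually uses.

```python
from typing import List


def project_latent(z_prev: List[float], t_prev: float,
                   z_last: List[float], t_last: float,
                   t_target: float) -> List[float]:
    """Linearly extrapolate a latent through (t_prev, z_prev) and (t_last, z_last) to t_target.

    Hypothetical stand-in for a lightweight projection: no network call is made.
    """
    if t_last == t_prev:
        raise ValueError("need two distinct timesteps to define a trajectory")
    # How far past the last computed state we are, in units of the last step size.
    scale = (t_target - t_last) / (t_last - t_prev)
    return [b + scale * (b - a) for a, b in zip(z_prev, z_last)]


# Toy example: a two-element latent computed at t = 1.0 and t = 0.5,
# projected to t = 0.0 instead of being denoised by the network.
z_at_t1 = [1.0, -2.0]
z_at_t05 = [0.75, -1.5]
z_projected = project_latent(z_at_t1, 1.0, z_at_t05, 0.5, 0.0)
print(z_projected)
```

The point of such a projection is that every frame still has a latent at every step, so 3D attention sees a complete temporal context even when only some frames were actually denoised.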

Wan 2.1: 1.53x speedup with high fidelity to the Dense reference.

HunyuanVideo: 2.60x speedup, the best latency and speedup in the comparison, when combined with SAP.

Composable (+SAP): orthogonal to sparse attention and cache-style accelerators.

Visual comparison

Dense vs. SAP vs. RhymeFlow

Each slide compares the Dense reference, SAP, and RhymeFlow on the same prompt.

[Video carousel: 11 samples, each shown as Dense (reference), SAP (sparse attention baseline), and RhymeFlow (ours).]

Pipeline breakdown

Asynchronous Denoising Flow Scheduling

The pipeline video walks through warm-up, keyframe selection, rhythmic synchronization, asynchronous updates, and latent projection for skipped non-keyframes.

1. Warm-up

All frames are updated synchronously at the beginning, giving the model a stable shared latent trajectory.

2. Keyframe anchors

Semantic changes identify pivotal frames that should preserve full denoising fidelity.

3. Progressive skipping

Non-keyframes receive fewer updates as denoising becomes more predictable.

4. Latent projection

Projected skipped states keep attention context complete without running the full network for every frame.
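The four stages above can be sketched as a per-step, per-frame update mask. This is a minimal illustration under stated assumptions, not the paper's scheduler: the function `build_update_mask`, the linearly growing skip stride, and the parameters `warmup_steps` and `max_stride` are hypothetical, keyframes are passed in rather than detected, and rhythmic synchronization is not modeled.

```python
from typing import List, Set


def build_update_mask(num_steps: int, num_frames: int, keyframes: Set[int],
                      warmup_steps: int, max_stride: int) -> List[List[bool]]:
    """Return mask[s][f]: True if frame f gets a full denoising update at step s."""
    mask: List[List[bool]] = []
    steps_since_update = 0  # steps since the non-keyframes were last updated
    post_warmup = max(1, num_steps - warmup_steps - 1)
    for step in range(num_steps):
        if step < warmup_steps:
            # Stage 1: warm-up, every frame is updated synchronously.
            mask.append([True] * num_frames)
            continue
        # Stage 3: the skip stride grows from 1 to max_stride as denoising proceeds
        # (an assumed linear ramp, chosen only for illustration).
        progress = (step - warmup_steps) / post_warmup
        stride = 1 + round(progress * (max_stride - 1))
        steps_since_update += 1
        update_rest = steps_since_update >= stride
        if update_rest:
            steps_since_update = 0
        # Stage 2: keyframes are always updated; other frames only on their turn.
        mask.append([(f in keyframes) or update_rest for f in range(num_frames)])
    return mask


mask = build_update_mask(num_steps=10, num_frames=5, keyframes={0, 4},
                         warmup_steps=2, max_stride=3)
full_updates = sum(sum(row) for row in mask)
print(f"{full_updates} of {10 * 5} frame updates run the full network")
```

Entries that are False mark the frame-steps that stage 4 would fill with a projected latent. The count of full updates is not a latency estimate, since the cost of 3D attention does not scale simply with the number of updated frames.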

Method and analysis

Inside RhymeFlow

[Figure: RhymeFlow qualitative comparison]

Qualitative comparisons from the paper. RhymeFlow preserves textures, lighting consistency, and motion structure while assigning heterogeneous denoising schedules to different frames.

Benchmark results

Comparison with Baselines

We report the paper's baseline comparisons and omit its hyperparameter-tuning tables. RhymeFlow improves the speed-quality trade-off on Wan 2.1 and can be combined with SAP for higher throughput.

Wan 2.1

Method       PSNR ↑   SSIM ↑   LPIPS ↓   SubCon. ↑   ImgQual. ↑   Latency (s) ↓   Speedup ↑
Dense        -        -        -         0.9102      0.6946       993.5           -
SpargeAttn   20.399   0.613    0.393     0.8632      0.7118       719.7           1.38x
SVG          22.419   0.694    0.290     0.8758      0.6913       708.0           1.40x
SAP          24.454   0.730    0.223     0.8789      0.6837       608.5           1.63x
Ours         26.291   0.783    0.168     0.8831      0.6706       650.4           1.53x
Ours + SAP   24.586   0.737    0.221     0.8792      0.6806       596.8           1.66x
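The Speedup column in the Wan 2.1 table is consistent with dividing the Dense latency by each method's latency. The short check below recomputes it from the table's own numbers; the variable names are illustrative, and the definition of speedup as a latency ratio is inferred from the figures rather than stated on this page.

```python
# Latencies in seconds, copied from the Wan 2.1 table.
dense_latency_s = 993.5
latencies_s = {
    "SpargeAttn": 719.7,
    "SVG": 708.0,
    "SAP": 608.5,
    "Ours": 650.4,
    "Ours + SAP": 596.8,
}

# Speedup relative to Dense, rounded to two decimals as in the table.
speedups = {name: round(dense_latency_s / lat, 2) for name, lat in latencies_s.items()}
for name, value in speedups.items():
    print(f"{name}: {value:.2f}x")
```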

HunyuanVideo

Method       PSNR ↑   SSIM ↑   LPIPS ↓   Latency (s) ↓   Speedup ↑
SVG          21.17    0.684    0.392     3459            1.92x
SAP          24.64    0.904    0.068     2634            2.52x
EasyCache    23.51    0.861    0.119     2850            2.33x
DiCache      23.54    0.860    0.114     2814            2.36x
VGDFR        19.49    0.779    0.199     3019            2.20x
Ours         26.34    0.918    0.060     2939            2.26x
Ours + SAP   25.01    0.910    0.068     2555            2.60x
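The HunyuanVideo table lists no Dense row, but if speedup is again a ratio of Dense latency to method latency (an assumption, not stated here), each row implies a Dense latency of latency × speedup. The sketch below computes that back-of-envelope estimate per row; the results are approximate because the reported speedups are rounded to two decimals.

```python
# (latency in seconds, reported speedup) copied from the HunyuanVideo table.
rows = {
    "SVG": (3459, 1.92),
    "SAP": (2634, 2.52),
    "EasyCache": (2850, 2.33),
    "DiCache": (2814, 2.36),
    "VGDFR": (3019, 2.20),
    "Ours": (2939, 2.26),
    "Ours + SAP": (2555, 2.60),
}

# Implied Dense latency per row, assuming speedup = dense latency / method latency.
implied_dense_s = {name: lat * spd for name, (lat, spd) in rows.items()}
spread_s = max(implied_dense_s.values()) - min(implied_dense_s.values())
for name, value in implied_dense_s.items():
    print(f"{name}: about {value:.0f} s")
print(f"spread across rows: {spread_s:.1f} s")
```

The rows agree to within a few seconds, which suggests a single shared Dense reference run behind all of the reported speedups.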

Citation

BibTeX

@article{dai2026rhymeflow,
  title={RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling},
  author={Dai, Chensheng and Zhang, Shengjun and Li, Yifan and Zhang, Zhang and Zhu, Zheng and Duan, Yueqi},
  journal={arXiv preprint arXiv:2606.06309},
  year={2026}
}