Deep Time Series Embedding Compression

⬅️ [Disentangled Representation Learning](<./Disentangled Representation Learning.md>) | ⬆️ [Ideas](<./README.md>) | [Can we quality skills versus decisions](<./Can we quality skills versus decisions.md>) ➡️

For time series with tight correlation between past and future samples, using a foundation model to embed every sample independently is wasteful for downstream tasks.

For example, if an MLLM ingests video, computing a full patch-wise image embedding for each frame adds many tokens to the context. Instead, why not use a keyframe system: keep the patch-wise embedding for a single keyframe, then emit one token per subsequent frame as an "update" to the keyframe.
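To get a feel for the savings, here is a back-of-the-envelope sketch. The frame count and patch-grid size are illustrative assumptions, not figures from any particular model:

```python
# Hypothetical token budgets: full per-frame patch embeddings
# vs. one keyframe plus one update token per later frame.
frames = 64                  # frames in a video clip (assumed)
patches_per_frame = 256      # e.g. a 16x16 patch grid (assumed)

# Full scheme: every patch of every frame becomes a context token.
full_tokens = frames * patches_per_frame

# Keyframe scheme: full patch embedding once, then a single update token per frame.
keyframe_tokens = patches_per_frame + (frames - 1)

print(full_tokens)      # 16384
print(keyframe_tokens)  # 319
```

Under these assumptions the keyframe scheme uses roughly 50x fewer tokens, at the cost of forcing each update token to summarize a whole frame's change.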

This can be trained like an autoencoder with added context. During training, we compute the full patch-wise embeddings for each frame. Then we train an encoder that takes the keyframe embedding, all prior update embeddings, and the full embedding of the current frame and computes a new update embedding, along with a decoder that takes the keyframe embedding and all update embeddings (including the new one) and regresses the original full embedding for that frame.
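The data flow above can be sketched with NumPy, using untrained random linear maps as stand-ins for the encoder and decoder; all sizes and the pooling choices are assumptions made for shape-checking only:

```python
import numpy as np

# Shape sketch of the keyframe + update-token autoencoder (assumed sizes).
rng = np.random.default_rng(0)
d = 32        # embedding dimension (assumed)
patches = 16  # patches per frame (assumed)
frames = 4    # frames after the keyframe (assumed)

key_emb = rng.normal(size=(patches, d))            # full patch-wise keyframe embedding
full_embs = rng.normal(size=(frames, patches, d))  # regression targets: full embeddings per frame

# Random linear maps standing in for the trained encoder and decoder.
W_enc = rng.normal(size=(d, d)) / np.sqrt(d)
W_dec = rng.normal(size=(d, d)) / np.sqrt(d)

updates, losses = [], []
for t in range(frames):
    # Context seen by the encoder: keyframe patches plus all prior update tokens.
    context = np.concatenate([key_emb] + [u[None] for u in updates], axis=0)
    # Encoder: pool the current frame's full embedding with the context
    # to produce one update token of shape (d,).
    u_t = (full_embs[t].mean(axis=0) + context.mean(axis=0)) @ W_enc
    updates.append(u_t)
    # Decoder: keyframe plus updates regress the full patch-wise embedding.
    recon = (key_emb + u_t[None]) @ W_dec          # shape (patches, d)
    losses.append(np.mean((recon - full_embs[t]) ** 2))

print(len(updates), updates[0].shape, recon.shape)
```

In a real implementation the mean-pooling and linear maps would be replaced by a transformer over the context tokens, and the reconstruction loss would be backpropagated through both encoder and decoder; the sketch only pins down which tensors flow where.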
