Blog Posts and Articles

KV-Cache Refresh Methods for Long Generation

Authors: Yahya Emara, Woojeong Kim, Mohamed Abdelfattah

In this post, we show how KV-cache refreshes can help long generation with small models, and describe efficient ways to decide when to refresh using inference algorithms.

The MCMC Inference Engine Behind a PPL

Authors: Yahya Emara

An overview of building an MCMC engine, with practical implementation details.

Collaborative Filtering Methods for Paper Recommendation Systems

Authors: Yahya Emara

A description of collaborative filtering methods for recommending papers, from classical SVD to NeuMF, GraphNeuMF, and DMF, plus an ensemble.