Apple has partnered with Nvidia to speed up large language model (LLM) inference using ReDrafter, Apple's open-source speculative decoding technique. In speculative decoding, a lightweight drafter proposes several tokens ahead and the main model verifies them, so each full-model step can yield more than one token. Integrated into Nvidia's TensorRT-LLM framework, ReDrafter increases token generation speed and reduces latency, which in turn lowers GPU usage, cost, and power consumption. Nvidia expects the integration to help developers build more capable LLM applications and to encourage further innovation in the AI community.
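To make the idea of speculative decoding concrete, here is a minimal sketch of the generic greedy variant in Python. It is not ReDrafter's actual drafting mechanism and does not use the TensorRT-LLM API. The two "models" (`target_next`, `draft_next`) are hypothetical toy functions invented for illustration, and the verification loop stands in for what a real system would do in a single batched forward pass of the large model.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_next(tokens: List[int]) -> int:
    """Toy stand-in for the large model (hypothetical): sum of last two tokens mod 10."""
    return (tokens[-1] + tokens[-2]) % 10


def draft_next(tokens: List[int]) -> int:
    """Toy stand-in for the cheap drafter (hypothetical): matches the target
    except that it always guesses 0 after a 7."""
    if tokens[-1] == 7:
        return 0
    return (tokens[-1] + tokens[-2]) % 10


def greedy_decode(prompt: List[int], target: NextToken, num_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    prompt: List[int],
    target: NextToken,
    draft: NextToken,
    num_new: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    tokens = list(prompt)
    passes = 0
    while len(tokens) - len(prompt) < num_new:
        # 1. The drafter proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks the proposal. A real system scores all k
        #    positions in one forward pass; we count it as one pass here.
        passes += 1
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            if guess != expected:
                accepted.append(expected)  # replace the first wrong guess and stop
                break
            accepted.append(guess)
        else:
            # Every guess matched, so the target contributes one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    # The last pass may overshoot; trim to the requested length.
    return tokens[: len(prompt) + num_new], passes


if __name__ == "__main__":
    prompt = [1, 1]
    baseline = greedy_decode(prompt, target_next, 20)
    spec, passes = speculative_decode(prompt, target_next, draft_next, 20)
    print("same output as baseline:", spec == baseline)
    print("verification passes:", passes, "vs baseline steps: 20")
```

In this sketch every emitted token is the target's own greedy choice, so the output is identical to plain greedy decoding. The saving comes only from needing fewer large-model passes whenever the drafter guesses correctly.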