FlashAttention reduces GPU memory pressure in LLMs by computing attention in small tiles that fit in on-chip SRAM and combining them with an online softmax, so the full N×N attention matrix is never materialized in HBM. Memory usage then scales linearly with sequence length (the arithmetic remains quadratic), and the reduced memory traffic enables longer contexts and faster training and inference.
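
To make the tiling idea concrete, here is a minimal NumPy sketch of the online-softmax accumulation that FlashAttention builds on: keys and values are processed one block at a time, and a running row maximum and denominator let the partial results be rescaled as new blocks arrive. The function name `tiled_attention`, the block size, and the single-head layout are illustrative assumptions, not the actual kernel, which fuses these steps on-chip in CUDA.

```python
import numpy as np

def tiled_attention(Q, K, V, block_size=64):
    """Attention computed over key/value tiles with an online softmax,
    never materializing the full (N x N) score matrix at once."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    row_max = np.full(N, -np.inf)   # running max of scores per query row
    row_sum = np.zeros(N)           # running softmax denominator per row

    for start in range(0, N, block_size):
        Kb = K[start:start + block_size]      # one tile of keys
        Vb = V[start:start + block_size]      # matching tile of values
        scores = (Q @ Kb.T) * scale           # (N, block) partial scores

        new_max = np.maximum(row_max, scores.max(axis=1))
        correction = np.exp(row_max - new_max)        # rescale old partials
        probs = np.exp(scores - new_max[:, None])     # stable exponentials

        out = out * correction[:, None] + probs @ Vb
        row_sum = row_sum * correction + probs.sum(axis=1)
        row_max = new_max

    return out / row_sum[:, None]

# Sanity check against the naive quadratic-memory reference.
rng = np.random.default_rng(0)
N, d = 256, 32
Q, K, V = rng.normal(size=(3, N, d))
scores = (Q @ K.T) / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
naive = (weights / weights.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), naive, atol=1e-6)
```

The sketch stores only one (N, block) score tile at a time; the real kernel goes further by also tiling over queries and keeping each tile in SRAM, which is where the bandwidth savings come from.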