• MiniGPT-4 combines a frozen visual encoder with a frozen large language model (LLM) using one projection layer • Generates detailed image descriptions, creates websites from hand-written drafts, writes stories and poems inspired by images, provides solutions to problems shown in images, and teaches users how to cook based on food photos • Highly computationally efficient, only requires training the linear layer to align the visual features with the Vicuna using approximately 5 million aligned image-text pairs