Master AI Video Creation

Grok Imagine xAI Prompt Guide: Master AI Image & Video Generation

Welcome to the Grok Imagine xAI Prompt Guide — your complete resource for crafting effective prompts for AI image generation and video generation. Whether you’re using Text-to-Video or Image-to-Video, these tips will help you produce cinematic, high-quality AI content powered by xAI Aurora and Grok Spicy.


Image-to-Video Prompt Tips for Grok Imagine xAI

Transform your static images into dynamic videos with these proven prompt strategies

1. Basic Prompt Words for Image-to-Video

Prompt = Subject + Motion, Background + Motion, Camera + Motion...

Essential Guidelines

  • Basic structure: Since the input image already provides the scene, reduce (or even omit) descriptions of static or unchanged parts.
  • Simple and direct: The model expands the prompt based on your wording and its own understanding of the image, so a short, direct prompt is enough to generate a video that meets expectations.
  • Feature description: When the subject has prominent features, mention them so the model can locate the subject, for example "an old man" or "a woman wearing sunglasses".
  • Follow the picture: Write the prompt based on the content of the input image, and state clearly the subject and the action or camera movement you want. The prompt must not contradict the image content or the basic parameters. Examples of contradictions: the image shows a man, but the prompt says "a woman is dancing"; the background is grassland, but the prompt says "a man is singing in a coffee shop"; the hand has no accessories, but the prompt says "the hand with accessories"; the basic parameters select a fixed lens, but the prompt asks the camera to circle the subject.
  • Negative prompts do not take effect: The model does not respond to negative prompts.
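
The formula and guidelines above can be sketched in a few lines of Python. This is only an illustration of assembling "element + motion" pairs and catching one of the contradictions listed above (a fixed lens combined with a camera movement request). The class name, field names, and the crude "camera" check are assumptions made for this sketch and are not part of any Grok Imagine or xAI API.

```python
from dataclasses import dataclass


@dataclass
class ImageToVideoRequest:
    """Hypothetical container for one image-to-video request."""
    motions: list[tuple[str, str]]  # (element, motion) pairs
    lens: str = "Fixed lens"        # "Fixed lens" or "Unfixed lens"
    resolution: str = "720p"
    duration_s: int = 5

    def prompt(self) -> str:
        # Prompt = Subject + Motion, Background + Motion, Camera + Motion...
        return ", ".join(f"{element} {motion}" for element, motion in self.motions)

    def warnings(self) -> list[str]:
        # Crude check: any element that mentions the camera counts as a
        # camera movement request, which contradicts a fixed lens.
        has_camera_motion = any("camera" in element.lower() for element, _ in self.motions)
        if self.lens == "Fixed lens" and has_camera_motion:
            return ["camera movement requested, but the basic parameters select a fixed lens"]
        return []


request = ImageToVideoRequest(
    motions=[("the old man", "puts on his glasses"), ("the camera", "slowly circles him")],
    lens="Fixed lens",
)
print(request.prompt())
print(request.warnings())
```

Static parts of the image are deliberately left out of `motions`, in line with the "basic structure" guideline.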

Example: single subject + single action

Input parameters:
  • Input image
  • Prompt: The old man wears glasses
  • Basic parameters: Fixed lens, 720p, 10s

Intermediate result (not displayed): from the input image, the model obtains the following scene information:

Cool-colored, close-up shot of a bearded, middle-aged, white man with a furrowed brow, fierce eyes and a serious expression looking to the right. He has a gray beard, a wrinkled face, large, deep eyes, a high nose bridge, prominent nostrils, visible wrinkles and nasolabial folds around the eyes, and a light-colored shirt. The background is blurry.

2. Multiple Consecutive Actions Prompt Words

The model responds well to multi-beat actions: it supports several sequential actions by one subject as well as different actions by multiple subjects. You can try writing:

Prompt words = subject 1 + movement 1 + movement 2
Prompt words = subject 1 + movement 1 + subject 2 + movement 2...

Just list the actions in order. The model will expand the prompt based on your wording and its understanding of the image to generate a video that meets expectations.
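
As a rough illustration of the two formulas above, the following Python sketch lists each subject's actions in the order they should happen. The function names and the sample subjects and actions are hypothetical, and the "and then" connector is just one possible way to signal sequence.

```python
def join_actions(subject: str, actions: list[str]) -> str:
    """subject 1 + movement 1 + movement 2, in the order they should happen."""
    if not actions:
        raise ValueError("at least one action is required")
    if len(actions) == 1:
        return f"{subject} {actions[0]}"
    return f"{subject} {', '.join(actions[:-1])}, and then {actions[-1]}"


def multi_action_prompt(beats: list[tuple[str, list[str]]]) -> str:
    """subject 1 + movement 1 + subject 2 + movement 2..., listed in turn."""
    return ". ".join(join_actions(subject, actions) for subject, actions in beats) + "."


prompt = multi_action_prompt([
    ("The girl", ["closes her book", "stands up"]),
    ("The teacher", ["turns around", "points at the blackboard"]),
])
print(prompt)
```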

Example

Input parameters:
  • Input image
  • Prompt: The camera focuses on the teacher in the background, the girl in the foreground becomes blurred, and the teacher curses very angrily
  • Basic parameters: Unfixed lens, 720p, 5s

3. Camera Movement Prompt Words

You can describe the camera change you want in natural language in the prompt. The model supports orbit (surround), aerial, zoom, pan, follow, handheld, and other camera movements, as well as shot switching.

  • When writing coherent multi-shot prompts, describe how the shots connect to each other.
  • Mark each change of shot with an explicit cue such as "camera switch" or "shot switch".
  • If the scene changes after a cut, describe the new scene.
  • When the prompt includes camera movement, choose "Unfixed lens" in the basic parameters.
  • Camera movement prompt words also apply to Text-to-Video.
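
The multi-shot rules above might be sketched as follows. This is a minimal illustration, not an official tool: the function names are hypothetical, the cue text is a parameter that defaults to the "Shot Switch" wording used in this guide's example prompt, and the parameter string simply mirrors the "Basic parameters" lines shown in this guide.

```python
SHOT_CUE = "Shot Switch."  # assumed default cue; pass a different one if needed


def multi_shot_prompt(shots: list[str], cue: str = SHOT_CUE) -> str:
    """Join per-shot descriptions with an explicit shot-change cue."""
    cleaned = [shot.strip().rstrip(".") + "." for shot in shots if shot.strip()]
    return f" {cue} ".join(cleaned)


def basic_parameters(uses_camera_movement: bool, resolution: str = "720p", duration_s: int = 5) -> str:
    """Pick an unfixed lens whenever the prompt asks the camera to move or cut."""
    lens = "Unfixed lens" if uses_camera_movement else "Fixed lens"
    return f"{lens}, {resolution}, {duration_s}s"


shots = ["Kittens and puppies eat cat food", "Close-up cat food is distinct"]
print(multi_shot_prompt(shots))
print(basic_parameters(uses_camera_movement=True))
```
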
Example: shot switch

Input parameters:
  • Input image
  • Prompt: Kittens and puppies eat cat food. Shot Switch. Close-up cat food is distinct.
  • Basic parameters: Unfixed lens, 720p, 5s

4. Adverbs of Degree

To emphasize the frequency or intensity of an action in the video, or the characteristics of the subject, use adverbs of degree deliberately.

1. Be explicit: The model cannot infer the degree of motion from the input reference image, so state it in the prompt. Otherwise the model fills it in according to its own understanding, which may deviate from your intention. For example, change "car passing" to "car passing quickly".

2. Exaggerate the degree appropriately to make the video more expressive: for example, change "man's roar" to "man's crazy roar", or "wing flapping" to "wing flapping greatly". This makes it easier to approach the desired effect.

Tip: Degree prompt words include quickly, violently, with large amplitude, at high frequency, powerfully, wildly...
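
One way to apply this habit consistently is a small rewrite table, sketched below in Python. The table entries are the before/after pairs given in this section; the function name and the simple string replacement are assumptions for illustration only, and a real workflow would likely need a more careful match.

```python
# Plain phrasing -> phrasing with an explicit degree, taken from the examples above.
DEGREE_REWRITES = {
    "car passing": "car passing quickly",
    "man's roar": "man's crazy roar",
    "wing flapping": "wing flapping greatly",
}


def add_degree(prompt: str, rewrites: dict[str, str] = DEGREE_REWRITES) -> str:
    """Replace plain action phrases with their stronger variants."""
    for plain, stronger in rewrites.items():
        # Skip phrases that already carry the degree word to avoid doubling it.
        if stronger not in prompt:
            prompt = prompt.replace(plain, stronger)
    return prompt


print(add_degree("A car passing on the highway"))
print(add_degree("A car passing quickly on the highway"))
```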

Example

Input parameters:
  • Input image
  • Prompt: An athlete in a professional tracksuit. Legs alternate rapidly, arms swing powerfully, sprinting with all his strength on the field. After crossing the finish line, the audience erupts in cheers. The screen uses a follow-shot perspective to fully showcase the details of the athlete crossing the finish line.
  • Basic parameters: Unfixed lens, 720p, 5s

Text-to-Video Tips for Grok Imagine xAI

Create videos from scratch using detailed text descriptions

1. Basic Prompt Words

Tip: Prompt words = subject + movement + scene + shot, style...

1. Subject + movement + scene are the core elements. The model will expand the prompt to generate a video that meets expectations.

2. The Image-to-Video guidance on consecutive actions, camera movements, and adverbs of degree also applies to Text-to-Video. Negative prompts have no effect.

3. How to better describe what you need:

   a. Detailed character description: Describe the appearance, clothing, and posture of the characters.

   b. Environmental details: A detailed description of the natural environment (e.g. mountains, deserts, waterfalls) or the built environment (e.g. studios, bathrooms) helps emphasize the visual and sensory experience of the scene.

   c. Combining emotion and dynamics: Depicting the characters' emotional states together with the dynamics of the environment creates a richer narrative.

   d. Rendering the atmosphere: Use visual techniques to set the mood, for example descriptions of light such as dusk, early morning, dim light, or warm light.
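
The text-to-video formula (subject + movement + scene + shot, style) can be sketched as a small Python structure. This is a hedged illustration only: the class, its field names, and the sentence order are assumptions made here, and the sample values are invented for the example rather than taken from any official template.

```python
from dataclasses import dataclass


@dataclass
class TextToVideoPrompt:
    """Hypothetical holder for the elements of a text-to-video prompt."""
    subject: str      # who or what, including appearance, clothing, posture
    movement: str     # what the subject does
    scene: str        # where it happens, with environmental details
    shot: str = ""    # optional camera or shot description
    style: str = ""   # optional style, lighting, or atmosphere

    def render(self) -> str:
        # Subject, movement, and scene are the core elements and must be present.
        for name in ("subject", "movement", "scene"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")
        core = f"{self.scene}, {self.subject} {self.movement}."
        extras = [part.strip().rstrip(".") + "." for part in (self.shot, self.style) if part.strip()]
        return " ".join([core, *extras])


prompt = TextToVideoPrompt(
    subject="a young woman in a black robe",
    movement="turns her head to look at the camera",
    scene="In a snowy winter valley",
    shot="The camera slowly circles to her left",
    style="Cold blue tones, soft dusk light",
)
print(prompt.render())
```

Keeping shot and style optional reflects the formula: only subject, movement, and scene are treated as required.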

Text-to-Video examples

Example 1
  • Prompt: Portrait photography with a sense of design, psychedelic cold light blue tones, butterfly light, close-up shot of a young white woman. She has high-level short black hair, raised eyebrows on the right, thick eyelashes, high nose bridge, biting red lips, and staring disdainfully at the camera. The camera pulls back, and the foreground is broken glass in the air, which blocks part of the woman's face.
  • Basic parameters: Unfixed lens, 720p, ratio 16:9, 5s

Example 2
  • Prompt: In the winter valley where the snow is falling, a bright and beautiful young white woman in a close-up shot turns her head sideways to look at the camera. She has long black wavy hair, a pointed chin, bushy raised eyebrows, deep eye sockets, red eyes, dark eyeshadow and upturned eyeliner, straight nose, thick lips, very bright red lips, clear jaw line, very long nails, red manicure. The woman is wearing a black robe, wearing a hat to cover her eyebrows, and the collar is slightly open with clear collarbone. Her eyes are fixed on the camera, and her eyes are very attractive. The background is green vegetation covered with thick snow, and snowflakes fall in the air. The camera slightly surrounds the woman to the left, and the woman raises her right hand on her chin, looking at the camera with a charming smile.
  • Basic parameters: Unfixed lens, 720p, ratio 9:16, 5s

Get started with Grok Imagine xAI today: Grok Imagine, the best AI image generator and video generator for creators, marketers, and professionals.
