Images in one second! NVIDIA open-sources the AI image model SANA: low hardware requirements, runs on a laptop!

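SANA, the AI image generation model I have been following, was finally open-sourced last week! Its selling points are very fast generation, a small parameter count, and low hardware requirements; it is claimed to be deployable on a laptop. Image quality may not match FLUX.1 dev or SD3.5 Large, but it can produce a 1024x1024 image in roughly one second, and the official demo images look quite good. Demo images are usually cherry-picked, of course, so the second half of this post shows results from my own tests. SANA comes in 0.6B and 1.6B parameter versions, both with far lower hardware requirements than FLUX.1 and SD3.5 Large. Here is a quick introduction to SANA!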

Introducing the SANA Model

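SANA was developed jointly by NVIDIA, the Massachusetts Institute of Technology (MIT), and Tsinghua University in China. According to the official description, SANA can efficiently generate images at resolutions up to 4096x4096, and the int8-quantized model can be deployed on a laptop.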
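To give a feel for why the small parameter count and int8 quantization matter on a laptop, here is a rough back-of-the-envelope sketch. It counts only the storage for the diffusion model's weights at the two parameter sizes mentioned above; the text encoder, the autoencoder, activations, and framework overhead are deliberately ignored, so real memory use will be higher. The function name `weight_memory_gb` is made up for this illustration.

```python
# Rough weight-only memory estimate (an illustration, not a measurement).
# Ignores the text encoder, autoencoder, activations, and framework overhead.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB, using 1 GB = 1e9 bytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in [("SANA 0.6B", 0.6e9), ("SANA 1.6B", 1.6e9)]:
    for dtype in ("fp16", "int8"):
        print(f"{name} {dtype}: {weight_memory_gb(params, dtype):.1f} GB")
```

Under these assumptions, going from fp16 to int8 halves the weight storage for either model size.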

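On the architecture side, instead of a conventional autoencoder (which typically compresses images 8x), the SANA team trained their own autoencoder that compresses images 32x, cutting the number of latent tokens by 16x. This is crucial for efficient training and for generating high-resolution images.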
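The 16x figure follows from simple arithmetic, sketched below. The sketch assumes a conventional autoencoder downsamples each side by 8, SANA's downsamples by 32, and each latent position becomes one token (patch size 1 in both cases). The helper `latent_tokens` is a hypothetical name used only for this illustration.

```python
def latent_tokens(image_size: int, downsample: int, patch_size: int = 1) -> int:
    """Number of latent tokens for a square image.

    Assumes each side shrinks by `downsample * patch_size` and every
    remaining latent position is one token.
    """
    side = image_size // (downsample * patch_size)
    return side * side


for size in (1024, 4096):
    tokens_8x = latent_tokens(size, 8)
    tokens_32x = latent_tokens(size, 32)
    ratio = tokens_8x // tokens_32x
    print(f"{size}x{size}: 8x AE -> {tokens_8x} tokens, "
          f"32x AE -> {tokens_32x} tokens ({ratio}x fewer)")
```

Because the compression applies to both width and height, a 4x larger per-side factor (32 vs. 8) gives 4 x 4 = 16 times fewer tokens.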

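SANA also introduces a new Linear DiT that replaces vanilla quadratic attention with linear attention, reducing computational complexity from O(N^2) to O(N).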
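To illustrate how attention can drop from O(N^2) to O(N) in the number of tokens N, here is a minimal pure-Python sketch of kernel-based linear attention. It uses a ReLU feature map as an illustrative choice; this is an assumption for the example, not a description of SANA's actual implementation. The trick is to summarize all keys and values once, so each query no longer has to be compared against every key.

```python
import random


def relu(vec):
    return [max(0.0, x) for x in vec]


def quadratic_kernel_attention(Q, K, V, eps=1e-6):
    """Reference version: computes all N x N scores, cost O(N^2 * d)."""
    dim_v = len(V[0])
    out = []
    for q in Q:
        fq = relu(q)
        scores = [sum(a * b for a, b in zip(fq, relu(k))) for k in K]
        denom = sum(scores) + eps
        out.append([sum(s * v[c] for s, v in zip(scores, V)) / denom
                    for c in range(dim_v)])
    return out


def linear_attention(Q, K, V, eps=1e-6):
    """Same math, but keys/values are summarized once, cost O(N * d^2)."""
    d, dim_v = len(K[0]), len(V[0])
    kv = [[0.0] * dim_v for _ in range(d)]  # sum over j of relu(k_j)^T v_j
    k_sum = [0.0] * d                       # sum over j of relu(k_j)
    for k, v in zip(K, V):
        fk = relu(k)
        for r in range(d):
            k_sum[r] += fk[r]
            for c in range(dim_v):
                kv[r][c] += fk[r] * v[c]
    out = []
    for q in Q:
        fq = relu(q)
        denom = sum(a * b for a, b in zip(fq, k_sum)) + eps
        out.append([sum(fq[r] * kv[r][c] for r in range(d)) / denom
                    for c in range(dim_v)])
    return out


random.seed(0)
N, D = 16, 4  # toy sizes: 16 tokens, 4 channels
Q = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(N)]
K = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(N)]
V = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(N)]

slow = quadratic_kernel_attention(Q, K, V)
fast = linear_attention(Q, K, V)
max_diff = max(abs(x - y) for ra, rb in zip(slow, fast) for x, y in zip(ra, rb))
print(f"max difference between the two versions: {max_diff:.2e}")
```

Both functions compute the same result up to floating-point rounding; only the order of the multiplications changes. That reordering is what matters at 4096x4096, where the token count becomes very large.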

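For the text encoder, SANA uses Google's gemma2-2B small language model rather than the T5 or CLIP encoders commonly used by FLUX.1 and Stable Diffusion. Gemma offers better text understanding and instruction following. The SANA team also designed a set of complex human instructions that help the Gemma model improve image-text alignment.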

Finally, Flow-DPM-Solver cuts the number of inference steps from 28-50 to 14-20.

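Nearly all of SANA's improvements aim at faster training and inference, and the data and architecture diagrams provided by the research team show a large reduction in the time needed to generate a 4096x4096 image. If image quality holds up, this breakthrough could let many more devices run text-to-image models with ease.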

How to Use SANA

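The SANA weights are now open-sourced on Hugging Face under the CC BY-NC-SA 4.0 license (non-commercial use only). Since a ComfyUI version has not been officially released yet, deploying on a Windows PC is fairly difficult, so for now I will walk through the official online demo for generating images.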

Online demo link:
https://nv-sana.mit.edu/

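Open the online demo link.

Online demo interface
Enter a prompt and click Run to generate an image.

Generation test
Advanced Options lets you adjust the resolution, Sampling Steps, negative prompt, style, and the number of images generated per run.

Advanced Options
Example prompts are provided at the bottom of the page.

Example prompts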

SANA Generation Tests

Next, here are the images I generated in my tests, along with results from FLUX.1 dev and SD3.5 Large using the same prompts.

Testing and Retesting the Official Demo Images

Subject: portrait of a girl

portrait photo of a girl, photograph, highly detailed face, depth of field

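All three do reasonably well; they just differ in the age they read into the word "girl". Rerunning the same prompt in SANA also gives results similar to the demo image.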

Subject: astronaut in a jungle

Astronaut in a jungle, cold color palette, muted colors, detailed, 8k

All three do well, though the SD3.5 Large image looks slightly flat. SANA's output is similar to the demo image.

Subject: pirate ship in a cosmic maelstrom nebula

Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.

The SANA demo image is stunning, but I could not get a similar result despite many repeated attempts. FLUX.1 dev does well; SD3.5 Large does worse.

Tests with Other Prompts

Subject: portrait of an old man with glasses

old man with glasses portrait, photo, 50mm, f1.4, natural light, Pathéchrome

SANA's image is a little uncanny: it aims for photorealism but does not quite get there, with overly smooth skin and hair that is smeared together.

Subject: an Asian woman

an asian woman

SANA's image is acceptable, but zooming in reveals traces of incomplete denoising.

Subject: realistic close-up of a woman standing in the rain holding a colorful umbrella

A realistic close-up of a serene woman standing in the rain under a colorful umbrella, capturing only her upper body. 
She is wearing a stylish trench coat, with raindrops gently falling around her. 
Her face is softly illuminated by the city lights in the background, with a peaceful and contemplative expression, and water droplets glistening on her hair and shoulders, creating a serene and tranquil atmosphere.

SANA's image falls apart: the face, the umbrella, and everything else are deformed.

Subject: a woman in a long black dress sitting on a fire hydrant, with a very tall building behind her

A woman in a long black dress commands attention as she sits perched atop a weathered fire hydrant. 
The camera angle, low and looking upwards, emphasizes the towering presence of the stately, vintage building behind her, its grand facade seemingly reaching for the sky. 
She leans back slightly, one hand casually resting on the hydrant while the other gently touches her hair, her silhouette a stark contrast against the imposing architecture. 
The interplay of light and shadow creates a dramatic chiaroscuro effect, adding to the cinematic quality of the image. 
The overall composition is one of power and elegance, juxtaposing the delicate beauty of the woman against the unyielding strength of the urban environment. Upper body close-up.

SANA's image is again poor; the main problem is that structured objects can end up in the wrong place.

Subject: a machine blowing out Polaroid photos

A machine generating endless of polaroid images and blowing them up into the air. Realistic National geographic photo

SANA does surprisingly well, even better than SD3.5 Large; both the atmosphere and the creativity are on point.

Subject: abstract, fluid imagery for use as a desktop wallpaper

Fluid abstract shapes, gradient colors, glowing textures, intricate details, hyperrealistic, 8k resolution

SANA does well, with a distinctive style.

Subject: rendering English text in the image

SANA tends to drop letters once the text gets a bit longer, but the cat it generated is surprisingly good.

Rain-streaked brick wall, cyberpunk alleyway, glowing fluorescent graffiti depicts words "The Walking Fish," dripping paint mixes with rain, neon reflections dance across the wet surface, gritty and atmospheric

realistic cat, hold a cardboard with words "SANA" write on it.

Subject: balls placed on a table in the order red, blue, yellow, green, to test instruction following

Four balls are placed on the table, ordered by red, blue, yellow and green

I generated 4 images and none of them got it right.

Subject: four Asian women, to test instruction following

4 asian women

One image had 4 people; the other three had 3 people.

Summary

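SANA's strengths are fast generation and low hardware requirements. It does acceptably on some subjects, but overall its images do not hold up when zoomed in. Once a ComfyUI version is released, I will test how SANA runs on my old laptop with a GTX 1650 Ti.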

