Aiffel) Exploration - ChatGPT, STable Diffusion 정리

January 29, 2024 5 분 소요

내용 출처 : Aiffel

학습목표

ChatGPT와 Stable Diffusion 모델을 사용할 수 있습니다.
ChatGPT와 Stable Diffusion의 원리를 설명할 수 있습니다.
Stable Diffusion 모델을 사용하여 이미지를 생성할 수 있습니다.

참고 자료

Attention Mechanism이 처음 소개된 논문: Neural Machine Translation by Jointly Learning to Align and Translate
Self-Attention 논문: Attention is All you need
GPT3 논문: Language Models are Few-Shot Learners
ImageGPT
- OpenAI의 ImageGPT 소개
- 논문: Generative Pretraining from Pixels
DALL-E
- OpenAI의 DALL-E 소개
- 논문: Zero-Shot Text-to-Image Generation
DALL-E 2
- OpenAI의 DALL-E 2 소개
- 논문: Hierarchical Text-Conditional Image Generation with CLIP Latents
Diffusion 모델 논문: Denoising diffusion probabilistic models
Stable Diffusion 논문: High-Resolution Image Synthesis with Latent Diffusion Models
OpenAI의 ChatGPT 소개
- ChatGPT는 GPT3 기반으로 만들어진 모델이라 논문이 없습니다.
RLHF 관련 논문: Training language models to follow instructions with human feedback

1. ChatGPT를 이용하여 이미지 생성 프롬프트 만들기

1) AIPRM for ChatGPT 설치 및 계정 연동하기

크롬 확장 프로그램인 AIPRM for ChatGPT를 설치 후, ChatGPT 실행.
AIPRM for ChatGPT와 ChatGPT를 연결하겠냐 메시지 팝업
Continue 클릭

다음 창이 나오면 I have read and agree to the terms of use for AIPRM and Privacy policy에 체크하고, Connect with Google account를 클릭

구글 계정을 선택, 엑세스를 허가한 후 이메일로 승인

2) AIPRM for ChatGPT 사용하기

세팅 완료 시 ChatGPT가 아래와 같은 화면으로 바뀜

이미지 프롬프트를 얻기 위해 필요한 몇 가지 세팅
- Topic: Generative AI
- Activity: Stable Diffusion
- Sort by: Top Votes(원하는 것을 선택해 주세요)

프롬프트 엔지니어링 가이드 중 하나를 선택, 이미지의 키워드를 적어보세요.
- Output in: English

결과

  ((Best quality)), ((masterpiece)), ((realistic)), "Castle in the Sky" on eye level, scenic, masterpiece.

  Subject: Castle
  Action: Floating
  Context: Fantasy world
  Environment: In the clouds
  Lighting: Bright, sunbeams
  Artist: John Howe
  Style: Realism
  Medium: Oil on canvas
  Type: Landscape
  Color Scheme: Pastel
  Computer graphics: Cycles
  Quality: 8K

  Positive Prompt:
  A magnificent castle floats gracefully in the clouds, basking in the warm glow of the sunbeams. Its grandeur and majesty are accentuated by the pastel hues of the sky, making it seem like a scene straight out of a fairy tale. The realistic detailing in the oil painting captures every intricate feature of the castle, giving it a lifelike quality. John Howe's masterful use of realism and the subtle play of light and shadow create a breathtaking landscape that transports the viewer to a magical world.

  Negative Prompts:
  Cropped image, out of frame, grainy, low-res, blurry, bad anatomy, poorly drawn, bad art

2. Stable Diffusion 모델로 이미지 생성

모델 준비

  ! pip install --upgrade -qq git+https://github.com/huggingface/diffusers.git transformers accelerate

RuntimeError: CUDA out of memory가 발생할 때
```
  torch.cuda.empty_cache()
```

1) Text-to-Image_Generation

Text-to-Image Generation모델 세팅

  import torch
  from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
    
  device = "cuda"

  # 파이프라인 불러오기
  repo_id = "stabilityai/stable-diffusion-2-base"
  pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)

  pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
  pipe = pipe.to(device)

이미지 생성

  # 이미지 저장 폴더 만들기  
  import os
  os.mkdir("/aiffel/aiffel/diffusers")

  # 입력한 프롬프트를 사용하여 이미지 생성
  image = pipe(prompt, num_inference_steps=25).images[0]

  # 이미지 저장
  image.save("/aiffel/aiffel/diffusers/image.png")  

  # 이미지 출력
  image

여러 개의 이미지 생성

  # 파이썬 이미지 처리 라이브러리 pillow 불러오기
  from PIL import Image  
    
  # 틀 만들기
  def image_grid(imgs, rows, cols):
      assert len(imgs) == rows * cols
      w, h = imgs[0].size  
      grid = Image.new('RGB', size=(cols * w, rows * h))
      grid_w, grid_h = grid.size

      for i, img in enumerate(imgs):
          grid.paste(img, box = (i%cols * w, i // cols * h))
      return grid

  # 이미지의 개수
  num_images = 6

  # 프롬프트 입력
  prompt = ['a horse riding a person'] * num_images

  # 이미지 생성
  images = pipe(prompt).images
    
  # 이미지 출력
  grid = image_grid(images, rows= 3, cols= 2)
  grid

2) Image-to-Image Generation

Image-to-Image Generation모델 세팅

  # Image-to-Image Generation 파이프라인 불러오기    
  from diffusers import StableDiffusionImg2ImgPipeline

  device = "cuda"
  model_path = "CompVis/stable-diffusion-v1-4"

  pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
  pipe = pipe.to(device)

하나의 이미지 가져오기

  import requests  
  from io import BytesIO

  url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

  # url 호출하기
  response = requests.get(url)
    
  # 이미지 열기
  # 이미지를 메모리로 읽어와서 RGB로 변경합니다.
  init_img = Image.open(BytesIO(response.content)).convert("RGB")  
  init_img = init_img.resize((768, 512))  # 이미지의 크기를 조절합니다.
  init_img

프롬프트 : 스타일 설정

  generator = torch.Generator(device=device).manual_seed(1024)   # 모델을 사용할 때마다 동일한 이미지를 생성하기 위해 seed를 설정합니다.  

  prompt = "A fantasy landscape, trending on artstation"
  images = pipe(prompt=prompt, image=init_img, strength=0.75, guidance_scale=7.5).images

  images[0].save("/aiffel/aiffel/diffusers/fantasy_landscape.png")
  images[0]

여러 개의 이미지 생성

  num_images = 2

  # 프롬프트 입력
  prompt = ['A fantasy landscape, trending on artstation'] * num_images

  # 이미지 생성
  generator = torch.Generator(device=device).manual_seed(1024)
  images = pipe(prompt=prompt, image=init_img, strength=0.9, guidance_scale=13.5, num_inference_steps=50, generator=generator).images
  images

  # 이미지 출력
  grid = image_grid(images, rows=1, cols=2)
  grid

3. ControlNet으로 조건을 준 이미지 생성

Stable Diffusion은 이미지가 원하는 자세, 구도, 배경을 갖도록 하는 것은 거의 불가능
ControlNet은 Diffusion 모델에 추가 조건을 추가하여 출력되는 이미지를 쉽게 제어
출처: https://arxiv.org/pdf/2302.05543.pdf

라이브러리 준비

  ! pip install --upgrade -qq git+https://github.com/huggingface/diffusers.git transformers accelerate

1) 윤곽선 검출

Canny 알고리즘을 사용

이미지 불러오기

  import torch
  from diffusers import StableDiffusionControlNetPipeline
  from diffusers.utils import load_image

  # 이미지 불러오기
  image = load_image("https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png") 
  image

OpenCV를 사용, 이미지 윤곽선 검출

  import cv2
  from PIL import Image
  import numpy as np
    
  # 이미지를 NumPy 배열로 변환합니다.
  image = np.array(image)
    
  # threshold를 지정합니다.
  low_threshold = 100
  high_threshold = 200
    
  # 윤곽선을 검출합니다.
  image = cv2.Canny(image, low_threshold, high_threshold)
  image = image[:, :, None]
  image = np.concatenate([image, image, image], axis=2)
  canny_image = Image.fromarray(image)  # NumPy 배열을 PIL 이미지로 변환합니다.
  canny_image

모델 파이프라인 불러오기

  from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    
  canny_controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
  canny_pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=canny_controlnet, torch_dtype=torch.float16)

윤곽선 이미지 + 프롬프트 이미지 생성

  from diffusers import UniPCMultistepScheduler
  canny_pipe.scheduler = UniPCMultistepScheduler.from_config(canny_pipe.scheduler.config)
  canny_pipe = canny_pipe.to("cuda")

  # 동일한 이미지를 생성하기 위해 seed를 지정합니다.
  generator = torch.manual_seed(0)  

  # 이미지를 생성합니다.
  canny_image = canny_pipe(
      prompt="disco dancer with colorful lights",
      num_inference_steps=20,
      generator=generator,
      image=canny_image
  ).images[0]
    
  # 생성된 이미지를 저장합니다.
  canny_image.save("/aiffel/aiffel/canny_image.png")  
    
  # 생성된 이미지를 출력합니다.
  canny_image

negative_propmt: 원하지 않는 것을 적는 프롬 프트
controlnet_conditioning_scale: ContorlNet으로 조건을 어느 정도 주느냐를 조절
num_inference_steps: 추론 횟수(큰 숫자일 수록 고해상도 이미지가 출력)
guidance_scale: 프롬프트에 근접한 이미지를 생성 정도(높으면 이미지 품질이 떨어질 수 있다)

2) 인체 자세 감지

라이브러리 설치

  # controlnet-aux를 설치합니다. Human pose를 검출해주는 controlnet의 보조용 모델입니다.
  !pip install controlnet-aux==0.0.1

사용할 이미지 불러오기

  from diffusers.utils import load_image
  openpose_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png")
  openpose_image

자세 검출

  from controlnet_aux import OpenposeDetector

  # 인체의 자세를 검출하는 사전 학습된 ControlNet 불러오기
  openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
    
  # 이미지에서 자세 검출
  openpose_image = openpose(openpose_image)
  openpose_image

모델 파이프라인 불러오기

  from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

  openpose_controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
  openpose_pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=openpose_controlnet, torch_dtype=torch.float16)

자세 토대로 이미지 생성성

  from diffusers import UniPCMultistepScheduler

  openpose_pipe.scheduler = UniPCMultistepScheduler.from_config(openpose_pipe.scheduler.config)
  openpose_pipe = openpose_pipe.to("cuda")

  # Q. 코드를 작성해 보세요.
  # 동일한 이미지를 생성하기 위해 seed를 넣어줍니다.
  generator = torch.manual_seed(0)
    
  #프롬프트를 작성합니다.
  prompt =  "powerful dance"
  negative_prompt =  "smile"
  images = openpose_image
    
  # 이미지를 생성합니다.
  openpose_image = openpose_pipe(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=20, generator=generator, image=images).images[0]

  # 생성된 이미지를 출력합니다.
  openpose_image

3) 윤곽선 검출 + 인체 자세 감지

두 개의 파이프 라인을 활용한 사진

  from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler  

  # Q. 코드를 작성해 보세요.
  # Edge Detection과 Openpose, 2개의 전처리기를 controlnets라는 리스트로 만듭니다.
  controlnets = [openpose_controlnet, canny_controlnet]

  # 리스트 controlnets를 파이프라인으로 전달합니다.
  pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16)
  pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
  pipe = pipe.to("cuda")
    
  # 프롬프트를 작성합니다.
  prompt = 'happy night people dancing'
  negative_prompt = 'angry'

  # seed를 지정합니다.
  generator = torch.manual_seed(0)
  images = [openpose_image.resize(np.array(canny_image).shape[1::-1]), canny_image]

  # 이미지를 생성합니다.
  image = pipe(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=20, generator=generator, image=images).images[0]

  # 생성된 이미지를 저장합니다.
  image.save("multi_controlnet_output.png")
    
  # 생성된 이미지를 출력합니다.  
  image

✨ 관련 프로젝트 링크

github_file_link

Twitter Facebook LinkedIn