
Visual ChatGPT: Microsoft GPT-4 Coming Soon

ChatGPT has ushered in a new era for generative artificial intelligence, and its success has been followed by a growing number of AI tools. Microsoft has spent the past few years working to improve generative AI tools. ChatGPT itself, however, is a text-based language model and cannot generate images the way DALL-E 2 or Wombo Dream can. With the launch of Visual ChatGPT, that will change.

What is Visual ChatGPT?

ChatGPT is a text-only chatbot that cannot generate images or videos; GPT-4 is expected to change that. Visual ChatGPT can generate, modify, or crop images. It connects ChatGPT to a series of Visual Foundation Models (VFMs), such as Stable Diffusion, so that images can be sent and received during a chat.

Visual ChatGPT helps users generate images from text prompts, although for now it lacks some features that other AI tools such as Stable Diffusion offer.

Microsoft stated: “Instead of training a new multimodal ChatGPT from scratch, we built Visual ChatGPT directly based on ChatGPT and combined various VFMs.”

GPU Memory Usage

Visual ChatGPT requires substantial GPU memory and computing power. The GPU memory usage of each Visual Foundation Model is as follows:

Foundation Model    Memory Usage (MB)
ImageCaption        1755
ImageEditing        6667
T2I                 6677
line2image          6679
canny2image         5540
hed2image           6679
pose2image          6681
scribble2image      6679
BLIPVQA             2709
depth2image         6677
seg2image           5540
normal2image        3974
InstructPix2Pix     2795
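
To get a feel for what these numbers mean in practice, here is a minimal sketch that adds up the memory for a chosen set of models and checks it against a GPU's capacity. It assumes the figures above are in MB, and it is plain addition: it ignores any extra memory a framework or driver may need at run time. The helper names `total_memory_mb` and `fits_on_gpu` are made up for this illustration and are not part of Visual ChatGPT.

```python
# Memory figures copied from the table above (assumed to be in MB).
MEMORY_MB = {
    "ImageCaption": 1755,
    "ImageEditing": 6667,
    "T2I": 6677,
    "line2image": 6679,
    "canny2image": 5540,
    "hed2image": 6679,
    "pose2image": 6681,
    "scribble2image": 6679,
    "BLIPVQA": 2709,
    "depth2image": 6677,
    "seg2image": 5540,
    "normal2image": 3974,
    "InstructPix2Pix": 2795,
}


def total_memory_mb(models):
    """Sum the listed memory usage for the given model names."""
    return sum(MEMORY_MB[name] for name in models)


def fits_on_gpu(models, gpu_memory_mb):
    """Return True if the summed usage does not exceed the GPU's memory."""
    return total_memory_mb(models) <= gpu_memory_mb


if __name__ == "__main__":
    chosen = ["ImageCaption", "T2I"]
    print(chosen, total_memory_mb(chosen), "MB")           # 8432 MB
    print("fits in 12 GB:", fits_on_gpu(chosen, 12 * 1024))
    print("all models:", total_memory_mb(MEMORY_MB), "MB")  # 69052 MB
```

By this rough estimate, loading every model at once would take about 69,000 MB, well beyond a single consumer GPU, while a caption model plus a text-to-image model comes to about 8,400 MB.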

As mentioned above, ChatGPT is trained to give text-based answers and cannot create images or videos. Visual ChatGPT changes this:

  • Images, not just text, can be sent and received.
  • Users can pose complex visual questions or give visual editing instructions that require several AI models to collaborate over multiple steps.
  • Users can give feedback and request corrections to the results.
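
The multi-step idea in the list above can be pictured with a toy sketch. This is not Microsoft's implementation: the three "tools" below are hypothetical stand-ins that only build strings, so the example runs without a GPU or any model files. It shows one plausible shape for the flow, in which each step's output is handed to the next step and a user correction is simply one more step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for visual foundation models. Each takes the current
# result and an argument, and returns a string describing the new result.
def text_to_image(current: str, arg: str) -> str:
    return f"image[{arg}]"


def edit_image(current: str, arg: str) -> str:
    return f"{current}+edit[{arg}]"


def caption_image(current: str, arg: str) -> str:
    return f"caption of {current}"


TOOLS: Dict[str, Callable[[str, str], str]] = {
    "text_to_image": text_to_image,
    "edit_image": edit_image,
    "caption_image": caption_image,
}


def run_plan(plan: List[Tuple[str, str]]) -> List[str]:
    """Run the steps in order, feeding each result into the next step."""
    current = ""
    history = []
    for tool_name, arg in plan:
        if tool_name not in TOOLS:
            raise ValueError(f"unknown tool: {tool_name}")
        current = TOOLS[tool_name](current, arg)
        history.append(current)
    return history


if __name__ == "__main__":
    # A request, followed by a correction from the user as an extra step.
    plan = [("text_to_image", "a cat"), ("edit_image", "add a hat")]
    for result in run_plan(plan):
        print(result)
```

In a real system a language model would decide which tool to call and with what arguments; here the plan is written by hand to keep the sketch short.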

When Will GPT-4 Release?

The CTO of Microsoft Germany stated on March 9 that GPT-4 would be released “next week”. GPT-4 is expected to be a multimodal LLM, capable of creating images and videos from text prompts in addition to the text capabilities of GPT-3.5. More information about Visual ChatGPT is available in the official GitHub repository.

vanceinews
