OpenAI GPT-4 Latest Features, Image Input, How to Access & Use It

Generative Pre-trained Transformer 4 (GPT-4) is the fourth installment in OpenAI’s GPT series and a large multimodal language model. It became available on March 14, 2023, and is currently accessible to the public in a restricted form via ChatGPT Plus, while access to its commercial API is offered only through a waitlist.

OpenAI caused a significant stir with the recent launch of GPT-4, the new language model behind its ChatGPT AI chatbot. This generation is a notable upgrade over its predecessor, boasting a much wider range of capabilities. If you’re already familiar with ChatGPT and similar technologies, you likely understand the potential impact GPT-4 could have on chatbots and AI in general.

However, if you’re not well-versed in language models or GPT-4 specifically, don’t worry; we’ve got you covered. We’ve searched through OpenAI’s blogs and scoured the internet to put together a comprehensive guide on GPT-4. So if you’re someone who’s unfamiliar with this technology, grab a cup of coffee, sit down, and let us tell you everything you need to know about this exciting AI model.

GPT-4: Everything You Need to Know (2023)

Gathering information about GPT-4 can be overwhelming because it covers such a broad range of topics. To make things easier, we’ve broken everything down into the sections below, so you can jump straight to the aspect of the model you care about.

What is GPT-4?

In simple terms, GPT-4 is the latest addition to OpenAI’s family of large language models (LLMs). These systems are trained to predict the next word in a sequence and to respond intelligently to the inputs they are given. By studying a vast dataset, they learn patterns in language and use them to improve their output.

Compared to predecessors like GPT-3 and GPT-3.5, GPT-4 is a significant improvement. It boasts several specific enhancements that we will explore further below. Notably, the new model powers chatbots like ChatGPT and Microsoft Bing, enabling them to provide better responses, handle more creative tasks, and perform better with both old and new ChatGPT prompts.

GPT-4 Is Multimodal

One of the most notable features of GPT-4 is that it is multimodal, a significant improvement over previous models. While earlier models could only interpret text inputs, GPT-4 can now handle prompts containing both text and images.

This means that GPT-4 can not only receive images but also interpret and understand them, whether they arrive on their own or alongside text. This multimodal capability extends to all types and sizes of images and text, including documents that mix text and photos, hand-drawn diagrams, and screenshots. And despite the richer input, GPT-4’s output remains as effective as it is with text-only prompts.
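Image input was still in research preview when GPT-4 launched, but to get a feel for what a text-plus-image prompt looks like programmatically, here is a minimal sketch using the current OpenAI Python SDK and the content-array message format the company documented for its vision-capable models. The model name and image URL below are placeholders, and the call assumes your account has access to such a model.

from openai import OpenAI

# A minimal sketch of a text + image prompt. The model name and image URL are
# placeholders; this assumes access to a vision-capable GPT-4 model through
# the Chat Completions API.
client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder name for a vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this screenshot in detail."},
                {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
            ],
        }
    ],
    max_tokens=500,
)

print(response.choices[0].message.content)

Text-only prompts use the same call with a plain string as the message content, so switching between the two is trivial.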

GPT-4 Demonstration

During a developer livestream hosted by OpenAI, the company demonstrated GPT-4’s multimodal capabilities.

In the stream, GPT-4 was given a screenshot of a Discord window and asked to provide a detailed description. GPT-4 produced an incredibly precise and detailed response in just over a minute, capturing nearly every aspect of the input screen, from the server name in the top left corner to the various voice channels and even identifying all online Discord members in the right pane.


Further tests were conducted where individuals submitted various random artworks, including a photograph of a squirrel holding a camera, and GPT-4 was asked to identify the humorous aspect of the image. Once again, it provided a response akin to that of a human, stating that the picture was funny because squirrels generally do not behave like humans, but instead consume nuts. See my previous post Tips to Get Text from Image Automatically with AI to learn more.

Moreover, the model’s specificity isn’t limited to screenshots; it extends to every kind of text and image input, as the next demo made clear.

GPT-4 and HTML/JS Code Generator

During the livestream, OpenAI President Greg Brockman photographed a hand-drawn mockup of a joke website and uploaded it to GPT-4’s API-connected Discord bot, asking the model to “write brief HTML/JS code to turn the page into a website and replace the jokes with actual ones.”

Astonishingly, GPT-4 generated functional code for the website, and pressing the buttons on the resulting page revealed the jokes. What’s truly mind-blowing is that this was achieved from a combination of text and image inputs, including the model’s ability to decipher human handwriting. GPT-4’s capacity to handle multiple types of input is a major step toward AI that understands prompts accurately and delivers precise results.

Although there were no major issues with the code, OpenAI acknowledged that GPT-4 could benefit from improvements in speed, which may take some time to achieve. Additionally, the model’s visual input capabilities are still being researched and are not yet publicly available.

How Is GPT-4 Better than GPT-3.5/GPT-3?

Beyond its multimodal abilities, GPT-4 improves on GPT-3.5 and GPT-3 in several concrete ways, which we break down below.

1. Better Understanding of Nuanced Prompts

Aside from its remarkable multimodal approach, GPT-4 surpasses its predecessors in several other areas. For instance, it comprehends nuanced prompts far better, a difference that isn’t immediately obvious when comparing it to GPT-3.5. To illustrate the point, OpenAI ran both models through a battery of human-level exams without any exam-specific training, and GPT-4 came out clearly ahead.

The data speaks for itself: GPT-4 outperformed its predecessor on every exam. Some, like the SAT EBRW, saw only a minor improvement, while others, such as the Uniform Bar Exam and AP Chemistry, showed a significant leap. OpenAI reports that “GPT-4 is also more dependable, innovative, and capable of handling nuanced instructions than GPT-3.5,” meaning the bot can comfortably work through more intricate prompts.

2. Exponentially Larger Word Limit

GPT-4 also raises the input ceiling dramatically. It can process inputs of roughly 25,000 words, far beyond GPT-3.5’s limit of about 3,000 words (4,096 tokens). Users can now give the bot much longer and more detailed inputs to analyze and respond to. That is excellent news for developers, who can use the new APIs and documentation to enhance their chatbots and get help writing and fixing code.
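Strictly speaking, GPT-4’s limits are measured in tokens rather than words: the two engines offer 8,192- and 32,768-token context windows, and since a token is roughly three-quarters of a word, the 32K engine works out to about 25,000 words. If you want to check whether a prompt fits, here is a small sketch using the tiktoken package (the helper function name is our own):

import tiktoken

def count_gpt4_tokens(text: str) -> int:
    """Count how many tokens GPT-4's tokenizer sees in the given text."""
    encoding = tiktoken.encoding_for_model("gpt-4")
    return len(encoding.encode(text))

prompt = "Summarize the following meeting notes... " * 200  # stand-in for a long input
tokens = count_gpt4_tokens(prompt)
print(f"{tokens} tokens | fits 8K context: {tokens <= 8192} | fits 32K context: {tokens <= 32768}")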

3. Supports More Languages

GPT-4 also supports more than 26 different languages, including Ukrainian, Korean, German, and many others. This is a substantial improvement over ChatGPT, which primarily catered to English speakers.

OpenAI evaluated GPT-4’s performance on MMLU benchmarks by translating them into multiple languages. Results showed that out of 26 languages, GPT-4 outperformed GPT-3.5’s English-language performance in 24 of them. However, further data training is necessary before GPT-4 can fully support all languages.

4. Different Personalities

GPT-4 takes the concept of steerability to a whole new level. Developers can now set their AI’s personality and tone of speech right from the start by describing the desired behavior in the system message; through the ChatGPT API, for example, you can tell the model to respond like a cowboy or a police officer by assigning it that role. GPT-4 is designed to stay in character once steered, although OpenAI notes the system message is still susceptible to jailbreaking and is working on making it more secure.

According to the demos presented in OpenAI’s blog post, it was amusing to see a user attempting to make GPT-4 stop being a Socratic tutor and simply provide a direct answer to their question. However, since GPT-4 was programmed to behave as a tutor, it refused to deviate from its character, which is similar to what developers can anticipate when training their bots in the future.
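If you have API access, steering the model comes down to the system message you send with each request. Here is a minimal sketch, assuming the OpenAI Python SDK and a GPT-4-enabled API key, that recreates the Socratic-tutor persona from the demo:

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# The system message defines the persona; GPT-4 is designed to stay in it
# even when the user tries to talk it out of character.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a Socratic tutor. Never give the student the answer "
                "directly; always respond with a guiding question instead."
            ),
        },
        {"role": "user", "content": "Just tell me the answer: what is 12 * 13?"},
    ],
)

print(response.choices[0].message.content)  # should nudge, not answer outright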

Possible Applications of GPT-4

Although GPT-4’s multimodal capability is not yet available to the public, OpenAI has already partnered with Be My Eyes, an app designed for visually impaired users. With GPT-4 built into the app, users can take a picture of their surroundings and the AI will describe what it sees, identifying objects such as clothing, plants, and gym equipment, and even reading maps.

OpenAI has also partnered with Duolingo, Khan Academy, and the government of Iceland to put GPT-4 to work on intelligent learning and language preservation. Although the GPT-4 API is currently available only through a waitlist, its release is expected to inspire developers to build innovative experiences. In the meantime, live integrations such as Be My Eyes can already be used before the API opens up.

Does GPT-4 Have Any Limitations?

Despite its hype as the future of AI, GPT-4 faces some obstacles. One of them is its limited knowledge of events that have taken place after September 2021. Additionally, the model does not learn from experience, which means it can make reasoning mistakes and may even accept false statements from users.

Like humans, GPT-4 is also prone to failure. It can make mistakes, hallucinate, and deliver confidently wrong predictions, and, as with GPT-3.5, it may not double-check its work, which leads to errors. However, OpenAI says GPT-4 has been trained to minimize these issues: the model scores about 40% higher than GPT-3.5 on OpenAI’s internal adversarial factuality evaluations, meaning it hallucinates noticeably less. Even with the improvement, it is still advisable to apply human judgment to GPT-4’s outputs.

OpenAI Evals – Make GPT-4 Better Together

OpenAI uses its own open-source software framework, OpenAI Evals, to build and run benchmarks for models such as GPT-4, and the company ships a handful of widely used templates to help people get started. OpenAI has stated that evals will play a critical role in crowdsourcing benchmarks, which in turn can help improve GPT-4’s training and performance.

Consequently, OpenAI has extended an invitation to every GPT-4 user to test their models against benchmarks and share their examples. For more details, you can refer to OpenAI’s GPT-4 research page.
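The exact schema lives in the openai/evals repository, but to give a rough idea of what a contribution involves, here is a sketch that writes a samples file in the JSONL chat format used by the repo’s basic match-style templates. Treat the field names as assumptions and check the repository before submitting anything.

import json

# Each line of an Evals samples file is a JSON object with a chat-style
# "input" and an "ideal" answer (format assumed from the repo's basic
# match templates; verify against github.com/openai/evals).
samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is the capital of Iceland?"},
        ],
        "ideal": "Reykjavik",
    },
]

with open("my_eval_samples.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")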

How to Get Access to GPT-4 Right Now

GPT-4 is not yet available to everyone; OpenAI has released it only to ChatGPT Plus subscribers, with usage caps. Subscribers get access to two versions of GPT-4 with different limitations: some can use the 32K engine, which allows longer inputs, while others are limited to the 8K version, and availability shifts with demand. If you want to try GPT-4 right away, see OpenAI’s guide to getting started on ChatGPT Plus.

If paying for ChatGPT Plus seems like a hassle, it’s worth knowing that GPT-4 already powers Microsoft Bing. You won’t have the same freedom to tinker with the language model as you do with OpenAI’s own tools, but it’s still a good way to experiment with and explore its capabilities.

Frequently Asked Questions (FAQs)

1. Has ChatGPT already integrated GPT-4?

Yes, GPT-4 is already integrated into ChatGPT for ChatGPT Plus subscribers. You can start using it by selecting the correct model after signing in or follow the link provided above to learn how to become a ChatGPT Plus subscriber.

2. Is GPT-4 available for free use?

No, GPT-4 is not currently available for free use. It requires a ChatGPT Plus subscription which costs $20 per month. OpenAI has expressed the possibility of offering free GPT-4 queries in the future or introducing a new subscription tier for better access to new AI language models.

3. Can GPT-4 be relied upon completely?

No, GPT-4 cannot be relied upon completely, as it still has limitations such as occasional hallucinations and confidently wrong answers. It also lacks knowledge of events after September 2021. While it has improved over GPT-3.5, it should still be used in conjunction with human judgment.

4. How big is the GPT-4 model?

The rumors of GPT-4 having 100 trillion parameters are most likely false. OpenAI CEO Sam Altman indirectly refuted those rumors in an interview and called the “GPT-4 rumor mill” ridiculous.

OpenAI has set modest expectations around GPT-4’s size and has declined to disclose specific figures for its parameters or training data. Whether that information will be shared in the future remains to be seen. Even so, based on its initial demonstrations, we expect GPT-4 to perform impressively.

5. How has GPT-4 been trained?

Similar to its predecessors, GPT-4’s underlying model has been trained to forecast the next word in a document. The data used in this process comprises both publicly available data and OpenAI’s proprietary data, which includes a mix of accurate and incorrect information, strong and weak reasoning, self-contradictory statements, and various other ideas. This diversity of data enables GPT-4 to draw on a broad range of inputs and identify the intended meaning of a given prompt.
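To make “forecast the next word” concrete, here is a deliberately tiny toy illustration of the idea using word-pair counts. GPT-4 itself is a large neural transformer, not a lookup table, but the training objective is the same in spirit: given what came before, score the possible continuations.

from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which in a small corpus,
# then predict the most frequent continuation. This only illustrates the
# objective; it is nothing like GPT-4's actual architecture.
corpus = "the model predicts the next word and the next word after that".split()

next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the word most often seen after the given word in the corpus."""
    candidates = next_word_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # -> "next"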

Get Ready for OpenAI’s New Multimodal GPT-4 AI Model

We hope this explanation has provided you with a better understanding of GPT-4. This new model presents a plethora of opportunities and generates excitement for everyone. Once it is fully integrated into ChatGPT for everyone, it will be fascinating to see how people use the new model to create unique experiences. Nevertheless, you don’t have to wait for GPT-4 to try out ChatGPT. Explore all the cool features of ChatGPT and even integrate it with Siri or get it on your Apple Watch! What are your thoughts on this exciting new model?
