The latest version of OpenAI's ChatGPT has not just cleared UPSC Prelims 2022 but topped the exam. The earlier version, ChatGPT 3.5, could correctly answer only 54 out of 100 questions. In this article, acads.com investigates how ChatGPT performed when tested with the 100 questions from the UPSC CSE Prelims 2022 exam.
The earlier ChatGPT (3.5) was made available to the public at the end of November 2022 and became an instant sensation, reportedly reaching an active user base of over 100 million within about two months. It passed a lot of exams with ease, including Wharton exams and US law and medical exams. However, when it was tested on UPSC Prelims, it failed miserably. As per a media report, it could correctly answer only 54 out of 100 questions. With 2 marks per correct answer and a penalty of 0.66 marks per wrong answer, and assuming the remaining 46 questions were all answered incorrectly, this means a net score of about 54*2 - 0.66*46 = 77.6. Although UPSC has not officially released the cutoff or answer key for CSE 2022, the cutoff usually hovers around 90. This means the earlier version of ChatGPT, i.e. 3.5, failed to clear the exam.
However, OpenAI, the parent firm of ChatGPT, recently launched a newer and better version, ChatGPT 4, which it claims is "smarter" than its predecessor. When tested with the 100 questions of UPSC CSE Prelims 2022, it got 88 questions correct!
The ChatGPT 4 release faced several issues in India. ChatGPT 4 is available only on ChatGPT Plus, the paid version of ChatGPT. Initially, however, OpenAI was not accepting Indian credit cards, so it was not available to Indians.
With rumours floating around that Bing Chat also uses ChatGPT 4, acads.com first tried testing on Bing Chat. When we fed it the first 25 questions, it got all of them correct. However, we quickly realised what the issue was. Bing Chat has real-time access to the Internet, and since CSE Prelims is such a popular exam, most of the questions are easily available online. Bing Chat (based on ChatGPT 4) was therefore "compromised": it was taking answers directly from such websites. We desperately needed access to ChatGPT 4 itself.
For two days, we tried everything - a VPN, our Copenhagen team, an international credit card - but nothing worked. Finally, we got access when OpenAI made it available in India.
The next problem emerged in the form of the limits that OpenAI has set on ChatGPT 4 usage.
GPT-4 currently has a cap of 25 messages every 3 hours.
This delayed the release of our results, but we now have them.
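To give a rough sense of why the message cap slowed things down, here is a minimal sketch of the arithmetic. It assumes one question per message and no retries or follow-up messages, which may not match how the test was actually run; the function name and parameters are hypothetical.

```python
import math

def batches_and_wait(total_questions, cap=25, window_hours=3):
    """Return (number of batches, minimum hours spent waiting between batches).

    Assumption: one question per message, no retries or follow-ups.
    """
    batches = math.ceil(total_questions / cap)
    # The first batch can be sent immediately; each later batch
    # has to wait for a fresh window.
    wait_hours = max(batches - 1, 0) * window_hours
    return batches, wait_hours

batches, wait_hours = batches_and_wait(100)
print(batches, wait_hours)
```

Under these assumptions, 100 questions need four batches and at least nine hours of waiting in total, before counting any repeated or follow-up prompts.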
168 out of 200!!
Out of the 100 questions that we fed to ChatGPT, it got around 88 correct. As UPSC has not released the official answer key, it is not possible for us to know the exact answers. However, 168 marks gives you a safety margin of over 40%.
Here are a few sample responses to the questions:
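The 168 figure can be reproduced with a short calculation. The sketch below assumes the usual Prelims marking scheme of 2 marks per question with one-third of those marks deducted for each wrong answer (consistent with the 0.66 penalty used earlier in this article), and that all 100 questions were attempted. The function and constant names are illustrative only.

```python
from fractions import Fraction

MARKS_PER_QUESTION = 2             # assumed: 100 questions, 200 marks in total
PENALTY_FRACTION = Fraction(1, 3)  # assumed: one-third of the marks lost per wrong answer

def net_score(correct, wrong):
    """Net marks after negative marking; unattempted questions score zero."""
    penalty_per_wrong = MARKS_PER_QUESTION * PENALTY_FRACTION
    return float(correct * MARKS_PER_QUESTION - wrong * penalty_per_wrong)

# ChatGPT 4: 88 correct, 12 wrong
print(net_score(correct=88, wrong=12))
```

With 88 correct and 12 wrong, the gross score is 176 and the penalty is 8 marks, which gives the 168 quoted above.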
15) Which of the following is/are the exclusive power(s) of Lok Sabha?
1. To ratify the declaration of Emergency
2. To pass a motion of no-confidence against the Council of Ministers
3. To impeach the President of India
Select the correct answer using the code given below:
1 and 2
1 and 3
34) Which one of the following statements best reflects the idea behind the “Fractional Orbital Bombardment System” often talked about in media?
(a) A hypersonic missile is launched into space to counter the asteroid approaching the Earth and explode it in space.
(b) A spacecraft lands on another planet after making several orbital motions.
(c) A missile is put into a stable orbit around the Earth and deorbits over a target on the Earth.
(d) A spacecraft moves along a comet with the same speed and places a probe on its surface.
We have uploaded complete responses on Acads free course -
Where did it go wrong?
ChatGPT 4 got around 12 questions wrong. Most of these were statement-based questions. We have listed a few questions and responses below.
ChatGPT 4 is based on generative AI, so its response to the same question is not always the same. For a few questions, it gave a wrong answer initially but gave the right one later. However, acads.com has treated the first response as the valid one and marked it on that basis.
After we had fed it the 100 questions, we asked ChatGPT to predict a few important questions for next year's Prelims and guess what - it did!
We'll be compiling these questions and uploading them soon!