What is the extent of changes in LLM performance, and what are the possible reasons behind them? — When is the ‘steering’ of AI worth the squeezing?


health strategy review

management, engineering and 
technology review


Joaquim Cardoso MSc.

Chief Research and Strategy Officer (CRSO), 
Chief Editor and Senior Advisor


July 22, 2023

Diagram of how RLHF is built atop the pre-trained model to steer it toward more useful behavior.


The rapid advancements in artificial intelligence (AI) have led to an increased focus on the “steering” or “alignment” of large language models (LLMs) to ensure they align with human values and produce relevant and safe responses. 


This report examines the changing behavior of LLMs, specifically GPT-3.5 and GPT-4, over a short interval of eight months. 

The findings indicate a noticeable decline in performance, with LLMs becoming less argumentative, more obsequious, less insightful, and less creative.


The study, conducted by Zou and colleagues at Berkeley and Stanford, raises important questions about how we assess, regulate, and monitor AI.


One central question revolves around the extent of changes in LLM performance and the possible reasons behind them. 


While simplifications and optimizations may have affected the models, evidence suggests that active steering through techniques such as Reinforcement Learning from Human Feedback (RLHF) plays a crucial role in shaping LLM behavior.


RLHF involves human-generated prompts and human feedback on model responses, which help steer LLMs to produce discourse that is relevant to human users. 


Without such steering, LLMs often generate syntactically correct but uninteresting sentences. 
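
To make the mechanism concrete, here is a minimal, self-contained Python sketch (not from the source article or the cited paper) of the idea behind RLHF-style steering: a toy reward model is fitted to human preference pairs and then used to pick the response humans would prefer. The bag-of-words scorer and the best-of-n reranking step are simplifying stand-ins; production systems use a neural reward model and reinforcement-learning fine-tuning such as PPO.

```python
# Minimal, illustrative sketch of RLHF-style steering (assumptions noted above):
# a "reward model" is fitted to human preference pairs, then used to choose
# the response humans would prefer. Bag-of-words features and best-of-n
# reranking stand in for a neural reward model and RL fine-tuning (e.g. PPO).

import math
from collections import Counter


def features(text: str) -> Counter:
    """Very crude text features: lowercase word counts."""
    return Counter(text.lower().split())


def score(weights: dict, text: str) -> float:
    """Reward-model score: dot product of learned weights with the features."""
    return sum(weights.get(w, 0.0) * c for w, c in features(text).items())


def train_reward_model(preferences, epochs=200, lr=0.1):
    """Fit weights so preferred responses score higher than rejected ones,
    using a Bradley-Terry / logistic preference loss as in RLHF reward models."""
    weights: dict = {}
    for _ in range(epochs):
        for chosen, rejected in preferences:
            margin = score(weights, chosen) - score(weights, rejected)
            grad = 1.0 / (1.0 + math.exp(margin))  # gradient of -log(sigmoid(margin))
            for w, c in features(chosen).items():
                weights[w] = weights.get(w, 0.0) + lr * grad * c
            for w, c in features(rejected).items():
                weights[w] = weights.get(w, 0.0) - lr * grad * c
    return weights


def steer(candidates, weights):
    """Stand-in for the RL step: return the candidate the reward model prefers."""
    return max(candidates, key=lambda text: score(weights, text))


if __name__ == "__main__":
    # Hypothetical human-labelled pairs: (preferred response, rejected response).
    preferences = [
        ("Here is a clear, step-by-step answer to your question.",
         "The cat sat on the mat. The cat sat on the mat."),
        ("I am not certain, but here is a helpful explanation with caveats.",
         "Whatever. Figure it out yourself."),
    ]
    weights = train_reward_model(preferences)

    # Candidate completions from the (pretend) pre-trained model.
    candidates = [
        "The cat sat on the mat.",
        "Here is a clear and helpful explanation, step by step.",
    ]
    print(steer(candidates, weights))  # prints the more helpful candidate
```

Even this toy version shows the two ingredients the text describes: human-labelled preferences, and a mechanism that pushes the model’s outputs toward what those preferences reward.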


The success of models like ChatGPT highlights the narrow effectiveness of RLHF in generating human-relevant responses and curbing unexpected antisocial behavior.


However, there are concerns about the extensive connections between pre-trained models and RLHF models, which could lead to unintended consequences beyond the narrow set of cases considered during the steering phase.


This opens up exciting research avenues to explore the broader implications of modifying LLM outputs through RLHF.


Notably, the process of steering through RLHF allows for a more explicit incorporation of societal or personal values into LLMs, making it a critical area of interest for policymakers, ethicists, and all stakeholders invested in the safe deployment of AI systems.


Key Messages:


  • The study underscores the importance of understanding the effects of steering LLMs and highlights the need for ongoing research to ensure AI aligns with human values and produces valuable and responsible outputs. 

  • As AI continues to advance, addressing these issues will be crucial for harnessing the full potential of AI technology while mitigating potential risks.


This is an Executive Summary of the article “When is the ‘steering’ of AI worth the squeezing?”, published on the Zaklab Blog on July 19, 2023.


To read the full version of the original publication:

http://www.zaklab.org
July 19, 2023


REFERENCE PUBLICATION


How Is ChatGPT’s Behavior Changing over Time? 


Lingjiao Chen (Stanford University), Matei Zaharia (UC Berkeley), James Zou (Stanford University)


https://arxiv.org/pdf/2307.09009.pdf

Originally published at http://www.zaklab.org.
