City News Bureau of Chicago, a now-defunct news outfit once legendary as a training ground for tough-as-nails, shoe-leather reporters, famously had as its unofficial motto: “If your mother says she loves you, check it out.” Thanks to the advent of ChatGPT, the new Bing Search, Bard, and a host of copycat search chatbots based on large language models, we are all going to have to start living by City News’ old shibboleth.
Researchers already knew that large language models were imperfect engines for search queries, or really any fact-based request, because of their tendency to make stuff up (a phenomenon A.I. researchers call “hallucination”). But the world’s largest technology companies have decided that the appeal of dialogue as a user interface outweighs the potential downsides of inaccuracy and misinformation. These large language models can perform a vast array of natural language tasks, from translation to summarization, and they can be coupled with other software tools that let them carry out actions, whether that is running a search or booking you theater tickets.
Except, of course, there can be real victims when these systems hallucinate, or even when they don’t hallucinate but merely pick up something factually wrong from their training data. Stack Overflow had to ban users from submitting answers to coding questions that were produced using ChatGPT after the site was flooded with code that looked plausible but was incorrect. The science fiction magazine Clarkesworld had to stop taking submissions because so many people were submitting stories crafted not by their own creative genius, but by ChatGPT. Now a German company called OpenCage, which offers an application programming interface (API) for geocoding, the conversion of physical addresses into latitude and longitude coordinates that can be placed on a map, says it has been dealing with a growing number of disappointed users who signed up for its service because ChatGPT erroneously recommended its API as a way to look up the location of a mobile phone based solely on the phone number. ChatGPT even helpfully wrote Python code that let users call OpenCage’s API for this purpose.
But, as OpenCage was forced to explain in a blog post, this is not a service it offers, nor one that is even feasible using the company’s technology. OpenCage says that ChatGPT seems to have developed this erroneous belief because it picked up on YouTube tutorials in which people also wrongly claimed OpenCage’s API could be used for reverse mobile phone geolocation. But whereas those erroneous YouTube tutorials only convinced a few people to sign up for OpenCage’s API, ChatGPT has driven people to OpenCage in droves. “The key difference is that humans have learned to be skeptical when getting advice from other humans, for example via a video coding tutorial,” OpenCage wrote. “It seems though that we haven’t yet fully internalized this when it comes to AI in general or ChatGPT specifically.” I guess we better start internalizing.
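To make the contrast concrete, what OpenCage actually sells is straightforward forward geocoding: an address goes in, map coordinates come out. Here is a minimal sketch in Python of that legitimate use; the endpoint and response fields follow OpenCage’s public geocoding API as I understand it, and the key is a placeholder, so treat the details as illustrative rather than definitive.

```python
# Minimal sketch of forward geocoding with OpenCage's API: address in,
# latitude/longitude out. Endpoint and response fields are based on
# OpenCage's public documentation; the API key is a placeholder.
import requests

API_KEY = "YOUR_OPENCAGE_API_KEY"  # placeholder, not a real key

def geocode(address: str) -> tuple[float, float]:
    """Convert a street address into (latitude, longitude)."""
    resp = requests.get(
        "https://api.opencagedata.com/geocode/v1/json",
        params={"q": address, "key": API_KEY, "limit": 1},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json()["results"]
    if not results:
        raise ValueError(f"No geocoding result for {address!r}")
    geometry = results[0]["geometry"]
    return geometry["lat"], geometry["lng"]

print(geocode("1600 Pennsylvania Avenue NW, Washington, DC"))
```

Note what is absent: nothing in this interface accepts a phone number, and no parameter exists that could locate one. That is precisely the capability ChatGPT invented.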
Meanwhile, after a slew of alarming publicity about the dark side of its new, OpenAI-powered Bing chat feature (conversations in which the chatbot calls itself Sydney and becomes petulant, at times even downright hostile and menacing), Microsoft has decided to restrict the length of conversations users can have with Bing chat. But as I and many others have found, while this arbitrary restriction on the length of a dialogue apparently makes the new Bing chat safer to use, it also makes it a heck of a lot less useful.
For instance, I asked Bing chat about planning a trip to Greece. I was in the process of trying to get it to detail timings and flight options for an itinerary it had suggested when I suddenly hit the cutoff: “Oops, I think we’ve reached the end of this conversation. Click ‘New topic,’ if you would!”
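Having hit that wall, it is worth being clear about what this fix amounts to. Functionally, the restriction behaves like a hard-coded turn counter wrapped around the model, indifferent to whether the user’s task is finished. The sketch below is purely illustrative; Microsoft has not published how its limit is implemented, and the model interface and cap value here are invented for the example.

```python
# Purely illustrative: a conversation-length cap as a hard-coded turn
# counter. Nothing about the dialogue's content, or the user's unfinished
# task, factors into the cutoff.
CUTOFF_MESSAGE = (
    "Oops, I think we've reached the end of this conversation. "
    "Click 'New topic,' if you would!"
)

class CappedChat:
    """Wraps a chat model with a hard per-session turn limit."""

    def __init__(self, model, max_turns: int = 6):  # cap value is invented
        self.model = model
        self.max_turns = max_turns
        self.turns = 0

    def ask(self, prompt: str) -> str:
        # The check looks only at the counter, never at whether the
        # user's task is done; that is what makes it such a blunt fix.
        if self.turns >= self.max_turns:
            return CUTOFF_MESSAGE
        self.turns += 1
        return self.model.generate(prompt)

class EchoModel:
    """Stand-in for the real chat model; invented for this sketch."""

    def generate(self, prompt: str) -> str:
        return f"(reply to: {prompt})"

chat = CappedChat(EchoModel())
for i in range(8):
    print(chat.ask(f"question {i + 1}"))  # the last two hit the cutoff
```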
The length restriction is clearly a kluge that Microsoft has been forced to implement because it didn’t do rigorous enough testing of its new product in the first place. And there are huge outstanding questions about exactly what Prometheus, the name Microsoft has given to the model that powers the new Bing, really is and what it is really capable of. (No one is claiming the new Bing is sentient or self-aware, but some very bizarre emergent behavior has been documented with the new Bing, even beyond the Sydney personality, and Microsoft ought to be transparent about what it understands and doesn’t understand about this behavior, rather than simply pretending it doesn’t exist.) Microsoft has been cagey in public about how it and OpenAI created this model. No one outside of Microsoft is exactly sure why it is so prone to taking on the petulant Sydney persona, especially when ChatGPT, based on a smaller, less capable large language model, seems so much better behaved. And again, Microsoft is saying very little about what it does know.
(Earlier research from OpenAI had found that smaller models trained on better-quality data often produced results that human users much preferred, even though those models scored worse than larger ones on a number of benchmark tests. That has led some to speculate that Prometheus is OpenAI’s GPT-4, a model believed to be many times more massive than any it has previously debuted. But if that is the case, there is still a real question about why Microsoft opted to use GPT-4 rather than a smaller but better-behaved system to power the new Bing. And frankly, there is also a real question about why OpenAI might have encouraged Microsoft to use the more powerful model if it in fact realized the model had more potential to behave in ways that users might find disturbing. The Microsoft folks may have, like many A.I. researchers before them, become blinded by stellar benchmark scores, which convey bragging rights among other A.I. developers but are a poor proxy for what real human users want.)
What is certain is that if Microsoft doesn’t fix this soon, and if someone else shows that their chatbot can hold long dialogues without turning into Damien (whether that is Google, which is hard at work honing its search chatbot for imminent release, or any of the startups, such as Perplexity and You.com, that have already debuted their own chatbots), then Microsoft risks losing its first-mover advantage in the new search wars.
Also, let’s just take a moment to appreciate the irony that it’s Microsoft, a company that once prided itself, not without reason, on being among the most responsible of the big technology companies, that has now tossed us all back to the bad old “move fast and break things” days of the early social media era, with perhaps even worse consequences. (But I guess when your CEO is obsessed with making his arch-rival “dance,” it is hard for the musicians in the band to argue that maybe they shouldn’t be striking up the tune just yet.) Beyond OpenCage, Clarkesworld, and Stack Overflow, people could get hurt by incorrect advice on medicines, by abusive Sydney-like behavior that drives someone to self-harm or suicide, or by the reinforcement of hateful stereotypes and tropes.
I’ve said this before, but I’ll say it again: Given these potential harms, now is the time for governments to step in and lay down some clear regulation about how these systems need to be built and deployed. The idea of a risk-based approach, such as the one broached in the original draft of the European Union’s proposed A.I. Act, is a potential starting point. But the definitions of risk and the risk assessments themselves should not be left entirely up to the companies. There need to be clear external standards, and clear accountability if those standards aren’t met.