

How Can Artificial Intelligence Get Smarter?

Jonathan Vanian
2020-09-29

Progress in commercial applications of neural networks depends on whether they can understand the meaning of words as well as they analyze images.


Image credit: Illustration by Lena Vargas

Rachel Thomas, a former Uber engineer, is a cofounder of Fast.ai, an educational nonprofit in San Francisco, and of an ethics-focused think tank. She is an A.I. evangelist whose audience includes businesspeople and scientists. Photo: Gabriela Hasbun

Chinese edition translated by Liang Yu and reviewed by Xia Lin.


The electronic spreadsheet has been around for about 50 years. An ingenious invention originally meant to digitize bookkeeping, the software has enabled researchers and businesspeople to enter seemingly endless rows and columns of disparate data and then analyze the information with the aid of a computer. It is such standard fare today that schoolchildren are as likely to use free spreadsheet programs as financial analysts are to use them to manage budgets.

What spreadsheets cannot do is think. That’s the preserve of newer, more powerful types of software called neural networks, complex artificial intelligence programs designed to mimic the computational processes of the human brain. And for reasons unique to the development of neural networks in recent years, images—rather than so-called structured data, columns and rows of text and numbers, for example—have been the preoccupation of top A.I. researchers. In other words, powerful computers can sift through millions of photos of cats to understand minute feline characteristics. But the same software struggles to intuit fields in a humble spreadsheet.

This has been deeply frustrating to data scientists in fields like medical research, finance, and operations, where structured data is the coin of the realm. The problem, researchers say, is one of emphasis as well as capabilities. “Most of the data we deal with is structured, or we have imposed some kind of structure on it,” says Bayan Bruss, an applied machine learning researcher at the financial firm Capital One. “There’s this big gap between the advances in deep learning and the data that we have. A lot of what we do is try to close that gap.”

Fledgling projects at a handful of companies are trying to bridge the divide. At biotech powerhouse Genentech, for example, data scientists recently spent months building a spreadsheet with the health records and genomic data of 55,000 cancer patients. The fields contain nuggets such as age, cholesterol levels, and heart rates, as well as more sophisticated attributes like molecular profiles and genetic abnormalities. Genentech’s plan is to feed this information into a neural network that can map a patient’s health attributes. The hoped-for outcome is a breakthrough drug that is potentially unique to each patient.

The problem is that researchers are just now beginning to teach neural networks how to consume structured data like the spreadsheets Genentech is building. “The majority of our data is structured data, whether it’s from clinical trials or electronic health records,” says Ryan Copping, global head of analytics for personalized health care data science at Genentech. If computer networks can analyze and make their own realizations about similarities among patient profiles, he says, “then you could start looking at outcomes and thinking about which patients we can target with which therapies. That’s the unmet need.”

The opportunities extend far beyond health care. Research firm IDC estimates the commercial sector will generate 5.8 zettabytes of productivity data—sales forecasts, customer data, and the like—this year. A zettabyte of information corresponds roughly to the number of grains of sand on all the world’s beaches. A lot, in other words, says John Rydning, head of IDC’s Global DataSphere program, which measures the amount of data created each year.

This means that businesses of all types, if they can corral the data into a form neural networks can learn from, have a lucrative opportunity. Even slight improvements in predictive capabilities can lead to enormous financial gains, says Athina Kanioura, chief strategy and transformation officer for food giant PepsiCo. “The additional level of accuracy translates to millions of dollars,” she says.

The challenge, then, is getting researchers to work with the kind of data that can be most helpful to business. “The deep networks that are so cool can really do amazing things for our cars and for understanding sentiment from tweets online,” says Peter Bailis, a Stanford professor and also CEO of a Silicon Valley startup called Sisu Data that builds analytical tools for businesses. “But they don’t help us with understanding things like risk or customer satisfaction if our data is stored in tables.” In terms any businessperson can relate to, the question remains: Can A.I. conquer its Excel problem?

*****

Progress in promoting business applications for neural networks rests on getting the programs to understand words as well as they have been able to analyze images. For that, researchers have turned to a technique called word2vec. (The “vec” stands for vector, the type of analytical unit best understood by a neural network.) Word2vec, invented in 2013 by a team of Google researchers and published as an open-source software project, helps computers map the relationships among certain words. It has led to more powerful language systems that recognize, for example, that the word “car” is more closely related to automakers like BMW or Nissan than a food company like Kraft Heinz.

The computational magic of word2vec is its ability to discover those correlations by converting words into a string of numbers that neural networks can understand. Over time, as a neural network is trained on additional text, it groups words according to numerical scores measuring how frequently the words appear near each other. Compared with older so-called natural language processing technologies, these newer systems improve on the pattern recognition attributes typically associated with human thought.
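The description of word2vec above is deliberately loose. As a hedged illustration of the intuition only, the sketch below is count-based rather than word2vec itself: each word becomes a vector of how often other words appear within two positions of it, and words are compared by cosine similarity. Word2vec instead trains a shallow neural network to predict neighboring words and learns dense vectors along the way, but it relies on the same signal: words that appear in similar contexts end up close together. The three-sentence corpus and the function names are made up for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def sparse_cosine(a, b):
    """Cosine similarity of two sparse count vectors (missing words count as 0)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus.
corpus = [
    "bmw builds the car engine",
    "nissan builds the car engine",
    "kraft makes the ketchup sauce",
]
vecs = cooccurrence_vectors(corpus)
print(round(sparse_cosine(vecs["car"], vecs["bmw"]), 3))    # 0.816
print(round(sparse_cosine(vecs["car"], vecs["kraft"]), 3))  # 0.408
```

The shared word "the" contributes to both scores here; count-based methods usually reweight such frequent words, and word2vec subsamples them during training.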

From this computer-assisted word-association game comes the ability to make sense of what is stored in, for instance, the rows and columns of a spreadsheet. This process creates a kind of Morse code for a neural network: If the program comes across a sales spreadsheet with a column indicating “days,” it can learn, given enough data and without being explicitly told, that certain holidays could affect sales during a particular season. “It’s kind of the core idea,” says Rachel Thomas, director of the University of San Francisco’s Center for Applied Data Ethics and cofounder of an educational nonprofit called Fast.ai. “Neural networks are providing this infinitely flexible architecture for learning by modeling a particular shape of patterns.”
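The "days" example corresponds to what tabular deep learning practitioners often call entity embeddings: each category in a column gets its own small vector of numbers, and those vectors are learned along with the rest of the model rather than hand-coded. The sketch below is a hypothetical, stripped-down illustration, not Fast.ai's implementation: a toy table of normalized daily sales with a "day type" column, a two-number embedding per day type, and a linear output layer, all fit by plain full-batch gradient descent on mean squared error. Nothing in the code states that holidays matter; the model infers it from the sales figures.

```python
import random


def train_day_embeddings(rows, dim=2, lr=0.01, steps=5000, seed=0):
    """Fit one `dim`-sized vector per day category plus a linear head (w, b)
    so that dot(emb[day], w) + b approximates sales, using full-batch
    gradient descent on mean squared error. Returns a predict(day) function."""
    rng = random.Random(seed)
    categories = sorted({day for day, _ in rows})
    emb = {c: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for c in categories}
    w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    b = 0.0
    n = len(rows)

    def predict(day):
        return sum(e * wk for e, wk in zip(emb[day], w)) + b

    for _ in range(steps):
        grad_emb = {c: [0.0] * dim for c in categories}
        grad_w = [0.0] * dim
        grad_b = 0.0
        for day, sales in rows:
            residual = predict(day) - sales
            for k in range(dim):
                grad_emb[day][k] += 2 * residual * w[k] / n
                grad_w[k] += 2 * residual * emb[day][k] / n
            grad_b += 2 * residual / n
        # Apply all updates only after the full pass over the data.
        for c in categories:
            emb[c] = [e - lr * g for e, g in zip(emb[c], grad_emb[c])]
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b

    return predict


# Hypothetical normalized daily sales, labeled only by day type.
sales_rows = [
    ("weekday", 1.0), ("weekday", 1.1), ("weekday", 0.9),
    ("weekend", 1.5), ("weekend", 1.6),
    ("holiday", 3.0), ("holiday", 2.9),
]
predict = train_day_embeddings(sales_rows)
for day in ("weekday", "weekend", "holiday"):
    print(day, round(predict(day), 2))
```

In a real tabular network the embeddings would feed hidden layers alongside other columns, and a date is often split into several categorical parts (day of week, month, holiday) before embedding; the vectors here have only two numbers so the arithmetic stays readable.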

The investment world alone is rife with opportunities for analyzing words. At Goldman Sachs, a team of researchers trained a neural network to look for words associated with intra-family home transfers. Such noncommercial transactions likely don’t reflect the true value of a house, so teaching a software program to factor them out can improve the bank’s analysis. “So we trained a neural network so it learns to pay less attention to a transaction that has that label,” says Charles Elkan, a longtime professor of computer science at the University of California at San Diego who until recently led machine learning projects for Goldman.
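The article does not describe how Goldman's network works, so the following is not its method. It is a minimal sketch of the general idea of paying less attention to labeled transactions: a hypothetical keyword check stands in for the trained network that flags intra-family transfers, and flagged sales get a small weight when estimating a typical price. The weighted mean returned here is the value that minimizes a weighted squared-error loss, the same mechanism a network uses when per-example weights scale its training loss. The records, keyword list, and 0.1 weight are all invented for illustration.

```python
# Hypothetical phrases; a trained network would learn its own cues from labeled data.
FAMILY_TRANSFER_TERMS = ("intra-family", "family transfer")


def transaction_weight(description, flagged_weight=0.1):
    """Give flagged (likely noncommercial) transfers a small weight, others 1.0."""
    text = description.lower()
    if any(term in text for term in FAMILY_TRANSFER_TERMS):
        return flagged_weight
    return 1.0


def weighted_typical_price(records, flagged_weight=0.1):
    """Weighted mean price: the value p minimizing sum(w_i * (p - price_i) ** 2)."""
    weights = [transaction_weight(desc, flagged_weight) for desc, _ in records]
    return sum(w * price for w, (_, price) in zip(weights, records)) / sum(weights)


sales = [
    ("Arm's-length sale to unrelated buyer", 500_000),
    ("Arm's-length sale", 520_000),
    ("Arm's-length sale after bidding war", 480_000),
    ("Intra-family transfer, parent to child", 100_000),
]
print(weighted_typical_price(sales, flagged_weight=1.0))  # 400000.0 (every sale counts equally)
print(round(weighted_typical_price(sales)))               # 487097
```

Setting `flagged_weight=0.0` drops flagged rows entirely; a small nonzero weight keeps them in the data while limiting how far they drag the estimate.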

Sophisticated word association is also invaluable for logistics operators. The San Francisco grocery-delivery startup Instacart uses a variant of word2vec to teach its algorithms to anticipate customer preferences, particularly when requested items are unavailable. The program converts the words for supermarket inventory items into numerical data so neural networks can process them. The network then groups items together so it can understand, for example, that trail mix has more in common with dried fruit or nuts than it does with coffee. The result is a time and money saver, says Sharath Rao, a machine learning director for Instacart. “Otherwise you would have to think of all the possible pairs and keep a [manual] table,” he says.
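The article does not detail Instacart's word2vec variant, so the sketch below only illustrates the last step the paragraph mentions: once each product has a vector, a substitute for an unavailable item can be found by ranking in-stock products by cosine similarity to it, with no hand-kept table of pairs. The three-number vectors are invented, and their axes are labeled only for readability; learned embeddings are typically longer, and their dimensions rarely have such clean meanings.

```python
import math

# Hypothetical item vectors; axes loosely mean (snack, fruit, caffeine).
ITEM_VECTORS = {
    "trail mix":   [0.90, 0.60, 0.00],
    "dried fruit": [0.60, 0.90, 0.00],
    "mixed nuts":  [0.95, 0.20, 0.00],
    "coffee":      [0.10, 0.00, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_substitutes(item, vectors, in_stock):
    """In-stock items other than `item`, most similar first."""
    target = vectors[item]
    candidates = [name for name in vectors if name != item and name in in_stock]
    return sorted(candidates, key=lambda name: cosine(target, vectors[name]), reverse=True)


print(rank_substitutes("trail mix", ITEM_VECTORS, {"dried fruit", "mixed nuts", "coffee"}))
# ['mixed nuts', 'dried fruit', 'coffee']
```

Adding a product means adding one vector rather than a row for every possible pairing, which is the table-keeping Rao describes avoiding.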

*****

For all the momentum behind using deep learning on structured data, hurdles remain. For one, the idea is so new that there’s no tried-and-true way to evaluate how good these techniques are compared with more conventional statistical methods. “It’s a bit of an open question right now,” says Even Oldridge, a data scientist for Nvidia, which makes chips that power A.I. software.

Indeed, given the expense of training neural networks, older data analytics methods may be sufficient for companies that don’t have the right A.I. expertise in-house. “I’m a firm believer that for every company, there isn’t a magic solution that can solve every problem,” says A.I. expert Kanioura, the PepsiCo executive. This is in fact behind the pitch that cloud-services giants Amazon, Microsoft, and Google make: Buy A.I. services from us rather than making large expenditures on talent for potentially incremental returns.

And as with any project in which humans aim to teach computers how to “think,” the biases of the humans involved threaten the project. Deep learning systems are only as good as the data they are trained on, and too much or too little data can skew the software’s predictions. Genentech’s data set, for instance, has clinical data on cancer patients dating back 15 years. However, the genomic testing data in its spreadsheet goes back only eight years, meaning that patient data from before then isn’t as comparable as researchers might like. “If we don’t understand these data sets, we could build models that are totally unreliable,” says Genentech’s Copping.

Still, the potential value of supercharging the analysis of all those spreadsheet fields is nothing less than being able to “predict how long a patient can survive” with a certain treatment, says Copping. Not bad for a bunch of rows and columns.

*****

A handful of corporations are teaching neural networks to work with the kind of structured data that already exists within their walls. A few examples:

Genentech

The biotech pioneer has built a spreadsheet of complex health data, from routine records to genomic profiles, covering tens of thousands of patients. The stakes are high: If artificial intelligence can properly analyze the data, the result could be medical treatments targeting the diseases of individual patients.

Goldman Sachs

A.I. presents untold opportunities for investors. The bank hired a machine learning professor to build a tool that teaches networks to ignore phrases that could complicate a financial analysis. Example: “Intra-family transfers” likely don’t reflect the true value of a home. Teaching a network to find them can improve the model.

Instacart

The grocery-delivery startup has an understandable data set in the inventory of supermarket items its workers pick for customers. The company is teaching its algorithms to do sophisticated word association, such as matching trail mix with nuts and dried fruit, in order to offer customers alternatives when their choices are out of stock.

A version of this article appears in the October 2020 issue of Fortune with the headline "What makes artificial intelligence look dumb."

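The intellectual property rights to content published by Fortune China belong exclusively to, or are held by, Fortune Media IP Limited and/or the relevant rights holders. Reproduction, excerpting, copying, mirroring, or any other use without permission is prohibited.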