Facebook has pioneered a number of artificial intelligence techniques to help it police content across its social networks, the company said Tuesday in a series of blog posts.
The details about the technology Facebook is using came on the same day the company released its latest quarterly update on its efforts to combat hate speech, child pornography, fake accounts, political misinformation, terrorist propaganda, and other violations of its community standards. The report showed the company has been combating a surge in hate speech and COVID-19-related misinformation since the start of the year.
Among the new A.I. systems Facebook highlighted on Tuesday are systems that better understand the meaning of language and the context in which it is used, as well as nascent systems that combine image and language processing in order to detect harmful memes.
As well as helping to combat misinformation related to COVID-19, Facebook has also turned to new A.I. algorithms to police its new policy banning ads that seek to exploit the pandemic for profit by selling face masks, hand sanitizer, and similar items.
Facebook said in a blog post that it put warning labels on 50 million posts in April for possible misinformation around COVID-19. It also said that since the beginning of March it has removed 2.5 million pieces of content that violated rules about selling personal protective equipment or coronavirus test kits.
Facebook said that thanks to the new techniques, 88.8% of the hate speech the social network took down in the past quarter was detected automatically before someone saw and flagged the offensive material for review by the company's human reviewers. This is up from about 80% in the previous quarter.
But the company said that the total amount of hate speech it's finding continues to rise—9.6 million pieces of content were removed in the first three months of 2020, 3.9 million more than in the previous three months.
Mike Schroepfer, Facebook's chief technology officer, said the increase was due to the company getting better at finding hateful content, not a surge in hate speech itself. "I think this is clearly attributable to technological advances," he said on a call with reporters ahead of the release of the report.
In particular, Facebook has built on advances in very large language learning algorithms that have only been developed in the past three years. These models work by building a statistical picture of how the words in posted content relate to the other words that come both before and after them. Facebook has developed a system called XLM-R, trained on two terabytes of data, about the equivalent of all the words in half a million 300-page books. It learns the statistical map of all of those words across multiple languages at once. The idea is that conceptual commonalities between hate speech in any language will mean the statistical maps of hate speech will look similar across every language, even if the words themselves are completely different.
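XLM-R has been released publicly, so the cross-lingual idea can be illustrated with the open-source xlm-roberta-base checkpoint available through Hugging Face's transformers library. This is a minimal sketch: the checkpoint name and the mean-pooling strategy below are assumptions for illustration, and Facebook's production hate-speech models are not public. The sketch embeds the same sentiment in two languages and checks that the resulting vectors land close together in the shared space.

```python
# Sketch: comparing sentence representations across languages with XLM-R.
# Assumes the open-source "xlm-roberta-base" checkpoint (Hugging Face);
# Facebook's production systems are not publicly available.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pool the final hidden states into one sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)

# The same idea expressed in two different languages should land in
# nearby regions of the shared multilingual embedding space.
en = embed("I really enjoyed this film.")
de = embed("Dieser Film hat mir sehr gefallen.")
similarity = torch.nn.functional.cosine_similarity(en, de, dim=0)
print(f"cross-lingual cosine similarity: {similarity.item():.3f}")
```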
Facebook is at pains to show it is making good on CEO Mark Zuckerberg's repeated promises that machine learning and A.I. will enable the company to combat the spread of hate speech, terrorist propaganda, and political misinformation across its platforms—problems that have put Facebook in the crosshairs of regulators globally and turned many one-time fans against the company in the past four years.
"We are not naive," Schroepfer said. "A.I. is not the solution to every single problem and we believe that humans will be in the loop for the foreseeable future."
Much of the tech Facebook highlighted is designed to make the job of its human content moderators and associated fact-checking organizations easier and less repetitive.
That is especially important at a time when social distancing measures instituted by the company as well as by various countries have meant that the centers where many of its human content moderators work have had to close, and the reviewers, many of whom are contractors, have been sent home. In some cases, Schroepfer said, the company has found ways for these people to continue their work from home, although that has not been possible in all cases.
"We want people making the final decisions, especially when the situation is nuanced," Schroepfer said. "But we want to give people we work with every day power tools." For instance, he said, if a human reviewer decided that a whole class of images constituted misinformation, Facebook should be able to automatically apply that label to similar content across both Facebook and Facebook-owned Instagram without the human reviewers having to find and manually remove all of it.
One way people try to evade Facebook's content blacklists is by making small modifications to blocked content—altering some pixels in an image or using a photo filter, for instance—and then trying to upload it again and hope it sneaks past Facebook's algorithms. To battle these tactics, the company has developed a new A.I. system, called SimSearchNet, trained to find pieces of nearly identical content.
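SimSearchNet itself has not been published in detail, but a much simpler stand-in for the same task is perceptual hashing, which tolerates exactly the kinds of small edits described above. The sketch below uses the third-party imagehash package; the 8-bit threshold is an assumed tuning value, not anything Facebook has disclosed.

```python
# Sketch of near-duplicate image detection with perceptual hashing.
# SimSearchNet is not public; pHash (via the `imagehash` package) is a
# much simpler stand-in that survives small edits like filters.
from PIL import Image
import imagehash

def is_near_duplicate(path_a: str, path_b: str, max_bits: int = 8) -> bool:
    """True if two images differ by at most `max_bits` of a 64-bit pHash."""
    hash_a = imagehash.phash(Image.open(path_a))
    hash_b = imagehash.phash(Image.open(path_b))
    return (hash_a - hash_b) <= max_bits  # Hamming distance on hash bits

# A re-upload with a mild filter or a few altered pixels usually keeps
# most hash bits intact, so it still matches the blocked original.
print(is_near_duplicate("blocked_original.jpg", "reuploaded_filtered.jpg"))
```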
Another computer vision system the company has deployed to enforce its new COVID-19 ad policy works by identifying the objects present in an image, not simply forming a statistical map of all of the pixels it contains. This way the algorithm should be able to determine that the image has a face mask in it, even if that face mask is rotated at a funny angle or shown against a background designed to make it harder for machine learning software to recognize it, Schroepfer said.
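The article does not name the detector Facebook uses. As an illustration of the object-detection pattern only, the sketch below runs an off-the-shelf Faster R-CNN from torchvision; note that the stock COCO-trained model has no face-mask class, so a real deployment would have to fine-tune on mask examples.

```python
# Illustrative object detection with torchvision's Faster R-CNN.
# The stock COCO model has no "face mask" class; a real deployment would
# fine-tune on mask examples. This only shows the detection pattern.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_objects(path: str, min_score: float = 0.8):
    """Return (label_id, score) pairs for confident detections in an image."""
    image = to_tensor(Image.open(path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]  # dict of boxes, labels, scores
    return [(int(label), float(score))
            for label, score in zip(output["labels"], output["scores"])
            if score >= min_score]

print(detect_objects("ad_image.jpg"))
```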
Finally, the company said it was also working on "multimodal" machine learning systems—ones that can simultaneously analyze text and imagery, and in the future, possibly video and sound too—to combat the spread of hateful memes.
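No architecture details were given, but one common baseline for this kind of multimodal classification is "late fusion": encode the text and the image separately, concatenate the two feature vectors, and classify the result. Everything in the sketch below, including the feature dimensions, is an assumption for illustration rather than Facebook's design.

```python
# Illustrative "late fusion" multimodal classifier in PyTorch: encode a
# meme's text and image separately, concatenate, and classify. All
# dimensions here are assumptions; the article gives no architecture.
import torch
import torch.nn as nn

class LateFusionMemeClassifier(nn.Module):
    def __init__(self, text_dim: int = 768, image_dim: int = 2048,
                 hidden: int = 512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # hateful vs. benign
        )

    def forward(self, text_feats: torch.Tensor,
                image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (batch, text_dim) from a language model;
        # image_feats: (batch, image_dim) from a vision backbone.
        return self.fuse(torch.cat([text_feats, image_feats], dim=-1))

model = LateFusionMemeClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 2])
```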
To that end, the company has created a new dataset consisting of 10,000 memes that were determined to be part of hate speech campaigns, and it is making the dataset freely available for researchers to use to build A.I. systems capable of successfully detecting them. The company is creating a competition with a $100,000 prize pool to find the best hateful-meme detection software, with the condition that, in order to enter the contest, researchers must commit to open-sourcing their algorithms.
As a benchmark, Facebook's A.I. researchers created several systems of their own and trained them on this dataset. But the company's results so far indicate how difficult the challenge is: Facebook's best hateful-meme detector, which was pre-trained on a very large dataset of both text and images simultaneously, was only 63% accurate. Human reviewers, by contrast, were about 85% accurate and missed less than 20% of the memes they should have caught.