The author and presenter adds that, in an exposé by The Atlantic magazine, it becomes apparent that Meta at first looked into acquiring the rights to use this content legally, but it soon became clear that the cost could be prohibitive.
"In all these internal communications, it becomes apparent that it is not particularly cheap and it is not particularly easy," Richard explains.
"In the communications, someone says 'Look, it is really, really important for Meta to get these books ASAP'. Another person on the email chain said, 'Look, I've spoken to publishers and this seems unreasonably expensive'."
It was after this exchange, Richard says, that the decision was made to mine LibGen to feed Meta's AI: "Apparently someone then gave the go-ahead to scrape LibGen, which as we say is a deeply, deeply, deeply illegal website."
He adds that several other AI companies are using similar tactics to train their large language models (LLMs). The argument in favour of racing ahead without doing the proper paperwork, he says, is that in less scrupulous régimes – such as China and Russia – this will be going on anyway.