In a highly guarded room, authors suing OpenAI will be able to view its confidential training data.
This sounds like no fun.
OpenAI will make its training data available to the authors who are suing the company for copyright infringement, but only in a tightly restricted setting.
Lawyers for writers Sarah Silverman, Ta-Nehisi Coates, and Paul Tremblay disclosed in a new court filing this week that they had negotiated a deal with OpenAI that will grant the writers’ representatives access to the company’s training data, as The Hollywood Reporter reports.
There are obviously significant restrictions, but the deal is noteworthy because it is the first time OpenAI has opened its training data to outside inspection.
According to the report, the training data is only accessible on a locked-down computer without access to the internet or any shared networks in a “secure” area at OpenAI’s San Francisco offices.
The representatives may take notes, but no outside equipment will be permitted in the room and they may not make copies of any part of the data, which is a strange request given that the content was all provided by the public.
The story further states that everyone who accesses the datasets, which are unquestionably massive, will have to sign a non-disclosure agreement, produce identification, and enter their name in a visitor log.
Head Case
These requirements sound more like the protocols for viewing state secrets than a collection of AI training data.
These conditions are the most recent development in the lengthy legal battle between these authors and OpenAI, which could potentially set a precedent for AI’s future use of copyrighted content.
Coates, Silverman, and Tremblay are being represented by attorneys from the Joseph Saveri Law Firm in San Francisco. They are also bringing a similar lawsuit against Meta. In the OpenAI case, the authors claim that their copyrighted work was used without authorization or payment, leading to ChatGPT spitting out answers that violate their copyrights.
Other allegations in the lawsuit have been dismissed twice this year.
US District Judge Araceli Martinez-Olguin threw out most of the OpenAI lawsuit in February, ruling that the claims of negligence, unjust enrichment, and vicarious copyright infringement were baseless.
Later in the year, the same judge dismissed the part of the lawsuit that claimed OpenAI had engaged in unfair business practices by using the writers’ copyrighted works; nevertheless, as THR points out, the direct copyright infringement claims have survived.
As of now, it’s unclear when these locked-down viewing sessions will take place, how long the representatives will have with the data, and how many people will be allowed in at once.
All the same, we’ll be watching this one closely while not quite envying those who will have to dig through all that data to try to find a smoking gun.