Concerns Related to DeepSeek’s Training Techniques May Focus on IP Issues
Overview
DeepSeek’s announcement of its Janus-Pro-7B model’s performance, and of the cost it incurred to develop that model, caused a cumulative trillion-dollar loss in the market capitalization of a group of companies at the forefront of the AI market. Several publications have reported that some companies responded to the announcement by questioning whether DeepSeek used data from existing AI application programming interfaces (APIs) to train its models. The concerns about the training of DeepSeek’s models may turn on legal principles different from those raised in existing litigation over how AI models are trained.
Several content owners have filed lawsuits alleging that the training methods for certain AI models amounted to copyright infringement and, through the removal of copyright-management information, to violations of the Digital Millennium Copyright Act (DMCA). Based on media reports, the concerns related to DeepSeek do not appear to focus on copyright allegations but, rather, on the practice of bombarding AI APIs with millions of questions, using the responses to understand the underlying data associated with those models, and training DeepSeek’s models with that information. As reported in the media, the concern is based on Microsoft’s observation of accounts, reported to be associated with DeepSeek, exfiltrating large amounts of data from the API. Such activities may generate legal liability under several theories. For example, they could support breach-of-contract claims based on violations of any usage agreements that users acknowledge before using the API.
A more potent allegation may be an assertion of trade secret misappropriation. The US Defend Trade Secrets Act (DTSA) defines a trade secret as any form and type of information that the owner has taken reasonable measures to keep secret and that derives independent economic value from not being generally known or readily ascertainable through proper means. If an entity obtains this information by “improper means,” that entity could be liable for trade secret misappropriation. The liability may extend to an entity that, while not itself obtaining the information by “improper means,” knew or should have known that the entity providing the information used “improper means.” The present situation may raise two questions: whether the ability to query an AI API to exfiltrate a large amount of data evidences a lack of “reasonable measures” to protect the information, and whether the technique uses “improper means” to obtain the underlying information. Notably, a recent decision found that “scraping” a significant amount of information from a proprietary database amounted to “improper means” supporting a finding of trade secret misappropriation. Compulife Software, Inc. v. Newman, 111 F.4th 1147 (11th Cir. 2024). In view of the growing importance of AI, companies need to stay abreast of legal developments to ensure that they neither incur legal liability through the manner in which they obtain outside data nor miss the opportunity to obtain legal and equitable remedies for misappropriation or misuse of their own data.


