What you don’t know can have devastating consequences for your business.
By Bill Dunnion, Director, Corporate Cybersecurity at Calian
Machine learning and artificial intelligence (AI) offer immense potential for streamlining processes and automating repetitive tasks. However, there are some important considerations to keep in mind when implementing these technologies. The risks and challenges associated with machine learning and AI can be as substantial as the efficiencies gained through their use.
Data Retention and Ownership
One of the fundamental aspects of machine learning is training algorithms to distinguish between good and bad information. This requires a large sample size of data. As machine learning applications strive to improve the accuracy of their algorithms, they assimilate vast amounts of information into their pool of training data, which could include sensitive corporate information. While vendors often claim the data is anonymized, we should still be concerned about data ownership and the potential loss of control over valuable intellectual property.
There is often no way to tell whether your information is being assimilated and used by an application unless the vendor is up front about it in the master sales agreement or terms and conditions, or unless you ask the specific question and get a straight answer. So, when it comes to doing a threat risk assessment for a new application that an organization wants to implement to make employees’ lives easier and more productive, there is a lot of pressure on the security team to do its due diligence around how these tools use data.
Lack of Transparency
Unless explicitly stated in contracts or terms and conditions, it can be challenging to determine whether proprietary data is being assimilated and used by a machine learning tool. Recent breaches have demonstrated the potential consequences of unintentionally sharing sensitive data with public platforms.
Putting a company’s proprietary code into a public, ChatGPT-style engine to streamline problem solving seems innocent enough. But it’s not that simple. Information entered into a public machine learning model can be added to its pool of training data. By doing this, a company adds its own intellectual property to the model, exposing it to the public.
Do Your Due Diligence
At Calian we have recently considered three different SaaS tools that leverage machine learning, two of which we rejected because of their terms and conditions. Sometimes you need to consider questions you wouldn’t normally think to ask. A tool might be nothing more than a database of documents, somewhere to store your contracts so you can track and maintain compliance with their terms, yet you may not be aware of what it opens you up to. In the terms and conditions for one sale, we noticed that the vendor, in effect, claimed all rights and ownership over any information and documentation we put into the tool. We would have been handing all the data in our corporate contracts over to a third party.
Even something as benign as a cloud-based tool used to correct mistakes in writing has risks. As part of your day-to-day tasks, you might innocently put some intellectual property into the tool for it to check, without considering how that information is being used by and for the tool. Maybe it’s not sensitive information, but what if you are looking to streamline your product roadmap or implement a new feature within your flagship solution? Regardless of the tool, what matters is the sensitivity level of the information potentially being made available for public use and consumption. Data retention and data ownership are vital things to pay attention to because, as with anything on the internet, once it’s public you will never be able to retrieve it.
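One practical safeguard against this kind of leakage is a pre-submission gate: before any text leaves the organization for a cloud-based tool, scan it for markers of sensitive content and block anything that matches. The following is a minimal sketch; the pattern list and the `safe_to_upload` helper are illustrative assumptions, not any particular vendor’s product or API.

```python
import re

# Illustrative markers of sensitive content. A real deployment would use the
# organization's own classification labels and data-loss-prevention rules.
SENSITIVE_PATTERNS = [
    r"(?i)\bconfidential\b",
    r"(?i)\bproprietary\b",
    r"(?i)\binternal use only\b",
    r"(?i)\broadmap\b",  # e.g., product plans, as in the scenario above
]

def safe_to_upload(text: str) -> bool:
    """Return True only if no sensitive marker is found in the text."""
    return not any(re.search(p, text) for p in SENSITIVE_PATTERNS)

print(safe_to_upload("Please check this press release for typos."))  # True
print(safe_to_upload("CONFIDENTIAL: 2025 product roadmap draft."))   # False
```

A keyword screen like this is deliberately crude; its value is that the check happens before data leaves the organization, which is the only point at which control still exists.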
Unintended Consequences and Automation
Integrating machine learning tools into organizational environments introduces challenges related to access control and data management as well. Do these tools adhere to access control policies and restrict searches to authorized information? If you ask one to find information on a certain topic, will it restrict the search to information you would normally have access to, or will it surface data from other people’s files and history, even your CEO’s? The potential for an application to access data sources without proper authorization further complicates the security picture.
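The access-control question above can be made concrete with a small sketch: an internal search assistant should filter results by the requester’s own permissions before anything reaches the user or the model. The document store and role model below are hypothetical, for illustration only.

```python
# Illustrative document store: each entry records which roles may read it.
# A real system would consult the organization's actual access control lists.
DOCUMENTS = [
    {"title": "Company holiday schedule", "allowed_roles": {"employee", "executive"}},
    {"title": "Board meeting minutes", "allowed_roles": {"executive"}},
]

def search_for(user_role: str, query: str) -> list[str]:
    """Return only the titles the requesting role is authorized to see."""
    return [
        doc["title"]
        for doc in DOCUMENTS
        if user_role in doc["allowed_roles"]
        and query.lower() in doc["title"].lower()
    ]

print(search_for("employee", "meeting"))   # [] -- board minutes filtered out
print(search_for("executive", "meeting"))  # ['Board meeting minutes']
```

The key design choice is that the authorization check is applied inside the search itself, not after results are returned; a tool that indexes everything and filters later has already read data the requester was never entitled to.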
In addition, while these tools can enhance productivity, there is a risk of them granting permissions or creating administrative accounts without human intervention or approval. The ability of these tools to autonomously perform tasks demands careful consideration of potential risks and the need for robust control mechanisms.
Machine learning and AI have transformative potential, but they also bring new risks and challenges. Organizations must carefully navigate issues related to data retention and ownership, transparency and privacy, access control, and unintended consequences. By addressing these concerns and implementing safeguards, businesses can harness the power of these technologies while ensuring the security and protection of their sensitive information. Above all, security teams must remain proactive and informed as the landscape of machine learning and AI continues to evolve.