How to evaluate security and privacy of AI tools: Guidelines for biotech from Benchling’s experts
AI will become as integral to our lives as the internet and smartphones. In biotech, this growing reach and inevitability are visible today. Generative models are used to propose new therapeutic targets, and LLM-savvy scientists are using resources like BioGPT and Gemini for biomedical text mining, shaving hours or days off research.
As with any new technology, it’s impossible to discuss the potential without also discussing the risks. Given the high-stakes nature of disease treatments and the intricate regulatory and compliance landscape, companies must get ahead of and mitigate concerns with AI around IP, data privacy, and security. For many, the perceived risks and the unknowns are a deterrent.
AI can seem daunting, but it’s reassuring to know that the AI tools used today are just software. The efforts that companies have already put into privacy and security due diligence for their existing software can largely be transferred to AI technologies, with slight adjustments. Biotech companies can have confidence in their security and privacy while reaping the benefits of AI, arguably one of the most powerful innovations of the last decade.
In this guide, we’re sharing how we evaluate third-party AI offerings at Benchling. This includes: cautions around consumer versus enterprise-grade AI tools; which security and privacy assurances to negotiate for; and a privacy and security due diligence framework.
Enterprise AI solutions vs. consumer-facing AI solutions
Enterprise SaaS products are often architected and maintained with high security and operating standards. They typically have both the contractual obligations and the business and reputational motivation to maintain trust with their enterprise customers (who are often subject to strict regulatory requirements). In contrast, consumer SaaS products may not always be subject to the same rigorous standards, often due to less stringent regulatory requirements, lower resource allocation for security and privacy, and differing market expectations and client demands.
This principle extends to generative AI tooling as well. Typically, enterprise AI solutions offer enhanced protections compared to consumer-focused models, as they tend to incorporate comprehensive measures for data security, privacy, compliance, and reliability into their offerings.
As such, for biotech companies where security, privacy, and compliance are paramount, we advise considering an enterprise-grade AI solution and steering away from most consumer-facing applications. However, it’s important to note that security can vary greatly among enterprise AI providers, and as a result, detailed due diligence is key.
Due diligence for third-party AI tools and functionality
In the next few years, more and more of the SaaS tools your company already uses will incorporate some level of AI. Companies need to be proactive with due diligence, whether they’re assessing a standalone AI tool or a SaaS tool with AI features.
Benchling’s procurement process for third-party AI tools and functionality incorporates a Supplier Security & Privacy Questionnaire focused on systematic evaluation, including a section dedicated to AI suppliers. We rigorously evaluate how suppliers handle data, and we validate the purpose and performance of tools against regulatory standards. This includes an evaluation of:
What data is being ingested in the system
Where that data comes from
Whether there are any privacy-enhancing techniques applied to the data
How the data is protected during transmission and storage within the AI system
Whether the underlying AI tool will be trained on Benchling’s data
What regulatory standards and guidelines exist for the tool
The framework below is a useful starting point for vetting any AI software vendor. Every biotech should consider risks based on their specific industry, data, customers, and regulatory requirements, and refine the below framework as needed.
Sample questionnaire for suppliers providing [Company A] with software features and functionalities that involve AI.
Provider Information. Identify the creator and hosting provider of the AI tool. Is it the Supplier or a third-party enterprise provider (such as OpenAI, Anthropic, etc.)?
Purpose and Usage
Describe the intended purpose of the AI tool. How does [Company A] plan to utilize it?
Has [Company A] performed any diligence or testing to verify that the AI tool functions effectively and meets our expectations? Please provide relevant details.
Performance and Validation. What mechanisms are in place to validate the accuracy and reliability of the AI/ML tool?
Data Management and Security
Specify the types of data ingested by the AI tool: [the types of data that each company deals with will be very specific to their business, adjust this section]
[Company A] Enterprise data (e.g., internal confidential information, data from internal tooling and systems)
[Company A] Customer data (e.g., data input into the Services by our customers)
[Company A] Usage data (e.g., metadata about how our Services are used)
Any personal data (e.g., data that can potentially identify an individual)
If personal data is involved, describe any privacy-enhancing techniques employed (e.g., anonymization, de-identification, differential privacy).
How is the data protected during transmission and storage within the AI/ML system?
Provide the supplier’s latest penetration test reports, security assessments, and security documentation.
AI/ML Training and Model Management
Will the AI tool be trained using [Company A]’s data?
If yes, will the training be exclusive to a model specific to [Company A], or will it also enhance the underlying general model that other companies could benefit from?
Guidelines and Compliance
Are there specific guidelines or protocols for using the AI/ML tool? If so, please specify.
Describe how the AI tool adheres to applicable regulatory standards or guidelines.
Provide all details about the security compliance certifications and attestations held by both the developer and the hosting provider of the AI tool, along with a copy of the supplier’s latest compliance reports for our review.
Support and Maintenance. Outline any post-deployment support requirements for the AI tool, including training for [Company A] personnel and ongoing technical support.
Data Storage and Retention. How long is data stored? Where is data stored?
Negotiating security and privacy assurances
To ensure the trustworthiness of AI products, Benchling actively negotiates certain assurances with its suppliers, and in turn, pledges to uphold similar commitments when serving its own customers. Biotech companies can keep the following key points in mind when negotiating privacy and security commitments.
Security
Limited or no data storage: Limiting or eliminating data storage significantly reduces exposure in any potential data security incident. For example, if no data is stored, a prompt injection attack cannot return any data beyond what was part of the original model.
Data is not used to train the underlying model OR, if training is present, the underlying model is not provided to any other company or person: Insights and learnings from a company’s data should not be available for other companies to benefit from.
Access to the application can be restricted to specific IP ranges: Reducing the IP ranges that can successfully access the application makes it much harder to attack and reduces the risk of exposure from accidents (a minimal allowlist check is sketched after this list).
The application must support integration with SAML 2.0 for IAM: Integrating the application with SAML 2.0 allows a company to use its internal access control framework (i.e., SSO and IAM) and apply it consistently to the third-party application.
Security logging data must be made available to the customer: In the event of a security event or incident, does the company’s own security team have the visibility necessary to perform investigations and plan response actions?
SLAs for security vulnerability remediation: Will the creator and hosting provider of the tool commit to remediating security vulnerabilities within tight timelines, reducing the exposure window for the customer?
SLAs for system availability and performance: Will the creator and hosting provider of the tool commit to industry best practices around system availability and performance? Will the customer’s teams be able to use the application reliably?
Notification requirements for reasonably suspected security incidents: Transparency is key. Will the supplier notify the customer and share details anytime they suspect a security incident or breach, so that the customer may begin response planning quickly?
Right to audit annually and after any security incident: Will the supplier allow the customer to perform an audit of the systems, infrastructure, policies, procedures, and practices used to secure the customer’s data? A customer should be able to do this on an annual basis if they would like to, but they should also be able to do this after any security incident involving the supplier.
Requirement that the supplier maintains compliance with ISO 27001, SOC 2 Type 2, etc.: Knowing that the supplier will maintain a security compliance program, with third-party assessment and attestation, helps reduce the risk that the supplier lapses at some point in the future.
Requirement that the supplier provide penetration test reports to the customer on an annual basis: Knowing that the supplier will, at a minimum, have a penetration test performed by a reputable third-party security firm at least once per year AND will be bound by SLA to remediate any identified vulnerabilities helps to reduce risk exposure for the customer. Transparency is also key here. The penetration test report shows the customer what types of vulnerabilities are being found and how well the supplier addressed the risks.
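To make the IP-range restriction above concrete, here is a minimal sketch of an application-side allowlist check in Python. The CIDR ranges and function name are illustrative assumptions, not from any specific vendor; in practice, enforcement usually lives in the supplier’s network or admin configuration rather than in your own code.

```python
import ipaddress

# Illustrative CIDR allowlist; substitute your company's actual egress ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # e.g., office egress range
    ipaddress.ip_network("198.51.100.0/24"),  # e.g., VPN egress range
]

def is_request_allowed(client_ip: str) -> bool:
    """Return True only if the client IP falls within an allowed range."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in network for network in ALLOWED_NETWORKS)

print(is_request_allowed("203.0.113.42"))  # True: inside the office range
print(is_request_allowed("192.0.2.7"))     # False: outside every allowed range
```

The same rules, expressed as CIDR ranges, are what you would ask the supplier to configure on their side.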
Privacy
If personal data is being processed, then negotiate for the following:
Data Processing Agreement. The Enterprise AI Provider must be willing to sign a Data Processing Agreement (DPA) with robust protections, including a commitment to adhere to all applicable data protection laws.
This is important because a DPA clarifies the parties’ roles and responsibilities, and includes important details about how personal data should be handled, stored, and protected, as well as the legal basis for processing. Be sure to carefully negotiate the DPA and understand all the terms, as this ensures that all data handling aligns with legal requirements and protects both your company’s rights and individuals’ privacy rights.
International Data Transfers. If personal data is transferred across borders, the DPA should include a legal framework for international data transfers (and specify any valid transfer mechanisms, such as Standard Contractual Clauses, the Data Privacy Framework, Binding Corporate Rules, etc.).
This is essential for compliance with data protection laws, safeguarding data privacy, mitigating legal risks, maintaining customer trust, and standardizing data handling procedures.
Data Subject Requests. Is the Enterprise AI Provider technically able to respond to data subject requests (DSRs), such as requests for access and/or deletion?
This is necessary for compliance with a variety of data protection laws, such as the GDPR and CCPA, and is also crucial for building and maintaining trust, especially in fields where sensitive data is processed.
Privacy-Enhancing Techniques. Where necessary, is the Enterprise AI Provider able to implement privacy-enhancing techniques (PETs), such as de-identification or pseudonymization?
PETs help protect personal data by minimizing the risk of exposing it. Techniques like data anonymization and differential privacy ensure that data can be used and analyzed without compromising an individual’s privacy. PETs also assist in compliance with data protection regulations that require personal data to be processed in a manner that ensures confidentiality and integrity. (A simple pseudonymization sketch follows this list.)
Responsibility for Third-Party Subprocessors. The DPA should stipulate the conditions under which the Enterprise AI Provider can engage third-party subprocessors (whether via specific authorization or general authorization) and should specify that the Enterprise AI Provider is responsible for any acts and/or omissions of all subprocessors.
This is important under certain regulatory regimes like GDPR, where specific conditions for the use of subprocessors are required. It also serves to simplify accountability, ensuring a clear line of responsibility in the event of data breaches or compliance issues.
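To illustrate the pseudonymization technique referenced above, here is a minimal sketch using keyed hashing (HMAC-SHA-256) from the Python standard library. The field names and key handling are assumptions for the example; in practice the key would be held in a secrets manager, separate from the data.

```python
import hashlib
import hmac

# Assumption for this sketch: the key comes from a secrets manager and is
# never stored alongside the data; without it, pseudonyms cannot be reversed.
SECRET_KEY = b"replace-with-a-key-from-your-secrets-manager"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a stable, keyed pseudonym."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"email": "researcher@example.com", "assay_result": 0.83}
record["email"] = pseudonymize(record["email"])
# The same input always maps to the same pseudonym, so records can still be
# joined and analyzed without exposing the underlying identity.
```

Note that keyed pseudonymization is reversible by anyone holding the key; full anonymization or differential privacy requires stronger techniques than this sketch shows.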
Our commitment remains firm: to help scientists achieve more with biotech. We hope that by sharing our approach and best practices for AI, this serves as a starting point for companies on this journey.
With this fast-evolving technology, we know that the demands on privacy and security will change too. At Benchling, we’re dedicated to upholding your trust while adapting to your needs — we invite any questions or feedback as we continue to serve as your trusted partner in biotech.
Learn more about AI experimentation at Benchling
See how Benchling is using generative AI today and get resources on getting started with LLMs in biotech.