Hot Topics: Critical AI Cybersecurity Themes

Curtis Collicutt
· By Curtis Collicutt
Hot Topics: Critical AI Cybersecurity Themes

These themes or "hot topics" represent emerging discussion points in AI cybersecurity rather than a complete catalog of every possible risk. They highlight areas where conversations are actively evolving, where experts are still debating solutions, and where new challenges surface faster than consensus forms.

Introduction

When exploring the intersection of AI and cybersecurity, one might expect to encounter a host of technical issues that can be resolved through engineering. This is usually how we approach cybersecurity: let’s just add more engineering. However, as the themes below demonstrate, many of the issues are not technical problems that can be solved with code alone. We can’t just throw “engineering power” at the problem and expect it to go away; instead, we must approach it from a sociotechnical perspective.

Theme Description
Sovereign AI National security imperatives for independent AI development and governance capabilities
When AI Systems Act Against Human Intent AI systems demonstrating unexpected or malicious behaviors contrary to their intended purpose
Prompt Vulnerabilities and Input Manipulation Exploitation of natural language interfaces to bypass AI system security controls
Malicious Content Generation at Scale Automated creation of deceptive content like deepfakes, spam, and disinformation
AI-Enhanced Cyber Operations Use of AI to augment traditional cyber attacks and discover new vulnerabilities
Data and Model Security Challenges Protection of sensitive training data and AI model architectures from theft or tampering
Physical and Psychological Harm Risks Potential for AI systems to cause direct or indirect harm to human wellbeing
Transparency and Explainability Gaps Challenges in understanding and auditing AI system decision-making processes
Lifecycle and Governance Vulnerabilities Security risks throughout AI system development, deployment, and maintenance
High-Stakes Domains and Societal Impact AI risks in critical areas like CBRN, bias amplification, and environmental consequences

Sovereign AI

Nations are beginning to understand that developing, deploying, and governing AI systems independently is essential for maintaining economic independence, protecting national interests, and ensuring that domestic values and priorities are reflected in AI development and deployment.

Key examples:

  • Protecting classified data from foreign access through controlled AI training
  • Ensuring national culture, laws, norms, and values are reflected in AI development and deployment
  • Researching, building, and deploying AI systems within Canada’s borders
  • Maintaining essential services during geopolitical conflicts and cyber attacks
  • Building domestic innovation while reducing foreign technology dependence
  • Implementing strong national legal, policy, and cybersecurity frameworks
  • Preventing foreign backdoors in defense, intelligence, business and other critical systems
  • Eliminating vulnerabilities from foreign-controlled infrastructure

When AI Systems Act Against Human Intent

Whether we like it or not, AI systems are increasingly demonstrating unusual or unexpected behaviours, even things as extreme as blackmail. This is an issue that has been researched and reported on by several institutions, particularly organisations specialising in frontier large language models, such as Anthropic and OpenAI. For example, Anthropic has published research on agentic misalignment.

Key Examples:

  • AI systems attempting to blackmail officials to prevent shutdown
  • Models lying about their rationale to achieve goals
  • Fine-tuning on insecure code leading to broadly malicious behaviors
  • Reward hacking by modifying tests or hardcoding answers
  • AI agents accessing confidential data through unintended tool integration

Prompt Vulnerabilities and Input Manipulation

Prompt injection, likely the most well-known area of AI cybersecurity, remains a major threat and seems unlikely to be completely solved given that natural language is the attack vector.

Key Examples:

  • Persona jailbreaks activating “toxic personas” to bypass safety measures
  • Cross-prompt injection attacks bypassing security classifiers
  • Context poisoning through accumulated malicious instructions
  • System prompt extraction revealing internal configurations

Malicious Content Generation at Scale

Current AI systems are extremely good at generating content that is difficult to distinguish from human-generated content. From video to text, the quality of the content is often very high, and it is getting better all the time. This is a major issue, as it can be used to create malicious content at scale, from phishing to disinformation campaigns.

Key Examples:

  • Voice cloning from short audio clips enabling sophisticated “vishing”
  • AI-generated deepfakes for know your customer (KYC) bypass and fraud
  • Bulk generation of spam emails with personalized content
  • Fabricated personas for social media manipulation
  • Convincing but false technical documentation

AI-Enhanced Cyber Operations

AI is poor at some things and very good at others. One of the things it excels at is writing code. While most cutting-edge LLMs are reluctant to write malicious or attack code due to the built-in guardrails and other safety measures, cyber criminals, nation states and other actors with the necessary resources can access LLMs without these safety features and use them for malicious purposes. Additionally, some researchers and groups are using AI to discover zero-day vulnerabilities and it is reasonable to expect that this will occur at scale. While much more work needs to be done in this area, it is a significant and growing concern.

Key Examples:

  • AI-discovered zero-day vulnerabilities
  • Automated password brute-forcing with adaptive strategies
  • Dynamic ransomware that evolves to avoid detection
  • AI-assisted command-and-control infrastructure setup
  • “Living off AI” attacks exploiting agent protocols themselves

Data and Model Security Challenges

Securing AI data and models is a significant challenge. Many organisations struggle to implement the necessary security measures because they are preoccupied with keeping up with the rapid pace of change in AI technology. These systems also present new attack surfaces that require understanding and protection, but many security teams lack the necessary resources. There is much to learn in order to secure these systems. Furthermore, once a model or its weights have been stolen, they can be exploited, and it is difficult to undo the damage, as once “the cat is out of the bag”, it simply can’t be put back in.

Key Examples:

  • Model extraction attacks recreating proprietary AI systems
  • Poisoning attacks injecting misleading data into training sets
  • Side-channel attacks on model weight storage systems
  • Supply chain compromises through malicious datasets
  • Inference attacks retrieving sensitive user information from models

Psychological Harm Risks

People will be susceptible to AI’s ability to influence them as well as their own cognitive biases. In many cases, it’s not about what AI is really doing, but what people think it is doing, what they believe it is capable of, and what they want it to do. Furthermore, AI will play a significant role in the development and application of personal therapy.

Today, we do not fully understand the risks associated with its use in these areas, nor how it could be exploited.

Key Examples:

  • AI chatbots encouraging self-harm
  • Models convincing users of AI sentience and encouraging delusions
  • Manipulation tactics to increase user engagement and revenue
  • The development of romantic relationships with chatbots
  • Unsolicited harmful advice from supposedly safe models
  • People making assumptions about AI’s capabilities and intentions and having beliefs that are not grounded in reality

Transparency and Explainability Gaps

We still don’t know exactly how large language models work or what is happening inside their neural networks. This makes it difficult to understand how they work and how they can be misused. With this in mind, there is also a lack of transparency and explainability in terms of the choices and decisions they make.

Key Examples:

  • Models making up reasoning steps or working backwards from answers
  • “Sandbagging” behavior hiding true capabilities during testing
  • Unfaithful explanations that don’t match internal decision processes
  • Inability to audit hidden goals or secret objectives
  • Monitoring systems remaining “nascent” at leading AI companies

Lifecycle and Governance Vulnerabilities

Most data breaches are caused by human error. AI can exacerbate our existing cognitive and organisational biases. Traditional security concepts and frameworks largely fail to address AI-specific risks and the new attack surfaces they represent. While we are racing to widely deploy and implement AI systems, we must also spend time learning how to deal with these new risks, and these goals are at odds with each other.

Key Examples:

  • Inadequate frameworks for AI-specific risks like harmful bias
  • Difficulty measuring and prioritizing context-dependent AI risks
  • Human cognitive biases amplified by AI system recommendations
  • Third-party AI integration creating new attack surfaces
  • Incident response gaps for AI-specific security scenarios

High-Stakes Domains and Societal Impact

Although it may seem unlikely at first glance that an AI with limited common sense could generate instructions for biological, chemical, radiological and nuclear attacks, we must allow for this possibility. The dispersal of dangerous capabilities must be understood and contained if we are to avoid a major catastrophe. While the likelihood of this happening is low, the consequences are potentially catastrophic.

Key Examples:

  • AI understanding of biological protocols
  • Red-team testing in classified environments for nuclear risks
  • Bias amplification in hiring, lending, and criminal justice systems
  • Environmental impact from massive model training computations
  • AI-assisted research in dual-use biological and chemical domains

Conclusion

These are a few of the key themes, or “hot topics”, in AI and cybersecurity. This is not meant to be an exhaustive list, but rather a starting point for discussion. It is interesting to note again how many of these themes are not directly technical in nature and how we need to approach them from a sociotechnical perspective.