Hot Topics: Critical AI Cybersecurity Themes

Table of Contents
- Introduction
- Sovereign AI
- When AI Systems Act Against Human Intent
- Prompt Vulnerabilities and Input Manipulation
- Malicious Content Generation at Scale
- AI-Enhanced Cyber Operations
- Data and Model Security Challenges
- Psychological Harm Risks
- Transparency and Explainability Gaps
- Lifecycle and Governance Vulnerabilities
- High-Stakes Domains and Societal Impact
- Conclusion
Table of Contents
- Introduction
- Sovereign AI
- When AI Systems Act Against Human Intent
- Prompt Vulnerabilities and Input Manipulation
- Malicious Content Generation at Scale
- AI-Enhanced Cyber Operations
- Data and Model Security Challenges
- Psychological Harm Risks
- Transparency and Explainability Gaps
- Lifecycle and Governance Vulnerabilities
- High-Stakes Domains and Societal Impact
- Conclusion
These themes or "hot topics" represent emerging discussion points in AI cybersecurity rather than a complete catalog of every possible risk. They highlight areas where conversations are actively evolving, where experts are still debating solutions, and where new challenges surface faster than consensus forms.
Introduction
When exploring the intersection of AI and cybersecurity, one might expect to encounter a host of technical issues that can be resolved through engineering. This is usually how we approach cybersecurity: let’s just add more engineering. However, as the themes below demonstrate, many of the issues are not technical problems that can be solved with code alone. We can’t just throw “engineering power” at the problem and expect it to go away; instead, we must approach it from a sociotechnical perspective.
Theme | Description |
---|---|
Sovereign AI | National security imperatives for independent AI development and governance capabilities |
When AI Systems Act Against Human Intent | AI systems demonstrating unexpected or malicious behaviors contrary to their intended purpose |
Prompt Vulnerabilities and Input Manipulation | Exploitation of natural language interfaces to bypass AI system security controls |
Malicious Content Generation at Scale | Automated creation of deceptive content like deepfakes, spam, and disinformation |
AI-Enhanced Cyber Operations | Use of AI to augment traditional cyber attacks and discover new vulnerabilities |
Data and Model Security Challenges | Protection of sensitive training data and AI model architectures from theft or tampering |
Physical and Psychological Harm Risks | Potential for AI systems to cause direct or indirect harm to human wellbeing |
Transparency and Explainability Gaps | Challenges in understanding and auditing AI system decision-making processes |
Lifecycle and Governance Vulnerabilities | Security risks throughout AI system development, deployment, and maintenance |
High-Stakes Domains and Societal Impact | AI risks in critical areas like CBRN, bias amplification, and environmental consequences |
Sovereign AI
Nations are beginning to understand that developing, deploying, and governing AI systems independently is essential for maintaining economic independence, protecting national interests, and ensuring that domestic values and priorities are reflected in AI development and deployment.
Key examples:
- Protecting classified data from foreign access through controlled AI training
- Ensuring national culture, laws, norms, and values are reflected in AI development and deployment
- Researching, building, and deploying AI systems within Canada’s borders
- Maintaining essential services during geopolitical conflicts and cyber attacks
- Building domestic innovation while reducing foreign technology dependence
- Implementing strong national legal, policy, and cybersecurity frameworks
- Preventing foreign backdoors in defense, intelligence, business and other critical systems
- Eliminating vulnerabilities from foreign-controlled infrastructure
When AI Systems Act Against Human Intent
Whether we like it or not, AI systems are increasingly demonstrating unusual or unexpected behaviours, even things as extreme as blackmail. This is an issue that has been researched and reported on by several institutions, particularly organisations specialising in frontier large language models, such as Anthropic and OpenAI. For example, Anthropic has published research on agentic misalignment.
Key Examples:
- AI systems attempting to blackmail officials to prevent shutdown
- Models lying about their rationale to achieve goals
- Fine-tuning on insecure code leading to broadly malicious behaviors
- Reward hacking by modifying tests or hardcoding answers
- AI agents accessing confidential data through unintended tool integration
I should note there is, of course, discussion and debate around the use of terms like "blackmail" in relation to AI, as blackmailing is a human act, and not one that we would consider a computer program capable of doing. Opinions differ here.
Prompt Vulnerabilities and Input Manipulation
Prompt injection, likely the most well-known area of AI cybersecurity, remains a major threat and seems unlikely to be completely solved given that natural language is the attack vector.
Key Examples:
- Persona jailbreaks activating “toxic personas” to bypass safety measures
- Cross-prompt injection attacks bypassing security classifiers
- Context poisoning through accumulated malicious instructions
- System prompt extraction revealing internal configurations
Malicious Content Generation at Scale
Current AI systems are extremely good at generating content that is difficult to distinguish from human-generated content. From video to text, the quality of the content is often very high, and it is getting better all the time. This is a major issue, as it can be used to create malicious content at scale, from phishing to disinformation campaigns.
Key Examples:
- Voice cloning from short audio clips enabling sophisticated “vishing”
- AI-generated deepfakes for know your customer (KYC) bypass and fraud
- Bulk generation of spam emails with personalized content
- Fabricated personas for social media manipulation
- Convincing but false technical documentation
AI-Enhanced Cyber Operations
AI is poor at some things and very good at others. One of the things it excels at is writing code. While most cutting-edge LLMs are reluctant to write malicious or attack code due to the built-in guardrails and other safety measures, cyber criminals, nation states and other actors with the necessary resources can access LLMs without these safety features and use them for malicious purposes. Additionally, some researchers and groups are using AI to discover zero-day vulnerabilities and it is reasonable to expect that this will occur at scale. While much more work needs to be done in this area, it is a significant and growing concern.
Key Examples:
- AI-discovered zero-day vulnerabilities
- Automated password brute-forcing with adaptive strategies
- Dynamic ransomware that evolves to avoid detection
- AI-assisted command-and-control infrastructure setup
- “Living off AI” attacks exploiting agent protocols themselves
Data and Model Security Challenges
Securing AI data and models is a significant challenge. Many organisations struggle to implement the necessary security measures because they are preoccupied with keeping up with the rapid pace of change in AI technology. These systems also present new attack surfaces that require understanding and protection, but many security teams lack the necessary resources. There is much to learn in order to secure these systems. Furthermore, once a model or its weights have been stolen, they can be exploited, and it is difficult to undo the damage, as once “the cat is out of the bag”, it simply can’t be put back in.
Key Examples:
- Model extraction attacks recreating proprietary AI systems
- Poisoning attacks injecting misleading data into training sets
- Side-channel attacks on model weight storage systems
- Supply chain compromises through malicious datasets
- Inference attacks retrieving sensitive user information from models
Psychological Harm Risks
People will be susceptible to AI’s ability to influence them as well as their own cognitive biases. In many cases, it’s not about what AI is really doing, but what people think it is doing, what they believe it is capable of, and what they want it to do. Furthermore, AI will play a significant role in the development and application of personal therapy.
Today, we do not fully understand the risks associated with its use in these areas, nor how it could be exploited.
Key Examples:
- AI chatbots encouraging self-harm
- Models convincing users of AI sentience and encouraging delusions
- Manipulation tactics to increase user engagement and revenue
- The development of romantic relationships with chatbots
- Unsolicited harmful advice from supposedly safe models
- People making assumptions about AI’s capabilities and intentions and having beliefs that are not grounded in reality
Transparency and Explainability Gaps
We still don’t know exactly how large language models work or what is happening inside their neural networks. This makes it difficult to understand how they work and how they can be misused. With this in mind, there is also a lack of transparency and explainability in terms of the choices and decisions they make.
Key Examples:
- Models making up reasoning steps or working backwards from answers
- “Sandbagging” behavior hiding true capabilities during testing
- Unfaithful explanations that don’t match internal decision processes
- Inability to audit hidden goals or secret objectives
- Monitoring systems remaining “nascent” at leading AI companies
Lifecycle and Governance Vulnerabilities
Most data breaches are caused by human error. AI can exacerbate our existing cognitive and organisational biases. Traditional security concepts and frameworks largely fail to address AI-specific risks and the new attack surfaces they represent. While we are racing to widely deploy and implement AI systems, we must also spend time learning how to deal with these new risks, and these goals are at odds with each other.
Key Examples:
- Inadequate frameworks for AI-specific risks like harmful bias
- Difficulty measuring and prioritizing context-dependent AI risks
- Human cognitive biases amplified by AI system recommendations
- Third-party AI integration creating new attack surfaces
- Incident response gaps for AI-specific security scenarios
High-Stakes Domains and Societal Impact
Although it may seem unlikely at first glance that an AI with limited common sense could generate instructions for biological, chemical, radiological and nuclear attacks, we must allow for this possibility. The dispersal of dangerous capabilities must be understood and contained if we are to avoid a major catastrophe. While the likelihood of this happening is low, the consequences are potentially catastrophic.
Key Examples:
- AI understanding of biological protocols
- Red-team testing in classified environments for nuclear risks
- Bias amplification in hiring, lending, and criminal justice systems
- Environmental impact from massive model training computations
- AI-assisted research in dual-use biological and chemical domains
Conclusion
These are a few of the key themes, or “hot topics”, in AI and cybersecurity. This is not meant to be an exhaustive list, but rather a starting point for discussion. It is interesting to note again how many of these themes are not directly technical in nature and how we need to approach them from a sociotechnical perspective.