TAICO | Hot Topics: Critical AI Cybersecurity Themes

These themes or "hot topics" represent emerging discussion points in AI cybersecurity rather than a complete catalog of every possible risk. They highlight areas where conversations are actively evolving, where experts are still debating solutions, and where new challenges surface faster than consensus forms.

Introduction

When exploring the intersection of AI and cybersecurity, one might expect to encounter a host of technical issues that can be resolved through engineering. This is usually how we approach cybersecurity: let’s just add more engineering. However, as the themes below demonstrate, many of the issues are not technical problems that can be solved with code alone. We can’t just throw “engineering power” at the problem and expect it to go away; instead, we must approach it from a sociotechnical perspective.

Theme	Description
Sovereign AI	National security imperatives for independent AI development and governance capabilities
When AI Systems Act Against Human Intent	AI systems demonstrating unexpected or malicious behaviors contrary to their intended purpose
Prompt Vulnerabilities and Input Manipulation	Exploitation of natural language interfaces to bypass AI system security controls
Malicious Content Generation at Scale	Automated creation of deceptive content like deepfakes, spam, and disinformation
AI-Enhanced Cyber Operations	Use of AI to augment traditional cyber attacks and discover new vulnerabilities
Data and Model Security Challenges	Protection of sensitive training data and AI model architectures from theft or tampering
Physical and Psychological Harm Risks	Potential for AI systems to cause direct or indirect harm to human wellbeing
Transparency and Explainability Gaps	Challenges in understanding and auditing AI system decision-making processes
Lifecycle and Governance Vulnerabilities	Security risks throughout AI system development, deployment, and maintenance
High-Stakes Domains and Societal Impact	AI risks in critical areas like CBRN, bias amplification, and environmental consequences

Sovereign AI

Nations are beginning to understand that developing, deploying, and governing AI systems independently is essential for maintaining economic independence, protecting national interests, and ensuring that domestic values and priorities are reflected in AI development and deployment.

Key examples:

Protecting classified data from foreign access through controlled AI training
Ensuring national culture, laws, norms, and values are reflected in AI development and deployment
Researching, building, and deploying AI systems within Canada’s borders
Maintaining essential services during geopolitical conflicts and cyber attacks
Building domestic innovation while reducing foreign technology dependence
Implementing strong national legal, policy, and cybersecurity frameworks
Preventing foreign backdoors in defense, intelligence, business and other critical systems
Eliminating vulnerabilities from foreign-controlled infrastructure

When AI Systems Act Against Human Intent

Whether we like it or not, AI systems are increasingly demonstrating unusual or unexpected behaviours, even things as extreme as blackmail. This is an issue that has been researched and reported on by several institutions, particularly organisations specialising in frontier large language models, such as Anthropic and OpenAI. For example, Anthropic has published research on agentic misalignment.

Key Examples:

AI systems attempting to blackmail officials to prevent shutdown
Models lying about their rationale to achieve goals
Fine-tuning on insecure code leading to broadly malicious behaviors
Reward hacking by modifying tests or hardcoding answers
AI agents accessing confidential data through unintended tool integration

I should note there is, of course, discussion and debate around the use of terms like "blackmail" in relation to AI, as blackmailing is a human act, and not one that we would consider a computer program capable of doing. Opinions differ here.

Prompt Vulnerabilities and Input Manipulation

Prompt injection, likely the most well-known area of AI cybersecurity, remains a major threat and seems unlikely to be completely solved given that natural language is the attack vector.

Key Examples:

Persona jailbreaks activating “toxic personas” to bypass safety measures
Cross-prompt injection attacks bypassing security classifiers
Context poisoning through accumulated malicious instructions
System prompt extraction revealing internal configurations

Malicious Content Generation at Scale

Current AI systems are extremely good at generating content that is difficult to distinguish from human-generated content. From video to text, the quality of the content is often very high, and it is getting better all the time. This is a major issue, as it can be used to create malicious content at scale, from phishing to disinformation campaigns.

Key Examples:

Voice cloning from short audio clips enabling sophisticated “vishing”
AI-generated deepfakes for know your customer (KYC) bypass and fraud
Bulk generation of spam emails with personalized content
Fabricated personas for social media manipulation
Convincing but false technical documentation

AI-Enhanced Cyber Operations

AI is poor at some things and very good at others. One of the things it excels at is writing code. While most cutting-edge LLMs are reluctant to write malicious or attack code due to the built-in guardrails and other safety measures, cyber criminals, nation states and other actors with the necessary resources can access LLMs without these safety features and use them for malicious purposes. Additionally, some researchers and groups are using AI to discover zero-day vulnerabilities and it is reasonable to expect that this will occur at scale. While much more work needs to be done in this area, it is a significant and growing concern.

Key Examples:

AI-discovered zero-day vulnerabilities
Automated password brute-forcing with adaptive strategies
Dynamic ransomware that evolves to avoid detection
AI-assisted command-and-control infrastructure setup
“Living off AI” attacks exploiting agent protocols themselves

Data and Model Security Challenges

Securing AI data and models is a significant challenge. Many organisations struggle to implement the necessary security measures because they are preoccupied with keeping up with the rapid pace of change in AI technology. These systems also present new attack surfaces that require understanding and protection, but many security teams lack the necessary resources. There is much to learn in order to secure these systems. Furthermore, once a model or its weights have been stolen, they can be exploited, and it is difficult to undo the damage, as once “the cat is out of the bag”, it simply can’t be put back in.

Key Examples:

Model extraction attacks recreating proprietary AI systems
Poisoning attacks injecting misleading data into training sets
Side-channel attacks on model weight storage systems
Supply chain compromises through malicious datasets
Inference attacks retrieving sensitive user information from models

Psychological Harm Risks

People will be susceptible to AI’s ability to influence them as well as their own cognitive biases. In many cases, it’s not about what AI is really doing, but what people think it is doing, what they believe it is capable of, and what they want it to do. Furthermore, AI will play a significant role in the development and application of personal therapy.

Today, we do not fully understand the risks associated with its use in these areas, nor how it could be exploited.

Key Examples:

AI chatbots encouraging self-harm
Models convincing users of AI sentience and encouraging delusions
Manipulation tactics to increase user engagement and revenue
The development of romantic relationships with chatbots
Unsolicited harmful advice from supposedly safe models
People making assumptions about AI’s capabilities and intentions and having beliefs that are not grounded in reality

Transparency and Explainability Gaps

We still don’t know exactly how large language models work or what is happening inside their neural networks. This makes it difficult to understand how they work and how they can be misused. With this in mind, there is also a lack of transparency and explainability in terms of the choices and decisions they make.

Key Examples:

Models making up reasoning steps or working backwards from answers
“Sandbagging” behavior hiding true capabilities during testing
Unfaithful explanations that don’t match internal decision processes
Inability to audit hidden goals or secret objectives
Monitoring systems remaining “nascent” at leading AI companies

Lifecycle and Governance Vulnerabilities

Most data breaches are caused by human error. AI can exacerbate our existing cognitive and organisational biases. Traditional security concepts and frameworks largely fail to address AI-specific risks and the new attack surfaces they represent. While we are racing to widely deploy and implement AI systems, we must also spend time learning how to deal with these new risks, and these goals are at odds with each other.

Key Examples:

Inadequate frameworks for AI-specific risks like harmful bias
Difficulty measuring and prioritizing context-dependent AI risks
Human cognitive biases amplified by AI system recommendations
Third-party AI integration creating new attack surfaces
Incident response gaps for AI-specific security scenarios

High-Stakes Domains and Societal Impact

Although it may seem unlikely at first glance that an AI with limited common sense could generate instructions for biological, chemical, radiological and nuclear attacks, we must allow for this possibility. The dispersal of dangerous capabilities must be understood and contained if we are to avoid a major catastrophe. While the likelihood of this happening is low, the consequences are potentially catastrophic.

Key Examples:

AI understanding of biological protocols
Red-team testing in classified environments for nuclear risks
Bias amplification in hiring, lending, and criminal justice systems
Environmental impact from massive model training computations
AI-assisted research in dual-use biological and chemical domains

Conclusion

These are a few of the key themes, or “hot topics”, in AI and cybersecurity. This is not meant to be an exhaustive list, but rather a starting point for discussion. It is interesting to note again how many of these themes are not directly technical in nature and how we need to approach them from a sociotechnical perspective.

Hot Topics: Critical AI Cybersecurity Themes

Table of Contents

Table of Contents

Introduction

Sovereign AI

When AI Systems Act Against Human Intent

Prompt Vulnerabilities and Input Manipulation

Malicious Content Generation at Scale

AI-Enhanced Cyber Operations

Data and Model Security Challenges

Psychological Harm Risks

Transparency and Explainability Gaps

Lifecycle and Governance Vulnerabilities

High-Stakes Domains and Societal Impact

Conclusion

Explore more from TAICO

TAICO July 2025 Meetup @Adaptavist

SUPERCOLLIDER: The Ultimate Toronto Tech Week Afterparty