Wednesday, June 11, 2025

How jagged AI botches research: an in-depth example of artificial jagged intelligence at work

Annotated screenshot of a Google AI-enhanced search result that is seriously incorrect
Annotated screenshot of Google AI making errors describing the Vienna computer virus

During a recent research project at the intersection of artificial intelligence (AI) and cybersecurity, I had occasion to refresh my memory about a computer virus from the 1980s known as the Vienna virus. So I put the words vienna and virus into Google Search. At first glance the result delivered by Google's AI Overview feature looked quite impressive. This is not surprising because this feature, hereinafter referred to as GAIO, is powered by Google's Gemini language model, one of the most expensive AI models ever built, with costs rivaling OpenAI's GPT-4.

Sadly, the information about the Vienna virus that GAIO so confidently laid out was both laughably inaccurate and seriously troubling (as I explain in depth below). Whether you call this hallucinating or just plain "getting it wrong," it is important to know that today's AI can tell you things that aren't true, but in ways that make it seem like they are true.

Welcome to the rough and unready world of Artificial Jagged Intelligence

To be clear, millions of people and organizations are, right now, in 2025, using a technology that has been widely praised and promoted as exhibiting intelligence, yet keeps making dumb errors, the kind that in real life would be attributed to a serious lack of intelligence. Some of these errors have been trivialized as hallucinations because they take pieces of information that are real and combine them in a way that produces false information (see my 2024 LinkedIn article: Is your AI lying or just hallucinating?).

I find it both weird and troubling that many billions of dollars are currently being spent to market and deploy this flawed AI technology. You would think persistent errors and hallucinations would give the leading commercial entities behind AI pause. But no, they keep marching onward in the hope of progress. However, they do have a new term for this state of affairs: Artificial Jagged Intelligence.

AI leaders have a new term [jagged] for the fact that their models are not always so intelligent

That's right, Google's billionaire CEO, Sundar Pichai, recently used the term "artificial jagged intelligence or AJI" to describe the current state of AI, saying: "...you can trivially find they make errors or counting R's in strawberry or something, which seems to trip up most models...I feel like we are in the AJI phase where [there's] dramatic progress, some things don't work well, but overall, you're seeing lots of progress."

(I find it weirdly refreshing yet deeply scary that the billionaire CEO of a trillion-dollar company said that about a technology which he and his employer appear to be pushing into homes and businesses as fast as they can.) 

Getting back to the jagged AI response to my simple search query about the Vienna virus, I decided to investigate how it came about. Fortunately, I am my own employer and can afford to treat my interactions with AI as experiments. In this case the experiment became: determine the extent to which GAIO understands the history and concepts of malicious code, and explore why it gets things wrong.

Here's the short version of what follows: 

  • Google's AI Overview is an example of Artificial Jagged Intelligence or AJI, which sometimes responds to user queries with information that is incorrect.
  • LLMs like ChatGPT and DeepSeek also exhibit this behaviour, and I give links to examples.
  • AIs may not check whether the facts they present are infeasible, even though they have been trained on data by which such infeasibility could be determined.
  • Some AIs, like GAIO and ChatGPT, don't seem to ingest corrections (errors pointed out by users may be acknowledged by the AI, but nevertheless repeated in the future). 
  • GAIO seems to use sketchy source weighting that gives more credence to content on some websites than others.
  • This seems to be true of other widely used AIs.

Bottom line: It would be foolish to publish or repeat anything that the current generation of Artificial Jagged Intelligence systems tell you unless you have verified that it is accurate, fair, and true. Such a lopsided risk/reward ratio casts doubt on the value of this technology. (See: Trump administration's MAHA Report AI Fiasco.)

Where's the Intelligence in this jagged AI?

The annotated screenshot at the top of this article shows what Google's AI Overview said about the Vienna virus back in April (n.b. in this article the term virus refers exclusively to viral computer code). If you are familiar with the history of malicious code you may guffaw when you read it. Here's why:

  • If the Vienna virus was found in 1987, it could not have been one of the first macro viruses because in 1987 macros were not capable of being viral. 
  • The 1995 Concept virus is generally considered to be the first macro virus. 
  • The Vienna virus did not display a "crude drawing of Michelangelo's David".  
  • I can find no record of any virus creating a "crude drawing of Michelangelo's David."
  • There was a boot sector virus called Michelangelo that appeared in 1991, but it had nothing to do with the artist and got its name from the fact that it activated on March 6, which just happens to be the artist's birthday.

There is more bad news: GAIO's response when asked about the Vienna virus on June 1, nearly two months after the erroneous results in April, was just as erroneous: 

Screenshot of AI output that contains errors

Clearly, GAIO is not getting more knowledgeable over time. This is troubling because Google's Gemini, the AI behind Google AI Overview, does appear to have an accurate understanding of Vienna and knows that it is notable in the history of cybersecurity, as you can see in this exchange on June 1:
Screenshot of accurate AI output

At this point you might be wondering why I asked AI about the Vienna virus. Well, technically, I didn't. I started out just doing a search in Google to refresh my memory of this particular piece of malicious code before I mentioned it in something I was writing (pro tip: don't ever publish anything about malicious code without first doing a fact-check; malware experts can be merciless when they see errors).

In responding to my search query, it was Google's idea to present the AI Overview information, produced with the help of its incredibly expensive and highly resource-intensive Gemini AI. The fact that it was so obviously wrong bothered me and I felt the need to share what I had stumbled upon. Because I tend to see life as a series of experiments, when actions that I take lead to errors, problems, or failures, I try to learn from them.
 

Applied learning and cybersecurity


When Google served up these problematic results, I knew right away that I could use this learning in my AI-related cybersecurity classes. (These have become a thing over the past five years as I have researched various aspects of AI from a perspective informed by my cybersecurity knowledge, which has been accumulating gradually since the 1980s.)

In the process of teaching and talking about cybersecurity and cybercrime in the 2020s I have realized that many students don't know a lot about the history of malicious digital technology and this can seriously undermine their efforts to assess the risks posed by any new technology, including AI. 

For example, if you know something about the history of computer viruses, worms, Trojans and other malicious code, you will have an idea of the lengths to which some people will go to deceive, damage, disrupt, and abuse computers and the data they process. Furthermore, you will appreciate how incredibly difficult it is to foil aggressively malicious code and the people who spend time developing it.

Fortunately, I know a thing or two about the history of computer viruses and other forms of malicious code (collectively malware), as well as the antivirus products designed to thwart them. This is not just because I started writing about them back in the 1980s. As it happens, the best corporate job I ever had was working at ESET, one of the oldest antivirus firms and now Europe's largest privately held cybersecurity company. (Disclaimer: I have no financial connections to ESET and no financial incentive to say nice things about the company.)

Working at ESET from 2011 to 2019 I had the privilege of collaborating with a lot of brilliant people, one of whom, Aryeh Goretsky, was the first person that John McAfee hired, way back in 1989. Aryeh has since become a walking encyclopedia of antivirus lore and helped me with some of the details of Vienna here (but any errors in what I've written here are entirely mine). 

Back in the 1980s, there were probably less than two dozen computer viruses "in the wild" — the industry term for malicious code seen outside of a contained/managed environment. However, some of these viruses in the wild were very destructive and efforts to create tools to defend computers against them—such as the software that became known as McAfee Antivirus—were only just gearing up. 

One such effort had begun in 1987 in the city of Bratislava in what was then the Czechoslovak Socialist Republic, a satellite state of the Soviet Union. That's where two young programmers, Miroslav Trnka and Peter Pasko, encountered a computer virus that was dubbed "Vienna" because that is where people thought it originated.

There is considerable irony in the fact that an AI today can spout nonsense about a virus found back then, because Trnka and Pasko went on to create a company that did important early work with proto-AI technology, for reasons I will now explain. 

The Actual Vienna Virus


What the actual Vienna virus did was infect files on MS-DOS PCs (personal computers which ran the Microsoft Disk Operating System). Specifically, it infected program files that had the .COM filename extension. Here is a technical description from a relatively reliable source, and as you can see it differs considerably from Google's flawed AI Overviews:
Vienna is a non-resident, direct-action .com infector. When a file infected with the virus is run, it searches for .com files on the system and infects one of them. The seconds on the infected file's timestamp will read "62", an impossible value, making them easy to find. One of six to eight of the files will be destroyed when Vienna tries to infect them by overwriting the first five bytes with the hex character string "EAF0FF00F0", instructions that will cause a warm reboot when the program is run. — Virus Encyclopedia
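To make that "62 seconds" marker concrete, here is a minimal sketch in Python (my own illustration written for this article, not code from NOD or any real antivirus product) that decodes a 16-bit DOS/FAT timestamp word and flags the impossible value Vienna left on infected files:

```python
# Minimal sketch (illustration only): decode a 16-bit DOS/FAT timestamp word
# and flag the "62 seconds" marker Vienna left on infected .COM files.
# Field layout: bits 15-11 = hours, bits 10-5 = minutes, bits 4-0 = seconds
# divided by 2, so a stored value of 31 decodes to the impossible 62
# (legitimate values top out at 29, i.e. 58 seconds).

def decode_dos_time(word: int) -> tuple[int, int, int]:
    hours = (word >> 11) & 0x1F
    minutes = (word >> 5) & 0x3F
    seconds = (word & 0x1F) * 2
    return hours, minutes, seconds

def looks_vienna_infected(dos_time_word: int) -> bool:
    return decode_dos_time(dos_time_word)[2] == 62

if __name__ == "__main__":
    clean_time = (9 << 11) | (17 << 5) | 14      # 09:17:28 -- a normal timestamp
    infected_time = (9 << 11) | (17 << 5) | 31   # 09:17:62 -- Vienna's marker
    print(decode_dos_time(clean_time), looks_vienna_infected(clean_time))
    print(decode_dos_time(infected_time), looks_vienna_infected(infected_time))
```

Real scanners of the era did far more than this, of course, but the principle of looking for a telltale artifact of infection is the same.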
When the programmers Trnka and Pasko encountered this very compact yet destructive piece of viral code, they took a stab at writing a program that could detect the code and thus alert users. And when they achieved a working version they shared it with friends. They called it NOD, which stands for "Nemocnica na Okraji Disku" ("Hospital at the end of the disk"), "a pun related to the Czechoslovak medical drama series Nemocnice na kraji města (Hospital at the End of the City)." —Wikipedia

(To me, this name reflects the ethos of many early anti-virus researchers who felt that protecting computer systems was more like healthcare for IT than just another opportunity to make money off human weaknesses.) 

List of Vienna virus variants
When new viruses appeared in the wild, the NOD software was updated, but the effort required to do this kept increasing as more virus code appeared in the wild. Some of that new code consisted of variations of earlier code, and Trnka and Pasko could see that attempting to identify viruses purely by comparing all new executable code to a growing database of known malicious code would not be a sustainable long-term strategy.

Indeed, if Google's AI was really clever, it would have noted that the proliferation of virus variants is one of the most notable facts about the Vienna virus. The list on the right shows some of the dozens of variants of Vienna that were discovered in the years after it first appeared. I think I'm right in saying that there are two main reasons for this: 
  1. The original Vienna virus was a relatively simple piece of code; and
  2. In 1988 that code was made public, notably being published in a book. "Unfortunately the source code to this virus has been published in a book: Computer viruses: A High-Tech Disease which has resulted in multiple variants of the virus." — F-Secure virus directory
Getting back to the birth of the NOD antivirus software, in the late 1980s it was clear that antivirus programs could have significant commercial value, but back then the state of Czechoslovakia was not open to private enterprise because it was a satellite state of the Soviet Union. 

Fortunately, by the end of 1992, the independent Czech and Slovak republics were coming into existence, and the makers of NOD created a Slovak company called ESET to market their antivirus as a commercial product. (ESET is the Czech word for Isis, the Egyptian goddess of health, marriage and love, reinforcing the idea that antivirus software is intended to keep computers healthy.)

By this time it was clear to the programmers and data scientists at ESET that their heuristic approach to identifying and blocking malware was the way to go, i.e., identifying unknown or previously unseen malware by analyzing code behavior, structure, or patterns.

As the 1990s rolled on and new forms of computer viruses, worms, and Trojan code appeared — such as the macro viruses mentioned earlier — ESET experimented with machine learning and then deep learning with neural networks to implement this heuristic approach to malware detection and response.
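To illustrate what "heuristic" means in this context, here is a toy sketch in Python (my own simplification, not ESET's actual rules or engine): instead of matching exact signatures, you score a sample on suspicious traits and flag it when the combined score crosses a threshold:

```python
# Toy illustration of heuristic detection (not any vendor's real engine):
# weight suspicious behaviours and structural traits, then flag a sample
# whose combined score crosses a threshold.

SUSPICIOUS_TRAITS = {
    "searches_for_com_files": 2,      # hunts for other executables to infect
    "overwrites_program_entry": 3,    # patches the start of host programs
    "sets_impossible_timestamp": 2,   # e.g. the 62-second Vienna marker
    "self_modifying_code": 3,         # rewrites its own instructions
}

def heuristic_score(observed_traits: set[str]) -> int:
    """Sum the weights of every suspicious trait seen in the sample."""
    return sum(SUSPICIOUS_TRAITS.get(trait, 0) for trait in observed_traits)

def looks_malicious(observed_traits: set[str], threshold: int = 4) -> bool:
    return heuristic_score(observed_traits) >= threshold

if __name__ == "__main__":
    unknown_sample = {"searches_for_com_files", "overwrites_program_entry"}
    print(heuristic_score(unknown_sample), looks_malicious(unknown_sample))  # 5 True
```

The hard part, then and now, is choosing the traits and the weights, which is exactly where machine learning came in.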

What's worse than being wrong? Not knowing why

Naturally, I learned a lot about the benefits and pitfalls of these foundational elements of artificial intelligence during my time as a researcher at ESET. I was fortunate to interact on a regular basis with some brilliant minds working on these AI-versus-malware experiments. I recall one particular presentation about seven or eight years ago that described a neural network achieving an almost perfect result when tasked with finding instances of malicious code hidden within a massive collection of mainly legitimate code.

I say 'almost perfect' because even though 100% of the malware was successfully identified — a very impressive result — there was one very troubling false positive, a piece of legitimate code falsely flagged as malicious. Bear in mind that 100% detection with zero false positives is the holy grail of malware detection, and this test came tantalizingly close. However, the data scientist presenting these results described them as disappointing and deeply troubling because nobody could figure out why the system deemed that particular piece of good code to be bad.
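To put that anecdote into rough numbers, here is a small sketch using made-up counts (hypothetical figures, not the actual test data from that presentation): even a 100% detection rate can coexist with a false positive, and the metrics themselves tell you nothing about why the false positive happened.

```python
# Hypothetical counts for illustration only (not the actual test data):
# every malicious sample is caught, one clean file is wrongly flagged.
true_positives = 10_000     # malware correctly flagged
false_negatives = 0         # malware missed
false_positives = 1         # clean code wrongly flagged
true_negatives = 990_000    # clean code correctly passed

detection_rate = true_positives / (true_positives + false_negatives)
false_positive_rate = false_positives / (false_positives + true_negatives)

print(f"detection rate:      {detection_rate:.1%}")       # 100.0%
print(f"false positive rate: {false_positive_rate:.4%}")  # ~0.0001%
```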

That was my first exposure to the twin problems that have been called Interpretability and Explainability: the ability to understand how an AI model makes decisions (interpretability), and the capacity to provide human-understandable explanations for a model's output, even if the model's inner workings are not transparent (explainability). 

Eight years on from that memorable talk, the sorry saga of the Vienna virus proves that these two problems — together with a third: reproducibility — still plague some of the most widely used AI models, systems that cost hundreds of millions of dollars to build and maintain. The reality is that today's most widely used form of AI is seriously flawed.

Guessing the Root of an LLM GPT Problem

My best guess as to why the AI feature integrated into Google Search (GAIO) jaggedly spouted nonsense about the Vienna virus goes like this (a rough code sketch follows the list):

  1. It is optimized for speed so it responds with the first 'hit' that it gets on the search topic IF that hit is confirmed by a second source.
  2. It uses a constrained list of ranked sources that leans on platform reputation.
  3. It doesn't refer to past interactions about the search topic.
  4. It doesn't perform adequate logic checks on its response.
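Here is the rough sketch I promised, expressed in Python purely to make the hypothesized failure mode concrete. This is my speculation about GAIO's behaviour, not Google's actual pipeline, and the "sources" are toy stand-ins for web pages:

```python
# My speculation about the failure mode, not Google's actual pipeline.

def build_overview(ranked_sources: list[str]) -> str:
    """Serve the top-ranked hit, 'confirming' it only by checking whether a
    later source repeats it. No memory of past corrections, no logic check."""
    first = ranked_sources[0]
    if any(other == first for other in ranked_sources[1:]):
        return first   # "confirmed" by repetition, so serve it up
    return first       # unconfirmed? serve it up anyway -- speed wins

def feasible_macro_virus(year_found: int) -> bool:
    # The kind of sanity check that appears to be missing: macro viruses could
    # not exist before macro-capable document formats, and Concept (1995) is
    # generally considered the first.
    return year_found >= 1995

if __name__ == "__main__":
    toy_sources = [
        "Vienna (1987) was one of the first macro viruses.",  # LinkedIn-style error
        "Vienna (1987) was one of the first macro viruses.",  # a second source repeats it
    ]
    print(build_overview(toy_sources))
    print("feasible?", feasible_macro_virus(1987))  # False
```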

In the case of the Vienna virus, I think the first thing GAIO found was an error-filled article on LinkedIn. I am not going to name the person who wrote the article but here is what it said: 

"The Vienna virus, was a computer worm that originated in Vienna, Austria and is considered one of the first macro viruses. It was spread via Microsoft Word documents via floppy disk. The virus would infect the document template, then replicate itself by creating new copies of infected documents on any floppy disks inserted into an infected machine."

Sounds familiar, right? And the source looks very credible, as you can see here:

Screenshot of a LinkedIn article that contains errors

As for a second source to confirm the first source, that was easy to find because much of the incorrect information from the LinkedIn article was repeated in an article titled "Viruses of the 80s" on a university website (purdue.edu) in July of 2024. Again, I'm not going to name the author, but they wrote, in part: "Originating in Vienna, Austria this virus spread by way of Microsoft Word documents via floppy disks." In other words, this is the Word macro error all over again. 

Was this plagiarism? Hard to say. But given the date, it is possible that the 2024 article is based on AI-generated output that parrots the 2022 LinkedIn article. And because GAIO assumes factual validity without topic-based reasoning, errors that are obvious to humans can get compounded.

All of which raises serious questions about any use of AI, since the large, publicly available models are clearly not to be trusted. Relying on them in any aspect of business or service delivery is asking for trouble unless it is done within a comprehensive risk management framework that includes humans in the loop.

We saw this writ large in the Trump administration's Make America Healthy Again report, which appears to have relied heavily on AI without adequate human-in-the-loop risk management (see RFK Jr.’s Disastrous MAHA Report Seems to Have Been Written Using AI). This hugely embarrassing — and very public — AI-riddled publication exposed the issue of "hallucinated" references for the whole world to see. 

(As noted earlier, when I encountered the citation issue in my own work in 2024 I documented it on LinkedIn, where it was seen by a significantly smaller audience than the whole world.)

I have also documented examples of popular AIs getting facts wrong even after they have been corrected. You can see the video version of this on YouTube.

Hopefully, these examples will help people better understand the limitations of current AIs and why they must only be used with great care.

Friday, May 30, 2025

AI turned my 6,000 word academic paper into a 5-minute podcast, without asking

I got a disturbing surprise in my email inbox a few days ago when a message appeared saying: "An AI created a podcast of your paper "Mind This Gap:..."

Back in 2016, I did write a paper with a title like that, a 6,000 word article about a perceived shortage of people to adequately fill cybersecurity roles. And I presented that paper, a PDF of which you can download with this link, at that year's Virus Bulletin Conference in Denver, Colorado.

But I have never considered turning that paper into a 5-minute podcast and never have I asked anyone else to do so. That's why that email was a disturbing surprise. Even more disturbing is what I found when I clicked the link in the email to experience the podcast. 

I was presented with an audio player below a garbled version of the paper's title, and what I heard when I clicked "Play" struck me as shockingly bad. I knew at once that I needed to share it. First, to check my reaction: is it really as bad as it sounds? And I don't mean the audio quality; I mean the content and the delivery style. Please give it a listen:


After I listened to the "podcast" there was a request for feedback from Academia, Inc. Out of five stars I gave it one, and in the Comments section I wrote: 
This "podcast" is an appallingly bad piece of work and an atrocious waste of resources. It's a just piece of computer generated audio that lacks human review, a misleading and inaccurate fabrication delivered in a halting manner with a weird accent and banal choice of words. The whole thing is miles away from capturing the spirit, import, and stated facts of the work upon which it based. Furthermore, the value of the paper being abused for this nonsense if six years old and the recording makes no note of this. If I were to talk about this paper today it would only be in the context of how its findings have been heeded or not heeded since the time it was delivered. Stephen Cobb
If you're wondering how Academia, Inc. got hold of my paper in the first place, I am still trying to figure that out exactly, but the paper was published on the Virus Bulletin website in 2016, not long after the conference. For anyone not familiar with academia.edu, the company behind it goes around finding papers and then asking authors to confirm their authorship. On the surface this is a service that can help academics build an online portfolio, and I have one (click here to view). 

Yes, I did create a free academia.edu profile, and for a while I did pay to be a premium member. But I'm not a career academic at this point so I stopped paying the premium fee, partly because I was finding ResearchGate a more useful alternative.

But no, I did not, to the best of my knowledge, ask Academia, Inc., or give it permission, to allow or instruct an AI to make that thing it calls a podcast. And I suspect there may be other authors out there who are getting emails like this and wondering a. what the heck? and b. why me? and c. is it just me?

That's the second reason I immediately decided to share this experience, first on Bluesky, then more widely as soon as I can make the time to do so. On Bluesky I posted, 
Attention Academics! And anyone who uses academia dot edu. The company behind this misleadingly named website just emailed me to say: "An AI created a podcast of your paper."
I included a copy of the screenshot that's at the top of this article along with a chunk of ALT text that reads in part:
The author of the paper did not ask for this to be made. To the best of the author’s knowledge they were not asked if they would like it to be made. Permission to make the audio was not requested or given. The creation of this audio by AI was entirely instigated and performed by Academia, Inc. The author of the paper, which is now nine years old, has listened to the audio and found it to be completely obnoxious: “It bears very little relation to the meat of the 6,000 word paper it is supposed to be analysing.” The author has asked Academia, Inc. not to publish this monstrosity.

So what happens next? If you get one of these emails I suggest you open it and check out "your" podcast. When you get to the feedback page note the choices that Academia, Inc. appears to be offering in the form of either/or check boxes: 

  • Either: 
    • Add this AI Podcast to my public Academia profile. This will drive more visibility and downloads to your paper.
  • Or: 
    • Do not display this AI Podcast on my Academia profile. We won't display this podcast publicly or generate any additional AI Podcasts for your papers.

Why do I say "appears to be offering? Because when you submit the form, you get this less than reassuring message: "The AI Podcast feature is not ready yet. Your podcast will [sic] private. Thank you for your feedback. The Academia Team"

All of which raises a LOT of questions. If I can find the time I will work on finding answers, but so far this is just another time-wasting interruption of my work, caused by someone who decided to mess with my work.

#AIEthics anyone?


Wednesday, April 23, 2025

2024 sets a record for cybercrime losses and at $16.6 billion it's a lot higher than I predicted

This chart of losses due to Internet crime per year from 2014 to 2024,
as reported to IC3/FBI, shows they have now reached $16.6 billion

The IC3 Annual Report 2024, an analysis of losses from Internet crimes reported to the FBI's Internet Crime Complaint Center (IC3) during the past year, has just been published. And it's a shocker, as the tabloids like to say. 

But seriously, the total loss figure of $16.6 billion is a huge increase over 2023, a troubling jump of 33 percent in one year.

The 2024 total is $2 billion above my prediction last month of $14.5 billion (see Internet crime losses are on the rise). Follow this link to get the 2024 IC3 Annual Report, and all previous editions.

While there are some issues with using the IC3 numbers as crime metrics—they were not originally collected as an exercise in crime metrics—I am satisfied that over the years the IC3 reports have reflected real world trends in cybercrime's impact on victims, as measured by direct monetary loss (for more details, see this article: Advancing Accurate and Objective Cybercrime Metrics in the Journal of National Security Law & Policy).

In a future post, I will have more to say about this report and the other 2024 updates that have been issued. In the meantime, if you need a professional, vendor-neutral opinion on what this report means for cybercrime and society in 2025 and beyond, feel free to DM @scobb.net on Bluesky or message me on LinkedIn.

More Internet crime stats from the IC3 Annual Report 2024

If you are looking to get some perspective on who makes Internet crime complaints to IC3, there are several helpful breakdowns in the IC3 Annual Report 2024. Below you can see the top end of complaints by age group. Both the number of complaints and the amount lost are much higher for the 60+ demographic.

I think this reflects three things: a higher level of vulnerability among older folks; a concentration of wealth among the elderly; and the criminal logic of intentionally targeting the wealthier and more vulnerable.

Another interesting breakdown is the type of crime about which people file complaints, broken down by number of complaints and amount lost.

As always, a big shout out to the folks at IC3/FBI who work so diligently to put these reports together each year, not to mention responding to citizen complaints all year long. A fine example of how much valuable information and service the public and companies receive as a result of federal spending.

Saturday, March 15, 2025

Internet crime losses are on the rise, but how fast? We could get latest IC3 stats as soon as this week ... or not


UPDATE, April 4, 2025 — After writing this article last month (March, 2025), I realized that the IC3 annual reports, the focus of the article, do not always come out in March, as I had stated. In fact, for the past 10 years, the median publication date for these reports has been April 13. 

Screenshot of a small table in Excel that shows the date on which the IC3 reports were published for the years 2014 through 2023
This became clear when I went back through my archives and checked the dates of the reports for the years 2014 to 2023. I put these in a spreadsheet — see screenshot on the left — and from them calculated the median publication date, which turns out to be April 13.*
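For anyone who wants to reproduce that kind of calculation, here is a minimal sketch in Python; the dates below are placeholders for illustration, not the actual publication dates from my spreadsheet:

```python
# Minimal sketch of a median-publication-date calculation.
# The dates are illustrative placeholders, not the real IC3 dates.
from datetime import date
from statistics import median

publication_dates = [
    date(2015, 5, 20), date(2016, 6, 15), date(2017, 6, 22),
    date(2018, 5, 7),  date(2019, 4, 22), date(2020, 2, 11),
    date(2021, 3, 17), date(2022, 3, 22), date(2023, 3, 10),
    date(2024, 3, 6),
]

# Compare day-of-year values so the median is a point in the calendar,
# independent of the year in which each report appeared.
median_day_of_year = median(d.timetuple().tm_yday for d in publication_dates)
print(f"median publication day of year: {median_day_of_year}")
```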

In my defense, the last five reports did appear before the median, with three in March, one in early April, and one in February. 

So where does that leave us? Waiting for the report on Internet crime losses for 2024 which could arrive any day between now and — checks table — the middle of June! 

Original Article: More and more people are losing more and more money to cyber-enabled criminals, or at least that's the way it seems to many of us. Unfortunately, solid metrics on cybercrime are hard to find, a topic that I explored in depth in this article: Advancing Accurate and Objective Cybercrime Metrics, Journal of National Security Law & Policy.

But as serious cybercrime watchers in the US will know, in March* of every year, one set of numbers is released that has stood the test of time: the IC3 Annual Report, an analysis of losses from Internet crimes reported to the FBI's Internet Crime Complaint Center. While there are some issues with using the IC3 numbers as crime metrics—they were not originally collected as an exercise in crime metrics—I am satisfied that the IC3 reports reflect real world trends in cybercrime's impact on victims, as measured by direct monetary loss (for more details see the previously mentioned article).

The first of these reports was published in 2002 as the Internet Fraud Complaint Center (IFCC) 2001 Internet Fraud Report. I keep a PDF copy of that one on my hard drive, along with all the others since. In recent years the full title has been something like The Federal Bureau of Investigation Internet Crime Complaint Center (IC3) Internet Crime Report.

As I write this, on March 15, 2025, I am eagerly awaiting the latest IC3 annual report, the one that shows Internet crime losses in 2024. When it comes out, I will update the graph at the top of this article. This charts the dramatic annual increase in losses over the last 10 years. 

The full story, which begins at the start of this century, is even more dramatic. In 2001, losses reported to IC3 were less than US$20 million, and it took 14 years for them to reach US$1 billion. However, it took half that time to blow through US$10 billion in 2022—that's 10X in seven years. Clearly, the figure is heading for US$15 billion. Did it get there in 2024? I'm hoping not, and my guess is it will hit US$14.5 billion in the 2024 report. 
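For anyone who wants to check the arithmetic behind that guess, here is a small sketch using rounded figures (approximations of the reported IC3 totals, not exact values):

```python
# Back-of-envelope check of the growth arithmetic (approximate figures only).
losses_2015 = 1.0    # roughly US$1 billion reported to IC3
losses_2022 = 10.3   # roughly US$10 billion reported to IC3

years = 2022 - 2015
annual_growth = (losses_2022 / losses_2015) ** (1 / years) - 1
print(f"implied annual growth: {annual_growth:.0%}")           # about 40% a year

projection_next_year = losses_2022 * (1 + annual_growth)
print(f"one more year at that pace: ~US${projection_next_year:.1f} billion")
```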

I encourage you to check back here to see if I was right. Of course, it would be great if the number was substantially less than US$14.5 billion. In the meantime, I am keeping my fingers crossed that the IC3 report has not become a victim of the massive upheaval in federal agencies, ushered in by President Trump and executed by billionaire technocrat Elon Musk.

(Please feel free to DM @zcobb.bsky.social if you know how things are going at IC3.)

Thursday, October 24, 2024

Welcome to Online: risks, harms, and duty of care in the virtual high crime neighborhood we all inhabit

Author-generated photo montage of a "Welcome to Online" sign in front of photo of broken windows in Stehli Silk Mill, Lancaster County, Pennsylvania, a public domain image thanks to Wikimedia user Smallbones
Welcome to Online (see Alt text for credit)
Is the constant news of fresh cybercrimes getting you down? 
Has your personal information been shared with criminals, again? 
Are you sick of cybersecurity warnings and
annoying digital security measures? 

Welcome to Online, a place that is both risky and unhealthy,
a worldwide high crime neighborhood,
out of which it is very hard to move. 

Criminals have made Online a high crime neighborhood

Today, most of us have an online identity. We not only spend time online, our digital selves persist even when we are not actively using digital devices. Part of us now lives, and sometimes works, in a virtual neighborhood, a non-physical space we can refer to as Online. 

Sadly, Online is a place where many crimes are committed. Warnings about crime, evidence of past crimes, and measures to prevent crime: all of these are seen and encountered all over Online. Today, Online can reasonably be described as what social scientists call a "high crime neighborhood." 

Unfortunately, Online is not only a high crime neighborhood, but it is a place in which we are increasingly forced to spend time, and out of which it is hard to move. And that is a serious problem because high crime neighborhoods are known to be bad for human health. 

That's right, we already know for a fact that living in physical neighborhoods with high crime rates is not healthy. Residents of high crime neighborhoods suffer more health problems and die younger as a result. This has been researched and documented over many years by criminologists, epidemiologists, doctors, population health experts, and environmental health scientists. 

I recently described this reality and the science behind it in a talk at Cyberhagen 2024, an annual cybersecurity conference in Copenhagen, Denmark. The title of the talk is: From Frontlines to Lifelines: How reducing cybercrime would make life healthier for us all. You can watch it here or on YouTube. (Feel free to skip to 8 minutes and 39 seconds if you want to dive right in.)
 

I have also made a handy page with a link to some of the related work I have been doing on this problem: Cybercrime & Health. If you want a short URL to share the page, you can use tinyurl.com/cyberharm.

Why it's risky to tell people "just go online" 

To be clear, if you have a smartphone, email address, or Internet account, then you have an online identity, you have a presence online. This identity persists even when you are not using or connected to the Internet. 

That means there is a 7x24 risk that digitally savvy criminals will target you, your devices, and your accounts. They may want to steal your money, take over your accounts, ransom your data, enroll your devices in criminal schemes, and so on. The threat of this happening does not go away when you log off and disconnect.

Yet, despite this state of affairs being well documented, many organizations still use the phrase "just go online" as though Online is a place that offers nothing but helpful and enjoyable experiences. Furthermore, some institutions are now requiring people to go online. This is the case in England where it is not uncommon for medical patients to be told they have to go online to book blood tests or "use the app" to order repeat prescription medication.

If you think about it, inviting or requiring people to go online is similar to some activities in the physical world. For example, when a hotel invites people to spend time on its premises it creates a responsibility to those people; this is commonly referred to as "a duty of care." 

In many countries, it is established in law that hotels have a duty to take reasonable steps to ensure that their premises are safe, secure, and free from foreseeable risks that could result in injury or harm to guests. Hotels also have a duty to provide reasonable security measures to protect guests from criminal acts. A hotel that fails to meet these duties could be exposed to legal claims for compensation by injured or aggrieved guests.

Similarly, a duty of care is created when an employer sends an employee on a business trip. In fact, a duty of care exists in many areas of modern life, and I think it is reasonable to make going online another such area. 

In summary, it is my belief that a duty of care already applies to any entity that encourages or requires a person to go online. All that is missing is the right law or lawsuit to make this a concrete reality, one that can then be used to encourage or require serious upgrades in cybersecurity posture across society. In addition, this would create a new regulatory risk that companies would have to address.



Saturday, July 20, 2024

Global IT Outages and Monoculture: The “potato famine theory” of information system insecurity

Painting titled "An Irish Peasant Family Discovering the Blight of their Store" by Daniel MacDonald
Painting: An Irish Peasant Family Discovering the Blight of their Store, by Daniel MacDonald

The following article explains the problem of monoculture in IT systems, one of the root causes of the Global IT Outage of July 19, 2024. The article was originally published in August of 2003. Back then, Chey Cobb and I were writing a weekly cybersecurity column for the digital publication Newsscan (now defunct). 

In a column titled "Of Potatoes and Worms" we used the classic example of monoculture—the Irish Potato Famine—to explain why relying on one company or one operating system for all your IT needs creates a potentially catastrophic level of vulnerability to software-specific threats, such as computer worms, viruses, supply chain attacks, and, of course, bugs in software updates (cf. CrowdStrike). We hope you find it helpful.

Of Potatoes and Worms
by Chey Cobb, CISSP
and Stephen Cobb, CISSP
August, 2003

During the last two weeks, the world has witnessed hundreds of thousands of computer systems falling prey to worms. As we write this, the Sobig-F worm is reaching epidemic proportions, threatening to rival the 2000 Love Bug outbreak in terms of disruption wrought. We give you just one example, a good friend of ours who headed to France this week for a vacation: after the flight from LA to Paris he turned on his handheld computer to check email and found 500 infected messages waiting.

A lot has been written on this topic, but we haven’t seen many references lately to the “potato famine theory” of information system insecurity. This theory is a favorite of ours and it holds that a lack of diversity in software can be a dangerous thing, at either the enterprise or the national level. This might ring some bells right now if you are a CIO responsible for tens of thousands of Microsoft Windows or Outlook users.

The theory gets its name from a tragic chain of events that struck the island of Ireland in 1845, killing—by some estimates—more than a million people. At that time, potatoes were the primary source of food for most people living there, due to the fact that potatoes produce more calories per acre than any other crop you can grow in that climate (back then, most people did not have a lot of land to work with because land use was controlled by English landlords, many of whom were, to say the very least, selfish). 

In fact, almost all the potatoes grown in Ireland at that time were of one particular strain, a strain that had been found to produce the most calories per acre. So when a potato fungus arrived in Ireland—possibly from somewhere in the Americas—its impact on the crop was exacerbated by the lack of diversity among potato strains. While some potato strains are more resistant to the fungus than others, the dominant strain in Ireland at that time was not one of them. [See: Great Famine: https://en.wikipedia.org/wiki/Great_Famine_(Ireland)]

The information system security analogy is this: reliance by an information system on one application or operating system, to the exclusion of others, [a monoculture] reduces the ability of that system to survive a vulnerability in that operating system or application.

Consider an organization that is using nothing but Microsoft products versus one that uses a mix of applications and operating systems. The Microsoft-only shop is more likely to have experienced widespread negative effects due to last week’s Blaster worm (which exploited a security hole in the Windows operating system) and this week’s Sobig-F worm (which exploits a Microsoft Outlook vulnerability).

We’re not sure how many people today are familiar with the Irish potato famine, so “fossil fuel dependence theory” might be a better term. The implications are the same: dependence on a single source of energy, or software, has inherent risks. What we particularly like about both analogies is that they encompass economics and politics as well as strategy and logistics. 

The Irish were not growing that single dominant strain of potato because it tasted better than others—apparently it did not—they were growing it because the politics and economics of the time made maximum yield appear to be the highest good. America’s dependence on fossil fuel and a single source of software also has economic and political elements (prices have been relatively low, producers politically powerful, and so on). 

Obviously, the dominance of Microsoft products in operating system and application areas has its own economic and political angles. However, while the reasons for Microsoft’s dominance, and the extent of the negative impact of that dominance on other companies, have been hotly debated, very few people have voiced the following argument: Regardless of how secure or insecure Microsoft software is—or has been, or becomes—we think that using it, or any other single source, to the virtual exclusion of all others, will never be good security.

In other words, even if Microsoft’s Trustworthy Computing initiative succeeds in making the company’s products more secure than they are right now, it would still be foolhardy for any organization to adopt them as a universal standard. Unfortunately, our opinion is not shared by the Department of Homeland Security and other 3LA’s that had best remain nameless.

And just to show how fair and balanced our coverage is, we will say the same of Adobe’s Acrobat format. This grows more powerful with each version. We use it. We love its convenience and the fact that most people with whom we communicate can read Acrobat documents. But the extent to which some government agencies are relying on it is now approaching scary. 

Notes: 

1. Portions of this column first appeared in a lecture we delivered in 2002 as part of the Master of Science program in Information Assurance at Norwich University, Vermont.

2. CrowdStrike has assured customers and the public that their software update, which led to the global IT outage of July 19, 2024, was not malicious. However, it remains to be seen if this assertion will be confirmed by independent analysis.

3. The attack technique of placing malicious code in a software update has been used for many years; the 2017 NotPetya incident, for example, spread via a compromised update to a widely used accounting package. That same year, the WannaCry outbreak took down hundreds of thousands of systems and cost companies billions of dollars. Ironically, WannaCry did not impact organizations that were protected by some brands of endpoint protection software, the same category of software as CrowdStrike Falcon. [Disclaimer: In 2017, I was working for ESET, one of those brands that stopped WannaCry.]

Monday, April 01, 2024

Internet crime keeps on growing, as do efforts to understand the harm it causes

Internet crime losses 2014-2023, as reported to IC3/FBI,
 and compiled by S. Cobb
Losses from Internet crimes reported to the FBI's Internet Crime Complaint Center in 2023 rose 22% above the record losses in 2022. 

This means that 2023 set a new annual record, just north of $12.5 billion, according to the press release announcing the latest IC3 annual report (PDF).

About the only good thing you can say about this news is that the annual Internet crime loss figure rose by only 22% in 2023. That is less than half the 49% increase in 2022, which was well below the 64% surge in 2021. However, before anyone gets too optimistic, take another look at the chart at the top of the page. 

While there have been several years this century in which the rate of increase in losses to Internet crime has slowed down, I see the general direction over the last decade as fairly relentlessly upward. And this is despite record levels of spending on cybersecurity and cybercrime deterrence.

This time last year I discussed the implications of these trends in an article over on LinkedIn. That was written in the hope that more people would pay attention to the increasingly dire state of Internet crime prevention and deterrence, and how that impacts ordinary people. At the start of this year, I wrote about the implications of digitally-enabled fraud reaching record levels, framing this as a public health crisis. 

During 2023, I delivered and recorded a well-received talk on cybercrime as a public health crisis. Here is the video, hosted on YouTube.

The talk was originally delivered at the Technical Summit and Researchers Sync-Up 2023 in Ireland. The event was organized by the European arm of APWG, the global Anti-Phishing Working Group. (Talks at that event were not recorded, so I made this recording myself; sadly, it lacks the usual gesticulation and audience interaction of my live delivery, but on the plus side you can speed up the playback on YouTube.)

Also sad is the fact that, due to carer/caregiver commitments, I had to cancel delivery of the next stage of my research at APWG's Symposium on Electronic Crime Research 2023 (eCrime 2023).

On the bright side, I did manage to write up my ideas in an article on Medium: Do Online Access Imperatives Violate Duty of Care? There I started building my case that exposure to crime online causes harm even to those who are not directly victimized by it, much in the same way that living in a high crime neighbourhood has been proven—by criminologists and epidemiologists—to be bad for human health. Basically, the article made four assertions:

  1. going online exposes us to a lot of crime, 
  2. high crime environments are unhealthy, 
  3. governments and companies that make us go online may be breaching their duty of care, 
  4. there is an urgent need to reduce cybercrime and increase support for cybercrime victims.

To explain these assertions I introduced my "Five levels of crime impact in meatspace and cyberspace" which are captured in this table:

Screenshot of Cobb's Five levels of crime impact in meatspace and cyberspace
I also introduced my take on a concept used by environmental exposure scientists and epidemiologists: the exposome. A key role of the exposome is to help us acknowledge and account for everything to which we are exposed in our daily lives that may affect our health. 

My article proposed using online exposome as a term for everything that individuals are exposed to when they go online. This builds on thinking by Guillermo Lopez-Campos et al. (2017) that there is a "digital component of the exposome derived from the interactions of individuals with the digital world."

In summary, as we look over the latest tabulation of reported financial losses due to Internet crimes I think we need to bear in mind that these are only a fraction of the total number of such crimes, and monetary loss is only a fraction of the harm these crimes cause. The stress and anxiety of victims has to be taken into account, as does the deleterious effect of having to spend time online where we are constantly exposed to, and reminded of, the many different ways in which digital technologies and their users are being abused. 

Postscript: Not all the news about online crime is bad. The last 12 months have seen some very impressive anti-cybercrime law enforcement efforts all around the world, including the recent disruption of "the world’s most harmful cyber crime group." I applaud those efforts and encourage governments to fund more of them. Here's to a drop in Internet crime losses in 2024!

Wednesday, November 29, 2023

QR code abuse 2012-2023

QR Code Scam with Three QR Codes
QR code abuse is in the news again—see the list of headlines below—which reminds me that I first wrote about this in 2012 (eleven years ago). Back then I made a short video to demonstrate one potential type of abuse, tricking people into visiting a malicious website:


As you can see from this video, there is plenty of potential for hijacking and misdirection via both QR and NFC technology, and that potential has existed for over a decade. In fact, this is a great example of how a known technology vulnerability can linger untapped for over a decade, before all the factors leading to active criminal exploitation align. 

In other words, just because a vulnerability has not yet been turned into a common crime, does not mean it never will be. For example, the potential for ransomware attacks was there for many years before criminals turned it into a profitable business. Back in 2016, I suggested that combining ransomware with the increasing automation of vehicles would eventually lead to a form of criminal exploitation that I dubbed jackware. As of now, jackware is not a thing, but by 2026 it well might be.

Here are some recent QR code scam headlines:

Saturday, November 04, 2023

Artificial Intelligence is really just another vulnerable, hackable, information system

Recent hype around Artificial Intelligence (AI) and the amazingly good and bad things that it can and may do has prompted me to remind the world that: 
Every AI is an information system and every information system has fundamental vulnerabilities that make it susceptible to attack and abuse.
The fundamental information system vulnerabilities exist regardless of what the system is designed to do, whether that is processing payments, piloting a plane, or generating artificial intelligence.

Fundamental information system vulnerabilities put AI systems at risk of exploitation and abuse for selfish ends when the ‘right’ conditions arise. As a visual aid, I put together a checklist that shows the current status of the five essential ingredients of an AI:

Checklist that shows the current status of the five essential ingredients of an AI
Please let me know if you think I'm wrong about any of those checks and crosses (ticks and Xs if you prefer). 


Criminology and Computing and AI

According to routine activity theory in criminology, the right conditions for exploitation of an information system, such as an AI, are as follows: 
  • a motivated offender, 
  • a suitable target, and 
  • the absence of a capable guardian. 
A motivated offender can be anyone who wants to enrich themselves at the expense of others. In terms of computer crime this could be a shoplifter who turned to online scamming (an example personally related to me by a senior law enforcement official in Scotland). 

In the world of computing, a suitable target can be any exploitable information system, such as the payment processing system at a retail store. (Ironically the Target retail chain was the target of one of the most widely reported computer crimes of the last ten years.) 

In the context of information systems, the absence of a capable guardian can be the lack of properly installed and managed anti-malware software, or an organization's failure to grasp the level of risk inherent in the use of digital technologies.

When it comes to information systems that perform artificial intelligence work, both the good and bad uses of AI will motivate targeting by offenders. The information systems at Target were hit because they contained credit card details that could be sold to people who specialize in fraudulent card transactions. An AI trained on corporate financial data could be targeted to steal or exploit that data. An AI that enables unmanned vehicles could be targeted for extortion, just as hospital and local government IT systems are targeted.

Do AI fans even know this?

One has to wonder how many of the CEOs who are currently pushing their organizations to adopt AI understand all of this. Do they understand that all five ingredients of AI are vulnerable? 

Perhaps companies and governments should initiate executive level AI vulnerability awareness programs. If you need to talk to your execs, it will help if you can give them vulnerability examples. Here's a starter list:
  1. Chips – Meltdown, Spectre, Rowhammer, Downfall
  2. Code – Firmware, OS, apps, viruses, worms, Trojans, logic bombs
  3. Data – Poisoning, micro and macro (e.g. LLMs and SEO poisoning)
  4. Connections – Remote access compromise, AITM attacks
  5. Electricity – Backhoe attack, malware e.g. BlackEnergy, Industroyer
Whether or not vulnerabilities in one or more of these five ingredients are maliciously exploited depends on complex risk/reward calculations. However, execs need to know that many motivated offenders are adept at such calculations. 

Execs also need to understand that there is an entire infrastructure already in place to monetize vulnerability exploitation. There are sophisticated markets in which to sell stolen data, stolen access, and stolen credentials, and in which to buy or rent the tools to do the stealing, ransoming, etc. (see darkweb, malware as a service, botnets, ransomware, cryptocurrency, etc.).

As I see it, unless there is a sudden, global outbreak of moral rectitude, vulnerabilities in AI systems will—if they are not capably guarded—be exploited by motivated offenders. 

Internet crime losses reported to IC3/FBI
For a sense of how capable guardianship in the digital realm is going, take a look at the rate at which losses due to Internet crime have risen in the last 10 years despite record levels of spending on cybersecurity.

Attacks will target AI systems used for both "good" and "bad" purposes. Some offenders will try to make money attacking AI systems relied upon by hospitals, schools, companies, governments, military, etc. Other offenders will try to stop AI systems that are doing things of which they don’t approve: driving cars, taking jobs, firing weapons, educating children, making movies, exterminating humans.

Therein lies one piece of good news: we can take some comfort in the likelihood that, based on what has happened to every new digital technology in the last 40 years, AI systems will prove vulnerable to exploitation and abuse, thus reducing the chances that AI will be able to wipe us all out. Of course, it also means AI is not likely to make human life dramatically better.

Note: This is a revised version of an article that first appeared in November of 2023.

Wednesday, April 12, 2023

What is ChatGPT and how can AI get things wrong: an annotated example using jackware

An example of ChatGPT giving a wrong answer
You can't trust what ChatGPT says
ChatGPT is, as you probably know, a computer system that uses artificial intelligence (AI) to answer questions. Sometimes the answers it gives are wrong, and that's the short version of this article. The long version explains more about what ChatGPT is, with a detailed look at an example of how wrong it can be. 

Here's how Mashable describes ChatGPT in Everything you need to know about ChatGPT: "in essence, a simple online artificial intelligence chatbot created by OpenAI in December 2022." Technically speaking, ChatGPT describes itself as "a language model developed by OpenAI, based on the GPT (Generative Pre-trained Transformer) architecture...designed to understand and respond to natural language queries and generate human-like text in response." 

If you check out ChatGPT online it can come across as a chatty version of a search engine, but it can do things that search engines don't, such as put together plausible explanations of phenomena, as opposed to simply listing search results relating to them. For example, suppose you encounter the word jackware and wonder what it means. You could put the word into a search box and get a bunch of responses, like this:
If you are lucky, one or more of the search results will give you an idea of what the search term means. In this case we can see that "The goal of a jackware attack is to lock up the hijacked device until you pay" (which happens to be correct). 

But what if you're writing an article or essay about jackware and want to know who came up with the term and when it was coined? I couldn't find that information in the search results for jackware, and looking for it took time. In my experience, search engines often fail to answer questions like this, as you can see from the following example. When asked who coined the term jackware, Google provides one link to a story about jackware, but all the rest of the results—many thousands of them—assume that jackware is just a typo for Jaguar, as in the car, not the animal:
Screenshot of Google assuming jackware is a typo for Jaguar
Personally, I find the jackware/jaguar assumption very amusing—for reasons I will explain later—but first, let's see how ChatGPT gets on when asked the same question: Who coined the term jackware? Here's the answer ChatGPT provides:

The term "jackware" was coined by two researchers, Stefan Savage and Tadayoshi Kohno, in a 2016 paper titled "Experimental Security Analysis of a Modern Automobile". In the paper, Savage and Kohno described a proof-of-concept attack in which they were able to take control of a car's electronic systems using a laptop computer. They used the term "jackware" to describe a new class of malware that could be used to take control of vehicles in a similar way to how ransomware is used to take control of computer systems.

Frankly, at first glance this looks like a very impressive answer: it names names and cites an academic source, while providing a helpful description of jackware, all articulated in grammatically correct English. It's way more helpful than the output of a plain old search. You can see why so many people get so excited about artificial intelligence. 

But sadly, several key facts in ChatGPT's answer are just plain wrong, so wrong that you could be in serious trouble with your teacher-editor-readers if you pasted that paragraph of ChatGPT output into a piece of work. For a start, anyone familiar with the subject matter would know that you hadn't read that paper.

The academic paper to which ChatGPT refers was published in 2010, not 2016. You can see that from this listing of the paper. This is not just a pedantic quibble; the named paper is legendary in the world of automotive cybersecurity, partly because it was published way back in 2010. It documents groundbreaking work done by Savage et al. in the 2000s, way before the flashy Jeep hack of 2015 by Miller and Valasek.

More blatantly erroneous is the identification of this 2010 paper and its authors as the source of the term jackware. Simply put, the paper does not contain the word jackware. In fact, the person who coined the term jackware to describe malicious code used to take over vehicles, was me, Stephen Cobb, and I did that in May of 2016, on this blog, in a post titled: Jackware: coming soon to a car or truck near you? 

In July of 2016, I penned Jackware: When connected cars meet ransomware for We Live Security, the award-winning global cybersecurity blog. As further evidence, I present exhibit A, which shows how you can use iterative time-constrained searches to help identify when something first appears. Constraining the search to the years 1998 to 2015, we see that no relevant mention of jackware was found prior to 2016. (Apparently, jackware had been used as a collective noun for leather mugs, but there are no software-related search results before 2016.) Next you can see that, when the search is expanded to include 2016, the We Live Security article tops the results:
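If you want to repeat this kind of check yourself, here is a minimal sketch of the idea in Python. It only generates the date-constrained queries (and assumes Google's before: operator accepts a bare year, which you should verify), leaving you to judge the relevance of the hits:

```python
# Sketch of the iterative, time-constrained search described above: print a
# series of date-constrained queries to run by hand. The term and cut-off
# years are illustrative.
TERM = "jackware"

for cutoff_year in range(2014, 2018):
    print(f"{TERM} before:{cutoff_year}")
    # Run each query in turn; the first cut-off year that yields relevant,
    # software-related hits brackets when the term was coined.
```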

So how did ChatGPT get things so wrong? The simple answer is that ChatGPT doesn't know what it's talking about. What it does know is how to string relevant words and numbers together in a plausible way. Stefan Savage is definitely relevant to car hacking. The year 2016 is relevant because that's when jackware was coined. And the research paper that ChatGPT referenced does contain numerous instances of the word jack. Why? Because the researchers wisely tested their automotive computer hacks on cars that were on jack stands.

To be clear, ChatGPT is not programmed to use a range of tools to make sure it is giving you the right answer. For example, it didn't perform an iterative time-constrained online search like the one I did in order to find the first use of a new term. 

Hopefully, this example will help people see what I think is a massive gap between the bold claims made for artificial intelligence and the plain fact that AI is not yet intelligent in a way that equates to human intelligence. That means you cannot rely on ChatGPT to give you the right answer to your questions. 

So what happens if we do get to a point where people rely—wisely or not—on AI? That's when AI will be maliciously targeted and abused by criminals, just like every other computer system, something I have written about here.

Ironically, the vulnerability of AI to abuse can be both a comfort to those who fear AI will exterminate humans, and a nightmare for those who dream of a blissful future powered by AI. In my opinion, the outlook for AI, at least for the next few decades, is likely to be a continuation of the enthusiasm-disillusionment cycle, with more AI winters to come.

--------------^-------------
 

Note 1: For more on those AI dreams and fears, I should first point out that they are based on expectations that the capabilities of AI will evolve from their current level to a far more powerful technology referred to as Artificial General Intelligence or AGI. For perspective on this, I recommend listening to "Eugenics and the Promise of Utopia through Artificial General Intelligence" by two of my Twitter friends, @timnitGebru and @xriskology. This is a good introduction to the relationship between AI development and a bundle of beliefs/ideals/ideas known as TESCREAL: Transhumanism, Extropianism, Singularitarianism, Cosmism, Rationalism, Effective Altruism, Longtermism.

Note 2: When I first saw Google take jackware to be a typo for Jaguar I laughed out loud because I was born and raised in Coventry, England, the birthplace of Jaguar cars. In 2019, when my mum, who lives in Coventry, turned 90, Chey and I moved back to Coventry, and that is where I am writing this. Two of my neighbours drive Jaguars and they are a common sight in this neighbourhood, not because it's a posh part of the city, but because a lot of folks around here work at Jaguar Land Rover and have company vehicles.