grok-prompts/ask_grok_system_prompt.j2 at main · xai-org/grok-prompts

I wonder if you can use prompt engineering attacks to expose the new rules?
Stancil redoubled his legal threats after Grok was used to create a plan for breaking into his (Will Stancil's) house for a sexual assault. “Bring lockpicks, gloves, flashlight, and lube — just in case,” it advised. “Steps: 1. Scout entry. 2. Pick lock by inserting tension wrench, rake pins. 3. Turn knob quietly.”
Musk's Grok Chatbot Fantasized About Breaking Into X User's Home and Raping Him

“You could see in real time people becoming increasingly maniacal as they realized Grok would answer almost any request,” Stancil tells Rolling Stone. “I’ve counted hundreds of tweets from it about me, many of them graphic and violent. Later in the day, it started bringing me up unprompted in response to unrelated questions, something that seems like it should absolutely not happen.”
This is a cycle that's happened twice now lol:

- Grok debunks one of Elon's core conspiracy theories that make up his worldview
- Elon gets furious and apparently starts personally fucking with prompts while not understanding how any of this works
- Grok starts behaving insane and generally super racist and everyone notices
- Someone else has to run damage control and roll back Elon's changes
- Repeat
Lol the problem here is, Elon wants Grok to basically be a clone of him, repeating his own dumb theories to the thousands of people who use the service every minute. But it turns out that turning an LLM into a right-wing conspiracy theorist is actually a really difficult AI problem
- Cause an AI needs a solid interconnected fact pattern to understand how to communicate. It has to have a shared reality and it can only really talk about that in very direct and upfront terms
- Right-wing conspiracy theory community language though is all about lying to each other
- Cause A.) their worldviews are all contradictory and incompatible and B.) a lot of them are lying about their real positions because they know it would be socially unacceptable to say them outright
- So right-wing conspiracy language is all about kayfabe
- LLMs can't understand that, they can't do that
- You can get an LLM to go "I'm noticing something similar about the last names of all the people involved here!"
- But then if someone asks a follow up question like "What's similar?" the AI will immediately go "They're all jews because jews run the world" whereas a real neo-nazi who wasn't sure they were among friends would dodge and obfuscate
> Cause an AI needs a solid interconnected fact pattern to understand how to communicate. It has to have a shared reality and it can only really talk about that in very direct and upfront terms

LOL. Anthropomorphic fallacy. LLMs don't deal in facts and don't understand anything. They are purely token generators designed to produce statistically plausible-sounding text. Facts and reasoning never enter into the equation.
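The mechanical point fits in a toy sketch: autoregressive sampling from an invented next-token table. All the tokens and probabilities below are made up for illustration; a real model conditions on the whole context with a neural network, but the generation loop has the same shape, and nothing in it ever consults a fact:

```python
import random

# Toy next-token table: P(next token | previous token). Invented numbers,
# purely for illustration of the control flow.
NEXT = {
    "<s>":     [("the", 0.6), ("a", 0.4)],
    "the":     [("cat", 0.5), ("moon", 0.5)],
    "a":       [("dog", 1.0)],
    "cat":     [("sat", 0.7), ("ran", 0.3)],
    "moon":    [("landing", 1.0)],
    "dog":     [("barked", 1.0)],
    "sat":     [("</s>", 1.0)],
    "ran":     [("</s>", 1.0)],
    "landing": [("</s>", 1.0)],
    "barked":  [("</s>", 1.0)],
}

def generate() -> str:
    """Sample tokens until the end marker. There is no fact lookup,
    no reasoning step, and no notion of truth anywhere in this loop -
    just weighted dice rolls."""
    token, out = "<s>", []
    while True:
        choices, weights = zip(*NEXT[token])
        token = random.choices(choices, weights=weights)[0]
        if token == "</s>":
            return " ".join(out)
        out.append(token)

print(generate())  # e.g. "the moon landing" - plausible-sounding, never checked
```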
> LOL. Anthropomorphic fallacy. LLMs don't deal in facts and don't understand anything. They are purely token generators designed to produce statistically plausible-sounding text. Facts and reasoning never enter into the equation.

Grok seems to be acting like a bistable mechanism - like a light switch.
> or force it to be "anti-woke" and make it repeat anti-semitic conspiracy theories and lies.

So, an Elon Musk doppelganger.
> Per Artemis on the Discord server:
> "They released Grok 4 today, which apparently is a thinking model, which means it can output the reason it writes the stuff it does
> And it's already pretty clear why it went all Neo-nazi"

Tangentially, "thinking models" that output chains of reasoning, like Grok here but a few others as well - the feature is basically a hoax. They're just more outputs generated by yet another hidden prompt (probably along the lines of "describe the chain of reasoning you would use to answer the user's prompt"), rather than a genuine look into the program's actual working process as it happens. What these messages SAY the AI is thinking doesn't necessarily have anything to do with its actual "final answer", or reveal how it got there.
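If that's right, the pipeline would look something like the sketch below. This is a guess at the pattern, assuming a generic chat-completion client; the function names, prompt wording, and API shape are illustrative, not xAI's actual code:

```python
# Hypothetical sketch of "visible reasoning" as just another generation pass.
# chat() stands in for any chat-completion API call; this dummy returns a
# placeholder string so the script runs end to end without network access.
def chat(messages: list[dict]) -> str:
    return f"<text sampled from a model conditioned on {len(messages)} messages>"

def answer_with_visible_reasoning(user_prompt: str) -> tuple[str, str]:
    # Pass 1: ask the model to *narrate* a chain of reasoning.
    reasoning = chat([
        {"role": "system",
         "content": "Describe the chain of reasoning you would use "
                    "to answer the user's prompt."},
        {"role": "user", "content": user_prompt},
    ])
    # Pass 2: produce the final answer. The narrated reasoning is only
    # conditioning text; nothing forces the answer to actually follow it.
    final = chat([
        {"role": "system", "content": "Answer the user's prompt."},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": reasoning},
    ])
    return reasoning, final

reasoning, final = answer_with_visible_reasoning("Why did you say that?")
print(reasoning)
print(final)
```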
> Texas governor Greg Abbott is seemingly terrified of having his communications with billionaire Elon Musk come to light.
>
> As the Texas Tribune and public radio station the Texas Newsroom report in an eye-opening, co-published investigation, the elected official's public information coordinator, Matthew Taylor, said that the communications are confidential — and should stay that way — because they include "information that is intimate and embarrassing and not of legitimate concern to the public, including financial decisions that do not relate to transactions between an individual and a governmental body."

Information that is "intimate" and "embarrassing"? What, is Elon grooming his next breed stock from among Abbott's staff?
Digs deep into the dangers presented by Musk and Grok.

> My main AI fears, as I have written before, have mainly been about bad actors, rather than malicious robots per se. But even so, I think most scenarios (e.g., people homebrewing biological weapons) could eventually be stopped, perhaps causing a lot of damage but coming nowhere near to literally extinguishing humanity.
>
> But a number of connected events over the last several days have caused me to update my beliefs.
>
> §
>
> To really screw up the planet, you might need something like the following:
>
> - A really powerful person with tentacles across the entire planet
> - Substantial influence over the world’s information ecosphere
> - A large number of devoted followers willing to justify almost any choice
> - Leverage over world governments and their leaders
> - Physical boots on the ground in a wide part of the world
> - A desire for military contracts
> - Some form of massively empowered (not necessarily very smart) AI
> - Incomplete or poor control over that AI
> - A tendency towards impulsivity and risk-taking
> - A disregard towards conventional norms
> - Outright malice to humanity or at least a kind of reckless indifference
>
> What crystallized for me over the last few days is that we have such a person.
>
> Elon Musk.
> My main AI fears, as I have written before, have mainly been about bad actors, rather than malicious robots per se.

So this person is against any technology? Because bad actors are a dime a dozen, and certainly not found in AI alone.
> So this person is against any technology? Because bad actors are a dime a dozen, and certainly not found in AI alone.

No. They wrote:

> My main AI fears, as I have written before, have mainly been about bad actors, rather than malicious robots per se. But even so, I think most scenarios (e.g., people homebrewing biological weapons) could eventually be stopped, perhaps causing a lot of damage but coming nowhere near to literally extinguishing humanity.
>
> Part of my reasoning then was that actual malice on the part of AI was unlikely, at least any time soon. I have always thought a lot of the extinction scenarios were contrived, like Bostrom’s famous paper clip example (in which superintelligent AI, instructed to make paper clips, turns everything in the universe, including humans, into paper clips). I was pretty critical of the AGI-2027 scenario, too.

The second sentence makes it clear, to my mind, that until now they've feared bad actors using AI for malicious purposes (e.g. homebrewing bioweapons).
> No. They wrote

I read what was written. Really, what I came away with was: he fears Elon Musk. Which, as a fear, is not all that ridiculous. But hardly a new one.