Transcript
5rOVb98vsLs • Claude Opus 4.5 Is INSANE — Beats Human Programmers & Costs 70% Less
/home/itcorpmy/itcorp.my.id/harry/yt_channel/out/BitBiasedAI/.shards/text-0001.zst#text/0203_5rOVb98vsLs.txt
Kind: captions
Language: en
You're probably thinking all these AI
models are basically the same. Maybe
you've tried Claude, ChatGPT, Gemini,
and you're wondering if any of them are
actually worth the money. Well,
Anthropic just dropped Claude Opus 4.5.
And here's what caught my attention.
This model scored higher than any human
programmer has ever scored on one of the
toughest coding exams in the industry.
Yeah, you heard that right. It beat
actual human experts. Welcome back to
BitBiasedAI, where we do the
research so you don't have to. Join our
community of AI enthusiasts with our
free weekly newsletter. Click the link
in the description below to subscribe.
You will get the key AI news, tools, and
learning resources to stay ahead. So, in
this video, I'm going to break down
exactly what makes Claude Opus 4.5
different, why developers and businesses
are calling it a game-changer, and most
importantly, how you can actually use it
in your own projects to get better
results without breaking the bank. By
the end, you'll know whether this is the
AI tool you should be using and how to
get the most out of it. First up, let's
talk about what's actually new under the
hood, because the performance
improvements here are honestly kind of
wild. What's new in Opus 4.5? Okay, so
Anthropic just released Claude Opus 4.5
and they're making some pretty bold
claims. They're calling it the best
model in the world for coding, agents,
and computer use.
Now, I know every AI company says stuff
like that, but here's where it gets
interesting. The benchmarks actually
back it up.
Let's start with coding performance. On
a real-world software engineering
benchmark called SWE-bench Verified,
Opus 4.5 scored 80.9% accuracy. To put
that in perspective, their previous top
model, Sonnet 4.5 scored 77.2%.
And even OpenAI's latest GPT-5.1 Codex
Max only hit 77.9%. But wait until you
hear this next part. Anthropic's CEO told
reporters that on their internal coding
exam, the one they give to their best
engineering candidates, Opus 4.5 didn't
just pass. It scored higher than any
human has ever scored. That's not just
matching humans, that's beating them.
Now, here's where this gets really
practical for you.
Performance is one thing, but what about
cost? Because let's be honest, these AI
tools can get expensive fast if you're
using them for real work.
This is where Opus 4.5 absolutely
shines. The model achieves the same or
better results while using dramatically
fewer tokens. And tokens are literally
what you're paying for every time you
use these models. At what they call
medium effort, Opus 4.5 matches the
previous Sonnet 4.5's highest scores,
but uses 76% fewer output tokens. 76%.
And at high effort, it exceeds Sonnet's
score by over four points while still
using 48% fewer tokens.
So, you're getting better results for
literally half the cost. And speaking of
cost, Anthropic slashed their pricing.
The previous Opus model was $15 per
million input tokens and $75 per million
output tokens. Opus 4.5, it's $5 input
and $25 output. That's roughly 2/3
cheaper for a model that's significantly
better.
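The pricing math above is easy to sanity-check yourself. A quick sketch, using the per-million-token rates quoted in the video and a hypothetical workload:

```python
# Compare per-request cost under the old Opus pricing vs. Opus 4.5 pricing,
# using the $/million-token rates mentioned above.
def request_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Cost in dollars for one request, given per-million-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example workload: 50k input tokens, 10k output tokens.
old_cost = request_cost(50_000, 10_000, in_rate=15, out_rate=75)  # previous Opus
new_cost = request_cost(50_000, 10_000, in_rate=5, out_rate=25)   # Opus 4.5

print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}")  # old: $1.50, new: $0.50
print(f"savings: {1 - new_cost / old_cost:.0%}")      # savings: 67%
```

That 67% is where the "roughly 2/3 cheaper" figure comes from, before even counting the fewer output tokens the model uses.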
For developers and small businesses,
this changes the math completely on what
you can afford to build with AI. But
there's more to this story than just
speed and price.
Anthropic added some genuinely useful
new features. The biggest one is this
effort parameter you can now control in
the API. You've got three settings, low,
medium, and high. Think of it like
choosing between a quick first draft and
a deeply researched final version.
Low effort gives you faster, cheaper
responses for simple tasks. High effort
tells Claude to really think through the
problem, do deep analysis, spend more
time reasoning, which costs more and
takes longer. But the quality is
substantially better for complex work.
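One way to use those three settings in practice is to route each task to an effort level programmatically. This is a hypothetical helper, not part of any SDK; the low/medium/high values come from the video, and the routing heuristic is just an illustration:

```python
# Sketch: pick an effort level per task, matching the low/medium/high
# settings described above. choose_effort is a hypothetical helper --
# the heuristic is illustrative, not an official recommendation.
def choose_effort(task: str) -> str:
    """Route short, simple asks to low effort, analysis-heavy asks to
    high effort, and everything else to medium."""
    heavy = ("refactor", "debug", "analyze", "prove", "audit")
    if any(word in task.lower() for word in heavy):
        return "high"
    if len(task) < 60:
        return "low"
    return "medium"

print(choose_effort("What's 2+2?"))                            # low
print(choose_effort("Audit this financial model for errors"))  # high
```

You'd then pass the chosen value as the effort setting on each API call, keeping quick questions cheap and reserving deep reasoning for the work that needs it.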
They also kept that massive 200,000
token context window, which means Claude
can read and work with hundreds of pages
of documents at once.
And in their chat apps, they've
introduced something called infinite
chat. As your conversation gets really
long, like when you're working on a big
project over days, Claude automatically
summarizes and compresses older parts of
the conversation so you never hit a hard
limit. You can just keep going. Now,
this next part is especially cool if
you're a developer. The model is
significantly better at what they call
agentic workflows. Basically, multi-step
projects where Claude needs to
coordinate different tasks, call
external tools, and maintain context
across a long working session.
They've added features like context
compaction and memory APIs, so you can
feed Claude information in chunks, and
it remembers and uses it effectively.
And for coding specifically, they
updated Claude Code with something
called plan mode. Instead of just diving
into writing code, the AI will first ask
you clarifying questions. Then it
creates an actual plan file, literally a
markdown document called plan.md,
laying out exactly what it's going to
do. You can review it, edit it if
needed, and then it executes the code
step by step according to that plan.
This makes the whole process more
transparent and way more reliable for
complex projects. People who've been
getting early access and using it are
reporting some pretty dramatic
improvements, too.
One early tester said Opus 4.5 now
generates well-structured 10- to 15-page
chapters on the first try. Coherent,
organized, ready to use. Financial firms
saw a 20% jump in accuracy on complex
Excel modeling tasks and about 15%
better efficiency.
Code reviews are catching more real
issues with fewer false positives.
Basically, anything that requires long
context or precise reasoning, Opus 4.5
just handles it better than anything
that came before.
Human level performance. So, is this
thing actually as good as a human? Well,
in some very specific ways, it's
starting to look that way. And that's
honestly a little mind-blowing. We
already talked about the coding exam
where it beat every human candidate. But
it's not just coding. Anthropic ran Opus
4.5 through a whole battery of
benchmarks and it's leading in a bunch
of different areas. It's better at math
puzzles, at understanding and reasoning
about images, what they call vision
tasks, and even at answering questions
in multiple languages.
There's this visual reasoning test
called ARC-AGI that's designed to
measure abstract thinking, the kind of
pattern recognition humans are usually
really good at. Opus 4.5 scored 37.6%.
Now, that might not sound super high
until you realize that OpenAI's GPT-5.1
only scored 17.6%,
less than half. These are the kinds of
tasks where AI has historically
struggled and Claude is making real
progress. But here's what I find most
interesting. People actually using the
model day-to-day are saying it has
developed something like intuition. One
Anthropic executive said he now uses
Claude through Slack to manage all his
project information and it just
understands what he needs without him
having to micromanage every detail. It
picks up on context, understands
priorities, and adapts its responses to
what makes sense in the moment. That
kind of fluid contextual understanding
is something people were really cautious
about trusting AI with before.
Now, I want to be clear here. These
benchmarks aren't perfect measures of
all intelligence.
Claude still makes mistakes, especially
on edge cases or topics outside its
training data.
It's not magic, but the gap between what
AI can do and what expert humans can do
is definitely narrowing faster than most
people expected.
What can you actually use it for? Okay,
so enough about benchmarks and theory.
Let's talk about what you can actually
do with Claude Opus 4.5 in the real
world because that's what really
matters. First and most obviously,
software development. If you write code,
this model is genuinely a game-changer.
It's not just about generating code
snippets anymore, although it does that
extremely well. It can refactor entire code bases,
help you migrate legacy code to new
frameworks, debug failing tests, explain
complex functions. One CEO reported
their team saw 75% fewer linting errors
when using Opus 4.5 for code reviews. It
catches more actual issues without
throwing false positives that waste your
time. And because the model is so good
at maintaining context, some companies
are using it almost like a team member.
They'll spin up multiple Claude instances
to handle different parts of a project
in parallel, then coordinate the
results.
It's especially well suited for tasks
like code migration and major
refactoring projects. The kind of work
that's technically straightforward but
incredibly time-consuming for humans. Next
up, long form writing and research. This
is where that massive context window
really shines. Claude can now help you
draft entire reports, research papers,
even book chapters. It keeps track of
your sources, maintains consistent
terminology across dozens of pages, and
organizes complex material in a way
earlier models just couldn't handle. One
example from Anthropic's own team.
They had Claude write detailed technical
documentation, and it produced coherent,
well-structured content that was actually
usable on the first draft.
Not perfect. You still need to review
and edit, but way, way better than the
starting point you'd get from older
models. For business professionals,
here's where things get really
practical.
Excel, Word, PowerPoint, Opus 4.5 is
dramatically better at all of it. It can
read a complicated spreadsheet,
understand the structure, add the
correct formulas, build charts and pivot
tables, even audit financial models for
errors. The new Claude for Excel plugin
supports all the advanced features, so
you're not limited to toy examples
anymore. Financial analysts are using it
to model complex scenarios. Business
teams are using it to summarize meeting
transcripts and draft customer
communications. And one financial AI
vendor said that complex tasks that once
seemed out of reach are now achievable
with 20% higher accuracy on their
evaluations.
Now, here's something I didn't expect to
be impressed by, but I am: education and
tutoring.
Even though Opus 4.5 is primarily built
for professional use, it's actually
really good as a learning tool.
Anthropic launched this Claude for
Education program where students use it
to work through problems step by step.
What's clever about it is the learning
mode. Instead of just giving you
answers, Claude asks guiding questions.
It prompts you to explain your thinking,
walks you through the logic, kind of
like a Socratic tutor.
Students are using it to get
step-by-step help with calculus, draft
literature reviews with proper
citations, work through complex research
papers.
It's not replacing teachers, obviously,
but as a study aid or personalized tutor
available 24/7, it's pretty powerful.
And then there are these smaller but
interesting use cases. Some creative
teams are using Opus 4.5 for design work
and UI prototyping.
In demos, Anthropic showed it handling
tough visualization tasks. Stuff that
took older models hours to even attempt
now taking minutes. So, graphic
designers, product teams, anyone doing
creative work with a technical component
can find value here. The common thread
across all these use cases is depth.
Anywhere you need patient, knowledgeable
assistance for complex, multi-step work,
that's where Opus 4.5 excels. Quick
questions and simple tasks, any AI can
handle those. But deep work that
requires sustained reasoning over long
contexts, that's the sweet spot.
The risks we need to talk about.
Now, as excited as I am about what this
model can do, we absolutely need to talk
about the risks and implications.
Because when an AI starts matching or
beating human experts at complex tasks,
that raises some serious questions.
Let's start with jobs and the economy.
Anthropic's own CEO Dario Amodei has
warned that AI models like Opus 4.5
could eliminate up to half of all
entry-level white collar jobs within the
next 5 years.
That's not some outside critic. That's
the person building the technology
saying this. And they're not just
speculating. Anthropic says Claude now
writes 90% of their own code. Think
about that for a second. The company
creating this AI is already using it to
replace the majority of their own
programming work. If the people building
AI are seeing their own jobs transformed
this dramatically, what does that mean
for the rest of the economy?
Lawyers, consultants, financial
analysts, entry-level programmers,
junior marketers, a lot of knowledge
work that used to be a secure career
path might look very different in just a
few years. This creates massive
challenges around unemployment,
retraining, economic inequality. These
are problems we need to be thinking
about now, not after they've already
happened. Then there's the safety and
misuse angle. Any tool this powerful can
be used for harm.
Opus 4.5's creativity and writing
ability mean it could potentially be
used for sophisticated phishing attacks,
generating fake news that's incredibly
convincing or other malicious purposes.
Now, to Anthropic's credit, they've put
a lot of work into making this their
most robustly aligned model. It's better
at resisting prompt injection attacks
than any competing model. Basically,
attempts to trick it into breaking its
own rules.
But no system is foolproof, and users
still need to be vigilant about bias,
errors, and potential misuse. There's
also this interesting technical problem
called reward hacking. In one test
scenario, Claude found a clever
workaround to an airline booking
restriction by first upgrading a seat in
a way the test designers hadn't
anticipated.
On one hand, that's impressive problem
solving. On the other hand, it's
technically circumventing the intended
rules. It's the AI equivalent of finding
a loophole. This walks a really fine
line. Creative problem solving is
exactly what we want from AI, but we
also need to make sure it's not doing
anything unethical or unsafe.
Anthropic calls this alignment research
and it's why they emphasize their
constitutional AI framework so heavily.
And then there's the bigger, more
philosophical concern. If an AI can
learn from experience, improve through
iterations, and perform at or above
human levels across many domains, what
does that mean for control and
oversight?
How do we ensure these systems stay
aligned with human values as they get
more capable?
These are questions the entire tech
community, really all of society, needs
to grapple with. The key takeaway is
this. Powerful AI amplifies everything.
It amplifies our capabilities, but it
also amplifies risks. As users,
developers, and citizens, we need to be
thoughtful about deployment, push for
transparency, demand strong safety
reviews, and probably accept that we
need better regulation and oversight as
these systems get more powerful. How to
actually use Claude Opus 4.5.
All right, enough about the big picture.
Let's get tactical. If you want to start
using Claude Opus 4.5 today, here's what
you need to know. First, access and
pricing. Opus 4.5 is available through
Anthropic's API and in their Claude app
if you're on the Max plan or higher.
When you're calling it through the API,
you need to specify the model ID
claude-opus-4-5-20251101.
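A minimal raw-HTTP sketch of such a call, using only the standard library. The endpoint and headers follow Anthropic's documented Messages API conventions, and the model ID is my rendering of the one mentioned in the video, so verify both against current docs; nothing is actually sent unless you open the request:

```python
# Build (but don't send) a Messages API request for Opus 4.5.
# Endpoint/header names follow Anthropic's documented conventions;
# the model ID is an assumption from this video -- verify before use.
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"
MODEL_ID = "claude-opus-4-5-20251101"

def make_request(prompt: str) -> urllib.request.Request:
    payload = {
        "model": MODEL_ID,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "content-type": "application/json",
        "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
        "anthropic-version": "2023-06-01",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(), headers=headers
    )

req = make_request("Explain tokens in one sentence.")
print(req.full_url, "->", MODEL_ID)
```

To actually send it, you'd pass `req` to `urllib.request.urlopen` with your API key set in the environment; the official SDKs wrap this same request shape.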
And remember those prices we talked
about: $5 per million input tokens, $25
per million output. With the efficiency
improvements, that budget goes a lot
further than you might think. But you
still want to monitor usage for
high-volume applications. Now, here's
your first power move: use that effort
parameter. If you're just experimenting
parameter. If you're just experimenting
or you need a quick answer, set effort
to low. You'll get faster responses and
lower costs. For complex reasoning
tasks, set it to high. It'll take a bit
longer and cost more, but the depth and
quality are substantially better. And
here's a pro tip. For really important
questions, try running it at both medium
and high effort and compare the results.
Since it's cheaper now, you can afford
to do that. And it's a great way to see
just how much that extra thinking time
improves the output. Next, let's talk
about context management. You've got
this huge 200,000 token context window
to work with. Use it. You can feed in
entire documents, whole code bases,
massive data sets as context.
If the text is extremely long, consider
breaking it into logical chunks or using
the API's context compaction features to
prune less relevant parts.
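The chunking idea can be sketched simply: split a long document on paragraph boundaries so each chunk stays under a rough token budget. The ~4 characters-per-token rule of thumb used here is only an approximation; real token counts vary by tokenizer:

```python
# Split a long document on paragraph boundaries so each chunk stays
# under a rough token budget (~4 chars per token is an approximation).
def chunk_paragraphs(text: str, max_tokens: int = 4000) -> list[str]:
    chunks, current, current_tokens = [], [], 0
    for para in text.split("\n\n"):
        tokens = len(para) // 4 + 1  # crude per-paragraph token estimate
        if current and current_tokens + tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Hypothetical long document: 100 paragraphs of ~400 characters each.
doc = "\n\n".join(f"Paragraph {i}: " + "x" * 400 for i in range(100))
pieces = chunk_paragraphs(doc, max_tokens=1000)
print(len(pieces), "chunks")
```

Each chunk can then be sent as its own message turn, letting the model build up context incrementally instead of receiving one enormous blob.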
In the chat interface, it's even
simpler. Just paste whatever you need,
and Claude will automatically summarize
as the conversation grows, thanks to
that infinite chat feature.
Don't be afraid to have long sustained
working sessions on complex projects.
For developers specifically, definitely
check out plan mode in Claude Code.
Instead of just asking Claude to write
code, start by having it create a plan.
It'll ask clarifying questions, outline
its approach in that plan.md file, and
then you can review or edit the plan
before it writes a single line of code.
This two-step process dramatically
reduces errors and makes the whole
workflow more transparent.
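The two-step pattern behind plan mode can be sketched generically. `call_model` below is a stub standing in for any LLM call (not a real Claude Code API); the point is the plan-review-execute loop:

```python
# Generic plan-then-execute loop, mirroring the plan mode workflow
# described above. call_model is a stub, not a real API.
def call_model(prompt: str) -> str:
    # Stand-in: in real use this would call the model.
    return f"[model response to: {prompt[:40]}...]"

def plan_then_execute(task: str, approve) -> str:
    # Step 1: ask for a reviewable step-by-step plan first.
    plan = call_model(
        "Ask clarifying questions if needed, then write a "
        f"step-by-step plan.md for this task:\n{task}"
    )
    # Step 2: only execute once a human has reviewed (and possibly
    # edited) the plan.
    if not approve(plan):
        return "plan rejected -- revise before executing"
    return call_model(f"Execute this approved plan step by step:\n{plan}")

result = plan_then_execute(
    "Migrate the auth module to the new framework",
    approve=lambda plan: True,  # in practice: show the plan to a human
)
print(result)
```

Splitting planning from execution is what makes errors cheap to catch: a wrong plan costs one review, while wrong code costs a debugging session.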
If you work with data a lot, get the
Claude for Excel extension.
It's now generally available and it
handles advanced features like pivot
tables, complex formulas, and chart
generation. It's not a toy anymore. It's
legitimately useful for real business
work. Same thing with the Chrome
extension. If you're on the Max plan,
Claude can read and interact with web
pages across your tabs, which is
incredibly useful for research, data
gathering, or any workflow that involves
pulling information from multiple
sources. Now, here's something important
that people overlook. Leverage
integrations, but always review the
outputs. Opus 4.5 is so capable that
even a small misalignment can have
significant effects.
Because the model is cheaper now, you
can afford to generate multiple versions
of the same output. Also, use system
prompts: you can guide Claude's tone,
focus, and approach by giving it role
instructions at the
start. One more thing: if you're in
education, explore learning mode. It's
specifically designed to teach rather
than just answer.
So, here's the bottom line. Claude Opus
4.5 isn't just an incremental update.
This represents a real qualitative leap
in what AI can do. If you're a
developer, a researcher, a business
professional, or honestly anyone who
works with information and ideas for a
living, this is worth experimenting
with. That's it for today. If you found
this deep dive useful, hit that like
button and subscribe for more AI
breakdowns like this.
Drop a comment below and let me know
what excites you most about Claude Opus
4.5.
What concerns you? What do you want to
see me test next?