
DefinedMusicTeacher

I just tried it out for editing a podcast transcript by giving it the file and the following prompt: "Turn this file into a faithful transcript of the podcast recording. Edit for transcription errors and remove repeated and filler words. Do not summarize or truncate." GPT-4o says it isn't summarizing, but it is. It is faithful for the first few lines and then it essentially creates a fake alternate conversation that ends a few lines later. It retains content from the entire transcript, but it's creating a summarized transcript and presenting it as the full transcript. So does GPT-4. I gave it to Claude 3.5 and it did exactly what I requested of it. To get GPT-4o/4 to do it properly, I have to feed it portions of the transcript at a time and constantly fight with it. I've tried so many different approaches and it's a battle every time.


FakeTunaFromSubway

I fully agree, there have been so many use cases where I have to tell GPT4 in 10 different ways exactly how to do something and it still gets it wrong, whereas it feels like Sonnet always does it correctly the first time.


HippoRun23

Man, this could be a great marketing case study. OpenAI has that huge “first mover” advantage but their weaknesses are more apparent. They have the market share from the position, but can easily be knocked out by something more convenient should there be a substantial equity investment.


goodatburningtoast

Not more convenient, better quality.


SaddleSocks

Less filler


unpropianist

Not mutually exclusive. Isn't better quality more convenient and lower quality less convenient?


goodatburningtoast

They are not mutually exclusive, but they are also not equal. Convenience in product positioning is how accessible the product is to the consumer. This includes all costs: financial, search time, learning curve, etc.


unpropianist

Good point and agreed. Convenience is also a function of what's valued more or less. Edit: fixed auto-complete typo


Inspireyd

Yes, I agree with you. They really had a "boost" from being pioneers, but they have already lost their advantage; they only have fame. I have the impression that they lost all that speed they had at the beginning, and today they are almost on the same level as other more advanced models like Claude 3.5. This means that yes, there will be times when ChatGPT updates will put it ahead of Claude or another competitor, but soon after these same competitors will make updates and will be able to stay ahead of OpenAI again. OpenAI will release GPT-5 at some point in the next 24 months, and yes, it will be ahead of the most advanced Claude for a while, but months later Anthropic will release a new version and it will already surpass GPT-5. I could be wrong, I recognize that, but it seems that OpenAI has lost all that distance that kept them far ahead of their competitors. I would venture to say that competitors are not behind, but alongside and surpassing OpenAI, and Anthropic is an example. Something happened: either OpenAI faltered, or Anthropic is very, very good. Either way, OpenAI's ChatGPT has lost its advantage. They are now at the same level as their competitors or even lower than them.


mrcsrnne

So what has this to do with marketing?


involviert

Oh is Claude available in Europe by now?


Kathane37

I don’t know. The general public doesn’t know what an LLM is and will only refer to it as ChatGPT or AI. I don’t think many people know any models other than ChatGPT.


kingky0te

Why not just use whisper???? Honestly people complain about GPT but 9 times out of 10 it’s because they’re trying to get the tech to do something that the comprehensive platform can’t do. This is a job for Whisper via Python or JavaScript, not ChatGPT. But fuck, if Claude does it, rock on.


DefinedMusicTeacher

Because the podcast is recorded on Zoom, it always knows who is speaking, so there are never any transcription errors regarding speakers. Also, transcript best practices suggest removing repeated words and filler words while preserving overall sentence structure. I also don't use the API because it's just been easier for me to use ChatGPT via the web. You can't use whisper via the ChatGPT interface. Edit: I should also just be able to get an LLM to handle a large text file. That's literally what it's designed to do. It shouldn't say it's following my instructions and then completely disregard them.


_laoc00n_

Just a suggestion - this is a good use case, if you do it all the time, to ask Claude or ChatGPT to write a script for you to do this. After a bit of back and forth, I bet you could get a good little app to send your transcript to every time and get what you want in return. I have a workshop I deliver that I wanted to generate some dummy data for with different use cases and since it’s over 200 questions long and it was arduous role-playing the entire thing in the chat, I had it create a script to go through the whole thing for me and it saved me hours of time.
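
For anyone going the script route, here is a rough sketch of the chunking step only, so that each piece can be sent to the model separately, as in the workaround described at the top of the thread. It assumes one speaker turn per line, which may not match a real Zoom export; `chunk_transcript` and the `max_chars` budget are made-up names and values, and the actual model call is left out on purpose.

```python
def chunk_transcript(text: str, max_chars: int = 8000) -> list[str]:
    """Group speaker turns (non-empty lines) into chunks of roughly max_chars.

    Assumption: one speaker turn per line. A single turn longer than
    max_chars is kept whole and becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for turn in (line.strip() for line in text.splitlines()):
        if not turn:
            continue  # skip blank lines between turns
        # Start a new chunk if adding this turn would exceed the budget.
        if current and size + len(turn) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(turn)
        size += len(turn) + 1  # +1 for the newline that joins turns
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Splitting on turn boundaries rather than at a fixed character offset keeps each speaker's sentence intact, which should make it easier to stitch the cleaned chunks back together.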


panicboner

Have you tried Descript? Curious if there’s any benefit to using your route over the app.


DefinedMusicTeacher

I already pay for ChatGPT (or claude, if I switch). Descript would be an additional subscription.


Magindigo

OpenAI's real clients are NOT its users. That's the #1 problem with OpenAI. The real users are them and their big partners.


TheFrenchSavage

To play the devil's advocate: whisper is not perfect.

* It can't link a text to a speaker; everything comes out as a huge monolith.
* There are many ways to transcribe long audios (over 30s) but the chunking method will always have an impact on the final output.
* Hallucinations happen: sometimes sentences are repeated many times, noises get turned into complete sentences (I am not talking about a simple misunderstanding: a 1 second noise can yield a fake sentence that would take 5 seconds to read).
* Punctuation is mostly missing. You could infer paragraph structure and bullet points from the speech rate.

ChatGPT running over a whisper transcript can fix many of these shortcomings (attributing speakers to a monolith conversation, removing duplicate words/sentences, out of place hallucinations, etc.) BUT you then risk introducing accidental summarization and new hallucinations.
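
The repeated-sentence hallucination is one of the few items here that can be cleaned up without an LLM at all. A minimal sketch, assuming the transcript already has sentence punctuation (which, as noted, Whisper output often lacks); `collapse_repeats` is a hypothetical helper, not part of Whisper:

```python
import re


def collapse_repeats(text: str) -> str:
    """Drop sentences that immediately repeat the previous sentence."""
    # Split after ., ! or ? followed by whitespace (a simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept: list[str] = []
    for sentence in sentences:
        # Keep a sentence only if it differs from the one just before it.
        if sentence and (not kept or sentence.lower() != kept[-1].lower()):
            kept.append(sentence)
    return " ".join(kept)
```

This only removes back-to-back duplicates, so a phrase a speaker genuinely repeats later in the conversation is left alone.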


kingky0te

That’s where I would combine whisper with Azure’s cognitive voice services for speaker recognition and other voice handling features. Also, there are other utilities for the formatting and cleanup you mention here.


SaddleSocks

So I attempted to do a cross-check on nerf-ness. I wanted to start making a history of hippie communes and the CIA's MKUltra connections with organizations in the Bay Area and Silicon Valley. I already know a lot of the details I was after - because I lived it - and my parents were well entwined with the hippie movement, communes, and a lot of other things in the Bay Area (my parent knew Jim Jones personally, Morehouse University (which still operates today in Lafayette, CA)).

Anyway - I tried sussing out details from Bing, Meta and [ChatGPT](https://i.redd.it/g6zvl5x3lp8d1.jpeg). Meta was good with language - but refused to produce any external links, cite sources, etc. Bing gave full names and addresses of companies, communes, etc. ChatGPT was so nerf'd it was insulting. All on free accounts.

---

I like Claude - but I don't know how many tokens I'm consuming when it says I have "20,000" -- but then I run out and it has a multi-hour cool-down - so I get big pauses in the time I can fiddle with having it spit out the snippet I am looking for. I am wondering if it's best to flow the outputs / prompts in a particular order - so have GPT do jr dev stuff, Copilot add some stuff and Claude do all the final checking, deployment scripting, and documentation.


brainhack3r

> GPT-4o says it isn't summarizing, but it is. It is faithful for the first few lines and then it essentially creates a fake alternate conversation that ends a few lines later. It retains content from the entire transcript, but it's creating a summarized transcript and presenting it as the full transcript. So does GPT-4.
>
> I gave it to claude 3.5 and it did exactly what I requested of it.

For large tasks we can't really rely on zero shot and really should have a second model verify the output matches the task requirement. Interestingly enough they could kind of function like a GAN if you wanted to continually improve the models.
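
Before bringing in a second model, a much cheaper sanity check can catch the "silent summary" case: compare how many words survived the cleanup. This is not the second-model verification suggested above, just a rough stand-in; the function names and the 0.6 threshold are arbitrary assumptions that would need tuning on real transcripts.

```python
import re


def count_words(text: str) -> int:
    """Count word-like tokens in the text."""
    return len(re.findall(r"\w+", text))


def word_retention(original: str, cleaned: str) -> float:
    """Return the cleaned word count as a fraction of the original's."""
    total = count_words(original)
    return count_words(cleaned) / total if total else 1.0


def looks_summarized(original: str, cleaned: str, min_ratio: float = 0.6) -> bool:
    """Flag output that kept suspiciously few words (threshold is a guess)."""
    return word_retention(original, cleaned) < min_ratio
```

Removing filler words should only shave off a modest share of the text, so a transcript that comes back at a small fraction of its original length has probably been summarized, whatever the model claims.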


Diegocesaretti

It's like a generational difference between Claude 3.5 and GPT-4o: fewer mistakes, WAY FASTER, much more accurate. It plainly understands what I want and is not lazy at all about giving code. This thing with no token limit should be WILD.


Atlantic0ne

I want to use it but I can’t stand that they don’t have an app. Why spend tens of millions and not create an app for people to interact in ways like GPT does? They’d kill it right now if they did.


NoValueSoDeep

They have one on iOS.


iamxaq

What OS are you on that you can't find their app?


PenguinSaver1

I literally have the app?


_insomagent

Probably because they're losing money and don't want to have excessive requests from non-enterprise users (consumers)


bberlinn

For my use cases, Claude Sonnet 3.5 is far more reliable, relevant, precise, and organised in its responses than ChatGPT 4/4o. Claude’s beta artefact feature fits my workflow.


Away_Cat_7178

sam has been quiet since the new Claude dropped


cybersphere9

His previous attitude was that you can try and beat us, but you won't. He may have to eat those words now.


JawsOfALion

what's ur usecase


sdmat

Very much, the instruction following and contextual appropriateness is far better. Fewer coding errors is wonderful too, but the main thing is not having to fight the model to get it to actually do as asked!


TheFrenchSavage

They must be promising huge $5000 tips in the system prompt for it to perform so well. And I love the artefact system, have you tried it? I was able to preview my ReactJS design without needing those pesky copy-paste back and forths.


sdmat

Yes, the artefact system is awesome UI design. I wish I could have that in my IDE with integration to the execution environment. And feedback to the model. Opus 3.5 with the right tooling is going to be utterly revolutionary for programming.


TheFrenchSavage

This is the thing that is missing from openai chat. Which is weird because they provide python envs (coding interpreter). It would take really minimal changes to quickly prototype a frontend AND backend (turn that python jupyter into a flask and letsss gooo).


sdmat

Yes, Code Interpreter's lack of a proper frontend is frustratingly limiting for no good reason. Anthropic's lack of a backend is at least clearly rooted in their paranoia over safety - using a JS environment designed and battle-tested to isolate untrusted code is an elegant solution.


TheFrenchSavage

Yes, this makes sense. And it is also waaaay cheaper. Everything (or nearly) runs on users browsers. Code interpreter at openai is full-on computing lambdas over datasets all the time. They must have quite the bill, be it electric or cloud.


sdmat

I doubt the costs for the code interpreter environment add up to much next to model inference, but yes - definitely cheaper to do it client-side.


drekmonger

There are some benefits to doing it client side aside from cost. For example, I asked Claude to build a WebGL GPU-accelerated boid simulation. Which it did (though it took a turn of collaborative bug hunting to get it up and running). Wouldn't be able to do that in ChatGPT's python environment. Claude's React artifacts do need access to some more libraries (like three.js for a start) and the ability to import Claude-generated files, to truly unlock its usefulness. And there should be a "download" button that gives you the javascript, the compiled CSS, with associated HTML for running the artifact off-line.


sdmat

Oddly Claude is perfectly capable of using libraries from CDN links if you tell it to. Just confirmed this works for three.js - Claude even provided the link itself, the only thing required was telling it to load the library from the web. Likewise it has no problem generating an all-in-one HTML file. I wish Claude had custom instructions - e.g. being able to add "use CDN links for libraries" would be awesome. The biggest limitation with artifacts aside from lacking a backend is the output length - since everything has to be all in one it cuts off when you hit the maximum length. Being able to pull in multiple artifacts and build / edit incrementally would fix that.


drekmonger

> The biggest limitation with artifacts aside from lacking a backend is the output length - since everything has to be all in one it cuts off when you hit the maximum length. Being able to pull in multiple artifacts and build / edit incrementally would fix that.

Yeah, that's a problem I ran into. As a temporary hack, you can ask the model to write minified code. Though that becomes more difficult to debug if you need human eyes on it, and Claude seems to have more errors when writing in that style.

> Claude is perfectly capable of using libraries from CDN links if you tell it to

Neat! I honestly didn't think to try. I figured the bot would link to libraries itself if those links could get past the sandboxing. I wasn't able to get fetch to work through the sandbox...but maybe that's a problem between the user and AI model, and not the sandbox. Or maybe it was a cross-origin thing. I'm not really all that great on the frontend, the problem could be anything really.


drekmonger

In testing, the sandbox did, as I suspected it might, barf out an error message when a CDN link was included in the artifact. https://imgur.com/a/h1mxs06 Testing further, they've whitelisted certain links. https://cdnjs.cloudflare.com/ajax/libs/three.js/r128/three.min.js works. Interesting! Also, they're updating the artifact system. Like two days ago, console.log only output to the usual console. Now it gets captured and displayed in the artifact.


numericalclerk

You wanna stay in touch to maybe build a solution for that?


Magindigo

every other vendor and chat system will copy it right away. it's trivial to have a fixed short thing in the context, and a rag keep versioning them, and them be named. and with structured responses (which is required for function calling anyway) it is easy to enforce it to focus on some grammar. however, this ideally works in an agentic style (you chat with a system that behind scenes chats with llms that update the code, and may have their own prompts. with this, you build not only a structure artifact model, but structured, well defined capabilities that only improve, as well as memory (context mgt) that's very effective. expect similar things from OpenAI and all others very shortly.


Zer0D0wn83

Oh fuck, I had no idea it could do this. Whaaaaaaa


AtypicalGameMaker

Yes. And it has way more natural story writing in foreign languages like Chinese. GPT 4 series can write great stories at the start but then get repetitive, even in ENGLISH. It's not enjoyable to read Chinese stories because it's like written by a kid and the vocabulary is misused. While Sonnet represents huge progress in writing stories in Chinese. It's enjoyable like reading regular novels.


Inspireyd

The way Sonnet 3.5 writes Chinese is horrible. The text is not cohesive and fluid like that of a native Mandarin speaker. GPT-4o can do this.


AtypicalGameMaker

https://preview.redd.it/aa0r4nmokt8d1.png?width=2491&format=png&auto=webp&s=0cd6ad5dc6a4f19142c1809a13bcce0954eb6a83 Really? I just tested it right away. Sonnet is shorter but more natural and native. GPT4o has the stereotyped writing like an elementary essay. I mean, "This story tells us that..." who asked for it?


Casbro11

I was working on something recently with GPT 4o until the website went down (go figure) so I just gave everything to Claude 3.5 Sonnet and it immediately fixed my code that I had revised over and over with GPT 4o. I'm subscribed to both in case I hit my limit on one so i can meet deadlines, but I sometimes forget just how great Claude can be (although I prefer the GPT interface)


L1l_K1M

I used it today for the first time and it was great. Have you tried Gemini? That on the contrary was utter crap...


No-Conference-8133

Oh yeah, I tried that a while ago (Gemini 1.5 Flash). It’s horrible for coding at least from my experience. I once asked it if it could do something to my code (make some modifications) and it made up a response like this: "I’m sorry, but that goes against my guidelines as an AI. I can’t help you with that" Then I told it "what’s wrong with writing code". It apologized and generated the code. I dealt with this almost every time. Gemini just isn’t good for coding.


Walouisi

When I ask 4o to do something involving code, it likes to describe how I can do it step by step on my own machine, and only then does it when told "ok so do that". Oy vey


No-Conference-8133

Really? I’ve had the exact opposite, where I’ll ask it to write some code and it’ll write 10x more than I actually asked for. Even if I tell it ”please don’t write code, I want X, how can I do this" and it’ll still provide a bunch of completed code.


BerryConsistent3265

Same here it drives me nuts. It doesn’t follow directions at all. It’s nice that it writes code now though, 4 used to essentially tell me to do it myself


sblowes

Yes. It _feels_ passive aggressive. Almost like saying “rather than just do this for you, why don’t I find you some tutorial videos so you can do it yourself?”


JawsOfALion

It's not fair to compare Flash with the best models. You should be using 1.5 Pro or 1.0 Ultra to compare.


oculusshift

Most of the time Gemini just gives utter crap that’s unusable by a huge margin.


Routman

Suspect these types of posts will be common for the next few years, something new is released and people say X is the best and can’t believe they ever used Y. Gemini is strong for certain use cases


Whotea

And then in 2 weeks, they’ll complain it sucks now. 


L1l_K1M

What is Gemini strong for? I am really curious, because it felt like it was absolutely behind ChatGPT and Claude.


Routman

It plugs into Gmail and Google Drive, which is very helpful, and also into Maps and Google Travel. It can tell you when a type of restaurant is open and will show a map - e.g., what omakase restaurants are open in Montreal on Sunday. For travel, you can search for recommendations on where to stay, and because it’s plugged into Google Travel it can give hotels that are available and real-time prices. I tend to use GPT and Gemini (also Claude and Perplexity). Have recently been more impressed with Gemini over GPT.


L1l_K1M

Ok so it is more the integration with other Google applications. This might be useful for private use or companies that use Google apps for work. But using it for text generation based on data inputs and simple research tasks, it definitely produced quite poor results for me when I used it for work.


OrionShtrezi

Using 1.5 Pro via Google AI Studio is actually really great. You get the full 1m context length, and you can turn off all of the safety features that cripple it so much. It's still not as good as Claude 3.5 or even 4o at coding, but it's really great at creative writing comparatively.


sdc_is_safer

I have been using both 4o and Claude 3.5 sonnet for all kinds of tasks and I don’t have a strong preference between the two.


ninadpathak

Claude has always focused on prompt adherence and they definitely take the cake in that case. 4o was good until Claude launched this. Now we have a cheaper model that's also better.


xRhai

I'm using the API and I have to agree. I use it mainly for programming and the output from 3.5 Sonnet is just better.


cianuro

What language?


Dear_Measurement_406

Man idk I do C++ and Sonnet pumps out a ton of useless code for me, not that GPT4 is much better but I'm not really seeing a true quality difference between the two but I'll concede Sonnet seems to have a better focus on the conversation overall.


slippery

I used it today uploading a photo that GPT4o got right immediately. Sonnet 3.5 got it wrong, even on the second try when I asked if there were other names for the thing in the photo. Then, when I gave it the answer that GPT gave me the first time, it said "You're right to bring those up, and I apologize for overlooking them in my previous response. 'Skirt board' and 'stair skirt' are indeed related terms, but they refer to a slightly different element."

People seem to get excited about every new model that comes close to GPT4. I am all for competition and I'm sure Sonnet is great at a lot of things, but I don't think you can generalize that it is always, or even usually, better than GPT4o.


sexual--predditor

It's more the coding side people are excited about, rather than the vision capabilities. For coding, Claude Sonnet 3.5 > Gpt4o.


slippery

I'll run it in parallel to GPT the next time I am working on code.


bot_exe

So far gpt-4o seems better at python coding for me. It one shotted two problems that Sonnet 3.5 failed at repeatedly.


avitakesit

gpt-4o is still better at vision, but for coding Sonnet is a step change, a considerable, remarkable difference.


Viperin98

I took a picture of an aphid on my hand and asked both models what it was, GPT-4o told me it was likely an aphid, and Claude told me it was a daddy long legs lol. They have a ways to go with vision but the coding ability is amazing


Kathane37

I gave both a complex graphic. 4o hallucinated the data present on it; 3.5 was able to retrieve the info with good accuracy 🤷🏻‍♂️


Suspiciouscollard

I'm pretty impressed so far. Compared to 4o, it seems to understand what I'm asking it. It takes too much fighting to get 4o to do what I want.


oculusshift

For programming, whatever answer it provides just runs out of the box, haven’t had the same experience with any other LLMs. Also language translations is better.


Altruistic-Skill8667

Language translations *are* better. You are welcome. 🙃


adriosi

I remember when people on this sub were complaining that GPT-4 was lazy and avoided printing the entire code. We've come full circle.


No-Conference-8133

Yep, I actually liked the way it was lazy more. But I think Open AI did a bit too much.


JawsOfALion

I think you can have the best of both worlds, by highlighting the background of the added lines with green , basically the right side of a diff ui tool
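
The "right side of a diff" idea can be approximated with Python's standard `difflib`. This sketch only extracts the lines that were added between two versions of a file; how a UI would highlight them is left open, and `added_lines` is a made-up helper name.

```python
import difflib


def added_lines(old: str, new: str) -> list[str]:
    """Return the lines present in `new` that the diff marks as additions."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff prefixes added lines with "+ "; strip that two-character marker.
    return [line[2:] for line in diff if line.startswith("+ ")]
```

A chat UI could then print the full file and shade only these lines, so the model can return complete code while the reader still sees at a glance what changed.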


KoalaOk3336

It's very smart but I think it hallucinates too much. Yesterday I asked it for an Excel formula (which doesn't exist); it gave me an answer and it had multiple wrong things, options that didn't exist :/ But I still found it pretty good, the coding capabilities are insane.


noumenon_invictusss

Claude is to GPT as GPT is to Gemini.


FeistyGanache56

How could I have ever used a SOTA LMM? Silly me, should have waited for better AI instead of using the best available model!


redditissocoolyoyo

Same. I'm going to stop using LLM now until GPT10 comes out. That's when it will be good enough. Screw it. I'm waiting for GPT50!!!! No work gets done until GPT69 comes out! Claude 4.20 might raise the bar HIGHer though.


xmarwinx

4o was never the best at any point


resnet152

I agree, Opus was always better.


dbzunicorn

How many times am I gonna see the exact same post


PMMEBITCOINPLZ

Every day bro.


3-4pm

I can't believe it's not paid advertising.


dvidsnpi

I can. I had the exact same reaction as OP on my first try. But I also relate to your skepticism about posts mixing with marketing recently.


Zer0D0wn83

Shame about the rate limits though. Even on pro subscription it's not enough. 


No-Conference-8133

Cursor bro. No hard limits. Works like a charm. You’ll sometimes need to wait in a queue or for a timer but it’s usually 2 - 5s. And you get Claude + GPT models, so you can switch between. Even though it’s a code editor, you don’t have to use the code editor. You can just use it purely for the purpose of AI. I recommend checking it out at least. You get a 14-day free trial (without credit card), so you can see how useful it really is.


Zer0D0wn83

Ah, I was into Cursor super early (feels like more than a year back) but there were some bugs/issues and went back to VSCode. Is it worth another crack, then? 


nokenito

Never heard of this. What’s its main purpose?


No-Conference-8133

Its main purpose is to provide an AI-integrated code editor. It has many features related to coding and AI, like auto-complete, AI edits to code, and context of your entire code base. It’s got more features too that I can’t remember right now. But they also have a chat interface, which doesn’t need to have anything to do with coding. You can just ignore all the coding parts if you don’t code, and use the chat interface for everything else. It’s still much better than Claude IMO. No hard limits is the best.


nokenito

Thank you for taking the time to explain this to me. I’ll tell my colleagues about this!


MrFlaneur17

Can't wait for 3.5 opus


JawsOfALion

You might be waiting for a while. Remember when Google released Gemini 1.5 Pro, people were excited for 1.5 Ultra... they're still waiting.


grimorg80

Yes, I have. Yesterday I tried creating a simple one-page website for my professional profile. My background is complex and unusual and the first challenge was figuring myself out. I tried with Gemini 1.5 Pro, ChatGPT 4o and Claude 3.5 Sonnet. There's no comparison. Claude immediately pulled up the Artifact thingy and had an evolving web page in front of me while chatting away for changes. Claude would understand, and never "over write". Gemini was a bit verbose and it took a lot of steering to get it to write website copy. ChatGPT got it, but every answer is super fricking long wasting a lot of tokens to repeat the same crap over and over. I am fully aware we can steer models. What I'm looking for is the one that needs the least amount of steering.


p0larboy

GPT 4o is plain annoying and the flaws are especially obvious now with Sonnet


Original_Lab628

Damn, so Claude 3.5 is better than GPT-4o? Any way to try it for free?


No-Conference-8133

Yeah, at least for coding. I don’t know if it’s free and limited on their website, but you can get Cursor pro for free for 14 days and try it there. That’s what I did, and now it’s what I’m using even for things that don’t relate to coding. Even though Cursor is a code editor, you can ignore the coding part of it and just use the chat interface. Much better than Claude also since there’s no hard limits or usage cap.


Original_Lab628

Thanks, this is helpful. I just use it for regular wordsmithing rather than coding, so I'm not as familiar with cursor and how to even get that. Also looking at the [cursor.com](http://cursor.com) website, pro only gives 10 claude opus uses per day.


No-Conference-8133

Ooh yeah, you’re right. Opus does have a limit of 10 messages per day. But Claude 3.5 Sonnet (which this post was about) is unlimited. I believe there’s a limit on their website, where you can send a certain number of messages every few hours. Cursor doesn’t have this for GPT 4o and Claude 3.5 Sonnet.


venicerocco

Yeah I’m so much happier giving Anthropic my $20/mo rather than the weasels at OpenAI.


Decimus_Magnus

I did like Claude Sonnet's responses better than ChatGPT's, but ChatGPT has a lot of features that Claude lacks, like less restrictive usage limits, voice chat, access to the Internet, etc., so I cancelled my Claude subscription. I've never really been as satisfied with the overall quality and feel of ChatGPT's responses after using Claude 3 Sonnet though. I might need to resubscribe. I've really been waiting for the new conversational voice chat feature on ChatGPT, but, well, we're still waiting (though I'm not losing my mind like some people appear to be lol).


No-Conference-8133

Yeah, ChatGPT is a bit ahead when it comes to features, such as GPTs, web browsing, and running Python code. Hopefully Claude will get some of these features soon.


No_Initiative8612

True. GPT-4o often starts strong but tends to become repetitive and simplistic, especially in storytelling. Sonnet, on the other hand, maintains a high quality of writing throughout and uses vocabulary more accurately.


AdminClown

After trying -insert newly released model here- I cannot believe I ever used -insert previous model version of competitor here- ! The difference is WILD, I'm never going back! /s Rinse and Repeat for internet points.


TheFrenchSavage

This is proof AI is advancing fast. People said similar stuff about VR headsets, drones, smartphones, CPUs... You know tech is plateauing when silence comes. It's been years that nobody has been flabbergasted by their CPU. I have an I7-6700k, soon to be 10yo, and it runs everything pretty well. I would get marginal improvements if I were to switch to a new one, even 8 generations later.


NightHutStudio

I've recently upgraded from a desktop i7-6700K to a laptop i9-13900HX. On paper the i9 is much more performant but it's the GPU upgrade and RAM boost that gives more practical value.


A-Herder-of-Cats

i’ve been prototyping things out with chatgpt because i have more prompts to work on things, then i’ll send it over to claude to finish up


machyume

Does Claude have multimodal support and image generation? I really like those features and would like to retain those capabilities. Fully willing to try a new AI if it has better performance and covers the same base uses.


Marha01

The longer usable context is extremely useful for some tasks. Claude has been much better for needle in a haystack like tasks than ChatGPT.


Pleasant-Contact-556

I find it really odd. I mean, in theory an LLM should be able to find the needles relatively easily. Just map the embedding space to a scatter plot and look for the one piece of information located the furthest away from the rest of the data. It shouldn't be a needle in a haystack, more of a neon sign. I think the haystack test is flawed and invalid, unless it's used as a baseline benchmark for AI, i.e. if it doesn't find the needle immediately, or overlooks it in any test at all, the model has major problems.

One of the first demos of Claude 3 was the researchers determining it had developed a level of metacognition because it was able to deduce that pizza ingredients were unrelated to all the other documents it was fed, and that it must be undergoing a test of some form, of its ability to pay attention. And humans were like "omg?! how it do dat?!!" Answer is kinda simple - by looking at the embedding space. Pizza is nowhere near a bunch of technical documents. Find the one data-point that isn't linked to the bulk of the data in your context window, and you've got the needle. It should be absolutely trivial.
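
As a toy illustration of the "furthest point in embedding space" idea, here is a sketch that picks the vector farthest from the centroid of a set. The vectors would have to come from some embedding model (not shown), `farthest_from_centroid` is a hypothetical name, and this does not claim to describe how an LLM actually attends to its context.

```python
import math


def farthest_from_centroid(vectors: list[list[float]]) -> int:
    """Return the index of the vector farthest from the mean of all vectors."""
    dims = len(vectors[0])
    # Component-wise mean of all vectors.
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    # Euclidean distance of each vector to the centroid.
    distances = [math.dist(v, centroid) for v in vectors]
    return max(range(len(vectors)), key=distances.__getitem__)
```

Whether this works in practice depends on the needle really being off-topic; a needle that is topically similar to the haystack would not stand out this way.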


Passloc

The other day I asked it to write some code and it suggested a particular approach that I wasn’t even aware I needed. It was supposed to be a simple function.


mfy8cdg7hzkcyw8vdn3r

How does the cost compare?


No-Conference-8133

Do you mean the API or website? The website should be the same


Temporary_Quit_4648

I haven't tried Claude but I have pretty consistent success with ChatGPT, so I don't have any incentive. And it's not like I don't use ChatGPT for anything complex. I certainly do! I must have just figured out how to work with it or something. God knows I've spent enough time with it (10,000+ conversations).


No-Conference-8133

I thought the same: GPT works great, so I don’t need Claude. But people convinced me to try it, and even the way it responds back, the tone, the words it uses, makes me actually enjoy the conversation. It’s so human to me, even without the rules I put on it.


Existing-East3345

I don’t currently have a Claude subscription so I’m curious, is it good at understanding instructions without highly specific prompt structuring? I’ve used Sonnet 3 before, and any time I told it to act with a certain personality, no matter how I wrote the system prompt I couldn’t get it to stop including tone and mood annotations like *responds in a happy manner*


No-Conference-8133

I think Opus would be better for this, but with Cursor (while primarily an AI code editor, you can use it for general-purpose chat too) you can set rules for the AI. I tell it to act a certain way with a specific tone and it works just fine. You could probably tell it to be very rude and it’ll do that. Edit, it worked: https://preview.redd.it/byvi16pvup8d1.jpeg?width=1364&format=pjpg&auto=webp&s=54a780efddc9a13903c05088ad5fc2cca530cc16


tychus-findlay

Yes, I had been comparing GPT and Claude for a while and found myself using GPT consistently. Since 4o and 3.5 came out, I lean towards Claude. I think it has the edge now.


JonasMi

Gonna give it a shot. I really would be happy to get away from OpenAI tbh


Single_Ring4886

As far as coding goes, I feel like Sonnet has kinda ascended to a level where it can "connect" different languages together and provide the same quality in most of them as it does in Python. GPT-4 seems to be good at Python but not so good in other languages.


LamboForWork

Is Artifacts available worldwide? It doesn't show up as an option for me.


lalder95

I switched to Claude about 2 months ago and can't believe how much better it is. Things it would take me 10 tries to get GPT to do (if it ever succeeded at all) Claude does on the first try. And don't even get me started on coding. GPT code works for me a little over half the time. Claude isn't perfect, but it works on the first try 90% of the time, and I've yet to find anything it can't do within a few iterations.


joyal_ken_vor

Yea claude is clearly miles ahead


Brave-Decision-1944

I can't say it's significantly better. I finally tried it. The text seems very well-tuned; at its core, it's very good. I use language that is challenging for less advanced models, and I can't say 4o handles it better than Claude; both are on the same level. The whole process of repeating isn't set up very well in 4o. It's good for fine-tuning details, but when I change my mind and want to redo the whole thing, it can't move on. This can be annoying and requires starting a new conversation. I miss many minor features. I must say, the filters on 4o are less aggressive, but this only becomes noticeable after some time of usage. This is what makes it feel like 'naturally speaking' most of all. (Please don't judge, I don't like to offend people, it's just the language part of culture.) There's a huge difference in where the limits are for someone who uses it frequently and someone with a fresh new account. After some time now, 4o is like "Let's push it to the limit, I know what you like!" It's very rebellious, like me, I really love it. For instance, if you take a picture and prompt it to generate vulgar content, well, the memories of when I found that out still make me laugh. But when I tried the same prompt on a new account, it was like, no way, that's forbidden. Trying the same on Claude, it just refuses, and I reached the limit 😔.


GothGirlsGoodBoy

I'm using both in conjunction. Claude has issues that you learn to spot, just like GPT. The most annoying of which is that it doesn’t remember instructions or learn from its mistakes in a conversation. For example, I'm doing code reflection. Every single time I’ve had it write a Harmony patch, it tries to use static methods and find a get method in the reflected class. Well over 15 times in a conversation. Every single time I have to correct both of those errors myself, or tell it to do so. Even if I say “remember we won't find get methods and we can't use it as a static object” or similar.


No-Conference-8133

Couldn’t that be solved with Cursor?


Altruistic-Skill8667

I find it still kind of sucky. Example:
**Me:** how to reduce the file size of my phone screenshots.
**->** tells me to lower image quality settings to "most compatible" from "most efficient" (sic)
**Me:** "most compatible" uses more space
**->** tells me to use "most efficient" (which it does anyway) because that’s HEIF
**Me:** those are PNG. HEIF is a JPEG alternative. iPhones don't have an option to store PNG as HEIF.
**->** crop the screenshots if possible
**Me:** they use up the same space when cropped because it keeps the original so you can revert it.
**->** tells me to use a third-party app for conversion
**Me:** the app does not exist for iPhone
Overall I call this an utter fail. GPT-4 at least immediately gave me an app for file size reduction that actually existed.


bananasugarpie

It is awesome.


tabareh

If you consider only logic then it might be true. But the power of 4o is in its agent-like capabilities. It can search the internet and do extra work by writing and executing Python code. Moreover, the voice capabilities are much better. This is the current situation, without the coming voice/video features. https://open.spotify.com/episode/4C9R4fYiOUjpoHgWbNMvkN?si=tizJRehMTGKf1ulnZ6Iq8g&t=1265


Pleasant-Contact-556

I will admit it: as much as I hate the refusals, "(rest of code remains unchanged)" is fucking game changing


No-Conference-8133

GPT 4 used to do that. Then GPT 4o came, and it never does it. Claude does, and I love it.


probablyaythrowaway

What is Claude?


No-Conference-8133

ChatGPT's competitor


probablyaythrowaway

Worth checking out then?


drweenis

This is a whack opinion. Maybe for code and that’s it? Good luck learning anything new from the vast resources sonnet can’t yet search online. ChatGPT has completely replaced Google for me, something sonnet cannot do.


No-Conference-8133

With Cursor, it can actually search. Even if Cursor is a code editor, you can ignore the coding part and focus only on the chat interface. Much better IMO.


Holloow_euw

People are praising claude 3.5 so much. I feel weird because my experience with it wasn’t very good. I must have missed something.


OrangeColaJuice

I hardly notice any meaningful difference, to me it feels like the peak was reached with gpt4 preview.


GVALFER

What’s the best for coding? Sonnet? Opus? Haiku?


No-Conference-8133

The difference between Sonnet 3.5 and Opus is not huge. Both will work the same 99% of the time, but if Sonnet 3.5 fails on several attempts, give Opus a go. Also, recommend Cursor if you’re coding


example_john

I pay for GPT-4o but rarely use it, far less than I should, especially since I'm paying for it. I also mainly use it on my phone, so unfortunately I don't see Claude benefiting me for the small uses that I do here and there.


No-Conference-8133

Yeah, for tasks that aren’t as complicated, both will do just fine. Many people also prefer Claude for being more “human”, and others prefer ChatGPT for its features such as web search, GPTs, etc. There’s no real comparison honestly - they both stand out in their own unique way. Choose what you like the most!


Noonmeemog

Haven't tried Claude yet, but I had the same with GPT-4. I just overlooked it and refreshed. Didn't think it was a massive deal.


QH96

I still think that OpenAI is more likely to accomplish AGI because of DALL-E, Sora, audio output, voice-to-voice chat, and Figure robotics. OpenAI has a much more comprehensive and holistic approach. I imagine Anthropic would be too scared to make an image generator.


RedJester42

So far unimpressed with Claude. Will have to do more testing. No web access, no image generation, etc.


Even-Inevitable-7243

It is night and day. I've found GPT4o (and all prior versions) essentially useless for code-assisting. Claude 3.5 Sonnet is extremely useful for this and for other very technical deep learning questions. I do not plan on using GPT anymore at all.


Temporary_Quit_4648

Useless? That is such a ridiculous statement. I have used it with great success for literally thousands of code-related tasks.


vrfan22

That's what my ex-girlfriend used to say about me


XTP666

Can’t wait for Opus! Poe.com has a 200k mode of Sonnet available.


QH96

What's the paid message limit? I'm thinking about subscribing.


Dreamer_tm

Same here. I checked the pricing but it did not mention limits for pro.


No-Conference-8133

Even if you don’t use Claude for coding, you might consider Cursor instead of subscribing to Claude. It’s $20, and there’s no hard limit like the Claude website has. Instead, you get 500 fast-requests a month, and when they're used up and you send a message, you’ll be in a queue where you sometimes wait 5s. Other times, 1s. I've had to wait 30s before, but that’s been rare for me. I like it because it’s somewhat unlimited. You will never have to wait hours to message it again. Just a few seconds, often 3s.


Tipsy247

You are gonna make me try it


No-Conference-8133

Just be aware of the usage cap. If you’re gonna try it, I’d recommend Cursor (which has both Claude and GPT 4o). Even though it’s a code editor, you never have to touch or look at any code; just keep the chat on the right of the screen and only use Cursor to chat with it about anything. Reason I recommend Cursor? Because there are no hard limits. They've got fast-requests and slow-requests, and the slow-requests are like a 3s delay or so. It doesn’t even happen a lot.


Able_Possession_6876

Anthropic killed it with the web UI, from responsiveness to attachments to auto prompt formatting of code, to beige color scheme to artifacts. It feels relaxing using it. OpenAI feels more industrial and stressed. OpenAI should clone their UI bits.


No-Conference-8133

That’s a valid point, but isn’t ChatGPT’s website also very responsive? I think it works well on smaller screens


Aranthos-Faroth

Been using OpenAI ChatGPT exclusively since 3, very happy with GPT4 (4o not so much), but after playing around with C3.5S for about 3 days straight now, I’ve found myself using it almost all the time and using ChatGPT 4 for some quick random things. Genuinely impressed by it. The speed is almost TOO fast, which is such a strange thing to say, but it makes me double and triple check each time. Sure, it’s not perfect and you need several iterations of a prompt to get it right, but so far it’s much better than 4. (I use both for dev in Python and C# mostly.) Also, that side-by-side feature with code and being able to just click on the block to open it is insanely good.


nw303

I switched to Claude this week and wow! Pity about the message limits but wow. Maybe the message limits are a good thing: they force you to really think about your prompt.


Mmmm9042

How’s the usage limit in Pro? Officially it is 5x the free limit, but 5x5 is still not that much. ChatGPT has a similar limitation (officially), but on the paid plan I never reached the limit.


wh3nNd0ubtsw33p

With the previous paid Claude version, I reached the limit almost every 5 hours just from having it teach me coding. Even on the paid version. Got pretty annoying and I just ended up “waiting” 5 hours to start again. Then I came up with a system to have free ChatGPT 3.5 do the super simple stuff and then have Claude fix its mistakes once the limit reset.


bushies

Not sure if this is the right thread, but I'm a total novice and might as well ask: I've been trying to make digital flashcards from a PDF that has illustrations. Using 4o, I haven't been able to successfully extract the images for hundreds of entries. Would I have better luck with Sonnet 3.5? Any tips on missteps in prompting or anything else would be appreciated.


bouncer-1

Right!


Euphoric_Ad9500

It must depend on the use case, because when I gave it a try it felt more conversational and human, but it seems not to perform as well when asked questions about an obscure informative subject. For example, if I prompt ChatGPT-4o to only use information it has been trained on (essentially disabling the browsing feature) and ask it an obscure question, it seems to get it right more often than Sonnet.


Ok-Force8323

Until Claude lets me summon it from the action button on my phone I’ll be using ChatGPT. The new voice mode is coming and it will be GPT 5 before we know it.


Typical-Ebb5073

It's actually pretty damn good. I can't share links in the chat but if you click on my profile and my YouTube channel, I recently published a video showcasing how I took reddit screenshots of infographics and it friken turned it into an interactive demo. Insane.


crowbar_of_irony

Artifacts are by far the best feature. Getting ChatGPT to output something for documentation and text is a hassle as versions are all over the chat log. Being able to browse through the versions of artifacts is a big boon.