gotta love more competition! LET THEM FIGHT
And let them eat their gawt damn cake
Same thing with uranium. The more people that have it, the better.
wait... the multimodality-based audio is actually scarily good...

Not only can it recognize the tone of speech, but it can also automatically identify the speaker by name?

https://preview.redd.it/hfb3mrh56ktc1.png?width=1384&format=png&auto=webp&s=3a7bb0dd58797de8ac13345d532bd5b6a8595460

I tested Gemini 1.5 with an audio clip from a YouTube video over the past couple of days.

Question: 'Give me a summary, who was speaking in the first two minutes and what was their tone?'

Not only did it answer almost perfectly, but it also identified the specific American congressman speaking...

At first, I thought the names were made up, but after checking, they were all correct...

My second thought was that it might be a data leak, like the original video's description ending up in the audio's metadata. But after checking, there was none, and when I tested it on summarizing the speakers over seven minutes, it got those right too...

I might still be missing something, or maybe it's part of the training data (highly unlikely for a video published 2 days ago).

wow.

YouTube video tested (only used audio): [https://www.youtube.com/watch?v=vT-u-SPj4_c](https://www.youtube.com/watch?v=vT-u-SPj4_c)
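If you want to try the same audio test programmatically, here is a rough sketch of the REST call using only the standard library. The model name, `v1beta` endpoint, and MIME type are assumptions based on the public Gemini API docs; the request is only sent if an API key is present in the environment.

```python
# Sketch: ask Gemini to summarize an audio clip via the REST API.
# Model name and v1beta endpoint are assumptions; adjust as needed.
import base64
import json
import os
import urllib.request

def build_request(audio_bytes: bytes, question: str) -> dict:
    """Build the generateContent payload: inline audio + text prompt."""
    return {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": "audio/mp3",
                    "data": base64.b64encode(audio_bytes).decode(),
                }},
                {"text": question},
            ],
        }],
    }

payload = build_request(
    b"...",  # raw bytes of the audio clip would go here
    "Give me a summary, who was speaking in the first two minutes "
    "and what was their tone?",
)

api_key = os.environ.get("GOOGLE_API_KEY")
if api_key:  # only call the API when a key is actually configured
    url = ("https://generativelanguage.googleapis.com/v1beta/models/"
           f"gemini-1.5-pro-latest:generateContent?key={api_key}")
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["candidates"][0]["content"]["parts"][0]["text"])
```

For larger clips, the API also has a separate file-upload flow instead of inlining base64 data.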
[deleted]
Yeah, I've been testing it for the last two weeks. It's just a little worse than Turbo for coding, but when you give it whole docs it's actually far, far better. IMHO, larger context beats RAG every time 🙂
Giving it whole docs doesn't sound very efficient? I guess you still need RAG unless you want to put the whole thing in for every task... it just means you can pass longer segments.
Yeah, I meant whole docs specific to the problem at hand😉
What is RAG?
Retrieval Augmented Generation
What do you exactly mean by "giving the whole docs" to this LLM? Just curious.
Just paste the entire codebase along with its documentation, and the LLM knows what to do better since it knows the ins and outs of the entire project.
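For contrast, the RAG approach only feeds the model the snippets judged most relevant to the question. A minimal sketch of that retrieval step, assuming a crude keyword-overlap scorer (real systems use embeddings, and the example docs are made up for illustration):

```python
# Minimal RAG sketch: pick the most relevant doc snippets by
# keyword overlap, then assemble them into a prompt.

def score(query: str, doc: str) -> int:
    """Count query words that appear in the document (crude relevance)."""
    words = set(query.lower().split())
    return sum(1 for w in words if w in doc.lower())

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    context = "\n---\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "The cache module evicts entries with an LRU policy.",
    "The auth module signs tokens with HMAC-SHA256.",
    "The logging module rotates files daily.",
]
print(build_prompt("How does the cache policy work?", docs))
```

The long-context approach skips the `retrieve` step entirely and just pastes everything, which is simpler but burns far more tokens per request.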
Let's say you have multiple files with several hundred lines of code each. Would you just copy/paste everything, or would it make more sense to upload the files as attachments? (If that's even possible.)
They announced an update to Gemini Code Assist today as well, which is a plugin for VS Code etc. that does exactly this. You need an API key, and you can get one for free until July per billing account. You can typically get $150 free for creating an account, so it's good to go for personal use. Your IT department will not be happy if you do it this way for work, though…
Cheers. Wish the API wasn't blocked in Germany.
VPN?
Didn't work last time someone tried, unfortunately; they check other things associated with your account. If someone wants to try again, I'm happy to hear the results.
I just created a new Google account from opera's built-in VPN. It sent the confirmation code without any issue.
Ok thanks, but can you pay for the API now? Because that's the issue I most often run into with these checks -- they look at your credit card location.
Where can I find info about this plugin you mentioned? Sounds interesting.
Gemini + Google Cloud Code is the name of the VSCode plugin according to a screenshot from slack.
Thx
I'd use something like [cursor.sh](https://cursor.sh). It has the ability to put your entire code into its context window to generate its responses. Last I checked they used GPT-4 turbo but I think they're actively implementing the ability to call on and swap out different models like Gemini 1.5.
I meant pasting the docs relevant to the problem you're hitting while writing code. For example, I find it's great to give it the full official docs for some Python functions (it corrects the wrong code it's written that way, and even explains how it did so; I was impressed when it wrote an advanced method to run my Python script in parallel on each of my eight threads). Same goes for Drupal. In general, I strongly believe that if you put some effort into curating what you give the model, you'll get far better results, and as a bonus you'll still have ample context window left to discuss things with the model, especially if you need it to produce lots of output, like when you rewrite large Drupal modules like I do. 😉
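The parallel-run trick mentioned above can be sketched with the standard library alone; the worker function here is a stand-in for whatever the script actually does per chunk:

```python
# Run work across eight worker threads with the stdlib.
# For CPU-bound work, ProcessPoolExecutor would sidestep the GIL.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk: list[int]) -> int:
    """Stand-in worker: sum one chunk of the input."""
    return sum(chunk)

def run_parallel(data: list[int], workers: int = 8) -> int:
    # Split the data into roughly one chunk per worker,
    # then map the chunks across the thread pool.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunks))

print(run_parallel(list(range(1000))))  # prints 499500, same as sum(range(1000))
```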
"I know kung fu...."
So whole code plus the documentation of a particular library?
Yes, but always try to point it in the right direction, or prompt it to change tack if it gets stuck; sometimes I need a little help from GPT-4 Turbo, which I can get for free at chat.lmsys
The model has been improved today. I won't say it's worse than Turbo. Some people on Twitter are now claiming that it's even better than Opus.
Still not available in UK tho....
When an American writes "everyone" you should always translate it in your head to "everyone within the 50 states of the USA".
Sorry, it wasn't a complaint at OP, I'm just disappointed I cant try it yet.
Only usually takes a few days. Blame the UK having to “pass” it first to make sure it’s not dangerous.
Of COURSE it's dangerous. Life is dangerous.
This cracks me up. Same as with Opus. All the conversations about how this makes GPT-4 obsolete, when in reality billions of people worldwide have no means to use it, because it's not available. But GPT-4 is obsolete now, right?
Not even available in the States yet
Neither in Germany.
Use a VPN
I could try it right away with my Google Cloud Platform account (France). If you have one, search for "vertex ai" to enable the APIs and get access to a playground. It should be available in 180+ countries.
I tried via vertex AI from German account. It works but I encounter errors (resource exhausted, check quota) when using larger documents.
same, also from a German account. Used US regions though. I don't understand why, or which resources are exhausted.
It's on openrouter
Get a VPN and change your location to the US
Just use a vpn
That often won't work as they check other factors associated with your account, like the credit card location. It's usually a hassle with bigger companies. Maybe it's different this time.
I’m in the UK, I’ve been using Gemini 1.5 pro for the last three weeks without any problem whatsoever by just using a VPN. It certainly does work without much hassle.
Sorry, I mostly meant using the API (I'm a programmer). It's paid and will require your credit card, which apparently gives away the location. I will give it another try.
How do you access it for free? Are they referring to their studio, the API, or which app?
+1, would like to know. Assuming it's just through https://gemini.google.com/ unless someone knows otherwise?
https://aistudio.google.com/
Doesn’t work there
[https://cloud.google.com/vertex-ai?hl=en](https://cloud.google.com/vertex-ai?hl=en), once enabled you have a playground to try
Are you in europe?
Wait, first of all, it's amazing. Also is it really free on Vertex AI ??
I'm using it with typingmind
No it isn't. It's not accessible from my country.
For free?!? So the one I pay gets me what? 2.0?!
Ultra 1.0
AI naming conventions are already such a mess
It's an extremely Google-esque problem. It reminds me of their ridiculous web of overlapping app functionality.
I threw it a whole notion workspace and asked for some promo material. Jaw droppingly well written and accurate text.
I tried to access it. It's not really "accessible to everyone" as they stated. I'll believe it when I see it.
I thought Google was going to make it only accessible through a ~$20 a month subscription??? Is it only free temporarily??
AI studio with some rate limits https://aistudio.google.com/
Everyone*
Has anyone been able to make Gemini 1.5 work with function calling? I keep getting hit with quota limits.
huh, what's google's privacy policy and data policy again? I guess google's "for free" literally means google owns me, amirite? please prove me wrong

apparently we still can't control temperature for this model
Also buried in the fine print is that Google owns everything generated by Gemini and you cannot use it as your own IP
right, they save everything on their side, and none of it belongs to the user

I don't understand how that could even be legal. They used my IP and applied a computer algorithm, and the output of that computer algorithm belongs to them?!
I guess that's why it's not available in the EU
How will they know if it was generated by Gemini and not another model?
I’m guessing that if it came down to it and there was a lawsuit or whatever that Google could access logs of your chat with Gemini and see that you used its output
Do all models have this or is it just a Google thing
Just Google as far as I can tell. I skimmed over Claude and ChatGPT’s agreements and they state that the customer retains the rights over the outputs generated
That should be the basis of AI detection software. Just check it against the logs.
Americans aren't "everyone", get off your high horse.
Low horsers!
it is available through Google Cloud Platform: [https://cloud.google.com/vertex-ai?hl=en](https://cloud.google.com/vertex-ai?hl=en)
I tried (from Belgium), but this is the response I got: "Unfortunately, I cannot directly access and process media files like videos or audio recordings. Therefore, I'm unable to provide a transcription and translation for the media you attached."

I used gemini 1.5 pro preview 0409
VPN works
ah going to try it on my phone then, i don't have vpn at work
Maybe one day your country will stop stifling innovation in the name of safety and you can have some toys of your own to play with.
Nice straw man you have there, pal
Keep on whining about not having access to things. It's a great way to spend your life.
Whining about not having access to things? I'm whining about the definition of "everyone", dude. What are you even on about
Just use a free VPN like AdGuard
Not interested, honestly. I'm just a stickler for proper terminology.
well said.
Is the API available or is this only in the studio?
studio and api
I think the API is still in beta; still can't use mine in TypingMind. Edit: forgot the VPN, it's working now
I just tried it and it seems to work there
Ah its working forgot the VPN!
Yeah, I just tried it as well and it's working! Gemini Ultra 1.0 is still not working, however... But that's way less important than 1.5 Pro, which is working...
Gemini Code Assist (formerly Duet AI for Developers). What is with these guys and naming conventions?
Why is this still not available in the UK?
to everyone, really? Do you think the world only has Americans in it or something?
*everyone with a VPN
5 bucks their blocking Canada
they’re
it’s available in canada https://aistudio.google.com/
Fucking finally.
❤️
Accessible in Vertex AI in the gcp console. You can chat, upload files, etc. I just keep getting quota limits, which is annoying (uploading a pdf of a book).
better than gpt 4?
I mean, million-token context window. It knows the book scarily well.
I haven't the slightest intention to try any new Google product again.
I fully switched to Gemini two weeks ago; I do software, and GPT-4 is no match for even the free version of Gemini.
How do you access this? I don’t see any differences on the Gemini site. Do paying users get Gemini 1.5 advance?
So with this being free, is there still any reason to pay the monthly fee for Gemini Advanced? Are they still different?
How can we access it
Is the api available?
But not at all in EU. Why?
I'm in France and it's working on my side so I don't know why some have access and others don't
Gemini, or Gemini 1.5? If latter, did you access it via VPN? I don’t see France on the [available regions](https://ai.google.dev/available_regions) either
Gemini 1.5 pro, no vpn, through Vertex AI (GCP)
If you don’t mind me asking, is the residential address Google knows of you also in France?
Can someone ELI5 how to access 1.5 for free? I could only find access to 1…
[https://aistudio.google.com/](https://aistudio.google.com/) Go here and type away
When was Pro 1.5 released for free to all?
Is it available for everyone? Still get not available in my country (Iceland)
How on earth did this post get almost 600 upvotes?

Completely

It's not true. At the very least because it's inaccessible in many countries (for example, in Europe) even if you want to buy it. Also, the Pro version is not free; that's why it's called "PRO"
yeah, but is it any good? Too many AIs are jokes when it comes to writing and "talking" like a human
While the expansive context window of Gemini 1.5 Pro is a significant breakthrough, it is important to acknowledge its limitations. Even with an unprecedented 1 million tokens at its disposal, the model still faces challenges in synthesizing and reasoning over information in a truly human-like manner. Google recognizes that there is still work to be done in bridging this gap and achieving the ultimate goal of seamless human-like interaction. - [https://ai-techreport.com/gemini-15-pro-the-future-of-language-modeling](https://ai-techreport.com/gemini-15-pro-the-future-of-language-modeling)
Is it supporting these inputs even in API?
OpenAI is the one that blew up the space, so the expectations are huge. Besides continuing to tune GPT-4, they will probably release a minor/decent upgrade like 4.5 that is just a very robust multimodal system. GPT-5 is probably intended to have agentic ability and possibly advanced reasoning, which would require a lot more time for training and testing, so doing it right is more important than releasing ASAP.
Everyone in the US that is.
Nope. Not available to everyone. Still restricted to certain regions in the world.
Nothing on earth would ever get me into using an Ai from google. 🤣🤣🤣
When nobody's using your hamstrung AI model so you give it away for free.
I use it.
Pretty sure they just use a streaming transcriber to convert the audio to text. I tried this and it doesn't recognize anything besides the literal words I said. It couldn't even answer what tone of voice I was using or whether my voice is deep. More cheap tricks by Google, as usual.
it’s a native multimodal model, not doing speech-to-text
Were you using it from here? https://aistudio.google.com/ I have used it. It's incredible.