Tales from the jar side: We might have a theme song, Image generation using SDXL, Spring AI and embeddings, and the usual silly toots and skeets
I grilled a chicken for nearly two hours. It still wouldn't tell me why it crossed the road. (rimshot)
Welcome, fellow jarheads, to Tales from the jar side, the Kousen IT newsletter, for the week of September 10 - 17, 2023. This week I taught week 2 of my Spring in 3 Weeks course on the O’Reilly Learning Platform.
Regular readers of (and listeners to, and now video viewers of) this newsletter are affectionately known as jarheads, and are far more intelligent, sophisticated, and attractive than the average newsletter reader (or listener, or viewer). If you wish to become a jarhead, please subscribe using this button:
As a reminder, when this message is truncated in email, click on the title to open it in a browser tab formatted properly for both the web and mobile.
We Have A Theme Song? Maybe
Regular readers / jarheads will know I’ve been spending a lot of time on AI tools recently. One application I saw in a video and had to try out is called Chirp, which is a text-to-music model developed by a company called Suno. As with many of these tools, you need to join a Discord server to try it out.
Here’s the “Suno Chirp Bot” description:
The idea is, you send a /chirp command to the bot, and you get a pop-up like this:
My goal was to have Chirp generate a 30-second theme song (that’s the length you get in the free account) for this newsletter. I tried the last option, letting ChatGPT generate the lyrics, but they were not good, so presumably they’re tied into GPT-3.5. I therefore went to GPT-4 on my own account and asked it for lyrics.
Of course it responded.
I guess those are okay. Maybe. Better than making up my own, I suppose. Anyway, I fed them into the Chirp Bot from Suno, selected emo for the Style of Music (I figured that ought to be interesting), and here’s what I got:
Oookay. I mean, that was the best of several attempts, and it’s not horrifically awful.
Here is the pop version:
Again, not horrible, but if I started playing that at the beginning of all my newsletter videos I suspect the subscriber count would drop fast. Also, I’m not sure what that weird pronunciation of the word “Gradle” is all about, but hey, it could be worse (and was on other examples).
You should all be grateful I’m sparing you the metal version. I might try again with other styles, like barbershop, k-pop, or Gregorian chant, but I don’t want to use up all my free credits on this silliness (OR DO I?).
Does this mean Tftjs now has a theme song? I don’t think so, but let me know what you think in the comments.
Stable Diffusion
Keeping with the AI theme this week, here’s the weekly technical YouTube video I published:
It’s about using Java to generate images with the Stable Diffusion XL model, often referred to as SDXL. That’s what generated the robot on the thumbnail, though I removed the background with Canva when I added it to the image.
The documentation for the Stable Diffusion REST API is located here. The hard part about translating the REST calls to Java is that they chose some odd conventions for the JSON data. For example, even just getting back the supported engines uses an array as the root element in the response:
[
  {
    "description": "Stability-AI Stable Diffusion XL v1.0",
    "id": "stable-diffusion-xl-1024-v1-0",
    "name": "Stable Diffusion XL v1.0",
    "type": "PICTURE"
  },
  {
    "description": "Stability-AI Stable Diffusion XL v0.9",
    "id": "stable-diffusion-xl-1024-v0-9",
    "name": "Stable Diffusion XL v0.9",
    "type": "PICTURE"
  },
  …,
]
At least with that one, I could transform the JSON data into an array of type Engine, where Engine is a record:
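The record itself isn't reproduced in this text, but one whose components mirror the four JSON keys above would do the job. What follows is a minimal sketch rather than the actual code from the repository: the EngineDemo class and its findById helper are hypothetical, and the engines are built by hand so the example runs without a JSON library. With Gson on the classpath, a call along the lines of gson.fromJson(json, Engine[].class) should fill the same array.

```java
import java.util.Arrays;
import java.util.Optional;

class EngineDemo {
    // Hypothetical helper: pick one engine out of the array by its id.
    static Optional<Engine> findById(Engine[] engines, String id) {
        return Arrays.stream(engines)
                .filter(e -> e.id().equals(id))
                .findFirst();
    }

    public static void main(String[] args) {
        // Built by hand here; in practice a JSON parser would fill this array.
        Engine[] engines = {
            new Engine("Stability-AI Stable Diffusion XL v1.0",
                    "stable-diffusion-xl-1024-v1-0",
                    "Stable Diffusion XL v1.0", "PICTURE"),
            new Engine("Stability-AI Stable Diffusion XL v0.9",
                    "stable-diffusion-xl-1024-v0-9",
                    "Stable Diffusion XL v0.9", "PICTURE")
        };
        findById(engines, "stable-diffusion-xl-1024-v1-0")
                .ifPresent(e -> System.out.println(e.name()));
    }
}

// Components mirror the keys in the JSON response shown above.
record Engine(String description, String id, String name, String type) {}
```

Because the root of the response is an array, the natural target type is Engine[] (or a List of Engine) rather than a wrapper object.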
That works, and I have a call to get all the engines even though the only one I want is the first one listed (SDXL v1.0). Things got a bit more complicated when I asked for the images, because the response looks like this JSON object:
That’s an array inside an array, and then come the JSON objects. It turned out I could just make an object to hold a List<Image>, where the Image record is
and that brings me to what surely must be an oversight. See that property called finishReason? I kept getting null for that. Then I realized I’d set up my Gson parser to convert camel case in the Java code into snake case (i.e., lowercase with underscores) in the JSON data. That worked everywhere except here: for some bizarre reason, the JSON uses camel case for finishReason instead. Sigh. Still, at least it was an easy fix.
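The mismatch is easy to reproduce without Gson. The sketch below uses a hypothetical toSnakeCase method as a stand-in for a lowercase-with-underscores naming policy (in Gson that would be FieldNamingPolicy.LOWER_CASE_WITH_UNDERSCORES), and a one-entry map with a made-up value as a stand-in for the JSON object. The usual per-field override in Gson is the @SerializedName annotation, though whether that was the fix used here isn't stated.

```java
import java.util.Map;

class NamingPolicyDemo {
    // Hypothetical stand-in for a "lowercase with underscores" naming policy:
    // insert '_' before each uppercase letter, then lowercase it.
    static String toSnakeCase(String fieldName) {
        StringBuilder sb = new StringBuilder();
        for (char c : fieldName.toCharArray()) {
            if (Character.isUpperCase(c) && sb.length() > 0) {
                sb.append('_');
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up value; only the spelling of the key matters here.
        Map<String, String> json = Map.of("finishReason", "SUCCESS");

        // The policy looks for "finish_reason", which is not in the data.
        String lookupKey = toSnakeCase("finishReason");
        System.out.println(lookupKey + " -> " + json.get(lookupKey));

        // Looking up the key exactly as the API spells it succeeds.
        System.out.println("finishReason -> " + json.get("finishReason"));
    }
}
```

The first lookup comes back null, which matches the symptom described above: the parser isn't failing, it is simply asking for a key the response never contains.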
At least the images from SDXL look way better and more detailed than the ones from DALL-E:
Nice, right? I rewrote my ImageCarousel so that I could store the images in separate subdirectories by generator, like sdxl, dalle, and midjourney, and still display them all. That gave me a chance to use the walk method in the Files class, which returns a Stream that does a depth-first search of the whole tree. Nice.
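For anyone who hasn't used it, here is a small sketch of Files.walk gathering image files from nested subdirectories. It is not the ImageCarousel code: the ImageFinder class, the findPngs method, the default images directory, and the restriction to .png files are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

class ImageFinder {
    // Collect every .png file under root, however deeply nested.
    static List<Path> findPngs(Path root) {
        // Files.walk keeps directories open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".png"))
                    .sorted()
                    .toList();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical layout: images/sdxl, images/dalle, images/midjourney
        Path root = Path.of(args.length > 0 ? args[0] : "images");
        if (Files.isDirectory(root)) {
            findPngs(root).forEach(System.out::println);
        }
    }
}
```

The try-with-resources block matters: the stream returned by Files.walk holds open directory handles until it is closed.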
All of those changes are committed in the GitHub repository, and I decided to add four SDXL images as well so the ImageCarousel still has something to do even if you don’t download any new ones.
Late in the week I did a lot of work with the Whisper API, which transcribes text from audio files, but I’ll talk about that next week.
Spring AI and Embeddings
I promised at the end of my video that I would tackle using the Spring Framework with these features next, but that’s probably going to get delayed again. In the meantime, Craig Walls (of Spring in Action fame), my friend and fellow NFJS speaker who I mentioned last week, continued his exploration of the experimental Spring AI project. He made this video this week:
Nice thumbnail. The idea with embeddings is that you can give the AI tool your own information and a query will search it as well as the documents the AI knows about. Craig did an example where he uploaded the text from a game manual into a vector database, arranged for its embeddings to be used with GPT, and asked it questions about the game.
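To make the search part concrete, here is a toy sketch of what a vector database does at its core: store text alongside a vector, then return the stored text whose vector is most similar to the query's. This is not Spring AI code and not what Craig wrote. The ToyVectorStore class, the three-number vectors, and the sample manual text are all invented for illustration; real embeddings come from a model and are far longer.

```java
import java.util.Comparator;
import java.util.List;

class ToyVectorStore {
    // A chunk of text paired with its (made-up) embedding vector.
    record Entry(String text, double[] vector) {}

    // Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the stored entry most similar to the query vector.
    static Entry nearest(List<Entry> entries, double[] query) {
        return entries.stream()
                .max(Comparator.comparingDouble(e -> cosine(e.vector(), query)))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<Entry> manual = List.of(
            new Entry("Setup: deal five cards to each player",
                    new double[] {0.9, 0.1, 0.0}),
            new Entry("Scoring: most points after three rounds wins",
                    new double[] {0.1, 0.9, 0.2}));

        // Pretend this is the embedding of the question "How do I win?"
        double[] question = {0.2, 0.8, 0.1};
        System.out.println(nearest(manual, question).text());
    }
}
```

In a real setup the retrieved text is typically passed to the language model along with the question, which is how the model can answer from a document it was never trained on.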
He used a vector database, a data loader, and embeddings all together. That was all the more impressive because this is what the documentation for all of that looks like on the web site:
That is, shall we say, a little thin. I guess sometimes it pays to be an employee of VMware (the company behind Spring) and have access to the head of the project. :)
The good news is that (1) I now have access to his video, and (2) next week he’ll be a guest on a Tales from the jar side Live Stream on Thursday afternoon at 2:30pm EDT. I’ll announce that tomorrow and send a link to anyone who wants to join us.
Toots, Skeets, Etc
My chess skills
I haven’t played much chess recently. I talked to my friend Jim Harmon last week, and he hasn’t done much, either. Maybe that image shows why. :)
Biblical scholarship
Hard to argue with that. Can you imagine how some of our memes are going to sound even ten years from now?
Coming soon
Yeah, I’d watch that. If it was done by Marvel, the first episode would be pretty good, it would peak in episode 4, then there would be four episodes of padding, and then fade into an unresolved, pointless disaster at the end that people would complain about for months, or at least until the next series.
Hope your day goes better than this
I’m thinking, not good.
Magma
So true.
Writers Strike
It’s apparently a theme this week.
I’ll take Injection attacks for the win
I can’t wait to show my Trinity students this one when we talk about security. In case you don’t get the reference, an injection attack is when you accept input data from a user and just add it to your SQL database query without validating it. The classic example is this XKCD cartoon:
Little Bobby Tables is famous in the security literature.
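For anyone who wants to see the mechanics, here is a minimal sketch of how that name wrecks a query built by string concatenation. The InjectionDemo class, the unsafeInsert method, and the name column are hypothetical; the Students table and the input string come from the cartoon. No database is involved, the code only prints the SQL text that would be sent.

```java
class InjectionDemo {
    // Vulnerable: user input is pasted straight into the SQL text.
    static String unsafeInsert(String studentName) {
        return "INSERT INTO Students (name) VALUES ('" + studentName + "');";
    }

    public static void main(String[] args) {
        String bobby = "Robert'); DROP TABLE Students;--";

        // An ordinary name produces one harmless statement.
        System.out.println(unsafeInsert("Alice"));

        // Bobby's name closes the INSERT early and smuggles in a DROP TABLE.
        System.out.println(unsafeInsert(bobby));

        // Safer with JDBC: keep the SQL fixed and bind the input as a parameter, e.g.
        //   PreparedStatement ps = conn.prepareStatement(
        //       "INSERT INTO Students (name) VALUES (?)");
        //   ps.setString(1, bobby);
    }
}
```

With a bound parameter the whole string, quote and all, is treated as data, so it ends up stored as a very odd name instead of being executed.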
Scrolling and scrolling
Nothing “mini” about it.
Tis the season
Kind of surprised there was no “the spice must flow!” joke in there somewhere. Maybe that’s coming soon.
A dresser for the ages
Very clever.
Whoa
For those users of JetBrains products, you can “clean up leftover tool directories”:
I just freed up about 21 GB of space (!).
Don’t worry, be happy
Oof. That hit a bit too close to home.
Have a great week everybody!
The video version of this newsletter will be on the Tales from the jar side YouTube channel tomorrow.
Last week:
Week 2 of Spring in 3 Weeks, on the O’Reilly Learning Platform
Software Design, my course for undergrads at Trinity College
This week:
Week 3 of Spring in 3 Weeks, on the O’Reilly Learning Platform
Software Design, my course for undergrads at Trinity College