Tales from the jar side: Flux 1.1 images with Java, Open AI DevDay, dev2next, a new LangChain4j book, and the usual tweets and toots
This morning I coughed up a pawn, a bishop, and a rook. I must have a chess infection. It was a rough knight. (rimshot, h/t @MediocreJoker85)
Welcome, fellow jarheads, to Tales from the jar side, the Kousen IT newsletter, for the week of September 29 - October 6, 2024. This week I taught week 3 of my O’Reilly Learning Platform course on Spring in 3 Weeks, I gave a couple of talks at the Dev2Next conference in the Denver area, and I taught my regular courses at Trinity College.
Flux 1.1 Pro Image Generator
Normally, to generate an image I go with DALL-E 3, Stable Diffusion, or Midjourney, though only DALL-E 3 and Stable Diffusion provide programmatic access. I know there are many other image generators out there, but these seemed to be the best, and I could access two of them in code.
A few months ago I heard about Flux 1 [pro] and Flux 1 [dev], models from a company called Black Forest Labs, which were accessible on Hugging Face (an open source site for AI tools), but again, not programmatically. This week the company announced:
A new version, Flux 1.1 [pro], and
The BFL (presumably for Black Forest Labs) API
That meant I could try out the new model in code.
Like many AI tools these days, the newer version is actually less expensive than the existing one: Flux 1.1 [pro] is 4 cents/image, and Flux 1 [pro] is 5 cents/image. I can handle that much math in my head, so I decided to go with the latest model.
The docs walk you through registering and generating (yet another) key. That took me a few tries, as their system had trouble for some reason. Eventually I got in, however. They give you 50 credits to start with, leading me to wonder how much a credit was worth. Then I saw this page:
Let’s see: $10 for 1000 credits is … um, $10 is 1000 pennies, so dividing by 1000 … yeah, a credit is a penny. I get that one, too. Since each image is 4 cents, the initial 50 credits was worth 12 1/2 images, though I don’t think I can generate half an image. I splurged and added $10.
The API itself is just some REST endpoints. You access one endpoint with a POST request containing JSON data:
import os
import requests

request = requests.post(
    'https://api.bfl.ml/v1/flux-pro-1.1',
    headers={
        'accept': 'application/json',
        'x-key': os.environ.get("BFL_API_KEY"),
        'Content-Type': 'application/json',
    },
    json={
        'prompt': 'A warrior cat rides a dragon into battle.',
        'width': 1024,
        'height': 768,
    },
).json()
print(request)
request_id = request["id"]
That’s the Python example on their getting started page. The call returns an id. Then you poll a different endpoint to wait for the image to be ready:
import time

while True:
    time.sleep(0.5)
    result = requests.get(
        'https://api.bfl.ml/v1/get_result',
        headers={
            'accept': 'application/json',
            'x-key': os.environ.get("BFL_API_KEY"),
        },
        params={
            'id': request_id,
        },
    ).json()
    if result["status"] == "Ready":
        print(f"Result: {result['result']['sample']}")
        break
    else:
        print(f"Status: {result['status']}")
(See the call to os.environ.get("BFL_API_KEY")? We’ll come back to that.)
In other words, send a GET request, wait half a second, and do it again until the result is ready. The result is another JSON object, this time holding a URL to the image. That means every access requires at least three requests:
The initial image generation request with your prompt.
One or more polling requests to see if the result is ready yet.
A final GET request to download the actual image.
I changed the prompt to “A warrior cat rides a dragon into battle,” and some results are embedded here.
They gave their examples in Python, but there’s no client library to download and no abstraction over the REST API. It’s all just transmitting a simple JSON object via POST, followed by a GET request to poll the endpoint and one final GET to download the image.
After verifying that the Python code works as advertised, I decided to port it to Java. That’s the sort of thing I’ve been doing for months now, so I know the process. For better documentation, they provide a JSON file that follows the OpenAPI specification (what we used to call Swagger), which spells everything out.
I like to put all my input and output records inside a single class, so I can see them all at the same time:
I use a static import of this class in my service and everything is available from there. As for the service, here’s one version:
The implementation details are mostly just using the HttpClient to transmit the request, a loop to poll every half-second for the results, and a download method to save the image to a generated filename.
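To give a feel for the flow, here is a hypothetical condensed version in plain Java. This is my own sketch, not the author's actual code (his is in the GitHub repo linked below); names like FluxSketch and extractField are invented, and the JSON handling is deliberately naive to keep the sketch dependency-free (a real service would use Jackson or Gson):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class FluxSketch {
    static final String KEY = System.getenv("BFL_API_KEY");
    static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Naive string-based JSON field extraction, just to keep this sketch
    // dependency-free; a real service would use a JSON library.
    static String extractField(String json, String field) {
        int label = json.indexOf("\"" + field + "\"");
        int start = json.indexOf('"', json.indexOf(':', label)) + 1;
        return json.substring(start, json.indexOf('"', start));
    }

    public static void main(String[] args) throws Exception {
        if (KEY == null) {
            System.out.println("Set the BFL_API_KEY environment variable first");
            return;
        }
        // 1. POST the prompt; the response JSON contains the request "id".
        HttpRequest post = HttpRequest.newBuilder(
                        URI.create("https://api.bfl.ml/v1/flux-pro-1.1"))
                .header("x-key", KEY)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("""
                        {"prompt": "A warrior cat rides a dragon into battle.",
                         "width": 1024, "height": 768}"""))
                .build();
        String id = extractField(
                CLIENT.send(post, HttpResponse.BodyHandlers.ofString()).body(), "id");

        // 2. Poll every half second until the status is "Ready".
        String result;
        do {
            Thread.sleep(500);
            HttpRequest poll = HttpRequest.newBuilder(
                            URI.create("https://api.bfl.ml/v1/get_result?id=" + id))
                    .header("x-key", KEY)
                    .build();
            result = CLIENT.send(poll, HttpResponse.BodyHandlers.ofString()).body();
        } while (!result.contains("\"Ready\""));

        // 3. Download the generated image from the "sample" URL in the result.
        HttpRequest download = HttpRequest.newBuilder(
                URI.create(extractField(result, "sample"))).build();
        CLIENT.send(download, HttpResponse.BodyHandlers.ofFile(Path.of("flux_image.jpg")));
        System.out.println("Saved flux_image.jpg");
    }
}
```

The shape mirrors the Python version exactly: one POST, a polling loop, and a final GET.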
Of course, I couldn’t leave well enough alone, so I spent several hours trying to replace the while (true) loop and its Thread.sleep call (what they call “busy waiting”) with a ScheduledExecutorService, and trying to figure out why, when I instantiated the executor inside a try-with-resources block, it kept timing out and never downloaded the image. I also started to make a video about the process, but I kept going into too much detail and the result was over an hour long. That’s not happening, so I’ll try again soon.
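For what it’s worth, here is one ScheduledExecutorService polling pattern that does work, at least in this self-contained sketch: complete a CompletableFuture from the scheduled task, block on the future, then cancel the task and shut the executor down explicitly. The checkStatus method is a stand-in I invented to simulate the BFL endpoint becoming “Ready” on the third poll; in real code it would be the GET request shown earlier:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PollingSketch {
    private static final AtomicInteger polls = new AtomicInteger();

    // Stand-in for the real call to /v1/get_result: reports "Ready"
    // on the third poll.
    static String checkStatus() {
        return polls.incrementAndGet() >= 3 ? "Ready" : "Pending";
    }

    static String awaitReady() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<String> ready = new CompletableFuture<>();
        // Fire a poll every 500 ms; complete the future once the status flips.
        ScheduledFuture<?> task = scheduler.scheduleAtFixedRate(() -> {
            if ("Ready".equals(checkStatus())) {
                ready.complete("Ready");
            }
        }, 0, 500, TimeUnit.MILLISECONDS);
        try {
            return ready.get(30, TimeUnit.SECONDS); // block here instead of looping
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            task.cancel(false);   // stop the periodic task explicitly...
            scheduler.shutdown(); // ...then shut the executor down
        }
    }

    public static void main(String[] args) {
        System.out.println("Status: " + awaitReady());
    }
}
```

The explicit cancel-then-shutdown ordering avoids leaning on the executor’s close() behavior, which may be where the try-with-resources version ran into trouble.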
I’m sure frameworks like LangChain4j and Spring AI will eventually add this image generator as an option. The good part is I didn’t have to wait for that. All my code works now. It took a couple of hours to get it into the shape I wanted (other than finishing the scheduled executor service), but the process was easy enough to follow.
For part of my development process, I tried to use the new Canvas feature in ChatGPT with GPT-4o:
See how all the code is now in a window on the right? That’s also new this week, announced at the OpenAI DevDay 2024. It’s okay, though it’s obviously just a copy of the Artifacts capability Claude added months ago. Still, a welcome development.
The code is in this GitHub repo, which I’m using for my Trinity course on AI integration. See the blackforestlabs
sub-package for details.
OpenAI DevDay 2024
Speaking of the one-day OpenAI conference, this year it was held on Tuesday of last week. The big announcements were:
A Realtime API that can hold speech-to-speech conversations directly.
Vision fine-tuning, so you can train your own LLM on image data as well as text.
Prompt caching, which is really important as prompts get longer and context windows get bigger, and
Distillation, which allows huge LLMs to generate fine-tuning data for smaller LLMs.
Of those, the big one for me is the Realtime capability. After all these years, it’s going to force me to actually dig into websockets, which are long-lived network connections that enable two-way communication. I know Java has had that capability built into it for a long time, but I’ve never had a use case where I really needed it. My ChatGPT app on my phone finally got Advanced Voice Mode, and the Realtime API is supposedly a way to do that programmatically. It’s a stateful, event-based API, and the documentation shows there’s a lot to it.
On the downside, it is by far the most expensive API they’ve ever released, in addition to being the most complicated. They estimate that audio inputs cost about 6 cents/minute, but the audio outputs cost 24 cents/minute, which is a lot. Between that and the complexity, I may have to wait for the frameworks to catch up to use that one much, though I’ll probably do some experiments sooner rather than later.
VenCon Revisited
This week I attended the dev2next conference in Lone Tree, CO (a suburb of Denver), though I was only there a short time. I arrived Sunday, but since I had to teach an online course Monday morning (meaning 6 a.m. in the Mountain time zone, which came really early for me), I didn’t attend the workshops that day. I gave two talks on Tuesday afternoon and left Wednesday morning. I did get to see a lot of people I hadn’t run into for a while, which was great, but I could have used an extra day or two. Still, I had to get back for my Trinity courses on Thursday, so it was a quick trip.
The best part for me was that I finally got to talk to Dave Thomas, the publisher at the Pragmatic Bookshelf (not the owner of the Wendy’s fast food chain). He was kind enough to attend both my talks, and even asked the occasional question.
As for my custom GPT, the VenBot 5000, it was well-received, though unfortunately Venkat wasn’t in the room to see the live demo. Most people in the room agreed that the answers sounded very Venkat-like, and we all decided it was clearly the most Venkat of all possible Venkat AI clones. Your mileage may vary, of course.
My only regret is that I fell into NFJS mode during my second talk and kept thinking I had 90 minutes rather than 75. Fortunately, Jeanne was there to stop me from going too far over my time. :)
LangChain4j Book
I knew that Antonio Goncalves was working on a LangChain4j book, and I figured there would be some announcement about progress during the Devoxx Belgium conference this week. I didn’t expect that this morning he would say it’s available for purchase everywhere. Understanding LangChain4j is at Amazon and other places. I was going to buy it, but it turned out I could add it to my Kindle Unlimited plan, which I keep meaning to cancel but haven’t yet.
I already started digging in. I have high hopes for the book. I’ve been using and recommending LangChain4j for months now, but it’s got to help having the perspective of one of the core committers.
My Toughest Lesson So Far
As I’ve mentioned, I’m now a “Professor of the Practice” in the Computer Science department at Trinity College in Hartford, CT. I taught here last year as an adjunct, and this year I’m full-time. All of my classes so far have been for juniors or seniors, with an occasional experienced sophomore thrown in.
In all that time, I can now confidently say that by far the toughest lesson I’ve had to teach is:
How to set an environment variable in your operating system.
Seriously. The very concept of environment variables — what they are, why an operating system would use them, how to access them and set them — has almost all my students completely mystified.
I get it, at least to a degree. These kids grew up with smart phones, and environment variables aren’t a concept that translates easily to that world. (Neither does the idea of a file system, but that’s a different issue.) I could just be acting like my father being shocked that I didn’t know anything about the carburetor on my first car, but I don’t think so. All the AI tools require you to register and get a key, which you then set as an environment variable in your OS (as I did earlier in this newsletter for Flux 1.1 [pro]), so this is something they’re all going to run into sooner or later.
It’s also not like there aren’t 20 billion YouTube videos talking about how to handle them, and ChatGPT/Claude/Gemini will happily describe the process for you in as much detail as you want. I just never expected the very idea to be as huge an intellectual leap as it’s been so far.
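For any students reading along, the concept itself fits in a few lines. You set the variable in your shell (for example, export BFL_API_KEY=your-key on macOS/Linux, or setx BFL_API_KEY your-key on Windows, followed by opening a new terminal), and reading it in Java is a single call to System.getenv. A minimal hypothetical check, reusing the BFL_API_KEY variable from earlier in this newsletter:

```java
public class EnvCheck {
    // Describe whether the key was found, without printing the secret itself.
    static String describe(String key) {
        return key == null
                ? "BFL_API_KEY is not set"
                : "BFL_API_KEY found (" + key.length() + " characters)";
    }

    public static void main(String[] args) {
        // System.getenv returns null when the variable isn't set in the OS.
        System.out.println(describe(System.getenv("BFL_API_KEY")));
    }
}
```

The usual gotcha: environment variables set in one terminal (or in a shell startup file you haven’t re-sourced) aren’t visible to programs launched elsewhere, which is where most students get stuck.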
I promise that after my class, my students will know what they are and how to deal with them. But geez, I didn’t expect to have to spend this much time and energy on such a basic thing.
To fix a different problem, I’m thinking of starting off my classes each week with a ten-minute snippet from Monty Python and the Holy Grail, because if you don’t know that movie, you’re not really a developer. But one massive challenge at a time, I guess.
The VP Debate
Nope. Not going there. Sorry.
Other than this:
Or maybe this:
Or even this:
Honestly, if you don’t know how I feel about the coming election, you must be really new here.
One month to go. One interminable, agonizing month, and then weeks of chaos and challenges. Ugh.
Tweets and Toots
Like a Rolling Stone
Okay, maybe one more VP joke
Pretty much sums it up.
King of the self-owns
I’m still embarrassed I ever thought Elon was intelligent or hardworking. I always knew he wasn’t honest.
Visual pun
It’s a cardiac arrest. I needed the mouse-over text to realize it.
And finally,
Too helpful
For the record, I had to google it:
There’s a Wikipedia link and everything.
Have a great week, everybody!
Last week:
Week 3 of my Spring in 3 Weeks course, on the O’Reilly Learning Platform
Two talks at dev2next (VenCon) in the Denver area
My regular Trinity College schedule
This week:
Managing Your Manager, on the O’Reilly Learning Platform
My regular Trinity College schedule