Using AI vision to generate alt text for images
I'm not proud of it, but sometimes I forget to add alt text to the images I post. This makes those images invisible to people who access the web with screen readers. So I decided to fix it using AI, computer vision, and serverless functions.
Whenever I add an image to a blog post, I want that image to be analyzed by an AI that describes it and adds alt text if none exists. Since I use Next.js and MDX, and I don't want to spend too much on this, I want the analysis to happen only at build time.
There's a wide variety of accessible cloud-based computer vision services.
None of these services is perfect, but in my tests Azure came out on top as the most accurate at describing images.
Since, again, I'm using Next.js, I don't have to do anything extra to deploy a serverless function that requests image analysis. I can do it right from within Next.js by adding a file to the /pages/api/ folder. The function will then be available under https://rosnovsky.us/api/.
That's it. All we need is a cloud-based computer vision service and a way to deploy a serverless function.
Bringing it all together
First, let's create a serverless function called pictureDescription. This function will accept an image URL, send that URL to Azure, and return the alt text it receives in Azure's response.
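Here is a minimal sketch of what `pictureDescription` could look like as a TypeScript API route. It assumes Azure's Computer Vision "Describe Image" REST endpoint (v3.2), a hypothetical `url` query parameter, hypothetical `AZURE_VISION_ENDPOINT`/`AZURE_VISION_KEY` environment variables, and a JSON reply shaped like `{ alt: "..." }`. These names and the reply shape are illustrative, not necessarily what the original function used. The request/response interfaces are small stand-ins for Next.js' `NextApiRequest`/`NextApiResponse`, and `fetch` is injected so the logic can be exercised without calling Azure.

```typescript
// Hypothetical file: pages/api/pictureDescription.ts

// Subset of the Azure "Describe Image" (v3.2) response that this sketch reads.
interface AzureDescribeResponse {
  description?: {
    tags?: string[];
    captions?: { text: string; confidence: number }[];
  };
}

// Minimal stand-ins for NextApiRequest / NextApiResponse (no dependency on `next`).
interface ApiRequest {
  query: Record<string, string | string[] | undefined>;
}
interface ApiResponse {
  status(code: number): ApiResponse;
  json(body: unknown): void;
}

// Only the parts of fetch() used here; the global fetch satisfies this type.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface AzureConfig {
  endpoint: string; // e.g. https://<resource>.cognitiveservices.azure.com
  key: string;
}

// Return the highest-confidence caption, or null if Azure offered none.
function pickCaption(body: AzureDescribeResponse): string | null {
  const captions = body.description?.captions ?? [];
  if (captions.length === 0) return null;
  return captions.reduce((best, c) => (c.confidence > best.confidence ? c : best)).text;
}

function createHandler(config: AzureConfig, fetchImpl: FetchLike) {
  return async function handler(req: ApiRequest, res: ApiResponse): Promise<void> {
    if (!config.endpoint || !config.key) {
      res.status(500).json({ error: "Azure credentials are not configured" });
      return;
    }
    const imageUrl = req.query.url; // hypothetical query parameter name
    if (typeof imageUrl !== "string" || imageUrl === "") {
      res.status(400).json({ error: "Missing ?url= parameter" });
      return;
    }
    try {
      // Azure downloads the image itself, so imageUrl must be publicly reachable.
      const azure = await fetchImpl(
        `${config.endpoint}/vision/v3.2/describe?maxCandidates=1&language=en`,
        {
          method: "POST",
          headers: {
            "Ocp-Apim-Subscription-Key": config.key,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({ url: imageUrl }),
        },
      );
      if (!azure.ok) {
        res.status(502).json({ error: `Azure responded with ${azure.status}` });
        return;
      }
      const caption = pickCaption((await azure.json()) as AzureDescribeResponse);
      if (caption === null) {
        res.status(502).json({ error: "Azure returned no caption" });
        return;
      }
      res.status(200).json({ alt: caption });
    } catch (err) {
      res.status(500).json({ error: String(err) });
    }
  };
}

// Next.js serves this default export at /api/pictureDescription.
// The environment variable names are assumptions.
export default createHandler(
  {
    endpoint: process.env.AZURE_VISION_ENDPOINT ?? "",
    key: process.env.AZURE_VISION_KEY ?? "",
  },
  (url, init) => fetch(url, init),
);
```

Wrapping the Azure call in `createHandler` is purely a testability choice; in a real route you could import the Next.js types and call `fetch` directly.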
This function will return something like this:
Now, to the MDX component. MDX lets you use React components right in your content. Let's create a function that takes an image path, width, height, and (optionally) alt text. If alt text is supplied, the component simply returns Next.js' Image component with the existing alt. If I forget to supply alt text, the function calls our API, fetches alt text, and passes it to the Image component. If fetching fails, an "I'm sorry" message is used as the alt text instead.
That's it. Now, whenever I include the ImageWithAlt component in my MDX, it gets alt text if none was provided.
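The decision logic in ImageWithAlt can be sketched as a plain async function, kept separate from the JSX. This is an illustration under assumptions: it supposes the API route is reachable at `/api/pictureDescription`, reads the image URL from a hypothetical `url` query parameter, and answers with `{ alt: "..." }`. The exact "I'm sorry" wording and the `resolveAltText` name are made up. Because Azure has to download the image itself, `src` is expected to be an absolute, publicly reachable URL.

```typescript
// Hypothetical fallback wording; the post only says it is an "I'm sorry" text.
const FALLBACK_ALT = "I'm sorry, no description is available for this image.";

// Only the parts of fetch() used here; the global fetch satisfies this type.
type FetchJson = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function resolveAltText(
  src: string,
  alt: string | undefined,
  apiBase: string, // e.g. the site's origin (assumption)
  fetchImpl: FetchJson = (url) => fetch(url),
): Promise<string> {
  // 1. Author-supplied alt text always wins; no API call is made.
  if (alt && alt.trim() !== "") return alt;
  try {
    // 2. Otherwise ask the pictureDescription route to describe the image.
    const res = await fetchImpl(
      `${apiBase}/api/pictureDescription?url=${encodeURIComponent(src)}`,
    );
    if (!res.ok) return FALLBACK_ALT;
    const data = (await res.json()) as { alt?: unknown };
    return typeof data.alt === "string" && data.alt !== "" ? data.alt : FALLBACK_ALT;
  } catch {
    // 3. Network or JSON errors fall back to the apology text.
    return FALLBACK_ALT;
  }
}
```

Inside ImageWithAlt, the resolved string would be passed as the `alt` prop of Next.js' Image; where exactly that `await` happens depends on how the MDX pages are rendered at build time, which this sketch does not cover.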
Check this out:
This image had no alt text, so Azure generated one for us:
There are a few caveats with this approach. The first, obviously, is accuracy: a generic description is better than no description, but it is often imprecise. I hope AI vision and analysis algorithms will get better at this with time. Second, an image has to be publicly available on the web at build time. This requires me to first deploy the site with just the images for an upcoming post and then redeploy it when I actually publish the post. Arguably, one could deploy the images and the post at the same time and then redeploy again, but neither option is great. Something to think about.
P.S. Always include alt text with your images!