Food Scan Fun with Google Gemini: Building Your Food Recognition App

Md. Abdullah Al Mamun
8 min read · Jan 20, 2024


Calling all foodies and tech enthusiasts! Imagine whipping out your phone, pointing it at a plate of deliciousness, and instantly having access to a wealth of information about the food you’re about to devour. It sounds like magic, right? Well, with the power of Google Gemini, it’s not just a fantasy—it's a reality you can build yourself!

Table of foods scanned by AI (Google Gemini)

In this blog post, we’ll embark on a delicious journey, guiding you through the process of creating your very own food scan app using Google Gemini. No deep coding experience is needed, just a healthy dose of curiosity and a love for all things food!

Step 1: Setup your kitchen (environment!)
Create a Python virtual environment

Windows:

Creating a Python virtual environment on Windows is a straightforward process. Here are the steps using the built-in venv module, assuming you have Python installed on your system:

Open a Command Prompt:

Open the Command Prompt by pressing Win + R, typing cmd, and pressing Enter.

Navigate to the desired directory:

Use the cd command to navigate to the directory where you want to create the virtual environment. For example:

cd path\to\your\desired\directory

Create the virtual environment:

Run the following command to create a virtual environment named “venv” (you can replace “venv” with any name you prefer):

python -m venv venv

Note: Ensure that you have Python added to your system PATH. If not, you might need to provide the full path to the Python executable, like C:\Path\to\python.exe -m venv venv.

Activate the virtual environment:

To activate the virtual environment, run the following command:

venv\Scripts\activate

After activation, your command prompt should show the virtual environment name in the prompt, indicating that you are now working within the virtual environment.

Example:

(venv) C:\path\to\your\desired\directory>

Ubuntu/Mac:

Creating a Python virtual environment on Ubuntu or macOS is similar to the process on Windows, but the commands are slightly different (the apt commands below are Ubuntu-specific). Here’s how you can do it:

Install virtualenv:

  • If you haven’t installed virtualenv yet, you can do so using pip. Open a terminal and run:

sudo apt update

sudo apt install python3-pip

pip3 install virtualenv

Navigate to the desired directory:

  • Use the cd command to navigate to the directory where you want to create the virtual environment. For example: cd /path/to/your/desired/directory

Create the virtual environment:

  • Run the following command to create a virtual environment named “venv” (you can replace “venv” with any name you prefer):

virtualenv venv

Activate the virtual environment:

  • To activate the virtual environment, run the following command: source venv/bin/activate
  • After activation, your command prompt should show the virtual environment name in the prompt, indicating that you are now working within the virtual environment.
    Example: (venv) user@hostname:/path/to/your/desired/directory$

Step 2: Gather your ingredients (tools, not food!)

  • Google Gemini API Key: This multimodal model from Google AI will be the brains behind your app, identifying and analyzing food images. To get a free API key, go to https://makersuite.google.com/, log in, and click Get API Key on the left. Then click Create API key in new project, and the key will be shown. Copy the API key and keep it in the .env file in the project.
  • Install the necessary packages:

pip install streamlit google-generativeai python-dotenv pillow

or list them in a requirements.txt file:

streamlit
google-generativeai
python-dotenv
pillow

and install with:

pip install -r requirements.txt

Streamlit:

  • Purpose: Streamlines the development and deployment of web applications for data scientists and machine learning engineers.
  • Key features:
  • Simple API for creating interactive web apps with pure Python code.
  • Live updates: apps update as you code.
  • Supports a wide range of data visualizations and widgets.
  • Deployment options for sharing apps with others.

google-generativeai:

  • Purpose: Provides access to Google’s Generative AI API, enabling text-based AI capabilities like text generation, translation, and question-answering.
  • Key features:
  • Generates text, translates languages, and answers your questions in an informative way.
  • Leverages Google AI’s powerful text-based models.

python-dotenv:

  • Purpose: Loads environment variables from a .env file, keeping sensitive information out of code and version control.
  • Key features:
  • Simplifies configuration management.
  • Protects secrets and API keys.
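Under the hood, load_dotenv simply reads KEY=VALUE pairs from a .env file into the process environment. Here is a minimal stdlib sketch of that behavior (the real python-dotenv also handles quoting, export prefixes, and variable interpolation); load_env_file and DEMO_GOOGLE_API_KEY are illustrative names, not part of the library:

```python
import os
import tempfile

def load_env_file(path):
    """Toy stand-in for load_dotenv: parse simple KEY=VALUE lines into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

# Write a throwaway .env file and load it
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("DEMO_GOOGLE_API_KEY=dummy-key-for-demo\n")
    env_path = f.name

load_env_file(env_path)
print(os.getenv("DEMO_GOOGLE_API_KEY"))  # dummy-key-for-demo
```

In the real app you'll just call load_dotenv() as shown later, but it helps to know the .env file is nothing more mysterious than a list of KEY=VALUE lines.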

Pillow:

Pillow is a powerful library for manipulating and analyzing images in Python. It supports a wide range of image formats, including JPEG, PNG, GIF, and TIFF.

It offers numerous features for:

  • Opening and saving images: Read and write images in various formats.
  • Image manipulation: Resize, rotate, crop, and modify images in various ways.
  • Pixel access and manipulation: Access and modify individual pixels within an image.
  • Drawing and shapes: Draw on images with lines, rectangles, circles, and other shapes.
  • Filters and effects: Apply different filters and effects to your images.
  • Conversions: Convert images between different color spaces and formats
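To get a feel for Pillow before wiring it into the app, here is a short, self-contained sketch that creates an image in memory, resizes it, and saves it to a byte buffer, much like the raw bytes the app later sends to Gemini:

```python
from io import BytesIO
from PIL import Image

# Create a small RGB image in memory (stands in for an uploaded food photo)
img = Image.new("RGB", (200, 120), color=(255, 140, 0))

# Resize it, as you might for a thumbnail preview
thumb = img.resize((100, 60))
print(thumb.size)  # (100, 60)

# Save to an in-memory buffer as PNG and read back the raw bytes
buffer = BytesIO()
thumb.save(buffer, format="PNG")
png_bytes = buffer.getvalue()
print(png_bytes[:8])  # PNG files start with the signature b'\x89PNG\r\n\x1a\n'
```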

Step 3: Let’s start Cooking (coding!)

  1. Create a Python file named app.py
  • Import necessary libraries:
import streamlit as st
import os

from PIL import Image
import google.generativeai as genai
from dotenv import load_dotenv
  • Load the API key:
load_dotenv()

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
  • Creating a GenerativeModel Instance:
model = genai.GenerativeModel('gemini-pro-vision')

model = genai.GenerativeModel('gemini-pro-vision') instantiates a GenerativeModel object, representing the "gemini-pro-vision" model.

This model has two notable abilities:

  • Multimodal: it accepts and seamlessly combines text and images as input.
  • Versatile: it excels at various tasks, including text generation, translation, question-answering, and visual understanding.

  • Create a Function to get the image details:
def input_image_details(uploaded_file):
    if uploaded_file is not None:
        bytes_data = uploaded_file.getvalue()

        image_parts = [
            {
                "mime_type": uploaded_file.type,
                "data": bytes_data
            }
        ]

        return image_parts
    else:
        raise FileNotFoundError("No file uploaded")

The input_image_details function takes an uploaded file, typically from a Streamlit file uploader, and extracts relevant details such as MIME type and byte data. It checks if the uploaded file is not None, indicating a successful upload. If a file is present, it retrieves its byte data and MIME type, encapsulates them in a list of dictionaries representing image parts, and returns this information. If no file is uploaded, it raises a FileNotFoundError with the message "No file uploaded." The function is designed to be part of an image processing or analysis workflow, providing a structured representation of the uploaded image for further processing within a Streamlit application.
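You can exercise this logic outside Streamlit with a tiny stand-in object; FakeUpload below is a hypothetical class used only for illustration, mimicking the uploader's .type attribute and .getvalue() method:

```python
class FakeUpload:
    """Minimal stand-in for a Streamlit UploadedFile: exposes .type and .getvalue()."""
    def __init__(self, data, mime_type):
        self._data = data
        self.type = mime_type

    def getvalue(self):
        return self._data

def input_image_details(uploaded_file):
    if uploaded_file is not None:
        bytes_data = uploaded_file.getvalue()
        image_parts = [{"mime_type": uploaded_file.type, "data": bytes_data}]
        return image_parts
    raise FileNotFoundError("No file uploaded")

# A few JPEG header bytes stand in for real image data
parts = input_image_details(FakeUpload(b"\xff\xd8\xff", "image/jpeg"))
print(parts[0]["mime_type"])  # image/jpeg
```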

  • Create a function to get the response from the Gemini LLM
def get_gemini_response(input, image, prompt):
    response = model.generate_content([input, image[0], prompt])
    return response.text

This function ties the three inputs together: it sends the text instruction, the image, and the user's prompt to the Gemini model and returns the generated text, demonstrating Gemini's ability to process multimodal input and produce novel text from diverse sources.

  • Create UI with Streamlit

This code snippet builds the app's user interface with Streamlit: a text input for the prompt, a file uploader for food images, a preview of the uploaded image, and a button to trigger the scan.


st.set_page_config(page_title="Food Scan")

st.header('Food Scan with Google Gemini')
input = st.text_input("Input prompt: ", key='input')
uploaded_file = st.file_uploader("Choose an image of the food or food table", type=["jpg", 'jpeg', 'png'])
image = ""
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image", use_column_width=True)

submit = st.button("Scan the Food(s)")

The code configures the Streamlit page with the title “Food Scan” using st.set_page_config. It then adds a header to the page, displaying "Food Scan with Google Gemini." The code includes a text input widget for users to enter a prompt related to food, a file uploader for selecting an image file of food or a food table, and an area to display the uploaded image. The image, if uploaded, is opened using the Image module from the Python Imaging Library (PIL) and displayed using st.image. Finally, a button labeled "Scan the Food(s)" is provided for users to initiate a food scanning action. The code suggests an interactive interface for users to input prompts, upload images, and trigger the scanning process, involving the Google Gemini model for content generation or analysis related to food.

  • Let’s create the prompt. For example, I created this one for this app. You can change according to your needs.
input_prompt = """
You have to identify different types of food in images.
The system should accurately detect and label various foods displayed in the image, providing the name
of the food and its location within the image (e.g., bottom left, right corner, etc.). Additionally,
the system should extract nutritional information and categorize the type of food (e.g., fruits, vegetables, grains, etc.)
based on the detected items. The output should include a comprehensive report or display showing the
identified foods, their positions, names, and corresponding nutritional details.
"""
  • Submit to Gemini Vision API:
if submit:
    image_data = input_image_details(uploaded_file)
    response = get_gemini_response(input_prompt, image_data, input)
    st.subheader("Food Scan report: ")
    st.write(response)

This block ties everything together: when the user clicks the button, the uploaded image is packaged by input_image_details, sent to Gemini along with the prompts, and the model's response is displayed as a food scan report.

  • Here is the complete code
import streamlit as st
import os

from PIL import Image
import google.generativeai as genai
from dotenv import load_dotenv

load_dotenv()

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

model = genai.GenerativeModel('gemini-pro-vision')

def get_gemini_response(input, image, prompt):
    response = model.generate_content([input, image[0], prompt])
    return response.text

def input_image_details(uploaded_file):
    if uploaded_file is not None:
        bytes_data = uploaded_file.getvalue()

        image_parts = [
            {
                "mime_type": uploaded_file.type,
                "data": bytes_data
            }
        ]

        return image_parts
    else:
        raise FileNotFoundError("No file uploaded")

st.set_page_config(page_title="Food Scan")

st.header('Food Scan with Google Gemini')
input = st.text_input("Input prompt: ", key='input')
uploaded_file = st.file_uploader("Choose an image of the food or food table", type=["jpg", 'jpeg', 'png'])
image = ""
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image", use_column_width=True)

submit = st.button("Scan the Food(s)")

input_prompt = """
You have to identify different types of food in images.
The system should accurately detect and label various foods displayed in the image, providing the name
of the food and its location within the image (e.g., bottom left, right corner, etc.). Additionally,
the system should extract nutritional information and categorize the type of food (e.g., fruits, vegetables, grains, etc.)
based on the detected items. The output should include a comprehensive report or display showing the
identified foods, their positions, names, and corresponding nutritional details.
"""

if submit:
    image_data = input_image_details(uploaded_file)
    response = get_gemini_response(input_prompt, image_data, input)
    st.subheader("Food Scan report: ")
    st.write(response)

Step 4: Let’s Try the dish (App!)

streamlit run app.py

Run the above command and a screen like the one below will open in your web browser.

Try it with a food table image, for example:

Even if you leave the input prompt empty, the app will still run and return accurate results. You can use the input prompt, for example, to focus on liquid foods or seafood. Upload the image, click Scan the Food(s), and see the magic.

Remember: This is just a basic recipe, feel free to experiment and add your personal touches. With Google Gemini as your sous-chef, the possibilities are endless! So grab your phone, fire up your creativity, and get ready to build your food scan app — the tastiest tech project you’ll ever cook up!

Bonus tips:

  • Use Gemini’s question-answering capabilities to let users ask questions about the food, like “Is this dish gluten-free?” or “What are some good wine pairings?”
  • Integrate food databases like FoodDB or Edamam for even more detailed information and recipe suggestions.
  • Get creative with your app’s design and branding — make it visually appealing and fun to use!

With a dash of code and a sprinkle of imagination, you can turn your love for food and technology into a delicious app that’s both informative and entertaining. So, go forth and create your food scan masterpiece — the world (and your taste buds) will thank you!
