Problem with including images in the request specifically in the JSON structure #2754

Open
AzizDXT opened this issue Feb 24, 2025 · 11 comments
Assignees
Labels
bug Something isn't working respond

Comments


AzizDXT commented Feb 24, 2025

📌 Issue Title:

AI Prioritizes Image Requests Over Text Due to JSON Structure

📝 Description:

I am encountering an issue where, when sending both a text request and an image request together, the AI prioritizes the image over the text. This seems to happen because the image request appears at the end of the JSON structure, making the AI treat it as the latest message and focus on it instead of processing the text properly.

🔍 Problem Details:

  • When sending a combined request (text + image), the AI gives more importance to the image.
  • It seems that the order of elements in the JSON request affects the AI's behavior.
  • The text request is being ignored or not given enough weight because the image is positioned at the end.

⚡ Expected Behavior:

  • The AI should consider both the text and the image equally, rather than prioritizing the image simply because it appears last in the JSON structure.
  • The text should remain part of the conversation history, even if an image is attached in the request.

💡 Possible Solutions (Suggestions):

  1. Rearrange JSON Structure: Allow defining a priority field or ensure messages are processed based on timestamps instead of their position in JSON.
  2. Explicit Context Handling: Ensure the AI processes all message components (text + image) in order rather than favoring the latest entry.
  3. Modify AI Processing Logic: Ensure the AI weights text and images appropriately rather than prioritizing based on the request order.
@AzizDXT AzizDXT added the bug Something isn't working label Feb 24, 2025
Collaborator

hlohaus commented Feb 26, 2025

May you provide an example? Each vendor employs a unique method for image response.

@AzizDXT

Author

AzizDXT commented Feb 26, 2025

I will activate Logs and then send you the exact problem and explain it in detail.

Author

AzizDXT commented Feb 26, 2025

> May you provide an example? Each vendor employs a unique method for image response.
>
> @AzizDXT

For clarification, the issue occurs when sending a request via the API or in the main interface of G4F. When I upload an image for interpretation or analysis and then ask a related question in a new message within the same conversation, the model forgets the image or does not include it in the request. I want the image to be included in the request and context so that the model can reference it when answering follow-up questions.

Author

AzizDXT commented Feb 26, 2025

I would prefer the request to look like this, and for G4F to accept it, because G4F's basic format currently always places the image at the bottom of the request body.

{
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "system",
      "content": "You are an AI assistant that can analyze images and retain context."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Analyze this image and describe what you see."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/sample-image.jpg"
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": "This image contains a sunset over the ocean with a few scattered clouds."
    },
    {
      "role": "user",
      "content": "What colors are in the sunset?"
    }
  ],
  "max_tokens": 500
}
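
For reference, here is a minimal Python sketch of how a request body like the one above could be assembled so that the image stays inside the user message it was sent with. The helper names `user_message_with_image` and `build_payload` are hypothetical and are not part of G4F; the sketch also assumes the object form `{"url": ...}` for `image_url`, as used in the OpenAI chat format.

```python
import json


def user_message_with_image(text, image_url):
    """Return one user message whose content holds the text and the image together."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # Assumption: the image is referenced by URL in the object form.
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def build_payload(history, model="gpt-4-vision-preview", max_tokens=500):
    """Wrap the full history in a request body; nothing is reordered or dropped."""
    return {"model": model, "messages": list(history), "max_tokens": max_tokens}


history = [
    {
        "role": "system",
        "content": "You are an AI assistant that can analyze images and retain context.",
    },
    user_message_with_image(
        "Analyze this image and describe what you see.",
        "https://example.com/sample-image.jpg",
    ),
    {
        "role": "assistant",
        "content": "This image contains a sunset over the ocean with a few scattered clouds.",
    },
    # The follow-up question is plain text; the image remains earlier in the history.
    {"role": "user", "content": "What colors are in the sunset?"},
]

payload = build_payload(history)
print(json.dumps(payload, indent=2))
```

Because the image lives in the second message rather than at the end of the body, a follow-up question can still refer to it without the image being treated as the newest input.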

Collaborator

hlohaus commented Feb 26, 2025

I believe the issue has been resolved in the latest update. Image selections should persist across requests. A media selection box with image preview is now implemented. @AzizDXT

Author

AzizDXT commented Feb 26, 2025

I will retest the interface; I didn't notice there was an update five hours ago.

Author

AzizDXT commented Feb 26, 2025

The problem still exists. I will send you an example of the exact problem and it will become clear to you what the problem is.

Author

AzizDXT commented Feb 26, 2025

Look at my conversation here, and you’ll see that the image is still prioritized. Based on my analysis, when the image is at the bottom of the JSON section, the AI gives it priority for analysis or explanation. I just want the request to be modified before being sent to the providers so that the image's position aligns with the message, rather than always being at the end of the request.


Image

Blackbox AI with BLACKBOXAI
Hello! How can I assist you today?

(7 words, 26 chars, 9 tokens)


User Input:

"explain this pic"

(3 words, 14 chars, 3 tokens)


Blackbox AI with BLACKBOXAI
Image Description

  • Type of Image: Pixel Art
  • Subject: A stylized cat

Color Palette:

  • Main Color: Orange (fur)
  • Accent Colors: White (belly, face markings), Blue (eyes)

Background:

  • Depicts a forest setting with dark tones
  • Subtle light particles or sparkles in the background
  • Floral elements at the bottom, adding a pop of color

Visual Elements:

  • Style:
    • Pixelated, giving a retro/video game feel
    • Simplified shapes and vibrant colors
  • Expression:
    • The cat appears calm and inquisitive with large, expressive eyes
    • Overall demeanor suggests a friendly and whimsical nature

Composition:

  • Framing: The cat is centered, drawing attention
  • Depth: The background features blurred trees, enhancing focus on the cat

This image combines pixel art charm with a serene forest ambiance, evoking a whimsical and magical atmosphere.

(144 words, 744 chars, 235 tokens)


User Input:

"how are you?"

(3 words, 9 chars, 4 tokens)


Blackbox AI with BLACKBOXAI
I'm here to help! If you have any questions or need assistance, feel free to ask.

Image Analysis

Description:

  • The image depicts a pixel art style representation of a cat.
  • The cat is primarily orange with white accents on its chest and face.
  • It has bright blue eyes and a faint smile, giving it a friendly appearance.

Background Elements:

  • The background appears to be a dark forest setting with hints of foliage.
  • Small flowers at the bottom add color to the scene.

Art Style:

  • The artwork features a pixelated style, characteristic of retro video games.
  • The use of contrasting colors enhances the cat's features.

If you need more details or have specific questions, just let me know!

(126 words, 574 chars, 163 tokens)


User Input:

"5 * 5?"

(2 words, 2 chars, 5 tokens)


Blackbox AI with BLACKBOXAI
Answer to Mathematical Question:
The result of 5 × 5 is 25.

Image Characteristics:

  • Content: The image appears to depict a pixel art representation of a cat.
  • Color Palette: Predominantly orange with white and blue accents.
  • Setting: The background suggests a whimsical forest environment with flowers and subtle lighting effects.

If you have any specific questions about the image or need further assistance, feel free to ask!

(67 words, 351 chars, 98 tokens)
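
The change requested in this comment, moving the image to the message it was uploaded with before the request goes to a provider, could be sketched roughly as follows. This assumes the images arrive separately from the messages as `(message_index, image_url)` pairs; that layout, the function name `inline_images`, and the example URL are all hypothetical and do not describe how G4F actually stores uploads.

```python
from copy import deepcopy


def inline_images(messages, images):
    """Attach each image to the message it was sent with.

    `images` is a hypothetical list of (message_index, image_url) pairs;
    the real request layout may differ.
    """
    result = deepcopy(messages)  # leave the caller's history untouched
    for index, url in images:
        message = result[index]
        content = message["content"]
        if isinstance(content, str):
            # Convert plain text into the multi-part content form.
            content = [{"type": "text", "text": content}]
        content.append({"type": "image_url", "image_url": {"url": url}})
        message["content"] = content
    return result


messages = [
    {"role": "user", "content": "explain this pic"},
    {"role": "assistant", "content": "Pixel art of a stylized orange cat in a forest."},
    {"role": "user", "content": "5 * 5?"},
]

# The image belongs to the first user message, not to the latest one.
fixed = inline_images(messages, [(0, "https://example.com/cat.png")])
```

With this arrangement the last message, `"5 * 5?"`, stays a plain text question, so a provider has less reason to treat the image as part of the newest request.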

Author

AzizDXT commented Feb 26, 2025

I found a solution, and it is simple: add a system message stating that an image has been uploaded, which the model can use for the user's current request or refer to later.
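
A minimal sketch of that workaround in Python is below. The wording of the notice and the helper name `with_image_notice` are assumptions made for illustration; the thread only reports this approach as tried with Blackbox.

```python
IMAGE_NOTICE = (
    "An image has been uploaded in this conversation. Use it only when the "
    "user's current request refers to it; otherwise answer the request directly."
)


def has_image(message):
    """Return True if the message content contains an image_url part."""
    content = message.get("content")
    return isinstance(content, list) and any(
        isinstance(part, dict) and part.get("type") == "image_url"
        for part in content
    )


def with_image_notice(messages, notice=IMAGE_NOTICE):
    """Return a copy of `messages` with a system notice first, if any image is present."""
    if not any(has_image(m) for m in messages):
        return list(messages)
    return [{"role": "system", "content": notice}, *messages]


conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "explain this pic"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    },
    {"role": "user", "content": "how are you?"},
]

prepared = with_image_notice(conversation)
```

Conversations without any image pass through unchanged, so the notice is only added when it is relevant.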

Collaborator

hlohaus commented Feb 27, 2025

Understood, @AzizDXT . However, I will not implement this solution as it is only compatible with Blackbox. We require a more universally applicable feature.

Author

AzizDXT commented Feb 27, 2025

> Understood, @AzizDXT . However, I will not implement this solution as it is only compatible with Blackbox. We require a more universally applicable feature.

Exactly. I will try to find a comprehensive solution to this problem, and if I find one, I may contribute it to G4F.
