Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Current browser screenshot:为空 #939

Open
0three opened this issue Mar 25, 2025 · 1 comment
Open

Current browser screenshot:为空 #939

0three opened this issue Mar 25, 2025 · 1 comment

Comments

@0three
Copy link

0three commented Mar 25, 2025

我在3.24日最新代码的llm.py的738行打印了传给大模型的params,发现Current browser screenshot:为空,模型似乎并没有接收到当前浏览器的截图,以下是具体的参数:

{
  "model": "qwen-max",
  "messages": [
    {
      "role": "system",
      "content": "You are OpenManus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing, or web browsing, you can handle it all. The initial directory is: C:\\Users\\Govern-Nation\\Downloads\\OpenManus-324\\OpenManus-main\\workspace"
    },
    {
      "role": "user",
      "content": "帮我查询xxx的个人信息,并生成一份详细的简历,以doc格式保存"
    },
    {
      "role": "user",
      "content": "\nWhat should I do next to achieve my goal?\n\nWhen you see [Current state starts here], focus on the following:\n- Current URL and page title\n- Available tabs\n- Interactive elements and their indices\n- Content above or below the viewport (if indicated)\n- Any action results or errors\n\nFor browser interactions:\n- To navigate: browser_use with action=\"go_to_url\", url=\"...\"\n- To click: browser_use with action=\"click_element\", index=N\n- To type: browser_use with action=\"input_text\", index=N, text=\"...\"\n- To extract: browser_use with action=\"extract_content\", goal=\"...\"\n- To scroll: browser_use with action=\"scroll_down\" or \"scroll_up\"\n\nConsider both what's visible and what might be beyond the current viewport.\nBe methodical - remember your progress and what you've learned so far.\n"
    },
    {
      "role": "assistant",
      "content": "为了查询xxx的个人信息并生成一份详细的简历,我需要首先在网络上搜索相关信息。不过,在开始之前,我需要澄清一些事项:\n\n1. 我们将使用公开可用的信息来创建简历。如果xxx有公开的社交媒体账号、个人网站或LinkedIn页面,我们可能会找到一些有用的信息。\n2. 由于隐私问题,某些个人信息可能无法获取,而且我们应该尊重个人隐私。\n3. 生成的简历将基于找到的信息,并且可能不会包含所有细节,因为不是所有的个人信息都在线上公开。\n\n现在,我将开始搜索“xxx”的个人信息。请稍等片刻。",
      "tool_calls": [
        {
          "id": "call_0fbefb0714b142b1906593",
          "type": "function",
          "function": {
            "name": "browser_use",
            "arguments": "{\"action\": \"web_search\", \"query\": \"xxx\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "Observed output of cmd `browser_use` executed:\nSearched for 'xxx' and navigated to first result: https://scholar.google.com/citations?user=KYCxGy4AAAAJ&hl=zh-CN\nAll results:https://scholar.google.com/citations?user=KYCxGy4AAAAJ&hl=zh-CN\nhttps://zhuyr97.github.io/\nhttps://openreview.net/profile?id=~Yurui_Zhu1\nhttps://github.com/zhuyr97\n\nhttps://be.linkedin.com/in/yurui-zhu-8a5022336\nhttps://dblp.org/pid/282/6550\nhttps://ieeexplore.ieee.org/author/37088565620\nhttps://www.semanticscholar.org/author/Yurui-Zhu/2109439130\nhttps://www.researchgate.net/profile/Yurui-Zhu",
      "name": "browser_use",
      "tool_call_id": "call_0fbefb0714b142b1906593"
    },
    {
      "role": "user",
      "content": "Current browser screenshot:"
    },
    {
      "role": "user",
      "content": "\nWhat should I do next to achieve my goal?\n\nWhen you see [Current state starts here], focus on the following:\n- Current URL and page title\n   URL: https://scholar.google.com/citations?user=KYCxGy4AAAAJ&hl=zh-CN\n   Title: \u202aYurui Zhu\u202c - \u202aGoogle 学术搜索\u202c\n- Available tabs\n   1 tab(s) available\n- Interactive elements and their indices\n- Content above or below the viewport (if indicated)\n- Any action results or errors\n\nFor browser interactions:\n- To navigate: browser_use with action=\"go_to_url\", url=\"...\"\n- To click: browser_use with action=\"click_element\", index=N\n- To type: browser_use with action=\"input_text\", index=N, text=\"...\"\n- To extract: browser_use with action=\"extract_content\", goal=\"...\"\n- To scroll: browser_use with action=\"scroll_down\" or \"scroll_up\"\n\nConsider both what's visible and what might be beyond the current viewport.\nBe methodical - remember your progress and what you've learned so far.\n"
    }
  ],
  "tools": [...],
  "tool_choice": "auto",
  "timeout": 300,
  "max_tokens": 8192,
  "temperature": 0.0
}

但是我之前下载的一份代码是有截图的,输出内容如下:

    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Current browser screenshot:"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAABQA...."
          }
        }
      ]
    },

想问下这是怎么回事?

@LiKeJian
Copy link

感觉问题太多不如 OWL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants