Gemini 3.5 Flash Computer Use - screen-driving agents move into the main model

On June 24, 2026, Google integrated computer use into Gemini 3.5 Flash as a built-in tool, letting developers build agents that operate browser, mobile, and desktop environments through the Gemini API and the Gemini Enterprise Agent Platform. The Gemini API documentation names gemini-3.5-flash for computer use…

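Background and context

A computer use agent works by having the model perceive the screen and perform UI actions such as clicking buttons, entering text, and switching apps. Until now the field has been closer to an experimental stage, a mix of standalone preview models, browser automation frameworks, RPA tools, and hosted sandboxes. Google's decision to ship computer use as a built-in tool of Gemini 3.5 Flash shows the capability moving from a side demo to a core capability of the general API product line.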
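The perceive-then-act cycle described above can be sketched as a small loop. This is a minimal illustration, not Google's API: `UIAction`, `FakeScreen`, `scripted_policy`, and `run_agent` are all hypothetical names, the "screen" is a plain string rather than a screenshot, and the model call is replaced by a fixed plan so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UIAction:
    """One UI step proposed by the model (hypothetical shape)."""
    kind: str          # "click", "type", "switch_app", or "done"
    target: str = ""   # element or app the action applies to
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeScreen:
    """Stand-in for a real browser, mobile, or desktop environment."""
    fields: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False

    def observe(self) -> str:
        # A real environment would return a screenshot; a string is used here.
        return f"fields={self.fields} submitted={self.submitted}"

    def apply(self, action: UIAction) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_policy(observation: str, step: int) -> UIAction:
    """Placeholder for the model call: replays a fixed three-step plan."""
    plan = [
        UIAction("type", "name", "Ada"),
        UIAction("click", "submit"),
        UIAction("done"),
    ]
    return plan[min(step, len(plan) - 1)]


def run_agent(
    env: FakeScreen,
    policy: Callable[[str, int], UIAction],
    max_steps: int = 10,
) -> List[UIAction]:
    """Observe, ask the policy for an action, apply it, repeat until done."""
    trace: List[UIAction] = []
    for step in range(max_steps):
        action = policy(env.observe(), step)
        trace.append(action)          # keep every proposed action for review
        if action.kind == "done":
            break
        env.apply(action)
    return trace


if __name__ == "__main__":
    screen = FakeScreen()
    for act in run_agent(screen, scripted_policy):
        print(act)
```

The `max_steps` cap is a design choice worth keeping in any real loop: a screen-driving agent that never reaches a terminal state should stop rather than keep clicking.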

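The recent Notion cache included items such as Microsoft AutoJack, a security problem that arises when a browsing agent meets a local MCP control plane, and Claude Tag, where agents move into collaboration surfaces. This item is neither a security vulnerability nor a Slack collaboration bot; it matters separately because a major model provider has integrated screen-operation ability as a default model tool.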

Key points

Google released the computer use built-in tool for Gemini 3.5 Flash on June 24, 2026. The official announcement says developers and enterprises can build custom agents that see, reason about, and operate browser, mobile, and desktop environments through the Gemini API and the Gemini Enterprise Agent Platform. With a screen-based action loop layered on top of Gemini's existing function calling, Search grounding, and Maps grounding, the release targets long-horizon enterprise automation and continuous software testing.

The Gemini API documentation presents gemini-3.5-flash as the recommended model for computer use and summarizes the feature set as browser, mobile, and desktop environment support, intent-based actions, configurable safety policies, and prompt injection detection. Google says it applies adversarial training to reduce the risk of indirect prompt injection in live environments, and offers optional enterprise safeguards that require explicit user confirmation for sensitive or irreversible actions or halt the task when prompt injection is detected.
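The two safeguards described above, confirmation for sensitive actions and halting on suspected injection, can be illustrated with an application-side gate. This is a sketch under stated assumptions: `ProposedAction`, `SafetyPolicy`, `Decision`, and `gate` are hypothetical names, and the `injection_suspected` flag stands in for whatever signal a detector provides. It does not reproduce the Gemini API's actual safety configuration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, FrozenSet


class Decision(Enum):
    EXECUTE = "execute"   # run the action
    SKIP = "skip"         # user declined; do not run it
    HALT = "halt"         # stop the whole task


@dataclass(frozen=True)
class ProposedAction:
    name: str
    injection_suspected: bool = False   # hypothetical detector output


@dataclass(frozen=True)
class SafetyPolicy:
    needs_confirmation: FrozenSet[str]  # sensitive or irreversible actions
    halt_on_injection: bool = True


def gate(
    action: ProposedAction,
    policy: SafetyPolicy,
    confirm: Callable[[ProposedAction], bool],
) -> Decision:
    """Decide what to do with one proposed action before it touches the UI."""
    # Injection check comes first: a suspected injection should never reach
    # the confirmation prompt, where it could mislead the user.
    if action.injection_suspected and policy.halt_on_injection:
        return Decision.HALT
    if action.name in policy.needs_confirmation:
        return Decision.EXECUTE if confirm(action) else Decision.SKIP
    return Decision.EXECUTE


if __name__ == "__main__":
    policy = SafetyPolicy(frozenset({"delete_record", "send_payment"}))
    always_no = lambda a: False
    print(gate(ProposedAction("scroll"), policy, always_no))
    print(gate(ProposedAction("send_payment"), policy, always_no))
    print(gate(ProposedAction("scroll", injection_suspected=True), policy, always_no))
```

Passing `confirm` as a callback keeps the gate independent of how confirmation is collected, whether through a dialog, a chat message, or an approval queue.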

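Competitive landscape / comparison

Computer use is a core surface of the agent competition that OpenAI, Anthropic, and Google are all targeting. The difference lies not only in model performance but also in deployment form and guardrails. A separate computer-use model is useful for experiments and benchmarks, but in real products what matters more is whether it integrates into one flow with Search, code execution, function calling, enterprise identity, and policy engines.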

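As the recent AutoJack case showed, a screen-operating agent can touch the browser, local services, authenticated sessions, and the file system at the same time. The important part of Google's announcement is therefore less that the agent clicks well than that configurable safety policies and prompt injection detection are part of the API surface. Using them does not, however, remove the need to design sandboxing, least privilege, egress control, and audit trails.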

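Implications

At the industry level, computer use is a bridge technology that lets agents handle legacy apps and business software that have no API. Work with no SaaS integration can still be automated as long as it has a screen, so the coverage of knowledge-work automation widens. At the same time, the approach is more fragile than structured APIs and makes state tracking harder, which raises operational risk.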

In practice, teams adopting Gemini 3.5 Flash computer use should look beyond success rate and design action traces, screenshot retention, PII masking, confirmation thresholds, failure rollback, and per-app allowlists together. QA automation and back-office workflows in particular have different risk profiles even though both are computer use, so policies should be separated per workflow.
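One way to picture per-workflow policies is as separate configuration objects checked before each action. The sketch below is illustrative only: `WorkflowPolicy`, `QA_POLICY`, `BACK_OFFICE_POLICY`, the app names, the numeric risk scale, and the retention values are all assumptions, and the email regex is a deliberately simple stand-in for real PII detection.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Simplistic email pattern; real PII masking needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass(frozen=True)
class WorkflowPolicy:
    allowed_apps: FrozenSet[str]      # per-app allowlist
    confirm_at_risk: int              # risk >= this value needs confirmation
    screenshot_retention_days: int    # how long screenshots are kept


# Hypothetical values: QA runs against staging and tolerates more autonomy,
# while back-office work touches real records and confirms much earlier.
QA_POLICY = WorkflowPolicy(frozenset({"staging-web"}), 3, 30)
BACK_OFFICE_POLICY = WorkflowPolicy(frozenset({"erp", "invoicing"}), 1, 7)


def mask_pii(text: str) -> str:
    """Replace email addresses before the text is written to a trace."""
    return EMAIL.sub("[email]", text)


def check(policy: WorkflowPolicy, app: str, risk: int) -> str:
    """Return "deny", "confirm", or "allow" for an action in a given app."""
    if app not in policy.allowed_apps:
        return "deny"
    return "confirm" if risk >= policy.confirm_at_risk else "allow"


def trace_entry(policy: WorkflowPolicy, app: str, risk: int, detail: str) -> Dict[str, str]:
    """Build one action-trace record with the verdict and masked detail."""
    return {
        "app": app,
        "verdict": check(policy, app, risk),
        "detail": mask_pii(detail),
    }


if __name__ == "__main__":
    print(trace_entry(QA_POLICY, "staging-web", 2, "typed ada@example.com"))
    print(trace_entry(BACK_OFFICE_POLICY, "erp", 2, "opened vendor record"))
    print(trace_entry(QA_POLICY, "erp", 1, "opened vendor record"))
```

The same action at the same risk level gets a different verdict under each policy, which is the point of keeping QA and back-office rules in separate objects rather than one shared default.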
