Google TurboQuant: 6x LLM KV cache compression with zero accuracy loss (ICLR 2026)

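Google has unveiled TurboQuant, an algorithm that compresses the LLM KV cache from the conventional 16 bits to 3 bits, cutting memory use by at least 6x with no loss of accuracy. It is scheduled for formal presentation at ICLR 2026, and on H100 GPUs the 4-bit setting was confirmed to deliver up to 8x higher inference performance than 32-bit. 🔍 Why it matters: the KV cache is the largest GPU memory bottleneck when processing long contexts…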

Related reading

April 8, 2026

Cursor releases warp decode: 1.84x faster MoE inference on Blackwell, with improved accuracy

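On April 6, 2026, Cursor released warp decode, stating that on Blackwell GPUs it flipped the parallelization axis of the Mixture-of-Experts (MoE) decode path from expert-centric to output-centric, yielding a 1.84x throughput improvement and 1.4x higher accuracy measured against an FP32 reference. It removes five of the eight stages in the existing expert-centric path, and…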

May 10, 2026

Cursor 3.3 PR Review, Build in Parallel, and Split PRs: coding-agent competition expands from code generation to review and parallel-execution workflows

On May 7, 2026, Cursor released PR Review, Build in Parallel, and Split PRs in Cursor 3.3. The new PR review UI brings review threads, top-level comments, commit history, and the changes tree into a single screen, while Build in Parallel uses asynchronous subagents to run a plan's independent tasks in parallel…

May 9, 2026

More flexible secrets and variables for Copilot cloud agent: the core of coding-agent operations shifts from per-repo settings to an organization-wide control plane

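On May 8, 2026, GitHub launched Agents secrets and variables, dedicated to Copilot cloud agent. The secrets and variables needed by the cloud agent, which runs in a background development environment, no longer have to be added separately to each repository's copilot environment; they can now be shared and controlled at the organization level…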

May 7, 2026

Enterprise-managed plugins in GitHub Copilot CLI are now in public preview: coding-agent adoption shifts from individual settings to centrally controlled deployment

On May 6, 2026, GitHub announced that Enterprise-managed plugins have entered public preview in GitHub Copilot CLI. Through .github-private/.github/copilot/settings.json, administrators can centrally deploy plugin marketplaces, auto-installed plugins, and always-on hooks and MCP configuration. 🔍 Why…