{"id":95725,"date":"2025-02-02T19:15:06","date_gmt":"2025-02-02T15:45:06","guid":{"rendered":"https:\/\/nabfollower.com\/blog\/grpo-pitfalls-record-1130\/"},"modified":"2025-02-02T19:15:06","modified_gmt":"2025-02-02T15:45:06","slug":"grpo-pitfalls-record-1130","status":"publish","type":"post","link":"https:\/\/nabfollower.com\/blog\/grpo-pitfalls-record-1130\/","title":{"rendered":"GRPO Pitfalls Record &#8211; DEV Community"},"content":{"rendered":"<div data-article-id=\"2255349\" id=\"article-body\">\n<p>I learned that Hugging Face has adapted GRPO, the core technique behind Deepseek-R1, so I decided to try it. I chose the ERC (Emotion Recognition in Conversations) task to see whether a smaller model could start directly with reinforcement learning, train on a single task, and improve its performance on that task.<\/p>\n<p>First, this technique is very memory-hungry. I initially tried to train <code>gemma-2-2b<\/code> and <code>qwen-2.5-3b-instruct<\/code> on an A100-80G, but there was not enough memory. After switching to <code>qwen-2.5-0.5b-instruct<\/code>, the memory problem was resolved.<\/p>\n<p>Second, inference is particularly slow, because the same inputs have to be sampled repeatedly during training.<\/p>\n<p>Fortunately, Hugging Face quickly adopted vLLM, which improves efficiency. However, this brought new problems:<\/p>\n<ol>\n<li>Using vLLM to assist GRPO training requires at least two GPUs, which actually increases the resource demand: it merely moves the inference load onto a dedicated card.<\/li>\n<li>There was a persistent, strange error, <code>_assert_memory_footprint_increased_during_profiling<\/code>. After checking the TRL issues, it seems that upgrading vLLM to version 0.7 is necessary to resolve it.<\/li>\n<\/ol>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight plaintext\"><code>datasets==3.0.1\ntrl==0.14.0\ntransformers==4.48.2\npeft==0.14.0\naccelerate==1.3.0\ndeepspeed==0.15.3\ntorch==2.5.1\nvllm==0.7.1\n<\/code><\/pre>\n<\/div>\n<p><img decoding=\"async\" 
src=\"https:\/\/media2.dev.to\/dynamic\/image\/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto\/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3tviix304lcqfaunzh6.png\" alt=\"\u0646\u0642\u0644 \u0642\u0648\u0644 GPU\" loading=\"lazy\" width=\"800\" height=\"318\" title=\"\"><\/p>\n<p>It turns out that Hugging Face has adapted GRPO, the core technique behind Deepseek-R1, and I decided to try it. I chose the ERC task (emotion recognition in conversation) because I want to see whether a smaller model can start training on a single task with reinforcement learning and improve its performance on that task.<br \/>First of all, this technique is very demanding on video memory. I first tried to use an A100-80G to train <code>gemma-2-2b<\/code> and <code>qwen-2.5-3b-instruct<\/code>, but there was not enough memory. After switching to <code>qwen-2.5-0.5b-instruct<\/code>, video memory no longer ran out. Second, inference is particularly slow, because the same inputs must be sampled repeatedly during training. Fortunately, Hugging Face quickly added vLLM support, which improved efficiency. But this brings new problems:<\/p>\n<ol>\n<li>Using vLLM to assist GRPO training requires at least two graphics cards.<\/li>\n<li>A strange error, <code>_assert_memory_footprint_increased_during_profiling<\/code>. After checking the TRL issues, it seems that vLLM must be upgraded to version 0.7 to resolve it.<\/li>\n<\/ol><\/div>\n","protected":false},"excerpt":{"rendered":"<p>I learned that Hugging Face has adapted GRPO, the core technique behind Deepseek-R1, so I decided to try it. I chose the ERC (Emotion Recognition in Conversations) task to see whether a smaller model could start directly with reinforcement learning and, on a single task &hellip;<\/p>\n","protected":false},"author":2,"featured_media":95726,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"","fifu_image_alt":"","footnotes":""},"categories":[339],"tags":[],"class_list":["post-95725","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dev"],"_links":{"self":[{"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/posts\/95725","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/comments?post=95725"}],"version-history":[{"count":0,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/posts\/95725\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/media\/95726"}],"wp:attachment":[{"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/media?parent=95725"}],"wp:term":[{"taxonomy":
"category","embeddable":true,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/categories?post=95725"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/tags?post=95725"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}