{"id":94724,"date":"2025-01-26T22:21:57","date_gmt":"2025-01-26T18:51:57","guid":{"rendered":"https:\/\/nabfollower.com\/blog\/the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c\/"},"modified":"2025-01-26T22:21:57","modified_gmt":"2025-01-26T18:51:57","slug":"the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c","status":"publish","type":"post","link":"https:\/\/nabfollower.com\/blog\/the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c\/","title":{"rendered":"The Power of Quantization: Shrinking GPT2, Unleashing Speed"},"content":{"rendered":"<div data-article-id=\"2242376\" id=\"article-body\">\n<p>Imagine taking a powerful language model like GPT-2, capable of crafting stories, answering questions, and mimicking human text, and compressing it into a leaner, faster version without crippling its capabilities.<\/p>\n<p>This is the promise of quantization: a technique that reduces the precision of a model's computations, trading marginal accuracy for dramatic efficiency gains.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_81 counter-hierarchy ez-toc-counter-rtl ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link 
ez-toc-heading-1\" href=\"https:\/\/nabfollower.com\/blog\/the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c\/#%D9%81%D8%A7%D8%B2_0_%D8%AA%D9%86%D8%B8%DB%8C%D9%85_%D9%81%D9%86%DB%8C\" >Phase 0: Technical Setup<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/nabfollower.com\/blog\/the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c\/#%D9%81%D8%A7%D8%B2_1_%D9%BE%D8%A7%DB%8C%D9%87_%E2%80%93_%D8%AF%D9%82%D8%AA_%DA%A9%D8%A7%D9%85%D9%84_FP32\" >Phase 1: The Baseline &#8211; Full Precision (FP32)<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/nabfollower.com\/blog\/the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c\/#%D9%81%D8%A7%D8%B2_2_%D9%BE%DB%8C%D8%B1%D8%A7%DB%8C%D8%B4_%DA%86%D8%B1%D8%A8%DB%8C-%D8%A7%D9%86%D8%AF%D8%A7%D8%B2%D9%87_%DA%AF%DB%8C%D8%B1%DB%8C_8_%D8%A8%DB%8C%D8%AA%DB%8C_INT8\" >Phase 2: Trimming the Fat &#8211; 8-bit Quantization (INT8)<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/nabfollower.com\/blog\/the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c\/#%D9%81%D8%A7%D8%B2_3_%D9%84%D8%A8%D9%87_%D8%A8%D9%87%D8%B1%D9%87_%D9%88%D8%B1%DB%8C-%DA%A9%D9%85%DB%8C%D8%AA_4_%D8%A8%DB%8C%D8%AA%DB%8C_INT4\" >Phase 3: The Edge of Efficiency &#8211; 4-bit Quantization (INT4)<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" 
href=\"https:\/\/nabfollower.com\/blog\/the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c\/#%D8%AA%D8%AC%D8%A7%D8%B1%D8%AA_%D8%AF%D9%82%D8%AA_%D8%AF%D8%B1_%D9%85%D9%82%D8%A7%D8%A8%D9%84_%D8%B9%D9%85%D9%84%DB%8C\" >The Trade-off: Precision vs. Practicality<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/nabfollower.com\/blog\/the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c\/#%DA%86%DA%AF%D9%88%D9%86%D9%87_%DA%A9%D8%A7%D8%B1_%D9%85%DB%8C_%DA%A9%D9%86%D8%AF_%D9%85%DA%A9%D8%A7%D9%86%DB%8C%DA%A9_%D9%81%D8%B4%D8%B1%D8%AF%D9%87_%D8%B3%D8%A7%D8%B2%DB%8C\" >How It Works: The Mechanics of Compression<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/nabfollower.com\/blog\/the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c\/#%D8%A7%D8%AB%D8%A8%D8%A7%D8%AA_%D8%A8%D8%B5%D8%B1%DB%8C\" >Visual Proof<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/nabfollower.com\/blog\/the-power-of-quantization-shrinking-gpt2-unleashing-speed-5h7c\/#%D8%AD%D8%B1%D9%81_%D8%A2%D8%AE%D8%B1\" >Final Word<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"%D9%81%D8%A7%D8%B2_0_%D8%AA%D9%86%D8%B8%DB%8C%D9%85_%D9%81%D9%86%DB%8C\"><\/span>\n<p>  Phase 0: Technical Setup<br \/>\n<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code>    <span class=\"err\">!<\/span><span class=\"n\">pip<\/span> <span class=\"n\">install<\/span> <span 
class=\"n\">torch<\/span> <span class=\"n\">transformers<\/span> <span class=\"n\">accelerate<\/span> <span class=\"n\">bitsandbytes<\/span> <span class=\"n\">psutil<\/span>\n\n    <span class=\"kn\">from<\/span> <span class=\"n\">transformers<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">AutoModelForCausalLM<\/span><span class=\"p\">,<\/span> <span class=\"n\">AutoTokenizer<\/span><span class=\"p\">,<\/span> <span class=\"n\">BitsAndBytesConfig<\/span>\n    <span class=\"kn\">import<\/span> <span class=\"n\">torch<\/span>\n    <span class=\"kn\">import<\/span> <span class=\"n\">time<\/span>\n    <span class=\"kn\">import<\/span> <span class=\"n\">gc<\/span>\n\n    <span class=\"k\">def<\/span> <span class=\"nf\">get_memory_usage<\/span><span class=\"p\">():<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"n\">cuda<\/span><span class=\"p\">.<\/span><span class=\"nf\">memory_allocated<\/span><span class=\"p\">()<\/span> <span class=\"o\">\/<\/span> <span class=\"mf\">1e6<\/span> <span class=\"k\">if<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"n\">cuda<\/span><span class=\"p\">.<\/span><span class=\"nf\">is_available<\/span><span class=\"p\">()<\/span> <span class=\"k\">else<\/span> <span class=\"mi\">0<\/span>\n\n\n    <span class=\"n\">device<\/span> <span class=\"o\">=<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"nf\">device<\/span><span class=\"p\">(<\/span><span class=\"sh\">\"<\/span><span class=\"s\">cuda<\/span><span class=\"sh\">\"<\/span> <span class=\"k\">if<\/span> <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"n\">cuda<\/span><span class=\"p\">.<\/span><span class=\"nf\">is_available<\/span><span class=\"p\">()<\/span> <span class=\"k\">else<\/span> <span class=\"sh\">\"<\/span><span class=\"s\">cpu<\/span><span class=\"sh\">\"<\/span><span class=\"p\">)<\/span>\n    <span 
class=\"n\">model_name<\/span> <span class=\"o\">=<\/span> <span class=\"sh\">\"<\/span><span class=\"s\">gpt2<\/span><span class=\"sh\">\"<\/span>\n    <span class=\"n\">input_text<\/span> <span class=\"o\">=<\/span> <span class=\"sh\">\"<\/span><span class=\"s\">Once upon a time<\/span><span class=\"sh\">\"<\/span>\n<\/code><\/pre>\n<\/div>\n<hr\/>\n<h2><span class=\"ez-toc-section\" id=\"%D9%81%D8%A7%D8%B2_1_%D9%BE%D8%A7%DB%8C%D9%87_%E2%80%93_%D8%AF%D9%82%D8%AA_%DA%A9%D8%A7%D9%85%D9%84_FP32\"><\/span>\n<p>  Phase 1: The Baseline &#8211; Full Precision (FP32)<br \/>\n<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The experiment begins with GPT-2 in its natural state: 32-bit floating-point precision (FP32). This is the model's &#8220;full power&#8221; mode: highly precise but resource-intensive.<\/p>\n<ul>\n<li>\n<strong>Memory:<\/strong> Loading the FP32 model consumes <strong>511 MB<\/strong> of GPU memory.<\/li>\n<li>\n<strong>Speed:<\/strong> Generating up to 50 tokens from the prompt <em>&#8220;Once upon a time&#8221;<\/em> takes <strong>1.76 seconds<\/strong>.<\/li>\n<li>\n<strong>Post-cleanup footprint:<\/strong> Even after the model is deleted, <strong>458 MB<\/strong> of memory remains occupied.<\/li>\n<\/ul>\n<p>FP32 works, but it is bulky.<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code>    <span class=\"c1\"># Load tokenizer and base model\n<\/span>    <span class=\"n\">tokenizer<\/span> <span class=\"o\">=<\/span> <span class=\"n\">AutoTokenizer<\/span><span class=\"p\">.<\/span><span class=\"nf\">from_pretrained<\/span><span 
class=\"p\">(<\/span><span class=\"n\">model_name<\/span><span class=\"p\">)<\/span>\n    <span class=\"nf\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"sh\">\"<\/span><span class=\"s\">Pre-load memory: <\/span><span class=\"si\">{<\/span><span class=\"nf\">get_memory_usage<\/span><span class=\"p\">()<\/span><span class=\"si\">}<\/span><span class=\"s\"> MB<\/span><span class=\"sh\">\"<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"c1\"># Full precision model\n<\/span>    <span class=\"n\">model_fp32<\/span> <span class=\"o\">=<\/span> <span class=\"n\">AutoModelForCausalLM<\/span><span class=\"p\">.<\/span><span class=\"nf\">from_pretrained<\/span><span class=\"p\">(<\/span><span class=\"n\">model_name<\/span><span class=\"p\">).<\/span><span class=\"nf\">to<\/span><span class=\"p\">(<\/span><span class=\"n\">device<\/span><span class=\"p\">)<\/span>\n    <span class=\"nf\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"sh\">\"<\/span><span class=\"s\">Post-load memory: <\/span><span class=\"si\">{<\/span><span class=\"nf\">get_memory_usage<\/span><span class=\"p\">()<\/span><span class=\"si\">}<\/span><span class=\"s\"> MB<\/span><span class=\"sh\">\"<\/span><span class=\"p\">)<\/span>  <span class=\"c1\"># 511.15 MB\n<\/span>\n    <span class=\"c1\"># Inference measurement\n<\/span>    <span class=\"n\">inputs<\/span> <span class=\"o\">=<\/span> <span class=\"nf\">tokenizer<\/span><span class=\"p\">(<\/span><span class=\"n\">input_text<\/span><span class=\"p\">,<\/span> <span class=\"n\">return_tensors<\/span><span class=\"o\">=<\/span><span class=\"sh\">\"<\/span><span class=\"s\">pt<\/span><span class=\"sh\">\"<\/span><span class=\"p\">).<\/span><span class=\"nf\">to<\/span><span class=\"p\">(<\/span><span class=\"n\">device<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">start_time<\/span> <span class=\"o\">=<\/span> <span class=\"n\">time<\/span><span 
class=\"p\">.<\/span><span class=\"nf\">time<\/span><span class=\"p\">()<\/span>\n    <span class=\"n\">output<\/span> <span class=\"o\">=<\/span> <span class=\"n\">model_fp32<\/span><span class=\"p\">.<\/span><span class=\"nf\">generate<\/span><span class=\"p\">(<\/span><span class=\"o\">**<\/span><span class=\"n\">inputs<\/span><span class=\"p\">,<\/span> <span class=\"n\">max_length<\/span><span class=\"o\">=<\/span><span class=\"mi\">50<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">inference_time<\/span> <span class=\"o\">=<\/span> <span class=\"n\">time<\/span><span class=\"p\">.<\/span><span class=\"nf\">time<\/span><span class=\"p\">()<\/span> <span class=\"o\">-<\/span> <span class=\"n\">start_time<\/span>  <span class=\"c1\"># 1.76s\n<\/span>\n    <span class=\"c1\"># Cleanup protocol\n<\/span>    <span class=\"k\">del<\/span> <span class=\"n\">model_fp32<\/span><span class=\"p\">,<\/span> <span class=\"n\">inputs<\/span>\n    <span class=\"n\">gc<\/span><span class=\"p\">.<\/span><span class=\"nf\">collect<\/span><span class=\"p\">()<\/span>\n    <span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"n\">cuda<\/span><span class=\"p\">.<\/span><span class=\"nf\">empty_cache<\/span><span class=\"p\">()<\/span>\n<\/code><\/pre>\n<\/div>\n<hr\/>\n<h2><span class=\"ez-toc-section\" id=\"%D9%81%D8%A7%D8%B2_2_%D9%BE%DB%8C%D8%B1%D8%A7%DB%8C%D8%B4_%DA%86%D8%B1%D8%A8%DB%8C-%D8%A7%D9%86%D8%AF%D8%A7%D8%B2%D9%87_%DA%AF%DB%8C%D8%B1%DB%8C_8_%D8%A8%DB%8C%D8%AA%DB%8C_INT8\"><\/span>\n<p>  Phase 2: Trimming the Fat &#8211; 8-bit Quantization (INT8)<br \/>\n<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Enter 8-bit quantization, where weights and activations are stored as integers instead of floats. 
The transformation is immediate:<\/p>\n<ul>\n<li>\n<strong>Memory:<\/strong> The INT8 model loads in just <strong>187 MB<\/strong>, <strong>63% smaller<\/strong> than FP32.<\/li>\n<li>\n<strong>Speed:<\/strong> Inference time drops to <strong>1.38 seconds<\/strong>, a <strong>22% improvement<\/strong>.<\/li>\n<li>\n<strong>Post-cleanup footprint:<\/strong> Memory falls to <strong>139 MB<\/strong> after the model is deleted.<\/li>\n<\/ul>\n<p>This model is lighter, faster, and still functional. A clear upgrade.<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code>    <span class=\"c1\"># 8-bit configuration\n<\/span>    <span class=\"n\">quant_config_8bit<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">BitsAndBytesConfig<\/span><span class=\"p\">(<\/span><span class=\"n\">load_in_8bit<\/span><span class=\"o\">=<\/span><span class=\"bp\">True<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"nf\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"sh\">\"<\/span><span class=\"s\">Pre-load memory: <\/span><span class=\"si\">{<\/span><span class=\"nf\">get_memory_usage<\/span><span class=\"p\">()<\/span><span class=\"si\">}<\/span><span class=\"s\"> MB<\/span><span class=\"sh\">\"<\/span><span class=\"p\">)<\/span>  <span class=\"c1\"># 9.18 MB\n<\/span>    <span class=\"n\">model_int8<\/span> <span class=\"o\">=<\/span> <span class=\"n\">AutoModelForCausalLM<\/span><span class=\"p\">.<\/span><span class=\"nf\">from_pretrained<\/span><span class=\"p\">(<\/span>\n        <span class=\"n\">model_name<\/span><span class=\"p\">,<\/span> \n        <span class=\"n\">quantization_config<\/span><span class=\"o\">=<\/span><span class=\"n\">quant_config_8bit<\/span>\n    <span class=\"p\">)<\/span>\n\n    <span class=\"c1\"># Dynamic input handling\n<\/span>    <span class=\"n\">inputs_int8<\/span> <span class=\"o\">=<\/span> <span class=\"nf\">tokenizer<\/span><span class=\"p\">(<\/span><span class=\"n\">input_text<\/span><span class=\"p\">,<\/span> <span class=\"n\">return_tensors<\/span><span class=\"o\">=<\/span><span class=\"sh\">\"<\/span><span class=\"s\">pt<\/span><span class=\"sh\">\"<\/span><span class=\"p\">).<\/span><span class=\"nf\">to<\/span><span class=\"p\">(<\/span><span class=\"n\">model_int8<\/span><span class=\"p\">.<\/span><span class=\"n\">device<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">start_time<\/span> 
<span class=\"o\">=<\/span> <span class=\"n\">time<\/span><span class=\"p\">.<\/span><span class=\"nf\">time<\/span><span class=\"p\">()<\/span>\n    <span class=\"n\">output<\/span> <span class=\"o\">=<\/span> <span class=\"n\">model_int8<\/span><span class=\"p\">.<\/span><span class=\"nf\">generate<\/span><span class=\"p\">(<\/span><span class=\"o\">**<\/span><span class=\"n\">inputs_int8<\/span><span class=\"p\">,<\/span> <span class=\"n\">max_length<\/span><span class=\"o\">=<\/span><span class=\"mi\">50<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">inference_time<\/span> <span class=\"o\">=<\/span> <span class=\"n\">time<\/span><span class=\"p\">.<\/span><span class=\"nf\">time<\/span><span class=\"p\">()<\/span> <span class=\"o\">-<\/span> <span class=\"n\">start_time<\/span>  <span class=\"c1\"># 1.38s\n<\/span><\/code><\/pre>\n<\/div>\n<hr\/>\n<h2><span class=\"ez-toc-section\" id=\"%D9%81%D8%A7%D8%B2_3_%D9%84%D8%A8%D9%87_%D8%A8%D9%87%D8%B1%D9%87_%D9%88%D8%B1%DB%8C-%DA%A9%D9%85%DB%8C%D8%AA_4_%D8%A8%DB%8C%D8%AA%DB%8C_INT4\"><\/span>\n<p>  Phase 3: The Edge of Efficiency &#8211; 4-bit Quantization (INT4)<br \/>\n<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now we push further. With 4-bit quantization, weights are compressed to near-minimal precision, while computations use 16-bit floats for stability.<\/p>\n<ul>\n<li>\n<strong>Memory:<\/strong> The INT4 model weighs in at <strong>149 MB<\/strong>, <strong>71% lighter<\/strong> than FP32.<\/li>\n<li>\n<strong>Speed:<\/strong> Inference time drops to <strong>1.08 seconds<\/strong>, a <strong>39% gain<\/strong> over FP32.<\/li>\n<li>\n<strong>Post-cleanup footprint:<\/strong> Memory falls to <strong>58 MB<\/strong>, a fraction of the original.<\/li>\n<\/ul>\n<p>This is not just optimization; it is a disruption.<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre 
class=\"highlight python\"><code>    <span class=\"c1\"># 4-bit configuration (reconstructed from the description above: 4-bit weights, 16-bit compute)\n<\/span>    <span class=\"n\">quant_config_4bit<\/span> <span class=\"o\">=<\/span> <span class=\"nc\">BitsAndBytesConfig<\/span><span class=\"p\">(<\/span>\n        <span class=\"n\">load_in_4bit<\/span><span class=\"o\">=<\/span><span class=\"bp\">True<\/span><span class=\"p\">,<\/span>\n        <span class=\"n\">bnb_4bit_compute_dtype<\/span><span class=\"o\">=<\/span><span class=\"n\">torch<\/span><span class=\"p\">.<\/span><span class=\"n\">float16<\/span>\n    <span class=\"p\">)<\/span>\n\n    <span class=\"nf\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"sh\">\"<\/span><span class=\"s\">Pre-load memory: <\/span><span class=\"si\">{<\/span><span class=\"nf\">get_memory_usage<\/span><span class=\"p\">()<\/span><span class=\"si\">}<\/span><span class=\"s\"> MB<\/span><span class=\"sh\">\"<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">model_int4<\/span> <span class=\"o\">=<\/span> <span class=\"n\">AutoModelForCausalLM<\/span><span class=\"p\">.<\/span><span class=\"nf\">from_pretrained<\/span><span class=\"p\">(<\/span>\n        <span class=\"n\">model_name<\/span><span class=\"p\">,<\/span>\n        <span class=\"n\">quantization_config<\/span><span class=\"o\">=<\/span><span class=\"n\">quant_config_4bit<\/span>\n    <span class=\"p\">)<\/span>\n\n    <span class=\"c1\"># Dynamic input handling\n<\/span>    <span class=\"n\">inputs_int4<\/span> <span class=\"o\">=<\/span> <span class=\"nf\">tokenizer<\/span><span class=\"p\">(<\/span><span class=\"n\">input_text<\/span><span class=\"p\">,<\/span> <span class=\"n\">return_tensors<\/span><span class=\"o\">=<\/span><span class=\"sh\">\"<\/span><span class=\"s\">pt<\/span><span class=\"sh\">\"<\/span><span class=\"p\">).<\/span><span class=\"nf\">to<\/span><span class=\"p\">(<\/span><span class=\"n\">model_int4<\/span><span class=\"p\">.<\/span><span class=\"n\">device<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">start_time<\/span> <span class=\"o\">=<\/span> <span class=\"n\">time<\/span><span class=\"p\">.<\/span><span class=\"nf\">time<\/span><span class=\"p\">()<\/span>\n    <span class=\"n\">output<\/span> <span class=\"o\">=<\/span> <span class=\"n\">model_int4<\/span><span class=\"p\">.<\/span><span class=\"nf\">generate<\/span><span class=\"p\">(<\/span><span class=\"o\">**<\/span><span class=\"n\">inputs_int4<\/span><span class=\"p\">,<\/span> <span class=\"n\">max_length<\/span><span class=\"o\">=<\/span><span class=\"mi\">50<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">inference_time<\/span> <span class=\"o\">=<\/span> <span class=\"n\">time<\/span><span class=\"p\">.<\/span><span class=\"nf\">time<\/span><span class=\"p\">()<\/span> <span class=\"o\">-<\/span> <span class=\"n\">start_time<\/span>  <span class=\"c1\"># 1.08s\n<\/span><\/code><\/pre>\n<\/div>\n<hr\/>\n<h2><span class=\"ez-toc-section\" id=\"%D8%AA%D8%AC%D8%A7%D8%B1%D8%AA_%D8%AF%D9%82%D8%AA_%D8%AF%D8%B1_%D9%85%D9%82%D8%A7%D8%A8%D9%84_%D8%B9%D9%85%D9%84%DB%8C\"><\/span>\n<p>  The Trade-off: Precision vs. Practicality<br \/>\n<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Quantization is not free. 
Reduced precision can subtly degrade model accuracy, but for many tasks, such as casual text generation, the difference is imperceptible. What we gain far outweighs the cost:<\/p>\n<ul>\n<li>\n<strong>Memory efficiency:<\/strong> FP32: 511 MB \u2192 INT8: 187 MB \u2192 INT4: 149 MB.<\/li>\n<\/ul>\n<p><em>Result:<\/em> Models fit within tighter memory constraints, enabling deployment on consumer GPUs or edge devices.<\/p>\n<ul>\n<li>\n<strong>Inference speed:<\/strong> FP32: 1.76s \u2192 INT8: 1.38s \u2192 INT4: 1.08s.<\/li>\n<\/ul>\n<p><em>Result:<\/em> Faster responses for real-time applications, from chatbots to automated content generation.<\/p>\n<hr\/>\n<h2><span class=\"ez-toc-section\" id=\"%DA%86%DA%AF%D9%88%D9%86%D9%87_%DA%A9%D8%A7%D8%B1_%D9%85%DB%8C_%DA%A9%D9%86%D8%AF_%D9%85%DA%A9%D8%A7%D9%86%DB%8C%DA%A9_%D9%81%D8%B4%D8%B1%D8%AF%D9%87_%D8%B3%D8%A7%D8%B2%DB%8C\"><\/span>\n<p>  How It Works: The Mechanics of Compression<br \/>\n<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>At its core, quantization maps high-precision values (such as 32-bit floats) to lower-precision formats (8- or 4-bit integers). 
For example:<\/p>\n<ul>\n<li>\n<strong>FP32<\/strong> uses 32 bits per number, capturing fine detail but demanding heavy resources.<\/li>\n<li>\n<strong>INT8\/INT4<\/strong> use fewer bits, approximating values with minimal loss.<\/li>\n<\/ul>\n<p>The <code>bitsandbytes<\/code> library handles this automatically, repacking the weights and adjusting computations to maintain stability.<\/p>\n<hr\/>\n<h2><span class=\"ez-toc-section\" id=\"%D8%A7%D8%AB%D8%A8%D8%A7%D8%AA_%D8%A8%D8%B5%D8%B1%DB%8C\"><\/span>\n<p>  Visual Proof<br \/>\n<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>A side-by-side comparison seals the 
argument:<\/p>\n<ul>\n<li>\n<strong>Memory usage (bar chart):<\/strong> FP32 towers over INT8 and INT4, showing the sharp reduction in resource demands.<\/li>\n<li>\n<strong>Inference time (line plot):<\/strong> The downward slope from FP32 to INT4 highlights the speed gains.<\/li>\n<\/ul>\n<p>The takeaway? Quantization isn't just a technical footnote; it's a practical tool for democratizing AI.<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight python\"><code>    <span class=\"c1\"># Visualization setup\n<\/span>    <span class=\"kn\">import<\/span> <span class=\"n\">matplotlib.pyplot<\/span> <span class=\"k\">as<\/span> <span class=\"n\">plt<\/span>\n    <span class=\"n\">quantization_types<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"sh\">'<\/span><span class=\"s\">FP32<\/span><span class=\"sh\">'<\/span><span class=\"p\">,<\/span> <span class=\"sh\">'<\/span><span class=\"s\">INT8<\/span><span 
class=\"sh\">'<\/span><span class=\"p\">,<\/span> <span class=\"sh\">'<\/span><span class=\"s\">INT4<\/span><span class=\"sh\">'<\/span><span class=\"p\">]<\/span>\n    <span class=\"c1\"># Figures reported in this article; substitute your own measurements\n<\/span>    <span class=\"n\">memory_usages<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"mi\">511<\/span><span class=\"p\">,<\/span> <span class=\"mi\">187<\/span><span class=\"p\">,<\/span> <span class=\"mi\">149<\/span><span class=\"p\">]<\/span>  <span class=\"c1\"># MB\n<\/span>    <span class=\"n\">inference_times<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"mf\">1.76<\/span><span class=\"p\">,<\/span> <span class=\"mf\">1.38<\/span><span class=\"p\">,<\/span> <span class=\"mf\">1.08<\/span><span class=\"p\">]<\/span>  <span class=\"c1\"># seconds\n<\/span>\n    <span class=\"n\">fig<\/span><span class=\"p\">,<\/span> <span class=\"n\">ax1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">plt<\/span><span class=\"p\">.<\/span><span class=\"nf\">subplots<\/span><span class=\"p\">(<\/span><span class=\"n\">figsize<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"mi\">8<\/span><span class=\"p\">,<\/span> <span class=\"mi\">6<\/span><span class=\"p\">))<\/span>\n    <span class=\"n\">bars<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ax1<\/span><span class=\"p\">.<\/span><span class=\"nf\">bar<\/span><span class=\"p\">(<\/span><span class=\"n\">quantization_types<\/span><span class=\"p\">,<\/span> <span class=\"n\">memory_usages<\/span><span class=\"p\">,<\/span> <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"sh\">'<\/span><span class=\"s\">blue<\/span><span class=\"sh\">'<\/span><span class=\"p\">,<\/span> <span class=\"n\">alpha<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.7<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">ax1<\/span><span class=\"p\">.<\/span><span class=\"nf\">set_ylabel<\/span><span class=\"p\">(<\/span><span class=\"sh\">'<\/span><span class=\"s\">Memory (MB)<\/span><span class=\"sh\">'<\/span><span class=\"p\">,<\/span> <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"sh\">'<\/span><span class=\"s\">blue<\/span><span class=\"sh\">'<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"c1\"># Annotation logic\n<\/span>    <span class=\"k\">for<\/span> <span class=\"n\">bar<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">bars<\/span><span class=\"p\">:<\/span>\n        <span class=\"n\">yval<\/span> <span class=\"o\">=<\/span> <span class=\"n\">bar<\/span><span class=\"p\">.<\/span><span class=\"nf\">get_height<\/span><span 
class=\"p\">()<\/span>\n        <span class=\"n\">ax1<\/span><span class=\"p\">.<\/span><span class=\"nf\">text<\/span><span class=\"p\">(<\/span><span class=\"n\">bar<\/span><span class=\"p\">.<\/span><span class=\"nf\">get_x<\/span><span class=\"p\">()<\/span> <span class=\"o\">+<\/span> <span class=\"n\">bar<\/span><span class=\"p\">.<\/span><span class=\"nf\">get_width<\/span><span class=\"p\">()<\/span><span class=\"o\">\/<\/span><span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">yval<\/span><span class=\"o\">+<\/span><span class=\"mi\">30<\/span><span class=\"p\">,<\/span> \n                 <span class=\"sa\">f<\/span><span class=\"sh\">'<\/span><span class=\"si\">{<\/span><span class=\"n\">yval<\/span><span class=\"si\">:<\/span><span class=\"p\">.<\/span><span class=\"mi\">2<\/span><span class=\"n\">f<\/span><span class=\"si\">}<\/span><span class=\"sh\">'<\/span><span class=\"p\">,<\/span> <span class=\"n\">ha<\/span><span class=\"o\">=<\/span><span class=\"sh\">'<\/span><span class=\"s\">center<\/span><span class=\"sh\">'<\/span><span class=\"p\">,<\/span> <span class=\"n\">va<\/span><span class=\"o\">=<\/span><span class=\"sh\">'<\/span><span class=\"s\">bottom<\/span><span class=\"sh\">'<\/span><span class=\"p\">,<\/span> \n                 <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"sh\">'<\/span><span class=\"s\">blue<\/span><span class=\"sh\">'<\/span><span class=\"p\">,<\/span> <span class=\"n\">fontweight<\/span><span class=\"o\">=<\/span><span class=\"sh\">'<\/span><span class=\"s\">bold<\/span><span class=\"sh\">'<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"c1\"># Dual-axis formatting\n<\/span>    <span class=\"n\">ax2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ax1<\/span><span class=\"p\">.<\/span><span class=\"nf\">twinx<\/span><span class=\"p\">()<\/span>\n    <span class=\"n\">ax2<\/span><span class=\"p\">.<\/span><span class=\"nf\">plot<\/span><span 
class=\"p\">(<\/span><span class=\"n\">quantization_types<\/span><span class=\"p\">,<\/span> <span class=\"n\">inference_times<\/span><span class=\"p\">,<\/span> <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"sh\">'<\/span><span class=\"s\">red<\/span><span class=\"sh\">'<\/span><span class=\"p\">,<\/span> \n             <span class=\"n\">marker<\/span><span class=\"o\">=<\/span><span class=\"sh\">'<\/span><span class=\"s\">o<\/span><span class=\"sh\">'<\/span><span class=\"p\">,<\/span> <span class=\"n\">linewidth<\/span><span class=\"o\">=<\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">ax2<\/span><span class=\"p\">.<\/span><span class=\"nf\">set_ylabel<\/span><span class=\"p\">(<\/span><span class=\"sh\">'<\/span><span class=\"s\">Time (sec)<\/span><span class=\"sh\">'<\/span><span class=\"p\">,<\/span> <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"sh\">'<\/span><span class=\"s\">red<\/span><span class=\"sh\">'<\/span><span class=\"p\">)<\/span>\n\n    <span class=\"n\">plt<\/span><span class=\"p\">.<\/span><span class=\"nf\">title<\/span><span class=\"p\">(<\/span><span class=\"sh\">'<\/span><span class=\"s\">Quantization Trade-offs<\/span><span class=\"sh\">'<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">plt<\/span><span class=\"p\">.<\/span><span class=\"nf\">show<\/span><span class=\"p\">()<\/span>\n<\/code><\/pre>\n<\/div>\n<hr\/>\n<h2><span class=\"ez-toc-section\" id=\"%D8%AD%D8%B1%D9%81_%D8%A2%D8%AE%D8%B1\"><\/span>\n<p>  Final Word<br \/>\n<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Through quantization, we've transformed GPT-2 from a resource-heavy behemoth into a nimble, efficient tool, proving that with the right techniques, even giants can learn to move lightly.<\/p>\n<p>This implementation demonstrates the power of quantization through concrete code and measurements. 
By changing just 10-15 lines of configuration and applying quantization, we achieved:<\/p>\n<ul>\n<li>A 71% reduction in memory footprint (511 MB to 149 MB)\n<\/li>\n<li>A 39% reduction in inference time (1.76s to 1.08s)<\/li>\n<\/ul>\n<p>If you're curious and want the complete notebook for the full experiment, head over to Google Colab.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Imagine taking a powerful language model like GPT-2, capable of crafting stories, answering questions, and mimicking human text, and compressing it into a leaner, faster version without crippling 
its capabilities. This is the promise of quantization: a technique that reduces the precision of a model's calculations &hellip;<\/p>\n","protected":false},"author":2,"featured_media":94725,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"","fifu_image_alt":"","footnotes":""},"categories":[339],"tags":[],"class_list":["post-94724","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dev"],"_links":{"self":[{"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/posts\/94724","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/comments?post=94724"}],"version-history":[{"count":0,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/posts\/94724\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/media\/94725"}],"wp:attachment":[{"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/media?parent=94724"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/categories?post=94724"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nabfollower.com\/blog\/wp-json\/wp\/v2\/tags?post=94724"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}