RAG with llama.cpp and external API services

ek3nk4r 2024-05-31


txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

txtai has been and always will be a local-first framework. It was originally designed to run models on local hardware using Hugging Face Transformers. As the AI space has evolved over the last year, so has txtai. Additional LLM inference frameworks have been available for a while through llama.cpp and external API services (via LiteLLM). Recent changes add the ability to use these frameworks for vectorization and make them easier to use for LLM inference.

This article demonstrates how to run retrieval augmented generation (RAG) processes (vectorization and LLM inference) with llama.cpp and external API services.

Install txtai and all dependencies

# Install txtai and dependencies
pip install llama-cpp-python[server] --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
pip install txtai[pipeline-llm]

The first example builds an Embeddings database backed by llama.cpp vectorization.

The llama.cpp project states: "The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud."

Let's give it a try.

from txtai import Embeddings

# Create Embeddings with llama.cpp GGUF model
embeddings = Embeddings(
    path="second-state/All-MiniLM-L6-v2-Embedding-GGUF/all-MiniLM-L6-v2-Q4_K_M.gguf",
    content=True
)

# Load dataset
wikipedia = Embeddings()
wikipedia.load(provider="huggingface-hub", container="neuml/txtai-wikipedia")

query = """
SELECT id, text FROM txtai
order by percentile desc
LIMIT 10000
"""

# Index dataset
embeddings.index(wikipedia.search(query))

Now that the Embeddings database is ready, let's run a search query.

embeddings.search("Inventors of electric-powered devices")

[{'id': 'Thomas Edison',
  'text': 'Thomas Alva Edison (February 11, 1847October 18, 1931) was an American inventor and businessman. He developed many devices in fields such as electric power generation, mass communication, sound recording, and motion pictures. These inventions, which include the phonograph, the motion picture camera, and early versions of the electric light bulb, have had a widespread impact on the modern industrialized world. He was one of the first inventors to apply the principles of organized science and teamwork to the process of invention, working with many researchers and employees. He established the first industrial research laboratory.',
  'score': 0.6758285164833069},
 {'id': 'Nikola Tesla',
  'text': 'Nikola Tesla (; , ;  1856\xa0– 7 January 1943) was a Serbian-American inventor, electrical engineer, mechanical engineer, and futurist. He is best-known for his contributions to the design of the modern alternating current (AC) electricity supply system.',
  'score': 0.6077840328216553},
 {'id': 'Alexander Graham Bell',
  'text': 'Alexander Graham Bell (, born Alexander Bell; March 3, 1847 – August 2, 1922) was a  Scottish-born Canadian-American inventor, scientist and engineer who is credited with patenting the first practical telephone. He also co-founded the American Telephone and Telegraph Company (AT&T) in 1885.',
  'score': 0.4573010802268982}]

As we can see, this Embeddings database works just like any other Embeddings database. The difference is that it uses a llama.cpp model for vectorization instead of PyTorch.
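To make the `score` values in the results more concrete, here is a minimal sketch of similarity ranking. It assumes scores are cosine similarities between the query vector and each stored vector; the toy three-dimensional vectors and the helper names `cosine` and `rank` are made up for illustration and are not part of txtai, which uses a vector index rather than a brute-force loop.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Guard against zero-length vectors
    return dot / norm if norm else 0.0

def rank(query, vectors, limit=3):
    """Return (id, score) pairs sorted by descending similarity to query."""
    scores = [(uid, cosine(query, vector)) for uid, vector in vectors.items()]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:limit]

# Toy vectors standing in for real model output (hypothetical values)
vectors = {
    "Thomas Edison": [0.9, 0.1, 0.0],
    "Nikola Tesla": [0.7, 0.3, 0.1],
    "Mount Everest": [0.0, 0.1, 0.9],
}

# Query vector pointing in the "inventor" direction of the toy space
results = rank([1.0, 0.0, 0.0], vectors)
print(results)
```

Whichever model produces the vectors (PyTorch or llama.cpp), the ranking step is the same, which is why the search call does not change.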

LLM inference with llama.cpp is not a new txtai feature. Recent changes add support for conversational messages in addition to standard prompts. This removes the need to understand model-specific prompt formats.
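For a sense of what conversational messages abstract away, here is a rough sketch of rendering a message list into the ChatML prompt format by hand. The helper `to_chatml` is hypothetical and not part of txtai; it only illustrates the kind of model-specific templating that would otherwise have to be written for each model family.

```python
def to_chatml(messages):
    """Render chat messages as a ChatML prompt string (hypothetical helper)."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>"
        for message in messages
    ]
    # Open an assistant turn so the model continues from this point
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a friendly assistant."},
    {"role": "user", "content": "Who invented the phonograph?"},
]

prompt = to_chatml(messages)
print(prompt)
```

Other model families expect different markers, so a hand-built prompt like this one would not carry over between models. Passing a list of messages leaves that formatting to the backend.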

Let's run a retrieval augmented generation (RAG) process fully backed by llama.cpp models.

It's important to note that conversational messages work with all LLM backends supported by txtai (transformers, llama.cpp, litellm).

from txtai import LLM

# LLM instance
llm = LLM(path="TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf")

# Question and context
question = "Write a list of invented electric-powered devices"
context = "\n".join(x["text"] for x in embeddings.search(question))

# Pass messages to LLM
response = llm([
    {"role": "system", "content": "You are a friendly assistant. You answer questions from users."},
    {"role": "user", "content": f"""
Answer the following question using only the context below. Only include information specifically discussed.

question: {question}
context: {context}
"""}
])
print(response)

Based on the given context, here's a list of invented electric-powered devices:

1. Electric light bulb by Thomas Edison
2. Phonograph by Thomas Edison
3. Motion picture camera by Thomas Edison
4. Alternating current (AC) electricity supply system by Nikola Tesla
5. Telephone by Alexander Graham Bell

And just like that, RAG with llama.cpp🦙!
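The prompt-building step of the RAG process can be factored into a small function. This is a sketch only: `build_messages` is a hypothetical helper, and the sample search results are hand-written stand-ins for what an Embeddings search returns (dictionaries with `id`, `text` and `score` keys).

```python
def build_messages(question, results,
                   system="You are a friendly assistant. You answer questions from users."):
    """Build chat messages that ground the question in retrieved context."""
    # Join the text of each search result into a single context string
    context = "\n".join(x["text"] for x in results)
    user = (
        "Answer the following question using only the context below. "
        "Only include information specifically discussed.\n\n"
        f"question: {question}\n"
        f"context: {context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# Hand-written stand-ins for search results (hypothetical data)
results = [
    {"id": "Thomas Edison", "text": "Thomas Edison developed the phonograph.", "score": 0.67},
    {"id": "Nikola Tesla", "text": "Nikola Tesla worked on alternating current.", "score": 0.61},
]

messages = build_messages("Write a list of invented electric-powered devices", results)
for message in messages:
    print(message["role"], "->", message["content"])
```

With the objects defined earlier in this article, the call would look like `llm(build_messages(question, embeddings.search(question)))`.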

Next, we'll show how an Embeddings database can integrate with external API services via LiteLLM.

In the LiteLLM project's own words: "LiteLLM handles load balancing, fallbacks and spend tracking across 100+ LLMs. All in the OpenAI format."

Let's first start up a local API service to use for this demo.

# Download models
wget https://huggingface.co/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q4_K_M.gguf
wget https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/resolve/main/mistral-7b-openorca.Q4_K_M.gguf

# Start local API services
nohup python -m llama_cpp.server --n_gpu_layers -1 --model all-MiniLM-L6-v2-Q4_K_M.gguf --host 127.0.0.1 --port 8000 &> vector.log &
nohup python -m llama_cpp.server --n_gpu_layers -1 --model mistral-7b-openorca.Q4_K_M.gguf --chat_format chatml --host 127.0.0.1 --port 8001 &> llm.log &
sleep 30
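A fixed `sleep 30` may be too short for a large model or needlessly long for a small one. As an alternative, here is a minimal sketch that polls until a TCP port accepts connections. The helper `wait_for_port` is hypothetical, and an open port is assumed to mean the server is ready; if the server starts listening before the model has finished loading, a request to a health or models endpoint would be a stronger check.

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet, wait and retry
            time.sleep(interval)
    return False

# Example usage for the two services started above:
#   ready = all(wait_for_port("127.0.0.1", port) for port in (8000, 8001))
```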

Now let's connect to this local service and use it to generate vectors for a new Embeddings database. Note that the local service responds in OpenAI's response format, hence the path setting below:

from txtai import Embeddings

# Create Embeddings instance with external vectorization
embeddings = Embeddings(
    path="openai/gpt-4-turbo",
    content=True,
    vectors={
        "api_base": "http://localhost:8000/v1",
        "api_key": "sk-1234"
    }
)

# Load dataset
wikipedia = Embeddings()
wikipedia.load(provider="huggingface-hub", container="neuml/txtai-wikipedia")

query = """
SELECT id, text FROM txtai
order by percentile desc
LIMIT 10000
"""

# Index dataset
embeddings.index(wikipedia.search(query))

embeddings.search("Inventors of electric-powered devices")

[{'id': 'Thomas Edison',
  'text': 'Thomas Alva Edison (February 11, 1847October 18, 1931) was an American inventor and businessman. He developed many devices in fields such as electric power generation, mass communication, sound recording, and motion pictures. These inventions, which include the phonograph, the motion picture camera, and early versions of the electric light bulb, have had a widespread impact on the modern industrialized world. He was one of the first inventors to apply the principles of organized science and teamwork to the process of invention, working with many researchers and employees. He established the first industrial research laboratory.',
  'score': 0.6758285164833069},
 {'id': 'Nikola Tesla',
  'text': 'Nikola Tesla (; , ;  1856\xa0– 7 January 1943) was a Serbian-American inventor, electrical engineer, mechanical engineer, and futurist. He is best-known for his contributions to the design of the modern alternating current (AC) electricity supply system.',
  'score': 0.6077840328216553},
 {'id': 'Alexander Graham Bell',
  'text': 'Alexander Graham Bell (, born Alexander Bell; March 3, 1847 – August 2, 1922) was a  Scottish-born Canadian-American inventor, scientist and engineer who is credited with patenting the first practical telephone. He also co-founded the American Telephone and Telegraph Company (AT&T) in 1885.',
  'score': 0.4573010802268982}]

Like the previous example with llama.cpp, this Embeddings database behaves exactly the same. The main difference is that content is sent to an external service for vectorization.
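To illustrate what "OpenAI's response format" means for vectorization, the sketch below builds the JSON body for an OpenAI-style `/v1/embeddings` request and parses a response of the same style. No network call is made: the helper names are hypothetical, and the sample response is hand-written with made-up values rather than captured from the local service.

```python
import json

def embeddings_request(model, texts):
    """Build the JSON body for an OpenAI-style POST /v1/embeddings call."""
    return json.dumps({"model": model, "input": list(texts)})

def parse_embeddings(body):
    """Extract vectors from an OpenAI-style embeddings response, in input order."""
    data = json.loads(body)["data"]
    # Each row carries the index of the input it belongs to
    return [row["embedding"] for row in sorted(data, key=lambda row: row["index"])]

# Hand-written sample response (values are made up for illustration)
sample = json.dumps({
    "object": "list",
    "model": "all-MiniLM-L6-v2-Q4_K_M.gguf",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
})

payload = embeddings_request("gpt-4-turbo", ["first text", "second text"])
vectors = parse_embeddings(sample)
print(payload)
print(vectors)
```

In the article's setup this exchange is handled by LiteLLM, so none of it has to be written by hand. The point is only that any service speaking this format can act as the vectorization backend.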

For our last task, we'll run a retrieval augmented generation (RAG) process fully backed by an external API service.

from txtai import LLM

# LLM instance
llm = LLM(path="openai/gpt-4-turbo", api_base="http://localhost:8001/v1", api_key="sk-1234")

# Question and context
question = "Write a list of invented electric-powered devices"
context = "\n".join(x["text"] for x in embeddings.search(question))

# Pass messages to LLM
response = llm([
    {"role": "system", "content": "You are a friendly assistant. You answer questions from users."},
    {"role": "user", "content": f"""
Answer the following question using only the context below. Only include information specifically discussed.

question: {question}
context: {context}
"""}
])
print(response)

Based on the given context, a list of invented electric-powered devices includes:

1. Phonograph by Thomas Edison
2. Motion Picture Camera by Thomas Edison
3. Early versions of the Electric Light Bulb by Thomas Edison
4. AC (Alternating Current) Electricity Supply System by Nikola Tesla
5. Telephone by Alexander Graham Bell

txtai supports a number of different vector and LLM backends. The default method uses PyTorch models via the Hugging Face Transformers library. This article demonstrated how llama.cpp and external API services can also be used.

These additional vector and LLM backends enable maximum flexibility and scalability. For example, vectorization can be fully offloaded to an external API service or to another local service. llama.cpp supports macOS devices and alternative accelerators such as AMD ROCm and Intel GPUs, and it runs on Raspberry Pi devices.

It's exciting to see all of these new advances come together. Stay tuned for what's next!
