Part 2: Manipulate flow

In the previous tutorial, we created a simple flow to retrieve information from Wikipedia. In this tutorial, we will perform question answering on the retrieved text.

Define a Function to do question answering

We will use llama-cpp-python with the Llama 2 Chat 7B model to do question answering. Make sure that you have installed llama-cpp-python (pip install llama-cpp-python) and downloaded the model weights.
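
One possible way to get the weights is with the huggingface_hub package; the TheBloke/Llama-2-7B-Chat-GGUF repository and filename below are just one example of where a GGUF build is hosted, so adjust them to wherever you obtain your model:

from huggingface_hub import hf_hub_download

# download the quantized GGUF weights into the current working directory
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
    local_dir=".",
)

Once the model file is in the current working directory, you can verify that the model works correctly: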

from llama_cpp import Llama

model = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf")
print(model("Who is Albert Einstein?"))
Sample LLM output
  {
    "id": "cmpl-35e9becf-d723-4a2f-acfe-8a6b838c4eb9",
    "object": "text_completion",
    "created": 1695513496,
    "model": "/home/john/llama.cpp/models/llama-2-7b-chat.Q4_K_M.gguf",
    "choices": [
      {
        "text": "?\nEinstein is one of the most well-known scientists...",
        "index": 0,
        "logprobs": null,
        "finish_reason": "length"
      }
    ],
    "usage": {
      "prompt_tokens": 6,
      "completion_tokens": 128,
      "total_tokens": 134
    }
  }

Let’s jump into building a question answering component. For this demo, our question answering component takes a question from the user, decides which keywords to search Wikipedia for, retrieves information from Wikipedia, and then synthesizes a final answer.

import re  # used below to extract the quoted keywords from the LLM output

class AnswerWithWikipedia(Function):
    llm: Function
    get_info: Function

    def run(self, question: str) -> str:

        # ask the LLM which keywords to search Wikipedia for
        text_in = f"For question '{question}', I will search with keyword \""
        text_out = self.llm(text_in, max_tokens=128, stop=["\n"])["choices"][0]["text"]
        text_out = '"' + text_out
        keywords = list(set(re.findall('"([^"]*)"', text_out)))

        # get information
        infos = []
        for keyword in keywords:
            info = self.get_info(keyword)
            infos.append(info)
        context = ". ".join(infos)

        # synthesize answer
        text_in = (
            "Given the source, answer the question using only the knowledge "
            "in the source, if the knowledge isn't there, just say I don't "
            f"know. The question is \"{question}\". "
            f"Given the source:\n\n'{context}'\n\nThe answer is:"
        )
        self.log_progress("text_in", text=text_in)
        answer = self.llm(text_in, max_tokens=256, stop=["\n"])["choices"][0]["text"]

        return answer

guru = AnswerWithWikipedia(
    llm=Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=1024),
    get_info=LearnFromWikipedia(search=WikipediaSearch())
)
print(guru("When was Einstein born?"))

Likely we will be greeted with a ValueError from llama-cpp-python that looks similar to this: ValueError: Requested tokens (633) exceed context window of 1024. The reason is that we set the maximum context size to 1024, but the prompt, which contains both the question and the retrieved context, likely exceeds it. In fact, this can be verified with last_run, as we explicitly log text_in in our run:

print(len(guru.last_run.logs("text_in")["text"]))

Depending on the LLM’s predictions and the particular Wikipedia edits, the context length can vary, but it is usually longer than 1000 characters. To resolve this issue, we can either increase the context size of Llama directly, or reduce the number of search results in WikipediaSearch. For demonstration purposes, let’s use the latter.
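
For reference, the first option would simply mean re-creating the flow with a larger context window (we won’t do this below; n_ctx=2048 is just an illustrative value):

guru = AnswerWithWikipedia(
    llm=Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048),  # larger context window
    get_info=LearnFromWikipedia(search=WikipediaSearch())
)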

Manipulate run param

The limit param in WikipediaSearch is defined as a run param, and it isn’t used in AnswerWithWikipedia.run. For portability reasons, it might not be desirable to explicitly pass limit with self.get_info(keyword, limit=limit): what if later on we want to substitute WikipediaSearch with another search method that doesn’t have a limit param? Furthermore, doing so would require AnswerWithWikipedia.run to include limit as one of its params as well, resulting in a tight interface coupling between these two Functions. Plus, if there are multiple candidates for get_info, each with its own set of run params, we can imagine how unwieldy AnswerWithWikipedia would become.
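
To make the coupling concrete, here is a hypothetical variant of run that threads limit through explicitly; this is the kind of interface we would rather avoid:

    # hypothetical, tightly coupled version of AnswerWithWikipedia.run
    def run(self, question: str, limit: int = 1) -> str:
        ...
        for keyword in keywords:
            # only works if every possible get_info accepts a `limit` param
            info = self.get_info(keyword, limit=limit)
            infos.append(info)
        ...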

theflow allows setting run params dynamically with set_run(kwargs), where kwargs is a dictionary of run param names and values. It also allows setting run params of a nested node when we supply the node name as a key.

guru.set_run({"get_info": {"limit": 1}})
print(guru("When was Einstein born?"))
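
As a quick sanity check, we can log the prompt length again; with fewer search results it should be noticeably shorter than before (note that this counts characters, not tokens, so it is only a rough indicator):

print(len(guru.last_run.logs("text_in")["text"]))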

Now, there shouldn’t be any error, and the LLM should answer something similar to: I don't know. The source does not provide information on when Albert Einstein was born. This is expected: investigating the source, we can see that there’s no mention of Einstein’s birthdate in the Wikipedia summary as of Sept 2023.

print(guru.last_run.logs(".get_info")["output"])
String output

Albert Einstein was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been called “the world's most famous equation”. He received the 1921 Nobel Prize in Physics “for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect”, a pivotal step in the development of quantum theory. His work is also known for its influence on the philosophy of science. In a 1999 poll of 130 leading physicists worldwide by the British journal Physics World, Einstein was ranked the greatest physicist of all time. His intellectual achievements and originality have made Einstein synonymous with genius.. The Einstein family is the family of physicist Albert Einstein (1879–1955). Einstein's great-great-great-great-grandfather, Jakob Weil, was his oldest recorded relative, born in the late 17th century, and the family continues to this day. Albert Einstein's great-great-grandfather, Löb Moses Sontheimer (1745–1831), was also the grandfather of the tenor Heinrich Sontheim (1820–1912) of Stuttgart.

There’s a mention of when Einstein received the Nobel Prize, so let’s try asking about that instead:

print(guru("When did Einstein receive the Nobel Prize?"))

The answer should be better: 1921. According to the source, Albert Einstein received the Nobel Prize in Physics in 1921 for his services to.

Export and import flow

As a flow becomes more complex, it can become harder to communicate that flow to other people. A flow built with Function can be exported into a config file, which other people (or your future self) can import back into a flow object ready to use.

import yaml


config = guru.dump()
print(yaml.dump(config, sort_keys=False))
String output
Cannot serialize <llama_cpp.llama.Llama object at 0x7f15b75d8dc0>. Consider implementing __persist_flow__
type: __main__.AnswerWithWikipedia
params: {}
nodes:
  get_info:
    type: __main__.LearnFromWikipedia
    params: {}
    nodes:
      retrieve:
        type: __main__.RetrieveArticle
        params:
          full_text: false
        nodes: {}
      search:
        type: __main__.WikipediaSearch
        params: {}
        nodes: {}
  llm:
    type: theflow.base.ProxyFunction
    params: {}
    nodes: {}

Once we have the config dict (for example, loaded back from a json/yaml file) and the pipeline definition, we can initialize the exact same flow object with:

from theflow import load

guru2 = load(config)
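
For example, a minimal sketch of round-tripping the config through a YAML file (the flow.yaml filename is just an illustrative choice):

import yaml
from theflow import load

# save the exported config to a file
with open("flow.yaml", "w") as f:
    yaml.dump(guru.dump(), f, sort_keys=False)

# later, or in another environment where the pipeline classes are defined
with open("flow.yaml") as f:
    config = yaml.safe_load(f)

guru2 = load(config)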

What’s next?

In this tutorial, we finished a question answering module. Along the way, we dynamically manipulated run params, tracked down an error from the run log, and exported and imported a flow. In the next tutorial, we will deploy this flow with Docker Compose.