Generate “Verified” Python Code Using AutoGen Conversable Agents | by Shahzeb Naveed


April 9, 2024


It’s April 2024 and it’s been about 17 months since we started using LLMs like ChatGPT to aid us in code generation and debugging tasks. While it has added a great level of productivity, there are indeed times when the generated code is full of bugs and makes us take the good ole StackOverflow route.

In this article, I’ll give a quick demonstration of how we can address this lack of “verification” using Conversable Agents offered by AutoGen.

What is AutoGen?

“AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks.”

Presenting LeetCode Problem Solver:

Start with quietly installing autogen:

!pip install pyautogen -q --progress-bar off

I’m using Google Colab, so I entered my OPENAI_API_KEY in the Secrets tab and securely loaded it along with the other modules:

import os
import csv
import autogen
from autogen import Cache
from google.colab import userdata
userdata.get('OPENAI_API_KEY')

I’m using gpt-3.5-turbo only because it’s cheaper than gpt4. If you can afford more expensive experimentation and/or you’re doing things more “seriously”, you should obviously use a stronger model.

llm_config = {
    "config_list": [{"model": "gpt-3.5-turbo", "api_key": userdata.get('OPENAI_API_KEY')}],
    "cache_seed": 0,  # seed for reproducibility
    "temperature": 0,  # temperature to control randomness
}

Now, I’ll copy the problem statement from my favourite LeetCode problem, Two Sum. It’s one of the most commonly asked questions in leetcode-style interviews and covers basic concepts like caching using hashmaps and basic equation manipulation.

LEETCODE_QUESTION = """
Title: Two Sum
Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. You may assume that each input would have exactly one solution, and you may not use the same element twice. You can return the answer in any order.
Example 1:
Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].
Example 2:
Input: nums = [3,2,4], target = 6
Output: [1,2]
Example 3:
Input: nums = [3,3], target = 6
Output: [0,1]
Constraints:
2 <= nums.length <= 10^4
-10^9 <= nums[i] <= 10^9
-10^9 <= target <= 10^9
Only one valid answer exists.
Follow-up: Can you come up with an algorithm that is less than O(n^2) time complexity?
"""

We can now define both of our agents. One agent acts as the “assistant” agent that suggests the solution, and the other serves as a proxy for us, the user, and is also responsible for executing the suggested Python code.

# create an AssistantAgent named "assistant"
SYSTEM_MESSAGE = """You are a helpful AI assistant.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
Additional requirements:
1. Within the code, add functionality to measure the total run-time of the algorithm in python function using "time" library.
2. Only when the user proxy agent confirms that the Python script ran successfully and the total run-time (printed on stdout console) is less than 50 ms, only then return a concluding message with the word "TERMINATE". Otherwise, repeat the above process with a more optimal solution if it exists.
"""
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message=SYSTEM_MESSAGE
)
# create a UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=4,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
)

I set human_input_mode to “NEVER” because I’m not planning to give any inputs myself, and max_consecutive_auto_reply to 4 to limit the back-and-forth turns in the conversation. The Assistant agent has been instructed to respond with the word “TERMINATE”, which tells the UserProxyAgent when to conclude the conversation.
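To get a feel for how this termination check behaves, here is a minimal sketch that runs the same predicate on a few hand-written message dicts, outside of AutoGen. The sample messages are made up for illustration, and the `or ""` guard for a `None` content is my own defensive assumption, not something the lambda in the agent definition handles.

```python
# Same check as the is_termination_msg lambda, written as a plain function.
def is_termination_msg(message: dict) -> bool:
    # The `or ""` guard covers a content of None (an added assumption).
    content = message.get("content") or ""
    return content.rstrip().endswith("TERMINATE")


# Hypothetical messages, for illustration only.
samples = [
    {"content": "All examples ran in under 50 ms. TERMINATE"},
    {"content": "TERMINATE\n"},
    {"content": "If it is fast enough, you can reply with TERMINATE."},
    {"content": None},
]

results = [is_termination_msg(m) for m in samples]
print(results)
```

Note that the check only matches when “TERMINATE” is the very last thing in the message, so a reply that merely mentions the word mid-sentence, or follows it with punctuation, does not end the conversation.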

Now, the fun part! We’ll initiate the conversation by sending a message from our UserProxyAgent to our Assistant.

One added benefit of using AutoGen (even for non-agentic workflows) is that it provides explicit caching capability to help you save API costs during development. Here, I’m caching responses on disk, but you can also integrate redis for this purpose.

# Use DiskCache as cache
with Cache.disk(cache_seed=7) as cache:
    # the assistant receives a message from the user_proxy, which contains the task description
    chat_res = user_proxy.initiate_chat(
        assistant,
        message="""Solve the following leetcode problem and also comment on its time and space complexity:\n\n""" + LEETCODE_QUESTION,
        cache=cache,  # pass the cache object so this chat actually uses it
    )

Output:

user_proxy (to assistant):
Solve the following leetcode problem and also comment on its time and space complexity:
Title: Two Sum
...
Follow-up: Can you come up with an algorithm that is less than O(n^2) time complexity?
--------------------------------------------------------------------------------
assistant (to user_proxy):
To solve this problem efficiently, we can use a dictionary to store the elements we have seen so far along with their indices. While iterating through the array, we can check if the complement of the current element (target - current element) is already in the dictionary. If it is, we have found the pair that sums up to the target.
Here's the Python code to solve the Two Sum problem with a time complexity less than O(n^2):
```python
# filename: two_sum.py
def two_sum(nums, target):
    num_dict = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_dict:
            return [num_dict[complement], i]
        num_dict[num] = i
# Test the function with the given examples
nums1 = [2, 7, 11, 15]
target1 = 9
print(two_sum(nums1, target1))  # Output: [0, 1]
nums2 = [3, 2, 4]
target2 = 6
print(two_sum(nums2, target2))  # Output: [1, 2]
nums3 = [3, 3]
target3 = 6
print(two_sum(nums3, target3))  # Output: [0, 1]
```
The time complexity of this solution is O(n) because we iterate through the list of numbers only once. The space complexity is also O(n) because we store elements in a dictionary.
You can run this code and check if it provides the correct output. Let me know the total run-time, and if it's less than 50 ms, you can reply with "TERMINATE". If it's not, I can suggest a more optimal solution.
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output: 
[0, 1]
[1, 2]
[0, 1]
--------------------------------------------------------------------------------
assistant (to user_proxy):
The Python script ran successfully and provided the correct output for the given examples. Since the total run-time is not provided, I will suggest measuring the total run-time of the algorithm using the "time" library to determine if it's less than 50 ms.
You can add the functionality to measure the total run-time by importing the "time" library and using it to calculate the elapsed time before and after calling the `two_sum` function. Here's how you can modify the code:
```python
# filename: two_sum.py
import time
def two_sum(nums, target):
    num_dict = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_dict:
            return [num_dict[complement], i]
        num_dict[num] = i
# Test the function with the given examples and measure the total run-time
nums1 = [2, 7, 11, 15]
target1 = 9
start_time = time.time()
print(two_sum(nums1, target1))  # Output: [0, 1]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
nums2 = [3, 2, 4]
target2 = 6
start_time = time.time()
print(two_sum(nums2, target2))  # Output: [1, 2]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
nums3 = [3, 3]
target3 = 6
start_time = time.time()
print(two_sum(nums3, target3))  # Output: [0, 1]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
```
Please run this modified code and provide the total run-time for each example. If the total run-time is less than 50 ms for all examples, you can reply with "TERMINATE".
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output: 
[0, 1]
Total run-time: 0.01239776611328125 ms
[1, 2]
Total run-time: 0.00286102294921875 ms
[0, 1]
Total run-time: 0.0016689300537109375 ms
--------------------------------------------------------------------------------
assistant (to user_proxy):
The Python script ran successfully, and the total run-time for each example is significantly less than 50 ms. Therefore, you can reply with "TERMINATE". The algorithm provided an efficient solution with a time complexity of O(n) and a space complexity of O(n).
--------------------------------------------------------------------------------

This is what’s happening here:

The UserProxyAgent asks the Assistant to solve the problem based on the task description.
The Assistant suggests a solution with a Python block.
The UserProxyAgent executes the Python code.
The Assistant reads the console output and responds with a modified solution that adds time measurement. (Honestly, I would’ve expected this modified solution right away, but this behavior can be tuned through prompt engineering or by employing a stronger LLM.)
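The execute-and-report step can be sketched in plain Python to make the loop less magical. This is a simplified illustration under my own assumptions, not AutoGen’s actual implementation: the helper names `extract_python_block` and `run_code` are hypothetical, and the `reply` string is a made-up assistant message.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path

FENCE = "`" * 3  # three backticks, built this way so the example stays readable


def extract_python_block(reply: str):
    """Return the body of the first python code block in reply, or None."""
    pattern = FENCE + r"python\n(.*?)" + FENCE
    match = re.search(pattern, reply, re.DOTALL)
    return match.group(1) if match else None


def run_code(code: str, timeout: int = 10):
    """Write code to a temporary file, run it, return (exit code, output)."""
    with tempfile.TemporaryDirectory() as work_dir:
        script = Path(work_dir) / "snippet.py"
        script.write_text(code)
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return proc.returncode, proc.stdout + proc.stderr


# Hypothetical assistant reply containing one python block.
reply = (
    "Here is the code:\n"
    + FENCE + "python\nprint(sum([2, 7]))\n" + FENCE
    + "\nPlease run it."
)

code = extract_python_block(reply)
exit_code, output = run_code(code)

# This text is what would be sent back to the assistant as the next message.
feedback = f"exitcode: {exit_code}\nCode output:\n{output}"
print(feedback)
```

Because both the exit code and the captured output go back into the conversation, a traceback from a failing script becomes the assistant’s next input, which is what lets it attempt a fix.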

With AutoGen, you can also display the cost of the agentic workflow.

chat_res.cost

({'total_cost': 0,
'gpt-3.5-turbo-0125': {'cost': 0,
'prompt_tokens': 14578,
'completion_tokens': 3460,
'total_tokens': 18038}}

Concluding Remarks:

Thus, by using AutoGen’s conversable agents:

We automatically verified that the Python code suggested by the LLM actually works.
And created a framework by which the LLM can further respond to syntax or logical errors by reading the output in the console.
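If you want a stronger check than the three examples from the problem statement, one option is to cross-check the suggested solution against a brute-force reference on random inputs, and to time it on an input at the upper constraint size. The sketch below is my own addition rather than part of the original run; the test sizes, value ranges and the use of `time.perf_counter` are assumptions chosen for illustration.

```python
import random
import time


def two_sum(nums, target):
    """Hash-map solution, same idea as the one the assistant suggested."""
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i


def two_sum_brute_force(nums, target):
    """O(n^2) reference used only for cross-checking."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]


def is_valid(nums, target, answer):
    """An answer is valid if it names two distinct indices that sum to target."""
    i, j = answer
    return i != j and nums[i] + nums[j] == target


random.seed(0)
checked = 0
for _ in range(200):
    nums = [random.randint(-50, 50) for _ in range(random.randint(2, 30))]
    i, j = random.sample(range(len(nums)), 2)
    target = nums[i] + nums[j]  # guarantees at least one valid pair exists
    # Random inputs may contain several valid pairs, so compare validity,
    # not equality of the returned indices.
    assert is_valid(nums, target, two_sum(nums, target))
    assert is_valid(nums, target, two_sum_brute_force(nums, target))
    checked += 1

# Time the hash-map solution on 10^4 elements, with the only valid pair
# sitting in the last two positions.
n = 10_000
big = list(range(n))
big_target = big[-1] + big[-2]
start = time.perf_counter()
big_answer = two_sum(big, big_target)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"checked {checked} random cases; n={n} took {elapsed_ms:.3f} ms")
```

Timing at the largest allowed input size makes the 50 ms threshold more meaningful than timing the three tiny examples, which finish in microseconds regardless of the algorithm.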

Thanks for reading! Please follow me and subscribe to be the first to know when I post a new article! 🙂
