I built this very same thing today! The only difference is that I pushed the tool-call outputs into the conversation history and resent it to the LLM so it could summarize, or perform further tool calls if necessary, automagically.
I used Ollama to build this, and Ollama supports tool calling natively: you pass `tools=[...]` in the Python SDK. The tools can be regular Python functions with docstrings that describe their use. The SDK handles converting the docstrings into a format the LLM can recognize, so my tool's code documentation becomes the model's source of truth. I can also include usage examples right in the docstring to guide the LLM to work closely with all my available tools. No system prompt needed!
Moreover, I wrote all my tools in a separate module and use `inspect.getmembers` to construct the `tools` list that I pass to Ollama. So when I need a new tool, I just write another function in the tools module and it Just Works™.
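Roughly, the wiring looks like this (a simplified sketch rather than my exact code; the module, model name, and prompt are placeholders, and the SDK details may differ between versions):

```python
import inspect
import ollama
import my_tools  # placeholder: a module of plain functions with descriptive docstrings

# Every function in the tools module becomes a tool -- adding a new tool is
# just writing another function there.
TOOLS = [fn for _, fn in inspect.getmembers(my_tools, inspect.isfunction)]
TOOLS_BY_NAME = {fn.__name__: fn for fn in TOOLS}

messages = [{"role": "user", "content": "How much free space is left on /home?"}]

while True:
    response = ollama.chat(model="qwen2.5:32b", messages=messages, tools=TOOLS)
    messages.append(response.message)

    # No tool calls means the model produced its final answer.
    if not response.message.tool_calls:
        break

    # Run each requested tool, push the output back into the history, and
    # loop so the model can summarize or chain further tool calls.
    for call in response.message.tool_calls:
        fn = TOOLS_BY_NAME[call.function.name]
        result = fn(**call.function.arguments)
        messages.append({
            "role": "tool",
            "name": call.function.name,
            "content": str(result),
        })

print(messages[-1].content)
```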
Paired with Qwen 32B running locally, I was fairly satisfied with the output.
> The only difference is that I pushed the tool-call outputs into the conversation history and resent it to the LLM so it could summarize, or perform further tool calls if necessary, automagically.
It looks like this one does that too.
`msg = [ handle_tool_call(tc) for tc in tool_calls ]`
Both articles are correct, from my reading of them. When you invoke a shell script directly, the shell passes it to the kernel via execve. The kernel returns ENOEXEC when it sees the file has no shebang. The shell catches that error and then, as a last resort, tries opening the file and interpreting its contents itself.
I'll quote the line more explicitly from this article:
> If nobody on the list accepts it, then as a last resort the kernel will attempt to treat it as a shell script without a shebang line.
They said that the kernel is responsible for invoking the shell. I honestly think this was just a brain fart and the author meant to put shell and not kernel. With both words flying around in your head, it's an easy mistake to make.
But then again, the article goes on to talk about how it decides to even try that last step:
> Interesting side note: The kernel decides whether or not to try to parse a file as a shell script by whether or not it contains a line break in the first few hundred bytes — specifically if it contains a line break before the first zero byte. Thus a data file that just happens to have a "\n" near the top can produce some odd-looking error messages if you try to execute it.
I decided to do a bit more testing to make sure that the newline in the script wasn't causing the kernel to do anything different. What I noticed is that the strace output is identical between the different variations of the invocation, with one exception: with a newline there's an extra read call, but that's just the shell reading what's left to run.
I guess my next step is to look at the kernel source itself. I'll probably end up doing that in a bit.
So, I've dug into the kernel code. I can't find any fallback mechanism anywhere; when execve fails, the error just bubbles up. I might not be looking in all the correct places, but I believe the shell, not the kernel, is responsible for the fallback.
I also put together two versions of the same call to a shebang-less script in Python, one with `shell=True` and the other without. Only the one that goes through the shell successfully runs the script. The strace outputs corroborate my theory.
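The two variants were roughly the following (a sketch from memory; `sh.sh` is the shebang-less test script):

```python
import subprocess

# Without shell=True: execve("./sh.sh", ...) fails with ENOEXEC and the error
# just bubbles up to Python as OSError: [Errno 8] Exec format error.
try:
    subprocess.run(["./sh.sh"])
except OSError as e:
    print("direct exec failed:", e)

# With shell=True: /bin/sh is exec'd first; its own execve of ./sh.sh also gets
# ENOEXEC, and the shell then falls back to running the file as a script itself.
subprocess.run("./sh.sh", shell=True)
```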
Without `shell=True` (truncated):
[pid 961626] execve("./sh.sh", ["./sh.sh"], 0x7fff7bae94a0 /* 66 vars */) = -1 ENOEXEC (Exec format error)
With `shell=True` (truncated):
[pid 961623] execve("/bin/sh", ["/bin/sh", "-c", "./sh.sh"], 0x7ffd75009e50 /* 66 vars */) = 0
[pid 961624] execve("./sh.sh", ["./sh.sh"], 0x5980a07c70a8 /* 66 vars */) = -1 ENOEXEC (Exec format error)
[pid 961624] execve("/bin/sh", ["/bin/sh", "./sh.sh"], 0x5980a07c70a8 /* 66 vars */) = 0
No, in my YAML example you can see there were no credentials hard-coded into the pipeline. The credentials are configured separately, and the Pipelines are free to use them to perform whatever actions they want.
This is how all the major players in the market recommend you set up your CI pipeline. The problem lies in the implicit trust of the pipeline configuration, which is stored along with the code.
Even with secrets, if the CI/CD machine can talk to the internet you could just broadcast the secrets to wherever (assuming you can edit the YAML and trigger the CI/CD workflow).
I was thinking a better approach, instead of having CI/CD SSH into the prod machine, might be to have the prod machine just watch git for changes.
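Something along these lines, for instance (purely a sketch; the repo path, branch, and service name are made up):

```python
import subprocess
import time

REPO = "/srv/app"  # hypothetical checkout on the prod box

def rev(ref: str) -> str:
    """Return the commit hash a ref points to in the prod checkout."""
    return subprocess.run(["git", "-C", REPO, "rev-parse", ref],
                          capture_output=True, text=True, check=True).stdout.strip()

while True:
    subprocess.run(["git", "-C", REPO, "fetch", "origin"], check=True)
    if rev("HEAD") != rev("origin/main"):
        # New commits landed: fast-forward and restart the service.
        subprocess.run(["git", "-C", REPO, "merge", "--ff-only", "origin/main"], check=True)
        subprocess.run(["systemctl", "restart", "app.service"], check=True)
    time.sleep(60)
```

That way no deploy credentials live in the pipeline at all, and the prod box only needs outbound access to the git host.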
You're right, there are other avenues of exploitation. This particular approach was interesting to me because it is easily automatable (scour the internet for exposed credentials, clone the repo and detect if Pipelines are being used, profit).
Other exploits might need more targeted steps. For example, embedding malware into the source code might require language/framework fingerprinting.
It's pretty common in setups where the deployed artifact is the root of the source tree itself. More often than not, lazy developers just git clone the repo and point their web server's document root at the cloned source folder. In default configurations, .git is happily served to anyone asking for it.
This is naturally mitigated in systems that have a build/compilation phase, because only the compiled output needs to be deployed for the application to work in the first place. Apache Tomcat, for instance.
Don't use this. I once tried it and it changed the UUID of the Linux partition without any warning. Grub was unable to pick up the partition and boot, so I was stuck at grub rescue.
Yeah, I get it, but everyone needs to be responsible for security as well. Look what happened with LastPass. I can totally see someone doing something silly like exposing a service with default creds, say a MySQL db on a production box, then forgetting about it and getting a new job a year later.
I do block proxies like this, but it’s hard to block every little thing.
I remember when I believed in bastions and DMZs. Many companies have given up on this because it can only be enforced by policy, not by tech.
Ngrok is just one company though; there are thousands of ways. WireGuard or Nebula can be self-hosted, with another server that has an actual open port forwarding the traffic. People can use SSH's reverse port forwarding too.
Or you can use cloudflared or another one of ngrok's competitors.
A sync process flushes the open disk files once every `config.syncInterval`. Syncing can also be done on every request if `config.alwaysFsync` is True.
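Conceptually it's something like this (an illustrative sketch only, not the actual implementation; the names just mirror the config options):

```python
import os
import threading
import time

class Store:
    def __init__(self, path, sync_interval=1.0, always_fsync=False):
        self.f = open(path, "ab")
        self.always_fsync = always_fsync  # mirrors config.alwaysFsync
        # Background sync: flush the open file every sync_interval seconds
        # (mirrors config.syncInterval).
        threading.Thread(target=self._sync_loop, args=(sync_interval,), daemon=True).start()

    def _sync_loop(self, interval):
        while True:
            time.sleep(interval)
            self.f.flush()
            os.fsync(self.f.fileno())

    def write(self, data: bytes):
        self.f.write(data)
        if self.always_fsync:
            # Durability on every request, at the cost of throughput.
            self.f.flush()
            os.fsync(self.f.fileno())
```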