Something I have been working on for a few months now: https://x.com/ajhai/status/1899528923303809217


It is inference latency most of the time. These VLA models take in an image + state + text and spit out a set of joint angle deltas.

Depending on the model being used, we may get just one set of joint angle deltas or a series of them. To complete a task, the robot needs to capture images from the cameras along with the current joint angles, send them to the model together with the task text, and apply the joint angle changes it gets back. Once the joint angles are updated, we need to check whether the task is complete (this signal can come from the model too). We run this loop until the task is complete.

Combine this with the motion planning that has to happen to make sure the joint angles we get are safe and do not result in collisions with the surroundings, and you end up with the overall slowness.
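
Roughly, the loop looks like this (a minimal sketch; the robot, camera, model and planner objects are hypothetical placeholders, not a real API):

    import time

    # Hypothetical control loop for a VLA-driven arm; all object APIs are placeholders.
    def run_task(robot, cameras, vla_model, planner, task_text, max_steps=200):
        for _ in range(max_steps):
            # 1. Capture observations: camera frames + current joint angles.
            images = [cam.capture() for cam in cameras]
            state = robot.get_joint_angles()

            # 2. One model call per step -- this is where inference latency dominates.
            t0 = time.time()
            result = vla_model.predict(images=images, state=state, instruction=task_text)
            print(f"inference took {time.time() - t0:.2f}s")

            # Some models return a single set of joint angle deltas, others a short series.
            for deltas in result.joint_deltas:
                target = [a + d for a, d in zip(robot.get_joint_angles(), deltas)]
                # 3. Motion planning / collision checking adds more time on top.
                trajectory = planner.plan(robot, target)
                if trajectory is None:
                    continue  # skip targets that would collide or are unsafe
                robot.execute(trajectory)

            # 4. Check whether the task is complete (this can also come from the model).
            if result.task_done:
                return True
        return False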


Building a wheeled robot with arms to help automate household chores - https://x.com/ajhai/status/1891933005729747096

I have been working with LLMs and VLMs to automate browser-based workflows, among other things, for the last couple of years. Given how good the vision models have gotten lately, the perception problem is solved to a level where it opens up a lot of possibilities. Manipulation is not generally solved yet, but there is a lot of activity in the field and there are promising approaches (OpenVLA, π0). Given these, I'm trying to build an affordable robot that can help with household chores using language and vision models. The idea is to ship hardware capable enough to do a few things really well with the currently available models, and keep upgrading the AI stack as manipulation models get better over time.


Amazing, lol


You can already run these models locally with Ollama (ollama run llama3.1:latest), as well as try them at places like Hugging Face, Groq, etc.
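
For example, once the model is pulled and the Ollama server is running on its default port, you can query it directly over HTTP (a minimal sketch using the standard /api/generate endpoint):

    # pip install requests; assumes Ollama is running locally on its default port
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1:latest",
            "prompt": "Why is the sky blue?",
            "stream": False,  # return a single JSON object instead of a stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["response"])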

If you want a playground to test this model locally or want to quickly build some applications with it, you can try LLMStack (https://github.com/trypromptly/LLMStack). I wrote last week about how to configure and use Ollama with LLMStack at https://docs.trypromptly.com/guides/using-llama3-with-ollama.

Disclaimer: I'm the maintainer of LLMStack


You are a maintainer of software that depends on Ollama, so you should know that Ollama depends on llama.cpp. As of now, llama.cpp doesn't support the new RoPE scaling: https://github.com/ggerganov/llama.cpp/issues/8650, and all Ollama can do is wait for llama.cpp: https://github.com/ollama/ollama/issues/5881


I've tested Q4 on M1 and it works, though the quality likely won't be what you'd expect, as others have pointed out on the issue.


You can actually do this with LLMStack (https://github.com/trypromptly/LLMStack) quite easily in a no-code way. I put together a guide last week on using LLMStack with Ollama for local models: https://docs.trypromptly.com/guides/using-llama3-with-ollama. It lets you load all your files as a datasource and then build a RAG app over it.

For now it still uses OpenAI for embedding generation by default; we are updating that in the next couple of releases so a local model can be used to generate embeddings before writing to a vector db.
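
Conceptually, the flow over your files looks roughly like this (a hand-rolled sketch, not LLMStack's actual internals; the sentence-transformers model is just one example of a local embedder):

    # pip install sentence-transformers numpy
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Any embedding model works here; a small local one is used as an example.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    def chunk(text, size=500, overlap=50):
        # Naive fixed-size character chunking with a small overlap.
        step = size - overlap
        return [text[i:i + size] for i in range(0, len(text), step)]

    def build_index(documents):
        chunks = [c for doc in documents for c in chunk(doc)]
        vectors = embedder.encode(chunks, normalize_embeddings=True)
        return chunks, np.asarray(vectors)

    def retrieve(question, chunks, vectors, k=4):
        q = embedder.encode([question], normalize_embeddings=True)[0]
        sims = vectors @ q  # cosine similarity, since vectors are normalized
        return [chunks[i] for i in np.argsort(-sims)[:k]]

    def rag_prompt(question, chunks, vectors):
        context = "\n---\n".join(retrieve(question, chunks, vectors))
        return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"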

Disclosure: I'm the maintainer of the LLMStack project


If anyone is looking to try it out quickly without a local installation, we've added the Llama 8B model to the Promptly playground. Please check it out at https://trypromptly.com/playground.


If you are looking to play with the model without installing it locally, we've added it to our playground at https://trypromptly.com/playground.


Page not found


Sorry, missed this. It was hidden behind a login before. It should be reachable now.


I put together a guide on how to do this with your own avatar and posted it at https://news.ycombinator.com/item?id=39053304


We can get a lot done with a vector db + RAG before having to fine-tune or build custom models. There are a lot of techniques to improve RAG performance; I captured a few of them a while back at https://llmstack.ai/blog/retrieval-augmented-generation.
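
As one example of such a technique, multi-query retrieval (retrieve with the original question plus a couple of LLM-generated rephrasings and merge the results) is easy to bolt on; the retriever and rewrite function below are placeholders for whatever stack you use:

    def multi_query_retrieve(question, retrieve_fn, rewrite_fn, k=4):
        # retrieve_fn(query, k) -> list of chunks; rewrite_fn(question, n) -> list of
        # paraphrased questions (e.g. from an LLM call). Both are placeholders.
        queries = [question] + rewrite_fn(question, n=2)
        seen, merged = set(), []
        for q in queries:
            for chunk in retrieve_fn(q, k=k):
                if chunk not in seen:
                    seen.add(chunk)
                    merged.append(chunk)
        return merged[:k]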


We have recently added support to query data from SingleStore to our agent framework, LLMStack (https://github.com/trypromptly/LLMStack). Out-of-the-box performance when prompting with just the table schemas is pretty good with GPT-4.

The more domain-specific knowledge the queries need, the harder it gets in general. We've had good success `teaching` the model different concepts in relation to the dataset; giving it example questions and queries greatly improved performance.
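
For a sense of what that looks like, here's a rough sketch of the kind of prompt this produces; the tables, concept notes and example pairs are made up for illustration:

    # Hypothetical schema and domain notes, just for illustration.
    SCHEMA = """
    CREATE TABLE orders (id INT, customer_id INT, total DECIMAL(10,2), created_at DATETIME);
    CREATE TABLE customers (id INT, name TEXT, region TEXT);
    """

    CONCEPTS = """
    - "revenue" means SUM(orders.total)
    - an "active customer" has at least one order in the last 90 days
    """

    EXAMPLES = """
    Q: Total revenue by region last month?
    SQL: SELECT c.region, SUM(o.total) FROM orders o JOIN customers c ON o.customer_id = c.id
         WHERE o.created_at >= NOW() - INTERVAL 1 MONTH GROUP BY c.region;
    """

    def build_sql_prompt(question: str) -> str:
        return (
            "You translate questions into SQL for the schema below.\n"
            f"Schema:\n{SCHEMA}\n"
            f"Domain concepts:\n{CONCEPTS}\n"
            f"Example questions and queries:\n{EXAMPLES}\n"
            f"Q: {question}\nSQL:"
        )

    print(build_sql_prompt("How many active customers do we have per region?"))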

