
The issue I was thinking of is what happens if you need multiple "custom instructions" within a loop. A related issue, if multiple tiles are considered, is the interconnect between the processor core and the tiles. Interconnect is expensive, so there's a trade-off to be made. Profiling data would help make that trade-off: a curve of how many unique accelerator functions the targeted problem domains need, plus their temporal relationship (e.g., there may be ten accelerator functions, but they execute several milliseconds apart, or functions A and B tend to appear together in an inner loop).
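To make the "temporal relationship" part concrete, here's a rough sketch of the kind of analysis I mean. It assumes you already have a timestamped trace of accelerator calls; the trace format, the function names, and the 5 µs window are all made up for illustration and not taken from any real profiler.

```python
from collections import Counter


def co_occurrence(trace, window_us):
    """Count pairs of distinct accelerator functions called within window_us of each other.

    trace: list of (timestamp_us, function_name) tuples, sorted by timestamp.
    """
    pairs = Counter()
    start = 0
    for end, (t_end, f_end) in enumerate(trace):
        # Slide the window start forward past calls that are too old.
        while trace[start][0] < t_end - window_us:
            start += 1
        # Every earlier call still inside the window pairs with this one.
        for _, f in trace[start:end]:
            if f != f_end:
                pairs[tuple(sorted((f, f_end)))] += 1
    return pairs


# Hypothetical trace: A and B alternate in a tight loop, C and D run milliseconds later.
trace = [(0, "A"), (2, "B"), (10, "A"), (12, "B"), (5000, "C"), (9000, "D")]
pairs = co_occurrence(trace, window_us=5)
unique_functions = {name for _, name in trace}
```

With this toy trace, A and B show up together inside the window twice, while C and D never pair with anything. That would suggest A and B want to be resident at the same time (or share a tile), whereas C and D could be swapped in on demand without paying for extra interconnect.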

