DynaCollect

PI: Daniel Shih (National Taiwan University), CoPI: Kwei-Jay Lin (University of California, Irvine)

Project Objective: 

    A major problem for sensor node VMs has been performance. Most interpreters are between one or two orders of magnitude slower than native code, leading to both lower maximum through-put and an increase in energy consumption. Previous work on ahead-of-time translation to native code has improved performance, but still a significant overhead remains, and the tradeoff is that translating to native code takes up more code space, limiting the size of program that can be loaded onto a device. If code size is the main criteria, the interpreter cannot be beaten, but for many applications the low performance and consequently higher energy consumption may not be acceptable.

    In this project, we developed techniques to reduce this code size overhead by 52% and performance overhead by 80%, resulting in an AOT compiler that produces code that is on average only 1.7 times slower and 2.1x larger than optimized native C.

    Our optimizations do add some size to our VM, but the break-even point at which is compensated for by the reduction in the size of the stored program is well within the range of program memory typically available on a sensor node. This leads us to believe that these optimizations will be useful in most scenarios, and make using a VM a viable option for a wider range of applications.Many opportunities for future work remain. For the mark loops optimization, a good heuristic is needed to make a better decision on the number of registers to pin for the markloop optimization, and we can consider applying this optimization to other blocks that have a single point of entry and exit as well.

    A more general question is what the most suitable architecture and instruction set is for a VM on tiny devices. Hsieh et al. note that the performance problem lies in the mismatch between the VM and the native machine architecture. In fact, we believe JVM is probably not the best choice for a sensor node VM. It has some advanced features, such as threads and exceptions, which add complexity but may not be necessary on a tiny device. At the same time, there is little support for constant data, which is common in embedded code. For example, a table with sine wave values in the FFT benchmark is represented as a normal array at runtime, using up valuable memory. We may also consider extending the bytecode with instructions to express common operations more efficiently.For example an instruction to loop over an array such as the one found in Lua bytecode, would enable us to generate more efficient code and eliminate most of the type of overhead we see in the bubble sort benchmark.

   The VM not only provides a virtual run-time environment but also the safe and robust run-time environment for sensor nodes. However, the limited computation capacity on sensors node suffer the performance. Based on our approach on enhancing run-time performance, we will also extend the approach for safety and robustness on VMs for sensor nodes. (updated in Feb, 2017)