A few techniques I’ve implemented over the years to improve simulation performance.
No System Calls
For obvious reasons, the simulator shouldn’t make system calls during simulation, especially for I/O.
No Polling
When simulating entire systems, never poll.
Instead, use callbacks for event notifications between (simulated) system components.
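As a minimal sketch of the idea, here is a simulated timer that notifies an interrupt controller when it expires, so that nothing has to poll the timer's counter. The `Timer` and `InterruptController` classes, their methods, and the interrupt line number are all hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical simulated timer: it notifies a listener when it expires,
// so no other component has to poll its counter.
class Timer {
public:
    using ExpiryCallback = std::function<void()>;

    void on_expiry(ExpiryCallback cb) { on_expiry_ = std::move(cb); }
    void load(std::uint32_t ticks) { remaining_ = ticks; }

    // Advance simulated time; fire the callback exactly once, on expiry.
    void advance(std::uint32_t ticks) {
        if (remaining_ == 0) return;            // timer is idle
        if (ticks >= remaining_) {
            remaining_ = 0;
            if (on_expiry_) on_expiry_();
        } else {
            remaining_ -= ticks;
        }
    }

private:
    std::uint32_t remaining_ = 0;
    ExpiryCallback on_expiry_;
};

// Hypothetical interrupt controller: it only records pending lines.
struct InterruptController {
    std::uint32_t pending = 0;
    void raise(unsigned line) { pending |= (1u << line); }
};
```

Wiring the two together is a one-liner such as `timer.on_expiry([&ic] { ic.raise(3); });`. `std::function` is used here for brevity; a plain function pointer may be cheaper on a hot path, which is worth measuring.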
Hash Tables
As much as possible, achieve O(1) decoding of memory addresses when translating from guest to host memory.
Don’t walk arrays of descriptors of memory banks, peripherals, etc…
Implement hash tables instead.
When in doubt, implement both, profile, and decide based on the measurements.
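One possible shape for such a lookup, assuming 4 KiB pages and 32-bit guest addresses (both assumptions, as are the `GuestMemory` class and its method names): each mapped page is registered once in a hash table keyed by page number, and translation is a single lookup.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr std::uint32_t kPageBits = 12;                 // assumed 4 KiB pages
constexpr std::uint32_t kPageSize = 1u << kPageBits;
constexpr std::uint32_t kPageMask = kPageSize - 1;

class GuestMemory {
public:
    // Map a guest range [base, base + size) to freshly allocated host storage.
    // Assumes base and size are multiples of the page size.
    void map(std::uint32_t base, std::uint32_t size) {
        banks_.emplace_back(size);
        std::uint8_t* host = banks_.back().data();
        for (std::uint32_t off = 0; off < size; off += kPageSize)
            pages_[(base + off) >> kPageBits] = host + off;
    }

    // Average O(1): one hash lookup on the page number, no walk over banks.
    std::uint8_t* translate(std::uint32_t guest_addr) const {
        auto it = pages_.find(guest_addr >> kPageBits);
        if (it == pages_.end()) return nullptr;         // unmapped address
        return it->second + (guest_addr & kPageMask);
    }

private:
    std::vector<std::vector<std::uint8_t>> banks_;      // host backing store
    std::unordered_map<std::uint32_t, std::uint8_t*> pages_;
};
```

`std::unordered_map` is only one candidate; a flat array indexed by page number is another O(1) option when the guest address space is small enough, so both are worth profiling.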
Pre-Decode
Pre-decode once and for all.
For instance:
- At binary download, pre-decode & pre-disassemble the code,
- When setting up the Memory Protection Unit, pre-decode memory regions,
- Etc.
Needless to say, pre-decoding may have to happen again *during* the simulation (if the setup of the MPU changes, for instance).
As long as this is rare, pre-decoding still pays off.
Set up some sort of callback to (re-)initiate a pre-decode.
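To make this concrete, here is a minimal sketch of pre-decoding at load time. The 32-bit instruction encoding is invented for the example, as are the `Decoded` struct and the `Program` class; `patch()` stands in for whatever rare event invalidates a pre-decoded entry.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 32-bit encoding: [31:24] opcode, [23:20] rd, [19:16] rs, [15:0] imm.
struct Decoded {
    std::uint8_t  opcode;
    std::uint8_t  rd;
    std::uint8_t  rs;
    std::uint16_t imm;
};

Decoded decode(std::uint32_t word) {
    return Decoded{
        static_cast<std::uint8_t>(word >> 24),
        static_cast<std::uint8_t>((word >> 20) & 0xF),
        static_cast<std::uint8_t>((word >> 16) & 0xF),
        static_cast<std::uint16_t>(word & 0xFFFF)};
}

class Program {
public:
    // Called once at binary download: decode every word up front.
    void load(const std::vector<std::uint32_t>& image) {
        raw_ = image;
        decoded_.clear();
        decoded_.reserve(raw_.size());
        for (std::uint32_t w : raw_) decoded_.push_back(decode(w));
    }

    // Called on the (assumed rare) event that invalidates an entry:
    // only the affected word is decoded again.
    void patch(std::size_t index, std::uint32_t word) {
        raw_[index] = word;
        decoded_[index] = decode(word);
    }

    // Hot path: the main loop only reads already-decoded entries.
    const Decoded& at(std::size_t index) const { return decoded_[index]; }

private:
    std::vector<std::uint32_t> raw_;
    std::vector<Decoded> decoded_;
};
```

The main loop then dispatches on `program.at(pc_index).opcode` without touching the raw words again.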
Synthesize Complex Conditions
Pre-compute complex and/or heavily used conditions, and memoize them.
Should the condition change during simulation, set up some sort of callback to (re-)initiate the pre-computation / memoization.
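A small sketch of what this can look like, using a made-up "is an interrupt ready to be taken?" condition. The `IrqState` class and its inputs are assumptions for the example; the point is that the hot path reads one flag, and the flag is recomputed only when an input changes.

```cpp
#include <cstdint>

// Hypothetical interrupt state: the main loop tests irq_ready() on every
// instruction, so the condition is recomputed only when an input changes.
class IrqState {
public:
    void set_global_enable(bool on)       { enabled_ = on;    refresh(); }
    void set_mask(std::uint32_t mask)     { mask_ = mask;     refresh(); }
    void set_pending(std::uint32_t lines) { pending_ = lines; refresh(); }

    // Hot path: a single memoized flag, no masking or comparison.
    bool irq_ready() const { return ready_; }

private:
    // Every setter funnels through here; this plays the role of the
    // callback that re-initiates the pre-computation.
    void refresh() { ready_ = enabled_ && (pending_ & mask_) != 0; }

    bool enabled_ = false;
    std::uint32_t mask_ = 0;
    std::uint32_t pending_ = 0;
    bool ready_ = false;
};
```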
State Machines
Capture the state of a complex system in a single state identifier.
Invoke callback methods when transitioning between states.
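For illustration, a minimal state machine along these lines. The states, the `CpuFsm` class, and the `clock_gated` side effect are hypothetical; what matters is that the whole system state is one enum value and that work happens only on transitions.

```cpp
#include <array>
#include <cstddef>

enum class CpuState : std::size_t { Reset, Running, Sleeping, Halted, Count };

class CpuFsm {
public:
    using Hook = void (*)(CpuFsm&);

    // Register a hook that runs whenever state `to` is entered.
    void on_enter(CpuState to, Hook hook) { hooks_[index(to)] = hook; }

    void transition(CpuState to) {
        if (to == state_) return;               // no transition, no callback
        state_ = to;
        ++transitions_;
        if (Hook h = hooks_[index(to)]) h(*this);
    }

    CpuState state() const { return state_; }
    unsigned transitions() const { return transitions_; }

    bool clock_gated = false;                   // example side effect of a hook

private:
    static std::size_t index(CpuState s) { return static_cast<std::size_t>(s); }

    CpuState state_ = CpuState::Reset;
    unsigned transitions_ = 0;
    std::array<Hook, static_cast<std::size_t>(CpuState::Count)> hooks_{};
};
```

A hook is registered with a captureless lambda, for example `cpu.on_enter(CpuState::Sleeping, [](CpuFsm& c) { c.clock_gated = true; });`, and the main loop only ever compares `cpu.state()` against an enum value.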
Code Size
Depending on the guest architecture, factoring out shared instruction behaviors helps keep the code size of the simulator down, thus avoiding thrashing the host CPU's instruction cache.
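As a sketch of such factoring, the add, subtract and compare instructions below all share one arithmetic core instead of each carrying its own flag computation. The flag conventions (carry set means "no borrow" on subtraction, as on ARM-style architectures) and all names are assumptions for the example.

```cpp
#include <cstdint>

struct Flags { bool n, z, c, v; };

// Shared core: one body serves every add/subtract variant.
std::uint32_t add_with_carry(std::uint32_t a, std::uint32_t b,
                             bool carry_in, Flags& f) {
    std::uint64_t wide = std::uint64_t{a} + b + (carry_in ? 1u : 0u);
    std::uint32_t r = static_cast<std::uint32_t>(wide);
    f.n = (r >> 31) != 0;                       // negative
    f.z = (r == 0);                             // zero
    f.c = (wide >> 32) != 0;                    // unsigned carry out
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) != 0;    // signed overflow
    return r;
}

// Thin wrappers: each instruction only adapts its operands.
std::uint32_t op_add(std::uint32_t a, std::uint32_t b, Flags& f) {
    return add_with_carry(a, b, false, f);
}
std::uint32_t op_sub(std::uint32_t a, std::uint32_t b, Flags& f) {
    return add_with_carry(a, ~b, true, f);      // a - b == a + ~b + 1
}
void op_cmp(std::uint32_t a, std::uint32_t b, Flags& f) {
    (void)add_with_carry(a, ~b, true, f);       // same as sub, result dropped
}
```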
Inlining
For key resources (esp. registers & memory), consider inlining code.
Whenever possible, write several implementations, benchmark them all, and decide based on the results.
Take advantage of GCC optimization flags (e.g. -finline-functions, -finline-limit=n) to control inlining.
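Besides command-line flags, GCC (and Clang) also accept per-function attributes. The sketch below forces the hot register accessors inline and keeps a rare path out of line; the `RegisterFile` layout, the special treatment of register 15, and the `SIM_*` macro names are assumptions for the example.

```cpp
#include <cstdint>

// GCC/Clang-specific attributes; other compilers need their own spelling.
#if defined(__GNUC__)
#  define SIM_ALWAYS_INLINE inline __attribute__((always_inline))
#  define SIM_NOINLINE      __attribute__((noinline))
#else
#  define SIM_ALWAYS_INLINE inline
#  define SIM_NOINLINE
#endif

struct RegisterFile {
    std::uint32_t r[16] = {};
    std::uint64_t slow_path_hits = 0;

    // Hot path: called for nearly every instruction, so force inlining.
    SIM_ALWAYS_INLINE std::uint32_t read(unsigned idx) const {
        return r[idx & 15];
    }

    SIM_ALWAYS_INLINE void write(unsigned idx, std::uint32_t value) {
        idx &= 15;
        if (idx == 15) { write_pc(value); return; }   // rare, kept out of line
        r[idx] = value;
    }

    // Cold path: here register 15 is assumed to be the program counter,
    // whose writes need extra work; keeping it out of line keeps write() small.
    SIM_NOINLINE void write_pc(std::uint32_t value) {
        r[15] = value & ~1u;                          // assumed alignment rule
        ++slow_path_hits;
    }
};
```

Whether forcing the inline actually helps depends on the host and the call sites, so this is a variant to benchmark against the compiler's default choice.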
Profile
Profile the simulation main loop, optimize it.
Identify the most used paths in the code of the simulator, profile them, optimize them.
These will be called several billion times 😉
Know your C++
During simulation, beware of default constructors, assignment operators, casts, etc.
These can hide a significant amount of code, including memory allocation.
The same goes for destructors that release memory.
This is especially true of the STL.
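A small illustration of how easily such hidden code sneaks in: the two functions below differ only in how they take their argument, yet one of them copy-constructs a `std::vector` (allocation plus copy, then a deallocation on return) on every call. The `Snapshot` type and the global copy counter are hypothetical and exist only to make the cost visible.

```cpp
#include <cstddef>
#include <vector>

std::size_t g_copies = 0;   // counts copies, for demonstration only

// Hypothetical heavy object holding an STL container.
struct Snapshot {
    std::vector<unsigned> regs = std::vector<unsigned>(16);

    Snapshot() = default;
    Snapshot(const Snapshot& o) : regs(o.regs) { ++g_copies; }
    Snapshot& operator=(const Snapshot& o) {
        regs = o.regs;
        ++g_copies;
        return *this;
    }
};

// Pass by value: every call with an lvalue runs the copy constructor,
// which allocates and fills a new vector, then frees it on return.
unsigned read_by_value(Snapshot s, std::size_t i) { return s.regs[i]; }

// Pass by const reference: no constructor, no allocation, no destructor.
unsigned read_by_ref(const Snapshot& s, std::size_t i) { return s.regs[i]; }
```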
Setup
Implement some sort of init() or reset() method, responsible for setting up the simulation.
Perform all the parsing, configuration, memory allocations, etc. only once, at the beginning of the simulation.
Output
Likewise, have some sort of terminate() method responsible for outputting the results (e.g. profile, dumps, ...) of the simulation.
Plugin
Modularize the extra features (e.g. trace, debug, …) as plugins, keeping the core of the simulator lean and fit.
Implement these features in shared libraries, and load them dynamically on-demand.
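The core-side half of such a design might look like the sketch below. The `Plugin` interface, the `Core` class and the `TracePlugin` example are hypothetical. The dynamic loading itself is left out: on POSIX hosts the plugin object would typically be obtained from a shared library via dlopen() and dlsym(), whereas here it is simply attached by pointer.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical interface that each optional feature implements.
struct Plugin {
    virtual ~Plugin() = default;
    virtual void on_instruction(std::uint32_t pc) = 0;
};

class Core {
public:
    void attach(Plugin* p) { plugins_.push_back(p); }

    void step() {
        // Lean core: with no plugin attached, this costs one empty-check.
        if (!plugins_.empty())
            for (Plugin* p : plugins_) p->on_instruction(pc_);
        pc_ += 4;                               // assumed fixed-size instructions
    }

    std::uint32_t pc() const { return pc_; }

private:
    std::uint32_t pc_ = 0;
    std::vector<Plugin*> plugins_;
};

// Example feature that would live in its own shared library.
struct TracePlugin : Plugin {
    std::vector<std::uint32_t> seen;
    void on_instruction(std::uint32_t pc) override { seen.push_back(pc); }
};
```

The trace plugin allocates as it records, which is acceptable precisely because it is opt-in: a run without plugins never pays for it.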