In order to come to a firm persistent memory management paradigm we have to review available techniques and add possible new techniques to determine which combination of techniques will lead to the best end result. In the following a number of techniques to and aspects of memory management are reviewed.
Dangling pointers
Naturally data that is not used anymore has to be freed at some time. The key word here is used. Once data has been freed while something else is pointing to it you have a dangling pointer. In explicit memory management you run this risk as the developer has the power to remove something regardless of whether something else is using it. With garbage collection this risk is taken away because the developer cannot remove data that is still being used.
However, in Charta data can be referenced that resides in an other data store, possibly being transient or located at the other side of the world. It will be impossible for a garbage collector to find out which other stores are still using the data. Also, it would not be desirable to keep data alive just because some outdated store might still be using it.
Questions that need to be answered here:
- Do we allow dangling pointers and guard against the resulting access violations?
- Do we need a separate mechanism to memory management within a data store and between data stores?
- Can we use cascading free operations to prevent dangling pointers?
- If we quard against access violations can data identifiers be reused?
Garbage collection
Garbage collection is a technique that is used to find data that cannot be reached from a certain root set of data and then free that data. Garbage collection can be used as the only memory management technique as to relief the developer from freeing data explicitly. However, doing so would lead to a lesser understanding of memory usage by the application. This in turn can lead to unwanted memory retention which means that data will be kept alive because somewhere there is still a pointer to that data.
However, in the case of a persistent data store one could possibly run the risk of creating data that is not freed explicitly but is in fact garbage. This can be the result of a programming error for instance. In this case garbage collection could be the one technique that can be used to clean up data not explicitly freed.
Questions that need to be answered here:
- Do we need a garbage collector anyway regardless of whether we use an explicit memory management technique?
- Do we need only one garbage collection strategy or should we combine it for instance with reference counting?
Reference counting
Reference counting is a powerful technique which keeps a count of all the references to some data. It can be used as a garbage collection algorithm for data that is not circular. Whenever a reference count drops to zero the data is stale and could be freed.
Also, the reference count provides some extra feel on how heavily data is used. It is however less powerful than reference indexing as reference indexing keeps an overview of all the references to data instead of compressing that overview to a single number.
Questions that need to be answered here:
- If we use reference indexing do we still need reference counting?
- If we use an other garbage collection algorithm do we still need reference counting?
- Do we need to keep a separate count for strong references and weak references?
Reference indexing
Reference indexing is a term introduced here which refers to keeping track of references between data in a way that it is possible to quickly query for all the data that is pointing to certain data. This reference overview can be very valuable at various locations in the application. If for instance, one wants to know why certain data is not garbage collected one can quickly identify the locations still referencing the data. It also might be used as a means to avoid the need for creating cyclic data references.
Questions that need to be answered here:
- Do we need reference indexing to avoid cyclic data references?
- How can reference indexing be implemented while still allowing efficient data updates?
- How do we represent references to elements in a list in such a way that modifying the list will keep the index intact?
Ownership
Ownership is the fact that some data owns an other piece of data. This means that if the owning data is freed then the owned data will be freed as well. In explicit memory management this technique is often applied by hand to correctly free data.
The ownership relation might be a very valuable tool to provide the developer with the control to specify when to free memory. However, it is the question whether this is sufficient enough for all types of data and whether we can formalize the relation to take the hand coding out of the practice and allow the compiler to analyze ownership relations.
Questions that need to be answered here:
- Is ownership an adequate memory management tool for all or parts of the data?
- Is ownership specified at the class or the object level?
- Can we formalize ownership to allow compiler analysis?
- Can data only be owned by one owner or can there be many owners?
- Are balloon types of interest regarding ownership?
- Can other data point to data being owned by other data and in which circumstances?
Region inference
Region inference is a newly studied technique which promises to hold the right middle between automatic garbage collection and explicit memory management. The idea is to extend the compiler to analyze the places where data is created and where it can escape the location of creation and to what extent. In effect the compiler infers a region for each piece of data. If the region is freed then all data in that region will be freed as well. Being a compile time analysis this technique provides the developer with the extra insight on where data will be freed and allows to solve problems. Also, the time and cost of freeing data is deterministic as opposed to many garbage collection techniques.
Questions that need to be answered here:
- Can region inference be used with persistent data?
- Can region inference serve as the single memory management technique or should it be combined with for instance ownership?
Strong versus weak references
When data refers to other data we might want to make the distinction between strong references and weak references. Strong references will establish something like an ownership relation whereas weak references are a means to just reference data. Memory management techniques could use this distinction to provide a different strategy for either type of reference.
Questions that need to be answered here:
- Is it possible to discriminate all references accurately based on the criterion of strength?
- What should we do with data when there are no strong references but only weak references?
Versioning
Versioning is an interesting dimension to memory management. With versioning data is not actually freed but only labeled as having been freed. This way a history of data is recorded. In a persistent data store a versioning system would be very proficient. We have to investigate how we can use versioning in combination with other memory management techniques.
Questions that need to be answered here:
- How does versioning help in managing memory?
- Can versioning solve the dangling pointer problem?
- How does versioning complicate the management of memory?
- How do we efficiently version the contents of an integer indexed list?
