In introducing Project Deliverance, I try to demonstrate here how commonly available public and private data sources can be used to build an artificial intelligence (AI) model that classifies awqaf assets by their potential for development, and so helps address the twin challenges of preservation and development.
The term “deliverance” implies the act of rescue or recovery. Project Deliverance, a recent initiative of IBF Net, has invited volunteers to contribute to the rescue and recovery of hundreds of thousands of waqf assets (land and buildings) from encroachers of many kinds, including government agencies, private corporations and individuals.
India has the second largest Muslim population in the world at around 195 million, comprising 10.3 percent of the world Muslim population, next only to Indonesia. A 2006 survey of waqf assets places the number of registered waqf properties at about 650,000. Using book value as the basis of estimation, these assets had a combined value of about USD 24 billion. The current value of these properties has obviously grown manyfold since. Even on this grossly understated valuation, the annual rate of return is estimated at a meagre 2.7 percent. If developed, these assets will undoubtedly provide a much higher return. There are also solid reasons to believe that this undervalued portfolio may suffer further diminution over time, a trend that encroachment only strengthens.
I have dealt with the preservation-versus-development trade-off in the case of waqf in an earlier publication. The bottom line of that study is that development is the best way to ensure preservation. In several blogs I have sought to make a case for developing waqf assets in India using additional waqf capital or, if need be, private capital, citing some excellent examples from Malaysia and Singapore. I have also discussed how the Awqaf Properties Investment Fund, managed by the Islamic Development Bank, has developed waqf properties across the globe. Barring a few windows of optimism (such as when NAWADCO was formed with a similar mandate), we are yet to witness any serious effort at developing India’s huge portfolio of waqf assets.
I hope Project Deliverance will deliver. It rightly believes that development of waqf assets begins with their identification and their preservation against unlawful encroachment. In the absence of systematic identification of waqf assets, any allocation of resources for development will largely be random and sub-optimal. Therefore, in its first phase, Project Deliverance has embarked on validating and enhancing the WAMSI database (the output of the maiden national survey of waqf assets) by enlarging the scope of data points, undertaking a community-based process of verification and validation of the data, and making the data amenable to classification, (dis)aggregation and analysis at the micro, group or macro level. It is a long journey requiring a great deal of patient hard work and dedication. However, with every additional waqf property that is validated by a volunteer-member of the community (and then placed on blockchain), the Indian Muslim community will come closer to the goals of preservation and development.
It is, therefore, a good idea to think in terms of optimizing resource use, beginning with the identification of waqf properties with maximum development potential and then working down the ladder. To that end, I am suggesting the development of an AI-based model that will classify properties in order of their potential for development.
Application of AI requires training data. IBF Net is using its base in Odisha, India to undertake a survey of property values in the state. The training data set will have the “neighborhood prices of similar properties” as the endogenous variable. The exogenous variables for plots of land will draw on twenty-one variables, numbered (1) to (21) in what follows. Spatial variables: (1) latitude, (2) longitude, (3) elevation; demographic variables: (4) population count, (5) population density; remote sensing variables: (6) land cover/vegetation, (7) night-time lights; (8) distance to roads, (9) distance to waterways, (10) urban/rural; climate variables: (11) precipitation, (12) temperature, and more.
Researchers have generally found that including geographical information improves the predictive accuracy of econometric models of real estate valuation. Several other variables for which data can be easily found may therefore be used in the training set: (13) proximity of a property to urban elements such as infrastructure, (14) facilities, (15) services, or natural elements with environmental value. Such proximity is expected to raise the value in some cases (e.g. proximity to a park, health care facility, railway station or university campus) and to lower it in others (e.g. proximity to road junctions, sources of environmental pollution or noise generators). The advantage (or disadvantage) that a nearby natural or anthropic element confers on a property is usually measured by geographical distance. However, when the market values a property for the possibility of direct use of an urban element, (16) walking (travel) time is the better measure of proximity to that element. Walking (travel) time obviously depends on the road network and on the presence of anthropic or natural barriers (e.g. a railway, a cliff, a river).
Several other variables may also be considered, such as (17) density of financial service providers, (18) distance from the centre of the city, (19) road density, (20) building density (number of dwellings per square kilometer) and (21) state of the economy.
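The distance-based variables above (distance to roads, distance from the centre of the city, proximity to a railway station) can all be derived from latitude and longitude. Here is a minimal sketch in Python, assuming straight-line (great-circle) distance is an acceptable first approximation; it does not capture walking time or barriers. The coordinates and feature names are hypothetical and are there only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_distance_km(plot, places):
    """Distance from a plot to the closest of several places (e.g. railway stations)."""
    return min(haversine_km(plot[0], plot[1], p[0], p[1]) for p in places)


# Hypothetical coordinates, for illustration only.
plot = (20.30, 85.82)
city_centre = (20.27, 85.84)
stations = [(20.25, 85.83), (20.46, 85.88)]

features = {
    "dist_city_centre_km": haversine_km(*plot, *city_centre),
    "dist_nearest_station_km": nearest_distance_km(plot, stations),
}
print(features)
```

Walking (travel) time would need road-network data and a routing tool, which is beyond this sketch.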
For valuation of buildings, the above geographic and demographic variables need to be supplemented by asset-specific variables, such as (22) floor area, (23) age of the property, (24) number of floors, (25) number of bathrooms, (26) the presence of air-conditioning and (27) type of surface. The purpose of the last variable is to examine the impact of environmental degradation on the valuation of the building. A number of additional variables may be expressed on ordinal scales containing information about the (28) surroundings, (29) standard equipment (lifts etc.) and (30) the availability of transportation infrastructure. However, when a variable is expressed on an ordinal scale and is not broken down by individual level, the model implicitly assumes that the difference in impact on the dependent variable is the same between every pair of adjacent levels.
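The equal-step assumption behind an ordinal scale is easy to see in a small Python sketch. The level names and the coefficient below are hypothetical; the point is only to contrast a single ordinal score with a breakdown by individual level (dummy variables), which lets each step have its own effect.

```python
# Hypothetical ordinal levels for the "surroundings" variable.
SURROUNDINGS_LEVELS = ["poor", "average", "good", "excellent"]


def encode_ordinal(level, levels=SURROUNDINGS_LEVELS):
    """Single integer score: assumes equal steps between adjacent levels."""
    return levels.index(level)


def encode_dummies(level, levels=SURROUNDINGS_LEVELS):
    """One 0/1 indicator per level above the baseline: lets each step differ."""
    return [1 if level == other else 0 for other in levels[1:]]


# Under the ordinal coding, a linear model with coefficient b predicts the
# same change in value, b, for every one-level improvement.
b = 5.0  # hypothetical coefficient, in arbitrary value units
effects = [b * encode_ordinal(lv) for lv in SURROUNDINGS_LEVELS]
steps = [later - earlier for earlier, later in zip(effects, effects[1:])]
print(steps)                   # every step equals b
print(encode_dummies("good"))  # one indicator switched on, for "good"
```

If the survey suggests that moving from "poor" to "average" matters much more than moving from "good" to "excellent", the dummy coding is probably the safer choice, at the cost of more parameters to estimate.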
The type of AI or machine learning (ML) model that I suggest is supervised learning, which is fast, accurate and the most commonly used form of ML. The model will take the data on the 30-odd variables above as input. While most of the “training” data described above (demographic, spatial and remote sensing) can be easily read from the web, some will have to be generated by surveys. The training data will include both inputs (exogenous variables) and labels (targets or endogenous variables). So we first train the model with plenty of training data (inputs and targets); during training we feed data to the algorithm and allow it to adjust itself and improve. Then, given new data, the trained model predicts the output. With supervised learning we may go for regression or classification. Basically, classification separates the data; regression fits the data.
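To make the train-then-predict cycle concrete, here is a minimal sketch in Python. It assumes a plain linear model fitted by gradient descent, chosen only because it is short; the choice of algorithm is left open above. The single input (distance from the city centre) and the prices are made-up numbers.

```python
def predict(weights, bias, row):
    """Model output for one property: bias + sum of weight * input."""
    return bias + sum(w * x for w, x in zip(weights, row))


def train_linear_model(inputs, targets, learning_rate=0.05, epochs=5000):
    """Fit the weights and bias by gradient descent on mean squared error."""
    n, n_features = len(inputs), len(inputs[0])
    weights, bias = [0.0] * n_features, 0.0
    history = []  # mean squared error at the start of each epoch
    for _ in range(epochs):
        errors = [predict(weights, bias, row) - y for row, y in zip(inputs, targets)]
        history.append(sum(e * e for e in errors) / n)
        # Adjust each parameter a little in the direction that reduces the error.
        for j in range(n_features):
            grad = 2 * sum(e * row[j] for e, row in zip(errors, inputs)) / n
            weights[j] -= learning_rate * grad
        bias -= learning_rate * 2 * sum(errors) / n
    return weights, bias, history


# Hypothetical training data: distance from the city centre (km) as the only
# input, and a neighbourhood price per unit area as the target.
train_inputs = [[1.0], [2.0], [3.0], [4.0], [5.0]]
train_targets = [90.0, 80.0, 70.0, 60.0, 50.0]

weights, bias, history = train_linear_model(train_inputs, train_targets)
print(history[0], history[-1])           # the error shrinks as training proceeds

new_plot = [2.5]                         # a property not in the training set
print(predict(weights, bias, new_plot))  # predicted neighbourhood price
```

With the 30-odd variables, the inputs would first need to be put on comparable scales, and a more flexible model may well do better. That is a choice for the data scientist.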
With regression, we can deal with the valuation problem, where we need to predict a continuous response: the value of the asset. (In principle the predicted number can vary from -infinity to +infinity, although the value of a property will of course not be negative.) As a next step, we can rank the properties in order of their potential for development based on that value. We may also treat the problem as a classification problem, where we predict a categorical response and the data can be separated into specific “classes” (we predict one value from a set of values). We may seek a multi-class classification based on whether the potential for development, or the value of the property, is (1) very high, (2) high, (3) moderate, (4) low or (5) very low.
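The two steps above, ranking by predicted value and then grouping into five classes, can be sketched in a few lines of Python. The property identifiers and values are hypothetical, and cutting the ranked list into five equal groups is my assumption for the sake of illustration. Note that this bins the output of a regression; it does not train a separate classifier.

```python
LABELS = ["very low", "low", "moderate", "high", "very high"]


def rank_by_value(predicted):
    """Property ids sorted from highest to lowest predicted value."""
    return sorted(predicted, key=predicted.get, reverse=True)


def classify_by_quintile(predicted):
    """Assign each property one of five classes by its position in the ranking."""
    ascending = sorted(predicted, key=predicted.get)
    n = len(ascending)
    return {
        prop: LABELS[position * 5 // n]
        for position, prop in enumerate(ascending)
    }


# Hypothetical predicted values (arbitrary units) for five waqf properties.
predicted = {"W-01": 12.0, "W-02": 85.5, "W-03": 40.0, "W-04": 63.0, "W-05": 27.5}

ranking = rank_by_value(predicted)
classes = classify_by_quintile(predicted)
print(ranking)
print(classes)
```

Fixed value thresholds (for example, set by the waqf board) could replace the equal-sized groups if classes need to mean the same thing from one state to the next.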
The entire process of training the algorithm with different variables and data sets is fraught with many challenges along the way, especially relating to the choice of the model and the balancing of different types of classification errors. So, let us assume our able data scientist in the team gives us his best shot. We then initiate action for development of a waqf property based on the class to which it belongs.
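One way to look at the different types of classification errors is to compare precision and recall for the class we care about most. The sketch below is in Python and, for brevity, assumes just two classes with made-up labels. Low precision for "high" would mean development money going to properties that turn out to have little potential; low recall would mean promising properties being overlooked.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair of labels occurs."""
    return Counter(zip(actual, predicted))


def precision_recall(actual, predicted, label):
    """Precision and recall for one class label."""
    counts = confusion_counts(actual, predicted)
    true_pos = counts[(label, label)]
    predicted_pos = sum(c for (a, p), c in counts.items() if p == label)
    actual_pos = sum(c for (a, p), c in counts.items() if a == label)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical validation results for six properties.
actual_labels = ["high", "high", "low", "low", "low", "low"]
model_labels = ["high", "low", "low", "high", "high", "low"]

precision, recall = precision_recall(actual_labels, model_labels, "high")
print(precision, recall)
```

Which error is costlier is a judgment for the community and the waqf managers, not for the algorithm.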
And then of course, we should be ready for an entirely new set of challenges!