The recent Facebook and Cambridge Analytica leak teaches us that giving away data is a high price to pay for the services provided. What if we could get the same valuable services we rely on from social media or other community platforms, but without giving the data away?
The Cambridge Analytica incident is one of the clearest demonstrations to date that technology meant to work at an individual level is routinely repurposed and exploited once it becomes available in a central repository. Data collected under the guise of a psychological test became one of the sources for a vote-manipulation strategy. What we were told were privacy settings were in fact publicity settings, with no regard for the actual privacy of data.
Centralization has its place
Sometimes, centralization is very effective: for example, when collaborating on cleaning up and editing a common database that aims to hold some form of truth. OpenStreetMap and Wikipedia are good examples of this.
Community activity does not need centralization
Very often, however, data is only collected in a central place to enable the activities of a community or a social network: individual activities, links, data, and so on. As such a database grows, a new purpose always emerges: mining by the collecting party, turning the locked-in community of users into a very profitable product.
Users of social networks or B2B giant databases have a choice: stop using the service and lose access to both their community and very valuable functions, or accept the Faustian bargain, submit to the monopoly and upload their data into a black box that will extract as much value as it can with little regard for privacy: Facebook, Waze, Google, Amazon, Apple, Booking.com, Hotels.com, AirBnB, Uber…
Sharing #deletefacebook does not free anyone; it does, however, signal to Twitter that users will react positively to ads which equate SUVs with freedom.
Unless the data itself is under the control of its creator, and remains there, no one can claim that each user has full control of confidentiality. What is needed is for community members to store and process their data locally, for their own purposes. Furthermore, they should derive value themselves by providing services such as responding to external requests for a fee, including requests from advertisers.
Data nodes vs. data centers
Storing and processing data locally implies running one’s own node, or hiring one: a private datacenter. Container technology such as Docker makes it trivial to scale a service from a tiny $20 Raspberry Pi to the massive datacenters that AWS and its competitors provide. The lack of hardware dependency makes the service both cheap and easy to migrate, giving the owner total and ongoing control over where and how data is stored and processed.
Leak-proofing data
Locally stored data still provides the same level of service, in particular to its creator, without exposing a single point of failure to attacks, hacks and breaches. When data and its processing are distributed, the risk of leakage, and therefore of repurposing, is contained. The result of a narrow query is unlikely to be reused: who but a direct competitor would want, for example, to target women living in Memphis who are interested in waxing? Conversely, a suspiciously broad query that would leak raw data can be detected and blocked by the community before it runs.
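The community-level screening described above can be sketched as a simple node-side filter. The query shape (requested fields plus filters) and the field names are hypothetical illustrations, not an actual data-lab API:

```python
# Sketch of a node-side guard against data-exfiltration queries.
# The query format and field names are assumptions for illustration.

RAW_FIELDS = {"name", "email", "location_history"}  # never released raw

def is_acceptable(query: dict) -> bool:
    """Accept narrow, filtered queries; refuse raw or unfiltered ones."""
    requested = set(query.get("fields", []))
    if requested & RAW_FIELDS:        # asks for raw personal data
        return False
    if not query.get("filters"):      # unfiltered scan of the whole node
        return False
    return True

# A narrow advertising query passes; a bulk dump is refused.
narrow = {"fields": ["interest_score"],
          "filters": {"city": "Memphis", "interest": "waxing"}}
bulk = {"fields": ["email", "location_history"], "filters": {}}
```

A real deployment would need richer heuristics (rate limits, aggregate-only responses), but the principle is the same: breadth of access is visible to the node before any data leaves it.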
Qualifying sources
With data solidly decentralized, processing and requests have to be distributed to where the data is. The first step in processing data for a purpose is identifying the appropriate data sources. Some of the metadata used to qualify a source is simple (sex, state…), but more complex requirements (a person with 5 friends in the same city) can require extensive processing. Providing such processing for free would not be sustainable for the nodes.
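The gap between cheap and costly qualification can be illustrated as follows; the profile layout and both predicate functions are invented for this sketch:

```python
from collections import Counter

# Hypothetical local profile held by one node.
profile = {
    "sex": "F",
    "state": "TN",
    "friends": [{"city": "Memphis"}, {"city": "Memphis"}, {"city": "Memphis"},
                {"city": "Memphis"}, {"city": "Memphis"}, {"city": "Austin"}],
}

def matches_simple(profile: dict, key: str, value) -> bool:
    """Cheap check against a flat metadata field (sex, state, ...)."""
    return profile.get(key) == value

def has_friends_in_same_city(profile: dict, n: int) -> bool:
    """Costly check: requires scanning and aggregating the friend list."""
    counts = Counter(friend["city"] for friend in profile["friends"])
    return any(count >= n for count in counts.values())
```

The first predicate is a dictionary lookup; the second must traverse and aggregate a whole sub-collection, which is exactly the kind of work a node would expect to be paid for.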
A distributed business model
Distributing queries to the data incurs processing costs that need to be accounted for and passed on to the responsible party. Just as obviously, the value the data creates for its client needs to be distributed fairly. Clients of distributed data mining therefore need to make a large number of small payments to individual nodes instead of a single large payment to a giant database.
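As a toy illustration of that accounting, assuming each node reports the processing cost it incurred, a client's budget could be split pro rata. The settlement scheme here is an assumption for illustration, not a description of any specific protocol:

```python
def settle(budget: float, costs: dict) -> dict:
    """Split a client's budget across nodes in proportion to the
    processing cost each node reported for the query."""
    total = sum(costs.values())
    return {node: budget * cost / total for node, cost in costs.items()}

# 100 tokens split across three nodes that reported unequal costs:
payouts = settle(100.0, {"node_a": 1.0, "node_b": 1.0, "node_c": 2.0})
# node_c did half of the reported work, so it receives half the budget.
```

Each query thus produces many small payouts, which is precisely the transaction pattern conventional banking rails handle poorly.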
Enter blockchain
Decentralized storage and processing have existed for years, but such a business model was not possible: the banking system is not designed to handle what must be a massive number of tiny transactions. Blockchain opens up the possibility of such decentralized trading between untrusted parties. Moreover, blockchain technology ensures the integrity of data regardless of where it is stored: digital fingerprints committed to the blockchain ensure that tampering will be detected even on distributed, untrusted storage.
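The tamper-detection idea can be sketched with a standard cryptographic hash; here SHA-256 stands in for whatever fingerprinting scheme a particular chain uses, and the record fields are invented:

```python
import hashlib
import json

def fingerprint(record: dict) -> str:
    """Hash a canonical serialization of the record, so identical
    content always yields an identical fingerprint."""
    canonical = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# The fingerprint is committed to the blockchain; the record itself
# stays on the owner's node or on any untrusted storage.
committed = fingerprint({"building": "HQ-1", "kwh": 1200, "month": "2018-03"})

# Later, data fetched back from storage is re-hashed and compared.
fetched = {"building": "HQ-1", "kwh": 900, "month": "2018-03"}  # altered copy
tampered = fingerprint(fetched) != committed
```

Only the small fingerprint needs to live on-chain; the data itself never has to be centralized for its integrity to be verifiable.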
Local data labs vs. global data factory
With these critical missing pieces in place, the time has come to introduce a new class of data collection and sharing community: a decentralized, blockchain-based community of data labs. Data remains local; instead, queries are distributed from the client to the data-holding nodes, the data labs, for a fee. Precise queries are unlocked by discrete payments, making access control far more granular and secure than granting authenticated API access.
The advantage of centralizing data is that when data sits in one location, access and use are simple. Data decentralization, by contrast, takes the approach that data stays where it is created, which is more complex because processing must be distributed as well. Crypto trading enables the fair distribution of the revenues data generates, instead of a single monopoly absorbing all the profit from the value chain on top of exposing the data to a wide range of security and privacy issues.
Using a blockchain approach, queries are exposed to the data and immutably stored on the blockchain alongside the transactions, providing traceability and auditability. Advertisers and companies wanting to explore and process the distributed data can only advance in the open, exposing their research intentions and algorithms. This open approach is the only way to rebuild the confidence that Facebook and others have broken.
With a restored trust in confidentiality and a financial incentive to make useful data available to external queries and processing, the market will grow, in turn enabling better services to those who participate, further incentivizing the sharing of intelligence instead of raw data.
The greater accountability, higher resilience and more equitable rewarding of data far outweigh the apparent inefficiencies of the local data labs model.
Opportunities to change the approach from harvesting data to trading intelligence map to any situation in which participants in a community are willing to exchange insights but resist the emergence of a monopolistic data harvester. Bluenote aims to foster the exchange of actionable intelligence about the energy efficiency of commercial buildings, and its financial impact, between stakeholders in commercial real estate: building owners and operators, banks, utility companies, etc.
Liberating energy insights to save the planet
Until now, for lack of economic incentive and for fear of confidentiality breaches, data has remained trapped inside buildings, and further inside industry-specific silos. The essential lessons from this data, good and bad, learned by trailblazers, are not being shared with potential followers, resulting in a very slow rate of upgrading old buildings to new energy standards (~1%/yr). Average buildings consume 4x the energy of the most efficient ones, producing massive (35% of total) yet avoidable greenhouse gas emissions.
Bluenote makes it possible for building owners to establish their own data labs, locally measuring and establishing the impact of energy upgrades on real estate asset value, while profiting from trading with peers. Bluenote enables the trading of actionable intelligence without itself collecting data or risking the exposure of confidential information.