Philippe Tarbouriech

In light of the recent Strava and Polar incidents, in which life-threatening information about soldiers and spies was exposed, a look into the inherent dangers of sharing raw data and how blockchain can both negate that risk and unlock opportunities.

Only you know what your data says about you

Raw data collected for a particular purpose can very easily be repurposed to expose confidential or private information. Any time raw data is shared, such breaches are bound to occur.

Only when data remains local and every query is exposed to the owner can confidentiality and privacy be protected. Sharing data should focus on what the data can do, not what it is.

The Bluenote protocol reveals valuable energy efficiency insights contained in building data without exposing raw data to alternate purposes.

Your data tells a more intimate story than you think

In January, the Strava fitness app was shown to reveal the location of military bases in Afghanistan, Syria and Iraq.

Just this week, it was revealed that Polar goes one step further and exposes the homes of soldiers and spies.

Location is obviously very sensitive information, especially over time. These two examples illustrate the inherent danger of exposing or sharing raw data.

Yet the more intimate the data, the more useful it is

What makes Strava and Polar so compelling to users is that there is a lot of learning potential in precise and intercorrelated data. Location + heart rate + weather conditions over time while exercising can measure progress and suggest specific training methodologies much better than a coarse overview and general run time. What is the impact of incline? How does temperature affect heart rate? Answering those types of questions in a very personalized way helps users progress faster and train more effectively.

Similarly, in order to maximize the energy efficiency of buildings, high temporal resolution energy and financial data are essential. Only precise data can expose particular weaknesses (air tightness, HVAC performance, building envelope, the need to both cool and heat at the same time…) that monthly bills fail to expose. Acting on these insights through improved operations or retrofit, and then measuring the impact of those changes, creates actionable evidence. Such evidence will prove to real estate owners that energy efficiency is in their best economic interest. Accelerating the energy upgrade of the vast stock of older buildings can both unlock a massive financial opportunity for their owners and lower global emissions (35% are from buildings).

Explicit consent or revenue share does not protect you

Until now, the two main ways of addressing data privacy and confidentiality issues have been one or both of the following:

  1. Explicit consent. Opt-in has become the norm for individuals now that the European GDPR is in effect. Explicit consent to collection generally comes with the right to access the data and the right to have it deleted.
  2. Revenue share. A number of marketplaces, including quite a few blockchain-based solutions, aim to address the issue as if it were solely one of unfair profit, on the assumption that cash will offset privacy and confidentiality concerns.

Neither of these addresses the core issue: the real possibility that data exposes more than what was intended. Their only real purpose is to shift responsibility to the user and protect the collecting agent.

Data stored and processed centrally on the cloud

Decentralization is a step in the right direction

One real improvement that most blockchain-based solutions have brought is removing the need for a collecting agent. Without a centralized organization, the risk of a monopoly or rent seeker emerging is marginal.

As we have seen with the Polar and Strava examples, however, the issue is not who collects the data but what gets exposed. Once raw data is shared, it can be combined with external sources to reveal even more. Anonymization strategies generally fail to prevent this unless they significantly degrade the data.

The core problem is how to enable cross learning without exposing confidential raw data.

Keeping data local does protect your confidentiality

Bluenote’s innovation is a decentralized approach that stores and conceals raw data locally while enabling the distributed execution of vetted queries over that data.

Instead of a centralized database collecting and processing data in any way possible, Bluenote establishes a protocol through which queries can be distributed to selected nodes that collect raw data. Each building becomes its own data repository and sets its precise confidentiality conditions.

Each query is exposed to the data owner, along with the number of other data sources selected for that query. The amount of confidential information contained in each result can be traced to the query and the set of data sources. For example, if a query returns a model of temperature sensitivity based on data from 5 buildings, each one contributes only 20% of the output, which limits the possibilities of repurposing that output to gain insights into any one building.
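To make the 5-building example concrete, here is a minimal Python sketch of the idea. It is not Bluenote's actual implementation: the `BuildingNode` class, the `min_sources` condition, the `run_query` function and the choice of a simple least-squares slope as the "temperature sensitivity" are all assumptions made for illustration. The point is only that raw readings stay on each node, each owner sees how many sources a query spans before answering, and each building's weight in the result is 1/n.

```python
from dataclasses import dataclass


@dataclass
class BuildingNode:
    """Hypothetical data node: raw readings never leave this object."""
    temps: list       # raw outdoor temperatures (kept local)
    loads: list       # raw energy loads (kept local)
    min_sources: int  # owner's condition: refuse queries spanning fewer buildings

    def local_sensitivity(self, n_sources):
        """Return the local load-vs-temperature slope, or None if refused."""
        if n_sources < self.min_sources:
            return None  # too few sources, this building would be too exposed
        n = len(self.temps)
        mean_t = sum(self.temps) / n
        mean_l = sum(self.loads) / n
        cov = sum((t - mean_t) * (l - mean_l)
                  for t, l in zip(self.temps, self.loads))
        var = sum((t - mean_t) ** 2 for t in self.temps)
        return cov / var  # only this aggregate is shared


def run_query(nodes):
    """Average the local slopes; return (result, share of each building)."""
    n = len(nodes)
    answers = [node.local_sensitivity(n) for node in nodes]
    if any(a is None for a in answers):
        # Simplifying assumption: the query fails if any owner refuses.
        return None, 0.0
    return sum(answers) / n, 1.0 / n


# Five buildings whose load rises by 1, 2, 3, 4 and 5 units per degree.
temps = [0.0, 10.0, 20.0]
nodes = [BuildingNode(temps, [100.0 + k * t for t in temps], min_sources=5)
         for k in range(1, 6)]
result, share = run_query(nodes)
```

With five participating buildings, `share` comes out at one fifth, matching the 20% contribution described above. A stricter owner could simply raise `min_sources`.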

Data stored and processed locally

While rewarding the value your data creates

As we have seen, processing and querying data locally makes it possible to provide useful results without opening the door to unintended uses. Furthermore, it creates an opportunity for data gathering agents to reap the rewards of the value they create, further incentivizing them to collect more and higher-quality data.

The Bluenote protocol creates the foundation for an ecosystem of distributed data collection and processing, further enabled by the open nature of software module creation. Because it is a genuinely decentralized and open source protocol, any third party can offer distributable software solutions that can be sold to protocol users and appropriately distributed to the data gathering nodes.

You can have your privacy and crowd learning too: an example

Utility companies struggle to predict their clients’ energy demand. Mispredictions are costly, as they involve expensive reserve power sources or, worse, blackouts. Utility companies can only model their clients as black boxes, projecting the impact of various forecasts (weather, calendar days…). Their clients, on the other hand, have access to the same sources but also know far more precisely the actual occupancy (vacation schedules, calendars), machine operations (from building management system set temperatures to maintenance and scheduled tasks) and their buildings’ response to various conditions (direct sunlight, wind, …). Algorithms or deep learning solutions can only be as good as the data they have access to, so clients have an unfair advantage at predicting their own energy consumption.

Bluenote enables commercial buildings to provide high-quality forecasts of electric load to their utility company without exposing confidential data (actual occupancy and therefore headcount, for example). The price of a prediction can be set according to how accurate it turned out to be. Every day, the previous day’s prediction is compared to the actual load curve to calculate a discount. This pricing scheme is entirely automated with a smart contract.

The client is incentivized to improve the quality of its prediction while the utility only pays for quality predictions, all while never exposing confidential raw data.
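The daily settlement described above can be sketched in a few lines of Python. This only illustrates the arithmetic, not the smart contract itself, and the specifics are assumptions: the error metric (mean absolute percentage error), the linear discount schedule, the `max_error` threshold and the function names `mape` and `daily_price` are all hypothetical choices that do not come from the Bluenote protocol.

```python
def mape(predicted, actual):
    """Mean absolute percentage error between two load curves.

    Assumes both curves have the same length and every actual load is positive.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("load curves must be non-empty and of equal length")
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


def daily_price(base_price, predicted, actual, max_error=0.20):
    """Price the utility pays for yesterday's prediction.

    Assumed schedule: the discount grows linearly with the error and reaches
    100% once the error hits max_error, so a poor forecast earns nothing.
    """
    error = mape(predicted, actual)
    discount = min(error / max_error, 1.0)
    return base_price * (1.0 - discount)


# A perfect forecast earns the full price; a 10% average error earns half.
full = daily_price(100.0, [100.0, 100.0], [100.0, 100.0])
half = daily_price(100.0, [110.0, 90.0], [100.0, 100.0])
```

Under this assumed schedule, the client is paid in full only for an exact forecast and loses revenue in proportion to its error, which is the incentive the scheme is meant to create.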

The Bluenote protocol removes the risk of leaking confidential information while establishing a fair economic incentive for those who collect data and share their knowledge.

Bluenote Blog