
Scale-IoT

Deatomizing the web

New project tackles the bottleneck of superfast cloud computing

Access speed is becoming a growing bottleneck for a fast internet. A new project aims to help reduce the amount of storage space needed for every single file on the web.

1,100,000,000. That’s around the number of new Internet of Things (IoT) devices connected to the internet in 2018.

These are not mobile phones, computers or tablets. The IoT comprises devices such as smart TVs, wearables, security systems and other sensor-equipped items that are connected to the internet.

The figure corresponds to more than three million new devices connected to the internet every single day throughout 2018.

The IoT is growing so fast that the internet is rapidly approaching a bottleneck. The huge amount of data generated slows down read speeds across the internet.

“If you look at the way things are going, there’s a massive amount of data that needs to be stored, and one of the challenges is how to gain access to it. You can have a huge amount of storage space in a single disk, but your access speed remains the same. That means that, if you’re running a datacentre, you run into a bottleneck. Already now, some datacentres are aiming for smaller hard drives, simply because the access speed is a bottleneck,” says Associate Professor Daniel Lucani Rötter from the Department of Engineering at Aarhus University.

Therefore, he has just kicked off a project aiming to reduce demand for storage space.

“This project is about limiting the amount of data needed for storage. Instead of simple compression, it’s more about how to manage the data. How we can exploit the characteristics of different types of data to be able to compress it dramatically,” he says.

It is all about similarity: different pieces of data that have content in common.

Take a JPEG image, for example. As soon as the picture is taken, it is compressed. Not every pixel is usually saved, because there is a lot of redundancy: the picture is divided into parts that must be saved and parts that are redundant. Daniel Lucani Rötter is aiming to use the same technique in his project.

But rather than only compressing pictures, he wants to embrace all data.

“Normally, when people think of data compression, they might think about WinZip. What happens there is that you compress a bunch of files, but if you want to read them, you need to decompress all of them. The idea behind our concept is that we basically want to be able to compress everything and still be able to read every single file without having to decompress other files every time you want to access it,” he says, and continues:
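To make the contrast concrete, here is a minimal Python sketch of the baseline that already allows reading one file at a time: compressing each file on its own with the standard zlib module. The file names and contents are invented for illustration, and this is not the project's method. Per-file compression cannot exploit similarities between files, which is the gap the project is aiming to close.

```python
import zlib

def compress_individually(files: dict[str, bytes]) -> dict[str, bytes]:
    """Compress each file separately so any one can be read back alone."""
    return {name: zlib.compress(data) for name, data in files.items()}

def read_one(store: dict[str, bytes], name: str) -> bytes:
    """Decompress only the requested file; the others are never touched."""
    return zlib.decompress(store[name])

# Hypothetical example data: two similar sensor logs.
files = {
    "a.txt": b"sensor reading 21.5C\n" * 50,
    "b.txt": b"sensor reading 21.6C\n" * 50,
}
store = compress_individually(files)
restored = read_one(store, "b.txt")
```

Each file shrinks because of its own internal repetition, but the near-identical content shared by the two logs is compressed twice.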

“In theory we take a file and split it into many different small chunks. The critical thing is that you fragment the data into smaller chunks and try to identify similarities between the chunks in the system.”

Because compression works across all the stored data, there is ample opportunity to exploit similarities, for instance in cloud storage and datacentres. He goes on:

“In order to save a picture, we don’t need the entire file. We just need a sort of index of how the picture is built up. Like the instructions for a Lego kit. A detailed list of how to put the picture together with bits from other pictures.”
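The Lego-instructions idea can be sketched with classic exact-match deduplication. This is a minimal illustration under stated assumptions, not the project's implementation: the fixed chunk size, the SHA-256 fingerprints and the names `store_file` and `load_file` are all hypothetical choices. Each file is stored as a "recipe" of chunk IDs, and every distinct chunk is kept only once, no matter how many files use it.

```python
import hashlib

CHUNK_SIZE = 8  # assumed; unrealistically small, chosen for illustration

def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def store_file(data: bytes, chunk_store: dict[str, bytes]) -> list[str]:
    """Store each new chunk once and return the file's recipe of chunk IDs."""
    recipe = []
    for chunk in split_chunks(data):
        chunk_id = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(chunk_id, chunk)  # keep only the first copy
        recipe.append(chunk_id)
    return recipe

def load_file(recipe: list[str], chunk_store: dict[str, bytes]) -> bytes:
    """Rebuild a file by following its recipe."""
    return b"".join(chunk_store[chunk_id] for chunk_id in recipe)

chunk_store: dict[str, bytes] = {}
file_a = b"AAAAAAAABBBBBBBBCCCCCCCC"
file_b = b"BBBBBBBBCCCCCCCCDDDDDDDD"
recipe_a = store_file(file_a, chunk_store)
recipe_b = store_file(file_b, chunk_store)
```

The two files contain six chunks in total, but only four distinct ones end up in `chunk_store`, and either file can be rebuilt without touching the other's recipe.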

Instead of searching for exact matches among the small chunks of split-up data, Associate Professor Rötter looks for something that’s “close enough”. A chunk may differ slightly from one that is already stored; in that case, only the difference (the “error”) is stored, and the rest is indexed.

What you end up with is a set of block IDs, each with an associated error. Combining the two recovers the original data exactly, with nothing lost.
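The following Python sketch shows one simple way a "close enough" match plus a stored error can still be lossless. The transform used here, clearing the two lowest bits of every byte to form a shared "base" and keeping those bits as the "deviation", is purely an illustrative assumption; the article does not say how the project defines closeness, and all function names are hypothetical.

```python
HIGH_MASK = 0xFC  # assumed: the top six bits of each byte form the base
LOW_MASK = 0x03   # assumed: the bottom two bits are the stored deviation

def split_base_deviation(chunk: bytes) -> tuple[bytes, bytes]:
    """Separate a chunk into a coarse base and a small deviation."""
    base = bytes(b & HIGH_MASK for b in chunk)
    deviation = bytes(b & LOW_MASK for b in chunk)
    return base, deviation

def store_chunks(chunks: list[bytes], bases: dict[bytes, int]) -> list[tuple[int, bytes]]:
    """Store each distinct base once; keep (base ID, deviation) per chunk."""
    entries = []
    for chunk in chunks:
        base, deviation = split_base_deviation(chunk)
        base_id = bases.setdefault(base, len(bases))  # reuse ID if base is known
        entries.append((base_id, deviation))
    return entries

def recover_chunks(entries: list[tuple[int, bytes]], bases: dict[bytes, int]) -> list[bytes]:
    """Recombine every base with its deviation to get the exact original."""
    by_id = {base_id: base for base, base_id in bases.items()}
    return [
        bytes(b | d for b, d in zip(by_id[base_id], deviation))
        for base_id, deviation in entries
    ]

# Hypothetical readings: the first two chunks differ only in their low bits.
chunks = [
    bytes([100, 101, 102, 103]),
    bytes([101, 100, 103, 102]),
    bytes([200, 201, 202, 203]),
]
bases: dict[bytes, int] = {}
entries = store_chunks(chunks, bases)
recovered = recover_chunks(entries, bases)
```

The first two chunks are not identical, so exact-match deduplication would store both in full. Here they share a single base and differ only in their small deviations. For readability the sketch keeps each deviation as whole bytes; a real system would pack the two bits per byte tightly.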

“The project is not only limited to IoT and cloud data. With modifications, it can also be used for normal local data storage. What if suddenly you could get a lot more space out of your 256 GB hard disk drive, simply because data didn’t take up so much space? There’s a huge potential in compressing data in this way,” says Daniel Lucani Rötter.