This article was published on November 26, 2020

How to build a search engine for criminal data

Hansken speeds up digital forensic analysis

Whether it’s a WhatsApp message arranging the distribution of cocaine from São Paulo to Amsterdam or encrypted conversations meant to lure an enemy into a deadly ambush, criminals have long tried to keep their digital footprints hidden.

The evidence of crime is all stored in the digital archive: emails, photos, and cloud storage data. Law enforcement agencies can use these digital clues to find out where criminals have been, and what they’re currently doing.

Data analysis platforms are becoming increasingly crucial in the fight against crime. We spoke with two forensic software experts from Hansken about how they support law enforcement agencies, like the Dutch National Police and the Dutch Fiscal Information and Investigation Service. 

Digital digging

No lone detective can efficiently search the vast pool of data stored on confiscated data carriers.

Since 2012, the Netherlands Forensic Institute (NFI) has focused on Digital Forensics as a Service (DFaaS), aiming to process huge amounts of digital forensic material and to provide accessible, secure access to the analyzed data.

In 2015, the NFI launched the Hansken platform, named after a famous 17th-century elephant, as a tool for digital forensic analysis.

Hansken processes chat conversations, photos, emails, audio, and more. It makes the data transparent and searchable, like a search engine. The goal is that detectives and experts can use standard search queries and can access the data 48 hours after a crime. The platform minimizes case lead time, ensures maximal coverage, and lets users search the data easily.

The ins and outs

Hansken can be divided into three levels: the back-end which holds the forensic knowledge, the centralized DFaaS platform, and the front-end which can be used in criminal investigations, research, and development.

“The core platform of Hansken and its extraction tools are coded in Java,” notes Hansken forensic software developer Christophe Creeten. Creeten works in the back-end team that’s responsible for collaboration with third parties. By enabling them to add their own digital forensic knowledge and tools, which can then be shared with even more people, the platform can be developed further.

NFI’s forensic software developers use existing and self-developed tools, from the open-source software Hadoop for distributed processing to Elasticsearch for making the information searchable. “We also use Cassandra for storing large blocks of data, Kafka for sending messages between services, and ZooKeeper for naming, storing information, and synchronizing services,” says Creeten.

When law enforcement agencies legally confiscate a data carrier, it is sent to Hansken, which processes the data, pulls it apart, and records where each piece of information came from.

“Everything is stored. In Elasticsearch, we store traces as well as information on how we derived those traces, so we can trace back,” Creeten tells TNW. “So if a detective types something into Hansken, it becomes a search query that is thrown over the Elasticsearch database and searches for the traces that match it and gives it back as a result.”
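To make the idea of searchable traces with provenance concrete, here is a minimal sketch in Python. It does not use Elasticsearch and is not Hansken’s code: the `Trace` and `TraceIndex` names, their fields, and the simple word-matching logic are all assumptions made for illustration. Each trace remembers the trace it was derived from, so a search hit can be followed back to the seized data carrier.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Trace:
    """A hypothetical extracted item; parent_id records what it was derived from."""
    trace_id: str
    text: str
    parent_id: Optional[str] = None  # None means this is the original carrier


class TraceIndex:
    """Toy in-memory inverted index, standing in for a real search backend."""

    def __init__(self) -> None:
        self._traces: Dict[str, Trace] = {}
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, trace: Trace) -> None:
        # Store the trace and index every lower-cased word in its text.
        self._traces[trace.trace_id] = trace
        for token in trace.text.lower().split():
            self._postings[token].add(trace.trace_id)

    def search(self, query: str) -> List[Trace]:
        # Return traces that contain every word of the query.
        tokens = query.lower().split()
        if not tokens:
            return []
        ids = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return [self._traces[i] for i in sorted(ids)]

    def provenance(self, trace_id: str) -> List[str]:
        # Walk parent links from a trace back to the original carrier.
        chain: List[str] = []
        current = self._traces.get(trace_id)
        while current is not None:
            chain.append(current.trace_id)
            current = self._traces.get(current.parent_id)
        return chain
```

A detective’s query would map to `search`, and `provenance` on any hit would list the chain of traces it came from, for example a message, the chat database that held it, and the disk image of the phone.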

Whether it’s drugs, fraud, money laundering, or another form of organized crime, more and more data is encrypted. It’s an arduous task to access the data when the key is no longer available. “But it’s a fun challenge to dive deeper into various data structures,” says Carly Bakker, a forensic software developer for Hansken’s back-end libraries team.

Bakker and her colleagues work hard to aptly interpret data from confiscated carriers. “Metal is a Java library developed by the NFI to really read data at byte-level. So we often use it to read file formats and to extract bytes. Then we can parse a file and split it into small chunks where we purposefully can extract the information,” says Bakker. “So you don’t have to go through a laborious process in Java to extract all those bytes one by one from that stream which often makes the code unreadable.”
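The approach Bakker describes, declaring a file’s layout up front instead of pulling bytes off a stream one at a time, can be sketched as follows. This example does not use Metal and does not reflect its API. It relies on Python’s standard `struct` module, and the file format itself (a `CHAT` magic value, a version, a record count, then timestamped UTF-8 messages) is invented purely for illustration.

```python
import struct
from typing import List, Tuple

# Hypothetical little-endian layout, declared once rather than read byte by byte.
HEADER = struct.Struct("<4sHI")    # magic (4 bytes), version (uint16), record count (uint32)
RECORD_HEAD = struct.Struct("<IH")  # unix timestamp (uint32), payload length (uint16)


def parse_chat_blob(data: bytes) -> Tuple[int, List[Tuple[int, str]]]:
    """Split a blob in the made-up CHAT format into (version, [(timestamp, text), ...])."""
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != b"CHAT":
        raise ValueError("unexpected magic bytes")

    offset = HEADER.size
    records: List[Tuple[int, str]] = []
    for _ in range(count):
        timestamp, length = RECORD_HEAD.unpack_from(data, offset)
        offset += RECORD_HEAD.size
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated record")
        records.append((timestamp, payload.decode("utf-8")))
        offset += length
    return version, records
```

Because the layout lives in two declarations at the top, the parsing loop stays short and readable, which is the benefit Bakker points to.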

Smooth user experience

The user-friendliness of the platform ensures that detectives, both with and without IT knowledge, can use the search engine to extract evidence from the available data.

The user experience of detectives and digital experts is improved through automated testing and integration for continuous deployment. One adjustment was a visual timeline, says Bakker: “What we have worked on is that we can display everything in a timeframe. There’s a timeline where users can see when certain data has been changed. The detective or expert then immediately sees what happened during a certain period of time. It often comes in handy for email traffic or chats.”
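The timeline Bakker describes boils down to selecting the events that fall inside a period and putting them in order. A minimal sketch, assuming each event is just a timestamp and a description (the `Event` type and `timeline` function are hypothetical, not part of Hansken):

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical event shape: when something happened and a short description.
Event = Tuple[datetime, str]


def timeline(events: Iterable[Event], start: datetime, end: datetime) -> List[Event]:
    """Return the events with start <= timestamp < end, oldest first."""
    in_window = (event for event in events if start <= event[0] < end)
    return sorted(in_window, key=lambda event: event[0])
```

Given a set of email or chat events, calling `timeline` with the start and end of a single evening would return only that evening’s activity in chronological order.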

The NFI developers ensure that Hansken is able to expose (deleted) emails, recognize patterns, categorize images, and map the locations of data with coordinates, but it’s up to the detectives and digital experts to interpret and assess the presented data.

High profile cases

Hansken’s platform is designed to handle privacy, transparency, and security in criminal investigations, and has now been used in more than 700 criminal cases.

In 2016, the Dutch Prosecution Office seized mail servers in Canada that were used for secure (PGP) communication with adapted BlackBerry phones. In 2018, the Court of Amsterdam ruled that Hansken could lawfully be used to search through and provide insight into evidence that was already available: 3.6 million encrypted messages from the Canadian mail servers.

It was a bitter pill to swallow for the Dutch criminal Naoufal F., nicknamed Noffel, when he was sentenced in 2018 to 18 years in prison for a failed liquidation. A year later, six men were convicted, with sentences ranging from seven years to life imprisonment, for preparing and carrying out (or attempting) an extremely violent wave of liquidations. The Dutch Prosecution Office, with the help of Hansken, used the evidence found in encrypted messages to convict them.

The smart assistant

Hansken challenges forensic software developers to keep evaluating and developing methods to efficiently analyze large data collections. Bakker: “The work encapsulates our love for puzzles, problem-solving, and passion for programming.”

The NFI ensures that law enforcement agencies receive sufficient aid during digital forensic investigations. Hansken saves time by solving problems and analyzing data quickly, safeguards forensic knowledge, and provides valuable leads in criminal casework. Digital forensic investigation will play an increasingly important role in criminal justice. “We continue to develop the platform and expand its forensic capabilities. There’s always room for improvement,” adds Creeten.
