C++ standard libraries for large-scale data processing

Could you recommend some standard C++ libraries that are useful for processing large-scale data, for example natural language processing with a huge data set, protein-protein interaction data sets, etc.?

Best, Thetna

2 answers

You can use STXXL when working with huge amounts of data. Quote from the website:

STXXL implements containers and algorithms that can process huge volumes of data that only fit on disk. While its closeness to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.

In addition, the license is permissive:

STXXL is free and open source, licensed under the Boost Software License 1.0.
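
To give an idea of how closely STXXL mirrors the STL, here is a minimal sketch, assuming a standard STXXL installation with its documented vector and sort headers. It fills an external-memory vector and sorts it with a bounded amount of internal memory (the sizes are arbitrary example values):

```cpp
#include <cstdlib>
#include <limits>

#include <stxxl/vector>
#include <stxxl/sort>

// STXXL's external sort needs a comparator that also reports
// sentinel values (min_value/max_value) for the key type.
struct int_comparator
{
    bool operator()(const int& a, const int& b) const { return a < b; }
    int min_value() const { return std::numeric_limits<int>::min(); }
    int max_value() const { return std::numeric_limits<int>::max(); }
};

int main()
{
    // Looks like std::vector, but pages its blocks out to disk.
    stxxl::vector<int> v;
    for (int i = 0; i < 100 * 1024 * 1024; ++i)
        v.push_back(std::rand());

    // External-memory sort, restricted to 512 MiB of internal memory.
    stxxl::sort(v.begin(), v.end(), int_comparator(), 512 * 1024 * 1024);
    return 0;
}
```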


I would like to add HDF5, which is maintained by a non-profit and available under a BSD-style license (a minimal usage sketch follows the feature list below):

The HDF5 package includes:

- A versatile data model that can represent very complex data objects and a wide variety of metadata.
- A completely portable file format with no limit on the number or size of data objects in the collection.
- A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
- A rich set of integrated performance features that allow for access time and storage space optimizations.
- Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.
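
As a rough illustration of the C++ interface, here is a minimal sketch that writes a small 2-D integer dataset. It assumes the HDF5 C++ bindings (header H5Cpp.h); the file and dataset names are made up for the example:

```cpp
#include <H5Cpp.h>
#include <vector>

int main()
{
    // Create a new HDF5 file, overwriting any existing one.
    H5::H5File file("interactions.h5", H5F_ACC_TRUNC);

    // A tiny 4 x 2 integer dataset, e.g. pairs of interaction IDs.
    std::vector<int> pairs = {1, 2,  3, 4,  5, 6,  7, 8};
    hsize_t dims[2] = {4, 2};
    H5::DataSpace space(2, dims);

    // Create the dataset and write the data in native int format.
    H5::DataSet dset = file.createDataSet("protein_pairs",
                                          H5::PredType::NATIVE_INT, space);
    dset.write(pairs.data(), H5::PredType::NATIVE_INT);
    return 0;
}
```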
