I base this statement on the observation that data, once created, is seldom accessed again. To me, this seems like a great opportunity to use compression tools to reduce the used storage capacity and perhaps save a little money. In a previous article, I presented some tried and true compression tools (e.g., gzip) and some newer ones (e.g., zstd). Most were serial tools, yet with all the wonderful cores available, where are the parallel compression tools?

While researching compression tools, I discovered that some of them are capable of encryption. In this article, I want to mention briefly which tools can encrypt. Disclaimer: I'm not a security expert, so I can't judge the strength of the encryption.

In this article, I also offer some advice with respect to compression, because I'm sure some readers will be asking, "What is the best tool?" (perhaps without defining "best"). My advice is based purely on my experience, so I have no test data to back it up.

If you have ever compressed a large file, you know that it can take a long time to complete. Beyond the speed of the compression algorithm affecting the time it takes to compress a file, many compression utilities use only one thread (serial). Fortunately, many of these utilities have become multithreaded. In this article, I use the phrase "parallel" to indicate that the application is multithreaded but not multinode; that is, it is parallel within a single symmetric multiprocessing (SMP) node. As you will see, though, one compression utility uses the Message Passing Interface (MPI) standard, so it is parallel across distributed nodes.

The general approach for any of the multithreaded utilities is to break the file into chunks, each compressed by a single thread. In general, when all chunks are finished, they are concatenated into the final compressed file in the file format of the compression tool.

Several of the compression utilities discussed in the previous article are themselves multithreaded, and others have separate threaded versions of the corresponding serial tools. In a couple of cases, more than one threaded tool exists for a corresponding serial tool. In the following discussion, I organize the tools by the file format used. For example, gzip and bzip2 each define a category. For both categories, the threaded tools are compatible with the file formats of the serial tools.

The tools mentioned here are capable of parallel (multithreaded) compression with the gzip file format. This means the serial gzip is fully capable of decompressing files compressed with the multithreaded tools. Probably one of the most popular multithreaded compression utilities is pigz, a threaded implementation of gzip that has saved my bacon, pun intended, on more than one occasion when serial compression would have taken too long.

The pigz utility has options very similar to gzip, although I haven't really exercised all of them. One option particular to pigz is -p # (where # is the number of threads). Without the -p option, pigz tries to use all of the processors and threads possible. For example, in the command

$ pigz -9 -v -p 8 file.dat

the flags -9 and -v indicate maximum compression and verbose output, and -p 8 limits pigz to eight threads. Unfortunately, the decompression utility, unpigz, is not parallelized.
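To make the chunk-and-concatenate approach concrete, here is a rough sketch that emulates it by hand with only standard serial tools (the file name file.dat is hypothetical). It works because back-to-back gzip streams form a single valid gzip file, which is exactly why plain serial gunzip can read the output of the multithreaded tools:

```shell
# Make a throwaway input file to stand in for real data (hypothetical name)
head -c 4194304 /dev/urandom > file.dat

# Break the file into fixed-size chunks, one per worker
split -b 1048576 file.dat chunk.

# Compress the chunks concurrently, one gzip process per chunk
for c in chunk.*; do gzip -9 "$c" & done
wait

# Concatenate the compressed chunks; consecutive gzip streams
# are themselves a valid gzip file
cat chunk.*.gz > file.dat.gz
rm chunk.*.gz

# Plain serial gunzip can decompress the result; prints OK on a match
gunzip -c file.dat.gz | cmp - file.dat && echo OK
```

This is only an illustration of the idea, not how the tools are implemented: a utility such as pigz does the splitting and compressing internally with threads and a single output stream, rather than spawning processes and temporary files.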