Documentation/ABI/testing/sysfs-block-zram +25 −93

@@ -22,41 +22,6 @@ Description:
 		device. The reset operation frees all the memory associated
 		with this device.
 
-What:		/sys/block/zram<id>/num_reads
-Date:		August 2010
-Contact:	Nitin Gupta <ngupta@vflare.org>
-Description:
-		The num_reads file is read-only and specifies the number of
-		reads (failed or successful) done on this device.
-
-What:		/sys/block/zram<id>/num_writes
-Date:		August 2010
-Contact:	Nitin Gupta <ngupta@vflare.org>
-Description:
-		The num_writes file is read-only and specifies the number of
-		writes (failed or successful) done on this device.
-
-What:		/sys/block/zram<id>/invalid_io
-Date:		August 2010
-Contact:	Nitin Gupta <ngupta@vflare.org>
-Description:
-		The invalid_io file is read-only and specifies the number of
-		non-page-size-aligned I/O requests issued to this device.
-
-What:		/sys/block/zram<id>/failed_reads
-Date:		February 2014
-Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
-Description:
-		The failed_reads file is read-only and specifies the number of
-		failed reads happened on this device.
-
-What:		/sys/block/zram<id>/failed_writes
-Date:		February 2014
-Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
-Description:
-		The failed_writes file is read-only and specifies the number of
-		failed writes happened on this device.
-
 What:		/sys/block/zram<id>/max_comp_streams
 Date:		February 2014
 Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
@@ -73,74 +38,24 @@ Description:
 		available and selected compression algorithms, change
 		compression algorithm selection.
 
-What:		/sys/block/zram<id>/notify_free
-Date:		August 2010
-Contact:	Nitin Gupta <ngupta@vflare.org>
-Description:
-		The notify_free file is read-only. Depending on device usage
-		scenario it may account a) the number of pages freed because
-		of swap slot free notifications or b) the number of pages freed
-		because of REQ_DISCARD requests sent by bio. The former ones
-		are sent to a swap block device when a swap slot is freed, which
-		implies that this disk is being used as a swap disk. The latter
-		ones are sent by filesystem mounted with discard option,
-		whenever some data blocks are getting discarded.
-
-What:		/sys/block/zram<id>/zero_pages
-Date:		August 2010
-Contact:	Nitin Gupta <ngupta@vflare.org>
-Description:
-		The zero_pages file is read-only and specifies number of zero
-		filled pages written to this disk. No memory is allocated for
-		such pages.
-
-What:		/sys/block/zram<id>/orig_data_size
-Date:		August 2010
-Contact:	Nitin Gupta <ngupta@vflare.org>
-Description:
-		The orig_data_size file is read-only and specifies uncompressed
-		size of data stored in this disk. This excludes zero-filled
-		pages (zero_pages) since no memory is allocated for them.
-		Unit: bytes
-
-What:		/sys/block/zram<id>/compr_data_size
-Date:		August 2010
-Contact:	Nitin Gupta <ngupta@vflare.org>
-Description:
-		The compr_data_size file is read-only and specifies compressed
-		size of data stored in this disk. So, compression ratio can be
-		calculated using orig_data_size and this statistic.
-		Unit: bytes
-
-What:		/sys/block/zram<id>/mem_used_total
-Date:		August 2010
-Contact:	Nitin Gupta <ngupta@vflare.org>
-Description:
-		The mem_used_total file is read-only and specifies the amount
-		of memory, including allocator fragmentation and metadata
-		overhead, allocated for this disk. So, allocator space
-		efficiency can be calculated using compr_data_size and this
-		statistic.
-		Unit: bytes
-
 What:		/sys/block/zram<id>/mem_used_max
 Date:		August 2014
 Contact:	Minchan Kim <minchan@kernel.org>
 Description:
-		The mem_used_max file is read/write and specifies the amount
-		of maximum memory zram have consumed to store compressed data.
-		For resetting the value, you should write "0". Otherwise,
-		you could see -EINVAL.
+		The mem_used_max file is write-only and is used to reset
+		the counter of maximum memory zram have consumed to store
+		compressed data. For resetting the value, you should write
+		"0". Otherwise, you could see -EINVAL.
 		Unit: bytes
 
 What:		/sys/block/zram<id>/mem_limit
 Date:		August 2014
 Contact:	Minchan Kim <minchan@kernel.org>
 Description:
-		The mem_limit file is read/write and specifies the maximum
-		amount of memory ZRAM can use to store the compressed data. The
-		limit could be changed in run time and "0" means disable the
-		limit. No limit is the initial state. Unit: bytes
+		The mem_limit file is write-only and specifies the maximum
+		amount of memory ZRAM can use to store the compressed data.
+		The limit could be changed in run time and "0" means disable
+		the limit. No limit is the initial state. Unit: bytes
 
 What:		/sys/block/zram<id>/compact
 Date:		August 2015
@@ -166,3 +81,20 @@ Description:
 		The mm_stat file is read-only and represents device's mm
 		statistics (orig_data_size, compr_data_size, etc.) in a format
 		similar to block layer statistics file format.
+
+What:		/sys/block/zram<id>/debug_stat
+Date:		July 2016
+Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
+Description:
+		The debug_stat file is read-only and represents various
+		device's debugging info useful for kernel developers. Its
+		format is not documented intentionally and may change
+		anytime without any notice.
+
+What:		/sys/block/zram<id>/backing_dev
+Date:		June 2017
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The backing_dev file is read-write and set up backing
+		device for zram to write incompressible pages.
+		For using, user should enable CONFIG_ZRAM_WRITEBACK.

Documentation/blockdev/zram.txt +113 −76

@@ -83,6 +83,16 @@ pre-created. Default: 1.
 	#select lzo compression algorithm
 	echo lzo > /sys/block/zram0/comp_algorithm
 
+For the time being, the `comp_algorithm' content does not necessarily
+show every compression algorithm supported by the kernel. We keep this
+list primarily to simplify device configuration and one can configure
+a new device with a compression algorithm that is not listed in
+`comp_algorithm'. The thing is that, internally, ZRAM uses Crypto API
+and, if some of the algorithms were built as modules, it's impossible
+to list all of them using, for instance, /proc/crypto or any other
+method. This, however, has an advantage of permitting the usage of
+custom crypto compression modules (implementing S/W or H/W
+compression).
+
 4) Set Disksize
 Set disk size by writing the value to sysfs node 'disksize'.
 The value can be either in bytes or you can use mem suffixes.
@@ -151,41 +161,15 @@ Name access description
 disksize          RW    show and set the device's disk size
 initstate         RO    shows the initialization state of the device
 reset             WO    trigger device reset
-num_reads         RO    the number of reads
-failed_reads      RO    the number of failed reads
-num_write         RO    the number of writes
-failed_writes     RO    the number of failed writes
-invalid_io        RO    the number of non-page-size-aligned I/O requests
+mem_used_max      WO    reset the `mem_used_max' counter (see later)
+mem_limit         WO    specifies the maximum amount of memory ZRAM can use
+                        to store the compressed data
 max_comp_streams  RW    the number of possible concurrent compress operations
 comp_algorithm    RW    show and change the compression algorithm
-notify_free       RO    the number of notifications to free pages (either
-                        slot free notifications or REQ_DISCARD requests)
-zero_pages        RO    the number of zero filled pages written to this disk
-orig_data_size    RO    uncompressed size of data stored in this disk
-compr_data_size   RO    compressed size of data stored in this disk
-mem_used_total    RO    the amount of memory allocated for this disk
-mem_used_max      RW    the maximum amount of memory zram have consumed to
-                        store the data (to reset this counter to the actual
-                        current value, write 1 to this attribute)
-mem_limit         RW    the maximum amount of memory ZRAM can use to store
-                        the compressed data
-pages_compacted   RO    the number of pages freed during compaction
-                        (available only via zram<id>/mm_stat node)
 compact           WO    trigger memory compaction
+debug_stat        RO    this file is used for zram debugging purposes
+backing_dev       RW    set up backend storage for zram to write out
 
-WARNING
-=======
-per-stat sysfs attributes are considered to be deprecated.
-The basic strategy is:
--- the existing RW nodes will be downgraded to WO nodes (in linux 4.11)
--- deprecated RO sysfs nodes will eventually be removed (in linux 4.11)
-
-The list of deprecated attributes can be found here:
-Documentation/ABI/obsolete/sysfs-block-zram
-
-Basically, every attribute that has its own read accessible sysfs node
-(e.g. num_reads) *AND* is accessible via one of the stat files (zram<id>/stat
-or zram<id>/io_stat or zram<id>/mm_stat) is considered to be deprecated.
 
 User space is advised to use the following files to read the device statistics.
 
@@ -200,22 +184,41 @@ The stat file represents device's I/O statistics not accounted by block
 layer and, thus, not available in zram<id>/stat file. It consists of a
 single line of text and contains the following stats separated by whitespace:
-	failed_reads
-	failed_writes
-	invalid_io
-	notify_free
+ failed_reads     the number of failed reads
+ failed_writes    the number of failed writes
+ invalid_io       the number of non-page-size-aligned I/O requests
+ notify_free      Depending on device usage scenario it may account
+                  a) the number of pages freed because of swap slot free
+                  notifications or b) the number of pages freed because of
+                  REQ_DISCARD requests sent by bio. The former ones are
+                  sent to a swap block device when a swap slot is freed,
+                  which implies that this disk is being used as a swap disk.
+                  The latter ones are sent by filesystem mounted with
+                  discard option, whenever some data blocks are getting
+                  discarded.
 
 File /sys/block/zram<id>/mm_stat
 
 The stat file represents device's mm statistics. It consists of a single
 line of text and contains the following stats separated by whitespace:
-	orig_data_size
-	compr_data_size
-	mem_used_total
-	mem_limit
-	mem_used_max
-	zero_pages
-	num_migrated
+ orig_data_size   uncompressed size of data stored in this disk.
+                  This excludes same-element-filled pages (same_pages) since
+                  no memory is allocated for them.
+                  Unit: bytes
+ compr_data_size  compressed size of data stored in this disk
+ mem_used_total   the amount of memory allocated for this disk. This
+                  includes allocator fragmentation and metadata overhead,
+                  allocated for this disk. So, allocator space efficiency
+                  can be calculated using compr_data_size and this statistic.
+                  Unit: bytes
+ mem_limit        the maximum amount of memory ZRAM can use to store
+                  the compressed data
+ mem_used_max     the maximum amount of memory zram have consumed to
+                  store the data
+ same_pages       the number of same element filled pages written to this disk.
+                  No memory is allocated for such pages.
+ pages_compacted  the number of pages freed during compaction
+ huge_pages       the number of incompressible pages
 
 9) Deactivate:
 	swapoff /dev/zram0
@@ -230,5 +233,39 @@ line of text and contains the following stats separated by whitespace:
 	resets the disksize to zero. You must set the disksize again
 	before reusing the device.
 
+* Optional Feature
+
+= writeback
+
+With incompressible pages, there is no memory saving with zram.
+Instead, with CONFIG_ZRAM_WRITEBACK, zram can write incompressible page
+to backing storage rather than keeping it in memory.
+User should set up backing device via /sys/block/zramX/backing_dev
+before disksize setting.
+
+= memory tracking
+
+With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
+zram block. It could be useful to catch cold or incompressible
+pages of the process with*pagemap.
+If you enable the feature, you could see block state via
+/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
+
+	  300    75.033841 .wh
+	  301    63.806904 s..
+	  302    63.806919 ..h
+
+First column is zram's block index.
+Second column is access time since the system was booted
+Third column is state of the block.
+(s: same page
+w: written page to backing store
+h: huge page)
+
+First line of above example says 300th block is accessed at 75.033841sec
+and the block's state is huge so it is written back to the backing
+storage. It's a debugging feature so anyone shouldn't rely on it to work
+properly.
+
 Nitin Gupta
 ngupta@vflare.org

drivers/block/zram/Kconfig +23 −13

 config ZRAM
 	tristate "Compressed RAM block device support"
-	depends on BLOCK && SYSFS
-	select ZPOOL
-	select LZO_COMPRESS
-	select LZO_DECOMPRESS
+	depends on BLOCK && SYSFS && ZSMALLOC && CRYPTO
+	select CRYPTO_LZO
 	default n
 	help
 	  Creates virtual block devices called /dev/zramX (X = 0, 1, ...).
@@ -14,14 +12,26 @@ config ZRAM
 	  It has several use cases, for example: /tmp storage, use as swap
 	  disks and maybe many more.
 
-	  See zram.txt for more information.
+	  See Documentation/blockdev/zram.txt for more information.
 
-config ZRAM_LZ4_COMPRESS
-	bool "Enable LZ4 algorithm support"
+config ZRAM_WRITEBACK
+	bool "Write back incompressible page to backing device"
 	depends on ZRAM
-	select LZ4_COMPRESS
-	select LZ4_DECOMPRESS
 	default n
 	help
-	  This option enables LZ4 compression algorithm support. Compression
-	  algorithm can be changed using `comp_algorithm' device attribute.
\ No newline at end of file
+	  With incompressible page, there is no memory saving to keep it
+	  in memory. Instead, write it out to backing device.
+	  For this feature, admin should set up backing device via
+	  /sys/block/zramX/backing_dev.
+
+	  See Documentation/blockdev/zram.txt for more information.
+
+config ZRAM_MEMORY_TRACKING
+	bool "Track zRam block status"
+	depends on ZRAM && DEBUG_FS
+	help
+	  With this feature, admin can track the state of allocated blocks
+	  of zRAM. Admin could see the information via
+	  /sys/kernel/debug/zram/zramX/block_state.
+
+	  See Documentation/blockdev/zram.txt for more information.
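
The block_state format documented in the zram.txt hunk of this change (block index, access time, three state flags) can be read from user space roughly as follows. This is only a sketch and not part of the patch: `struct zram_block_state` and `parse_block_state_line()` are hypothetical names, and the flag letters (s, w, h) are taken from the documentation text, which itself warns that this is a debugging interface that should not be relied on.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed block_state line. The struct and its field names are hypothetical. */
struct zram_block_state {
	unsigned long index;	/* first column: zram block index */
	double access_time;	/* second column: seconds since boot */
	bool same;		/* 's': same-element-filled page */
	bool written_back;	/* 'w': page written to the backing store */
	bool huge;		/* 'h': huge (incompressible) page */
};

/* Returns 0 on success, -1 if the line does not contain three columns. */
int parse_block_state_line(const char *line, struct zram_block_state *st)
{
	char flags[8];

	if (sscanf(line, "%lu %lf %7s", &st->index, &st->access_time, flags) != 3)
		return -1;

	/* A '.' in the flag column means "flag not set", so only look for letters. */
	st->same = strchr(flags, 's') != NULL;
	st->written_back = strchr(flags, 'w') != NULL;
	st->huge = strchr(flags, 'h') != NULL;
	return 0;
}
```

A caller would typically read /sys/kernel/debug/zram/zramX/block_state line by line with fgets() and pass each line to the parser. Matching on flag letters instead of fixed column positions is a deliberate choice here, since the documentation does not promise a stable layout.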
drivers/block/zram/Makefile +1 −3

-zram-y	:=	zcomp_lzo.o zcomp.o zram_drv.o
-
-zram-$(CONFIG_ZRAM_LZ4_COMPRESS) += zcomp_lz4.o
+zram-y	:=	zcomp.o zram_drv.o
 
 obj-$(CONFIG_ZRAM)	+=	zram.o

drivers/block/zram/zcomp.c +98 −55

@@ -14,108 +14,153 @@
 #include <linux/wait.h>
 #include <linux/sched.h>
 #include <linux/cpu.h>
+#include <linux/crypto.h>
 
 #include "zcomp.h"
-#include "zcomp_lzo.h"
-#ifdef CONFIG_ZRAM_LZ4_COMPRESS
-#include "zcomp_lz4.h"
-#endif
 
-static struct zcomp_backend *backends[] = {
-	&zcomp_lzo,
-#ifdef CONFIG_ZRAM_LZ4_COMPRESS
-	&zcomp_lz4,
+static const char * const backends[] = {
+	"lzo",
+#if IS_ENABLED(CONFIG_CRYPTO_LZ4)
+	"lz4",
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_DEFLATE)
+	"deflate",
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_LZ4HC)
+	"lz4hc",
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_842)
+	"842",
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_ZSTD)
+	"zstd",
 #endif
 	NULL
 };
 
-static struct zcomp_backend *find_backend(const char *compress)
+static void zcomp_strm_free(struct zcomp_strm *zstrm)
 {
-	int i = 0;
-	while (backends[i]) {
-		if (sysfs_streq(compress, backends[i]->name))
-			break;
-		i++;
-	}
-	return backends[i];
-}
-
-static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
-{
-	if (zstrm->private)
-		comp->backend->destroy(zstrm->private);
+	if (!IS_ERR_OR_NULL(zstrm->tfm))
+		crypto_free_comp(zstrm->tfm);
 	free_pages((unsigned long)zstrm->buffer, 1);
 	kfree(zstrm);
 }
 
 /*
- * allocate new zcomp_strm structure with ->private initialized by
+ * allocate new zcomp_strm structure with ->tfm initialized by
  * backend, return NULL on error
  */
-static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp, gfp_t flags)
+static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
 {
-	struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), flags);
+	struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
 	if (!zstrm)
 		return NULL;
 
-	zstrm->private = comp->backend->create(flags);
+	zstrm->tfm = crypto_alloc_comp(comp->name, 0, 0);
 	/*
 	 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
 	 * case when compressed size is larger than the original one
 	 */
-	zstrm->buffer = (void *)__get_free_pages(flags | __GFP_ZERO, 1);
-	if (!zstrm->private || !zstrm->buffer) {
-		zcomp_strm_free(comp, zstrm);
+	zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
+	if (IS_ERR_OR_NULL(zstrm->tfm) || !zstrm->buffer) {
+		zcomp_strm_free(zstrm);
 		zstrm = NULL;
 	}
 	return zstrm;
 }
 
+bool zcomp_available_algorithm(const char *comp)
+{
+	int i = 0;
+
+	while (backends[i]) {
+		if (sysfs_streq(comp, backends[i]))
+			return true;
+		i++;
+	}
+
+	/*
+	 * Crypto does not ignore a trailing new line symbol,
+	 * so make sure you don't supply a string containing
+	 * one.
+	 * This also means that we permit zcomp initialisation
+	 * with any compressing algorithm known to crypto api.
+	 */
+	return crypto_has_comp(comp, 0, 0) == 1;
+}
+
 /* show available compressors */
 ssize_t zcomp_available_show(const char *comp, char *buf)
 {
+	bool known_algorithm = false;
 	ssize_t sz = 0;
 	int i = 0;
 
-	while (backends[i]) {
-		if (!strcmp(comp, backends[i]->name))
+	for (; backends[i]; i++) {
+		if (!strcmp(comp, backends[i])) {
+			known_algorithm = true;
 			sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
-					"[%s] ", backends[i]->name);
-		else
+					"[%s] ", backends[i]);
+		} else {
 			sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
-					"%s ", backends[i]->name);
-		i++;
+					"%s ", backends[i]);
+		}
 	}
-	sz += scnprintf(buf + sz, PAGE_SIZE - sz, "\n");
-	return sz;
-}
 
-bool zcomp_available_algorithm(const char *comp)
-{
-	return find_backend(comp) != NULL;
+	/*
+	 * Out-of-tree module known to crypto api or a missing
+	 * entry in `backends'.
+	 */
+	if (!known_algorithm && crypto_has_comp(comp, 0, 0) == 1)
+		sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
+				"[%s] ", comp);
+
+	sz += scnprintf(buf + sz, PAGE_SIZE - sz, "\n");
+	return sz;
 }
 
-struct zcomp_strm *zcomp_strm_find(struct zcomp *comp)
+struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
 {
 	return *get_cpu_ptr(comp->stream);
 }
 
-void zcomp_strm_release(struct zcomp *comp, struct zcomp_strm *zstrm)
+void zcomp_stream_put(struct zcomp *comp)
 {
 	put_cpu_ptr(comp->stream);
 }
 
-int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
-		const unsigned char *src, size_t *dst_len)
+int zcomp_compress(struct zcomp_strm *zstrm,
+		const void *src, unsigned int *dst_len)
 {
-	return comp->backend->compress(src, zstrm->buffer, dst_len,
-			zstrm->private);
+	/*
+	 * Our dst memory (zstrm->buffer) is always `2 * PAGE_SIZE' sized
+	 * because sometimes we can endup having a bigger compressed data
+	 * due to various reasons: for example compression algorithms tend
+	 * to add some padding to the compressed buffer. Speaking of padding,
+	 * comp algorithm `842' pads the compressed length to multiple of 8
+	 * and returns -ENOSP when the dst memory is not big enough, which
+	 * is not something that ZRAM wants to see. We can handle the
+	 * `compressed_size > PAGE_SIZE' case easily in ZRAM, but when we
+	 * receive -ERRNO from the compressing backend we can't help it
+	 * anymore. To make `842' happy we need to tell the exact size of
+	 * the dst buffer, zram_drv will take care of the fact that
+	 * compressed buffer is too big.
+	 */
+	*dst_len = PAGE_SIZE * 2;
+
+	return crypto_comp_compress(zstrm->tfm,
+			src, PAGE_SIZE,
+			zstrm->buffer, dst_len);
 }
 
-int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
-		size_t src_len, unsigned char *dst)
+int zcomp_decompress(struct zcomp_strm *zstrm,
+		const void *src, unsigned int src_len, void *dst)
 {
-	return comp->backend->decompress(src, src_len, dst);
+	unsigned int dst_len = PAGE_SIZE;
+
+	return crypto_comp_decompress(zstrm->tfm,
+			src, src_len,
+			dst, &dst_len);
 }
 
 static int __zcomp_cpu_notifier(struct zcomp *comp,
@@ -127,7 +172,7 @@ static int __zcomp_cpu_notifier(struct zcomp *comp,
 	case CPU_UP_PREPARE:
 		if (WARN_ON(*per_cpu_ptr(comp->stream, cpu)))
 			break;
-		zstrm = zcomp_strm_alloc(comp, GFP_KERNEL);
+		zstrm = zcomp_strm_alloc(comp);
 		if (IS_ERR_OR_NULL(zstrm)) {
 			pr_err("Can't allocate a compression stream\n");
 			return NOTIFY_BAD;
@@ -138,7 +183,7 @@ static int __zcomp_cpu_notifier(struct zcomp *comp,
 	case CPU_UP_CANCELED:
 		zstrm = *per_cpu_ptr(comp->stream, cpu);
 		if (!IS_ERR_OR_NULL(zstrm))
-			zcomp_strm_free(comp, zstrm);
+			zcomp_strm_free(zstrm);
 		*per_cpu_ptr(comp->stream, cpu) = NULL;
 		break;
 	default:
@@ -209,18 +254,16 @@ void zcomp_destroy(struct zcomp *comp)
 struct zcomp *zcomp_create(const char *compress)
 {
 	struct zcomp *comp;
-	struct zcomp_backend *backend;
 	int error;
 
-	backend = find_backend(compress);
-	if (!backend)
+	if (!zcomp_available_algorithm(compress))
 		return ERR_PTR(-EINVAL);
 
 	comp = kzalloc(sizeof(struct zcomp), GFP_KERNEL);
 	if (!comp)
 		return ERR_PTR(-ENOMEM);
 
-	comp->backend = backend;
+	comp->name = compress;
 	error = zcomp_init(comp);
 	if (error) {
 		kfree(comp);
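
The listing logic in zcomp_available_show() (print every known name, bracket the selected one, and append a selected name that is missing from the static list) can be tried out in user space with a sketch like the one below. It is an approximation under stated assumptions, not the kernel code: `known_algs`, `append_name()` and `show_algorithms()` are hypothetical names, plain snprintf() stands in for scnprintf(), and the `selected_is_loadable` flag stands in for the `crypto_has_comp()` check, which has no user-space equivalent here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's `backends' table. */
static const char *const known_algs[] = { "lzo", "lz4", "zstd", NULL };

/* Append "name " or "[name] " at offset len; returns the new length (< size). */
static size_t append_name(char *buf, size_t size, size_t len,
			  const char *name, bool selected)
{
	int n;

	if (len >= size)
		return len;
	n = snprintf(buf + len, size - len, selected ? "[%s] " : "%s ", name);
	if (n < 0)
		return len;
	if ((size_t)n >= size - len)
		return size - 1;	/* output was truncated */
	return len + (size_t)n;
}

/*
 * selected_is_loadable replaces the crypto_has_comp() lookup: it says whether
 * an algorithm that is not in known_algs could still be provided by a module.
 */
size_t show_algorithms(const char *selected, bool selected_is_loadable,
		       char *buf, size_t size)
{
	bool known = false;
	size_t len = 0;

	for (int i = 0; known_algs[i]; i++) {
		bool match = strcmp(selected, known_algs[i]) == 0;

		known |= match;
		len = append_name(buf, size, len, known_algs[i], match);
	}

	/* Selected algorithm is not in the static list: show it anyway. */
	if (!known && selected_is_loadable)
		len = append_name(buf, size, len, selected, true);

	if (len + 1 < size) {
		buf[len++] = '\n';
		buf[len] = '\0';
	}
	return len;
}
```

With a 64-byte buffer, selecting "lz4" yields the line "lzo [lz4] zstd " followed by a newline, which mirrors the trailing space that the kernel version leaves before "\n".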
Documentation/ABI/testing/sysfs-block-zram +25 −93 Original line number Diff line number Diff line Loading @@ -22,41 +22,6 @@ Description: device. The reset operation frees all the memory associated with this device. What: /sys/block/zram<id>/num_reads Date: August 2010 Contact: Nitin Gupta <ngupta@vflare.org> Description: The num_reads file is read-only and specifies the number of reads (failed or successful) done on this device. What: /sys/block/zram<id>/num_writes Date: August 2010 Contact: Nitin Gupta <ngupta@vflare.org> Description: The num_writes file is read-only and specifies the number of writes (failed or successful) done on this device. What: /sys/block/zram<id>/invalid_io Date: August 2010 Contact: Nitin Gupta <ngupta@vflare.org> Description: The invalid_io file is read-only and specifies the number of non-page-size-aligned I/O requests issued to this device. What: /sys/block/zram<id>/failed_reads Date: February 2014 Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Description: The failed_reads file is read-only and specifies the number of failed reads happened on this device. What: /sys/block/zram<id>/failed_writes Date: February 2014 Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Description: The failed_writes file is read-only and specifies the number of failed writes happened on this device. What: /sys/block/zram<id>/max_comp_streams Date: February 2014 Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Loading @@ -73,74 +38,24 @@ Description: available and selected compression algorithms, change compression algorithm selection. What: /sys/block/zram<id>/notify_free Date: August 2010 Contact: Nitin Gupta <ngupta@vflare.org> Description: The notify_free file is read-only. Depending on device usage scenario it may account a) the number of pages freed because of swap slot free notifications or b) the number of pages freed because of REQ_DISCARD requests sent by bio. 
The former ones are sent to a swap block device when a swap slot is freed, which implies that this disk is being used as a swap disk. The latter ones are sent by filesystem mounted with discard option, whenever some data blocks are getting discarded. What: /sys/block/zram<id>/zero_pages Date: August 2010 Contact: Nitin Gupta <ngupta@vflare.org> Description: The zero_pages file is read-only and specifies number of zero filled pages written to this disk. No memory is allocated for such pages. What: /sys/block/zram<id>/orig_data_size Date: August 2010 Contact: Nitin Gupta <ngupta@vflare.org> Description: The orig_data_size file is read-only and specifies uncompressed size of data stored in this disk. This excludes zero-filled pages (zero_pages) since no memory is allocated for them. Unit: bytes What: /sys/block/zram<id>/compr_data_size Date: August 2010 Contact: Nitin Gupta <ngupta@vflare.org> Description: The compr_data_size file is read-only and specifies compressed size of data stored in this disk. So, compression ratio can be calculated using orig_data_size and this statistic. Unit: bytes What: /sys/block/zram<id>/mem_used_total Date: August 2010 Contact: Nitin Gupta <ngupta@vflare.org> Description: The mem_used_total file is read-only and specifies the amount of memory, including allocator fragmentation and metadata overhead, allocated for this disk. So, allocator space efficiency can be calculated using compr_data_size and this statistic. Unit: bytes What: /sys/block/zram<id>/mem_used_max Date: August 2014 Contact: Minchan Kim <minchan@kernel.org> Description: The mem_used_max file is read/write and specifies the amount of maximum memory zram have consumed to store compressed data. For resetting the value, you should write "0". Otherwise, you could see -EINVAL. The mem_used_max file is write-only and is used to reset the counter of maximum memory zram have consumed to store compressed data. For resetting the value, you should write "0". 
Otherwise, you could see -EINVAL. Unit: bytes What: /sys/block/zram<id>/mem_limit Date: August 2014 Contact: Minchan Kim <minchan@kernel.org> Description: The mem_limit file is read/write and specifies the maximum amount of memory ZRAM can use to store the compressed data. The limit could be changed in run time and "0" means disable the limit. No limit is the initial state. Unit: bytes The mem_limit file is write-only and specifies the maximum amount of memory ZRAM can use to store the compressed data. The limit could be changed in run time and "0" means disable the limit. No limit is the initial state. Unit: bytes What: /sys/block/zram<id>/compact Date: August 2015 Loading @@ -166,3 +81,20 @@ Description: The mm_stat file is read-only and represents device's mm statistics (orig_data_size, compr_data_size, etc.) in a format similar to block layer statistics file format. What: /sys/block/zram<id>/debug_stat Date: July 2016 Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Description: The debug_stat file is read-only and represents various device's debugging info useful for kernel developers. Its format is not documented intentionally and may change anytime without any notice. What: /sys/block/zram<id>/backing_dev Date: June 2017 Contact: Minchan Kim <minchan@kernel.org> Description: The backing_dev file is read-write and set up backing device for zram to write incompressible pages. For using, user should enable CONFIG_ZRAM_WRITEBACK.
Documentation/blockdev/zram.txt +113 −76 Original line number Diff line number Diff line Loading @@ -83,6 +83,16 @@ pre-created. Default: 1. #select lzo compression algorithm echo lzo > /sys/block/zram0/comp_algorithm For the time being, the `comp_algorithm' content does not necessarily show every compression algorithm supported by the kernel. We keep this list primarily to simplify device configuration and one can configure a new device with a compression algorithm that is not listed in `comp_algorithm'. The thing is that, internally, ZRAM uses Crypto API and, if some of the algorithms were built as modules, it's impossible to list all of them using, for instance, /proc/crypto or any other method. This, however, has an advantage of permitting the usage of custom crypto compression modules (implementing S/W or H/W compression). 4) Set Disksize Set disk size by writing the value to sysfs node 'disksize'. The value can be either in bytes or you can use mem suffixes. Loading Loading @@ -151,41 +161,15 @@ Name access description disksize RW show and set the device's disk size initstate RO shows the initialization state of the device reset WO trigger device reset num_reads RO the number of reads failed_reads RO the number of failed reads num_write RO the number of writes failed_writes RO the number of failed writes invalid_io RO the number of non-page-size-aligned I/O requests mem_used_max WO reset the `mem_used_max' counter (see later) mem_limit WO specifies the maximum amount of memory ZRAM can use to store the compressed data max_comp_streams RW the number of possible concurrent compress operations comp_algorithm RW show and change the compression algorithm notify_free RO the number of notifications to free pages (either slot free notifications or REQ_DISCARD requests) zero_pages RO the number of zero filled pages written to this disk orig_data_size RO uncompressed size of data stored in this disk compr_data_size RO compressed size of data stored in this disk 
mem_used_total RO the amount of memory allocated for this disk mem_used_max RW the maximum amount of memory zram have consumed to store the data (to reset this counter to the actual current value, write 1 to this attribute) mem_limit RW the maximum amount of memory ZRAM can use to store the compressed data pages_compacted RO the number of pages freed during compaction (available only via zram<id>/mm_stat node) compact WO trigger memory compaction debug_stat RO this file is used for zram debugging purposes backing_dev RW set up backend storage for zram to write out WARNING ======= per-stat sysfs attributes are considered to be deprecated. The basic strategy is: -- the existing RW nodes will be downgraded to WO nodes (in linux 4.11) -- deprecated RO sysfs nodes will eventually be removed (in linux 4.11) The list of deprecated attributes can be found here: Documentation/ABI/obsolete/sysfs-block-zram Basically, every attribute that has its own read accessible sysfs node (e.g. num_reads) *AND* is accessible via one of the stat files (zram<id>/stat or zram<id>/io_stat or zram<id>/mm_stat) is considered to be deprecated. User space is advised to use the following files to read the device statistics. Loading @@ -200,22 +184,41 @@ The stat file represents device's I/O statistics not accounted by block layer and, thus, not available in zram<id>/stat file. It consists of a single line of text and contains the following stats separated by whitespace: failed_reads failed_writes invalid_io notify_free failed_reads the number of failed reads failed_writes the number of failed writes invalid_io the number of non-page-size-aligned I/O requests notify_free Depending on device usage scenario it may account a) the number of pages freed because of swap slot free notifications or b) the number of pages freed because of REQ_DISCARD requests sent by bio. The former ones are sent to a swap block device when a swap slot is freed, which implies that this disk is being used as a swap disk. 
The latter ones are sent by filesystem mounted with discard option, whenever some data blocks are getting discarded. File /sys/block/zram<id>/mm_stat The stat file represents device's mm statistics. It consists of a single line of text and contains the following stats separated by whitespace: orig_data_size compr_data_size mem_used_total mem_limit mem_used_max zero_pages num_migrated orig_data_size uncompressed size of data stored in this disk. This excludes same-element-filled pages (same_pages) since no memory is allocated for them. Unit: bytes compr_data_size compressed size of data stored in this disk mem_used_total the amount of memory allocated for this disk. This includes allocator fragmentation and metadata overhead, allocated for this disk. So, allocator space efficiency can be calculated using compr_data_size and this statistic. Unit: bytes mem_limit the maximum amount of memory ZRAM can use to store the compressed data mem_used_max the maximum amount of memory zram have consumed to store the data same_pages the number of same element filled pages written to this disk. No memory is allocated for such pages. pages_compacted the number of pages freed during compaction huge_pages the number of incompressible pages 9) Deactivate: swapoff /dev/zram0 Loading @@ -230,5 +233,39 @@ line of text and contains the following stats separated by whitespace: resets the disksize to zero. You must set the disksize again before reusing the device. * Optional Feature = writeback With incompressible pages, there is no memory saving with zram. Instead, with CONFIG_ZRAM_WRITEBACK, zram can write incompressible page to backing storage rather than keeping it in memory. User should set up backing device via /sys/block/zramX/backing_dev before disksize setting. = memory tracking With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the zram block. It could be useful to catch cold or incompressible pages of the process with*pagemap. 
If you enable the feature, you could see block state via /sys/kernel/debug/zram/zram0/block_state". The output is as follows, 300 75.033841 .wh 301 63.806904 s.. 302 63.806919 ..h First column is zram's block index. Second column is access time since the system was booted Third column is state of the block. (s: same page w: written page to backing store h: huge page) First line of above example says 300th block is accessed at 75.033841sec and the block's state is huge so it is written back to the backing storage. It's a debugging feature so anyone shouldn't rely on it to work properly. Nitin Gupta ngupta@vflare.org
drivers/block/zram/Kconfig +23 −13
 config ZRAM
 	tristate "Compressed RAM block device support"
-	depends on BLOCK && SYSFS
-	select ZPOOL
-	select LZO_COMPRESS
-	select LZO_DECOMPRESS
+	depends on BLOCK && SYSFS && ZSMALLOC && CRYPTO
+	select CRYPTO_LZO
 	default n
 	help
 	  Creates virtual block devices called /dev/zramX (X = 0, 1, ...).
@@ -14,14 +12,26 @@ config ZRAM
 	  It has several use cases, for example: /tmp storage, use as swap
 	  disks and maybe many more.

-	  See zram.txt for more information.
+	  See Documentation/blockdev/zram.txt for more information.

-config ZRAM_LZ4_COMPRESS
-	bool "Enable LZ4 algorithm support"
+config ZRAM_WRITEBACK
+	bool "Write back incompressible page to backing device"
 	depends on ZRAM
-	select LZ4_COMPRESS
-	select LZ4_DECOMPRESS
 	default n
 	help
-	  This option enables LZ4 compression algorithm support. Compression
-	  algorithm can be changed using `comp_algorithm' device attribute.
\ No newline at end of file
+	  With incompressible page, there is no memory saving to keep it
+	  in memory. Instead, write it out to backing device.
+	  For this feature, admin should set up backing device via
+	  /sys/block/zramX/backing_dev.
+
+	  See Documentation/blockdev/zram.txt for more information.
+
+config ZRAM_MEMORY_TRACKING
+	bool "Track zRam block status"
+	depends on ZRAM && DEBUG_FS
+	help
+	  With this feature, admin can track the state of allocated blocks
+	  of zRAM. Admin could see the information via
+	  /sys/kernel/debug/zram/zramX/block_state.
+
+	  See Documentation/blockdev/zram.txt for more information.
drivers/block/zram/Makefile +1 −3
-zram-y := zcomp_lzo.o zcomp.o zram_drv.o
-zram-$(CONFIG_ZRAM_LZ4_COMPRESS) += zcomp_lz4.o
+zram-y := zcomp.o zram_drv.o
 obj-$(CONFIG_ZRAM) += zram.o
drivers/block/zram/zcomp.c +98 −55
@@ -14,108 +14,153 @@
 #include <linux/wait.h>
 #include <linux/sched.h>
 #include <linux/cpu.h>
+#include <linux/crypto.h>

 #include "zcomp.h"
-#include "zcomp_lzo.h"
-#ifdef CONFIG_ZRAM_LZ4_COMPRESS
-#include "zcomp_lz4.h"
-#endif

-static struct zcomp_backend *backends[] = {
-	&zcomp_lzo,
-#ifdef CONFIG_ZRAM_LZ4_COMPRESS
-	&zcomp_lz4,
+static const char * const backends[] = {
+	"lzo",
+#if IS_ENABLED(CONFIG_CRYPTO_LZ4)
+	"lz4",
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_DEFLATE)
+	"deflate",
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_LZ4HC)
+	"lz4hc",
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_842)
+	"842",
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_ZSTD)
+	"zstd",
 #endif
 	NULL
 };

-static struct zcomp_backend *find_backend(const char *compress)
+static void zcomp_strm_free(struct zcomp_strm *zstrm)
 {
-	int i = 0;
-	while (backends[i]) {
-		if (sysfs_streq(compress, backends[i]->name))
-			break;
-		i++;
-	}
-	return backends[i];
-}
-
-static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
-{
-	if (zstrm->private)
-		comp->backend->destroy(zstrm->private);
+	if (!IS_ERR_OR_NULL(zstrm->tfm))
+		crypto_free_comp(zstrm->tfm);
 	free_pages((unsigned long)zstrm->buffer, 1);
 	kfree(zstrm);
 }

 /*
- * allocate new zcomp_strm structure with ->private initialized by
+ * allocate new zcomp_strm structure with ->tfm initialized by
  * backend, return NULL on error
  */
-static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp, gfp_t flags)
+static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
 {
-	struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), flags);
+	struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
 	if (!zstrm)
 		return NULL;

-	zstrm->private = comp->backend->create(flags);
+	zstrm->tfm = crypto_alloc_comp(comp->name, 0, 0);
 	/*
 	 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
 	 * case when compressed size is larger than the original one
 	 */
-	zstrm->buffer = (void *)__get_free_pages(flags | __GFP_ZERO, 1);
-	if (!zstrm->private || !zstrm->buffer) {
-		zcomp_strm_free(comp, zstrm);
+	zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
+	if (IS_ERR_OR_NULL(zstrm->tfm) || !zstrm->buffer) {
+		zcomp_strm_free(zstrm);
 		zstrm = NULL;
 	}
 	return zstrm;
 }

+bool zcomp_available_algorithm(const char *comp)
+{
+	int i = 0;
+
+	while (backends[i]) {
+		if (sysfs_streq(comp, backends[i]))
+			return true;
+		i++;
+	}
+
+	/*
+	 * Crypto does not ignore a trailing new line symbol,
+	 * so make sure you don't supply a string containing
+	 * one.
+	 * This also means that we permit zcomp initialisation
+	 * with any compressing algorithm known to crypto api.
+	 */
+	return crypto_has_comp(comp, 0, 0) == 1;
+}
+
 /* show available compressors */
 ssize_t zcomp_available_show(const char *comp, char *buf)
 {
+	bool known_algorithm = false;
 	ssize_t sz = 0;
 	int i = 0;

-	while (backends[i]) {
-		if (!strcmp(comp, backends[i]->name))
+	for (; backends[i]; i++) {
+		if (!strcmp(comp, backends[i])) {
+			known_algorithm = true;
 			sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
-					"[%s] ", backends[i]->name);
-		else
+					"[%s] ", backends[i]);
+		} else {
 			sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
-					"%s ", backends[i]->name);
-		i++;
+					"%s ", backends[i]);
 		}
-	sz += scnprintf(buf + sz, PAGE_SIZE - sz, "\n");
-	return sz;
 	}

-bool zcomp_available_algorithm(const char *comp)
-{
-	return find_backend(comp) != NULL;
+	/*
+	 * Out-of-tree module known to crypto api or a missing
+	 * entry in `backends'.
+	 */
+	if (!known_algorithm && crypto_has_comp(comp, 0, 0) == 1)
+		sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
+				"[%s] ", comp);
+
+	sz += scnprintf(buf + sz, PAGE_SIZE - sz, "\n");
+	return sz;
 }

-struct zcomp_strm *zcomp_strm_find(struct zcomp *comp)
+struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
 {
 	return *get_cpu_ptr(comp->stream);
 }

-void zcomp_strm_release(struct zcomp *comp, struct zcomp_strm *zstrm)
+void zcomp_stream_put(struct zcomp *comp)
 {
 	put_cpu_ptr(comp->stream);
 }

-int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
-		const unsigned char *src, size_t *dst_len)
+int zcomp_compress(struct zcomp_strm *zstrm,
+		const void *src, unsigned int *dst_len)
 {
-	return comp->backend->compress(src, zstrm->buffer, dst_len,
-			zstrm->private);
+	/*
+	 * Our dst memory (zstrm->buffer) is always `2 * PAGE_SIZE' sized
+	 * because sometimes we can endup having a bigger compressed data
+	 * due to various reasons: for example compression algorithms tend
+	 * to add some padding to the compressed buffer. Speaking of padding,
+	 * comp algorithm `842' pads the compressed length to multiple of 8
+	 * and returns -ENOSP when the dst memory is not big enough, which
+	 * is not something that ZRAM wants to see. We can handle the
+	 * `compressed_size > PAGE_SIZE' case easily in ZRAM, but when we
+	 * receive -ERRNO from the compressing backend we can't help it
+	 * anymore. To make `842' happy we need to tell the exact size of
+	 * the dst buffer, zram_drv will take care of the fact that
+	 * compressed buffer is too big.
+	 */
+	*dst_len = PAGE_SIZE * 2;
+
+	return crypto_comp_compress(zstrm->tfm,
+			src, PAGE_SIZE,
+			zstrm->buffer, dst_len);
 }

-int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
-		size_t src_len, unsigned char *dst)
+int zcomp_decompress(struct zcomp_strm *zstrm,
+		const void *src, unsigned int src_len, void *dst)
 {
-	return comp->backend->decompress(src, src_len, dst);
+	unsigned int dst_len = PAGE_SIZE;
+
+	return crypto_comp_decompress(zstrm->tfm,
+			src, src_len,
+			dst, &dst_len);
 }

 static int __zcomp_cpu_notifier(struct zcomp *comp,
@@ -127,7 +172,7 @@ static int __zcomp_cpu_notifier(struct zcomp *comp,
 	case CPU_UP_PREPARE:
 		if (WARN_ON(*per_cpu_ptr(comp->stream, cpu)))
 			break;
-		zstrm = zcomp_strm_alloc(comp, GFP_KERNEL);
+		zstrm = zcomp_strm_alloc(comp);
 		if (IS_ERR_OR_NULL(zstrm)) {
 			pr_err("Can't allocate a compression stream\n");
 			return NOTIFY_BAD;
@@ -138,7 +183,7 @@ static int __zcomp_cpu_notifier(struct zcomp *comp,
 	case CPU_UP_CANCELED:
 		zstrm = *per_cpu_ptr(comp->stream, cpu);
 		if (!IS_ERR_OR_NULL(zstrm))
-			zcomp_strm_free(comp, zstrm);
+			zcomp_strm_free(zstrm);
 		*per_cpu_ptr(comp->stream, cpu) = NULL;
 		break;
 	default:
@@ -209,18 +254,16 @@ void zcomp_destroy(struct zcomp *comp)
 struct zcomp *zcomp_create(const char *compress)
 {
 	struct zcomp *comp;
-	struct zcomp_backend *backend;
 	int error;

-	backend = find_backend(compress);
-	if (!backend)
+	if (!zcomp_available_algorithm(compress))
 		return ERR_PTR(-EINVAL);

 	comp = kzalloc(sizeof(struct zcomp), GFP_KERNEL);
 	if (!comp)
 		return ERR_PTR(-ENOMEM);

-	comp->backend = backend;
+	comp->name = compress;
 	error = zcomp_init(comp);
 	if (error) {
 		kfree(comp);
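The lookup order introduced in zcomp.c above (static name table first, crypto API as a fallback, with the selected algorithm shown in brackets) can be tried out in userspace. The following is a minimal sketch, not the kernel code: `crypto_knows()` is a hypothetical stand-in for `crypto_has_comp()`, `append()` is a hypothetical bounded-append helper standing in for `scnprintf()`, and the table contents are fixed for illustration rather than selected by `IS_ENABLED()`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative table; the kernel builds it from CONFIG_CRYPTO_* options. */
static const char * const backends[] = { "lzo", "lz4", "zstd", NULL };

/* Hypothetical stand-in for crypto_has_comp(): pretend only lz4hc is known. */
static bool crypto_knows(const char *name)
{
	return strcmp(name, "lz4hc") == 0;
}

/* Table first, then the fallback, mirroring zcomp_available_algorithm(). */
static bool available_algorithm(const char *name)
{
	for (int i = 0; backends[i]; i++)
		if (strcmp(name, backends[i]) == 0)
			return true;
	return crypto_knows(name);
}

/* Bounded append of pre+name+post; returns the new used length. */
static size_t append(char *buf, size_t size, size_t used,
		     const char *pre, const char *name, const char *post)
{
	int n;

	if (used + 1 >= size)
		return used;
	n = snprintf(buf + used, size - used, "%s%s%s", pre, name, post);
	if (n < 0)
		return used;
	if ((size_t)n >= size - used)
		return size - 1;	/* output was truncated */
	return used + (size_t)n;
}

/* List all table entries, bracketing the current one. */
static size_t available_show(const char *cur, char *buf, size_t size)
{
	bool known = false;
	size_t sz = 0;

	if (size > 0)
		buf[0] = '\0';
	for (int i = 0; backends[i]; i++) {
		if (strcmp(cur, backends[i]) == 0) {
			known = true;
			sz = append(buf, size, sz, "[", backends[i], "] ");
		} else {
			sz = append(buf, size, sz, "", backends[i], " ");
		}
	}
	/* Current algorithm is not in the table but the fallback knows it. */
	if (!known && crypto_knows(cur))
		sz = append(buf, size, sz, "[", cur, "] ");
	return append(buf, size, sz, "", "", "\n");
}
```

With this table, calling `available_show("lz4", buf, sizeof(buf))` on a 64-byte buffer yields "lzo [lz4] zstd " followed by a newline, and passing "lz4hc" appends "[lz4hc] " after the table entries. The sketch uses plain `strcmp()` throughout, whereas the patch uses `sysfs_streq()` in `zcomp_available_algorithm()` so that a trailing newline from a sysfs write is tolerated.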