How the buddy allocator behaves while filling a disk from empty to full

Let's see how the ext4 buddy allocator works under such a workload.

I use ldiskfs, the Lustre FS backend. The allocator works the same way there.

Mount Lustre with one 4G OST. We are studying how a single OST works, so we don't want striping or other Lustre FS features.

OSTCOUNT=1 OSTSIZE=4194304 lustre/tests/llmount.sh
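
As a quick sanity check of the size, here is a small sketch of the arithmetic. It assumes OSTSIZE is given in KiB; the block size and blocks-per-group values are the ones dumpe2fs reports later in this post.

```shell
#!/bin/sh
# Assumption: OSTSIZE is in KiB. Block size and blocks per group
# are taken from the dumpe2fs output of this filesystem.
OSTSIZE_KB=4194304
BLOCK_SIZE=4096
BLOCKS_PER_GROUP=32768

ost_bytes=$((OSTSIZE_KB * 1024))
ost_blocks=$((ost_bytes / BLOCK_SIZE))
ost_groups=$((ost_blocks / BLOCKS_PER_GROUP))
group_mib=$((BLOCKS_PER_GROUP * BLOCK_SIZE / 1024 / 1024))

echo "bytes=$ost_bytes blocks=$ost_blocks groups=$ost_groups group_mib=$group_mib"
```

So the OST should consist of 32 block groups of 128 MiB each, which is the unit the buddy allocator works with.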

Enable buddy allocator statistics.

echo "1" > /sys/fs/ldiskfs/loop1/mb_stats
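
The device name (loop1 here) depends on the setup. A minimal sketch that enables the statistics for every device under a given sysfs root; the function name enable_mb_stats is made up for this example, and the root is a parameter so it can be tried on a scratch directory first.

```shell
# Hypothetical helper: write 1 to every <root>/<device>/mb_stats file.
# Defaults to the ldiskfs sysfs directory used in this post.
enable_mb_stats() {
    root=${1:-/sys/fs/ldiskfs}
    for f in "$root"/*/mb_stats; do
        [ -e "$f" ] || continue      # the glob matched nothing
        echo 1 > "$f" && echo "enabled: $f"
    done
}
```

Calling enable_mb_stats without arguments (as root) would touch all mounted ldiskfs devices.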

Let's start by writing 1024 MB of data with dd.

dd if=/dev/zero of=/mnt/lustre/foofile bs=1048576 count=1024

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 3.35335 s, 320 MB/s

Run lustre/tests/llmountcleanup.sh to unmount. The allocator statistics are written to the kernel log at unmount:

[311743.337750] LDISKFS-fs (loop1): mballoc: 262144 blocks 1024 reqs (1024 success)
[311743.337753] LDISKFS-fs (loop1): mballoc: 131 extents scanned, 1017 goal hits, 7 2^N hits, 0 breaks, 0 lost
[311743.337754] LDISKFS-fs (loop1): mballoc: (0, 0, 0) useless c(0,1,2) loops
[311743.337755] LDISKFS-fs (loop1): mballoc: 10 generated and it took 15604
[311743.337757] LDISKFS-fs (loop1): mballoc: 229632 preallocated, 0 discarded
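
The first statistics line can be turned into an average request size. A small sketch, assuming the "mballoc: N blocks M reqs" format shown above and 4096-byte blocks; the function name blocks_per_req is made up for this example.

```shell
# Read mballoc summary lines on stdin and print the average request size.
# Assumes 4096-byte blocks.
blocks_per_req() {
    sed -n 's/.*mballoc: \([0-9]*\) blocks \([0-9]*\) reqs.*/\1 \2/p' |
    awk '{ printf "%d blocks/req, %.2f MiB/req\n", $1 / $2, $1 / $2 * 4096 / 1048576 }'
}

echo '[311743.337750] LDISKFS-fs (loop1): mballoc: 262144 blocks 1024 reqs (1024 success)' | blocks_per_req
```

For this run every request is 256 blocks, that is 1 MiB, which matches the dd block size. Note also that goal hits plus 2^N hits (1017 + 7) add up to the 1024 requests.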

The same, but fill half of the disk. (Mount again and set mb_stats.)

dd if=/dev/zero of=/mnt/lustre/foofile bs=1048576 count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 6.09781 s, 352 MB/s

[312172.970702] LDISKFS-fs (loop1): mballoc: 524288 blocks 2048 reqs (2048 success)
[312172.970705] LDISKFS-fs (loop1): mballoc: 259 extents scanned, 2039 goal hits, 9 2^N hits, 0 breaks, 0 lost
[312172.970706] LDISKFS-fs (loop1): mballoc: (0, 0, 0) useless c(0,1,2) loops
[312172.970707] LDISKFS-fs (loop1): mballoc: 18 generated and it took 28696
[312172.970709] LDISKFS-fs (loop1): mballoc: 459008 preallocated, 0 discarded

Three quarters of the disk:

dd if=/dev/zero of=/mnt/lustre/foofile bs=1048576 count=3072
3072+0 records in
3072+0 records out
3221225472 bytes (3.2 GB) copied, 9.25951 s, 348 MB/s

[312364.166116] LDISKFS-fs (loop1): mballoc: 786432 blocks 3072 reqs (3072 success)
[312364.166119] LDISKFS-fs (loop1): mballoc: 387 extents scanned, 3061 goal hits, 11 2^N hits, 0 breaks, 0 lost
[312364.166120] LDISKFS-fs (loop1): mballoc: (0, 0, 0) useless c(0,1,2) loops
[312364.166121] LDISKFS-fs (loop1): mballoc: 26 generated and it took 51332
[312364.166122] LDISKFS-fs (loop1): mballoc: 688384 preallocated, 0 discarded

Now let's fill the disk completely.

dd if=/dev/zero of=/mnt/lustre/foofile bs=1048576 count=4096
dd: error writing ‘/mnt/lustre/foofile’: No space left on device
3585+0 records in
3584+0 records out
3758153728 bytes (3.8 GB) copied, 11.0218 s, 341 MB/s

[312557.402752] LDISKFS-fs (loop1): mballoc: 917505 blocks 3614 reqs (3613 success)
[312557.402755] LDISKFS-fs (loop1): mballoc: 452 extents scanned, 3587 goal hits, 21 2^N hits, 0 breaks, 0 lost
[312557.402757] LDISKFS-fs (loop1): mballoc: (0, 0, 0) useless c(0,1,2) loops
[312557.402758] LDISKFS-fs (loop1): mballoc: 31 generated and it took 56836
[312557.402759] LDISKFS-fs (loop1): mballoc: 805119 preallocated, 2034 discarded
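
To compare the four runs side by side, here is a small sketch. The input rows are copied by hand from the statistics above; the function name summarize_runs and the column layout are made up for this example. min_groups is just the block count divided by 32768 (blocks per group) and rounded up, so it is a lower bound on the number of groups the data needs.

```shell
# Columns: label, blocks, requests, goal hits, 2^N hits, "generated" counter.
summarize_runs() {
    awk '{
        printf "%-5s goal=%.1f%% 2^N=%.1f%% min_groups=%d generated=%d\n",
               $1, 100 * $4 / $3, 100 * $5 / $3, int(($2 + 32767) / 32768), $6
    }'
}

summarize_runs <<'EOF'
1G    262144 1024 1017  7 10
2G    524288 2048 2039  9 18
3G    786432 3072 3061 11 26
full  917505 3614 3587 21 31
EOF
```

In every run almost all requests are satisfied at the goal block, and the "generated" counter grows together with the number of groups the data occupies.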

Disk parameters.

dumpe2fs /tmp/lustre-ost1
dumpe2fs 1.42.13.x6 (01-Mar-2018)
Filesystem volume name:   lustre-OST0000
Last mounted on:          /
Filesystem UUID:          f06873af-8707-4fb1-91a9-176a05d61996
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink quota
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              262144
Block count:              1048576
Reserved block count:     52428
Free blocks:              64384
Free inodes:              261880
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1022
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    256
Filesystem created:       Thu Feb 21 22:43:25 2019
Last mount time:          Thu Feb 21 22:43:26 2019
Last write time:          Thu Feb 21 22:45:31 2019
Mount count:              3
Maximum mount count:      -1
Last checked:             Thu Feb 21 22:43:25 2019
Check interval:           0 (<none>)
Lifetime writes:          4201 kB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:           256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      daf914ff-b5c9-4725-b4cd-f80472ca3dc9
Journal backup:           inode blocks
User quota inode:         3
Group quota inode:        4
Journal features:         (none)
Journal size:             163M
Journal length:           41728
Journal sequence:         0x00000004
Journal start:            0




Group 0: (Blocks 0-32767) [ITABLE_ZEROED]
  Checksum 0x47bc, unused inodes 8063
  Primary superblock at 0, Group descriptors at 1-1
  Reserved GDT blocks at 2-1023
  Block bitmap at 1024 (+1024), Inode bitmap at 1056 (+1056)
  Inode table at 1088-1599 (+1088)
  15149 free blocks, 8063 free inodes, 4 directories, 8063 unused inodes
  Free blocks: 17619-32767
  Free inodes: 130-8192
Group 1: (Blocks 32768-65535) [ITABLE_ZEROED]
  Checksum 0x0e26, unused inodes 8185
  Backup superblock at 32768, Group descriptors at 32769-32769
  Reserved GDT blocks at 32770-33791
  Block bitmap at 1025 (bg #0 + 1025), Inode bitmap at 1057 (bg #0 + 1057)
  Inode table at 1600-2111 (bg #0 + 1600)
  2913 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 33941-34303, 34308-34559, 34822-35327, 63744-65535
  Free inodes: 8200-16384
Group 2: (Blocks 65536-98303) [ITABLE_ZEROED]
  Checksum 0x37f9, unused inodes 8185
  Block bitmap at 1026 (bg #0 + 1026), Inode bitmap at 1058 (bg #0 + 1058)
  Inode table at 2112-2623 (bg #0 + 2112)
  0 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 
  Free inodes: 16392-24576
Group 3: (Blocks 98304-131071) [ITABLE_ZEROED]
  Checksum 0x5660, unused inodes 8185
  Backup superblock at 98304, Group descriptors at 98305-98305
  Reserved GDT blocks at 98306-99327
  Block bitmap at 1027 (bg #0 + 1027), Inode bitmap at 1059 (bg #0 + 1059)
  Inode table at 2624-3135 (bg #0 + 2624)
  1024 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 99328-100351
  Free inodes: 24584-32768
Group 4: (Blocks 131072-163839) [ITABLE_ZEROED]
  Checksum 0x7daf, unused inodes 8185
  Block bitmap at 1028 (bg #0 + 1028), Inode bitmap at 1060 (bg #0 + 1060)
  Inode table at 3136-3647 (bg #0 + 3136)
  0 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 
  Free inodes: 32776-40960
Group 5: (Blocks 163840-196607) [ITABLE_ZEROED]
  Checksum 0x1c36, unused inodes 8185
  Backup superblock at 163840, Group descriptors at 163841-163841
  Reserved GDT blocks at 163842-164863
  Block bitmap at 1029 (bg #0 + 1029), Inode bitmap at 1061 (bg #0 + 1061)
  Inode table at 3648-4159 (bg #0 + 3648)
  1024 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 164864-165887
  Free inodes: 40968-49152
Group 6: (Blocks 196608-229375) [ITABLE_ZEROED]
  Checksum 0xa581, unused inodes 8185
  Block bitmap at 1030 (bg #0 + 1030), Inode bitmap at 1062 (bg #0 + 1062)
  Inode table at 4160-4671 (bg #0 + 4160)
  0 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 
  Free inodes: 49160-57344
Group 7: (Blocks 229376-262143) [ITABLE_ZEROED]
  Checksum 0xc418, unused inodes 8185
  Backup superblock at 229376, Group descriptors at 229377-229377
  Reserved GDT blocks at 229378-230399
  Block bitmap at 1031 (bg #0 + 1031), Inode bitmap at 1063 (bg #0 + 1063)
  Inode table at 4672-5183 (bg #0 + 4672)
  1024 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 230400-231423
  Free inodes: 57352-65536
Group 8: (Blocks 262144-294911) [ITABLE_ZEROED]
  Checksum 0x7811, unused inodes 8185
  Block bitmap at 1032 (bg #0 + 1032), Inode bitmap at 1064 (bg #0 + 1064)
  Inode table at 5184-5695 (bg #0 + 5184)
  0 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 
  Free inodes: 65544-73728
Group 9: (Blocks 294912-327679) [ITABLE_ZEROED]
  Checksum 0x1988, unused inodes 8185
  Backup superblock at 294912, Group descriptors at 294913-294913
  Reserved GDT blocks at 294914-295935
  Block bitmap at 1033 (bg #0 + 1033), Inode bitmap at 1065 (bg #0 + 1065)
  Inode table at 5696-6207 (bg #0 + 5696)
  1024 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 295936-296959
  Free inodes: 73736-81920
Group 10: (Blocks 327680-360447) [ITABLE_ZEROED]
  Checksum 0x312d, unused inodes 8185
  Block bitmap at 1034 (bg #0 + 1034), Inode bitmap at 1066 (bg #0 + 1066)
  Inode table at 6208-6719 (bg #0 + 6208)
  0 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 
  Free inodes: 81928-90112
Group 11: (Blocks 360448-393215) [ITABLE_ZEROED]
  Checksum 0x9e39, unused inodes 8184
  Block bitmap at 1035 (bg #0 + 1035), Inode bitmap at 1067 (bg #0 + 1067)
  Inode table at 6720-7231 (bg #0 + 6720)
  0 free blocks, 8184 free inodes, 8 directories, 8184 unused inodes
  Free blocks: 
  Free inodes: 90121-98304
Group 12: (Blocks 393216-425983) [ITABLE_ZEROED]
  Checksum 0x74b4, unused inodes 8184
  Block bitmap at 1036 (bg #0 + 1036), Inode bitmap at 1068 (bg #0 + 1068)
  Inode table at 7232-7743 (bg #0 + 7232)
  0 free blocks, 8184 free inodes, 8 directories, 8184 unused inodes
  Free blocks: 
  Free inodes: 98313-106496
Group 13: (Blocks 425984-458751) [ITABLE_ZEROED]
  Checksum 0xd46f, unused inodes 8184
  Block bitmap at 1037 (bg #0 + 1037), Inode bitmap at 1069 (bg #0 + 1069)
  Inode table at 7744-8255 (bg #0 + 7744)
  0 free blocks, 8184 free inodes, 8 directories, 8184 unused inodes
  Free blocks: 
  Free inodes: 106505-114688
Group 14: (Blocks 458752-491519) [ITABLE_ZEROED]
  Checksum 0xcebd, unused inodes 8184
  Block bitmap at 1038 (bg #0 + 1038), Inode bitmap at 1070 (bg #0 + 1070)
  Inode table at 8256-8767 (bg #0 + 8256)
  0 free blocks, 8184 free inodes, 8 directories, 8184 unused inodes
  Free blocks: 
  Free inodes: 114697-122880
Group 15: (Blocks 491520-524287) [ITABLE_ZEROED]
  Checksum 0x6e66, unused inodes 8184
  Block bitmap at 1039 (bg #0 + 1039), Inode bitmap at 1071 (bg #0 + 1071)
  Inode table at 8768-9279 (bg #0 + 8768)
  0 free blocks, 8184 free inodes, 8 directories, 8184 unused inodes
  Free blocks: 
  Free inodes: 122889-131072
Group 16: (Blocks 524288-557055) [ITABLE_ZEROED]
  Checksum 0x227e, unused inodes 8185
  Block bitmap at 1040 (bg #0 + 1040), Inode bitmap at 1072 (bg #0 + 1072)
  Inode table at 9280-9791 (bg #0 + 9280)
  1280 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 533248-534527
  Free inodes: 131080-139264
Group 17: (Blocks 557056-589823) [ITABLE_ZEROED]
  Checksum 0xd3b6, unused inodes 8185
  Block bitmap at 1041 (bg #0 + 1041), Inode bitmap at 1073 (bg #0 + 1073)
  Inode table at 9792-10303 (bg #0 + 9792)
  0 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 
  Free inodes: 139272-147456
Group 18: (Blocks 589824-622591) [ITABLE_ZEROED]
  Checksum 0x3a51, unused inodes 8185
  Block bitmap at 1042 (bg #0 + 1042), Inode bitmap at 1074 (bg #0 + 1074)
  Inode table at 10304-10815 (bg #0 + 10304)
  0 free blocks, 8185 free inodes, 7 directories, 8185 unused inodes
  Free blocks: 
  Free inodes: 147464-155648
Group 19: (Blocks 622592-655359) [ITABLE_ZEROED]
  Checksum 0x9a49, unused inodes 8188
  Block bitmap at 1043 (bg #0 + 1043), Inode bitmap at 1075 (bg #0 + 1075)
  Inode table at 10816-11327 (bg #0 + 10816)
  0 free blocks, 8188 free inodes, 4 directories, 8188 unused inodes
  Free blocks: 
  Free inodes: 155653-163840
Group 20: (Blocks 655360-688127) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x3603, unused inodes 8192
  Block bitmap at 1044 (bg #0 + 1044), Inode bitmap at 1076 (bg #0 + 1076)
  Inode table at 11328-11839 (bg #0 + 11328)
  0 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 
  Free inodes: 163841-172032
Group 21: (Blocks 688128-720895) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x96d8, unused inodes 8192
  Block bitmap at 1045 (bg #0 + 1045), Inode bitmap at 1077 (bg #0 + 1077)
  Inode table at 11840-12351 (bg #0 + 11840)
  0 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 
  Free inodes: 172033-180224
Group 22: (Blocks 720896-753663) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0xee2d, unused inodes 8192
  Block bitmap at 1046 (bg #0 + 1046), Inode bitmap at 1078 (bg #0 + 1078)
  Inode table at 12352-12863 (bg #0 + 12352)
  0 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 
  Free inodes: 180225-188416
Group 23: (Blocks 753664-786431) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x4ef6, unused inodes 8192
  Block bitmap at 1047 (bg #0 + 1047), Inode bitmap at 1079 (bg #0 + 1079)
  Inode table at 12864-13375 (bg #0 + 12864)
  0 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 
  Free inodes: 188417-196608
Group 24: (Blocks 786432-819199) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x33bd, unused inodes 8192
  Block bitmap at 1048 (bg #0 + 1048), Inode bitmap at 1080 (bg #0 + 1080)
  Inode table at 13376-13887 (bg #0 + 13376)
  0 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 
  Free inodes: 196609-204800
Group 25: (Blocks 819200-851967) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x5224, unused inodes 8192
  Backup superblock at 819200, Group descriptors at 819201-819201
  Reserved GDT blocks at 819202-820223
  Block bitmap at 1049 (bg #0 + 1049), Inode bitmap at 1081 (bg #0 + 1081)
  Inode table at 13888-14399 (bg #0 + 13888)
  1024 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 820224-821247
  Free inodes: 204801-212992
Group 26: (Blocks 851968-884735) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x7a81, unused inodes 8192
  Block bitmap at 1050 (bg #0 + 1050), Inode bitmap at 1082 (bg #0 + 1082)
  Inode table at 14400-14911 (bg #0 + 14400)
  0 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 
  Free inodes: 212993-221184
Group 27: (Blocks 884736-917503) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0xd99f, unused inodes 8192
  Backup superblock at 884736, Group descriptors at 884737-884737
  Reserved GDT blocks at 884738-885759
  Block bitmap at 1051 (bg #0 + 1051), Inode bitmap at 1083 (bg #0 + 1083)
  Inode table at 14912-15423 (bg #0 + 14912)
  3072 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 885760-886783, 897024-897279, 915712-917503
  Free inodes: 221185-229376
Group 28: (Blocks 917504-950271) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x30d7, unused inodes 8192
  Block bitmap at 1052 (bg #0 + 1052), Inode bitmap at 1084 (bg #0 + 1084)
  Inode table at 15424-15935 (bg #0 + 15424)
  0 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 
  Free inodes: 229377-237568
Group 29: (Blocks 950272-983039) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x900c, unused inodes 8192
  Block bitmap at 1053 (bg #0 + 1053), Inode bitmap at 1085 (bg #0 + 1085)
  Inode table at 15936-16447 (bg #0 + 15936)
  0 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 
  Free inodes: 237569-245760
Group 30: (Blocks 983040-1015807) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x8b9d, unused inodes 8192
  Block bitmap at 1054 (bg #0 + 1054), Inode bitmap at 1086 (bg #0 + 1086)
  Inode table at 16448-16959 (bg #0 + 16448)
  4096 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 985088-985343, 987392-989183, 1007616-1007871, 1011968-1013759
  Free inodes: 245761-253952
Group 31: (Blocks 1015808-1048575) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x4dc9, unused inodes 8192
  Block bitmap at 1055 (bg #0 + 1055), Inode bitmap at 1087 (bg #0 + 1087)
  Inode table at 16960-17471 (bg #0 + 16960)
  32754 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 1015822-1048575
  Free inodes: 253953-262144

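In the end, 31 groups were used (group 31 is left almost entirely free). This is the reason why 31 buddies were generated, as reported by the "31 generated" line in the last statistics output.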
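
The per-group picture can also be checked without reading the whole dump. A minimal sketch of an awk filter over the dumpe2fs group output; the function name group_summary is made up for this example, and it relies only on the "Group N:" and "N free blocks," lines shown above.

```shell
# Summarize dumpe2fs group output read from stdin:
# number of groups, groups with no free blocks, total free blocks.
group_summary() {
    awk '
        /^Group [0-9]+:/ { groups++ }
        /^ +[0-9]+ free blocks,/ {
            free = $1 + 0
            total += free
            if (free == 0) full++
        }
        END { printf "groups=%d full=%d free_blocks=%d\n", groups, full, total }
    '
}

# Usage (path as in this post):
#   dumpe2fs /tmp/lustre-ost1 2>/dev/null | group_summary
```

For the dump above this should report 32 groups, 21 of them with no free blocks, and 64384 free blocks in total, which agrees with the "Free blocks" field in the header.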