00:17.29 | *** join/#maemo-ssu drrz (~drrrrz@104.200.151.59) |
00:20.57 | *** join/#maemo-ssu drrrz (~drrrrz@tx-71-51-46-104.dhcp.embarqhsd.net) |
01:18.18 | *** join/#maemo-ssu freemangordon (~ivo@46.249.74.23) |
04:00.51 | *** join/#maemo-ssu DocScrutinizer05 (~saturn@openmoko/engineers/joerg) |
05:13.04 | *** join/#maemo-ssu chainsawbike (~chainsawb@unaffiliated/chainsawbike) |
09:51.33 | *** join/#maemo-ssu Pali (~pali@Maemo/community/contributor/Pali) |
10:30.35 | *** join/#maemo-ssu peetah (~peetah@cha92-9-82-236-202-86.fbx.proxad.net) |
11:30.53 | *** join/#maemo-ssu dafox (~dafox@2a02:a448:c25a:0:decf:5ea9:e960:80dc) |
12:19.19 | *** join/#maemo-ssu macmaN (~chezburge@90.190.182.21) |
15:17.46 | *** join/#maemo-ssu APic (apic@apic.name) |
15:21.28 | *** join/#maemo-ssu APic (apic@apic.name) |
15:23.45 | *** join/#maemo-ssu APic (apic@apic.name) |
19:19.25 | *** join/#maemo-ssu handaxe (~chatter@c83-248-22-153.bredband.comhem.se) |
19:32.15 | DocScrutinizer05 | RFC: backport TRIM ioctl cmd support to MMC driver |
19:33.17 | DocScrutinizer05 | rationale: particularly for swap volume a TRIM will significantly improve performance |
19:33.40 | kerio | i don't think TRIM is a thing, for microSDs |
19:34.02 | DocScrutinizer05 | I'm talking about mmc |
19:34.18 | kerio | isn't it basically just a big SD card anyway |
19:34.25 | DocScrutinizer05 | though TRIM even exists in ATA |
19:34.35 | kerio | the issue is that the controller needs to understand it |
19:34.49 | DocScrutinizer05 | well, that's why TRIM got standardized |
19:34.54 | kerio | yeah but |
19:35.05 | kerio | i don't think the one in the N900 understands it |
19:35.13 | DocScrutinizer05 | subject to evaluation: can *N900* eMMC understand TRIM |
19:35.52 | kerio | can the neo900 eMMC understand TRIM? |
19:37.06 | DocScrutinizer05 | https://en.wikipedia.org/wiki/Trim_(computing)#SD.2FMMC |
19:37.30 | kerio | ok so it's still a thing |
19:48.07 | DocScrutinizer05 | http://datasheet.octopart.com/THGBM1G8D8EBAI2-Toshiba-datasheet-11973698.pdf |
19:48.52 | DocScrutinizer05 | Full compliance with JEDEC/MMCA Ver. 4.3 |
19:54.36 | DocScrutinizer05 | https://en.wikipedia.org/wiki/MultiMediaCard#Open_standard |
19:55.32 | kerio | aww, 4.5 is the one with the proper TRIM |
19:56.47 | *** join/#maemo-ssu handaxe (~chatter@c83-248-22-153.bredband.comhem.se) |
20:00.51 | DocScrutinizer05 | yes, with even better TRIM, but 4.4 and prolly 4.3 already support something similar |
20:02.59 | DocScrutinizer05 | >>The MultiMediaCard and SD ERASE (CMD38) command provides similar functionality to the ATA TRIM command, although it requires that erased blocks be overwritten with either zeroes or ones<< https://en.wikipedia.org/wiki/Trim_(computing)#SD.2FMMC |
20:05.01 | kerio | yeah but in addition to supporting the command, the controller should also know how to use it in a smart way, shouldn't it |
20:05.31 | kerio | and since it's likely not very used, we'd need some testing done on it |
20:13.13 | DocScrutinizer05 | ohmy, you really think a command is implemented in a chip in a way so it returns no error and does nothing else? |
20:13.44 | DocScrutinizer05 | I doubt JEDEC would agree on such implementation |
20:16.44 | *** join/#maemo-ssu DrCode (~DrCode@5.28.134.3) |
20:17.03 | kerio | DocScrutinizer05: turn ERASE into "write 0" |
20:17.09 | kerio | through the normal channels |
20:17.14 | DocScrutinizer05 | aha |
20:17.18 | DocScrutinizer05 | and then? |
20:17.25 | kerio | and then you have an implementation of ERASE |
20:17.28 | kerio | that doesn't help at all |
20:17.31 | DocScrutinizer05 | nonsense |
20:17.33 | kerio | but it's valid |
20:18.12 | kerio | you're underestimating the hardware manufacturers' cheapness |
20:18.54 | DocScrutinizer05 | there's clearly a difference on all levels of device management between a released unused (erased/TRIMed) block and a block that holds valid and valuable data even when that data consists of all zeroes |
20:26.59 | DocScrutinizer05 | >>There are different types of TRIM defined by SATA ...<< Non-deterministic TRIM, Deterministic TRIM (DRAT), Deterministic Read Zero after TRIM (RZAT). I guess >> MultiMediaCard and SD ERASE (CMD38) command provides similar functionality to the ATA TRIM<< RZAT, that's why |
20:28.53 | DocScrutinizer05 | prolly the controller is even smarter than you'd hope, by actually sensing the all 0 write and transforming it into a special shortcut that doesn't need any real write |
20:29.31 | DocScrutinizer05 | my SSD does for sure, writing all zero files I get write speeds at 600MB/s |
20:29.56 | DocScrutinizer05 | sustainable |
20:33.21 | kerio | DocScrutinizer05: writing zeroes on my ssd yields 650MB/s |
20:33.25 | kerio | which is the normal write speed |
20:35.02 | DocScrutinizer05 | sure, please tell me which SSD you got |
20:36.13 | DocScrutinizer05 | a SSD that can do a sustained write speed of 650MB/s on random data - that thing must be a monster |
20:37.25 | DocScrutinizer05 | It also clearly proves that we need better interfaces since 650MB/s is the theoretical max of a 6G SATA |
20:45.37 | DocScrutinizer05 | well, ok, fastest SSDs seem to be near 512MB/s |
20:46.50 | DocScrutinizer05 | http://ssd.userbenchmark.com/Patriot-Ignite-960GB/Rating/3575 |
20:54.53 | DocScrutinizer05 | nah, we already got better interfaces: SATA Express 8 Gbit/s and 16 Gbit/s |
20:57.28 | kerio | this is a PCIe SSD, yeah |
20:58.28 | kerio | what are you talking about |
20:59.09 | DocScrutinizer05 | TRIM was the topic |
20:59.22 | kerio | the intel 750 gets up to 1200MB/s of sequential writes |
21:00.29 | kerio | and the ludicrous intel DC P3608 4TB gets 3GB/s of sequential writes |
21:00.51 | kerio | (PCIe 3.0 x8) |
21:01.06 | DocScrutinizer05 | highly irrelevant |
21:01.36 | kerio | it's only 8959.99 USD on amazon.com |
21:03.12 | DocScrutinizer05 | the point is that writing zeroes isn't a working replacement for TRIM |
21:03.41 | kerio | indeed |
21:04.12 | DocScrutinizer05 | the controller needs to receive a hint that the block doesn't contain valid data |
21:04.14 | kerio | it should be "easy" to test if that MMC ERASE command works |
21:04.55 | kerio | write random data over the whole thing |
21:05.02 | kerio | then write random data again, measuring the speed |
21:05.06 | kerio | then ERASE everything |
21:05.11 | kerio | then write random data again and measure the speed |
21:05.12 | DocScrutinizer05 | exactly |
21:05.45 | kerio | do we have a debug utility for the eMMC? |
21:05.47 | DocScrutinizer05 | also exactly what I had in mind, for swap |
21:05.51 | DocScrutinizer05 | no |
21:05.55 | DocScrutinizer05 | afaik |
21:05.58 | kerio | would it need a more recent kernel? |
21:06.35 | DocScrutinizer05 | prolly needs backport of the ERASE (or TRIM) ioctl command to mmc_core |
21:06.54 | DocScrutinizer05 | or a more recent kernel, freemangordon coult test it |
21:06.59 | DocScrutinizer05 | could* |
21:07.54 | DocScrutinizer05 | http://lxr.free-electrons.com/source/drivers/mmc/core/core.c#L2198 |
21:09.56 | DocScrutinizer05 | (wildly guessing there, no kernel developer) |
21:11.11 | DocScrutinizer05 | modinfo mmc_core |
21:13.26 | DocScrutinizer05 | objdump -t /lib/modules/2.6.28-omap1/mmc_core.ko dunno |
21:37.39 | DocScrutinizer05 | freemangordon: could you test fstrim on emmc? |
21:38.35 | DocScrutinizer05 | (make sure eMMC volume isn't mounted -o discard) |
21:39.27 | kerio | why shouldn't it |
21:39.45 | kerio | fstrim should still work, right |
21:39.50 | DocScrutinizer05 | otherwise we have unsolicited TRIM in between |
21:40.28 | DocScrutinizer05 | so any such test would be rather meaningless with -o discard, no? |
21:41.12 | kerio | oh, performance tests |
21:41.16 | kerio | yeah, if it worked |
21:44.02 | DocScrutinizer05 | <kerio> write random data over the whole thing then write random data again, measuring the speed then ERASE [rm -r *; fstrim] everything then write random data again and measure the speed |
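
The test procedure quoted above could be sketched roughly as follows. This is only an illustration in Python, not something run in the channel: `timed_write`, `run_procedure` and the `erase` callback are hypothetical names, and reusing one pre-generated random chunk assumes the eMMC controller does not deduplicate data. The erase step is left to the caller (for example deleting the files and running fstrim on the mount point, as suggested in the log).

```python
import os
import time


def timed_write(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of random data to path; return throughput in MiB/s."""
    # Generate the random chunk before starting the clock, so a slow
    # random source does not distort the measurement.
    chunk = os.urandom(chunk_bytes)
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(len(chunk), total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # make sure the data really reached the device
    elapsed = time.monotonic() - start
    return written / (1 << 20) / max(elapsed, 1e-9)


def run_procedure(path, total_bytes, erase, chunk_bytes=1 << 20):
    """Fill, rewrite (timed), erase, rewrite (timed); return both speeds."""
    timed_write(path, total_bytes, chunk_bytes)           # fill once
    before = timed_write(path, total_bytes, chunk_bytes)  # baseline rewrite
    erase()                                               # caller-supplied ERASE/TRIM step
    after = timed_write(path, total_bytes, chunk_bytes)   # rewrite after erase
    return before, after
```

If the ERASE/TRIM path does anything useful, `after` should come out noticeably higher than `before`; similar numbers would support kerio's suspicion that the command is accepted but has no effect. As noted in the log, the volume should not be mounted with -o discard while running this.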
21:44.20 | *** join/#maemo-ssu trx (ns-team@devbin/founder/trx) |
21:53.33 | *** join/#maemo-ssu DrCode (~DrCode@5.28.134.3) |
21:57.10 | *** join/#maemo-ssu handaxe (~chatter@c83-248-22-153.bredband.comhem.se) |
22:12.14 | *** join/#maemo-ssu freemangordon (~ivo@46.249.74.23) |
22:33.27 | ShadowJK | I have the impression that there isn't all that much sophistication that can be squeezed into emmc, that trim is mostly a NOOP unless you give it a full 8MB block properly aligned that it can erase |
22:33.50 | ShadowJK | Or however big it gets reported as in /sys/block/.../preferred_erase_size or something like that |
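
If ShadowJK's impression is right, only erase-size-aligned, full-size chunks of a discard request would have any effect. A minimal Python sketch of that alignment arithmetic follows; `read_erase_size` and `aligned_erase_range` are hypothetical helpers, and the sysfs path is deliberately left to the caller because the log does not spell it out.

```python
def read_erase_size(sysfs_path):
    """Read an erase size in bytes from a sysfs attribute file given by the caller."""
    with open(sysfs_path) as f:
        return int(f.read().strip())


def aligned_erase_range(offset, length, erase_size):
    """Return (start, length) of the largest erase_size-aligned span that lies
    fully inside [offset, offset + length), or None if no whole unit fits."""
    if erase_size <= 0:
        raise ValueError("erase_size must be positive")
    start = -(-offset // erase_size) * erase_size        # round offset up
    end = (offset + length) // erase_size * erase_size   # round end down
    if end <= start:
        return None
    return start, end - start
```

With an 8 MiB erase size, a request for 8 MiB starting 4 KiB into the device contains no complete erase unit, so under this assumption it would be a NOOP; a 16 MiB request at the same offset would cover exactly one unit.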
22:40.36 | DocScrutinizer05 | ShadowJK: TRIM is not about erase |
22:42.25 | DocScrutinizer05 | https://www.youtube.com/watch?v=x6lqYU4j7no |
22:43.54 | DocScrutinizer05 | when controller copies an erase page to change one block in it, it can leave out resp skip copying of the blocks tagged as TRIMed |
22:44.16 | DocScrutinizer05 | so those are fresh unused blocks on the new page, ready to take new data |
22:46.06 | DocScrutinizer05 | worst case when all blocks been used to write some (possibly already obsolete) data to them, each write of one block (to overwrite the obsolete old content) involves copy of one complete erase page just to replace that one block |
22:47.05 | DocScrutinizer05 | if all the blocks of the page been tagged as TRIMed, the copy would result in just one used and many free blocks in the new erase page |
22:49.18 | DocScrutinizer05 | when you fill the complete MMC with one file and then delete that file on fs level, subsequent writes to the device to fill it again completely with data would either cause $number-of-blocks page copies without TRIM, or only $number-of-erasepages copies with TRIM |
22:51.27 | DocScrutinizer05 | to accomplish that on controller level, you need just one bit per block in metadata |
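
The page-copy argument above can be illustrated with a toy model in Python. It assumes a deliberately naive controller with three per-block states (valid, trimmed, free) and no write buffering or remapping; the names and the model itself are illustrative assumptions, not a description of any real eMMC firmware.

```python
VALID, TRIMMED, FREE = "valid", "trimmed", "free"


def count_page_copies(num_pages, blocks_per_page, trimmed_first):
    """Count erase-page copies needed to rewrite every block of a full device.

    trimmed_first=False: every block still counts as valid data (no TRIM).
    trimmed_first=True:  every block was tagged as trimmed before rewriting.
    """
    initial = TRIMMED if trimmed_first else VALID
    pages = [[initial] * blocks_per_page for _ in range(num_pages)]
    copies = 0
    for page in pages:
        for i in range(blocks_per_page):
            if page[i] != FREE:
                # Target block is occupied: copy the page to a fresh one,
                # skipping blocks tagged as trimmed so they end up free.
                copies += 1
                for j, state in enumerate(page):
                    if state == TRIMMED:
                        page[j] = FREE
            page[i] = VALID  # the newly written block now holds valid data
    return copies
```

In this model, 4 pages of 8 blocks need 32 copies without TRIM and 4 copies with it, which matches the "$number-of-blocks versus $number-of-erasepages" estimate in the log.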
22:58.46 | kerio | that's how things should go |
22:59.00 | kerio | on the other hand, hardware manufacturers will likely do the absolute bare minimum for anything |
22:59.20 | kerio | i mean |
22:59.34 | kerio | actual SSDs that *do* advertise ATA TRIM support actually fuck it up |
22:59.38 | kerio | because of firmware bugs |
23:00.13 | DocScrutinizer05 | that's a completely different story |
23:00.40 | kerio | do you really expect a MMC firmware to handle a barely used feature correctly and in a way that enhances performance |
23:01.03 | kerio | maybe you can ask about it for the neo900 |
23:02.04 | DocScrutinizer05 | hardware manufacturers try to create as good a product as possible from a given amount of resources. A two bits per block used to tag free blocks with either 00 or 11 while used blocks are 01 doesn't cost them anything and will provide a selling point in datasheet |
23:02.21 | DocScrutinizer05 | barely used is nonsense |
23:02.34 | kerio | well, is it a point in the n900 emmc datasheet? |
23:02.39 | DocScrutinizer05 | obviously all android phones use that |
23:02.46 | kerio | i mean, i'd pay more for it |
23:02.58 | kerio | but nokia probably didn't |
23:03.14 | DocScrutinizer05 | the point is you won't have to pay more for it |
23:03.56 | DocScrutinizer05 | it's a mere one-shot effort to implement it in controller firmware |
23:04.08 | DocScrutinizer05 | so the cost per chip ~= zilch |
23:08.40 | DocScrutinizer05 | and microsoft obviously even specifies a max duration a single block write may take, or something along that line, which is only achievable with proper TRIM support |
23:12.47 | DocScrutinizer05 | the datasheet for eMMC in N900 says it's >>Full compliance with JEDEC/MMCA Ver. 4.3<<, so all you have to do is to find the specs JEDEC only publishes for registered users |
23:23.26 | Pali | normal trim command cannot be sent in queue for ATA disks |
23:23.45 | Pali | so before sending trim, you need to wait until queue of commands is empty |
23:24.10 | Pali | and so using trim can slow down read/write operations of disks |
23:25.54 | Pali | yes, there is also queued trim ATA command, but it is not supported by Microsoft and Apple systems |
23:26.10 | Pali | and so if something advertises that it supports it, it is buggy |
23:26.31 | Pali | before playing with discard on linux, look at this loooong table: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c#n4270 |
23:31.15 | kerio | samsung controller botched queued trim => queued trim is broken for every SSD forever |
23:31.18 | kerio | seems good |
23:31.20 | DocScrutinizer05 | I'm not interested in discard. TRIM however will be mad useful |
23:32.08 | DocScrutinizer05 | -o discard is arguably not the best way to do TRIM anyway |
23:32.19 | Pali | discard is just linux API for trim for FS |
23:32.39 | kerio | he wants to use trim or something like it for the swap partition |
23:32.52 | DocScrutinizer05 | please read about batch trim vs online trim |
23:32.54 | kerio | (i'm not entirely sure it's a thing on linux tbh) |
23:33.24 | DocScrutinizer05 | refer to fstrim |
23:33.29 | DocScrutinizer05 | for example |
23:33.43 | Pali | discard is good idea, but only useful when all layers supports queued trim && queued trim is implemented correctly in FW |
23:33.52 | kerio | i'm pretty sure that both me and Pali understand the difference, doc |
23:34.13 | DocScrutinizer05 | no, queued trim is only needed for -o discard aka online trim |
23:34.17 | kerio | Pali: the real best way to "do TRIM" is to aggressively reuse sectors, anyway |
23:34.31 | DocScrutinizer05 | huh? |
23:34.33 | kerio | so you don't need separate commands except when you absolutely have to |
23:34.51 | DocScrutinizer05 | sorry that's absolute nonsense |
23:35.07 | kerio | DocScrutinizer05: on a SSD, TRIM means "i don't need this LBA address anymore" |
23:35.15 | DocScrutinizer05 | the whole point about trim is that you _cannot_ 'reuse sectors' |
23:35.29 | kerio | wut |
23:35.56 | kerio | the controller will remap your writes all over the place anyway |
23:36.11 | DocScrutinizer05 | each "reuse sector" means you need to do a erase page copy |
23:36.21 | kerio | ...no it doesn't |
23:36.25 | kerio | unless there's no more space |
23:36.46 | DocScrutinizer05 | which is exactly what TRIM accomplishes: free space |
23:37.08 | kerio | yes, but if you're deleting a file and creating a new one |
23:37.16 | kerio | you can just put the new one on the same logical address of the old one |
23:37.23 | DocScrutinizer05 | so what? |
23:37.30 | kerio | and you'll have the same effect without having to issue a separate command |
23:37.35 | DocScrutinizer05 | no you can't use the same physical address |
23:37.57 | kerio | yes |
23:38.02 | kerio | which is why i said logical address |
23:38.24 | DocScrutinizer05 | please, read e.g. http://www.thessdreview.com/daily-news/latest-buzz/garbage-collection-and-trim-in-ssds-explained-an-ssd-primer/ |
23:39.07 | DocScrutinizer05 | you using same logical address means the complete physical erase page needs to get copied to change your one sector you write to |
23:39.31 | Pali | anyway, it is not better to have direct access to NAND erase blocks and use e.g. ubifs on NAND directly? |
23:39.56 | kerio | DocScrutinizer05: that's for the ssd controller to decide |
23:39.57 | DocScrutinizer05 | so "aggressively reuse sectors" is meaningless at best, worst case more likely |
23:40.16 | DocScrutinizer05 | Pali: we talk about eMMC |
23:40.21 | DocScrutinizer05 | not NAND aka mtd |
23:40.36 | kerio | Pali: in theory, sure |
23:40.52 | kerio | in practice, separation of concerns has proven to be more successful |
23:41.23 | DocScrutinizer05 | actually ubifs implements pretty much exactly the same scheme on application processor which the controller of emmc uses for TRIM and wear leveling |
23:41.58 | kerio | i'd trust a SSD plus ZFS over UBIFS on raw flash, if only because the tools are better |
23:42.04 | Pali | is not eMMC some flash or nand memory with own software on it? |
23:42.24 | DocScrutinizer05 | on MMC your only way to have some control over page erases is to use ERASE/TRIM |
23:42.33 | kerio | yeah, and in theory more control should yield better results |
23:42.42 | kerio | which is the same argument for software raid over hardware raid |
23:43.09 | kerio | however that hasn't become the case in modern computer hardware |
23:43.51 | kerio | probably because SSD controllers that take the raw flash and turn it into a perfect block device are Good Enough |
23:49.57 | DocScrutinizer05 | only as long as they can keep the erase pages for all concurrent write(pointer)s in buffer RAM |
23:51.22 | DocScrutinizer05 | as soon as the buffer RAM gets filled they need to write back one erase page sized chunk of data to make space for reading in another page so the next sector/block write can modify it |
23:52.46 | DocScrutinizer05 | and depending on several other system parameters you don't want to keep large amounts of dirty buffers all the time, since... powerfail |
23:53.51 | DocScrutinizer05 | I guess some SSDs even have their own battery to write back dirty buffers on powerfail (incl regular power down powerfail) |
23:55.41 | DocScrutinizer05 | generally speaking you have little to no problems with SSD and TRIM and performance impact therefrom as long as you do a single sequential write, since all controllers can keep a single erasepage in RAM buffer |
23:56.36 | DocScrutinizer05 | and they usually won't write it back until another erase page gets accessed or a certain timeout expired between interface write() commands |