IRC log for #maemo-ssu on 20160709

00:17.29*** join/#maemo-ssu drrz (~drrrrz@104.200.151.59)
00:20.57*** join/#maemo-ssu drrrz (~drrrrz@tx-71-51-46-104.dhcp.embarqhsd.net)
01:18.18*** join/#maemo-ssu freemangordon (~ivo@46.249.74.23)
04:00.51*** join/#maemo-ssu DocScrutinizer05 (~saturn@openmoko/engineers/joerg)
05:13.04*** join/#maemo-ssu chainsawbike (~chainsawb@unaffiliated/chainsawbike)
09:51.33*** join/#maemo-ssu Pali (~pali@Maemo/community/contributor/Pali)
10:30.35*** join/#maemo-ssu peetah (~peetah@cha92-9-82-236-202-86.fbx.proxad.net)
11:30.53*** join/#maemo-ssu dafox (~dafox@2a02:a448:c25a:0:decf:5ea9:e960:80dc)
12:19.19*** join/#maemo-ssu macmaN (~chezburge@90.190.182.21)
15:17.46*** join/#maemo-ssu APic (apic@apic.name)
15:21.28*** join/#maemo-ssu APic (apic@apic.name)
15:23.45*** join/#maemo-ssu APic (apic@apic.name)
19:19.25*** join/#maemo-ssu handaxe (~chatter@c83-248-22-153.bredband.comhem.se)
19:32.15DocScrutinizer05RFC: backport TRIM ioctl cmd support to MMC driver
19:33.17DocScrutinizer05rationale: particularly for swap volume a TRIM will significantly improve performance
19:33.40kerioi don't think TRIM is a thing, for microSDs
19:34.02DocScrutinizer05I'm talking about mmc
19:34.18kerioisn't it basically just a big SD card anyway
19:34.25DocScrutinizer05though TRIM even exists in ATA
19:34.35keriothe issue is that the controller needs to understand it
19:34.49DocScrutinizer05well, that's why TRIM got standardized
19:34.54kerioyeah but
19:35.05kerioi don't think the one in the N900 understands it
19:35.13DocScrutinizer05subject to evaluation: can *N900* eMMC understand TRIM
19:35.52keriocan the neo900 eMMC understand TRIM?
19:37.06DocScrutinizer05https://en.wikipedia.org/wiki/Trim_(computing)#SD.2FMMC
19:37.30keriook so it's still a thing
19:48.07DocScrutinizer05http://datasheet.octopart.com/THGBM1G8D8EBAI2-Toshiba-datasheet-11973698.pdf
19:48.52DocScrutinizer05Full compliance with JEDEC/MMCA Ver. 4.3
19:54.36DocScrutinizer05https://en.wikipedia.org/wiki/MultiMediaCard#Open_standard
19:55.32kerioaww, 4.5 is the one with the proper TRIM
19:56.47*** join/#maemo-ssu handaxe (~chatter@c83-248-22-153.bredband.comhem.se)
20:00.51DocScrutinizer05yes, with even better TRIM, but 4.4 and prolly 4.3 already support something similar
20:02.59DocScrutinizer05>>The MultiMediaCard and SD ERASE (CMD38) command provides similar functionality to the ATA TRIM command, although it requires that erased blocks be overwritten with either zeroes or ones<< https://en.wikipedia.org/wiki/Trim_(computing)#SD.2FMMC
20:05.01kerioyeah but in addition to supporting the command, the controller should also know how to use it in a smart way, shouldn't it
20:05.31kerioand since it's likely not very used, we'd need some testing done on it
20:13.13DocScrutinizer05ohmy, you really think a command is implemented in a chip in a way so it returns no error and does nothing else?
20:13.44DocScrutinizer05I doubt JEDEC would agree on such implementation
20:16.44*** join/#maemo-ssu DrCode (~DrCode@5.28.134.3)
20:17.03kerioDocScrutinizer05: turn ERASE into "write 0"
20:17.09keriothrough the normal channels
20:17.14DocScrutinizer05aha
20:17.18DocScrutinizer05and then?
20:17.25kerioand then you have an implementation of ERASE
20:17.28keriothat doesn't help at all
20:17.31DocScrutinizer05nonsense
20:17.33keriobut it's valid
20:18.12kerioyou're underestimating the hardware manufacturers' cheapness
20:18.54DocScrutinizer05there's clearly a difference on all levels of device management between a released unused (erased/TRIMed) block and a block that holds valid and valuable data even when that data consists of all zeroes
20:26.59DocScrutinizer05>>There are different types of TRIM defined by SATA ...<<   Non-deterministic TRIM, Deterministic TRIM (DRAT), Deterministic Read Zero after TRIM (RZAT). I guess >> MultiMediaCard and SD ERASE (CMD38) command provides similar functionality to the ATA TRIM<< RZAT, that's why
20:28.53DocScrutinizer05prolly the controller is even smarter than you'd hope, by actually sensing the all 0 write and transforming it into a special shortcut that doesn't need any real write
20:29.31DocScrutinizer05my SSD does for sure, writing all zero files I get write speeds at 600MB/s
20:29.56DocScrutinizer05sustainable
20:33.21kerioDocScrutinizer05: writing zeroes on my ssd yields 650MB/s
20:33.25keriowhich is the normal write speed
20:35.02DocScrutinizer05sure, please tell me which SSD you got
20:36.13DocScrutinizer05a SSD that can do a sustained write speed of 650MB/s on random data - that thing must be a monster
20:37.25DocScrutinizer05It also clearly proves that we need better interfaces since 650MB/s is the theoretical max of a 6G SATA
20:45.37DocScrutinizer05well, ok, fastest SSDs seem to be near 512MB/s
20:46.50DocScrutinizer05http://ssd.userbenchmark.com/Patriot-Ignite-960GB/Rating/3575
20:54.53DocScrutinizer05nah, we already got better interfaces: SATA Express 8 Gbit/s and 16 Gbit/s
20:57.28keriothis is a PCIe SSD, yeah
20:58.28keriowhat are you talking about
20:59.09DocScrutinizer05TRIM was the topic
20:59.22keriothe intel 750 gets up to 1200MB/s of sequential writes
21:00.29kerioand the ludicrous intel DC P3608 4TB gets 3GB/s of sequential writes
21:00.51kerio(PCIe 3.0 x8)
21:01.06DocScrutinizer05highly irrelevant
21:01.36kerioit's only 8959.99 USD on amazon.com
21:03.12DocScrutinizer05the point is that writing zeroes isn't a working replacement for TRIM
21:03.41kerioindeed
21:04.12DocScrutinizer05the controller needs to receive a hint that the block doesn't contain valid data
21:04.14kerioit should be "easy" to test if that MMC ERASE command works
21:04.55keriowrite random data over the whole thing
21:05.02keriothen write random data again, measuring the speed
21:05.06keriothen ERASE everything
21:05.11keriothen write random data again and measure the speed
21:05.12DocScrutinizer05exactly
21:05.45keriodo we have a debug utility for the eMMC?
21:05.47DocScrutinizer05also exactly what I had in mind, for swap
21:05.51DocScrutinizer05no
21:05.55DocScrutinizer05afaik
21:05.58keriowould it need a more recent kernel?
21:06.35DocScrutinizer05prolly needs backport of the ERASE (or TRIM) ioctl command to mmc_core
21:06.54DocScrutinizer05or a more recent kernel, freemangordon could test it
21:07.54DocScrutinizer05http://lxr.free-electrons.com/source/drivers/mmc/core/core.c#L2198
21:09.56DocScrutinizer05(wildly guessing there, no kernel developer)
21:11.11DocScrutinizer05modinfo mmc_core
21:13.26DocScrutinizer05objdump -t /lib/modules/2.6.28-omap1/mmc_core.ko  dunno
21:37.39DocScrutinizer05freemangordon: could you test fstrim on emmc?
21:38.35DocScrutinizer05(make sure eMMC volume isn't mounted -o discard)
21:39.27keriowhy shouldn't it
21:39.45keriofstrim should still work, right
21:39.50DocScrutinizer05otherwise we have unsolicited TRIM in between
21:40.28DocScrutinizer05so any such test would be rather meaningless with -o discard, no?
21:41.12keriooh, performance tests
21:41.16kerioyeah, if it worked
21:44.02DocScrutinizer05<kerio> write random data over the whole thing  then write random data again, measuring the speed  then ERASE [rm -r *; fstrim] everything  then write random data again and measure the speed
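A rough Python sketch of the benchmark kerio describes, with the ERASE step done as DocScrutinizer05 suggests (rm, then fstrim). Assumptions: it writes through a mounted, non-copy-on-write filesystem (e.g. ext3) rather than the raw eMMC, the volume is not mounted -o discard, util-linux fstrim is installed, and the kernel supports the FITRIM ioctl that fstrim relies on (as far as I know that appeared in 2.6.37, so not the stock 2.6.28 N900 kernel). The file name trim-test.bin and all function names are hypothetical. The random buffer is generated once, outside the timed section, so a slow CPU doesn't cap the figure; that assumes the controller does not compress or deduplicate.

```python
from __future__ import annotations

import os
import subprocess
import time

def timed_random_write(path: str, total_bytes: int, chunk: int = 1 << 20) -> float:
    """Write total_bytes of random data to path and return throughput in MB/s."""
    # "r+b" does not truncate, so a second pass overwrites the file's existing
    # blocks in place instead of freeing and reallocating them.
    mode = "r+b" if os.path.exists(path) else "wb"
    buf = os.urandom(chunk)  # generated before the clock starts
    written = 0
    start = time.monotonic()
    with open(path, mode) as f:
        while written < total_bytes:
            n = min(chunk, total_bytes - written)
            f.write(buf[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # don't stop the clock while data sits in the page cache
    elapsed = max(time.monotonic() - start, 1e-9)
    return written / elapsed / 1e6

def run_fstrim(mountpoint: str) -> None:
    """Batch TRIM of all free space via util-linux fstrim (needs root)."""
    subprocess.run(["fstrim", "-v", mountpoint], check=True)

def trim_benchmark(mountpoint: str, total_bytes: int) -> tuple[float, float]:
    """kerio's procedure; returns (MB/s before TRIM, MB/s after TRIM)."""
    path = os.path.join(mountpoint, "trim-test.bin")  # hypothetical file name
    timed_random_write(path, total_bytes)           # 1. fill with random data
    before = timed_random_write(path, total_bytes)  # 2. overwrite, measure
    os.remove(path)                                 # 3. rm ...
    run_fstrim(mountpoint)                          #    ... then fstrim
    after = timed_random_write(path, total_bytes)   # 4. write again, measure
    return before, after
```

total_bytes should be close to the volume's free space so the refill actually lands on blocks the controller has already seen; a clear gap between the two figures would suggest the TRIM reached the controller.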
21:44.20*** join/#maemo-ssu trx (ns-team@devbin/founder/trx)
21:53.33*** join/#maemo-ssu DrCode (~DrCode@5.28.134.3)
21:57.10*** join/#maemo-ssu handaxe (~chatter@c83-248-22-153.bredband.comhem.se)
22:12.14*** join/#maemo-ssu freemangordon (~ivo@46.249.74.23)
22:33.27ShadowJKI have the impression that there isn't all that much sophistication that can be squeezed into emmc, that trim is mostly a NOOP unless you give it a full 8MB block properly aligned that it can erase
22:33.50ShadowJKOr however big it gets reported as in /sys/block/.../preferred_erase_size or something like that
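If ShadowJK's impression is right (unverified here), only the part of a discard that covers whole, aligned erase units does anything. A minimal Python sketch of trimming a byte range down to that part, plus issuing it with the Linux BLKDISCARD ioctl. Assumptions: the sysfs attribute /sys/block/mmcblk0/device/preferred_erase_size (reported in bytes) only exists on kernels with MMC erase support, newer than the stock N900 one; the helper names are hypothetical; discard() needs root and destroys the data in that range.

```python
from __future__ import annotations

import fcntl
import os
import struct

BLKDISCARD = 0x1277  # _IO(0x12, 119) in <linux/fs.h>; arg is uint64_t[2] = {start, len}

def read_preferred_erase_size(dev: str = "mmcblk0") -> int:
    """Preferred erase size in bytes, as exported by newer mmc_core in sysfs."""
    with open(f"/sys/block/{dev}/device/preferred_erase_size") as f:
        return int(f.read().strip())

def aligned_range(start: int, length: int, erase_size: int) -> tuple[int, int]:
    """Shrink the byte range [start, start + length) to whole, aligned erase units.
    Returns (aligned_start, aligned_length); length is 0 if no full unit fits."""
    first = -(-start // erase_size) * erase_size       # round start up
    end = (start + length) // erase_size * erase_size  # round end down
    return first, max(0, end - first)

def discard(dev_path: str, start: int, length: int) -> None:
    """Issue BLKDISCARD on a block device. Needs root and DESTROYS that range."""
    fd = os.open(dev_path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, BLKDISCARD, struct.pack("QQ", start, length))
    finally:
        os.close(fd)
```

With the 8 MB units mentioned above, a 20 MiB discard starting at 1 MiB keeps only the 8 MiB unit at 8 MiB, and a 10 MiB discard starting at 1 MiB covers no whole unit at all.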
22:40.36DocScrutinizer05ShadowJK: TRIM is not about erase
22:42.25DocScrutinizer05https://www.youtube.com/watch?v=x6lqYU4j7no
22:43.54DocScrutinizer05when controller copies an erase page to change one block in it, it can leave out, i.e. skip copying, the blocks tagged as TRIMed
22:44.16DocScrutinizer05so those are fresh unused blocks on the new page, ready to take new data
22:46.06DocScrutinizer05worst case when all blocks been used to write some (possibly already obsolete) data to them, each write of one block (to overwrite the obsolete old content) involves copy of one complete erase page just to replace that one block
22:47.05DocScrutinizer05if all the blocks of the page been tagged as TRIMed, the copy would result in just one used and many free blocks in the new erase page
22:49.18DocScrutinizer05when you fill the complete MMC with one file and then delete that file on fs level, subsequent writes to the device to fill it again completely with data would either cause $number-of-blocks page copies without TRIM, or only $number-of-erasepages copies with TRIM
22:51.27DocScrutinizer05to accomplish that on controller level, you need just one bit per block in metadata
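A toy Python model of the erase-page argument above. It is a simplification, not how any particular controller works: logical blocks map 1:1 onto fixed erase pages (no remapping, no spare pool), a block can only be programmed into an erased slot, and anything else forces a copy of the whole erase page into a fresh one. TRIMed slots are dropped during that copy, which is what the per-block metadata tag buys. The three-state Slot enum stands in for the one or two bits per block mentioned in the channel; names and sizes are made up.

```python
from enum import Enum

class Slot(Enum):
    FREE = 0     # erased: can be programmed directly
    VALID = 1    # data the controller must preserve when copying the page
    TRIMMED = 2  # host said "not needed any more": dropped when copying

def refill_copies(pages: int, blocks_per_page: int, trim: bool) -> int:
    """Device is full of one file; the file is deleted (and TRIMed if trim=True);
    then every block is written again in order. Returns erase-page copies."""
    # Without TRIM the controller never learns the file is gone: data stays VALID.
    fill = Slot.TRIMMED if trim else Slot.VALID
    state = [[fill] * blocks_per_page for _ in range(pages)]
    copies = 0
    for p in range(pages):
        for b in range(blocks_per_page):
            page = state[p]
            if page[b] is not Slot.FREE:
                # Can't program an occupied slot in place: copy the page to an
                # erased one, carrying over VALID blocks only.
                page = [s if s is Slot.VALID else Slot.FREE for s in page]
                state[p] = page
                copies += 1
            page[b] = Slot.VALID  # the new data lands in a free slot
    return copies
```

With 4 pages of 64 blocks, refill_copies(4, 64, trim=False) needs 256 copies (one per block) while trim=True needs 4 (one per erase page), i.e. $number-of-blocks vs $number-of-erasepages.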
22:58.46keriothat's how things should go
22:59.00kerioon the other hand, hardware manufacturers will likely do the absolute bare minimum for anything
22:59.20kerioi mean
22:59.34kerioactual SSDs that *do* advertise ATA TRIM support actually fuck it up
22:59.38keriobecause of firmware bugs
23:00.13DocScrutinizer05that's a completely different story
23:00.40keriodo you really expect a MMC firmware to handle a barely used feature correctly and in a way that enhances performance
23:01.03keriomaybe you can ask about it for the neo900
23:02.04DocScrutinizer05hardware manufacturers try to create as good a product as possible from a given amount of resources. Two bits per block, used to tag free blocks with either 00 or 11 while used blocks are 01, don't cost them anything and will provide a selling point in the datasheet
23:02.21DocScrutinizer05barely used is nonsense
23:02.34keriowell, is it a point in the n900 emmc datasheet?
23:02.39DocScrutinizer05obviously all android phones use that
23:02.46kerioi mean, i'd pay more for it
23:02.58keriobut nokia probably didn't
23:03.14DocScrutinizer05the point is you won't have to pay more for it
23:03.56DocScrutinizer05it's a mere one-shot effort to implement it in controller firmware
23:04.08DocScrutinizer05so the cost per chip ~= zilch
23:08.40DocScrutinizer05and microsoft obviously even specifies a max duration a single block write may take, or something along that line, which is only achievable with proper TRIM support
23:12.47DocScrutinizer05the datasheet for eMMC in N900 says it's >>Full compliance with JEDEC/MMCA Ver. 4.3<<, so all you have to do is to find the specs JEDEC only publishes for registered users
23:23.26Palinormal trim command cannot be sent in queue for ATA disks
23:23.45Paliso before sending trim, you need to wait until the command queue is empty
23:24.10Paliand so using trim can slow down read/write operations of disks
23:25.54Paliyes, there is also a queued trim ATA command, but it is not supported by Microsoft and Apple systems
23:26.10Paliand so if something advertises that it supports it, it is buggy
23:26.31Palibefore playing with discard on linux, look at this loooong table: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c#n4270
23:31.15keriosamsung controller botched queued trim => queued trim is broken for every SSD forever
23:31.18kerioseems good
23:31.20DocScrutinizer05I'm not interested in discard. TRIM however will be mad useful
23:32.08DocScrutinizer05-o discard is arguably not the best way to do TRIM anyway
23:32.19Palidiscard is just linux API for trim for FS
23:32.39keriohe wants to use trim or something like it for the swap partition
23:32.52DocScrutinizer05please read about batch trim vs online trim
23:32.54kerio(i'm not entirely sure it's a thing on linux tbh)
23:33.24DocScrutinizer05refer to fstrim
23:33.29DocScrutinizer05for example
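On batch vs online TRIM: fstrim is batch TRIM, a single FITRIM ioctl on a mounted filesystem asking it to discard all currently free space at a time you choose, whereas -o discard sends discards as blocks get freed. A minimal Python sketch of that ioctl, assuming Linux on ARM or x86 (the _IOWR encoding differs on a few architectures such as PowerPC and MIPS), root, and a filesystem that implements FITRIM; the helper names are hypothetical.

```python
import fcntl
import os
import struct

def _iowr(type_: int, nr: int, size: int) -> int:
    """Generic Linux _IOWR() encoding: dir(read|write)<<30 | size<<16 | type<<8 | nr."""
    return (3 << 30) | (size << 16) | (type_ << 8) | nr

FSTRIM_RANGE = struct.Struct("QQQ")  # struct fstrim_range { start, len, minlen }
FITRIM = _iowr(ord("X"), 121, FSTRIM_RANGE.size)

def batch_trim(mountpoint: str, minlen: int = 0) -> int:
    """Ask the filesystem to discard all of its free space; return range.len afterwards."""
    # Whole filesystem: start 0, len = max u64; free extents shorter than minlen are skipped.
    buf = bytearray(FSTRIM_RANGE.pack(0, 2**64 - 1, minlen))
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FITRIM, buf)  # mutable buffer: the kernel updates it in place
    finally:
        os.close(fd)
    _start, trimmed, _minlen = FSTRIM_RANGE.unpack(buf)
    return trimmed  # filesystems such as ext4 report the number of bytes trimmed here
```

minlen plays the same role as fstrim's --minimum option.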
23:33.43Palidiscard is a good idea, but only useful when all layers support queued trim && queued trim is implemented correctly in FW
23:33.52kerioi'm pretty sure that both me and Pali understand the difference, doc
23:34.13DocScrutinizer05no, queued trim is only needed for -o discard aka online trim
23:34.17kerioPali: the real best way to "do TRIM" is to aggressively reuse sectors, anyway
23:34.31DocScrutinizer05huh?
23:34.33kerioso you don't need separate commands except when you absolutely have to
23:34.51DocScrutinizer05sorry that's absolute nonsense
23:35.07kerioDocScrutinizer05: on a SSD, TRIM means "i don't need this LBA address anymore"
23:35.15DocScrutinizer05the whole point about trim is that you _cannot_ 'reuse sectors'
23:35.29keriowut
23:35.56keriothe controller will remap your writes all over the place anyway
23:36.11DocScrutinizer05each "reuse sector" means you need to do a erase page copy
23:36.21kerio...no it doesn't
23:36.25keriounless there's no more space
23:36.46DocScrutinizer05which is exactly what TRIM accomplishes: free space
23:37.08kerioyes, but if you're deleting a file and creating a new one
23:37.16kerioyou can just put the new one on the same logical address of the old one
23:37.23DocScrutinizer05so what?
23:37.30kerioand you'll have the same effect without having to issue a separate command
23:37.35DocScrutinizer05no you can't use the same physical address
23:37.57kerioyes
23:38.02keriowhich is why i said logical address
23:38.24DocScrutinizer05please, read e.g. http://www.thessdreview.com/daily-news/latest-buzz/garbage-collection-and-trim-in-ssds-explained-an-ssd-primer/
23:39.07DocScrutinizer05you using same logical address means the complete physical erase page needs to get copied to change your one sector you write to
23:39.31Palianyway, it is not better to have direct access to NAND erase blocks and use e.g. ubifs on NAND directly?
23:39.56kerioDocScrutinizer05: that's for the ssd controller to decide
23:39.57DocScrutinizer05so "aggressively reuse sectors" is meaningless at best, worst case more likely
23:40.16DocScrutinizer05Pali: we talk about eMMC
23:40.21DocScrutinizer05not NAND aka mtd
23:40.36kerioPali: in theory, sure
23:40.52kerioin practice, separation of concerns has proven to be more successful
23:41.23DocScrutinizer05actually ubifs implements pretty much exactly the same scheme on application processor which the controller of emmc uses for TRIM and wear leveling
23:41.58kerioi'd trust a SSD plus ZFS over UBIFS on raw flash, if only because the tools are better
23:42.04Paliis not eMMC some flash or nand memory with own software on it?
23:42.24DocScrutinizer05on MMC your only way to have some control over page erases is to use ERASE/TRIM
23:42.33kerioyeah, and in theory more control should yield better results
23:42.42keriowhich is the same argument for software raid over hardware raid
23:43.09keriohowever that hasn't become the case in modern computer hardware
23:43.51kerioprobably because SSD controllers that take the raw flash and turn it into a perfect block device are Good Enough
23:49.57DocScrutinizer05only as long as they can keep the erase pages for all concurrent write(pointer)s in buffer RAM
23:51.22DocScrutinizer05as soon as the buffer RAM gets filled they need to write back one erase page sized chunk of data to make space for reading in another page so the next sector/block write can modify it
23:52.46DocScrutinizer05and depending on several other system parameters you don't want to keep large amounts of dirty buffers all the time, since... powerfail
23:53.51DocScrutinizer05I guess some SSDs even have their own battery to write back dirty buffers on powerfail (incl regular power down powerfail)
23:55.41DocScrutinizer05generally speaking you have little to no problems with SSD and TRIM and performance impact therefrom as long as you do a single sequential write, since all controllers can keep a single erasepage in RAM buffer
23:56.36DocScrutinizer05and they usually won't write it back until another erase page gets accessed or a certain timeout expired between interface write() commands
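A small Python model of the buffering argument above: the controller holds up to buffer_pages dirty erase pages in RAM and writes one back whenever it needs room for another. Assumptions (not from any datasheet): least-recently-used eviction, no timeout-driven flushes, and each write-back costs one full erase-page program. Names are hypothetical.

```python
from collections import OrderedDict

def page_writebacks(lbas, blocks_per_page: int, buffer_pages: int = 1) -> int:
    """Count erase-page write-backs for a stream of single-block writes."""
    cache = OrderedDict()  # dirty erase pages held in controller RAM, LRU order
    writebacks = 0
    for lba in lbas:
        page = lba // blocks_per_page
        if page in cache:
            cache.move_to_end(page)  # hit: merge into the buffered page
            continue
        if len(cache) == buffer_pages:
            cache.popitem(last=False)  # buffer full: write back least recently used page
            writebacks += 1
        cache[page] = True
    return writebacks + len(cache)  # flush whatever is still dirty at the end
```

1024 sequential block writes with 64-block pages cost 16 write-backs even with a single buffered page. The same 1024 writes interleaved across 16 pages cost 1024 write-backs with one buffered page, and drop back to 16 only when all 16 pages fit in RAM.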

Generated by irclog2html.pl Modified by Tim Riker to work with infobot.