<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Software on Chris Bracken</title>
    <link>https://chris.bracken.jp/tags/software/</link>
    <description>Recent content in Software on Chris Bracken</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <lastBuildDate>Fri, 22 May 2020 14:55:23 -0700</lastBuildDate><atom:link href="https://chris.bracken.jp/tags/software/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Thoughts on Licences</title>
      <link>https://chris.bracken.jp/2020/05/thoughts-on-licences/</link>
      <pubDate>Fri, 22 May 2020 14:55:23 -0700</pubDate>

      <guid>https://chris.bracken.jp/2020/05/thoughts-on-licences/</guid>
      <description><p>Software licences are probably the single most boring aspect of software
development, but it&rsquo;s important to carefully consider the terms under which the
stuff I hack on is shared to ensure they&rsquo;re consistent with my values. Despite
my general dislike for all things legalistic, the most unambiguous way to state
those terms is through a licence. So a couple days ago, I tossed LICENSE files
into any of my public <a href="https://chris.bracken.jp/code">repos</a> that didn&rsquo;t already have one.</p>
<p>So how did I settle on which licences to apply? Jump on into the DeLorean and
let&rsquo;s set the dial back to the late 1980s.</p>
<p>It&rsquo;s 1986 and I&rsquo;ve got a 1200 baud modem wired up to a beat-up 286 with a steel
case that would easily allow it to double as a boat anchor if needed. Armed
with a dot-matrix printout of local BBSes with names like Camelot, Tommy&rsquo;s
Holiday Camp, and Forbidden Night Castle, I fire up PC-Talk. A series of
<a href="https://www.windytan.com/2012/11/the-sound-of-dialup-pictured.html">high-pitched squeals and tones</a> fill the air, then text
flashes across the screen.
I&rsquo;m online.</p>
<p>BBSes were a treasure trove of information, filled to the brim with zip archives
full of downloadable programs, source code, patches for existing programs, and
all manner of text files with names like <a href="https://insecure.org/stf/smashstack.html">Smashing The Stack For Fun And
Profit</a>. You could find everything from how to crack copy-protected
software, to details on phone phreaking, to how to make nitroglycerine from
commonly-available household items. It was through BBSes that I first downloaded
an I&rsquo;m sure <em>totally legitimate</em> copy of Borland Turbo C++ and took my first
baby steps writing <em>real</em> programs. No more BASIC for me.</p>
<p>This culture of open sharing in the online world has had a huge impact on me.
From those early experiences with BBSes to my first forays onto the Internet a
few years later, seeing people openly sharing code and patches and helping each
other solve problems over Usenet seemed almost revolutionary to me at the time.
In some ways, it still does. I feel lucky to have been a part of it from such an
early age.</p>
<p>The end result is that I try to publicly share all the work I do. So when it
came time to chuck licences on stuff, I sat down to work out a personals ad for
my ideal licence.
Aside from enjoying long walks on the beach, it should:</p>
<ol>
<li>Allow free use, modification, and distribution both of the original
work and any derived works.</li>
<li>Require that people distributing the work or any derived work
give appropriate credit.</li>
<li>Disallow suggesting that I in any way endorse any derived products
or whoever produces them.</li>
<li>Gently encourage a culture of open exchange and sharing of
information and techniques.</li>
<li>Be short, clear, and easy to understand.</li>
</ol>
<p>On the software side, there were lots of options, but the best matches in my
mind are the <a href="https://opensource.org/licenses/MIT">MIT</a> and <a href="https://opensource.org/licenses/BSD-3-Clause">BSD</a> licences. The 3-clause
&rsquo;new&rsquo; BSD licence has an advantage in that it requires written permission from
the author to use their name in any endorsement/promotion of a derived work.
That happens to be what we already use for <a href="https://github.com/flutter/flutter">work</a>.</p>
<p>On the content side, I&rsquo;ve always posted my web site&rsquo;s content under a <a href="https://creativecommons.org/licenses/by-sa/4.0/">Creative
Commons Attribution-ShareAlike</a> licence. But I don&rsquo;t believe that&rsquo;s
actually the ideal match based on my priorities. Why is it that I&rsquo;ve elected to
use a licence that requires that derived works also be licensed under the same
terms rather than under whatever terms someone feels like, so long as
acknowledgement is given?
In the end I settled on the more permissive <a href="https://creativecommons.org/licenses/by/4.0/">Creative
Commons Attribution</a> licence.</p>
<p>This feels to me a bit like the difference between <a href="https://opensource.org/licenses/BSD-3-Clause">BSD</a> and
<a href="https://opensource.org/licenses/GPL-3.0">GPL</a> terms, where the latter requires that derived works also be
GPL-licensed. This &ldquo;viral&rdquo; nature has always rubbed me the wrong way: rather
than gently promoting a culture of sharing by example, it legally <em>requires</em>
sharing under the same terms whether or not you want to.</p>
<p>Personally, I&rsquo;d like for people to do the right thing and share their work for
everyone&rsquo;s benefit not because they <em>have</em> to, but because they <em>want</em> to. If
they don&rsquo;t want to, why should my reaction be to disallow their use of my work?
Isn&rsquo;t that contrary to my stated goals of sharing as much and as broadly as
possible?</p>
<p>While I <em>hope</em> that more people share more of their work, it doesn&rsquo;t bother me
if you don&rsquo;t. If anything I&rsquo;ve written is somehow useful to you, I&rsquo;m glad. Use
your knowledge to help others and make the world a better place, and if you can
find time to do so, share a bit with the rest of us.</p>
<p>Got thoughts and opinions on licences? Fire an email my way at
<a href="mailto:chris@bracken.jp">chris@bracken.jp</a>.</p>
</description>
    </item>

    <item>
      <title>Hand-decoding an ELF binary image</title>
      <link>https://chris.bracken.jp/2018/10/decoding-an-elf-binary/</link>
      <pubDate>Wed, 31 Oct 2018 00:00:00 +0000</pubDate>

      <guid>https://chris.bracken.jp/2018/10/decoding-an-elf-binary/</guid>
      <description><p>While recovering from some dentistry the other day I figured I&rsquo;d have a go at
better understanding the ELF binary format.
What better way to do that than to
compile a small program and hand-decode the resulting binary with a hex editor
and whatever ELF format spec I could find?</p>
<h2 id="overview">Overview</h2>
<p>Below, we&rsquo;ll use <code>nasm</code> to build a small assembly Hello World program to a
64-bit ELF object file, then link that into an ELF executable with GNU <code>ld</code>.
Finally, we&rsquo;ll run the resulting object file and binary image through <code>xxd</code> and
hand-decode the resulting hex.</p>
<p>The code and instructions below work on FreeBSD 11 on x86_64 hardware. For
other operating systems, hardware, and toolchains, you&rsquo;re on your own! I&rsquo;d
imagine this should all work just fine on Linux. If I get bored one day, I may
redo this for Mach-O binaries on macOS.</p>
<h2 id="helloasm">hello.asm</h2>
<p>First we&rsquo;ll bang up a minimal Hello World program in assembly. In the <code>.data</code>
section, we add a newline-terminated string, <code>hello</code>, and its length <code>hbytes</code>. In
the program text, we set up and execute the <code>write(stdout, hello, hbytes)</code>
syscall, then set up and execute an <code>exit(0)</code> syscall.</p>
<p>Note that 64-bit FreeBSD, macOS, and Linux all use the SysV AMD64 calling
convention. For calls against the kernel interface, the syscall number is
stored in <code>rax</code> and up to six parameters are passed, in order, in <code>rdi</code>, <code>rsi</code>,
<code>rdx</code>, <code>r10</code>, <code>r8</code>, <code>r9</code>. For user calls, replace <code>r10</code> with <code>rcx</code> in this
list, and pass further arguments on the stack. In all cases, the return value
is passed through <code>rax</code>.
More details can be found in section A.2.1 of the
<a href="https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf">System V AMD64 ABI Reference</a>.</p>
<pre><code>; hello.asm

; Standard file descriptors
%define stdin       0
%define stdout      1
%define stderr      2
; FreeBSD syscall numbers
%define SYS_exit    1
%define SYS_write   4

%macro  system  1
        mov     rax, %1
        syscall
%endmacro

%macro  sys.exit    0
        system  SYS_exit
%endmacro

%macro  sys.write   0
        system  SYS_write
%endmacro

section .data
        hello   db      'Hello, World!', 0Ah
        hbytes  equ     $-hello

section .text
        global  _start
_start:
        ; write(stdout, hello, hbytes)
        mov     rdi, stdout
        mov     rsi, hello
        mov     rdx, hbytes
        sys.write

        ; exit(0)
        xor     rdi,rdi
        sys.exit
</code></pre>
<h2 id="compile-to-object-code">Compile to object code</h2>
<p>Next, we&rsquo;ll compile <code>hello.asm</code> to a 64-bit ELF object file using <code>nasm</code>:</p>
<pre><code>% nasm -f elf64 hello.asm
</code></pre>
<p>This emits <code>hello.o</code>, an 880-byte ELF-64 object file. Since we haven&rsquo;t yet run
this through the linker, addresses of global symbols (in this case, <code>hello</code>)
are not yet known and thus left with address 0x0 placeholders. We can see this
in the <code>movabs</code> instruction at offset 0x15 of the <code>.text</code> section below.</p>
<p>The relocation section (Section 6: <code>.rela.text</code>) contains an entry for each
symbolic reference that needs to be filled in by the linker. In this case
there&rsquo;s just a single entry for the symbol <code>hello</code> (which points to our hello
world string). The relocation table entry&rsquo;s <code>r_offset</code> indicates the address to
replace is at an offset of 0x7 into the section being relocated (<code>.text</code>,
identified by the <code>sh_info</code> field of the <code>.rela.text</code> section header).
Its <code>r_info</code> (0x0000000200000001) encodes a relocation type in its lower
4 bytes (0x1: <code>R_AMD64_64</code>) and the associated symbol table entry in its upper
4 bytes (0x2, which, if we look it up in the symbol table, is the section symbol
for <code>.data</code>, where <code>hello</code> lives). The <code>r_addend</code> field (0x0) specifies an additional adjustment to the
substituted symbol to be applied at link time; specifically, for
<code>R_AMD64_64</code>, the final address is computed as S + A, where S is the
substituted symbol value (in our case, the address of <code>hello</code>) and A is the
addend (in our case, 0x0).</p>
<p>Without further ado, let&rsquo;s dump the object file:</p>
<pre><code>% xxd hello.o
</code></pre>
<p>With whatever ELF64 <a href="https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/index.html">linker &amp; loader guide</a> we can find at hand,
let&rsquo;s get decoding this thing:</p>
<h3 id="elf-header">ELF Header</h3>
<pre><code>|00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000| .ELF............
|00000010: 0100 3e00 0100 0000 0000 0000 0000 0000| ..&gt;.............
|00000020: 0000 0000 0000 0000 4000 0000 0000 0000| ........@.......
|00000030: 0000 0000 4000 0000 0000 4000 0700 0300| ....@.....@.....

e_ident[EI_MAG0..EI_MAG3]  0x7f + ELF          Magic
e_ident[EI_CLASS]          0x02                64-bit
e_ident[EI_DATA]           0x01                Little-endian
e_ident[EI_VERSION]        0x01                ELF v1
e_ident[EI_OSABI]          0x00                System V
e_ident[EI_ABIVERSION]     0x00                Unused
e_ident[EI_PAD]            0x00000000000000    7 bytes unused padding
e_type                     0x0001              ET_REL
e_machine                  0x003e              x86_64
e_version                  0x00000001          Version 1
e_entry                    0x0000000000000000  Entrypoint address (none)
e_phoff                    0x0000000000000000  Program header table offset in image
e_shoff                    0x0000000000000040  Section header table offset in image
e_flags                    0x00000000          Architecture-dependent interpretation
e_ehsize                   0x0040              Size of this ELF header (64B)
e_phentsize                0x0000              Size of program header table entry
e_phnum                    0x0000              Number of program header table entries
e_shentsize                0x0040              Size of section header table entry (64B)
e_shnum                    0x0007              Number of section header table entries
e_shstrndx                 0x0003              Index of section header for .shstrtab
</code></pre>
<h3 id="section-header-table-entry-0-null">Section header table: Entry 0 (null)</h3>
<pre><code>|00000040: 0000 0000 0000 0000 0000 0000 0000 0000| ................
|00000050: 0000 0000 0000 0000 0000 0000 0000 0000| ................
|00000060: 0000 0000 0000 0000 0000 0000 0000 0000| ................
|00000070: 0000 0000 0000 0000 0000 0000 0000 0000| ................

sh_name       0x00000000          Offset into .shstrtab
sh_type       0x00000000          SHT_NULL
sh_flags      0x0000000000000000  Section attributes
sh_addr       0x0000000000000000  Virtual address of section in memory
sh_offset     0x0000000000000000  Offset of section in file image
sh_size       0x0000000000000000  Size in bytes of section in file image
sh_link       0x00000000          Section index of associated section
sh_info       0x00000000          Extra info about section
sh_addralign  0x0000000000000000  Alignment
sh_entsize    0x0000000000000000  Size in bytes of each entry
</code></pre>
<h3 id="section-header-table-entry-1-data">Section header table: Entry 1 (.data)</h3>
<pre><code>|00000080: 0100 0000 0100 0000 0300 0000 0000 0000| ................
|00000090: 0000 0000 0000 0000 0002 0000 0000 0000| ................
|000000a0: 0e00 0000 0000 0000 0000 0000 0000 0000| ................
|000000b0: 0400 0000 0000 0000 0000 0000 0000 0000| ................

sh_name       0x00000001          Offset into .shstrtab
sh_type       0x00000001          SHT_PROGBITS
sh_flags      0x0000000000000003  SHF_WRITE | SHF_ALLOC
sh_addr       0x0000000000000000  Virtual address of section in memory
sh_offset     0x0000000000000200  Offset of section in file image
sh_size       0x000000000000000e  Size in bytes of section in file image
sh_link       0x00000000          Section index of associated section
sh_info       0x00000000          Extra info about section
sh_addralign  0x0000000000000004  Alignment
sh_entsize    0x0000000000000000  Size in bytes of each entry
</code></pre>
<h3 id="section-header-table-entry-2-text">Section header table: Entry 2 (.text)</h3>
<pre><code>|000000c0: 0700 0000 0100 0000 0600 0000 0000 0000| ................
|000000d0: 0000 0000 0000 0000 1002 0000 0000 0000| ................
|000000e0: 2500 0000 0000 0000 0000 0000 0000 0000| %...............
|000000f0: 1000 0000 0000 0000 0000 0000 0000 0000| ................

sh_name       0x00000007          Offset into .shstrtab
sh_type       0x00000001          SHT_PROGBITS
sh_flags      0x0000000000000006  SHF_ALLOC | SHF_EXECINSTR
sh_addr       0x0000000000000000  Virtual address of section in memory
sh_offset     0x0000000000000210  Offset of section in file image
sh_size       0x0000000000000025  Size in bytes of section in file image
sh_link       0x00000000          Section index of associated section
sh_info       0x00000000          Extra info about section
sh_addralign  0x0000000000000001  Alignment
sh_entsize    0x0000000000000000  Size in bytes of each entry
</code></pre>
<h3 id="section-header-table-entry-3-shstrtab">Section header table: Entry 3 (.shstrtab)</h3>
<pre><code>|00000100: 0d00 0000 0300 0000 0000 0000 0000 0000| ................
|00000110: 0000 0000 0000 0000 4002 0000 0000 0000| ........@.......
|00000120: 3200 0000 0000 0000 0000 0000 0000 0000| 2...............
|00000130: 0100 0000 0000 0000 0000 0000 0000 0000| ................

sh_name       0x0000000d          Offset into .shstrtab
sh_type       0x00000003          SHT_STRTAB
sh_flags      0x0000000000000000  Section attributes
sh_addr       0x0000000000000000  Virtual address of section in memory
sh_offset     0x0000000000000240  Offset of section in file image
sh_size       0x0000000000000032  Size in bytes of section in file image
sh_link       0x00000000          Section index of associated section
sh_info       0x00000000          Extra info about section
sh_addralign  0x0000000000000001  Alignment
sh_entsize    0x0000000000000000  Size in bytes of each entry
</code></pre>
<h3 id="section-header-table-entry-4-symtab">Section header table: Entry 4 (.symtab)</h3>
<pre><code>|00000140: 1700 0000 0200 0000 0000 0000 0000 0000| ................
|00000150: 0000 0000 0000 0000 8002 0000 0000 0000| ................
|00000160: a800 0000 0000 0000 0500 0000 0600 0000| ................
|00000170: 0800 0000 0000 0000 1800 0000 0000 0000| ................

sh_name       0x00000017          Offset into .shstrtab
sh_type       0x00000002          SHT_SYMTAB
sh_flags      0x0000000000000000  Section attributes
sh_addr       0x0000000000000000  Virtual address of section in memory
sh_offset     0x0000000000000280  Offset of section in file image
sh_size       0x00000000000000a8  Size in bytes of section in file image
sh_link       0x00000005          Section index of associated section
sh_info       0x00000006          Extra info about section
sh_addralign  0x0000000000000008  Alignment
sh_entsize    0x0000000000000018  Size in bytes of each entry
</code></pre>
<h3 id="section-header-table-entry-5-strtab">Section header table: Entry 5 (.strtab)</h3>
<pre><code>|00000180: 1f00 0000 0300 0000 0000 0000 0000 0000| ................
|00000190: 0000 0000 0000 0000 3003 0000 0000 0000| ........0.......
|000001a0: 1f00 0000 0000 0000 0000 0000 0000 0000| ................
|000001b0: 0100 0000 0000 0000 0000 0000 0000 0000| ................

sh_name       0x0000001f          Offset into .shstrtab
sh_type       0x00000003          SHT_STRTAB
sh_flags      0x0000000000000000  Section attributes
sh_addr       0x0000000000000000  Virtual address of section in memory
sh_offset     0x0000000000000330  Offset of section in file image
sh_size       0x000000000000001f  Size in bytes of section in file image
sh_link       0x00000000          Section index of associated section
sh_info       0x00000000          Extra info about section
sh_addralign  0x0000000000000001  Alignment
sh_entsize    0x0000000000000000  Size in bytes of each entry
</code></pre>
<h3 id="section-header-table-entry-6-relatext">Section header table: Entry 6 (.rela.text)</h3>
<pre><code>|000001c0: 2700 0000 0400 0000 0000 0000 0000 0000| '...............
|000001d0: 0000 0000 0000 0000 5003 0000 0000 0000| ........P.......
|000001e0: 1800 0000 0000 0000 0400 0000 0200 0000| ................
|000001f0: 0800 0000 0000 0000 1800 0000 0000 0000| ................

sh_name       0x00000027          Offset into .shstrtab
sh_type       0x00000004          SHT_RELA
sh_flags      0x0000000000000000  Section attributes
sh_addr       0x0000000000000000  Virtual address of section in memory
sh_offset     0x0000000000000350  Offset of section in file image
sh_size       0x0000000000000018  Size in bytes of section in file image
sh_link       0x00000004          Section index of associated section
sh_info       0x00000002          Extra info about section
sh_addralign  0x0000000000000008  Alignment
sh_entsize    0x0000000000000018  Size in bytes of each entry
</code></pre>
<h3 id="section-1-data-sht_progbits-shf_write--shf_alloc">Section 1: .data (SHT_PROGBITS; SHF_WRITE | SHF_ALLOC)</h3>
<pre><code>|00000200: 4865 6c6c 6f2c 2057 6f72 6c64 210a 0000| Hello, World!...

0x000000  'Hello, World!\n'
Zero-padding (2 bytes starting at 0x20e)
</code></pre>
<h3 id="section-2-text-sht_progbits-shf_alloc--shf_execinstr">Section 2: .text (SHT_PROGBITS; SHF_ALLOC | SHF_EXECINSTR)</h3>
<pre><code>|00000210: bf01 0000 0048 be00 0000 0000 0000 00ba| .....H..........
|00000220: 0e00 0000 b804 0000 000f 0548 31ff b801| ...........H1...
|00000230: 0000 000f 0500 0000 0000 0000 0000 0000| ................

0x00000010  mov edi, 0x1
0x00000015  movabs rsi, 0x000000 (placeholder for db hello)
0x0000001f  mov edx, 0xe
0x00000024  mov eax, 0x4
0x00000029  syscall
0x0000002b  xor rdi, rdi
0x0000002e  mov eax, 0x1
0x00000033  syscall
Zero-padding (11 bytes starting at 0x235)
</code></pre>
<h3 id="section-3-shstrtab-sht_strtab">Section 3: .shstrtab (SHT_STRTAB;)</h3>
<pre><code>|00000240: 002e 6461 7461 002e 7465 7874 002e 7368| ..data..text..sh
|00000250: 7374 7274 6162 002e 7379 6d74 6162 002e| strtab..symtab..
|00000260: 7374 7274 6162 002e 7265 6c61 2e74 6578| strtab..rela.tex
|00000270: 7400 0000 0000 0000 0000 0000 0000 0000| t...............

0x00000000: ''
0x00000001: '.data'
0x00000007: '.text'
0x0000000d: '.shstrtab'
0x00000017: '.symtab'
0x0000001f: '.strtab'
0x00000027: '.rela.text'
Zero-padding (14 bytes starting at 0x272)
</code></pre>
<h3 id="section-4-symtab-sht_symtab">Section 4: .symtab (SHT_SYMTAB;)</h3>
<h4 id="symbol-table-entry-0">Symbol table entry 0</h4>
<pre><code>|00000280: 0000 0000 0000 0000 0000 0000 0000 0000| ................
|00000290: 0000 0000 0000 0000                    | ........

st_name   0x00000000
st_info   0x00
st_other  0x00
st_shndx  0x0000 (SHN_UNDEF)
st_value  0x0000000000000000
st_size   0x0000000000000000
</code></pre>
<h4 id="symbol-table-entry-1-helloasm">Symbol table entry 1 (hello.asm)</h4>
<pre><code>|00000298: 0100 0000 0400 f1ff| ........
|000002a0: 0000 0000 0000 0000 0000 0000 0000 0000| ................

st_name   0x00000001
st_info   0x04 (STT_FILE)
st_other  0x00
st_shndx  0xfff1 (SHN_ABS)
st_value  0x0000000000000000
st_size   0x0000000000000000
</code></pre>
<h4 id="symbol-table-entry-2">Symbol table entry 2</h4>
<pre><code>|000002b0: 0000 0000 0300 0100 0000 0000 0000 0000| ................
|000002c0: 0000 0000 0000 0000                    | ........

st_name   0x00000000
st_info   0x03 (STB_LOCAL, STT_SECTION)
st_other  0x00
st_shndx  0x0001 (Section 1: .data)
st_value  0x0000000000000000
st_size   0x0000000000000000
</code></pre>
<h4 id="symbol-table-entry-3">Symbol table entry 3</h4>
<pre><code>|000002c8: 0000 0000 0300 0200| ........
|000002d0: 0000 0000 0000 0000 0000 0000 0000 0000| ................

st_name   0x00000000
st_info   0x03 (STB_LOCAL, STT_SECTION)
st_other  0x00
st_shndx  0x0002 (Section 2: .text)
st_value  0x0000000000000000
st_size   0x0000000000000000
</code></pre>
<h4 id="symbol-table-entry-4-hello">Symbol table entry 4 (hello)</h4>
<pre><code>|000002e0: 0b00 0000 0000 0100 0000 0000 0000 0000| ................
|000002f0: 0000 0000 0000 0000                    | ........

st_name   0x0000000b
st_info   0x00
st_other  0x00
st_shndx  0x0001 (Section 1: .data)
st_value  0x0000000000000000
st_size   0x0000000000000000
</code></pre>
<h4 id="symbol-table-entry-5-hbytes">Symbol table entry 5 (hbytes)</h4>
<pre><code>|000002f8: 1100 0000 0000 f1ff| ........
|00000300: 0e00 0000 0000 0000 0000 0000 0000 0000| ................

st_name   0x00000011
st_info   0x00
st_other  0x00
st_shndx  0xfff1 (SHN_ABS)
st_value  0x000000000000000e
st_size   0x0000000000000000
</code></pre>
<h4 id="symbol-table-entry-6-_start">Symbol table entry 6 (_start)</h4>
<pre><code>|00000310: 1800 0000 1000 0200 0000 0000 0000 0000| ................
|00000320: 0000 0000 0000 0000 0000 0000 0000 0000| ................

st_name   0x00000018
st_info   0x01 (STT_OBJECT)
st_other  0x00
st_shndx  0x0002 (Section 2: .text)
st_value  0x0000000000000000
st_size   0x0000000000000000
Zero-padding (8 bytes starting at 0x328)
</code></pre>
<h3 id="section-5-strtab-sht_strtab">Section 5: .strtab (SHT_STRTAB;)</h3>
<pre><code>|00000330: 0068 656c 6c6f 2e61 736d 0068 656c 6c6f| .hello.asm.hello
|00000340: 0068 6279 7465 7300 5f73 7461 7274 0000| .hbytes._start..

0x00000000: ''
0x00000001: 'hello.asm'
0x0000000b: 'hello'
0x00000011: 'hbytes'
0x00000018: '_start'
Zero-padding (1 byte starting at 0x34f)
</code></pre>
<h3 id="section-6-relatext-sht_rela">Section 6: .rela.text (SHT_RELA;)</h3>
<pre><code>|00000350: 0700 0000 0000 0000 0100 0000 0200 0000| ................
|00000360: 0000 0000 0000 0000 0000 0000 0000 0000| ................

r_offset  0x0000000000000007
r_info    0x0000000200000001 (Symbol table entry 2, type R_AMD64_64)
r_addend  0x0000000000000000
Zero-padding (8 bytes starting at 0x368)
</code></pre>
<h2 id="link-to-executable-image">Link to executable image</h2>
<p>Next, let&rsquo;s link <code>hello.o</code> into a 64-bit ELF executable:</p>
<pre><code>% ld -o hello hello.o
</code></pre>
<p>This emits <code>hello</code>, a 951-byte ELF-64 executable image.</p>
<p>Since the linker has decided which segment each section maps into (if any) and
what the segment addresses are, addresses are now known for all (statically
linked) symbols, and address 0x0 placeholders have been replaced with actual
addresses. We can see this in the <code>movabs</code> instruction at address 0x4000b5, which
now specifies an address of 0x6000d8.</p>
<p>Running the linked executable image through <code>xxd</code> as above and picking our
trusty linker &amp; loader guide back up, here we go again:</p>
<h3 id="elf-header-1">ELF Header</h3>
<pre><code>|00000000: 7f45 4c46 0201 0109 0000 0000 0000 0000| .ELF............
|00000010: 0200 3e00 0100 0000 b000 4000 0000 0000| ..&gt;.......@.....
|00000020: 4000 0000 0000 0000 1001 0000 0000 0000| @...............
|00000030: 0000 0000 4000 3800 0200 4000 0600 0300| ....@.8...@.....

e_ident[EI_MAG0..EI_MAG3]  0x7f + ELF          Magic
e_ident[EI_CLASS]          0x02                64-bit
e_ident[EI_DATA]           0x01                Little-endian
e_ident[EI_VERSION]        0x01                ELF v1
e_ident[EI_OSABI]          0x09                FreeBSD
e_ident[EI_ABIVERSION]     0x00                Unused
e_ident[EI_PAD]            0x00000000000000    7 bytes unused padding
e_type                     0x0002              ET_EXEC
e_machine                  0x003e              x86_64
e_version                  0x00000001          Version 1
e_entry                    0x00000000004000b0  Entrypoint addr
e_phoff                    0x0000000000000040  Program header table offset in image
e_shoff                    0x0000000000000110  Section header table offset in image
e_flags                    0x00000000          Architecture-dependent interpretation
e_ehsize                   0x0040              Size of this ELF header
e_phentsize                0x0038              Size of program header table entry
e_phnum                    0x0002              Number of program header table entries
e_shentsize                0x0040              Size of section header table entry
e_shnum                    0x0006              Number of section header table entries
e_shstrndx                 0x0003              Index of section header for .shstrtab
</code></pre>
<h3 id="program-header-table-entry-0-pf_x--pf_r">Program header table: Entry 0 (PF_X | PF_R)</h3>
<pre><code>|00000040: 0100 0000 0500 0000 0000 0000 0000 0000| ................
|00000050: 0000 4000 0000 0000 0000 4000 0000 0000| ..@.......@.....
|00000060: d500 0000 0000 0000 d500 0000 0000 0000| ................
|00000070: 0000 2000 0000 0000                    | .. .....

p_type    0x00000001          PT_LOAD
p_flags   0x00000005          PF_X | PF_R
p_offset  0x0000000000000000  Offset of segment in file image
p_vaddr   0x0000000000400000  Virtual address of segment in memory
p_paddr   0x0000000000400000  Physical address of segment
p_filesz  0x00000000000000d5  Size in bytes of segment in file image
p_memsz   0x00000000000000d5  Size in bytes of segment in memory
p_align   0x0000000000200000  Alignment (2MB)
</code></pre>
<h3 id="program-header-table-entry-1-pf_w--pf_r">Program header table: Entry 1 (PF_W | PF_R)</h3>
<pre><code>|00000078: 0100 0000 0600 0000| ........
|00000080: d800 0000 0000 0000 d800 6000 0000 0000| ..........`.....
|00000090: d800 6000 0000 0000 0e00 0000 0000 0000| ..`.............
|000000a0: 0e00 0000 0000 0000 0000 2000 0000 0000| .......... .....

p_type    0x00000001          PT_LOAD
p_flags   0x00000006          PF_W | PF_R
p_offset  0x00000000000000d8  Offset of segment in file image
p_vaddr   0x00000000006000d8  Virtual address of segment in memory
p_paddr   0x00000000006000d8  Physical address of segment
p_filesz  0x000000000000000e  Size in bytes of segment in file image
p_memsz   0x000000000000000e  Size in bytes of segment in memory
p_align   0x0000000000200000  Alignment (2MB)
</code></pre>
<h3 id="section-1-text-sht_progbits-shf_alloc--shf_execinstr">Section 1: .text (SHT_PROGBITS; SHF_ALLOC | SHF_EXECINSTR)</h3>
<pre><code>|000000b0: bf01 0000 0048 bed8 0060 0000 0000 00ba| .....H...`......
|000000c0: 0e00 0000 b804 0000 000f 0548 31ff b801| ...........H1...
|000000d0: 0000 000f 05                           | .....

0x4000b0  mov edi, 0x1
0x4000b5  movabs rsi, 0x6000d8
0x4000bf  mov edx, 0xe
0x4000c4  mov eax, 0x4
0x4000c9  syscall
0x4000cb  xor rdi, rdi
0x4000ce  mov eax, 0x1
0x4000d3  syscall
Zero-padding (3 bytes starting at 0x000000d5)
</code></pre>
<h3 id="section-2-data-sht_progbits-shf_write--shf_alloc">Section 2: .data (SHT_PROGBITS; SHF_WRITE | SHF_ALLOC)</h3>
<pre><code>|000000d8: 4865 6c6c 6f2c 2057| Hello, W
|000000e0: 6f72 6c64 210a                         | orld!.

0x6000d8  'Hello, World!\n'
</code></pre>
<h3 id="section-3-shstrtab-sht_strtab-1">Section 3: .shstrtab (SHT_STRTAB;)</h3>
<pre><code>|000000e6: 002e 7379 6d74 6162 002e| ..symtab..
|000000f0: 7374 7274 6162 002e 7368 7374 7274 6162| strtab..shstrtab
|00000100: 002e 7465 7874 002e 6461 7461 0000 0000| ..text..data....

0x00000000: ''
0x00000001: '.symtab'
0x00000009: '.strtab'
0x00000011: '.shstrtab'
0x0000001b: '.text'
0x00000021: '.data'
Zero-padding (3 bytes starting at 0x0000010d)
</code></pre>
<h3 id="section-header-table-entry-0-null-1">Section header table: Entry 0 (null)</h3>
<pre><code>|00000110: 0000 0000 0000 0000 0000 0000 0000 0000| ................
|00000120: 0000 0000 0000 0000 0000 0000 0000 0000| ................
|00000130: 0000 0000 0000 0000 0000 0000 0000 0000| ................
|00000140: 0000 0000 0000 0000 0000 0000 0000 0000| ................

sh_name       0x00000000          Offset into .shstrtab
sh_type       0x00000000          SHT_NULL
sh_flags      0x0000000000000000  Section attributes
sh_addr       0x0000000000000000  Virtual address of section in memory
sh_offset     0x0000000000000000  Offset of section in file image
sh_size       0x0000000000000000  Size in bytes of section in file image
sh_link       0x00000000          Section index of associated section
sh_info       0x00000000          Extra info about section
sh_addralign  0x0000000000000000  Alignment
sh_entsize    0x0000000000000000  Size in bytes of each entry
</code></pre>
<h3 id="section-header-table-entry-1-text">Section header table: Entry 1 (.text)</h3>
<pre><code>|00000150: 1b00 0000 0100 0000 0600 0000 0000 0000| ................
|00000160: b000 4000 0000 0000 b000 0000 0000 0000| ..@.............
|00000170: 2500 0000 0000 0000 0000 0000 0000 0000| %...............
|00000180: 1000 0000 0000 0000 0000 0000 0000 0000| ................

sh_name       0x0000001b          Offset into .shstrtab
sh_type       0x00000001          SHT_PROGBITS
sh_flags      0x0000000000000006  SHF_ALLOC | SHF_EXECINSTR
sh_addr       0x00000000004000b0  Virtual address of section in memory
sh_offset     0x00000000000000b0  Offset of section in file image
sh_size       0x0000000000000025  Size in bytes of section in file image
sh_link       0x00000000          Section index of associated section
sh_info       0x00000000          Extra info about section
sh_addralign  0x0000000000000010  Alignment (16B)
sh_entsize    0x0000000000000000  Size in bytes of each entry
</code></pre>
<h3 id="section-header-table-entry-2-data">Section header table: Entry 2 (.data)</h3>
<pre><code>|00000190: 2100 0000 0100 0000 0300 0000 0000 0000| !...............
|000001a0: d800 6000 0000 0000 d800 0000 0000 0000| ..`.............
|000001b0: 0e00 0000 0000 0000 0000 0000 0000 0000| ................
|000001c0: 0400 0000 0000 0000 0000 0000 0000 0000| ................

sh_name       0x00000021          Offset into .shstrtab
sh_type       0x00000001          SHT_PROGBITS
sh_flags      0x0000000000000003  SHF_WRITE | SHF_ALLOC
sh_addr       0x00000000006000d8  Virtual address of section in memory
sh_offset     0x00000000000000d8  Offset of section in file image
sh_size       0x000000000000000e  Size in bytes of section in file image
sh_link       0x00000000          Section index of associated section
sh_info       0x00000000          Extra info about section
sh_addralign  0x0000000000000004  Alignment (4B)
sh_entsize    0x0000000000000000  Size in bytes of each entry
</code></pre>
<h3 id="section-header-table-entry-3-shstrtab-1">Section header table: Entry 3 (.shstrtab)</h3>
<pre><code>|000001d0: 1100 0000 0300 0000 0000 0000 0000 0000| ................
|000001e0: 0000 0000 0000 0000 e600 0000 0000 0000| ................
|000001f0: 2700 0000 0000 0000 0000 0000 0000 0000| '...............
|00000200: 0100 0000 0000 0000 0000 0000 0000 0000| ................
622 623 sh_name 0x00000011 Offset into .shstrtab 624 sh_type 0x00000003 SHT_STRTAB 625 sh_flags 0x0000000000000000 No flags 626 sh_addr 0x0000000000000000 Virtual address of section in memory 627 sh_offset 0x00000000000000e6 Offset of section in file image 628 sh_size 0x0000000000000027 Size in bytes of section in file image 629 sh_link 0x00000000 Section index of associated section 630 sh_info 0x00000000 Extra info about section 631 sh_addralign 0x0000000000000001 Alignment (1B) 632 sh_entsize 0x0000000000000000 Size in bytes of each entry 633 </code></pre> 634 <h3 id="section-header-table-entry-4-symtab-1">Section header table: Entry 4 (.symtab)</h3> 635 <pre><code>|00000210: 0100 0000 0200 0000 0000 0000 0000 0000| ................ 636 |00000220: 0000 0000 0000 0000 9002 0000 0000 0000| ................ 637 |00000230: f000 0000 0000 0000 0500 0000 0600 0000| ................ 638 |00000240: 0800 0000 0000 0000 1800 0000 0000 0000| ................ 639 640 sh_name 0x00000001 Offset into .shstrtab 641 sh_type 0x00000002 SHT_SYMTAB 642 sh_flags 0x0000000000000000 No flags 643 sh_addr 0x0000000000000000 Virtual address of section in memory 644 sh_offset 0x0000000000000290 Offset of section in file image 645 sh_size 0x00000000000000f0 Size in bytes of section in file image 646 sh_link 0x00000005 Section index of associated string table (.strtab) 647 sh_info 0x00000006 Index of first non-local symbol 648 sh_addralign 0x0000000000000008 Alignment (8B) 649 sh_entsize 0x0000000000000018 Size in bytes of each entry (24B) 650 </code></pre> 651 <h3 id="section-header-table-entry-5-strtab-1">Section header table: Entry 5 (.strtab)</h3> 652 <pre><code>|00000250: 0900 0000 0300 0000 0000 0000 0000 0000| ................ 653 |00000260: 0000 0000 0000 0000 8003 0000 0000 0000| ................ 654 |00000270: 3700 0000 0000 0000 0000 0000 0000 0000| 7............... 655 |00000280: 0100 0000 0000 0000 0000 0000 0000 0000| ................ 
656 657 sh_name 0x00000009 Offset into .shstrtab 658 sh_type 0x00000003 SHT_STRTAB 659 sh_flags 0x0000000000000000 No flags 660 sh_addr 0x0000000000000000 Virtual address of section in memory 661 sh_offset 0x0000000000000380 Offset of section in file image 662 sh_size 0x0000000000000037 Size in bytes of section in file image 663 sh_link 0x00000000 Section index of associated section 664 sh_info 0x00000000 Extra info about section 665 sh_addralign 0x0000000000000001 Alignment (1B) 666 sh_entsize 0x0000000000000000 Size in bytes of each entry 667 </code></pre> 668 <h3 id="section-4-symtab-sht_symtab-1">Section 4: .symtab (SHT_SYMTAB)</h3> 669 <h4 id="symbol-table-entry-0-1">Symbol table entry 0</h4> 670 <pre><code>|00000290: 0000 0000 0000 0000 0000 0000 0000 0000| ................ 671 |000002a0: 0000 0000 0000 0000 | ........ 672 673 st_name 0x00000000 674 st_info 0x00 675 st_other 0x00 676 st_shndx 0x0000 (SHN_UNDEF) 677 st_value 0x0000000000000000 678 st_size 0x0000000000000000 679 </code></pre> 680 <h4 id="symbol-table-entry-1">Symbol table entry 1</h4> 681 <pre><code>|000002a8: 0000 0000 0300 0100| ........ 682 |000002b0: b000 4000 0000 0000 0000 0000 0000 0000| ..@............. 683 684 st_name 0x00000000 685 st_info 0x03 (STT_SECTION) 686 st_other 0x00 687 st_shndx 0x0001 (Section 1: .text) 688 st_value 0x00000000004000b0 689 st_size 0x0000000000000000 690 </code></pre> 691 <h4 id="symbol-table-entry-2-1">Symbol table entry 2</h4> 692 <pre><code>|000002c0: 0000 0000 0300 0200 d800 6000 0000 0000| ..........`..... 693 |000002d0: 0000 0000 0000 0000 | ........ 694 695 st_name 0x00000000 696 st_info 0x03 (STT_SECTION) 697 st_other 0x00 698 st_shndx 0x0002 (Section 2: .data) 699 st_value 0x00000000006000d8 700 st_size 0x0000000000000000 701 </code></pre> 702 <h4 id="symbol-table-entry-3-helloasm">Symbol table entry 3 (hello.asm)</h4> 703 <pre><code>|000002d8: 0100 0000 0400 f1ff| ........ 
704 |000002e0: 0000 0000 0000 0000 0000 0000 0000 0000| ................ 705 706 st_name 0x00000001 707 st_info 0x04 (STT_FILE) 708 st_other 0x00 709 st_shndx 0xfff1 (SHN_ABS) 710 st_value 0x0000000000000000 711 st_size 0x0000000000000000 712 </code></pre> 713 <h4 id="symbol-table-entry-4-hello-1">Symbol table entry 4 (hello)</h4> 714 <pre><code>|000002f0: 0b00 0000 0000 0200 d800 6000 0000 0000| ..........`..... 715 |00000300: 0000 0000 0000 0000 | ........ 716 717 st_name 0x0000000b 718 st_info 0x00 719 st_other 0x00 720 st_shndx 0x0002 (Section 2: .data) 721 st_value 0x00000000006000d8 722 st_size 0x0000000000000000 723 </code></pre> 724 <h4 id="symbol-table-entry-5-hbytes-1">Symbol table entry 5 (hbytes)</h4> 725 <pre><code>|00000308: 1100 0000 0000 f1ff| ........ 726 |00000310: 0e00 0000 0000 0000 0000 0000 0000 0000| ................ 727 728 st_name 0x00000011 729 st_info 0x00 730 st_other 0x00 731 st_shndx 0xfff1 (SHN_ABS) 732 st_value 0x000000000000000e 733 st_size 0x0000000000000000 734 </code></pre> 735 <h4 id="symbol-table-entry-6-_start-1">Symbol table entry 6 (_start)</h4> 736 <pre><code>|00000320: 1800 0000 1000 0100 b000 4000 0000 0000| ..........@..... 737 |00000330: 0000 0000 0000 0000 | ........ 738 739 st_name 0x00000018 740 st_info 0x10 (STB_GLOBAL) 741 st_other 0x00 742 st_shndx 0x0001 (Section 1: .text) 743 st_value 0x00000000004000b0 744 st_size 0x0000000000000000 745 </code></pre> 746 <h4 id="symbol-table-entry-7-__bss_start">Symbol table entry 7 (__bss_start)</h4> 747 <pre><code>|00000338: 1f00 0000 1000 f1ff| ........ 748 |00000340: e600 6000 0000 0000 0000 0000 0000 0000| ..`............. 749 750 st_name 0x0000001f 751 st_info 0x10 (STB_GLOBAL) 752 st_other 0x00 753 st_shndx 0xfff1 (SHN_ABS) 754 st_value 0x00000000006000e6 755 st_size 0x0000000000000000 756 </code></pre> 757 <h4 id="symbol-table-entry-8-_edata">Symbol table entry 8 (_edata)</h4> 758 <pre><code>|00000350: 2b00 0000 1000 f1ff e600 6000 0000 0000| +.........`..... 
759 |00000360: 0000 0000 0000 0000 | ........ 760 761 st_name 0x0000002b 762 st_info 0x10 (STB_GLOBAL) 763 st_other 0x00 764 st_shndx 0xfff1 (SHN_ABS) 765 st_value 0x00000000006000e6 766 st_size 0x0000000000000000 767 </code></pre> 768 <h4 id="symbol-table-entry-9-_end">Symbol table entry 9 (_end)</h4> 769 <pre><code>|00000368: 3200 0000 1000 f1ff| 2....... 770 |00000370: e800 6000 0000 0000 0000 0000 0000 0000| ..`............. 771 772 st_name 0x00000032 773 st_info 0x10 (STB_GLOBAL) 774 st_other 0x00 775 st_shndx 0xfff1 (SHN_ABS) 776 st_value 0x00000000006000e8 777 st_size 0x0000000000000000 778 </code></pre> 779 <h3 id="section-6-strtab-sht_strtab">Section 5: .strtab (SHT_STRTAB)</h3> 780 <pre><code>|00000380: 0068 656c 6c6f 2e61 736d 0068 656c 6c6f| .hello.asm.hello 781 |00000390: 0068 6279 7465 7300 5f73 7461 7274 005f| .hbytes._start._ 782 |000003a0: 5f62 7373 5f73 7461 7274 005f 6564 6174| _bss_start._edat 783 |000003b0: 6100 5f65 6e64 00 | a._end. 784 785 0x00000000: '' 786 0x00000001: 'hello.asm' 787 0x0000000b: 'hello' 788 0x00000011: 'hbytes' 789 0x00000018: '_start' 790 0x0000001f: '__bss_start' 791 0x0000002b: '_edata' 792 0x00000032: '_end' 793 </code></pre> 794 <h2 id="effect-of-stripping">Effect of stripping</h2> 795 <p>Running <code>strip</code> on the binary has the effect of dropping the <code>.symtab</code> and 796 <code>.strtab</code> sections along with their section headers and 16 bytes of data (the 797 section names <code>.symtab</code> and <code>.strtab</code>) from the <code>.shstrtab</code> section, reducing the 798 total binary size to 512 bytes.</p> 799 <h2 id="in-memory-process-image">In-memory process image</h2> 800 <p>FreeBSD uses a memory superpage size of 2MB (page size of 4kB) on x86_64. 
Since 801 attributes are set at the page level, the read+execute <code>.text</code> section and the 802 read+write <code>.data</code> section are loaded into two separate segments on separate pages, as 803 laid out by the linker.</p> 804 <p>On launch, the kernel maps the binary image into memory as specified in the 805 program header table:</p> 806 <ul> 807 <li>PHT Entry 0: The ELF header, program header table, and Section 1 (<code>.text</code>) 808 are mapped from offset 0x00 of the binary image (with length 0xd6 bytes) 809 into Segment 1 (readable, executable) at address 0x400000.</li> 810 <li>PHT Entry 1: Section 2 (<code>.data</code>) is mapped from offset 0xd8 of the binary image 811 (with length 0x0e bytes) into Segment 2 (readable, writeable) at address 812 0x6000d8.</li> 813 </ul> 814 <p>The program entrypoint is specified to be 0x4000b0, the start of the <code>.text</code> 815 section.</p> 816 <p>And that&rsquo;s it! Any corrections or comments are always welcome. Shoot me an 817 email at <a href="mailto:chris@bracken.jp">chris@bracken.jp</a>.</p> 818 </description> 819 </item> 820 821 <item> 822 <title>Installing Mozc on Ubuntu</title> 823 <link>https://chris.bracken.jp/2011/04/installing-mozc-on-ubuntu/</link> 824 <pubDate>Fri, 22 Apr 2011 00:00:00 +0000</pubDate> 825 826 <guid>https://chris.bracken.jp/2011/04/installing-mozc-on-ubuntu/</guid> 827 <description><p>If you&rsquo;re a Japanese speaker, one of the first things you do when you install a 828 fresh Linux distribution is to install a decent <a href="https://en.wikipedia.org/wiki/Japanese_IME">Japanese IME</a>. 
829 Ubuntu defaults to <a href="https://sourceforge.jp/projects/anthy/news/">Anthy</a>, but I personally prefer <a href="https://code.google.com/p/mozc/">Mozc</a>, and 830 that&rsquo;s what I&rsquo;m going to show you how to install here.</p> 831 <p><em>Update (2011-05-01):</em> Found an older <a href="https://www.youtube.com/watch?v=MfgjTCXZ2-s">video tutorial</a> on YouTube 832 which provides an alternative (and potentially more comprehensive) solution for 833 Japanese support on 10.10 using ibus instead of uim, which is the better choice 834 for newer releases.</p> 835 <p><em>Update (2011-10-25):</em> The software installation part of this process got a 836 whole lot easier in Ubuntu releases after Natty, and as noted above, I&rsquo;d 837 recommend sticking with ibus over uim.</p> 838 <h3 id="japanese-input-basics">Japanese Input Basics</h3> 839 <p>Before we get going, let&rsquo;s understand a bit about how Japanese input works on 840 computers. Japanese comprises three main character sets: the two phonetic 841 character sets, hiragana and katakana at 50 characters each, plus many 842 thousands of Kanji, each with multiple readings. Clearly a full keyboard is 843 impractical, so a mapping is required.</p> 844 <p>Input happens in two steps. First, you input the text phonetically, then you 845 convert it to a mix of kanji and kana.</p> 846 <figure><img src="https://chris.bracken.jp/post/2011-04-22-henkan.png" 847 alt="Japanese IME completion menu"> 848 </figure> 849 850 <p>Over the years, two main mechanisms evolved to input kana. The first was common 851 on old <em>wapuro</em>, and assigns a kana to each key on the keyboard—e.g. where 852 the <em>A</em> key appears on a QWERTY keyboard, you&rsquo;ll find a ち. This is how our 853 grandparents hacked out articles for the local <em>shinbun</em>, but I suspect only a 854 few die-hard traditionalists still do this. 
The second and more common method 855 is literal <a href="https://en.wikipedia.org/wiki/Wapuro">transliteration of roman characters into kana</a>. You 856 type <em>fujisan</em> and out comes ふじさん.</p> 857 <p>Once the phonetic kana have been input, you execute a conversion step wherein 858 the input is transformed into the appropriate mix of kanji and kana. Given the 859 large number of homonyms in Japanese, this step often involves disambiguating 860 your input by selecting the intended kanji. For example, the <em>mita</em> in <em>eiga wo 861 mita</em> (I watched a movie) is properly rendered as 観た whereas the <em>mita</em> in 862 <em>kuruma wo mita</em> (I saw a car) should be 見た, and in neither case is it <em>mita</em> 863 as in the place name <em>Mita-bashi</em> (Mita bridge) which is written 三田.</p> 864 <h3 id="some-implementation-details">Some Implementation Details</h3> 865 <p>Let&rsquo;s look at implementation. There are two main components used in inputting 866 Japanese text:</p> 867 <p>The GUI system (e.g. ibus, uim) is responsible for:</p> 868 <ol> 869 <li>Maintaining and switching the current input mode: 870 ローマ字、ひらがな、カタカナ、半角カタカナ.</li> 871 <li>Transliteration of character input into kana: <em>ku</em> into く, 872 <em>nekko</em> into ねっこ, <em>xtu</em> into っ.</li> 873 <li>Managing the text under edit (the underlined stuff) and the 874 drop-down list of transliterations.</li> 875 <li>Ancillary functions such as supplying a GUI for custom dictionary 876 management, kanji lookup by radical, etc.</li> 877 </ol> 878 <p>The transliteration engine (e.g. Anthy, Mozc) is responsible for transforming a 879 piece of input text, usually in kana form, into kanji: for example みる into 880 one of: 見る、観る、診る、視る. 
This involves:</p> 881 <ol> 882 <li>Breaking the input phrase into components.</li> 883 <li>Transforming each component into the appropriate best guess based on context 884 and historical input.</li> 885 <li>Supplying alternative transformations in case the best guess was incorrect.</li> 886 </ol> 887 <h3 id="why-mozc">Why Mozc?</h3> 888 <p>TL;DR: because it&rsquo;s better. Have a look at the conversion list up at the top of 889 this post. The input is <em>kinou</em>, for which there are two main conversion 890 candidates: 機能 (feature) and 昨日 (yesterday). Notice however, that it also 891 supplies several conversions for yesterday&rsquo;s date in various formats, including 892 「平成23年4月21日」 using <a href="https://en.wikipedia.org/wiki/Japanese_era_name">Japanese Era Name</a> rather than the 893 Western notation 2011. This is just one small improvement among dozens of 894 clever tricks it performs. If you&rsquo;re thinking this bears an uncanny resemblance 895 to tricks that <a href="https://www.google.com/intl/ja/ime/">Google&rsquo;s Japanese IME</a> supports, you&rsquo;re right: Mozc 896 originated from the same codebase.</p> 897 <h3 id="switching-to-mozc">Switching to Mozc</h3> 898 <p>So let&rsquo;s assume you&rsquo;re now convinced to abandon Anthy and switch to Mozc. 899 You&rsquo;ll need to make some changes. Here are the steps:</p> 900 <p>If you haven&rsquo;t yet done so, install some Japanese fonts from either Software 901 Centre or Synaptic. 
I&rsquo;d recommend grabbing the <em>ttf-takao</em> package.</p> 902 <p>Next up, we&rsquo;ll install and configure Mozc.</p> 903 <ol> 904 <li><strong>Install ibus-mozc:</strong> <code>sudo apt-get install ibus-mozc</code></li> 905 <li><strong>Restart the ibus daemon:</strong> <code>/usr/bin/ibus-daemon --xim -r -d</code></li> 906 <li><strong>Set your input method to mozc:</strong> 907 <ol> 908 <li>Open <em>Keyboard Input Methods</em> settings.</li> 909 <li>Select the <em>Input Method</em> tab.</li> 910 <li>From the <em>Select an input method</em> drop-down, select Japanese, then mozc from 911 the sub-menu.</li> 912 <li>Select <em>Japanese - Anthy</em> from the list, if it appears there, and click 913 <em>Remove</em>.</li> 914 </ol> 915 </li> 916 <li><strong>Optionally, remove Anthy from your system:</strong> <code>sudo apt-get autoremove anthy</code></li> 917 </ol> 918 <p>Log out and back in. You should see an input method menu in the menu 919 bar at the top of the screen.</p> 920 <p>That&rsquo;s it, Mozcを楽しんでください!</p> 921 </description> 922 </item> 923 924 <item> 925 <title>Google Reader</title> 926 <link>https://chris.bracken.jp/2007/05/google-reader/</link> 927 <pubDate>Wed, 30 May 2007 00:00:00 +0000</pubDate> 928 929 <guid>https://chris.bracken.jp/2007/05/google-reader/</guid> 930 <description><p>For years, I&rsquo;ve been a fan of <a href="http://inessential.com/">Brent Simmons&rsquo;</a> OS X-based feed 931 reader, <a href="http://www.newsgator.com/Individuals/NetNewsWire/">NetNewsWire</a>. It&rsquo;s a fantastic application, and I&rsquo;ve definitely 932 got my money&rsquo;s worth out of it. After NetNewsWire partnered with <a href="http://newsgator.com/">NewsGator</a>, I 933 started using their online feed-reader on and off, with mixed 934 results. 
I like that it keeps my feeds in sync between my computers, 935 and that I can browse articles at lunch, but the interface is still not on par 936 with NetNewsWire itself.</p> 937 <p>While NewsGator&rsquo;s implementation was lacking, I really did like the idea of 938 dropping the desktop app altogether and going with a fully online solution, so 939 I started exploring other options. The obvious free alternative is <a href="http://www.google.com/reader/">Google 940 Reader</a>, and I have to say, I&rsquo;m impressed. While the 941 presentation isn&rsquo;t as customizable as NetNewsWire, the functionality that I use 942 is all there, and in fact, it has some extra search features that I miss on the 943 desktop. It was only when I launched NetNewsWire today and saw 290 unread 944 items, that it hit me I hadn&rsquo;t used it in almost a month. So while I look 945 forward to <a href="http://www.flickr.com/photos/hicksdesign/210309912/">NetNewsWire 3</a>, I&rsquo;m sticking to Google Reader for the time 946 being.</p> 947 <figure><img src="https://chris.bracken.jp/post/2007-05-30-google-reader.png" 948 alt="Google reader graph of usage by hour of day"> 949 </figure> 950 951 <p>I also discovered that my prime news reading hours are apparently 6:30am to 952 7:30am and 9pm to 11pm, with a strange local maximum straggling out around 953 12:30am. I&rsquo;d be curious to compare this to <em>before</em> I had a baby that woke me 954 up around that time.</p> 955 <p><em>Update (2007-06-06):</em> NetNewsWire 3.0 is now out.</p> 956 </description> 957 </item> 958 959 </channel> 960 </rss>