{"id":315,"date":"2012-04-16T07:29:01","date_gmt":"2012-04-16T07:29:01","guid":{"rendered":"http:\/\/www.plugged.in\/?p=315"},"modified":"2012-04-16T07:29:01","modified_gmt":"2012-04-16T07:29:01","slug":"find-the-largest-files","status":"publish","type":"post","link":"https:\/\/www.veriteknik.net.tr\/en\/find-the-largest-files\/","title":{"rendered":"Find the Largest Files"},"content":{"rendered":"<p>The other day I was working on a server and needed the LARGEST files on some directory &#8211; including its subdirectories.<\/p>\n<p>As it turns out, it&#8217;s a very simple task limiting the file size you want with the output of the <strong>find<\/strong> tool.<\/p>\n<p>The <strong>-size<\/strong> argument will define the borders of your output. Let&#8217;s say you want to find the files smaller than 50 MB on your server,<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">find \/ -type f -size -50M<\/pre>\n<p>This prints only the full path of each file, so you won&#8217;t know how large each file is. To improve this, we can run an <strong>ls<\/strong> command on each result,<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">find \/ -type f -size -50M -exec ls -lh {} \\;<\/pre>\n<p>In this command, <strong>{}<\/strong> is replaced by each path that <strong>find<\/strong> matches, and the <strong>\\;<\/strong> is mandatory since we need to tell <strong>find<\/strong> where our <strong>-exec<\/strong> command ends. The backslash (<strong>\\<\/strong>) escapes the semicolon so that the shell passes it to <strong>find<\/strong> instead of interpreting it.<\/p>\n<p>Even though this looks good, we can keep improving by printing every file size in the same unit, say megabytes. The problem is that the <strong>ls<\/strong> command can print sizes in a specified block size, but it reports them <em><a href=\"http:\/\/www.thefreedictionary.com\/Quantised\" target=\"_blank\">quantised<\/a><\/em>, meaning the output will always be a whole multiple of that block size. 
So if our block size is set to 1 MB and a file is 900 KB, <strong>ls<\/strong> will report it as 1 MB.<\/p>\n<p>That is not very accurate, but we can work around it by using <strong>awk<\/strong> to do the arithmetic for us. Since <strong>ls -l<\/strong> prints the file size in <em>bytes<\/em> by default, we can divide it by 1048576 (1024 * 1024) to get <em>megabytes<\/em>. The line below prints the <strong>ls<\/strong> output with the size in megabytes.<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">ls -l | awk '{print $1 \" \" $2 \" \" $3 \" \" $4 \" \" $5\/1048576 \" \" $6 \" \" $7 \" \" $8 \" \" $9}'<\/pre>\n<p>We only need the <em>5th<\/em> and the <em>9th<\/em> columns, which are the <em>size<\/em> and the <em>path<\/em> respectively, so the command below will suffice:<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">ls -l | awk '{ print $5\/1048576 \" \" $9 }'<\/pre>\n<p>As you can see, we had to use a <em>pipe<\/em> to get things done here. The <strong>-exec<\/strong> part of our <strong>find<\/strong> command runs a single command and cannot take a pipe directly, which is another problem. The workaround is to <strong>-exec<\/strong> a <em>shell instance<\/em> and pass it the whole <strong>ls<\/strong> and <strong>awk<\/strong> line, including the pipe, so that the new shell instance handles things for us.<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">find \/ -type f -size -50M -size +20M -exec sh -c \"ls -l '{}'|awk '{print \\$5\/1048576 \\\" MB: \\\" \\$9}'\" \\;<\/pre>\n<p>OK, let&#8217;s have a look at the command above. We narrowed our limits further, keeping only the files smaller than 50 MB and larger than 20 MB. We also passed our whole command to a shell instance. In this instance, the argument for <strong>ls<\/strong> is passed with the <strong>{}<\/strong> placeholder. 
We wrapped it in single quotes (<strong>'{}'<\/strong>) because a filename may contain spaces, which would otherwise split it into several arguments. After that, we piped the output to <strong>awk<\/strong>, divided the <em>bytes<\/em>, and added the string &quot; MB: &quot; right before printing the <em>9th<\/em> column, which is the file path. Don&#8217;t forget to escape <em>$5<\/em> and <em>$9<\/em> and the inner double quotes with the escape character <strong>\\<\/strong>: the whole command sits inside double quotes, so the shell that launches <strong>find<\/strong> would otherwise process them before our <strong>awk<\/strong> does.<\/p>\n<p>The good thing is that we have the necessary output; the bad thing is that it isn&#8217;t in order! So let&#8217;s make things even prettier and <em>sort<\/em> the results, while making each &#8220;<strong>MB:<\/strong>&#8221; bold to get some eye candy.<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">find \/ -type f -size -50M -size +20M -exec sh -c \"ls -l '{}'|awk '{print \\$5\/1048576 \\\" \\033[1mMB:\\033[0;0m \\\" \\$9}'\" \\; | sort -nr -k1<\/pre>\n<p>As you can see here, we piped the output of the <strong>find<\/strong> command to <strong>sort<\/strong>, not the output of the <strong>shell<\/strong> instance that we invoked inside the <strong>find<\/strong> command. That&#8217;s why the pipe comes right after our <strong>\\;<\/strong> terminator.<\/p>\n<p>With this command, find will also search inside the <strong>\/proc<\/strong> directory, which is a virtual filesystem whose entries are created and destroyed rapidly as processes start and exit, so during the search you will get some annoying errors saying &#8220;No such file or directory&#8221;. 
To avoid that, let&#8217;s tell find <strong>NOT<\/strong> to search the <strong>\/proc<\/strong> directory using the <strong>-prune<\/strong> argument.<\/p>\n<pre class=\"brush: bash; gutter: true; first-line: 1\">find \/ -path '\/proc' -prune -o -type f -size -50M -size +20M -exec sh -c \"ls -l '{}'|awk '{print \\$5\/1048576 \\\" \\033[1mMB:\\033[0;0m \\\" \\$9}'\" \\; | sort -nr -k1<\/pre>\n<p>You can add new directories to prune with the <strong>-path &#8216;\/new\/directory\/to\/prune&#8217; -prune -o<\/strong> method.<\/p>\n<p>Hope this helps.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The other day I was working on a server and needed the LARGEST files on some directory &#8211; including its subdirectories. As it turns out, it&#8217;s a very simple task limiting the file size you want with the output of the find tool. The -size argument will define the borders of your output. Let&#8217;s say [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","footnotes":""},"categories":[370,372],"tags":[471,366,469,472,473,474,371,470,475,476],"yst_prominent_words":[1483,1481,1477,1482,1486,1474,336,1261,1480,605,1475,711,705,1473,1485,1478,1484,1476,226,1479],"class_list":["post-315","post","type-post","status-publish","format-standard","hentry","category-linux","category-linux_help","tag-awk","tag-centos","tag-console","tag-file","tag-find","tag-large","tag-linux-2","tag-ssh","tag-terminal","tag-unix"],"jetpack_featured_media_url":"","uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"Mustafa Emre Ayd\u0131n","author_link":"https:\/\/www.veriteknik.net.tr\/en\/author\/eaydin\/"},"uagb_comment_info":0,"uagb_excerpt":"The other day I was working on a server and needed the LARGEST files on 
some directory &#8211; including its subdirectories. As it turns out, it&#8217;s a very simple task limiting the file size you want with the output of the find tool. The -size argument will define the borders of your output. Let&#8217;s say&hellip;","_links":{"self":[{"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/posts\/315","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/comments?post=315"}],"version-history":[{"count":1,"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/posts\/315\/revisions"}],"predecessor-version":[{"id":7296,"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/posts\/315\/revisions\/7296"}],"wp:attachment":[{"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/media?parent=315"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/categories?post=315"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/tags?post=315"},{"taxonomy":"yst_prominent_words","embeddable":true,"href":"https:\/\/www.veriteknik.net.tr\/en\/wp-json\/wp\/v2\/yst_prominent_words?post=315"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}