Partial postgres db dumps for a Django app

Off late, I have been working with a large postgres database that is used by an app built in Django. I wanted a partial dump of the database to try out some experimental clean up scripts. I haven't really used databases before, and the last time I had to do this I did it in a pretty ad-hoc fashion. This time around, I tried to do it more methodically and to document it.

The Django Route

I looked around for tools that let you do this, and found django-fixture-magic. I first tried it out on my older partial dump (10% as large as the original db) and it turned out to be reasonably fast and worked well, after making a few changes to get it working with Python 3.x. Its kitchensink flag to the dump_object seemed like a promising option, but didn't really seem to get all the required tables for ManyToManyFields. I worked around it, by getting a dump of all the models which were related using Django's dumpdata.

Get a dump with objects of interest

The dump_object command lets you run commands to select the objects that you want to have in the dump, and that is quite a useful thing.

python dump_object dataset.Product -k --query '{"subcategory_id__in": [1886, ...]}' > products.json

Also, get a dump of related tables.

# Dump of related fields
python dumpdata dataset.Attribute > attributes.json

Create the new empty db

Next, create a new database where this fixture can be loaded!

# Create the db
sudo su - postgres
createdb mydb

# Create a user, if required
createuser -P

Grant access to the user

In the psql prompt type the following to grant the user permissions for the database.


Fix and create tables for the models.

Make changes to to use the newly created database, and then create the tables used by the app, and then load the data.

python syncdb
python loaddata products.json

Too slow!

This method worked and was reasonably fast when I was trying to get 20k rows from a table with about 200k rows, with all the dependencies.

But, when I tried to get a dump of about 200k rows from a table with 2M rows, it was way too slow to be of any use. There could've been a couple of reasons for this, which I didn't have the time to look into, and debug.

  • The web-server where the Django app was running, and the db server with the postgres database were on different machines in separate datacenters, which could've been adding a significant amount of latency.
  • Just the size of the database being much larger could be making it slower?

These are things I should be looking into and learning about, when I have more time at hand. For now, I needed a quicker way to get a dump. Even though the raw SQL route was more manual, it turned out to be much quicker.

Raw SQL dump route

Get a dump of the interesting tables

First, I had to get a dump of all the tables with the data I was interested in, one-by-one.

COPY (SELECT * FROM "dataset_product" WHERE ("dataset_product"."subcategory_id" IN (319557, 94589, 332, 406, 626, 1886) AND "dataset_product"."gender_id" = 1)) TO '/tmp/products.tsv'

COPY (SELECT * FROM "dataset_photo" WHERE "dataset_photo"."product_id" IN (SELECT U0."id" FROM "dataset_product" U0 WHERE (U0."subcategory_id" IN (319557, 94589, 332, 406, 626, 1886) AND U0."gender_id" = 1))) TO '/tmp/photos.tsv'

-- Copy a bunch of other tables!

Load the data from the dumps

-- syncdb
COPY dataset_product FROM '/tmp/products.tsv' ENCODING 'UTF8';
COPY dataset_photo FROM '/tmp/photos.tsv' ENCODING 'UTF8';
-- Copy a bunch of other tables!

Make tables writable

Some of the tables did not let me write anything to them, until I altered the sequence for these tables.


It would be pretty nice if all of this was automated – allow a user to enter exactly the same kind of a query that django-fixture-magic lets you run, and figure out the SQL copies that need to be done to get the requested dump. Its something that currently would qualify as yak-shaving, but may be a handy thing to have. Someone somewhere possibly already has something that does this.

Clock in and get-shit-done

I had setup a couple of hooks about an year ago that turn off all notifications while I'm clocking in. But, I find myself switching to the browser and jumping to twitter, out of habit. I've tried get-shit-done in the past to help myself break this habit. But enabling get-shit-done manually is step that quickly became a non-habit.

So, I hooked up get-shit-done into an org-clock-in-hook. The snippet below is what I added into a function that is added to this hook.

  (cd "/sudo::/")
  (shell-command "HOME=/home/punchagan get-shit-done work"))

get-shit-done needs to be run as root, since it does things like modifying /etc/hosts and restarting networking. Just calling get-shit-done as a shell command fails with the error sudo: no tty present and no askpass program specified. I found a couple of ways to fix this. The snippet above piggy-backs on tramp to allow for a way to enter the password for sudo to use. This also means that I don't need to enter the password, as long as the tramp connection is alive.

For someone worried about having such an easy way of running something as root, using something like gnome-ssh-askpass as the askpass program might work better.

(shell-command "SUDO_ASKPASS=\\"/usr/lib/openssh/gnome-ssh-askpass\\" get-shit-done work")

Elfeed hook to fetch full content

I have started to use Pinboard's unread tag as my to-read list. It has a bookmark-let that works pretty well for adding stuff into my "to-read" list. I then catch up on this list using elfeed and subscribing to the unread items' RSS feed. The work-flow is pretty nice for adding stuff into the list, and finding items on the list. But, when it comes to the actual reading part, the entries in the feed don't have the actual content I want to read, and I end up opening the links in a browser.

Inspired by a comment from FiloSottile, I realized it should be pretty easy to setup a hook that fetches the actual content to make my reading work-flow smoother. I wrote a small script, using python-readability, to fetch the page content, given a URL. This script is then hooked onto elfeed-new-entry-hook, to fetch content of for new entries as they are fetched. All the old entries can be easily fixed with a single call to elfeed-apply-hooks-now.

(defun pc/get-url-content (url)
  "Fetches the content for a url."
  (shell-command-to-string (format "~/bin/ %s" url)))

(defun pc/get-entry-content (entry)
  "Fetches content for pinboard entries that are not tweets."
   (let ((entry elfeed-show-entry))
     (list entry)))

  (let ((url (elfeed-entry-link entry))
	(feed-id (elfeed-deref (elfeed-entry-feed-id entry)))
	(content (elfeed-deref (elfeed-entry-content entry))))
    (when (and (s-matches? "" feed-id)
	       (not (s-matches? "\\\\|pdf$\\\\|png$\\\\|jpg$" url))
	       (string-equal "" content))
      (setq content (pc/get-url-content url))
      (setf (elfeed-entry-content entry) (elfeed-ref content)))))

(add-hook 'elfeed-new-entry-hook #'pc/get-entry-content)

Say Howdy with Emacs!

Staying in touch with people is something I'm not very good at. Since I am not on popular (among my friends/family) networks – FB and Whatsapp – I don't even see random updates from people, to get some sense of being in touch.

I recently read some old posts by Sacha Chua and was inspired by how much code she had for contact management in her old blog posts. I was inspired by this post in particular to try and be more meticulous about how I stay in touch with people. Michael Fogleman blogged about his contact management work-flow using keepintouch. It seemed to do most of what I wanted, but I wanted this to be integrated with my org-contacts-db and I felt having native elisp code would make it easier to hook up email, chat, etc. to this.

I ended up writing a small utility called howdy to help me keep in touch with people. It currently has only a couple of features:

  • M-x howdy lets me update the last contacted timestamp for a contact.
  • Shows me contacts that I'm out of touch in the agenda, once I add the following snippet to an agenda file.
    * Howdy

I also have a few hooks to hook up jabber messages and email to update the db. I've added them to howdy-hooks.el in case anybody else wants to use them. They can also be used as examples to write other hooks. Feel free to contribute other hooks or suggest improvements. The library also ships with a modest test suite, that will hopefully make it easier for others to contribute.

I'm looking forward to experimenting with this over the next few weeks and improving it. Hopefully, it'll help me keep in touch, better than I do now.

WiFi Channels & my router's random invisibility

To update some data plan changes, I had to restart my router today. I did so from the (web) admin interface, from my dad's laptop. Guess what! The router goes invisible (to the laptop), after the restart! Other devices were still happily connected.

There were a few occasions in the past when this laptop had failed to find our home router. The solution that worked for my dad before was to restart the router; once, or may be a couple of times. But, this time it didn't work.

I had a feeling this was something to do with "Channels", but I had no idea what was going on.

I logged onto the router and started looking at the Wireless settings in the hope of finding some clue, and there it was! Wireless Channel: Auto Scan


It was anybody's guess that some channels that the router uses weren't supported by my dad's laptop. The restarts sometimes helped because the router would scan for channels that it thinks would work well, and end-up using something that the laptop's wifi card supported.

The router only had options of channels from 1 to 13. To see what channels a wifi card supports, you can run the iwlist freq command.

iwlist freq
eth1      26 channels in total; available frequencies :
	  Channel 01 : 2.412 GHz
	  Channel 02 : 2.417 GHz
	  Channel 03 : 2.422 GHz
	  Channel 04 : 2.427 GHz
	  Channel 05 : 2.432 GHz
	  Channel 06 : 2.437 GHz
	  Channel 07 : 2.442 GHz
	  Channel 08 : 2.447 GHz
	  Channel 09 : 2.452 GHz
	  Channel 10 : 2.457 GHz
	  Channel 11 : 2.462 GHz
	  Channel 12 : 2.467 GHz
	  Channel 13 : 2.472 GHz
	  Channel 14 : 2.484 GHz
	  Channel 36 : 5.18 GHz
	  Channel 38 : 5.19 GHz
	  Channel 40 : 5.2 GHz
	  Channel 42 : 5.21 GHz
	  Channel 44 : 5.22 GHz
	  Channel 46 : 5.23 GHz
	  Channel 48 : 5.24 GHz
	  Channel 149 : 5.745 GHz
	  Channel 153 : 5.765 GHz
	  Channel 157 : 5.785 GHz
	  Channel 161 : 5.805 GHz
	  Channel 165 : 5.825 GHz

So, the laptop's wifi card support all the channels that the router could use. iwlist freq also gives the current Channel if the device is already connected to a router. I ran this on my laptop and found that the router was using Channel 13. Why the hell wouldn't this work?! I don't know!

I decided to set a channel manually, but how do I pick a channel? The idea of "Auto Scan" is to find a channel that has relatively low interference. How do we do this manually? sudo iwlist scan scans for all the WiFi access points nearby and gives detailed information about them, which includes the Channel.

It looked like most of the routers nearby used channels 1 and 6, with a few 3s and 4s. Assuming, most of the other routers aren't doing this "Auto Scan" business, I set my router to always use Channel 9, and the laptop seems to be happily connected, now!

Playing music using mpsyt from Emacs

I've started using the wonderful mpsyt to play any music from youtube, since I'm not really interested in the video. But, since I use emacs for chat/IRC, I end up getting youtube links into emacs and opening them opens them up in my browser. I ended up writing some elisp to play the songs from within an instance of mpsyt running inside an emacs buffer.

(defun pc/short-url-at-point ()
  "Gets the short url at point.

This function is required only because
`thing-at-point-url-at-point' ignores urls (without a scheme)
that don't start with www."
  (let ((bounds (thing-at-point-bounds-of-url-at-point t)))
    (when (and bounds (< (car bounds) (cdr bounds)))
      (buffer-substring-no-properties (car bounds) (cdr bounds)))))

(defun pc/mpsyt-url (url)
  (let ((buffer (current-buffer))
	(mpsyt-proc-name "*mpsyt*"))

    ;; Start a new term with *mpsyt* if there isn't one
    (unless (get-process mpsyt-proc-name)
      (when (get-buffer mpsyt-proc-name)
	(kill-buffer (get-buffer mpsyt-proc-name)))
      (ansi-term "mpsyt" "mpsyt"))

    ;; Play given url in mpsyt
    (let ((mpsyt-proc (get-process mpsyt-proc-name)))
      ;; If something is already playing, stop it and play this...
      (term-send-string mpsyt-proc "\\n\\n\\n")
      ;; We wait for a bit, since looking for the prompt seems to fail, sometimes?
      (sleep-for 1)
      (term-send-string mpsyt-proc "\\n")

      ;; Actually send the command to playurl
      (term-simple-send (get-process mpsyt-proc-name)
			(format "playurl %s" url)))

    (switch-to-buffer buffer)))

(defun pc/mpsyt-url-at-point ()
  "Play the URL at point using mpsyt."
  (let ((url (or (url-get-url-at-point) (pc/short-url-at-point))))
    (if (not url)
      (message "No URL found")
	(message (format "Playing %s with mpsyt" url))
      (pc/mpsyt-url url))))

The current version of mpsyt crashes when run from inside emacs due to a bug in the code to get the terminal size, which should be fixed once this patch is merged.

I would've expected thing-at-point-url-at-point to be able to find urls even when they don't have a schema, but it tries to guess the schema from urls and fails to work when the url starts with instead of

I started off using the command-line interface of mpsyt by running it using shell-command or start-process. But, it seemed useful to have a buffer of mpsyt to switch to – easier to search for new music, repeating songs, etc. Not all tasks/actions are achievable through mpsyt's command line args.

I ended up writing more code than I thought I would have to1. But, I'm pretty happy with how this all works, right now.



- Isn't it true, more often than not?

org-drill for making it stick!

Those who read the last few posts here, would know that I have been experimenting with org-drill (a spaced repetition extension to Org mode). I have been using the system (almost) religiously for the past 2 months, and I do find that it has helped a great deal! (in some respects). I have also spent a considerable amount of time trying to reduce the friction to put new stuff into the system, and am constantly on the look out for further improvements.

Using this system has definitely helped with retention, and I find that I can recall quite a few things I have read a few weeks ago, that I would normally have been unable to. Though, I can recall a lot of information, I have been having a feeling of "fragmentation": the feeling of just retaining individual bits/fragments of information, while losing out on actually internalizing the knowledge; not seeing the big picture, etc.

Wozniak (the author of super-memo) warns against learning without understanding, and memorizing before actually learning stuff. I haven't consciously added stuff into the system that I didn't understand (when I added it), but, later it does feel like I have lost some connections or the understanding, and am only holding onto the fragments of information.

The problems as explained in (read: as interpreted by me from) Make it Stick appear to be:

  1. The understanding (if any) at the time of adding stuff into the spaced-repetition system is untested. It may just be familiarity masquerading as understanding.
  2. The lack of any spaced repetitions for the overall concept/understanding and actual repetitions only for individual bits doesn't help retention of the understanding (even if there was any, in the first place).

To work around this, I'm going to try adding questions that test understanding, to the system. The Super-memo team strongly recommends keeping the drill items small and easy to answer. This may be helpful in keeping each drill session short, but I would really like to add conceptual questions to the system, and see how it goes. I hacked org-drill to allow me to type out answers, before looking at the "correct" ones. This is an adaptation of a system that a fellow Hacker Schooler uses, and shared. Also, hopefully forcing myself to type out the answer will help me get around the problem of sometimes saying "yeah I know that", then looking at the answer only to reaffirm the feeling of familiarity, rather than actually testing myself. I'm still going to continue adding quick and short questions that test "bits of information", though. But, hopefully the additional conceptual questions are going to tie things together and help fill in the gaps. Lets see how this goes!

For those interested, my hacks to org-drill below. The code is really a hack, and welcome any suggestions on cleaning up the code.

(advice-add 'org-drill-presentation-prompt :around 'pc/org-drill-presentation-prompt)

(defun pc/org-drill-presentation-prompt (old-fun &rest fmt-and-args)
  "A presentation prompt that allows capturing answers."

  (let ((cb (current-buffer))
	(heading (nth 4 (org-heading-components)))
	(entry-id (org-entry-get (point) "ID"))
	(input ""))
    (switch-to-buffer-other-window "*org-capture-drill-answer*")
    (insert "# Hit C-c C-c once you are done answering!\\n")
    (insert (format "Answer: %s" heading))
    (org-entry-put (point) "QUESTION_ID" entry-id)
    (goto-char (point-max))
    (insert "  ")
    (org-time-stamp-inactive '(16))
    (insert "\\n\\n  ")
    (while (not (and input (equal input "")))
	(execute-kbd-macro input))
      (setq input (read-key-sequence nil)))
    (switch-to-buffer-other-window cb)
    (apply old-fun fmt-and-args)))

(advice-add 'org-drill-reschedule :around 'pc/org-drill-reschedule)

(defun pc/org-drill-reschedule (old-fun)
  "Calls the original reschedule, but also archives the answer"
  (prog1 (funcall old-fun)
    (let ((cb (current-buffer)))
      (switch-to-buffer-other-window "*org-capture-drill-answer*")
      (pc/org-refile-to-datetree "drill.org_archive")
      (message (buffer-name))
      (switch-to-buffer-other-window cb)
      (kill-buffer "*org-capture-drill-answer*"))))

(require 'org-datetree)
(defun pc/org-refile-to-datetree (journal)
  "Refile an entry to journal file's date-tree"
  (interactive "fRefile to: ")
  (let* ((journal (expand-file-name journal org-directory))
	 (date-string (or (org-entry-get (point) "TIMESTAMP_IA")
			  (org-entry-get (point) "TIMESTAMP")))
	 (dct (decode-time (or (and date-string (org-time-string-to-time date-string))
	 (date (list (nth 4 dct) (nth 3 dct) (nth 5 dct))))
    (with-current-buffer (or (find-buffer-visiting journal)
			     (find-file-noselect journal))
	(org-datetree-file-entry-under (current-kill 0) date)
	(bookmark-set "org-refile-last-stored")))
    (message "Refiled to %s" journal)))