Yesterday AMY v1.5.4 was released with a bunch of interesting changes.
AMY is now capable of going through all active workshops and checking if
their metadata (slug, start/end date, instructors and helpers) has changed.
If so, a notification is shown to the person associated with the event.
Aditya Narayan improved the history log by enabling it to show related
objects’ real names instead of IDs.
Greg Wilson added a button to mail everyone involved in a workshop.
As part of his GSoC 2016 project, Chris Medrela added the first version of
the trainings dashboard.
Chris Medrela, in collaboration with Greg Wilson, added SWC/DC instructor
badge indicators to the all persons, event details, and “find instructors” views.
Finally, I upgraded the “Find instructors” view to let admins search
not only for instructors, but also for in-progress instructor trainees and people
who were once associated with the workshop organization. The view has
therefore been renamed from “Find instructors” to “Find Workshop Staff”.
Aditya Narayan fixed a permissions issue that occurred when people without
permission to add ToDo items accessed the event details page.
I fixed a small error that prevented the Data Carpentry logo from showing up on the DC
workshop request page.
I fixed a small error that duplicated people with both superuser and admin group
permissions in the admin lookup backend.
An even smaller error pointed admins to the wrong URL in the import event
template. It is now fixed.
Chris Medrela fixed errors in the former “debriefing” view (now
“instructors by date”) that occurred during email generation when some people’s
emails were unavailable.
Chris Medrela fixed what I think is the oldest unnoticed issue ever: a wrong link
generated for an airport’s IATA code.
Finally, Chris Medrela fixed a missing template in one of the new
features for this release.
I’d like to thank Chris and Aditya for their continuous work on AMY. This is
just the beginning, and if you’re curious go check out what’s planned for
v1.6 (probably the next release). There has been a lot happening around AMY
recently, so stay tuned for the next release.
This observation led to an investigation on the servers and eventually to a fix
for a critical bug that caused the data loss.
But before we jump into sysadmin work…
What is a “tag” in AMY terms?
A tag is a label that we give to various workshops. For example, all Software
Carpentry workshops have the SWC tag, and all Data Carpentry workshops
have the DC tag.
There are more labels we use, and the one that went missing was DC.
One event can have anywhere between zero and all of the tags in the system, which
means there is a many-to-many relationship between events and tags. This type of
relationship requires an additional intermediate table in the database.
The contents of that table were missing because they were removed along with
the DC tag.
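To make the intermediate table concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (event, tag, event_tags) and the sample slugs are made up for illustration and are not AMY's real schema. The ON DELETE CASCADE clause is an assumption that stands in for the cascading removal described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when asked to.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, slug TEXT NOT NULL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Intermediate table: one row per (event, tag) assignment.
    CREATE TABLE event_tags (
        event_id INTEGER NOT NULL REFERENCES event(id) ON DELETE CASCADE,
        tag_id INTEGER NOT NULL REFERENCES tag(id) ON DELETE CASCADE,
        PRIMARY KEY (event_id, tag_id)
    );
""")

# Hypothetical sample data: two events, two tags, three assignments.
conn.executemany("INSERT INTO event VALUES (?, ?)", [(1, "event-a"), (2, "event-b")])
conn.executemany("INSERT INTO tag VALUES (?, ?)", [(1, "SWC"), (2, "DC")])
conn.executemany("INSERT INTO event_tags VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])


def count_rows(connection, table):
    """Return the number of rows in one of the tables created above."""
    return connection.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


before = count_rows(conn, "event_tags")
conn.execute("DELETE FROM tag WHERE name = 'DC'")  # removing the tag itself...
after = count_rows(conn, "event_tags")             # ...also removes its assignments
events_left = count_rows(conn, "event")            # the events are untouched

print(before, after, events_left)
```

The events survive, but every assignment that pointed at the removed tag is gone, which is why the lost rows had to come back from a backup.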
I started the investigation by narrowing down the timespan in which the event that led to
the data loss occurred.
Then I read the WWW server access logs to find out what happened in
that timespan, hoping to find the bug.
After narrowing down the list of suspects, I was able to reproduce the bug.
Finally I retrieved the lost data from the most recent backup that still had
Ways to remove tags from AMY
There’s no interface for removing tags other than Django’s auto-generated admin
interface; only a couple of people have access to it.
So the data loss was caused either by human error or by a code bug. This
conclusion helped me define what I should be looking for in the WWW server log.
Narrowing event occurrence timespan
AMY is backed up by multiple systems; I logged into each of them and ran
multiple SQL queries on different databases to find out which backup was the
newest one that still had the DC tag.
It turned out that the backup from 2016-04-06 17:00 UTC-4 was the most recent one still
with the DC tag.
In the meantime I was fighting timezone correction… Our backup systems are in
different datacenters and were running in different timezones.
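One way to avoid doing this arithmetic by hand is to convert every timestamp to UTC before comparing. This is only a sketch: to_utc is a hypothetical helper, and the only real value in it is the backup time quoted above.

```python
from datetime import datetime, timedelta, timezone


def to_utc(naive_local, utc_offset_hours):
    """Attach a fixed UTC offset to a naive timestamp and convert it to UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=local_zone).astimezone(timezone.utc)


# The backup timestamp mentioned above: 2016-04-06 17:00 at UTC-4.
backup_time = to_utc(datetime(2016, 4, 6, 17, 0), -4)
print(backup_time.isoformat())  # 2016-04-06T21:00:00+00:00
```

Once all timestamps are timezone-aware, they can be compared directly, regardless of which datacenter they came from.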
Reading access log
The first thing I checked in the access log was whether anyone had used the admin panel
to remove the tag. Unfortunately this possibility was quickly ruled out, so
the loss had to be caused by a code bug.
However, after reading the log no action stood out.
Short, important side story: the Software Carpentry website rebuilds every 30
minutes. Each rebuild is shown in the log by multiple requests to AMY’s API:
The website grabs published events tagged with the SWC, DC and TTT tags. This
sequence of requests repeats every 30 minutes.
After reading the log over and over I noticed that two consecutive calls to
/api/v1/events/published.yaml?tag=DC yielded results of very different
Apparently, when the DC tag disappeared, the API started returning all the
published events, no matter if they were tagged SWC or something else.
This was a clear indication that the DC tag disappeared between 15:01 and
That timespan doesn’t look like 17:00. Timezones… programmer’s nightmare.
There was some user activity in this 30-minute window and one thing
caught my eye:
(The actual URL was slightly changed to remove unnecessary information.)
This was a call to the event merge functionality: someone wanted to merge
workshops 2016-05-06-RDAP16-Atlanta and 2016-05-06-asist.
Short side note: the merge functionality allows the user to choose a more advanced
strategy for the merge; one can select which properties (or fields) of the final
event should be taken from event A (2016-05-06-RDAP16-Atlanta in our example)
and which from event B (2016-05-06-asist). Additionally, in the case of
an event’s tags, it’s possible to combine them from both base events.
I started testing different strategies. I had a feeling that the bug had
something to do with strategy for event tags. :)
Finally I reproduced the bug by using the following strategy:
base event: 2016-05-06-RDAP16-Atlanta (event A)
tags: from event B.
At that point I decided to retrieve the lost data using SQL import/export
functionality from the optimal (newest & containing the lost data) backup found
The only code used in event merge functionality that would trigger accidental
This code is used for substituting related objects (tags in our case). It works
if some field’s strategy is to switch to objects from the other event, then
remove all currently assigned objects and add objects from the other event’s
Translated into tags:
if user wants to use event 2016-05-06-RDAP16-Atlanta as base event, but
keep tags from the other event (2016-05-06-asist) then remove current tags
from base event and add tags from the other event.
See what’s going on here? Base event’s tags were removed instead of being
Django: related manager and assignments
In this section I’m going to talk about how relations work and whether they can be
unassigned instead of being removed.
For many-to-many relationships (e.g. multiple events can be assigned multiple
tags) Django creates an intermediate table that stores assignments. In this
case, unassigning an event from a tag is as simple as removing that stored
assignment from the intermediate table.
For one-to-many relationships (e.g. multiple events can have the same
organizer) there’s no need for an additional table; storing the organizer looks
like event.organizer = SomeOrganizer.
In the case of one-to-many relationships we can unassign the event from
SomeOrganizer if and only if the event.organizer field can store a NULL value.
If it cannot, then we have to remove the event.
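The NULL condition can be seen at the database level with a small sqlite3 sketch. The two event tables below are hypothetical and differ only in whether organizer_id may be NULL; they are not AMY's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- organizer_id may be NULL here, so an event can exist without an organizer.
    CREATE TABLE event_nullable (
        id INTEGER PRIMARY KEY,
        organizer_id INTEGER REFERENCES organizer(id)
    );
    -- organizer_id is required here, so it can never be emptied.
    CREATE TABLE event_required (
        id INTEGER PRIMARY KEY,
        organizer_id INTEGER NOT NULL REFERENCES organizer(id)
    );
    INSERT INTO organizer VALUES (1, 'SomeOrganizer');
    INSERT INTO event_nullable VALUES (1, 1);
    INSERT INTO event_required VALUES (1, 1);
""")

# Unassigning works when the column accepts NULL.
conn.execute("UPDATE event_nullable SET organizer_id = NULL WHERE id = 1")
nullable_value = conn.execute(
    "SELECT organizer_id FROM event_nullable WHERE id = 1"
).fetchone()[0]

# The same update is rejected when the column is NOT NULL.
try:
    conn.execute("UPDATE event_required SET organizer_id = NULL WHERE id = 1")
    unassign_rejected = False
except sqlite3.IntegrityError:
    unassign_rejected = True

print(nullable_value, unassign_rejected)
```

In the second table the only way to break the link to SomeOrganizer is to delete the event row itself.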
So the bug existed because the case of unassignment was not taken into account
– only removal of related objects was accounted for.
Fix: need to find out when we can unassign
Long story short: in Django only a related manager with a .clear method can
unassign; if this method is not present then the only option is removal.
So the fixed code looks like this (minus the comments):
(Yes, it probably should use try - except block instead of hasattr; pull
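A rough sketch of that idea, using plain Python stand-ins rather than Django or AMY's actual code. The class names, the detach_related function and the sample tags are all hypothetical; only the hasattr check on .clear comes from the description above.

```python
class _RelatedSet:
    """Stand-in for the queryset returned by manager.all()."""

    def __init__(self, manager):
        self.manager = manager

    def delete(self):
        # Removal: the related objects disappear from the "database" entirely.
        self.manager.database.difference_update(self.manager.related)
        self.manager.related.clear()


class RequiredRelationManager:
    """Stand-in for a related manager without .clear()."""

    def __init__(self, database, related):
        self.database = database     # every object that exists
        self.related = set(related)  # objects linked to one event

    def all(self):
        return _RelatedSet(self)


class ClearableRelationManager(RequiredRelationManager):
    """Stand-in for a related manager that can unassign, e.g. many-to-many."""

    def clear(self):
        # Unassignment: drop the links, keep the objects themselves.
        self.related.clear()


def detach_related(manager):
    """Unassign related objects when possible; remove them only as a last resort."""
    if hasattr(manager, "clear"):
        manager.clear()
        return "unassigned"
    manager.all().delete()
    return "deleted"


all_tags = {"SWC", "DC", "TTT"}
event_tags = ClearableRelationManager(all_tags, {"DC"})
print(detach_related(event_tags), sorted(all_tags))
```

With the clearable stand-in, the tag stays in all_tags and only the link is dropped; with RequiredRelationManager the same call would delete it.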
All in all, I feel good about this bug; if anything, I’d like to eliminate the
erroneous timezone arithmetic.
Also all backup mechanics and logging worked really well.
As a result of the investigation described above, and of the bug and its fix,
I released AMY v1.5.2 last night.
Since I returned to university for my MSc, the development of AMY has slowed down.
This past month we had a number of submissions from prospective GSOC’16 students
(yay!) and, for the first time, the number of bugs fixed exceeded the number of new
Since the number of new features is small, I decided to release a minor version
Contributions by GSOC students
March 2016 was the GSOC’16 application period for students. We had a lot of
students this year and we encouraged them to take a look at AMY and maybe fix
something. This resulted in a number of good contributions.
Starting with new features since there are so few of them:
Greg Wilson extended the check_certificates.py command to additionally
return events people participated in
Shubham Singh added a “Notes” field to the instructor profile update form
Development of AMY saw a boost in February thanks to my winter break
(I graduated from university and had about a month of free time before my MSc studies
started), and that ended with today’s release of v1.5.
One bugfix: don’t break whole timeline widget when there are TODOs without due
New feature: stop using dots (.) for usernames, use underscores (_)
This was an interesting issue: since we rely on some Ruby software on the SWC
website, we can’t have dots in filenames (dots are treated as a parameter access
operator, for example: banaszkiewicz.piotr is the piotr parameter on the
banaszkiewicz object). But we have filenames that correspond to usernames in
AMY. So it was necessary to drop dots and switch to underscores…
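The substitution itself is tiny. As a sketch, assuming a hypothetical helper named normalize_username (not AMY's actual code):

```python
def normalize_username(username):
    """Replace dots with underscores so the name is safe to use in a filename."""
    return username.replace(".", "_")


print(normalize_username("banaszkiewicz.piotr"))  # banaszkiewicz_piotr
```

A real migration would also need to handle the case where the underscore form is already taken by another user, which this sketch ignores.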
Unfortunately, due to the way we have our project laid out on GitHub, some of
the features implemented for v1.4 before this feature were included in the
deployment; I will still list them in the v1.4 section, though.
The biggest highlights of this month are definitely:
first approach to the new API
There were also some essential features, but not many. In v1.5 there will be
a lot more.
We had to programmatically fix/complete some of our records:
historical events on production server were assigned an administering
organization (that’s the one responsible for taking care of the workshop
new DC instructors were added: anyone with a special note or anyone who
taught at DC workshop now has a DC instructor badge.
Looking at the list of issues for this release, it seems like many bugs were
fixed. It’s true, however the bugs themselves weren’t that big:
some fields containing numerical values were switched to another field type
to prevent a slider from appearing; the background for this issue was that
when scrolling through a page with a form on Mac OS X, people would accidentally
change the values of numerical fields,
generation of initial revisions was added to the process of creating a fake
database for development use,
some types of events (stalled and unresponsive) were kicked out from
some invoice options were changed to remain consistent with the rest.
As usual, we hit a fair number of new features for AMY:
Person model is now able to store person’s occupation and ORCID code,
events can hold links to survey results (pre-workshop for learners and for
instructors, post-workshop for learners and for instructors, and long-term
API call for getting members list is now for logged in users only, and
returns members’ usernames too,
merging events: with an option to select fields from either of the events, or (in
some cases) even to combine them together. The underlying code may be
reused to fix persons merging,
the workshop issues page now allows filtering workshops by assigned admin
most of the reports were moved to the API; 3 reports now present a graph for easy use,
1 report was requested to be moved to the API, and 1 new report was requested
(and I made it in the API),
API: new structure. It uses hyperlinks between resources and allows
viewing and filtering, for example, people associated with specific events,
slow tests were fixed (we gained probably around 10s on whole test suite,
even though about 10-20 new tests were introduced); now it’s time to speed up
Greg added two new badges to the database: maintainer and trainer; I made
sure to allow for editing badges via Django Admin interface, and also added
these new badges to the fake database command,
Greg also added a new command for getting a list of people who should be warned
because their instructor training was about to close,
meanwhile I added a command for displaying report about instructor training
The next release may be the last one made on such a regular basis. The reason for
this is that in March I start a new academic year (Masters!) and I know it will
be very hard; what I don’t know is whether I will have as much time to work on AMY as
in previous months.
Therefore there are multiple important features we want to implement in the
v1.5 release: look for the “essential” issues.
I’m studying Automation Control and Robotics, a major whose name doesn’t make
clear what a person does after graduating.
From time to time I talk with people who either think I’m studying programming,
or that I’m going to build robots in the future.
What is Automation Control and Robotics?
Most people don’t know what automation control is, so they focus on the part
that sounds familiar: robotics. They automatically assume I’ll be building
Well, I won’t.
My studies concentrate on things like control systems (think of it as
control theory, electronics,
specialized electronics (FPGAs, embedded systems, assembler programming,
industrial-class robots), leverage of
in industrial process identification, computer vision,
As you can see, I was mostly balancing between engineering and very specialized
computer science. There really was very little robotics during my Bachelor’s
What can you do after graduating?
I like to call it: engineering.
People graduating in automation control and robotics are broad-minded, and ready to
work in pretty much any engineering field that has something to do with
we can set up wind turbines
or air control systems
or nitrogen refill systems
or fine-tune power plants
or build assembly lines
There are thousands of options, all different kinds of industries.
Do I enjoy these studies?
Unlike many of my friends, I do enjoy studying automation control and
robotics. I learned a lot of engineering- and maths-related subjects, and
I have hopes for a great job in the future.
the same API gained a new endpoint used for generating a list of
current members of the Software Carpentry Foundation; this is in no way
an official list of members, but it can be used to help determine who’s
eligible (credit for this one goes to both Greg and me since I
finished his pull request),
it’s now possible to search in events’ URL, contact, venue and
2 new options for invoice status were added (not invoiced for
historical reasons and not invoiced because of membership),
more places (workshop issues, and on each workshop without
attendance data) to send “Give us attendance figures” emails, more
people to send to,
profile update requests can now be edited by admins.
There are a number of issues scheduled for the v1.3 release, and
there will be others added to that list. The problem is that December
end of my semester,
one huge exam, a couple of smaller tests,
deadline for my BEng. project and thesis.
So the total time I spend on AMY in December will probably be lower than
in November.