A first year retrospective on Spark-in.me

Or what we achieved last year and a small summary for potential business clients

Posted by snakers41 on January 2, 2018

0 What?

It is customary for content makers to write annual retrospective posts. Usually such content is kind of cringe-inducing and long.

In my case I will just list our achievements in 2017 and say explicitly what we can do for potential clients, namely:

  • Provide a personal summary of what is achievable within a short time span and with a limited cash investment (zero, not counting a PC build and a new one coming);
  • Provide some stats on our website and channel and some insights;
  • List explicitly what we can do for clients on a project basis (being realistic - I do not yet believe that this can become our sole activity in 2018);

1 What was achieved, channel stats

This is more or less personal. I started self-studying Data Science in early 2017. In a nutshell, in 2017 I achieved the following:

  • Finished a decent amount of educational material (ofc I did not read 100% of it) and shared the gist of it. Mapped out a further learning path, which I have more or less followed since then (plus shared my opinions on the channel);
  • Found a decent job with an international company in an ML/DS-related field - which may well be the best achievement of them all;
  • Participated in a number of international ML/CV competitions with a steady progression of results (making any submissions at all => 67th place => 18th place => 3rd place), more details below;
  • Wrote 50+ articles on the spark-in.me blog;
  • Wrote ~1000 posts on the Telegram channel;
  • Almost enticed a handful of authors to participate in our blog. Their contributions will most likely be DS-competition related;

Commercial and semi-commercial project retrospective (yes, I started doing commercial side projects)

  • Social network phenomena modeling - post - commercial project;
  • Playing with 1m flat photos (more was done on this topic, but I am unsure whether to share the results - they were mixed) - post RU / EN - semi-commercial;
  • Fast and brief industry website analysis via sitemaps - post sequence - semi-commercial;

CV competitions in retrospective

You may take it or leave it (I participated only in competitions that provided value and education for me, not in casino-like stacking events):

  • 67th place here with 1 model (full HD semantic segmentation competition, with 5k images in train and 100k in test) - post;
  • 18th place here with 1 model (working with video, large dataset ~50+GB, working with SOTA object detection models) - post;
  • 3rd place here within a team and with an ensemble of 4 encoders + blending + a weak semi-supervised approach (working with video, working with a HUGE dataset - 1 TB, working with a huge variety of models and approaches) - post;

I believe that competitions should bring out the best in people, make them learn and read papers and not just be stacking contests.

Looking for a decent job in retrospective

Well, looking for a job is a clusterfuck. 95%+ of job posts in the CIS are one or more of the following:

  • False advertising (they entice you, but then you just work with legacy shit);
  • Severely underpaid (1/10 of the pay for the same job in the USA - 1/2 or 1/3 would be OK though);
  • Crazy stupid requirements (top-notch C++ + JS, perfect knowledge of ALL the deep learning frameworks, 5 years of experience with a technology released in beta 1-3 years ago, to name a few);

I made a few cynical passes on the job market in the CIS (1 2 3), until I landed my current job, for which I consider myself lucky. I also did a small pass on immigration to Cyprus.

Blog and Telegram channel stats

Well, I started the blog and Telegram channel to:

  • share what little I have (w/o investing much time in it though - I am not a marketing person);
  • attract smart people and facilitate discussion;
  • have it serve as my real CV that grows with me and is not bound by any stupid business rules or corporate bullshit;

To address the elephant in the room - there is an open-data-science community in the CIS, which is sponsored by Mail.ru. Though their educational efforts are noteworthy and noble, I have a few concerns:

  • their open course is aimed at hunting for cheap cannon fodder for the Mail.ru machine (just read the comments here);
  • the community is poorly moderated and prone to mob mentality (and so-called bro-culture - not to say that I am in favour of SJWs and safe spaces, but the Russian tech scene is notorious for being stupidly aggressive at times - just read the comments on habrhabr.ru);
  • a closed, corporate-driven, non-SEO-friendly platform can hardly be a launchpad for your own initiatives;

On the other hand, there are a lot of people from competitions there who can share their experience.

Also, a note on the habrhabr.ru audience and moderation. Paid blog content is prevalent there, so you either create free content for them on their terms (they wanted to create some kind of remuneration system for authors, but their rewards were laughable - like US$100 per year), or you pay for a blog where you can say any shit possible.

Suffice it to say that 90% of the BS there is published under paid terms, and the audience in general is therefore really hostile towards ANY content regardless of its quality. I played with their website when I had time, but then decided that it was not worth the time invested - you will see in the stats that it is a good means of adding .

Anyway, below are the stats for our website and channel.

Also during the existence of the channel there was zero sponsored content and zero content shared without me believing in its merit.

Telegram channel subscribers - the growth most likely looks logarithmic rather than exponential, i.e. it saturates as the content gets more professional and focused and the general audience (students) leaves

Telegram channel views - summer bump is due to i) habrhabr articles ii) students being active in summer

Telegram channel posts - this more or less reflects how much work and spare time I have

Channel reposts - the more professional and mature the content gets, the less likely I am to be reposted by mass media. It makes sense - 99% of channels in Telegram are just cringe content, and even DS channels mostly just repost some other feed or exist to be monetized, neither of which is my goal.

Website traffic - notice the bump - these are neural chicken coop articles

Traffic sources - the first line is the Telegram channel. I do not know what percentage of organic search is just me being too lazy to remember a URL, but I hope that 50% is real visits

Post tag groups (some in Russian) - Data Science, interesting / philosophy, programming and nerd stuff, self-education and language learning, business, internet and web, our telegram channel. Note that one post can have multiple tags

Some other worthy content

  1. Bird voice recognition series - one to eight - I did this mostly just for lulz and self-education; if I had known about honk or wavenet, I would most likely have built a better project;
  2. Hardware posts 1 and 2;
  3. Data Science course recap;
  4. Neuro-chicken coop article sequence - mostly a stupid pet project we did to test the limits of what we can do;

2 What we can do (for clients)

Well, almost anything is possible if properly funded =)

Most likely we have expertise in the following fields:

  • Modern computer vision;
  • ETL / data analysis;
  • Anything where state-of-the-art has ample paper and open-source code support;
  • Anything similar to the projects listed above;

But seriously, there is a list of people who are interested in investing a number of sleepless nights into interesting ML/CV/DS projects:

  • Yours truly - in case of a project, most likely a jack of all trades;
  • Several senior / middle / junior data scientists, both CV (computer vision) and non-CV, including me (I consider myself to be more of a middle-level specialist);
  • One really cool engineer guy who may serve as DevOps (C++, assembler, deploy, etc);
  • Probably - a mathematician if required;
  • One Linux / Bitcoin enthusiast who has assembled his own mining rigs and has little to do day to day, but has ample experience...with GPUs;

Key project caveats:

  • Most likely the project will be done part-time - we will respect the deadlines we set and consult on a regular basis, but do not expect 24/7 support;
  • Most ML/DS projects require data annotation / literature overview / burning time to see if stuff sticks to the wall:
    • Therefore do not be afraid to pay 20-50% of the total cost as a down payment. Why? Because we value our time, and projects like "do 95% of the job, and we will decide whether to pay for it" are not for us (seriously - I have seen people who want 1-2 months of work done in advance for free just to decide whether to start - it does not work this way);
    • Failure is always an option;
    • We will be open about some kind of early stopping policy in the project to save our and your time;

3 Future plans

  1. Deploy at least 2-3 projects at work this year;
  2. Do at least 1 ML/CV competition per quarter;
  3. Do at least 1-2 commercial projects per year;
  4. Finish my powerful server build;

Happy holidays to everyone!

Illustration source
