
You Should not be Using Anaconda

Imagine you’re a newcomer to Python and data analytics and some website tells you to use conda. Days later, you get an email from Anaconda telling you that you’re in breach of their licensing terms because the organisation you’re working for has more than 200 employees! Confused, you do a quick google and find this: https://www.anaconda.com/blog/is-conda-free Now you’re even more confused. Meanwhile on the second search result, the answer is clearer: https://stackoverflow.com/questions/74762863/are-conda-miniconda-and-anaconda-free-to-use-and-open-source ...

August 1, 2024 · 2 min · Shen Ting

DataScience SG Talk: Data Challenges

I gave a talk last night at Data Science SG entitled “Trustable Data: Challenges in a National Sports Association”. It gives an outline of what I’ve encountered and done in the past few years for SCBA. Talk slides can be found here.

April 25, 2024 · 1 min · Shen Ting

On Mentorship

Koo writes: “In lifelong learning, we are expecting the participants to be able to apply what is being taught into their work. Applications to generate value is the key objectives for lifelong learning programmes. Assessment can conducted if it is on the application phase but unnecessary (but good to have) if it is to check if the participants have gained the knowledge needed from the course. In fact, mentoring might be more important as it guides participants, with an unorganized knowledge base as mentioned above, to start organizing the knowledge base and see where the applications of the knowledge are at the same time. However, this is difficult again due to cost issue. Yes, current experienced staff can be the mentor but they are already swarmed with their own work. Hiring external mentor could be a solution but again, opportunity cost for the freelancer can be high if the company only require an hour from the external mentor for guidance.” ...

October 3, 2023 · 4 min · Shen Ting

ETH Node Part 3: Block Sync

eth.syncing
{ currentBlock: 13060865, highestBlock: 13060940, knownStates: 142012464, pulledStates: 141992942, startingBlock: 0 }

After slightly over 38 hours, the blocks are synced as of 4pm! I actually started over at 2am the previous day because I accidentally turned off power to the Pi and the ancient database got corrupted :(

August 20, 2021 · 1 min · Shen Ting

ETH Node Part 2: 64 Bit Ubuntu

Out of memory

So I left the node running for a few days and noticed the sync rate dropping significantly. Upon watching the log print for a while, I saw the following:

Aug 13 01:51:22 geth geth[12749]: runtime: out of memory: cannot allocate 4194304-byte block (2422145024 in use)
Aug 13 01:51:22 geth geth[12749]: fatal error: out of memory
Aug 13 01:51:22 geth geth[12749]: goroutine 16172 [running]:
Aug 13 01:51:22 geth geth[12749]: runtime.throw(0xce93c8, 0xd) ...

August 19, 2021 · 3 min · Shen Ting

ETH Node Part 1: Raspberry Pi 4

I’ve been meaning to look into running an ETH node for a while, and Roman sent me this guide, which helpfully links to another guide on running an ETH node on a Raspberry Pi 4. There was some scepticism about whether a Raspberry Pi 4 is actually powerful enough, but the hardware would cost me just $300 for both the Pi and an external 500GB SSD, and it’s still usable for other projects even if this doesn’t work out. ...

August 13, 2021 · 2 min · Shen Ting

Why you probably shouldn't outsource data work

I was having a conversation with David and the subject of outsourcing came up. I’ll start by stating I am not against outsourcing. There are definitely situations where it makes sense, especially for resource-constrained organizations who can’t possibly cover every single function by themselves. (On a personal level, hosting this site on SquareSpace is also a form of outsourcing.) Outsourcing does work for one-off projects where it doesn’t make sense for an organization to hire long-term. So, if you’re working on a one-off data project, it probably makes sense to outsource the work. ...

July 2, 2021 · 2 min · Shen Ting

Mandatory Reading on Names

Yes, we’re talking about actual names

It’s a pity that I came across this 6 months too late, as it would have saved me an hour repeating myself thrice on why we can’t use names as a unique key to join across different data sources. Thankfully my point eventually got across, but to any fellow developer/data scientist/engineer having to explain to stakeholders, hopefully this helps. And no, you do not want to use email addresses as a unique key either: ...

April 29, 2021 · 1 min · Shen Ting

MRT Challenge

Since the new house is relatively settled now, we were inspired by woonie’s post to try a different route. We lost by less than 5 minutes; maybe we would have beaten his team if not for the toilet break at Bukit Panjang, which resulted in us missing an LRT.

January 10, 2021 · 1 min · Shen Ting

Rating Systems (2): Glicko-2

Point vs Interval Estimates

One issue with the Elo system, as previously raised, is that it doesn’t give any information about the uncertainty of the estimate. The Glicko system is an attempt to capture this information as well. Think about this: if a player has played 3 games (a win against a player rated 1500, a loss against a player rated 1450 and a win against a player rated 1200), how confident are we in the derived rating? On the other hand, if a player has played 50 games, and has won 35 games against opponents rated 2000 and below, drawn 5 games against opponents rated 2000-2050 and lost the remaining 10 games against opponents rated above 2050, we can be fairly certain that his rating is somewhere between 2000-2050. ...
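
To make the point-versus-interval idea concrete, here is a minimal Python sketch. It uses the simpler original Glicko update for a single rating period, not the full Glicko-2 algorithm (which adds a volatility term), so treat it as an illustration only. The function names, the starting rating of 1500, the starting deviation of 350 and the opponent deviation of 50 are all assumptions made for this example.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Down-weights a result according to the opponent's rating deviation."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_j, rd_j):
    """Expected score against an opponent rated r_j with deviation rd_j."""
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))


def glicko_update(r, rd, games):
    """Original Glicko update for one rating period.

    games is a list of (opponent_rating, opponent_rd, score),
    where score is 1 for a win, 0.5 for a draw and 0 for a loss.
    Returns the new (rating, rating deviation).
    """
    if not games:
        return r, rd
    inv_d2 = Q**2 * sum(
        g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1 - expected(r, r_j, rd_j))
        for r_j, rd_j, _ in games
    )
    denom = 1 / rd**2 + inv_d2
    delta = Q / denom * sum(
        g(rd_j) * (s - expected(r, r_j, rd_j)) for r_j, rd_j, s in games
    )
    return r + delta, math.sqrt(1 / denom)


def interval(r, rd):
    """Approximate 95% interval around the rating."""
    return r - 1.96 * rd, r + 1.96 * rd


# The three games from the text; opponent deviations of 50 are assumed.
three_games = [(1500, 50, 1), (1450, 50, 0), (1200, 50, 1)]
r, rd = glicko_update(1500, 350, three_games)
print(round(r), round(rd), interval(r, rd))
```

The more games go into the update, the smaller the returned deviation becomes, so the interval narrows. That is the extra information a single Elo number does not carry.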

October 4, 2020 · 2 min · Shen Ting