I’m Shen Ting. I write about Data Science, Analytics, Contract Bridge, Web3 and more.
For more about myself, see my About page or read my latest posts.
Out of memory

So I left the node running for a few days and noticed the sync rate dropping significantly. After watching the log output for a while, I saw the following:

Aug 13 01:51:22 geth geth[12749]: runtime: out of memory: cannot allocate 4194304-byte block (2422145024 in use)
Aug 13 01:51:22 geth geth[12749]: fatal error: out of memory
Aug 13 01:51:22 geth geth[12749]: goroutine 16172 [running]:
Aug 13 01:51:22 geth geth[12749]: runtime.throw(0xce93c8, 0xd)
...
I’ve been meaning to look into running an ETH node for a while, and Roman helpfully sent me this guide, which in turn links to another guide on running an ETH node on a Raspberry Pi 4. There was some scepticism about whether a Raspberry Pi 4 is actually powerful enough, but the hardware would cost me just $300 for both the Pi and an external 500GB SSD, and it’s still usable for other projects even if this doesn’t work out. ...
I was having a conversation with David and the subject of outsourcing came up. I’ll start by stating I am not against outsourcing. There are definitely situations where it makes sense, especially for resource-constrained organizations who can’t possibly cover every single function by themselves. (On a personal level, hosting this site on SquareSpace is also a form of outsourcing.) Outsourcing does work for one-off projects where it doesn’t make sense for an organization to hire long-term. So, if you’re working on a one-off data project, it probably makes sense to outsource the work. ...
Yes, we’re talking about actual names

It’s a pity that I came across this 6 months too late, as it would have saved me an hour of repeating myself thrice on why we can’t use names as a unique key to join across different data sources. Thankfully my point eventually got across, but to any fellow developer/data scientist/engineer having to explain this to stakeholders, hopefully this helps. And no, you do not want to use email addresses as a unique key either: ...
Since the new house is relatively settled now, we were inspired by woonie’s post to try a different route. We lost by less than 5 minutes. Maybe we would have beaten his team if not for the toilet break at Bukit Panjang, which resulted in us missing an LRT.
Point vs Interval Estimates

One issue with the Elo system, as raised previously, is that it doesn’t give any information about the uncertainty of the estimate. The Glicko system is an attempt to capture this information as well. Think about this: If a player has played 3 games (a win against a player rated 1500, a loss against a player rated 1450 and a win against a player rated 1200), how confident are we in the derived rating? On the other hand, if a player has played 50 games, and has won 35 games against opponents rated 2000 and below, drawn 5 games against opponents rated 2000-2050 and lost the remaining 10 games against opponents rated above 2050, we can be fairly certain that his rating is somewhere between 2000 and 2050. ...
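To make the 3-game example concrete, here is a minimal sketch of a single Glicko-1 update, based on my reading of Glickman's published formulas. The starting rating of 1500 and rating deviation (RD) of 350 are the conventional defaults for a new player. The opponents' RD of 50 is purely an assumption for illustration, since the example above doesn't state how reliable the opponents' ratings are.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Discount factor: the less certain the opponent's rating, the less a game counts."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_opp, rd_opp):
    """Expected score against an opponent, given the opponent's rating and RD."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def glicko_update(r, rd, games):
    """One rating-period update.

    games: list of (opponent_rating, opponent_rd, score), score in {1, 0.5, 0}.
    Returns the new (rating, rating deviation).
    """
    info = 0.0   # sum of g^2 * E * (1 - E); 1/d^2 is Q^2 times this
    delta = 0.0  # sum of g * (score - E)
    for r_opp, rd_opp, score in games:
        e = expected_score(r, r_opp, rd_opp)
        info += g(rd_opp) ** 2 * e * (1 - e)
        delta += g(rd_opp) * (score - e)
    inv_var = 1 / rd**2 + Q**2 * info  # 1/RD^2 + 1/d^2
    return r + Q / inv_var * delta, math.sqrt(1 / inv_var)


# Hypothetical inputs: the three games from the example, opponent RD assumed to be 50.
three_games = [(1500, 50, 1), (1450, 50, 0), (1200, 50, 1)]
rating, rd = glicko_update(1500, 350, three_games)
print(f"rating {rating:.0f}, rough 95% interval {rating - 2 * rd:.0f} to {rating + 2 * rd:.0f}")
```

The point of the sketch is the second return value: after only three games the RD is still large, so the interval around the rating is wide. Feeding in more games shrinks it.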
I was invited by the Rafflesian Parents Association to give a talk on being a data scientist (as part of a five-speaker panel) earlier today. Here are the slides. I was not totally happy, as I didn’t realise that Google Sheets speaker notes cover the actual slides when I was screen sharing over Zoom. Also, this is the first time I’ve actually given a career talk, and I realised it was 15 years ago that I was sitting in the audience on the other side. Time flies! ...
Introduction

This is going to be a new series on rating systems, which is a vastly underrated (pun intended) area of statistics and data science. Rating systems have actually been a part of my life (and probably yours), from early days in chess and then games with matchmaking like CS:GO and Valorant, to now wondering whether contract bridge should also have one.

Historical Context

Having been around for centuries, chess is a game in which people have wasted much time arguing over who the best player is. It’s probably slightly surprising then that the first modern rating systems only appeared around or after the end of World War 2. The first systems (Ingo and Harkness) were quite simple and used the idea of the average rating of opponents with adjustments for the results. ...
A while ago, I wrote about the GE2020 sample count. Together with Yong Sheng, we gave a talk about this at DataScience SG last night (YouTube link: https://www.youtube.com/watch?v=U9-zax0mMrw). Do also check out the second talk as reinforcement learning is always an interesting subject - props to Siddarth for giving that quick summary of RL! Also, Symbolic Connection’s episode featuring me is now live! Thanks to Koo Ping Shung for organizing both of the above. ...
Election season is upon us again here in Singapore and Polling Day is this Friday. The last election in 2015 introduced the sample count. What is the sample count? From the ELD Website: From the votes cast at each polling station, a counting assistant picks up a random bundle of 100 ballot papers (in front of the candidates and counting agents present) and counts the number of votes for each candidate (or group of candidates in the case of a GRC). ...
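As a rough illustration of the procedure quoted above, the sketch below simulates drawing a bundle of 100 ballots at each polling station and combining the tallies into one estimate. Several things here are my own assumptions rather than anything stated by the ELD excerpt: the station sizes and vote shares are made up, each ballot in a bundle is drawn independently (a stand-in for picking a bundle from a well-mixed pile), and stations are weighted by the number of votes cast.

```python
import random


def sample_count(stations, bundle_size=100, seed=0):
    """Simulate a sample count for candidate A.

    stations: list of (votes_cast, true_share_for_a) per polling station (hypothetical data).
    Returns the estimated overall vote share for candidate A.
    """
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    weighted_sum = 0.0
    total_votes = 0
    for votes_cast, true_share in stations:
        # Count votes for A in a simulated bundle of `bundle_size` ballots.
        votes_for_a = sum(rng.random() < true_share for _ in range(bundle_size))
        # Assumption: weight each station's sample share by its votes cast.
        weighted_sum += votes_cast * votes_for_a / bundle_size
        total_votes += votes_cast
    return weighted_sum / total_votes


# Hypothetical constituency: 30 stations, 3000 votes each, candidate A on 60% everywhere.
stations = [(3000, 0.6)] * 30
print(f"sample count estimate: {sample_count(stations):.3f}")
```

Running this with different seeds gives a feel for how far a sample count built from 100-ballot bundles can drift from the true share.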