#duraspace IRC Log


IRC Log for 2017-01-18

Timestamps are in GMT/BST.

[0:45] * peterdietz (uid52203@gateway/web/irccloud.com/x-bkppgbupyaeintzi) Quit (Quit: Connection closed for inactivity)
[6:46] -adams.freenode.net- *** Looking up your hostname...
[6:46] -adams.freenode.net- *** Checking Ident
[6:46] -adams.freenode.net- *** Found your hostname
[6:46] -adams.freenode.net- *** No Ident response
[6:46] * DuraLogBot (~PircBot@webster.duraspace.org) has joined #duraspace
[6:46] * Topic is 'Welcome to DuraSpace IRC. This channel is used for formal meetings and is logged - http://irclogs.duraspace.org/'
[6:46] * Set by tdonohue on Thu Sep 15 17:49:38 UTC 2016
[13:10] * mhwood (mwood@mhw.ulib.iupui.edu) has joined #duraspace
[14:19] * tdonohue (~tdonohue@dspace/tdonohue) has joined #duraspace
[14:57] * th5 (~th5@unaffiliated/th5) has joined #duraspace
[16:53] * ntorres (c1895861@gateway/web/freenode/ip. has joined #duraspace
[17:48] * ntorres (c1895861@gateway/web/freenode/ip. has left #duraspace
[19:12] * hpottinger (~hpottinge@ has joined #duraspace
[19:57] <tdonohue> REMINDER TO ALL: DevMtg starts in a few minutes. Agenda at: https://wiki.duraspace.org/display/DSPACE/DevMtg+2017-01-18
[19:57] <kompewter> [ DevMtg 2017-01-18 - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/DevMtg+2017-01-18
[20:01] <tdonohue> Hi all, welcome to our weekly DSpace DevMtg. Agenda linked above
[20:01] <tdonohue> let's wake up the Committers now (pinging helix84, hpottinger, mhwood, terry-b)
[20:01] <mhwood> ZZzzz...*snort*!
[20:02] <terry-b> hello!
[20:02] <tdonohue> So, before we get into the meeting topics, just a reminder the next DSpace 7 meeting will be Thurs, Jan 26. We (Art, Andrea & I) will be testing out a new call platform (to let us expand potentially beyond 15 attendees) and will be sending out more info soon
[20:02] * jcreel (~jcreel@jcreel.tamu.edu) has joined #duraspace
[20:03] <tdonohue> Also, as of today, we have a DSpace Slack team. If you want an invite, please visit https://goo.gl/forms/s70dh26zY2cSqn2K3 and submit your email. This Slack team has not (yet) been announced on lists, but it will be once we get some more "early testers" on the signup process, etc.
[20:03] <kompewter> [ Request invite to DSpace.org Slack ] - https://goo.gl/forms/s70dh26zY2cSqn2K3
[20:04] <tdonohue> Now, into the actual agenda for today...
[20:05] <tdonohue> terry-b had started/tracked this discussion and wanted more feedback I believe: DS-3454
[20:05] <kompewter> [ https://jira.duraspace.org/browse/DS-3454 ] - [DS-3454] Disable Legacy Usage Reports (log file based usage statistics) - DuraSpace JIRA
[20:05] <tdonohue> Also discussed on lists at https://groups.google.com/forum/#!topic/dspace-tech/BKaTaVe9KTc
[20:05] <kompewter> [ Google Groups ] - https://groups.google.com/forum/#!topic/dspace-tech/BKaTaVe9KTc
[20:05] * tom_desair (~tom@d8D874131.access.telenet.be) has joined #duraspace
[20:06] <tdonohue> My feeling is that we should include this in the RoadMap as part of our goal to have *one* primary stats engine (Solr Stats).
[20:06] <terry-b> I feel like the legacy reports have been "legacy" for a while, but users depend on them
[20:06] <terry-b> I presume disabling functionality in a 6.x release would not be advised
[20:07] <tdonohue> And, in fact, I already updated the RoadMap (see 7.0 Priority 1 table, #3 row) to link to terry-b's ticket: https://wiki.duraspace.org/display/DSPACE/RoadMap#RoadMap-CandidateFeaturesforDSpace7.0-Priority1
[20:07] <kompewter> [ RoadMap - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/RoadMap#RoadMap-CandidateFeaturesforDSpace7.0-Priority1
[20:07] <tdonohue> terry-b: yes, we don't remove features in minor releases. So, we're "stuck" with this for 6.0, but can remove/replace it for 7.0
[20:08] <terry-b> There are a handful of additional actions we will want to track (logins, OAI requests, item archives). If we then have a rich reporting interface in REST, we should be good
[20:08] <tdonohue> I also see that aschweer already added some comments to the ticket on features that these Legacy Stats have that are not (yet) in Solr Stats. So, it'd also be good to add those into Solr Stats
[20:09] <terry-b> There were some side comments about migrating from SOLR as the primary record, but I presume that is not on the roadmap
[20:09] <tom_desair> We could even extend the event logging to have a complete “audit” trail: Who edited which item and when.
[20:09] <terry-b> That would be useful
[20:10] <terry-b> I will create a ticket for the new transactions we want to track
[20:10] <tdonohue> So, I guess the main discussion question here is: Does anyone feel we should *not* be moving to remove these Legacy Stats? If there's really no controversy here, then basically we just need a volunteer to start moving this forward
[20:11] <tom_desair> My concern with only storing statistics in SOLR has two reasons. One: SOLR is not meant to be used as a data store, only as a search index (see Peter Dietz's comment in the mailing list). Two: Making a consistent backup of DSpace (including SOLR stats) is currently not trivial.
[20:11] <mhwood> The current situation is confusing. We should sort it all out. I'm less sure that we ought to just shovel everything into Solr.
[20:12] <tdonohue> I'm perfectly OK with rebuilding Solr stats to be only an Index. That's a *separate* task to me though.
[20:12] <tdonohue> First, I just want to solve the issue of having 3 Stats engines under "simultaneous support" (Elasticsearch, Solr and Legacy)
[20:13] <terry-b> How should we track the question of whether solr should be master? Would that need to fall later in the roadmap, or would that need to happen in 7.0?
[20:13] <mhwood> A. We should be capturing the raw cases that we need. B. We should provide consistent statistical products that meet common needs. C. We should not stand in the way of folks who want raw cases to do statistics that we haven't thought of.
[20:14] <tdonohue> terry-b: "whether solr should be master"? Not sure I understand that statement
[20:14] <hpottinger> re ES stats, DS-3287 has an update
[20:14] <mhwood> D. We should not confuse people with overlapping reports by different methods.
[20:14] <kompewter> [ https://jira.duraspace.org/browse/DS-3287 ] - [DS-3287] ElasticSearch fails (does not work at all) - DuraSpace JIRA
[20:14] <terry-b> The master copy of the stats records or just a secondary index
[20:15] <tdonohue> oh, I see now, terry-b. Sorry, I always think of "master" as GitHub master branch ;)
[20:15] <tom_desair> The work on removing the "legacy stats reports" will have to wait until the REST UI is mature enough. However, making the database master of the stats (and using SOLR only to index and efficiently query those stats) can already start.
[20:15] <hpottinger> My brain is trying to tell me we've had this discussion before
[20:16] <terry-b> I remember the discussion but not a resolution
[20:16] <tdonohue> DS-2525 is related to this (I'm searching JIRA right now)
[20:16] <kompewter> [ https://jira.duraspace.org/browse/DS-2525 ] - [DS-2525] Separate "access log" for usage event data - DuraSpace JIRA
[20:17] <tdonohue> DS-3450 is related (somewhat)
[20:17] <kompewter> [ https://jira.duraspace.org/browse/DS-3450 ] - [DS-3450] Record OAI, SWORD, REST calls in SOLR stats - DuraSpace JIRA
[20:18] <tdonohue> I'm updating this tickets as "Statistics" component
[20:18] <tom_desair> I’m not sure if it is a good idea to only log usage events to a (separate) log file as that would not simplify the DSpace backup process.
[20:18] <tdonohue> *these*
[20:19] <mhwood> How not? Flat files are easy to back up.
[20:19] <tom_desair> But not if they are constantly changing. You could catch an unfinished write operation.
[20:20] <tom_desair> (That’s the difficulty you have with SOLR now)
[20:21] <mhwood> OK. The storage mechanism is less interesting than getting the cases out where serious ad-hoc analysis can be done, if people want to do that. Also uncluttering the general operational logs is interesting.
[20:22] <tdonohue> We don't have to figure this all out now. But, we should create a ticket to update Solr Stats to be an *index* of [something] (DB, flat file or something else)
[20:22] <tom_desair> Agreed
[20:22] <mhwood> Solr can perhaps contain finished products, indexing them to make slicing easy.
[20:22] <hpottinger> I'm pretty sure log file issues have been well resolved by others, it's a pretty common pattern (common enough that log analysis tools exist for this use case)
[20:22] <terry-b> What do other repository systems do for stats?
[20:23] <terry-b> hpottinger, I was thinking the same thing.
[20:23] <tdonohue> google analytics & stuff things in log files
[20:23] <mhwood> Do other systems do stats built-in?
[20:24] <tdonohue> I'm pretty sure a lot of others don't have stats built in. They assume you will use Google Analytics (or something like it), or just point a third-party log parser at your access logs
[20:24] <tom_desair> The advantage of DSpace could be that we could take this much further than just "view" stats.
[20:25] <mhwood> I worry that DSpace will wind up as 20% repository code, 80% statistical do-it-all.
[20:25] <hpottinger> if we had a URL that reflected the hierarchy of the repository, we could easily just use the apache logs
[20:25] <tdonohue> Yes, to be fair, it could be we still depend on Google Analytics (and the like) for flashy "view info stats", and tailor our "built-in" stats engine for stuff that is highly DSpace specific (e.g. Admin stats, etc)
[20:26] <tdonohue> As an example, Google Analytics cannot track "how many new items were deposited this month/year/forever?" But, DSpace could do that much easier
[20:27] <terry-b> We have a very full plate for 7.0. How much do we want to take on?
[20:27] <tdonohue> In any case, let's get a volunteer to create a JIRA ticket about making Solr stats an index (instead of a storage area).
[20:27] <terry-b> I will create a ticket.
[20:28] <hpottinger> 2525 is kind of already that ticket?
[20:28] <mhwood> Kind of. Because we will need *some* new storage for cases.
[20:28] <mhwood> How we store them accessibly and reliably is discussable.
[20:28] <tdonohue> terry-b: to clarify, we are not promising this for DSpace 7. Our RoadMap is very specifically ordered. This is #3 on our list. #1 is massive (new UI), and if we only get that done, then everything else will be rescheduled
[20:29] <hpottinger> maybe just re-write the summary/description for 2525
[20:29] <tdonohue> So, it's "nice to have" for DSpace 7. If someone wants to "run with it", then we may be able to make it happen. Otherwise, we aren't guaranteeing it
[20:29] <terry-b> mhwood, it is your ticket. Are you OK with a new description?
[20:30] <hpottinger> in the process of working on 7, we might get it "for free"
[20:30] <mhwood> Yes, I reckon so.
[20:31] <mhwood> The same thing, but more abstractly?
[20:31] <hpottinger> I think so, pull out the specifics, maybe capture them in a comment instead
[20:32] <tdonohue> I'm not sure 2525 is the same as "Solr Stats needs to be an index"
[20:32] <tom_desair> When we create the REST API methods for DSpace 7, we can log some "extra" events in the same effort.
[20:32] <mhwood> Sure. I'd just like to have less clutter and more accessibility of raw data.
[20:32] <hpottinger> I'm not sure Solr Stats needs to be an index ;-)
[20:33] <terry-b> Title updated, conversation logged: https://jira.duraspace.org/browse/DS-2525
[20:33] <kompewter> [ [DS-2525] Make SOLR Stats an Index of Stats Records (rather than the definitive copy of stats) - DuraSpace JIRA ] - https://jira.duraspace.org/browse/DS-2525
[20:33] <tom_desair> Looks good to me
[20:33] <mhwood> If the word "index" is making people nervous, how about "cache"? That's what Solr is for.
[20:34] <mhwood> Whichever you like.
[20:34] <tdonohue> right...Solr is a cache/synthesis of raw data coming from somewhere else
[20:34] <hpottinger> "something that is built to hold authoritative data"
[20:34] <terry-b> https://jira.duraspace.org/browse/DS-2525
[20:34] <kompewter> [ [DS-2525] Make SOLR Stats a Cache of Stats Records (rather than the definitive copy of stats) - DuraSpace JIRA ] - https://jira.duraspace.org/browse/DS-2525
[20:34] <mhwood> Good.
[20:34] <tdonohue> ok, looks good for now
[20:35] <tom_desair> and SOLR can contain more data (e.g. metadata of the item related to the usage event) than the definitive copy (only the timestamp, user agent and other request info).
[20:35] <tdonohue> So, sounds like we are done with this discussion
[20:36] <hpottinger> well... we want to avoid the situation where we use Solr as the data store
[20:36] <mhwood> No worries. The definitive metadata store is still elsewhere.
[20:37] <tdonohue> As we are over 1/2 through this meeting, I'm going to ask everyone to bring forward tickets you wish to discuss further (or need more eyes on).
[20:37] <mhwood> The point is that if your statistics core were trashed, you could readily rebuild it.
[20:37] <terry-b> I linked the 2 other stats tickets to the one we started with: https://jira.duraspace.org/browse/DS-3454
[20:37] <kompewter> [ https://jira.duraspace.org/browse/DS-3454 ] - [DS-3454] Disable Legacy Usage Reports (log file based usage statistics) - DuraSpace JIRA
[20:37] <tdonohue> The list of "high priority" & "need code review" tickets are in our agenda at https://wiki.duraspace.org/display/DSPACE/DevMtg+2017-01-18
[20:37] <kompewter> [ DevMtg 2017-01-18 - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/DevMtg+2017-01-18
[20:37] <tdonohue> Please review those lists and let us know if there are any you want discussed now
[20:37] <terry-b> I would like to discuss https://jira.duraspace.org/browse/DS-3457?filter=-2
[20:37] <kompewter> [ https://jira.duraspace.org/browse/DS-3457 ] - [DS-3457] Tomcat Restart Hangs after Sharding DSpace 6x Statistics - DuraSpace JIRA
[20:39] <tdonohue> terry-b: does this relate in any way to the open PR around Sharding?
[20:39] <terry-b> I encountered it while testing that issue
[20:39] <terry-b> The sharding succeeds... all seems ok until you restart tomcat
[20:39] <terry-b> tom_desair, I wonder if you can recreate this issue
[20:40] <hpottinger> wait... this is a specific data situation that results in a failure to boot, not a general "if you shard your stats, tomcat won't boot" issue
[20:40] <terry-b> hpottinger, why do you say that
[20:40] <terry-b> I am saying that if you shard in DSpace 6, this problem appears to happen
[20:41] <tom_desair> I’ll try to reproduce the issue and see if I can fix it. My PR is pretty useless if it breaks Tomcat restarts.
[20:41] <tdonohue> terry-b: why does your Sharding process include those 1-3 steps where you are pasting in a different record & saving?
[20:41] <hpottinger> Hmm... I can't duplicate this. I have sharded stats with DSpace 6, and my repository is still running
[20:41] <tdonohue> If you simply run "stats-util -s" on DSpace 6, does this still occur?
[20:41] <terry-b> hpottinger, Have you restarted tomcat?
[20:41] <hpottinger> I have
[20:42] <terry-b> Hmm, I have one other user who saw the behavior: https://groups.google.com/forum/#!topic/dspace-tech/oKf7M6FoaLY
[20:42] <kompewter> [ Google Groups ] - https://groups.google.com/forum/#!topic/dspace-tech/oKf7M6FoaLY
[20:42] <terry-b> Unfortunately, the sharding is very difficult to re-test, so I have to follow my contrived example to reproduce the issue
[20:43] <tdonohue> true, ok
[20:43] <terry-b> I wish I had backed up my solr directories...
[20:44] <terry-b> But it is a test box
[20:44] <tdonohue> It sounds like this needs more testers. We need to see if others can reproduce this as reliably as terry-b...and if so or if not that may give us more clues (hopefully)
[20:44] <tom_desair> I'll test with different stats cores to try to reproduce the issue consistently.
[20:44] <tdonohue> thanks tom_desair!
[20:44] <terry-b> tom_desair, thank you!
[20:45] <hpottinger> FYI, I have another meeting at the top of the hour
[20:45] <tdonohue> Ok, so another ticket to mention (briefly). Committers, I still need a volunteer for DS-3431 (currently access restricted to Committers)
[20:45] <kompewter> [ https://jira.duraspace.org/browse/DS-3431 ] - ('Unexpected error:', <type 'exceptions.AttributeError'>)
[20:46] <tdonohue> Any other tickets folks want to bring up here, in the last 15 mins?
[20:47] <tom_desair> I have a (junior) developer working on the collection sorting. We should be able to deliver that next week.
[20:47] <tdonohue> Thanks tom_desair! And thanks for all the recent work you (and your team) have been doing to help track down bug fixes, etc
[20:48] <hpottinger> DS-3287 has a rather definitive update
[20:48] <kompewter> [ https://jira.duraspace.org/browse/DS-3287 ] - [DS-3287] ElasticSearch fails (does not work at all) - DuraSpace JIRA
[20:48] <tom_desair> You can see it as a New Year's resolution ;-)
[20:48] <tdonohue> I already moved that back to "needs volunteer", hpottinger. So, yes, we need more help there in just getting it "basically working" again (for now)
[20:49] <tdonohue> I will mention again, I've been slowly trying to help chip away at the JIRA backlog (when I find a free 1/2 hour or so). I see Bram is doing the same these days. If others find time during the week, I'd again recommend helping out!
[20:50] <tdonohue> Part of my New Year's resolution is to try and get a "handle" on JIRA & PRs.... So far there's just so much of a backlog, that I haven't made much more than a dent...but that's where I'll be concentrating efforts in coming weeks.
[20:51] <hpottinger> I made this filter: https://jira.duraspace.org/browse/DS-2349?filter=13907
[20:51] <kompewter> [ Issue Navigator - DuraSpace JIRA ] - https://jira.duraspace.org/browse/DS-2349?filter=13907
[20:51] <kompewter> [ https://jira.duraspace.org/browse/DS-2349 ] - [DS-2349] Encoding issue with % in ItemTag.java - DuraSpace JIRA
[20:51] <tdonohue> hpottinger: I cannot see that filter it seems (you might need to share it). All I see is that single ticket
[20:51] <hpottinger> bah, one moment
[20:52] <terry-b> tdonohue, that is a good resolution. My motivation to contribute is dampened when PRs and tickets languish
[20:52] <hpottinger> try again
[20:53] <hpottinger> 305 issues with "affected version" that isn't one of our supported versions
[20:53] <tdonohue> hpottinger: now I see it. Wow, that's a long query ;) But, thanks. Yes, those are all our unresolved tickets for unsupported versions
[20:54] <tdonohue> Some of those likely need to just have "affectedVersions" updated...others though might be closeable
[20:54] <tdonohue> hpottinger: just realized your query is slightly wrong. It's 3.x and below that are "unsupported" (you included 4.x in there)
[20:55] <hpottinger> revised it
[20:55] <hpottinger> 239 issues
[20:57] <hpottinger> correction, revised it again, 204 issues
[20:58] <tdonohue> Ok, in any case, sounds like we have nothing else to discuss today.
[20:59] <tdonohue> So, I'm going to go ahead and close up our meeting. We'll see you next week! Don't forget, if you want to join us in Slack, go to https://goo.gl/forms/s70dh26zY2cSqn2K3 for an invite
[20:59] <kompewter> [ Request invite to DSpace.org Slack ] - https://goo.gl/forms/s70dh26zY2cSqn2K3
[20:59] <tom_desair> Ok, see you all next week!
[21:00] <terry-b> Thanks for letting me dominate the agenda for the day!
[21:01] * tom_desair (~tom@d8D874131.access.telenet.be) Quit (Quit: tom_desair)
[21:02] <mhwood> Slack invitation form worked for me.
[21:02] <tdonohue> mhwood: excellent. Glad to hear. Seems like so far, so good (haven't heard any issues yet)
[21:03] * hpottinger (~hpottinge@ Quit (Ping timeout: 260 seconds)
[22:08] * mhwood (mwood@mhw.ulib.iupui.edu) Quit (Remote host closed the connection)
[22:31] * tdonohue (~tdonohue@dspace/tdonohue) has left #duraspace
[22:41] * th5 (~th5@unaffiliated/th5) Quit ()
[23:09] * dyelar (~dyelar@biolinux.mrb.ku.edu) Quit (Quit: Leaving.)
[23:24] * jcreel (~jcreel@jcreel.tamu.edu) Quit (*.net *.split)
[23:24] * jcreel (~jcreel@jcreel.tamu.edu) has joined #duraspace

These logs were automatically created by DuraLogBot on irc.freenode.net using the Java IRC LogBot.